or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

index.md

index.mddocs/

0

# pyexcel-xls

1

2

A Python wrapper library for reading, manipulating, and writing data in Excel XLS format, with support for XLSX and XLSM formats depending on the xlrd version. Built as both a standalone library and a plugin for the pyexcel ecosystem, providing seamless Excel file operations with advanced features like pagination, merged cell detection, and hidden row/column handling.

3

4

## Package Information

5

6

- **Package Name**: pyexcel-xls

7

- **Language**: Python

8

- **Installation**: `pip install pyexcel-xls`

9

- **Dependencies**: xlrd, xlwt, pyexcel-io

10

11

## Core Imports

12

13

```python

14

from pyexcel_xls import get_data, save_data

15

```

16

17

For internal I/O functionality (re-exported from pyexcel-io):

18

19

```python

20

from pyexcel_xls import read_data, write_data, isstream

21

```

22

23

For version checking:

24

25

```python

26

from pyexcel_xls import XLRD_VERSION_2_OR_ABOVE, supported_file_formats

27

```

28

29

For utility functions:

30

31

```python

32

from pyexcel_xls.xlsr import xldate_to_python_date

33

```

34

35

## Basic Usage

36

37

```python

38

from pyexcel_xls import get_data, save_data

39

from collections import OrderedDict

40

41

# Read data from an Excel file

42

data = get_data("input.xls")

43

print(data) # {"Sheet1": [[1, 2, 3], [4, 5, 6]], "Sheet2": [["a", "b", "c"]]}

44

45

# Write data to an Excel file

46

output_data = OrderedDict([

47

("Sheet 1", [[1, 2, 3], [4, 5, 6]]),

48

("Sheet 2", [["row 1", "row 2", "row 3"]])

49

])

50

save_data("output.xls", output_data)

51

52

# Work with memory streams

53

from io import BytesIO

54

55

# Write to memory

56

stream = BytesIO()

57

save_data(stream, output_data)

58

59

# Read from memory

60

stream.seek(0)

61

memory_data = get_data(stream)

62

```

63

64

## Architecture

65

66

pyexcel-xls operates as a bridge between Excel files and Python data structures through a plugin-based architecture:

67

68

- **Plugin System**: Registers with pyexcel-io as a format handler for XLS/XLSX/XLSM files

69

- **Reader Classes**: `XLSReader`, `XLSInFile`, `XLSInMemory`, `XLSInContent` handle different input sources

70

- **Writer Classes**: `XLSWriter`, `XLSheetWriter` handle XLS file creation using xlwt

71

- **Sheet Interface**: `XLSheet` provides unified access to worksheet data with optional features

72

- **xlrd Integration**: Uses xlrd library for reading, with version-dependent format support

73

- **xlwt Integration**: Uses xlwt library for writing XLS files with automatic date/time formatting

74

75

The architecture supports both standalone usage and integration with the broader pyexcel ecosystem, allowing seamless data exchange between Excel files and Python applications.

76

77

## Capabilities

78

79

### Reading Excel Files

80

81

Read data from XLS, XLSX, and XLSM files (format support depends on xlrd version).

82

83

```python { .api }

84

def get_data(afile, file_type=None, **keywords):

85

"""

86

Standalone function for reading Excel files.

87

88

Parameters:

89

- afile: str | file-like | bytes - File path, file object, or binary content

90

- file_type: str | None - File type override (defaults to 'xls' for streams)

91

- **keywords: dict - Additional options for controlling read behavior

92

93

Keyword Arguments:

94

- start_row: int - Starting row index for pagination (0-based)

95

- row_limit: int - Maximum number of rows to read

96

- start_column: int - Starting column index for pagination (0-based)

97

- column_limit: int - Maximum number of columns to read

98

- skip_hidden_row_and_column: bool - Skip hidden rows/columns (default: True, XLS only)

99

- detect_merged_cells: bool - Detect and handle merged cells (default: False)

100

- skip_hidden_sheets: bool - Skip hidden worksheets (default: True)

101

- filename: str - For xlrd file opening

102

- logfile: file-like - For xlrd logging

103

- verbosity: int - xlrd verbosity level

104

- use_mmap: bool - Use memory mapping for xlrd

105

- file_contents: bytes - Direct file content for xlrd

106

- encoding_override: str - Character encoding override for xlrd

107

- formatting_info: bool - Preserve formatting information (xlrd)

108

- on_demand: bool - Load sheets on demand (xlrd)

109

- ragged_rows: bool - Handle ragged rows (xlrd)

110

- auto_detect_int: bool - Convert floats without decimals to integers (default: True)

111

112

Returns:

113

dict - Dictionary with sheet names as keys and sheet data as 2D lists

114

"""

115

```

116

117

### Writing Excel Files

118

119

Write data to XLS files with automatic formatting for date/time values.

120

121

```python { .api }

122

def save_data(afile, data, file_type=None, **keywords):

123

"""

124

Standalone function for writing Excel files.

125

126

Parameters:

127

- afile: str | file-like - File path or file object to write to

128

- data: dict - Dictionary with sheet names as keys and 2D list data as values

129

- file_type: str | None - File type override (defaults to 'xls' for streams)

130

- **keywords: dict - Additional options for controlling write behavior

131

132

Keyword Arguments:

133

- encoding: str - Character encoding for XLS files (default: 'ascii')

134

- style_compression: int - Style compression level (default: 2)

135

136

Returns:

137

None

138

"""

139

```

140

141

### Low-Level I/O Functions

142

143

Direct access to pyexcel-io functionality for advanced use cases.

144

145

```python { .api }

146

def read_data(afile, file_type=None, **keywords):

147

"""

148

Low-level reading function (re-exported from pyexcel_io.io.get_data).

149

Identical to get_data() but provided for consistency with pyexcel-io API.

150

151

Parameters and returns: Same as get_data()

152

"""

153

154

def write_data(afile, data, file_type=None, **keywords):

155

"""

156

Low-level writing function (re-exported from pyexcel_io.io.save_data).

157

Identical to save_data() but provided for consistency with pyexcel-io API.

158

159

Parameters and returns: Same as save_data()

160

"""

161

162

def isstream(afile):

163

"""

164

Check if an object is a stream.

165

166

Parameters:

167

- afile: any - Object to check

168

169

Returns:

170

bool - True if object is a stream

171

"""

172

```

173

174

### Version and Format Information

175

176

Access version-dependent format support and compatibility information.

177

178

```python { .api }

179

XLRD_VERSION_2_OR_ABOVE: bool

180

# Boolean indicating if xlrd version 2.0 or greater is installed

181

182

supported_file_formats: list[str]

183

# List of supported file formats, varies by xlrd version

184

# ["xls", "xlsx", "xlsm"] for xlrd < 2.0

185

# ["xls"] for xlrd >= 2.0

186

187

def xlrd_version_2_or_greater():

188

"""

189

Check if xlrd version 2.0 or greater is installed.

190

191

Returns:

192

bool - True if xlrd version is 2.0 or greater

193

"""

194

```

195

196

### Advanced Reading Features

197

198

#### Pagination Support

199

200

Read large files in chunks to manage memory usage:

201

202

```python

203

# Read specific row range

204

partial_data = get_data("large_file.xls", start_row=10, row_limit=50)

205

206

# Read specific column range

207

column_data = get_data("large_file.xls", start_column=2, column_limit=5)

208

209

# Combine row and column ranges

210

subset_data = get_data("large_file.xls",

211

start_row=10, row_limit=20,

212

start_column=2, column_limit=8)

213

```

214

215

#### Merged Cell Detection

216

217

Handle merged cells by spreading values across all merged cells:

218

219

```python

220

# Enable merged cell detection (may impact performance)

221

data = get_data("file_with_merged_cells.xls", detect_merged_cells=True)

222

```

223

224

#### Hidden Content Handling

225

226

Control visibility of hidden sheets, rows, and columns:

227

228

```python

229

# Include hidden sheets in output

230

data = get_data("file.xls", skip_hidden_sheets=False)

231

232

# Include hidden rows and columns (XLS only)

233

data = get_data("file.xls", skip_hidden_row_and_column=False)

234

```

235

236

### Integration with pyexcel

237

238

When installed, pyexcel-xls automatically registers as a plugin for pyexcel:

239

240

```python

241

import pyexcel as pe

242

243

# Read through pyexcel interface

244

book = pe.get_book(file_name="data.xls")

245

sheet = pe.get_sheet(file_name="data.xls")

246

247

# Write through pyexcel interface

248

sheet.save_as("output.xls")

249

```

250

251

### Utility Functions

252

253

Additional utility functions for date handling and type conversion.

254

255

```python { .api }

256

def xldate_to_python_date(value, date_mode):

257

"""

258

Convert Excel date value to Python date/time object.

259

260

Parameters:

261

- value: float - Excel date serial number

262

- date_mode: int - Date mode (0 for 1900 system, 1 for 1904 system)

263

264

Returns:

265

datetime.date | datetime.time | datetime.datetime - Converted date/time object

266

"""

267

```

268

269

## Types

270

271

```python { .api }

272

# Data structure for Excel workbooks

273

ExcelData = dict[str, list[list[any]]]

274

# Dictionary mapping sheet names to 2D lists of cell values

275

276

# Supported file formats (varies by xlrd version)

277

SupportedFormats = list[str] # ["xls", "xlsx", "xlsm"] or ["xls"]

278

279

# File input types

280

FileInput = str | BytesIO | BinaryIO | bytes

281

# File path, file-like object, or binary content

282

283

# Sheet data structure

284

SheetData = list[list[any]]

285

# 2D list representing rows and columns of cell values

286

287

# Date mode constants (for xldate_to_python_date)

288

DateMode = int # 0 for 1900 system, 1 for 1904 system

289

290

# Cell value types that may be encountered

291

CellValue = str | int | float | datetime.date | datetime.time | datetime.datetime | bool | None

292

# Individual cell values after type conversion

293

294

# xlrd keyword parameters for advanced reading

295

XlrdKeywords = dict[str, any]

296

# Valid keys: filename, logfile, verbosity, use_mmap, file_contents,

297

# encoding_override, formatting_info, on_demand, ragged_rows

298

```

299

300

## Version Compatibility

301

302

- **xlrd >= 2.0**: Only supports XLS format (xlsx/xlsm support removed by xlrd)

303

- **xlrd < 2.0**: Supports XLS, XLSX, and XLSM formats

304

- **Python**: 3.6+ required

305

- **Writing**: Only XLS format supported (uses xlwt)

306

307

## Limitations

308

309

- Fonts, colors, and charts are not supported

310

- Cannot read password-protected files

311

- XLS files limited to ~65,000 rows

312

- Writing only supports XLS format (not XLSX/XLSM)

313

- Advanced formatting information is not preserved during read/write operations