A wrapper library to read, manipulate and write data in xls format, supporting xlsx and xlsm formats
npx @tessl/cli install tessl/pypi-pyexcel-xls@0.7.00
# pyexcel-xls
1
2
A Python wrapper library for reading, manipulating, and writing data in Excel XLS format, with support for XLSX and XLSM formats depending on the xlrd version. Built as both a standalone library and a plugin for the pyexcel ecosystem, providing seamless Excel file operations with advanced features like pagination, merged cell detection, and hidden row/column handling.
3
4
## Package Information
5
6
- **Package Name**: pyexcel-xls
7
- **Language**: Python
8
- **Installation**: `pip install pyexcel-xls`
9
- **Dependencies**: xlrd, xlwt, pyexcel-io
10
11
## Core Imports
12
13
```python
14
from pyexcel_xls import get_data, save_data
15
```
16
17
For internal I/O functionality (re-exported from pyexcel-io):
18
19
```python
20
from pyexcel_xls import read_data, write_data, isstream
21
```
22
23
For version checking:
24
25
```python
26
from pyexcel_xls import XLRD_VERSION_2_OR_ABOVE, supported_file_formats
27
```
28
29
For utility functions:
30
31
```python
32
from pyexcel_xls.xlsr import xldate_to_python_date
33
```
34
35
## Basic Usage
36
37
```python
38
from pyexcel_xls import get_data, save_data
39
from collections import OrderedDict
40
41
# Read data from an Excel file
42
data = get_data("input.xls")
43
print(data) # {"Sheet1": [[1, 2, 3], [4, 5, 6]], "Sheet2": [["a", "b", "c"]]}
44
45
# Write data to an Excel file
46
output_data = OrderedDict([
47
("Sheet 1", [[1, 2, 3], [4, 5, 6]]),
48
("Sheet 2", [["row 1", "row 2", "row 3"]])
49
])
50
save_data("output.xls", output_data)
51
52
# Work with memory streams
53
from io import BytesIO
54
55
# Write to memory
56
stream = BytesIO()
57
save_data(stream, output_data)
58
59
# Read from memory
60
stream.seek(0)
61
memory_data = get_data(stream)
62
```
63
64
## Architecture
65
66
pyexcel-xls operates as a bridge between Excel files and Python data structures through a plugin-based architecture:
67
68
- **Plugin System**: Registers with pyexcel-io as a format handler for XLS/XLSX/XLSM files
69
- **Reader Classes**: `XLSReader`, `XLSInFile`, `XLSInMemory`, `XLSInContent` handle different input sources
70
- **Writer Classes**: `XLSWriter`, `XLSheetWriter` handle XLS file creation using xlwt
71
- **Sheet Interface**: `XLSheet` provides unified access to worksheet data with optional features
72
- **xlrd Integration**: Uses xlrd library for reading, with version-dependent format support
73
- **xlwt Integration**: Uses xlwt library for writing XLS files with automatic date/time formatting
74
75
The architecture supports both standalone usage and integration with the broader pyexcel ecosystem, allowing seamless data exchange between Excel files and Python applications.
76
77
## Capabilities
78
79
### Reading Excel Files
80
81
Read data from XLS, XLSX, and XLSM files (format support depends on xlrd version).
82
83
```python { .api }
84
def get_data(afile, file_type=None, **keywords):
85
"""
86
Standalone function for reading Excel files.
87
88
Parameters:
89
- afile: str | file-like | bytes - File path, file object, or binary content
90
- file_type: str | None - File type override (defaults to 'xls' for streams)
91
- **keywords: dict - Additional options for controlling read behavior
92
93
Keyword Arguments:
94
- start_row: int - Starting row index for pagination (0-based)
95
- row_limit: int - Maximum number of rows to read
96
- start_column: int - Starting column index for pagination (0-based)
97
- column_limit: int - Maximum number of columns to read
98
- skip_hidden_row_and_column: bool - Skip hidden rows/columns (default: True, XLS only)
99
- detect_merged_cells: bool - Detect and handle merged cells (default: False)
100
- skip_hidden_sheets: bool - Skip hidden worksheets (default: True)
101
- filename: str - For xlrd file opening
102
- logfile: file-like - For xlrd logging
103
- verbosity: int - xlrd verbosity level
104
- use_mmap: bool - Use memory mapping for xlrd
105
- file_contents: bytes - Direct file content for xlrd
106
- encoding_override: str - Character encoding override for xlrd
107
- formatting_info: bool - Preserve formatting information (xlrd)
108
- on_demand: bool - Load sheets on demand (xlrd)
109
- ragged_rows: bool - Handle ragged rows (xlrd)
110
- auto_detect_int: bool - Convert floats without decimals to integers (default: True)
111
112
Returns:
113
dict - Dictionary with sheet names as keys and sheet data as 2D lists
114
"""
115
```
116
117
### Writing Excel Files
118
119
Write data to XLS files with automatic formatting for date/time values.
120
121
```python { .api }
122
def save_data(afile, data, file_type=None, **keywords):
123
"""
124
Standalone function for writing Excel files.
125
126
Parameters:
127
- afile: str | file-like - File path or file object to write to
128
- data: dict - Dictionary with sheet names as keys and 2D list data as values
129
- file_type: str | None - File type override (defaults to 'xls' for streams)
130
- **keywords: dict - Additional options for controlling write behavior
131
132
Keyword Arguments:
133
- encoding: str - Character encoding for XLS files (default: 'ascii')
134
- style_compression: int - Style compression level (default: 2)
135
136
Returns:
137
None
138
"""
139
```
140
141
### Low-Level I/O Functions
142
143
Direct access to pyexcel-io functionality for advanced use cases.
144
145
```python { .api }
146
def read_data(afile, file_type=None, **keywords):
147
"""
148
Low-level reading function (re-exported from pyexcel_io.io.get_data).
149
Identical to get_data() but provided for consistency with pyexcel-io API.
150
151
Parameters and returns: Same as get_data()
152
"""
153
154
def write_data(afile, data, file_type=None, **keywords):
155
"""
156
Low-level writing function (re-exported from pyexcel_io.io.save_data).
157
Identical to save_data() but provided for consistency with pyexcel-io API.
158
159
Parameters and returns: Same as save_data()
160
"""
161
162
def isstream(afile):
163
"""
164
Check if an object is a stream.
165
166
Parameters:
167
- afile: any - Object to check
168
169
Returns:
170
bool - True if object is a stream
171
"""
172
```
173
174
### Version and Format Information
175
176
Access version-dependent format support and compatibility information.
177
178
```python { .api }
179
XLRD_VERSION_2_OR_ABOVE: bool
180
# Boolean indicating if xlrd version 2.0 or greater is installed
181
182
supported_file_formats: list[str]
183
# List of supported file formats, varies by xlrd version
184
# ["xls", "xlsx", "xlsm"] for xlrd < 2.0
185
# ["xls"] for xlrd >= 2.0
186
187
def xlrd_version_2_or_greater():
188
"""
189
Check if xlrd version 2.0 or greater is installed.
190
191
Returns:
192
bool - True if xlrd version is 2.0 or greater
193
"""
194
```
195
196
### Advanced Reading Features
197
198
#### Pagination Support
199
200
Read large files in chunks to manage memory usage:
201
202
```python
203
# Read specific row range
204
partial_data = get_data("large_file.xls", start_row=10, row_limit=50)
205
206
# Read specific column range
207
column_data = get_data("large_file.xls", start_column=2, column_limit=5)
208
209
# Combine row and column ranges
210
subset_data = get_data("large_file.xls",
211
start_row=10, row_limit=20,
212
start_column=2, column_limit=8)
213
```
214
215
#### Merged Cell Detection
216
217
Handle merged cells by spreading values across all merged cells:
218
219
```python
220
# Enable merged cell detection (may impact performance)
221
data = get_data("file_with_merged_cells.xls", detect_merged_cells=True)
222
```
223
224
#### Hidden Content Handling
225
226
Control visibility of hidden sheets, rows, and columns:
227
228
```python
229
# Include hidden sheets in output
230
data = get_data("file.xls", skip_hidden_sheets=False)
231
232
# Include hidden rows and columns (XLS only)
233
data = get_data("file.xls", skip_hidden_row_and_column=False)
234
```
235
236
### Integration with pyexcel
237
238
When installed, pyexcel-xls automatically registers as a plugin for pyexcel:
239
240
```python
241
import pyexcel as pe
242
243
# Read through pyexcel interface
244
book = pe.get_book(file_name="data.xls")
245
sheet = pe.get_sheet(file_name="data.xls")
246
247
# Write through pyexcel interface
248
sheet.save_as("output.xls")
249
```
250
251
### Utility Functions
252
253
Additional utility functions for date handling and type conversion.
254
255
```python { .api }
256
def xldate_to_python_date(value, date_mode):
257
"""
258
Convert Excel date value to Python date/time object.
259
260
Parameters:
261
- value: float - Excel date serial number
262
- date_mode: int - Date mode (0 for 1900 system, 1 for 1904 system)
263
264
Returns:
265
datetime.date | datetime.time | datetime.datetime - Converted date/time object
266
"""
267
```
268
269
## Types
270
271
```python { .api }
272
# Data structure for Excel workbooks
273
ExcelData = dict[str, list[list[any]]]
274
# Dictionary mapping sheet names to 2D lists of cell values
275
276
# Supported file formats (varies by xlrd version)
277
SupportedFormats = list[str] # ["xls", "xlsx", "xlsm"] or ["xls"]
278
279
# File input types
280
FileInput = str | BytesIO | BinaryIO | bytes
281
# File path, file-like object, or binary content
282
283
# Sheet data structure
284
SheetData = list[list[any]]
285
# 2D list representing rows and columns of cell values
286
287
# Date mode constants (for xldate_to_python_date)
288
DateMode = int # 0 for 1900 system, 1 for 1904 system
289
290
# Cell value types that may be encountered
291
CellValue = str | int | float | datetime.date | datetime.time | datetime.datetime | bool | None
292
# Individual cell values after type conversion
293
294
# xlrd keyword parameters for advanced reading
295
XlrdKeywords = dict[str, any]
296
# Valid keys: filename, logfile, verbosity, use_mmap, file_contents,
297
# encoding_override, formatting_info, on_demand, ragged_rows
298
```
299
300
## Version Compatibility
301
302
- **xlrd >= 2.0**: Only supports XLS format (xlsx/xlsm support removed by xlrd)
303
- **xlrd < 2.0**: Supports XLS, XLSX, and XLSM formats
304
- **Python**: 3.6+ required
305
- **Writing**: Only XLS format supported (uses xlwt)
306
307
## Limitations
308
309
- Fonts, colors, and charts are not supported
310
- Cannot read password-protected files
311
- XLS files limited to ~65,000 rows
312
- Writing only supports XLS format (not XLSX/XLSM)
313
- Advanced formatting information is not preserved during read/write operations