0
# Pyexcel-xlsx
1
2
A Python wrapper library that provides Excel XLSX and XLSM file reading and writing capabilities using openpyxl as the backend. It offers both standalone functionality and seamless integration as a plugin for the pyexcel ecosystem, supporting memory-efficient operations, pagination, and advanced features like merged cell handling and hidden sheet processing.
3
4
## Package Information
5
6
- **Package Name**: pyexcel-xlsx
7
- **Language**: Python
8
- **Installation**: `pip install pyexcel-xlsx`
9
10
## Core Imports
11
12
```python
13
from pyexcel_xlsx import get_data, save_data
14
```
15
16
For advanced usage with reader/writer classes:
17
18
```python
19
from pyexcel_xlsx.xlsxr import XLSXBook, XLSXBookInContent, FastSheet, SlowSheet
20
from pyexcel_xlsx.xlsxw import XLSXWriter, XLSXSheetWriter
21
```
22
23
For module constants:
24
25
```python
26
from pyexcel_xlsx import __FILE_TYPE__
27
```
28
29
## Basic Usage
30
31
```python
32
from pyexcel_xlsx import get_data, save_data
33
from collections import OrderedDict
34
35
# Read an Excel file
36
data = get_data("input.xlsx")
37
print(data) # {'Sheet1': [[1, 2, 3], [4, 5, 6]], 'Sheet2': [['a', 'b', 'c']]}
38
39
# Write data to an Excel file
40
output_data = OrderedDict()
41
output_data.update({"Sheet 1": [[1, 2, 3], [4, 5, 6]]})
42
output_data.update({"Sheet 2": [["row 1", "row 2", "row 3"]]})
43
save_data("output.xlsx", output_data)
44
45
# Memory operations
46
from io import BytesIO
47
io_buffer = BytesIO()
48
save_data(io_buffer, output_data)
49
io_buffer.seek(0)
50
data_from_memory = get_data(io_buffer)
51
```
52
53
## Capabilities
54
55
### Reading Excel Files
56
57
Load data from XLSX and XLSM files with support for pagination, hidden content handling, and merged cell processing.
58
59
```python { .api }
60
def get_data(afile, file_type=None, **keywords):
61
"""
62
Read data from Excel files (xlsx, xlsm).
63
64
Parameters:
65
- afile: str or file-like object, path to Excel file or file object
66
- file_type: str, optional file type ('xlsx' or 'xlsm'), auto-detected if None
67
- library: str, optional library identifier for plugin selection
68
- start_row: int, starting row index for pagination (0-based)
69
- row_limit: int, maximum number of rows to read
70
- start_column: int, starting column index for pagination (0-based)
71
- column_limit: int, maximum number of columns to read
72
- skip_hidden_sheets: bool, whether to skip hidden sheets (default: True)
73
- detect_merged_cells: bool, whether to detect and handle merged cells (default: False)
74
- skip_hidden_row_and_column: bool, whether to skip hidden rows/columns (default: True)
75
76
Returns:
77
dict: Dictionary with sheet names as keys and list of lists as values
78
"""
79
```
80
81
### Writing Excel Files
82
83
Save data to XLSX and XLSM files with write-only optimization for memory efficiency.
84
85
```python { .api }
86
def save_data(afile, data, file_type=None, **keywords):
87
"""
88
Write data to Excel files (xlsx, xlsm).
89
90
Parameters:
91
- afile: str or file-like object, path to Excel file or file object
92
- data: dict, dictionary with sheet names as keys and data as list of lists
93
- file_type: str, optional file type ('xlsx' or 'xlsm'), auto-detected if None
94
- **keywords: additional options passed to underlying writer
95
96
Returns:
97
None
98
"""
99
```
100
101
### Module Constants
102
103
Core module constants and file type identifiers.
104
105
```python { .api }
106
__FILE_TYPE__ = "xlsx" # Default file type identifier
107
```
108
109
### Advanced Reading Classes
110
111
Low-level reader classes for custom Excel file processing with configurable options for performance and feature trade-offs.
112
113
```python { .api }
114
class XLSXBook:
115
"""
116
Excel file reader with configurable options for performance and features.
117
"""
118
119
def __init__(
120
self,
121
file_alike_object,
122
file_type,
123
skip_hidden_sheets=True,
124
detect_merged_cells=False,
125
skip_hidden_row_and_column=True,
126
**keywords
127
):
128
"""
129
Initialize Excel file reader.
130
131
Parameters:
132
- file_alike_object: str or file-like object, Excel file path or file object
133
- file_type: str, file type identifier
134
- skip_hidden_sheets: bool, whether to skip hidden sheets
135
- detect_merged_cells: bool, whether to detect merged cells (disables read-only mode)
136
- skip_hidden_row_and_column: bool, whether to skip hidden content (disables read-only mode)
137
- **keywords: additional options
138
"""
139
140
def read_sheet(self, sheet_index):
141
"""
142
Read a specific sheet by index.
143
144
Parameters:
145
- sheet_index: int, zero-based sheet index
146
147
Returns:
148
Sheet reader object (FastSheet or SlowSheet)
149
"""
150
151
def close(self):
152
"""Close the Excel workbook and free resources."""
153
154
class XLSXBookInContent(XLSXBook):
155
"""
156
Excel reader for binary content from memory.
157
"""
158
159
def __init__(self, file_content, file_type, **keywords):
160
"""
161
Initialize Excel reader from binary content.
162
163
Parameters:
164
- file_content: bytes, binary content of Excel file
165
- file_type: str, file type identifier
166
- **keywords: additional options
167
"""
168
169
class FastSheet:
170
"""
171
Fast sheet reader using openpyxl's read-only mode for performance.
172
"""
173
174
def __init__(self, sheet, **keywords):
175
"""
176
Initialize fast sheet reader.
177
178
Parameters:
179
- sheet: openpyxl worksheet object
180
- **keywords: additional options (unused)
181
"""
182
183
def row_iterator(self):
184
"""
185
Iterate through rows in the sheet.
186
187
Returns:
188
Generator yielding row objects
189
"""
190
191
def column_iterator(self, row):
192
"""
193
Iterate through columns in a row.
194
195
Parameters:
196
- row: openpyxl row object
197
198
Returns:
199
Generator yielding cell values
200
"""
201
202
class SlowSheet(FastSheet):
203
"""
204
Slower sheet reader that supports hidden content and merged cells.
205
"""
206
207
def __init__(self, sheet, **keywords):
208
"""
209
Initialize slower sheet reader with advanced features.
210
211
Parameters:
212
- sheet: openpyxl worksheet object
213
- **keywords: additional options
214
"""
215
216
def row_iterator(self):
217
"""
218
Iterate through rows, skipping hidden rows.
219
220
Returns:
221
Generator yielding (row, row_index) tuples
222
"""
223
224
def column_iterator(self, row_struct):
225
"""
226
Iterate through columns, skipping hidden columns and handling merged cells.
227
228
Parameters:
229
- row_struct: tuple of (row, row_index)
230
231
Returns:
232
Generator yielding cell values with merged cell handling
233
"""
234
```
235
236
### Advanced Writing Classes
237
238
Low-level writer classes for custom Excel file generation with write-only optimization.
239
240
```python { .api }
241
class XLSXWriter:
242
"""
243
Excel file writer using write-only mode for memory efficiency.
244
"""
245
246
def __init__(self, file_alike_object, file_type, **keywords):
247
"""
248
Initialize Excel file writer.
249
250
Parameters:
251
- file_alike_object: str or file-like object, output file path or file object
252
- file_type: str, file type identifier (unused placeholder parameter)
253
- **keywords: additional options
254
"""
255
256
def create_sheet(self, name):
257
"""
258
Create a new sheet in the workbook.
259
260
Parameters:
261
- name: str, sheet name
262
263
Returns:
264
XLSXSheetWriter: Sheet writer object
265
"""
266
267
def write(self, incoming_dict):
268
"""
269
Write dictionary data to Excel file with multiple sheets.
270
271
Parameters:
272
- incoming_dict: dict, dictionary with sheet names as keys and data as values
273
"""
274
275
def close(self):
276
"""Save the workbook to file and close."""
277
278
class XLSXSheetWriter:
279
"""
280
Individual sheet writer for Excel files.
281
"""
282
283
def __init__(self, xlsx_sheet, sheet_name="Sheet"):
284
"""
285
Initialize sheet writer.
286
287
Parameters:
288
- xlsx_sheet: openpyxl worksheet object
289
- sheet_name: str, name for the sheet
290
"""
291
292
def write_row(self, array):
293
"""
294
Write a row of data to the sheet.
295
296
Parameters:
297
- array: list, row data as list of values
298
"""
299
300
def close(self):
301
"""Close the sheet writer (no operation required)."""
302
```
303
304
## Types
305
306
```python { .api }
307
from typing import Any, Dict, List, Union
308
from io import BinaryIO, BytesIO
309
310
# Sheet data structure
311
SheetData = List[List[Any]] # List of rows, each row is a list of cell values
312
313
# Workbook data structure
314
WorkbookData = Dict[str, SheetData] # Dictionary mapping sheet names to sheet data
315
316
# File-like objects supported
317
FileTypes = Union[str, BinaryIO, BytesIO] # File paths, binary streams, or BytesIO objects
318
319
# Constants
320
FILE_TYPE = "xlsx" # Default file type identifier
321
```
322
323
## Supported Features
324
325
- **File Formats**: XLSX (Excel Open XML Spreadsheet), XLSM (Excel Open XML Macro-Enabled Spreadsheet)
326
- **Read Modes**: Read-only mode for performance, full mode for advanced features
327
- **Write Mode**: Write-only mode for memory efficiency
328
- **Pagination**: Reading subsets of data with start_row, row_limit, start_column, column_limit
329
- **Hidden Content**: Skip or include hidden sheets, rows, and columns
330
- **Merged Cells**: Detection and handling of merged cell ranges
331
- **Memory Operations**: Direct BytesIO and file-like object support
332
- **Integration**: Seamless pyexcel plugin integration
333
- **Data Types**: Automatic handling of dates, times, numbers, and text
334
- **Multiple Sheets**: Full support for multi-sheet workbooks
335
336
## Error Handling
337
338
The library integrates with pyexcel-io error handling. Common exceptions include:
339
340
- **FileNotFoundError**: When specified file doesn't exist
341
- **PermissionError**: When file is locked or access denied
342
- **ValueError**: When invalid parameters or data formats are provided
343
- **openpyxl exceptions**: Underlying Excel file format errors
344
345
## Performance Considerations
346
347
- **Read-only mode**: Enabled by default for better performance, disabled when `skip_hidden_row_and_column=True` or `detect_merged_cells=True`
348
- **Write-only mode**: Always used for writing to minimize memory usage
349
- **Pagination**: Use for large files to limit memory consumption
350
- **Hidden content**: Skipping hidden content improves performance
351
- **Merged cells**: Detection has performance overhead, only enable when needed