or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

index.md

tile.json

tessl/pypi-pyexcel-xlsx

A wrapper library to read, manipulate and write data in xlsx and xlsm format

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/pyexcel-xlsx@0.6.x

To install, run

npx @tessl/cli install tessl/pypi-pyexcel-xlsx@0.6.0

Pyexcel-xlsx

A Python wrapper library that provides Excel XLSX and XLSM file reading and writing capabilities using openpyxl as the backend. It offers both standalone functionality and seamless integration as a plugin for the pyexcel ecosystem, supporting memory-efficient operations, pagination, and advanced features like merged cell handling and hidden sheet processing.

Package Information

Package Name: pyexcel-xlsx
Language: Python
Installation: pip install pyexcel-xlsx

Core Imports

from pyexcel_xlsx import get_data, save_data

For advanced usage with reader/writer classes:

from pyexcel_xlsx.xlsxr import XLSXBook, XLSXBookInContent, FastSheet, SlowSheet
from pyexcel_xlsx.xlsxw import XLSXWriter, XLSXSheetWriter

For module constants:

from pyexcel_xlsx import __FILE_TYPE__

Basic Usage

from pyexcel_xlsx import get_data, save_data
from collections import OrderedDict

# Read an Excel file
data = get_data("input.xlsx")
print(data)  # {'Sheet1': [[1, 2, 3], [4, 5, 6]], 'Sheet2': [['a', 'b', 'c']]}

# Write data to an Excel file
output_data = OrderedDict()
output_data.update({"Sheet 1": [[1, 2, 3], [4, 5, 6]]})
output_data.update({"Sheet 2": [["row 1", "row 2", "row 3"]]})
save_data("output.xlsx", output_data)

# Memory operations
from io import BytesIO
io_buffer = BytesIO()
save_data(io_buffer, output_data)
io_buffer.seek(0)
data_from_memory = get_data(io_buffer)

Capabilities

Reading Excel Files

Load data from XLSX and XLSM files with support for pagination, hidden content handling, and merged cell processing.

def get_data(afile, file_type=None, **keywords):
    """
    Read data from Excel files (xlsx, xlsm).
    
    Parameters:
    - afile: str or file-like object, path to Excel file or file object
    - file_type: str, optional file type ('xlsx' or 'xlsm'), auto-detected if None
    - library: str, optional library identifier for plugin selection
    - start_row: int, starting row index for pagination (0-based)
    - row_limit: int, maximum number of rows to read
    - start_column: int, starting column index for pagination (0-based)
    - column_limit: int, maximum number of columns to read
    - skip_hidden_sheets: bool, whether to skip hidden sheets (default: True)
    - detect_merged_cells: bool, whether to detect and handle merged cells (default: False)
    - skip_hidden_row_and_column: bool, whether to skip hidden rows/columns (default: True)
    
    Returns:
    dict: Dictionary with sheet names as keys and list of lists as values
    """

Writing Excel Files

Save data to XLSX and XLSM files with write-only optimization for memory efficiency.

def save_data(afile, data, file_type=None, **keywords):
    """
    Write data to Excel files (xlsx, xlsm).
    
    Parameters:
    - afile: str or file-like object, path to Excel file or file object
    - data: dict, dictionary with sheet names as keys and data as list of lists
    - file_type: str, optional file type ('xlsx' or 'xlsm'), auto-detected if None
    - **keywords: additional options passed to underlying writer
    
    Returns:
    None
    """

Module Constants

Core module constants and file type identifiers.

__FILE_TYPE__ = "xlsx"  # Default file type identifier

Advanced Reading Classes

Low-level reader classes for custom Excel file processing with configurable options for performance and feature trade-offs.

class XLSXBook:
    """
    Excel file reader with configurable options for performance and features.
    """
    
    def __init__(
        self,
        file_alike_object,
        file_type,
        skip_hidden_sheets=True,
        detect_merged_cells=False,
        skip_hidden_row_and_column=True,
        **keywords
    ):
        """
        Initialize Excel file reader.
        
        Parameters:
        - file_alike_object: str or file-like object, Excel file path or file object
        - file_type: str, file type identifier
        - skip_hidden_sheets: bool, whether to skip hidden sheets
        - detect_merged_cells: bool, whether to detect merged cells (disables read-only mode)
        - skip_hidden_row_and_column: bool, whether to skip hidden content (disables read-only mode)
        - **keywords: additional options
        """
    
    def read_sheet(self, sheet_index):
        """
        Read a specific sheet by index.
        
        Parameters:
        - sheet_index: int, zero-based sheet index
        
        Returns:
        Sheet reader object (FastSheet or SlowSheet)
        """
    
    def close(self):
        """Close the Excel workbook and free resources."""

class XLSXBookInContent(XLSXBook):
    """
    Excel reader for binary content from memory.
    """
    
    def __init__(self, file_content, file_type, **keywords):
        """
        Initialize Excel reader from binary content.
        
        Parameters:
        - file_content: bytes, binary content of Excel file
        - file_type: str, file type identifier
        - **keywords: additional options
        """

class FastSheet:
    """
    Fast sheet reader using openpyxl's read-only mode for performance.
    """
    
    def __init__(self, sheet, **keywords):
        """
        Initialize fast sheet reader.
        
        Parameters:
        - sheet: openpyxl worksheet object
        - **keywords: additional options (unused)
        """
    
    def row_iterator(self):
        """
        Iterate through rows in the sheet.
        
        Returns:
        Generator yielding row objects
        """
    
    def column_iterator(self, row):
        """
        Iterate through columns in a row.
        
        Parameters:
        - row: openpyxl row object
        
        Returns:
        Generator yielding cell values
        """

class SlowSheet(FastSheet):
    """
    Slower sheet reader that supports hidden content and merged cells.
    """
    
    def __init__(self, sheet, **keywords):
        """
        Initialize slower sheet reader with advanced features.
        
        Parameters:
        - sheet: openpyxl worksheet object
        - **keywords: additional options
        """
    
    def row_iterator(self):
        """
        Iterate through rows, skipping hidden rows.
        
        Returns:
        Generator yielding (row, row_index) tuples
        """
    
    def column_iterator(self, row_struct):
        """
        Iterate through columns, skipping hidden columns and handling merged cells.
        
        Parameters:
        - row_struct: tuple of (row, row_index)
        
        Returns:
        Generator yielding cell values with merged cell handling
        """

Advanced Writing Classes

Low-level writer classes for custom Excel file generation with write-only optimization.

class XLSXWriter:
    """
    Excel file writer using write-only mode for memory efficiency.
    """
    
    def __init__(self, file_alike_object, file_type, **keywords):
        """
        Initialize Excel file writer.
        
        Parameters:
        - file_alike_object: str or file-like object, output file path or file object
        - file_type: str, file type identifier (unused placeholder parameter)
        - **keywords: additional options
        """
    
    def create_sheet(self, name):
        """
        Create a new sheet in the workbook.
        
        Parameters:
        - name: str, sheet name
        
        Returns:
        XLSXSheetWriter: Sheet writer object
        """
    
    def write(self, incoming_dict):
        """
        Write dictionary data to Excel file with multiple sheets.
        
        Parameters:
        - incoming_dict: dict, dictionary with sheet names as keys and data as values
        """
    
    def close(self):
        """Save the workbook to file and close."""

class XLSXSheetWriter:
    """
    Individual sheet writer for Excel files.
    """
    
    def __init__(self, xlsx_sheet, sheet_name="Sheet"):
        """
        Initialize sheet writer.
        
        Parameters:
        - xlsx_sheet: openpyxl worksheet object
        - sheet_name: str, name for the sheet
        """
    
    def write_row(self, array):
        """
        Write a row of data to the sheet.
        
        Parameters:
        - array: list, row data as list of values
        """
    
    def close(self):
        """Close the sheet writer (no operation required)."""

Types

from typing import Any, Dict, List, Union
from io import BinaryIO, BytesIO

# Sheet data structure
SheetData = List[List[Any]]  # List of rows, each row is a list of cell values

# Workbook data structure  
WorkbookData = Dict[str, SheetData]  # Dictionary mapping sheet names to sheet data

# File-like objects supported
FileTypes = Union[str, BinaryIO, BytesIO]  # File paths, binary streams, or BytesIO objects

# Constants
FILE_TYPE = "xlsx"  # Default file type identifier

Supported Features

File Formats: XLSX (Excel Open XML Spreadsheet), XLSM (Excel Open XML Macro-Enabled Spreadsheet)
Read Modes: Read-only mode for performance, full mode for advanced features
Write Mode: Write-only mode for memory efficiency
Pagination: Reading subsets of data with start_row, row_limit, start_column, column_limit
Hidden Content: Skip or include hidden sheets, rows, and columns
Merged Cells: Detection and handling of merged cell ranges
Memory Operations: Direct BytesIO and file-like object support
Integration: Seamless pyexcel plugin integration
Data Types: Automatic handling of dates, times, numbers, and text
Multiple Sheets: Full support for multi-sheet workbooks

Error Handling

The library integrates with pyexcel-io error handling. Common exceptions include:

FileNotFoundError: When specified file doesn't exist
PermissionError: When file is locked or access denied
ValueError: When invalid parameters or data formats are provided
openpyxl exceptions: Underlying Excel file format errors

Performance Considerations

Read-only mode: Enabled by default for better performance, disabled when skip_hidden_row_and_column=True or detect_merged_cells=True
Write-only mode: Always used for writing to minimize memory usage
Pagination: Use for large files to limit memory consumption
Hidden content: Skipping hidden content improves performance
Merged cells: Detection has performance overhead, only enable when needed