tessl/pypi-pyxdf

Python library for importing XDF (Extensible Data Format) files used in neuroscience and biosignal research

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/pyxdf@1.17.x

To install, run

npx @tessl/cli install tessl/pypi-pyxdf@1.17.0


pyxdf

A Python library for importing XDF (Extensible Data Format) files commonly used in neuroscience and biosignal research. PyXDF provides a simple interface to load multi-stream time-series data recorded from Lab Streaming Layer (LSL) systems, supporting various data formats and advanced processing features like clock synchronization and jitter removal.

Package Information

  • Package Name: pyxdf
  • Language: Python
  • Installation: pip install pyxdf
  • Requirements: Python 3.9+, numpy>=2.0.2

Core Imports

import pyxdf

Direct function imports:

from pyxdf import load_xdf, resolve_streams, match_streaminfos

Advanced imports (for low-level operations):

from pyxdf.pyxdf import open_xdf, parse_xdf, parse_chunks

Basic Usage

import pyxdf
import matplotlib.pyplot as plt
import numpy as np

# Load an XDF file
streams, header = pyxdf.load_xdf("recording.xdf")

# Process each stream
for stream in streams:
    y = stream["time_series"]
    
    if isinstance(y, list):
        # String markers - draw vertical lines
        for timestamp, marker in zip(stream["time_stamps"], y):
            plt.axvline(x=timestamp)
            print(f'Marker "{marker[0]}" @ {timestamp:.2f}s')
    elif isinstance(y, np.ndarray):
        # Numeric data - plot as lines
        plt.plot(stream["time_stamps"], y)
    else:
        raise RuntimeError("Unknown stream format")

plt.show()

Architecture

PyXDF operates on the XDF (Extensible Data Format) specification, processing multi-stream recordings with:

  • Chunks: Atomic file units containing headers, samples, clock offsets, or footers
  • Streams: Individual data sources with metadata, timestamps, and time-series data
  • Clock Synchronization: Robust timestamp alignment across streams using ClockOffset chunks
  • Jitter Removal: Regularization of sampling intervals for improved data quality

The library handles file corruption gracefully, supports compressed files (.xdfz), and provides advanced processing options for research applications requiring high temporal precision.
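The idea behind jitter removal can be shown with a short, self-contained sketch: within a continuous segment, timestamps are regularized by fitting a line to timestamp-versus-sample-index, so the slope becomes the estimated sampling interval. This is an illustration of the concept, not pyxdf's actual implementation:

```python
# Illustrative sketch of timestamp dejittering: fit a least-squares line to
# timestamp-vs-sample-index and replace the raw timestamps with the fit.
# This mirrors the idea behind pyxdf's jitter removal, not its exact code.

def dejitter(timestamps):
    """Return regularized timestamps from a least-squares linear fit."""
    n = len(timestamps)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_t = sum(timestamps) / n
    # slope = cov(x, t) / var(x); the slope is the estimated sampling interval
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, timestamps))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_t - slope * mean_x
    return [intercept + slope * x for x in xs]

# Jittered timestamps around a nominal 100 Hz clock
raw = [0.000, 0.011, 0.019, 0.031, 0.040]
smooth = dejitter(raw)
interval = smooth[1] - smooth[0]  # estimated sampling interval (~0.01 s)
```

After dejittering, sample spacing is exactly uniform within the segment, which is why pyxdf only applies it per segment and starts a new segment whenever a break exceeds the configured thresholds.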

Capabilities

XDF File Loading

Core functionality for importing XDF files with comprehensive data processing, stream selection, and timing corrections.

def load_xdf(
    filename,
    select_streams=None,
    *,
    on_chunk=None,
    synchronize_clocks=True,
    handle_clock_resets=True,
    dejitter_timestamps=True,
    jitter_break_threshold_seconds=1,
    jitter_break_threshold_samples=500,
    clock_reset_threshold_seconds=5,
    clock_reset_threshold_stds=5,
    clock_reset_threshold_offset_seconds=1,
    clock_reset_threshold_offset_stds=10,
    winsor_threshold=0.0001,
    verbose=None,
):
    """
    Import an XDF file with optional stream selection and processing.

    Args:
        filename (str): Path to XDF file (*.xdf or *.xdfz)
        select_streams (int | list[int] | list[dict] | None): Stream selection criteria
        on_chunk (callable, optional): Callback function for chunk processing
        synchronize_clocks (bool): Enable clock synchronization (default: True)
        handle_clock_resets (bool): Handle computer restarts during recording (default: True)
        dejitter_timestamps (bool): Perform jitter removal for regular streams (default: True)
        jitter_break_threshold_seconds (float): Break detection threshold in seconds (default: 1)
        jitter_break_threshold_samples (int): Break detection threshold in samples (default: 500)
        clock_reset_threshold_seconds (float): Clock reset detection threshold (default: 5)
        clock_reset_threshold_stds (float): Reset detection in standard deviations (default: 5)
        clock_reset_threshold_offset_seconds (float): Offset threshold for resets (default: 1)
        clock_reset_threshold_offset_stds (float): Offset threshold in stds (default: 10)
        winsor_threshold (float): Robust fitting threshold (default: 0.0001)
        verbose (bool | None): Logging level control

    Returns:
        tuple[list[dict], dict]: (streams, fileheader)
            - streams: List of stream dictionaries
            - fileheader: File header metadata
    """

Stream Data Structure

Each stream in the returned list contains:

# Stream dictionary structure
stream = {
    "time_series": Union[np.ndarray, list],  # Samples x channels array, or list of string markers
    "time_stamps": np.ndarray,               # Sample timestamps (synchronized)
    "info": {                                # Stream metadata
        "name": list[str],                   # Stream name
        "type": list[str],                   # Content type (EEG, Events, etc.)
        "channel_count": list[str],          # Number of channels
        "channel_format": list[str],         # Data format (int8, float32, etc.)
        "nominal_srate": list[str],          # Declared sampling rate
        "effective_srate": float,            # Measured sampling rate
        "stream_id": int,                    # Unique stream identifier
        "segments": list[tuple[int, int]],   # Data break segments (start, end)
        "desc": dict,                        # Domain-specific metadata
    },
    "clock_times": list[float],              # Clock measurement times
    "clock_values": list[float],             # Clock offset values
}
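Note that the XML-derived `info` fields are single-element lists of strings, so they need to be unwrapped and converted before use. A minimal illustration on a hand-built stream dict (the values are made up for the example):

```python
# The XML-derived "info" fields arrive as single-element lists of strings,
# so they must be unwrapped and cast before use. Hand-built example dict:
stream = {
    "info": {
        "name": ["BrainAmp"],
        "type": ["EEG"],
        "channel_count": ["32"],
        "nominal_srate": ["500"],
        "effective_srate": 499.87,   # computed by pyxdf, already a float
    },
}

name = stream["info"]["name"][0]                      # unwrap the list
n_channels = int(stream["info"]["channel_count"][0])  # convert str -> int
nominal = float(stream["info"]["nominal_srate"][0])
effective = stream["info"]["effective_srate"]

print(f"{name}: {n_channels} channels, {nominal} Hz nominal ({effective} Hz measured)")
```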

Supported Data Formats

  • Numeric: int8, int16, int32, int64, float32, double64
  • String: Marker and event data as string arrays
  • File Formats: Uncompressed (.xdf) and gzip-compressed (.xdfz, .xdf.gz)
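When branching on stream content, the declared `channel_format` can be used instead of `isinstance` checks on `time_series`. The helper below is an illustrative sketch using the format names listed above, not a pyxdf API:

```python
# Illustrative dispatch on the declared channel format. The format names come
# from the supported-formats list above; the handler itself is a sketch.
NUMERIC_FORMATS = {"int8", "int16", "int32", "int64", "float32", "double64"}

def describe_format(channel_format):
    if channel_format in NUMERIC_FORMATS:
        return "numeric"   # time_series is a NumPy array
    if channel_format == "string":
        return "marker"    # time_series is a list of string lists
    return "unknown"

print(describe_format("double64"))  # numeric
print(describe_format("string"))    # marker
```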

Stream Discovery and Selection

Utilities for discovering streams in XDF files and selecting streams based on criteria.

def resolve_streams(fname):
    """
    Resolve streams in given XDF file without loading data.

    Args:
        fname (str): Path to XDF file

    Returns:
        list[dict]: Stream information dictionaries with metadata
    """

def match_streaminfos(stream_infos, parameters):
    """
    Find stream IDs matching specified criteria.

    Args:
        stream_infos (list[dict]): Stream information from resolve_streams
        parameters (list[dict]): Matching criteria as key-value pairs

    Returns:
        list[int]: Stream IDs matching all criteria

    Examples:
        # Match streams by name
        match_streaminfos(infos, [{"name": "EEG"}])
        
        # Match by type and name
        match_streaminfos(infos, [{"type": "EEG", "name": "ActiChamp"}])
        
        # Match multiple criteria (OR logic)
        match_streaminfos(infos, [{"type": "EEG"}, {"name": "Markers"}])
    """
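The selection semantics in the examples above (AND within one criteria dict, OR across dicts) can be sketched in plain Python. This is a simplified illustration of the behavior, not pyxdf's implementation:

```python
# Simplified sketch of the selection semantics documented above:
# every key/value pair inside one criteria dict must match (AND),
# and a stream matches if any criteria dict matches it (OR).

def match_ids(stream_infos, parameters):
    matches = []
    for info in stream_infos:
        for criteria in parameters:
            if all(info.get(key) == value for key, value in criteria.items()):
                matches.append(info["stream_id"])
                break  # one matching criteria dict is enough
    return matches

infos = [
    {"stream_id": 1, "type": "EEG", "name": "ActiChamp"},
    {"stream_id": 2, "type": "Markers", "name": "Markers"},
    {"stream_id": 3, "type": "EEG", "name": "BrainAmp"},
]

print(match_ids(infos, [{"type": "EEG"}]))                      # [1, 3]
print(match_ids(infos, [{"type": "EEG", "name": "BrainAmp"}]))  # [3]
print(match_ids(infos, [{"type": "EEG"}, {"name": "Markers"}])) # [1, 2, 3]
```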

Low-Level File Operations

Advanced utilities for direct XDF file handling and chunk-level processing.

def open_xdf(file):
    """
    Open XDF file for reading with format validation.

    Args:
        file (str | pathlib.Path | io.RawIOBase): File path or opened binary file handle

    Returns:
        io.BufferedReader | gzip.GzipFile: Opened file handle positioned after magic bytes

    Raises:
        IOError: If file is not a valid XDF file (missing XDF: magic bytes)
        ValueError: If file handle is opened in text mode
        Exception: If file does not exist
    """

def parse_xdf(fname):
    """
    Parse and return all chunks from an XDF file without processing.

    Args:
        fname (str): Path to XDF file

    Returns:
        list[dict]: Raw chunks containing headers, samples, and metadata
    """

def parse_chunks(chunks):
    """
    Extract stream information from parsed XDF chunks.

    Args:
        chunks (list[dict]): Raw chunks from parse_xdf

    Returns:
        list[dict]: Stream metadata dictionaries suitable for resolve_streams
    """
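The format validation that open_xdf performs can be illustrated with a self-contained sketch: a valid XDF file begins with the four magic bytes `XDF:`. The helper below only demonstrates that check; the real function also accepts paths, handles gzip-compressed files, and raises IOError on failure:

```python
import io

MAGIC = b"XDF:"  # every valid XDF file starts with these four bytes

def check_magic(fileobj):
    """Return True if the binary file handle starts with the XDF magic bytes.

    Illustrative sketch of the validation open_xdf performs; not library code.
    """
    return fileobj.read(4) == MAGIC

valid = io.BytesIO(b"XDF:" + b"\x00" * 16)    # minimal valid-looking header
invalid = io.BytesIO(b"RIFF" + b"\x00" * 16)  # wrong magic bytes

print(check_magic(valid))    # True
print(check_magic(invalid))  # False
```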

Stream Selection Examples

# Load specific stream by ID
streams, _ = pyxdf.load_xdf("file.xdf", select_streams=5)

# Load multiple streams by ID  
streams, _ = pyxdf.load_xdf("file.xdf", select_streams=[1, 3, 5])

# Load streams by criteria
streams, _ = pyxdf.load_xdf("file.xdf", select_streams=[{"type": "EEG"}])

# Load streams matching name and type
criteria = [{"type": "EEG", "name": "BrainAmp"}]
streams, _ = pyxdf.load_xdf("file.xdf", select_streams=criteria)

Command Line Tools

Python modules providing command-line utilities for XDF file inspection and playback.

Metadata Inspection

# python -m pyxdf.cli.print_metadata -f=/path/to/file.xdf

Prints stream metadata including:

  • Stream count and basic information
  • Channel counts and data shapes
  • Sampling rates (nominal and effective)
  • Stream durations and segment information
  • Unique identifiers and stream types

LSL Playback (requires pylsl)

# python -m pyxdf.cli.playback_lsl filename [options]

Replays XDF data over Lab Streaming Layer (LSL) streams in real-time with configurable options:

Parameters:

  • filename (str): Path to the XDF file to play back (required)
  • --playback_speed (float): Playback speed multiplier (default: 1.0)
  • --loop: Loop playback of the file continuously (flag, default: False)
  • --wait_for_consumer: Wait for LSL consumer before starting playback (flag, default: False)

Features:

  • Real-time playback: Maintains original timing relationships between streams
  • Loop mode: Continuous playback for prototyping and testing
  • Rate control: Adjustable playback speed for faster or slower replay
  • Consumer waiting: Optional wait for LSL consumers to connect before starting
  • Multi-stream support: Handles all streams in the XDF file simultaneously

# Basic playback
python -m pyxdf.cli.playback_lsl recording.xdf

# Loop mode with 2x speed
python -m pyxdf.cli.playback_lsl recording.xdf --playback_speed 2.0 --loop

# Wait for consumers before starting
python -m pyxdf.cli.playback_lsl recording.xdf --wait_for_consumer

# Slow motion playback at half speed
python -m pyxdf.cli.playback_lsl recording.xdf --playback_speed 0.5

Advanced Usage Examples

Custom Stream Processing

def process_chunk(data, timestamps, info, stream_id):
    """Custom chunk processing callback."""
    # Apply real-time filtering, downsampling, etc.
    if info["type"][0] == "EEG":
        # Apply notch filter to EEG data
        filtered_data = apply_notch_filter(data, 60.0)  # user-defined helper removing 60 Hz noise
        return filtered_data, timestamps, info
    return data, timestamps, info

# Load with custom processing
streams, _ = pyxdf.load_xdf("recording.xdf", on_chunk=process_chunk)

Handling Data Breaks

# Load with custom break detection
streams, _ = pyxdf.load_xdf(
    "recording.xdf",
    jitter_break_threshold_seconds=0.5,  # Detect 500ms breaks
    jitter_break_threshold_samples=100   # Or 100-sample breaks
)

# Process segments separately
for stream in streams:
    for start_idx, end_idx in stream["info"]["segments"]:
        segment_data = stream["time_series"][start_idx:end_idx+1]
        segment_times = stream["time_stamps"][start_idx:end_idx+1]
        # Process each continuous segment
        process_segment(segment_data, segment_times)

Clock Synchronization Control

# Disable automatic processing for manual control
streams, _ = pyxdf.load_xdf(
    "recording.xdf",
    synchronize_clocks=False,    # Skip automatic sync
    dejitter_timestamps=False,   # Skip jitter removal
    verbose=True                 # Enable debug logging
)

# Access raw clock information
for stream in streams:
    clock_times = stream["clock_times"]
    clock_values = stream["clock_values"]
    # Implement custom synchronization
    custom_sync_timestamps = apply_custom_sync(
        stream["time_stamps"], clock_times, clock_values
    )

Error Handling

PyXDF includes robust error handling for common issues:

import struct

try:
    streams, header = pyxdf.load_xdf("corrupted.xdf")
except IOError as e:
    print(f"File error: {e}")
    # Raised for invalid XDF files (missing magic bytes) or file access issues
except ValueError as e:
    print(f"Invalid stream selection: {e}")
    # Raised for malformed select_streams parameter or no matching streams
except FileNotFoundError as e:
    print(f"File not found: {e}")
    # Raised when XDF file doesn't exist
except struct.error as e:
    print(f"Data corruption detected: {e}")
    # Raised for corrupted binary data, library attempts recovery
except Exception as e:
    print(f"Parsing error: {e}")
    # General parsing errors - library attempts to recover and load partial data

Error Recovery Mechanisms:

PyXDF automatically handles many failure scenarios:

  • File corruption: When binary chunks are corrupted, scans forward to find valid boundary chunks
  • Missing streams: Handles interrupted recordings gracefully, returns available data
  • Clock resets: Detects and corrects for computer restarts during recording using statistical analysis
  • Malformed XML: Skips corrupted metadata elements while preserving time-series data
  • Incomplete files: Loads available data from truncated recordings caused by system failures
  • Memory issues: Processes large files chunk-by-chunk to handle memory constraints
  • Data type mismatches: Handles inconsistent data formats across chunks

Specific Error Conditions:

  • ValueError("No matching streams found.") - When select_streams criteria match no streams
  • ValueError("Argument 'select_streams' must be...") - Invalid select_streams parameter format
  • IOError("Invalid XDF file") - File doesn't start with "XDF:" magic bytes
  • ValueError("file has to be opened in binary mode") - Text mode file handle passed to open_xdf
  • Exception("file does not exist") - File path doesn't exist when using open_xdf
  • EOFError - Unexpected end of file, handled gracefully with partial data recovery

Types

# Type annotations for main function parameters
filename: Union[str, pathlib.Path]
select_streams: Union[None, int, list[int], list[dict]]
on_chunk: Union[None, Callable[[np.ndarray, np.ndarray, dict, int], tuple[np.ndarray, np.ndarray, dict]]]

# Stream selection criteria format
stream_criteria: dict[str, str]  # e.g., {"type": "EEG", "name": "BrainAmp"}

# Stream info structure from resolve_streams
StreamInfo = {
    "stream_id": int,
    "name": str,
    "type": str,
    "source_id": str,
    "created_at": str,
    "uid": str,
    "session_id": str,
    "hostname": str,
    "channel_count": int,
    "channel_format": str,
    "nominal_srate": float,
}