
tessl/pypi-awkward

Manipulate JSON-like data with NumPy-like idioms for scientific computing and high-energy physics.


Array Creation and Construction

Functions for creating Awkward Arrays from Python iterables, NumPy arrays, JSON, raw buffers, files (Parquet, Feather, Avro), and other array libraries. Arrays can be constructed directly in one call or built incrementally with ArrayBuilder, which handles complex nested structures.

Capabilities

From Python Data

Create arrays directly from Python lists, tuples, dictionaries, and other iterables, automatically inferring the appropriate nested structure and data types.

def from_iter(iterable, *, allow_record=True, highlevel=True, behavior=None, attrs=None, initial=1024, resize=8):
    """
    Create an array from a Python iterable.
    
    Parameters:
    - iterable: Nested Python data structure (lists, tuples, dicts, etc.)
    - allow_record: bool, if False, prohibit record types at the outermost level
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    - attrs: dict, metadata attributes for the array
    - initial: int, initial size in bytes for buffers
    - resize: float, resize multiplier for buffers (> 1.0)
    
    Returns:
    Array or Content layout containing the iterable data
    """

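def from_numpy(array, *, regulararray=False, recordarray=True, highlevel=True, behavior=None, attrs=None):
    """
    Create an array from a NumPy array.
    
    Parameters:
    - array: numpy.ndarray to convert
    - regulararray: bool, if True represent inner dimensions as nested RegularArray layouts; if False, keep a single multidimensional NumpyArray layout
    - recordarray: bool, if True convert a NumPy structured array (named fields) into a RecordArray layout
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    - attrs: dict, metadata attributes for the array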
    
    Returns:
    Array or Content layout wrapping the NumPy data
    """

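def from_regular(array, axis=1, *, highlevel=True, behavior=None, attrs=None):
    """
    Convert a regular (fixed-size) axis into an irregular (variable-length) one.
    
    Parameters:
    - array: Array-like data to convert
    - axis: int or None, the dimension to convert; negative values count from the innermost dimension, and None converts all regular dimensions
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    - attrs: dict, metadata attributes for the array
    
    Returns:
    Array whose specified axis has variable-length (var) type instead of a fixed size; ak.to_regular performs the inverse conversion
    """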

From Structured Data Formats

Create arrays from structured data formats like JSON, maintaining the hierarchical structure and supporting mixed data types within the same array.

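def from_json(source, *, line_delimited=False, schema=None, nan_string=None,
              posinf_string=None, neginf_string=None, highlevel=True, behavior=None, attrs=None):
    """
    Parse JSON data into an array.
    
    Parameters:
    - source: str or bytes (parsed as JSON text), pathlib.Path (opened and read as a file), or file-like object with a read method
    - line_delimited: bool, if True read a sequence of whitespace-separated JSON values (such as JSON Lines) as the entries of one array
    - schema: optional JSONSchema describing the data; if given, the parser builds the declared types directly instead of discovering them
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    - attrs: dict, metadata attributes for the array
    - nan_string: str, string to interpret as NaN
    - posinf_string: str, string to interpret as positive infinity
    - neginf_string: str, string to interpret as negative infinity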
    
    Returns:
    Array containing the parsed JSON data
    """

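def from_buffers(form, length, container, buffer_key="{form_key}-{attribute}", *, highlevel=True, behavior=None, attrs=None):
    """
    Create an array from a Form description and data buffers.
    
    Parameters:
    - form: Form (or its JSON string / dict representation) describing the array structure
    - length: int, length of the array
    - container: dict-like object mapping buffer keys to buffers (for example, the container returned by ak.to_buffers)
    - buffer_key: str or callable; a format string with {form_key} and {attribute} placeholders, or a function, used to build the key of each buffer in container
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    - attrs: dict, metadata attributes for the array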
    
    Returns:
    Array reconstructed from the form and buffers
    """

From File Formats

Direct reading from various file formats commonly used in scientific computing and data analysis, with support for chunked reading and metadata preservation.

def from_parquet(path, *, columns=None, row_groups=None, storage_options=None,
                 highlevel=True, behavior=None, attrs=None):
    """
    Read array data from Parquet files.
    
    Parameters:
    - path: str, list of str, or file-like object; path(s) to Parquet files or directories (remote URLs are opened with fsspec)
    - columns: str or list of str, column names or glob patterns to read (None for all); nested fields are separated by dots
    - row_groups: set of int, row groups to read (None for all)
    - storage_options: dict, options passed to fsspec when opening path
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    - attrs: dict, metadata attributes for the array
    
    Returns:
    Array containing the Parquet data
    """

def from_feather(file, columns=None, highlevel=True, behavior=None):
    """
    Read array data from Feather/Arrow IPC files.
    
    Parameters:
    - file: str or file-like, Feather file path or object
    - columns: list of str, columns to read (None for all)
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the Feather data
    """

def from_avro_file(file, highlevel=True, behavior=None):
    """
    Read array data from Avro files.
    
    Parameters:
    - file: str or file-like, Avro file path or object
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the Avro data
    """

def metadata_from_parquet(path):
    """
    Extract metadata from Parquet files without reading data.
    
    Parameters:
    - path: str, path to Parquet file
    
    Returns:
    dict containing Parquet metadata information
    """

From Other Array Libraries

Conversion from Apache Arrow, PyTorch, TensorFlow (including RaggedTensor), JAX, CuPy, and any DLPack-compatible producer, preserving nested structure where the source has it. Each function requires the corresponding library to be installed.

def from_arrow(array, highlevel=True, behavior=None):
    """
    Create an array from Apache Arrow data.
    
    Parameters:
    - array: pyarrow.Array, pyarrow.ChunkedArray, pyarrow.RecordBatch, or pyarrow.Table
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the Arrow data
    """

def from_arrow_schema(schema):
    """
    Convert an Apache Arrow schema into an Awkward Form; no data is read or allocated.
    
    Parameters:
    - schema: pyarrow.Schema describing the data structure
    
    Returns:
    ak.forms.Form describing the equivalent Awkward layout
    """

def from_torch(array, highlevel=True, behavior=None):
    """
    Create an array from PyTorch tensor.
    
    Parameters:
    - array: torch.Tensor to convert
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the PyTorch tensor data
    """

def from_tensorflow(array, highlevel=True, behavior=None):
    """
    Create an array from TensorFlow tensor.
    
    Parameters:
    - array: tf.Tensor to convert
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the TensorFlow tensor data
    """

def from_raggedtensor(tensor, highlevel=True, behavior=None):
    """
    Create an array from TensorFlow RaggedTensor.
    
    Parameters:
    - tensor: tf.RaggedTensor to convert
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the RaggedTensor data
    """

def from_jax(array, highlevel=True, behavior=None):
    """
    Create an array from JAX array.
    
    Parameters:
    - array: jax.numpy.ndarray to convert
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the JAX array data
    """

def from_cupy(array, highlevel=True, behavior=None):
    """
    Create an array from CuPy array.
    
    Parameters:
    - array: cupy.ndarray to convert
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the CuPy array data
    """

def from_dlpack(tensor, highlevel=True, behavior=None):
    """
    Create an array from DLPack tensor.
    
    Parameters:
    - tensor: object supporting the DLPack protocol (__dlpack__ and __dlpack_device__)
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the DLPack tensor data
    """

Specialized Construction

Functions for creating arrays with specific patterns or from specialized data sources common in scientific computing workflows.

def from_categorical(array, highlevel=True, behavior=None):
    """
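    Remove the categorical label from an array, returning plain (non-categorical) data.
    
    This is a metadata-only operation; its cost does not scale with the size of the data.
    
    Parameters:
    - array: Array-like data that may be marked as categorical
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array with the same values, no longer marked as categorical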
    """

def from_rdataframe(rdf, columns, *, highlevel=True, behavior=None):
    """
    Create an array from ROOT RDataFrame columns.
    
    Parameters:
    - rdf: ROOT.RDataFrame object
    - columns: str or iterable of str, the column(s) to convert
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array containing the RDataFrame data
    """

def zeros_like(array, *, dtype=None, highlevel=True, behavior=None):
    """
    Create an array of zeros with the same structure as input.
    
    Parameters:
    - array: Array whose structure to copy
    - dtype: optional NumPy dtype for the result (None keeps the input's dtypes)
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array filled with zeros matching input structure
    """

def ones_like(array, *, dtype=None, highlevel=True, behavior=None):
    """
    Create an array of ones with the same structure as input.
    
    Parameters:
    - array: Array whose structure to copy
    - dtype: optional NumPy dtype for the result (None keeps the input's dtypes)
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array filled with ones matching input structure
    """

def full_like(array, fill_value, *, dtype=None, highlevel=True, behavior=None):
    """
    Create an array filled with a value, matching input structure.
    
    Parameters:
    - array: Array whose structure to copy
    - fill_value: Value to fill the array with
    - dtype: optional NumPy dtype for the result (None keeps the input's dtypes)
    - highlevel: bool, if True return Array, if False return Content layout
    - behavior: dict, custom behavior for the array
    
    Returns:
    Array filled with fill_value matching input structure
    """

Incremental Building

ArrayBuilder provides a flexible way to construct complex arrays incrementally, supporting nested structures, mixed types, and efficient memory management.

class ArrayBuilder:
    """
    Builder for incrementally constructing arrays with complex nested structures.
    """
    
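    def __init__(self, *, behavior=None, attrs=None, initial=1024, resize=8):
        """
        Initialize a new ArrayBuilder.
        
        Parameters:
        - behavior: dict, custom behavior for built arrays
        - attrs: dict, metadata attributes for built arrays
        - initial: int, initial size in bytes for buffers
        - resize: float, resize multiplier for buffers (> 1.0)
        """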
    
    # Primitive value methods
    def null(self):
        """Append a null/None value."""
    
    def boolean(self, x):
        """
        Append a boolean value.
        
        Parameters:
        - x: bool, boolean value to append
        """
    
    def integer(self, x):
        """
        Append an integer value.
        
        Parameters:
        - x: int, integer value to append
        """
    
    def real(self, x):
        """
        Append a real (float) value.
        
        Parameters:
        - x: float, real value to append
        """
    
    def complex(self, x):
        """
        Append a complex value.
        
        Parameters:
        - x: complex, complex number to append
        """
    
    def string(self, x):
        """
        Append a string value.
        
        Parameters:
        - x: str, string value to append
        """
    
    def bytestring(self, x):
        """
        Append a byte string value.
        
        Parameters:
        - x: bytes, byte string value to append
        """
    
    def datetime(self, x):
        """
        Append a datetime value.
        
        Parameters:
        - x: datetime, datetime value to append
        """
    
    def timedelta(self, x):
        """
        Append a timedelta value.
        
        Parameters:
        - x: timedelta, timedelta value to append
        """
    
    def append(self, x):
        """
        Generic method for appending various types of data.
        
        Parameters:
        - x: Various types (None, bool, int, float, str, Array, Record, or Python data)
        """
    
    def extend(self, iterable):
        """
        Append all items from an iterable.
        
        Parameters:
        - iterable: Iterable containing data to append
        """
    
    # Nested structure methods  
    def begin_list(self):
        """Begin building a list (variable-length)."""
    
    def end_list(self):
        """End building the current list."""
    
    def begin_tuple(self, numfields):
        """
        Begin building a tuple (fixed-length).
        
        Parameters:
        - numfields: int, number of fields in tuple
        """
    
    def end_tuple(self):
        """End building the current tuple."""
    
    def begin_record(self, name=None):
        """
        Begin building a record (named fields).
        
        Parameters:
        - name: str, optional name for the record type
        """
    
    def end_record(self):
        """End building the current record."""
    
    def field(self, key):
        """
        Set the field key for the next value in a record.
        
        Parameters:
        - key: str, field name
        """
    
    def index(self, i):
        """
        Set the index for the next value in a tuple.
        
        Parameters:
        - i: int, index position
        """
    
    # Context managers for convenience
    def list(self):
        """
        Context manager for building a list.
        
        Returns:
        Context manager that calls begin_list/end_list
        """
    
    def tuple(self, numfields):
        """
        Context manager for building a tuple.
        
        Parameters:
        - numfields: int, number of fields in tuple
        
        Returns:
        Context manager that calls begin_tuple/end_tuple
        """
    
    def record(self, name=None):
        """
        Context manager for building a record.
        
        Parameters:
        - name: str, optional name for the record type
        
        Returns:
        Context manager that calls begin_record/end_record
        """
    
    # Finalization
    def snapshot(self):
        """
        Create an Array from the current builder state.
        
        Returns:
        Array containing the built data
        """

Usage Examples

Basic Construction

import awkward as ak
import numpy as np

# From Python lists
data = [[1, 2, 3], [4], [5, 6]]
array = ak.from_iter(data)

# From NumPy arrays
np_array = np.array([[1, 2], [3, 4]])
ak_array = ak.from_numpy(np_array)

# From JSON
json_data = '[{"x": [1, 2], "y": 3}, {"x": [4], "y": 5}]'
array = ak.from_json(json_data)

Incremental Building

import awkward as ak

builder = ak.ArrayBuilder()

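# First entry: a variable-length list of integers
with builder.list():
    builder.integer(1)
    builder.integer(2)

# Second entry: a record with a float field "x" and a list-of-strings field "y"
with builder.record():
    builder.field("x")
    builder.real(3.14)
    builder.field("y")
    with builder.list():
        builder.string("hello")
        builder.string("world")

# The two entries have different types, so the snapshot has a union type
array = builder.snapshot()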

File I/O

import awkward as ak

# Read from Parquet (requires pyarrow and fsspec)
array = ak.from_parquet("data.parquet")

# Read from JSON file
with open("data.json") as f:
    array = ak.from_json(f)

# From Arrow
import pyarrow as pa
arrow_array = pa.array([1, 2, 3])
ak_array = ak.from_arrow(arrow_array)

Install with Tessl CLI

npx tessl i tessl/pypi-awkward
