
tessl/pypi-awkward

Manipulate JSON-like data with NumPy-like idioms for scientific computing and high-energy physics.


String Operations

String processing functions built on Apache Arrow's compute functions (pyarrow must be installed). They provide efficient operations on arrays of strings, including pattern matching, transformations, analysis, and categorical encoding. Every function applies to strings at any depth of nesting and preserves the surrounding list structure and missing values.

Capabilities

String Case Transformations

Functions for changing the case of string arrays while preserving array structure and handling missing values appropriately.

def str.capitalize(array):
    """
    Capitalize the first character of each string.
    
    Parameters:
    - array: Array of strings to capitalize
    
    Returns:
    Array with strings having first character capitalized
    """

def str.lower(array):
    """
    Convert strings to lowercase.
    
    Parameters:
    - array: Array of strings to convert
    
    Returns:
    Array with strings converted to lowercase
    """

def str.upper(array):
    """
    Convert strings to uppercase.
    
    Parameters:
    - array: Array of strings to convert
    
    Returns:
    Array with strings converted to uppercase  
    """

def str.swapcase(array):
    """
    Swap case of each character in strings.
    
    Parameters:
    - array: Array of strings to swap case
    
    Returns:
    Array with case of each character swapped
    """

def str.title(array):
    """
    Convert strings to title case (capitalize first letter of each word).
    
    Parameters:
    - array: Array of strings to convert
    
    Returns:
    Array with strings converted to title case
    """

String Reversal

Function for reversing the characters of each string.

def str.reverse(array):
    """
    Reverse each string character by character.
    
    Parameters:
    - array: Array of strings to reverse
    
    Returns:
    Array with strings reversed
    """

String Padding and Alignment

Functions for padding strings to specified widths with customizable fill characters and alignment options.

def str.center(array, width, padding=" "):
    """
    Center strings in fields of specified width.
    
    Parameters:
    - array: Array of strings to center
    - width: int, minimum width of resulting strings
    - padding: str, character to use for padding (default space)
    
    Returns:
    Array with strings centered and padded to specified width
    """

def str.lpad(array, width, padding=" "):
    """
    Left-pad strings to specified width.
    
    Parameters:
    - array: Array of strings to pad
    - width: int, minimum width of resulting strings
    - padding: str, character to use for padding (default space)
    
    Returns:
    Array with strings left-padded to specified width
    """

def str.rpad(array, width, padding=" "):
    """
    Right-pad strings to specified width.
    
    Parameters:
    - array: Array of strings to pad
    - width: int, minimum width of resulting strings  
    - padding: str, character to use for padding (default space)
    
    Returns:
    Array with strings right-padded to specified width
    """

String Trimming and Cleanup

Functions for removing unwanted characters from the beginning, end, or both ends of strings.

def str.trim(array, characters):
    """
    Remove leading and trailing characters from strings.
    
    Parameters:
    - array: Array of strings to trim
    - characters: str, set of characters to remove (use trim_whitespace to strip whitespace)
    
    Returns:
    Array with specified characters trimmed from both ends
    """

def str.ltrim(array, characters):
    """
    Remove leading characters from strings.
    
    Parameters:
    - array: Array of strings to trim
    - characters: str, set of characters to remove (use ltrim_whitespace to strip whitespace)
    
    Returns:
    Array with specified characters trimmed from start
    """

def str.rtrim(array, characters):
    """
    Remove trailing characters from strings.
    
    Parameters:
    - array: Array of strings to trim
    - characters: str, set of characters to remove (use rtrim_whitespace to strip whitespace)
    
    Returns:
    Array with specified characters trimmed from end
    """

def str.trim_whitespace(array):
    """
    Remove leading and trailing whitespace from strings.
    
    Parameters:
    - array: Array of strings to trim
    
    Returns:
    Array with whitespace trimmed from both ends
    """

def str.ltrim_whitespace(array):
    """
    Remove leading whitespace from strings.
    
    Parameters:
    - array: Array of strings to trim
    
    Returns:
    Array with whitespace trimmed from start
    """

def str.rtrim_whitespace(array):
    """
    Remove trailing whitespace from strings.
    
    Parameters:
    - array: Array of strings to trim
    
    Returns:
    Array with whitespace trimmed from end
    """

String Length and Analysis

Functions for analyzing string properties including length, character counts, and pattern occurrences.

def str.length(array):
    """
    Get length of each string in characters.
    
    Parameters:
    - array: Array of strings to measure
    
    Returns:
    Array of integers representing string lengths
    """

def str.count_substring(array, pattern, ignore_case=False):
    """
    Count non-overlapping occurrences of substring in each string.
    
    Parameters:
    - array: Array of strings to search
    - pattern: str, substring pattern to count
    - ignore_case: bool, if True perform case-insensitive search
    
    Returns:
    Array of integers representing count of pattern occurrences
    """

def str.count_substring_regex(array, pattern, ignore_case=False):
    """
    Count non-overlapping regex matches in each string.
    
    Parameters:
    - array: Array of strings to search
    - pattern: str, regular expression pattern (RE2 syntax) to count
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of integers representing count of pattern matches
    """

String Search and Pattern Finding

Functions for locating patterns within strings using both literal and regular expression matching.

def str.find_substring(array, pattern, ignore_case=False):
    """
    Find first occurrence of substring in each string.
    
    Parameters:
    - array: Array of strings to search
    - pattern: str, substring pattern to find
    - ignore_case: bool, if True perform case-insensitive search
    
    Returns:
    Array of integers giving the byte offset of the first match (-1 if not found)
    """

def str.find_substring_regex(array, pattern, ignore_case=False):
    """
    Find first regex match position in each string.
    
    Parameters:
    - array: Array of strings to search
    - pattern: str, regular expression pattern (RE2 syntax) to find
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of integers giving the byte offset of the first match (-1 if not found)
    """

Character Type Predicates

Functions for testing character properties and string composition, useful for data validation and filtering.

def str.is_alnum(array):
    """
    Test if all characters in strings are alphanumeric.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are alphanumeric
    """

def str.is_alpha(array):
    """
    Test if all characters in strings are alphabetic.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are alphabetic
    """

def str.is_ascii(array):
    """
    Test if all characters in strings are ASCII.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings contain only ASCII characters
    """

def str.is_decimal(array):
    """
    Test if all characters in strings are decimal digits.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are decimal
    """

def str.is_digit(array):
    """
    Test if all characters in strings are digits.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings contain only digits
    """

def str.is_lower(array):
    """
    Test if all cased characters in strings are lowercase.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are lowercase
    """

def str.is_numeric(array):
    """
    Test if all characters in strings are numeric.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:  
    Array of booleans indicating if strings are numeric
    """

def str.is_printable(array):
    """
    Test if all characters in strings are printable.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are printable
    """

def str.is_space(array):
    """
    Test if all characters in strings are whitespace.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings contain only whitespace
    """

def str.is_title(array):
    """
    Test if strings are in title case.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are in title case
    """

def str.is_upper(array):
    """
    Test if all cased characters in strings are uppercase.
    
    Parameters:
    - array: Array of strings to test
    
    Returns:
    Array of booleans indicating if strings are uppercase
    """

Pattern Matching and Boolean Tests

Functions for testing string patterns using various matching strategies including prefix/suffix, regex, and SQL-like patterns.

def str.starts_with(array, pattern, ignore_case=False):
    """
    Test if strings start with specified pattern.
    
    Parameters:
    - array: Array of strings to test
    - pattern: str, pattern to match at start of strings
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of booleans indicating if strings start with pattern
    """

def str.ends_with(array, pattern, ignore_case=False):
    """
    Test if strings end with specified pattern.
    
    Parameters:
    - array: Array of strings to test
    - pattern: str, pattern to match at end of strings
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of booleans indicating if strings end with pattern
    """

def str.match_substring(array, pattern, ignore_case=False):
    """
    Test if strings contain specified substring.
    
    Parameters:
    - array: Array of strings to test
    - pattern: str, substring pattern to match
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of booleans indicating if strings contain pattern
    """

def str.match_substring_regex(array, pattern, ignore_case=False):
    """
    Test if strings contain a match for a regular expression pattern.
    
    Parameters:
    - array: Array of strings to test
    - pattern: str, regular expression pattern (RE2 syntax) to search for
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of booleans indicating if strings match pattern
    """

def str.match_like(array, pattern, ignore_case=False):
    """
    Test strings using SQL LIKE pattern matching (the whole string must match).
    
    Parameters:
    - array: Array of strings to test
    - pattern: str, SQL LIKE pattern (% for any run of characters, _ for a single character; precede %, _ or \ with a backslash to match it literally)
    - ignore_case: bool, if True perform case-insensitive matching
    
    Returns:
    Array of booleans indicating if strings match LIKE pattern
    """

Set Membership Operations

Functions for testing string membership in collections and finding positions within value sets.

def str.is_in(array, value_set, skip_nones=False):
    """
    Test if strings are in a specified set of values.
    
    Parameters:
    - array: Array of strings to test
    - value_set: Array-like of strings to test membership against
    - skip_nones: bool, if True a missing value in array is never reported as a member, even if value_set contains None
    
    Returns:
    Array of booleans indicating if strings are in value_set
    """

def str.index_in(array, value_set, skip_nones=False):
    """
    Find index of strings in a specified set of values.
    
    Parameters:
    - array: Array of strings to find indices for
    - value_set: Array-like of strings to find indices in
    - skip_nones: bool, if True a missing value in array always yields None, even if value_set contains None
    
    Returns:
    Array of integers giving the index in value_set (None if not found)
    """

String Replacement and Modification

Functions for replacing and modifying string content using literal patterns, regular expressions, or slice operations.

def str.replace_substring(array, pattern, replacement, max_replacements=None):
    """
    Replace occurrences of substring with replacement string.
    
    Parameters:
    - array: Array of strings to modify
    - pattern: str, substring pattern to replace
    - replacement: str, replacement string
    - max_replacements: int, maximum number of replacements per string (None for all)
    
    Returns:
    Array with substring occurrences replaced
    """

def str.replace_substring_regex(array, pattern, replacement, max_replacements=None):
    """
    Replace regex matches with replacement string.
    
    Parameters:
    - array: Array of strings to modify
    - pattern: str, regular expression pattern to replace
    - replacement: str, replacement string (may reference capture groups as \1, \2, ...)
    - max_replacements: int, maximum number of replacements per string (None for all)
    
    Returns:
    Array with regex matches replaced
    """

def str.replace_slice(array, start, stop, replacement):
    """
    Replace string slice with replacement string.
    
    Parameters:
    - array: Array of strings to modify
    - start: int, start index of slice to replace
    - stop: int, stop index of slice to replace  
    - replacement: str, replacement string
    
    Returns:
    Array with string slices replaced
    """

def str.repeat(array, repeats):
    """
    Repeat each string specified number of times.
    
    Parameters:
    - array: Array of strings to repeat
    - repeats: int or Array of ints, number of repetitions for each string
    
    Returns:
    Array with strings repeated
    """

String Extraction and Slicing

Functions for extracting parts of strings using position-based slicing or pattern-based extraction.

def str.slice(array, start=0, stop=None, step=1):
    """
    Extract substring using slice notation.
    
    Parameters:
    - array: Array of strings to slice
    - start: int, start index (default 0)
    - stop: int, stop index (None for end of string)
    - step: int, step size (default 1)
    
    Returns:
    Array containing extracted substrings
    """

def str.extract_regex(array, pattern):
    """
    Extract regex capture groups from strings.
    
    Parameters:
    - array: Array of strings to extract from
    - pattern: str, regular expression (RE2 syntax) whose capture groups must all be named, e.g. (?P<year>\d{4})
    
    Returns:
    Array of records with one field per named group (None where the pattern does not match)
    """

String Splitting and Joining

Functions for splitting strings into components and joining string arrays into single strings.

def str.split_whitespace(array, max_splits=None):
    """
    Split strings on whitespace characters.
    
    Parameters:
    - array: Array of strings to split
    - max_splits: int, maximum number of splits per string (None for unlimited)
    
    Returns:
    Array of lists containing string components
    """

def str.split_pattern(array, pattern, max_splits=None):
    """
    Split strings on literal pattern.
    
    Parameters:
    - array: Array of strings to split
    - pattern: str, literal pattern to split on
    - max_splits: int, maximum number of splits per string (None for unlimited)
    
    Returns:
    Array of lists containing string components
    """

def str.split_pattern_regex(array, pattern, max_splits=None):
    """
    Split strings using regular expression pattern.
    
    Parameters:
    - array: Array of strings to split
    - pattern: str, regular expression pattern (RE2 syntax) to split on
    - max_splits: int, maximum number of splits per string (None for unlimited)
    
    Returns:
    Array of lists containing string components
    """

def str.join(array, separator):
    """
    Join arrays of strings using separator.
    
    Parameters:
    - array: Array of string lists to join
    - separator: str, separator to use between elements
    
    Returns:
    Array of strings created by joining list elements
    """

def str.join_element_wise(*arrays):
    """
    Join corresponding elements from multiple string arrays.
    
    Parameters:
    - arrays: two or more arrays of strings, broadcast against each other; the last one supplies the separator
    
    Returns:
    Array of strings created by joining corresponding elements
    """

Categorical String Operations

Functions for working with categorical string data, enabling memory-efficient storage and processing of repeated string values.

def str.to_categorical(array):
    """
    Convert string array to categorical representation.
    
    Parameters:
    - array: Array of strings to convert
    
    Returns:
    Array with categorical representation (indices + categories)
    """

Usage Examples

Basic String Operations

import awkward as ak

# Create array of strings
names = ak.Array(["alice", "bob", "CHARLIE", "diana"])

# Case transformations
upper_names = ak.str.upper(names)        # ["ALICE", "BOB", "CHARLIE", "DIANA"]  
lower_names = ak.str.lower(names)        # ["alice", "bob", "charlie", "diana"]
title_names = ak.str.title(names)        # ["Alice", "Bob", "Charlie", "Diana"]

# String properties
lengths = ak.str.length(names)           # [5, 3, 7, 5]
is_upper = ak.str.is_upper(names)        # [False, False, True, False]

String Filtering and Matching

import awkward as ak

emails = ak.Array(["user@example.com", "admin@site.org", "test@example.com"])

# Pattern matching
has_example = ak.str.match_substring(emails, "example")  # [True, False, True]
starts_admin = ak.str.starts_with(emails, "admin")      # [False, True, False]
ends_com = ak.str.ends_with(emails, ".com")             # [True, False, True]

# Filter based on pattern
example_emails = emails[has_example]  # ["user@example.com", "test@example.com"]

String Transformations

import awkward as ak

# Nested string arrays
data = ak.Array([["hello world", "test"], ["python", "awkward array"]])

# Split strings
split_data = ak.str.split_whitespace(data)  
# [[["hello", "world"], ["test"]], [["python"], ["awkward", "array"]]]

# Replace patterns  
cleaned = ak.str.replace_substring(data, "test", "demo")
# [["hello world", "demo"], ["python", "awkward array"]]

# Extract parts
first_words = ak.str.split_whitespace(data)[:, :, 0]
# [["hello", "test"], ["python", "awkward"]]

String Padding and Formatting

import awkward as ak

numbers = ak.Array(["1", "22", "333"])

# Pad strings
left_padded = ak.str.lpad(numbers, 5, "0")    # ["00001", "00022", "00333"]
centered = ak.str.center(numbers, 5, "*")     # ["**1**", "*22**", "*333*"]

# Trim whitespace
messy = ak.Array(["  hello  ", " world ", "test"])
clean = ak.str.trim_whitespace(messy)         # ["hello", "world", "test"]

Regular Expression Operations

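import awkward as ak

text = ak.Array(["Phone: 123-456-7890", "Call me at 555-123-4567", "No phone"])

# Extract phone numbers (extract_regex requires every capture group to be named)
phone_pattern = r'(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})'
matches = ak.str.extract_regex(text, phone_pattern)
# Records with fields "area", "exchange", "line"; None where nothing matches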

# Count pattern occurrences  
digit_count = ak.str.count_substring_regex(text, r'\d')  # [10, 10, 0]

# Boolean matching
has_phone = ak.str.match_substring_regex(text, phone_pattern)  # [True, True, False]

Advanced String Processing

import awkward as ak

# String arrays with missing values
data = ak.Array([["alice", "bob"], None, ["charlie"]])

# Operations handle None gracefully
upper_data = ak.str.upper(data)  # [["ALICE", "BOB"], None, ["CHARLIE"]]

# Join string lists
sentences = ak.Array([["hello", "world"], ["python", "is", "great"]])  
joined = ak.str.join(sentences, " ")  # ["hello world", "python is great"]

# Categorical conversion for memory efficiency
categories = ak.Array(["red", "blue", "red", "green", "blue", "red"])
categorical = ak.str.to_categorical(categories)  # More memory efficient

Install with Tessl CLI

npx tessl i tessl/pypi-awkward
