tessl/pypi-hdmf

A hierarchical data modeling framework for modern science data standards

docs/term-sets.md

Term Sets and Ontologies

HDMF provides integration with ontologies and controlled vocabularies through term sets, type configuration, and semantic validation. This system enables standardized terminology usage, data validation against ontologies, and semantic interoperability across different data standards.

Capabilities

Term Set Implementation

Core implementation for working with ontological term sets and controlled vocabularies.

class TermSet:
    """
    Term set implementation for ontologies and controlled vocabularies.
    
    Provides validation and lookup capabilities for terms from external
    ontologies, enabling semantic consistency and data standardization.
    """
    
    def __init__(self, term_schema_path: str = None, **kwargs):
        """
        Initialize term set.
        
        Args:
            term_schema_path: Path to term schema file
            **kwargs: Additional term set properties:
                - sources: List of ontology sources
                - view: Specific view or subset of terms
                - name: Name for the term set
                - view_set: Set of terms in the view
        """
    
    def validate(self, value) -> bool:
        """
        Validate value against term set.
        
        Args:
            value: Value to validate
            
        Returns:
            True if value is valid according to the term set, False otherwise
        """
    
    def __getitem__(self, key):
        """
        Get term information by key.
        
        Args:
            key: Term identifier or label
            
        Returns:
            Term information dictionary
        """
    
    def search_terms(self, query: str, **kwargs) -> list:
        """
        Search for terms matching query.
        
        Args:
            query: Search query string
            **kwargs: Search options:
                - case_sensitive: Whether search is case sensitive
                - fuzzy: Enable fuzzy matching
                - limit: Maximum number of results
                
        Returns:
            List of matching terms
        """
    
    def get_term_hierarchy(self, term_id: str) -> dict:
        """
        Get hierarchical relationships for a term.
        
        Args:
            term_id: Term identifier
            
        Returns:
            Dictionary with parent/child relationships
        """
    
    @property
    def sources(self) -> list:
        """List of ontology sources."""
    
    @property
    def view(self) -> str:
        """Current view or subset name."""
    
    @property
    def name(self) -> str:
        """Name of the term set."""
    
    @property
    def view_set(self) -> set:
        """Set of terms in the current view."""

Term Set Wrapper

Wrapper class that allows datasets and attributes to be associated with term sets for validation.

class TermSetWrapper:
    """
    Wrapper allowing datasets/attributes to have associated TermSets.
    
    Enables any HDF5 dataset or attribute to be validated against
    ontological terms while preserving the original data structure.
    """
    
    def __init__(self, value, field: str, termset: TermSet, **kwargs):
        """
        Initialize term set wrapper.
        
        Args:
            value: Original value to wrap
            field: Field name being wrapped
            termset: TermSet for validation
            **kwargs: Additional wrapper properties:
                - dtype: Data type for the wrapped value
                - allow_multiple: Allow multiple term selection
        """
    
    def append(self, data):
        """
        Append data with term validation.
        
        Args:
            data: Data to append (will be validated)
            
        Raises:
            ValueError: If appended data fails term validation
        """
    
    def extend(self, data):
        """
        Extend with iterable data, validating each element.
        
        Args:
            data: Iterable data to extend with
            
        Raises:
            ValueError: If any element fails term validation
        """
    
    def validate_value(self, value) -> bool:
        """
        Validate individual value against term set.
        
        Args:
            value: Value to validate
            
        Returns:
            True if value is valid
        """
    
    @property
    def value(self):
        """Wrapped value."""
    
    @property
    def field(self) -> str:
        """Field name being wrapped."""
    
    @property
    def termset(self) -> TermSet:
        """Associated TermSet."""
    
    @property
    def dtype(self):
        """Data type of wrapped value."""

Type Configuration

Global configuration system for managing data type validation with term sets.

class TypeConfigurator:
    """
    Global configuration manager for data type validation with TermSets.
    
    Manages mappings between data types and their associated term sets,
    enabling automatic validation and ontology enforcement across HDMF.
    """
    
    @staticmethod
    def get_config() -> dict:
        """
        Get current type configuration.
        
        Returns:
            Dictionary with current type-to-termset mappings
        """
    
    @staticmethod  
    def load_type_config(config_path: str):
        """
        Load type configuration from file.
        
        Args:
            config_path: Path to configuration file (JSON/YAML)
            
        The configuration file should specify mappings between
        data types and their associated term sets:
        
        {
            "cell_type": {
                "termset": "cell_ontology",
                "view": "neurons"
            },
            "brain_region": {
                "termset": "brain_atlas", 
                "view": "allen_mouse"
            }
        }
        """
    
    @staticmethod
    def unload_type_config():
        """Unload current type configuration and reset to defaults."""
    
    @staticmethod
    def register_termset(data_type: str, termset: TermSet, **kwargs):
        """
        Register term set for a specific data type.
        
        Args:
            data_type: Data type identifier
            termset: TermSet to associate with the type
            **kwargs: Additional registration options
        """
    
    @staticmethod
    def get_termset(data_type: str) -> TermSet:
        """
        Get term set for a data type.
        
        Args:
            data_type: Data type identifier
            
        Returns:
            TermSet associated with the data type, or None
        """
    
    @staticmethod
    def validate_type_value(data_type: str, value) -> bool:
        """
        Validate value for a specific data type.
        
        Args:
            data_type: Data type identifier
            value: Value to validate
            
        Returns:
            True if value is valid for the data type
        """

Term Set Utilities

Utility functions for working with term sets and ontologies.

def load_termset_from_file(file_path: str, **kwargs) -> TermSet:
    """
    Load term set from file.
    
    Args:
        file_path: Path to term set file (JSON/YAML/OWL)
        **kwargs: Loading options
        
    Returns:
        TermSet loaded from file
    """

def create_termset_from_list(terms: list, name: str, **kwargs) -> TermSet:
    """
    Create simple term set from list of terms.
    
    Args:
        terms: List of allowed terms
        name: Name for the term set
        **kwargs: Additional term set properties
        
    Returns:
        TermSet created from the term list
    """

def validate_with_termset(data, termset: TermSet, **kwargs) -> dict:
    """
    Validate data against term set with detailed results.
    
    Args:
        data: Data to validate (scalar, list, or array)
        termset: TermSet for validation
        **kwargs: Validation options
        
    Returns:
        Dictionary with validation results:
        {
            'valid': bool,
            'invalid_values': list,
            'suggestions': dict
        }
    """

def find_common_termsets(termsets: list) -> TermSet:
    """
    Find common terms across multiple term sets.
    
    Args:
        termsets: List of TermSet objects
        
    Returns:
        TermSet containing intersection of all input term sets
    """

Usage Examples

Basic Term Set Usage

from hdmf.term_set import TermSet

# Create simple term set from predefined terms
cell_types = TermSet(
    name='cell_types',
    sources=['cell_ontology'],
    view='basic_types',
    view_set={
        'pyramidal_neuron',
        'interneuron', 
        'astrocyte',
        'oligodendrocyte',
        'microglia'
    }
)

# Validate individual terms
valid_term = cell_types.validate('pyramidal_neuron')  # True
invalid_term = cell_types.validate('unknown_cell')    # False

print(f"Pyramidal neuron is valid: {valid_term}")

# Search for terms
search_results = cell_types.search_terms('neuron', fuzzy=True)
print(f"Neuron-related terms: {search_results}")

# Get term information
if 'pyramidal_neuron' in cell_types.view_set:
    term_info = cell_types['pyramidal_neuron']
    print(f"Term info: {term_info}")

Using Term Set Wrappers with Data

from hdmf.term_set import TermSet, TermSetWrapper
from hdmf.common import VectorData

# Create term set for brain regions
brain_regions = TermSet(
    name='brain_regions',
    sources=['allen_brain_atlas'],
    view='mouse_cortex',
    view_set={
        'primary_visual_cortex',
        'primary_motor_cortex', 
        'somatosensory_cortex',
        'auditory_cortex',
        'prefrontal_cortex'
    }
)

# Create data with term validation
region_data = [
    'primary_visual_cortex',
    'primary_motor_cortex',  
    'somatosensory_cortex'
]

# Wrap data with term set validation
validated_regions = TermSetWrapper(
    value=region_data,
    field='brain_region',
    termset=brain_regions
)

print(f"Original data: {validated_regions.value}")
print(f"Field: {validated_regions.field}")

# Append new data (with validation)
try:
    validated_regions.append('auditory_cortex')  # Valid - will succeed
    print("Successfully added auditory cortex")
except ValueError as e:
    print(f"Validation error: {e}")

try:
    validated_regions.append('invalid_region')  # Invalid - will fail
except ValueError as e:
    print(f"Validation error: {e}")

# Use in HDMF VectorData
region_vector = VectorData(
    name='recording_regions',
    description='Brain regions where recordings were made',
    data=validated_regions
)

Type Configuration Management

from hdmf.term_set import TypeConfigurator, TermSet
import json

# Create configuration for different data types
cell_type_termset = TermSet(
    name='cell_types',
    view_set={'pyramidal', 'interneuron', 'astrocyte'}
)

behavior_termset = TermSet(
    name='behaviors', 
    view_set={'running', 'grooming', 'resting', 'exploring'}
)

# Register term sets for specific data types
TypeConfigurator.register_termset('cell_type', cell_type_termset)
TypeConfigurator.register_termset('behavior_state', behavior_termset)

# Validate values using type configuration
cell_valid = TypeConfigurator.validate_type_value('cell_type', 'pyramidal')
behavior_valid = TypeConfigurator.validate_type_value('behavior_state', 'running')

print(f"Cell type 'pyramidal' is valid: {cell_valid}")
print(f"Behavior 'running' is valid: {behavior_valid}")

# Get current configuration
config = TypeConfigurator.get_config()
print(f"Current type configuration: {list(config.keys())}")

# Save configuration to file for reuse, using the mapping format
# documented for load_type_config()
config_dict = {
    'cell_type': {
        'termset': 'cell_types'
    },
    'behavior_state': {
        'termset': 'behaviors'
    }
}

with open('type_config.json', 'w') as f:
    json.dump(config_dict, f, indent=2)

# Load configuration from file
TypeConfigurator.load_type_config('type_config.json')

Advanced Term Set Operations

from hdmf.term_set import TermSet, find_common_termsets, validate_with_termset

# Create multiple overlapping term sets
mouse_terms = TermSet(
    name='mouse_anatomy',
    view_set={
        'cortex', 'hippocampus', 'thalamus', 
        'cerebellum', 'brainstem', 'olfactory_bulb'
    }
)

rat_terms = TermSet(
    name='rat_anatomy',
    view_set={
        'cortex', 'hippocampus', 'thalamus',
        'cerebellum', 'brainstem', 'striatum'
    }
)

human_terms = TermSet(
    name='human_anatomy', 
    view_set={
        'cortex', 'hippocampus', 'thalamus',
        'cerebellum', 'brainstem', 'amygdala'
    }
)

# Find common terms across species
common_terms = find_common_termsets([mouse_terms, rat_terms, human_terms])
print(f"Common brain regions: {common_terms.view_set}")

# Validate data with detailed results
test_data = [
    'cortex',           # Valid in all
    'hippocampus',      # Valid in all  
    'olfactory_bulb',   # Only in mouse
    'invalid_region'    # Invalid everywhere
]

validation_result = validate_with_termset(test_data, common_terms)
print(f"Validation results: {validation_result}")

# Results would show:
# {
#     'valid': False,
#     'invalid_values': ['olfactory_bulb', 'invalid_region'],
#     'suggestions': {
#         'olfactory_bulb': ['available_in_mouse_only'],
#         'invalid_region': ['did_you_mean: cortex, thalamus']
#     }
# }

Integration with HDMF Common Data Structures

from hdmf.common import DynamicTable
from hdmf.term_set import TermSet, TypeConfigurator

# Set up term sets for experimental metadata
species_terms = TermSet(
    name='species',
    view_set={'mus_musculus', 'rattus_norvegicus', 'homo_sapiens'}
)

sex_terms = TermSet(
    name='sex',
    view_set={'male', 'female', 'unknown'}
)

# Register with type configurator
TypeConfigurator.register_termset('species', species_terms)
TypeConfigurator.register_termset('sex', sex_terms)

# Create subject table with term validation
subjects_table = DynamicTable(
    name='subjects',
    description='Subject information with ontology validation'
)

# Add columns with term set validation
subjects_table.add_column('subject_id', 'Subject identifier')
subjects_table.add_column('species', 'Species (ontology validated)')
subjects_table.add_column('sex', 'Sex (ontology validated)')
subjects_table.add_column('age_days', 'Age in days', dtype='int')

# Add rows with automatic validation
def add_validated_subject(table, subject_id, species, sex, age_days):
    """Add subject with term validation."""
    
    # Validate terms before adding
    species_valid = TypeConfigurator.validate_type_value('species', species)
    sex_valid = TypeConfigurator.validate_type_value('sex', sex)
    
    if not species_valid:
        raise ValueError(f"Invalid species: {species}")
    if not sex_valid:
        raise ValueError(f"Invalid sex: {sex}")
    
    # Add row if validation passes
    table.add_row(
        subject_id=subject_id,
        species=species,
        sex=sex,
        age_days=age_days
    )

# Add subjects with validation
try:
    add_validated_subject(subjects_table, 'mouse_001', 'mus_musculus', 'male', 90)
    add_validated_subject(subjects_table, 'mouse_002', 'mus_musculus', 'female', 85)
    print("Successfully added validated subjects")
except ValueError as e:
    print(f"Validation error: {e}")

# Try to add invalid data
try:
    add_validated_subject(subjects_table, 'invalid_001', 'invalid_species', 'male', 90)
except ValueError as e:
    print(f"Expected validation error: {e}")

print(f"Subjects table has {len(subjects_table)} validated entries")

Creating Custom Ontology Integrations

from hdmf.term_set import TermSet

class OntologyTermSet(TermSet):
    """
    Extended TermSet that can load terms from external ontology APIs.
    """
    
    def __init__(self, ontology_id: str, api_base_url: str, **kwargs):
        self.ontology_id = ontology_id
        self.api_base_url = api_base_url
        
        # Load terms from API
        terms = self._load_terms_from_api()
        
        super().__init__(
            name=ontology_id,
            view_set=set(terms.keys()),
            **kwargs
        )
        
        self.term_definitions = terms
    
    def _load_terms_from_api(self) -> dict:
        """Load terms from ontology API."""
        # This is a mock implementation - real implementation would
        # connect to actual ontology services like OLS, BioPortal, etc.
        
        mock_terms = {
            'CL:0000540': {
                'label': 'neuron',
                'definition': 'A cell that is electrically active and specialized for the conduction and transmission of electrical signals.',
                'synonyms': ['nerve cell']
            },
            'CL:0000129': {
                'label': 'microglial cell', 
                'definition': 'A central nervous system macrophage found in the brain.',
                'synonyms': ['microglia']
            }
        }
        
        return mock_terms
    
    def get_term_definition(self, term_id: str) -> str:
        """Get definition for a term."""
        if term_id in self.term_definitions:
            return self.term_definitions[term_id]['definition']
        return None
    
    def search_by_synonym(self, synonym: str) -> list:
        """Search terms by synonym."""
        matches = []
        for term_id, term_data in self.term_definitions.items():
            if synonym.lower() in [s.lower() for s in term_data.get('synonyms', [])]:
                matches.append(term_id)
        return matches

# Usage
cell_ontology = OntologyTermSet(
    ontology_id='cell_ontology',
    api_base_url='https://www.ebi.ac.uk/ols/api/ontologies/cl'
)

# Use enhanced features
definition = cell_ontology.get_term_definition('CL:0000540')
print(f"Neuron definition: {definition}")

microglia_matches = cell_ontology.search_by_synonym('microglia')
print(f"Microglia term IDs: {microglia_matches}")

Install with Tessl CLI

npx tessl i tessl/pypi-hdmf
