Caching and Performance

Caching decorators and performance optimization tools for efficient data processing and resource management. Streamlit's caching system enables applications to avoid expensive recomputations and resource loading.

Capabilities

Data Caching

Cache expensive data computations that can be serialized and shared across sessions.

def cache_data(func=None, *, ttl=None, max_entries=None, show_spinner=True, persist=None, experimental_allow_widgets=False, hash_funcs=None, validate=None):
    """
    Decorator to cache functions that return serializable data.

    Args:
        func (callable, optional): Function to cache (when used as decorator)
        ttl (float, optional): Time-to-live in seconds
        max_entries (int, optional): Maximum number of cached entries
        show_spinner (bool): Whether to show spinner during computation
        persist (str, optional): Persistence mode ("disk" for persistent storage)
        experimental_allow_widgets (bool): Allow widgets in cached functions
        hash_funcs (dict, optional): Custom hash functions for parameter types
        validate (callable, optional): Function to validate cached values

    Returns:
        callable: Decorated function with caching capability
    """

Example usage:

@st.cache_data
def load_data(file_path):
    """Load and process data file."""
    df = pd.read_csv(file_path)
    return df.groupby('category').sum()

# Cache with TTL (expires after 1 hour)
@st.cache_data(ttl=3600)
def fetch_api_data(endpoint, params):
    """Fetch data from API with 1-hour cache."""
    response = requests.get(endpoint, params=params)
    return response.json()

# Cache with persistence (survives app restarts)
@st.cache_data(persist="disk")
def expensive_computation(data, algorithm):
    """Expensive ML computation with disk persistence."""
    model = train_model(data, algorithm)
    return model.predictions

# Cache with custom validation
@st.cache_data(validate=lambda x: len(x) > 0)
def get_user_data(user_id):
    """Get user data with validation."""
    return database.fetch_user(user_id)

# Cache with max entries limit
@st.cache_data(max_entries=100)
def process_query(query, filters):
    """Process search query with LRU eviction."""
    return search_engine.process(query, filters)

Resource Caching

Cache global resources like database connections, models, and objects that cannot be serialized.

def cache_resource(func=None, *, ttl=None, max_entries=None, show_spinner=True, validate=None, hash_funcs=None):
    """
    Decorator to cache functions that return non-serializable resources.

    Args:
        func (callable, optional): Function to cache (when used as decorator)
        ttl (float, optional): Time-to-live in seconds
        max_entries (int, optional): Maximum number of cached entries
        show_spinner (bool): Whether to show spinner during computation
        validate (callable, optional): Function to validate cached resources
        hash_funcs (dict, optional): Custom hash functions for parameter types

    Returns:
        callable: Decorated function with resource caching capability
    """

Example usage:

@st.cache_resource
def get_database_connection():
    """Create database connection (shared across sessions)."""
    return sqlite3.connect("app.db", check_same_thread=False)

@st.cache_resource
def load_ml_model(model_path):
    """Load ML model (expensive, non-serializable)."""
    import tensorflow as tf
    return tf.keras.models.load_model(model_path)

# Resource with TTL (model refreshes daily)
@st.cache_resource(ttl=86400)
def get_trained_model(training_data_hash):
    """Load or train model with daily refresh."""
    return train_model(training_data_hash)

# Resource with validation
@st.cache_resource(validate=lambda client: client.is_connected())
def get_api_client(api_key):
    """Get API client with connection validation."""
    return APIClient(api_key)

# Limited resource cache
@st.cache_resource(max_entries=5)
def create_processor(config):
    """Create data processor (max 5 configurations cached)."""
    return DataProcessor(config)

Legacy Caching (Deprecated)

The original caching decorator, now deprecated in favor of st.cache_data and st.cache_resource.

def cache(func=None, persist=False, allow_output_mutation=False, show_spinner=True, suppress_st_warning=False, hash_funcs=None, max_entries=None, ttl=None):
    """
    Legacy caching decorator (deprecated).

    Args:
        func (callable, optional): Function to cache
        persist (bool): Whether to persist cache to disk
        allow_output_mutation (bool): Allow mutation of cached return values
        show_spinner (bool): Whether to show spinner during computation
        suppress_st_warning (bool): Suppress Streamlit warnings
        hash_funcs (dict, optional): Custom hash functions
        max_entries (int, optional): Maximum number of cached entries
        ttl (float, optional): Time-to-live in seconds

    Returns:
        callable: Decorated function with caching

    Note:
        Deprecated. Use st.cache_data or st.cache_resource instead.
    """

Performance Optimization Patterns

Data Loading Optimization

# Cache expensive data loading
@st.cache_data
def load_large_dataset(data_source):
    """Load and preprocess large dataset."""
    df = pd.read_parquet(data_source)  # Fast format
    df = df.fillna(0)  # Preprocessing
    return df

# Cache with parameters
@st.cache_data
def filter_data(df, category, date_range):
    """Filter dataset based on parameters."""
    mask = ((df['category'] == category) &
            (df['date'].between(date_range[0], date_range[1])))
    return df[mask]

# Usage with cached functions
data = load_large_dataset("data.parquet")
filtered_data = filter_data(data, selected_category, date_range)

Model and Resource Management

# Cache ML models
@st.cache_resource
def load_prediction_model():
    """Load trained model for predictions."""
    return joblib.load("model.pkl")

@st.cache_resource
def get_feature_encoder():
    """Load feature preprocessing pipeline."""
    return joblib.load("encoder.pkl")

# Cache database connections
@st.cache_resource
def init_database():
    """Initialize database connection pool."""
    return ConnectionPool(
        host="localhost",
        database="myapp",
        max_connections=10
    )

# Usage pattern
model = load_prediction_model()
encoder = get_feature_encoder()
db = init_database()

# Now use these cached resources
features = encoder.transform(user_input)
prediction = model.predict(features)

API and External Service Caching

# Cache API calls with TTL
@st.cache_data(ttl=300)  # 5 minutes
def fetch_stock_prices(symbols):
    """Fetch current stock prices (cached for 5 minutes)."""
    api_key = st.secrets["stock_api_key"]
    response = requests.get(f"https://api.stocks.com/prices",
                          params={"symbols": ",".join(symbols), "key": api_key})
    return response.json()

@st.cache_data(ttl=3600)  # 1 hour
def get_weather_data(location):
    """Fetch weather data (cached for 1 hour)."""
    api_key = st.secrets["weather_api_key"]
    response = requests.get(f"https://api.weather.com/current",
                          params={"location": location, "key": api_key})
    return response.json()

# Usage with error handling
try:
    weather = get_weather_data(user_location)
    st.metric("Temperature", f"{weather['temp']}°F")
except Exception as e:
    st.error(f"Could not fetch weather data: {e}")

Custom Hash Functions

# Custom hash for complex objects
def hash_dataframe(df):
    """Custom hash function for pandas DataFrames."""
    return hash(pd.util.hash_pandas_object(df).sum())

@st.cache_data(hash_funcs={pd.DataFrame: hash_dataframe})
def process_dataframe(df, operations):
    """Process DataFrame with custom hashing."""
    result = df.copy()
    for op in operations:
        result = apply_operation(result, op)
    return result

# Custom hash for file objects
import io
import os

def hash_file(file_obj):
    """Hash file based on name and modification time, or content."""
    if hasattr(file_obj, 'name'):
        return hash((file_obj.name, os.path.getmtime(file_obj.name)))
    return hash(file_obj.read())

@st.cache_data(hash_funcs={io.TextIOWrapper: hash_file})
def process_uploaded_file(file):
    """Process uploaded file with content-based hashing."""
    return pd.read_csv(file)

Cache Management

# Clear specific cache
@st.cache_data
def expensive_function(param):
    return compute_result(param)

# Clear cache manually
if st.button("Clear Cache"):
    expensive_function.clear()
    st.success("Cache cleared!")

# Clear all caches
if st.button("Clear All Caches"):
    st.cache_data.clear()
    st.cache_resource.clear()
    st.success("All caches cleared!")

# Conditional cache clearing
if st.checkbox("Force Refresh"):
    expensive_function.clear()
    result = expensive_function(user_input)
else:
    result = expensive_function(user_input)

Performance Monitoring

import time
import streamlit as st

# Monitor cache performance
@st.cache_data
def monitored_function(data):
    start_time = time.time()
    result = expensive_computation(data)
    end_time = time.time()

    # Log performance metrics
    st.sidebar.metric("Computation Time", f"{end_time - start_time:.2f}s")
    return result

# Cache hit/miss tracking (store counters in session_state so they
# survive reruns; a plain module-level dict is reset on every rerun)
if "cache_stats" not in st.session_state:
    st.session_state.cache_stats = {"calls": 0, "misses": 0}

@st.cache_data
def tracked_function(param):
    # The function body only executes on a cache miss
    st.session_state.cache_stats["misses"] += 1
    return compute_result(param)

st.session_state.cache_stats["calls"] += 1
result = tracked_function(user_input)

# Display cache statistics (hits = calls - misses)
stats = st.session_state.cache_stats
col1, col2 = st.sidebar.columns(2)
col1.metric("Cache Hits", stats["calls"] - stats["misses"])
col2.metric("Cache Misses", stats["misses"])

Best Practices

When to Use Each Cache Type

Use @st.cache_data for:

  • Data loading from files, APIs, or databases
  • Data transformations and computations
  • Serializable objects (DataFrames, lists, dicts, numbers, strings)
  • Results that can be safely shared across users

Use @st.cache_resource for:

  • Database connections and connection pools
  • ML models and trained algorithms
  • File handles and open resources
  • Objects with locks or threads
  • Non-serializable or stateful objects

Cache Configuration Guidelines

# For frequently accessed, stable data
@st.cache_data(persist="disk")
def load_reference_data():
    return pd.read_csv("reference.csv")

# For real-time data with appropriate TTL
@st.cache_data(ttl=60)  # 1 minute
def get_live_metrics():
    return fetch_current_metrics()

# For user-specific data with size limits
@st.cache_data(max_entries=1000)
def get_user_analysis(user_id, analysis_type):
    return perform_analysis(user_id, analysis_type)

# For expensive resources with validation
@st.cache_resource(validate=lambda x: x.is_healthy())
def get_ml_service():
    return MLService()

