Experiments & Tracking

Experiment tracking and management for organizing ML workflows, comparing runs, and managing model metadata.

Capabilities

Experiment

Experiment management for organizing related trials and runs.

class Experiment:
    """
    Experiment management for ML workflows.

    Parameters:
        experiment_name: str - Experiment name (required)
            - 1-120 characters
            - Alphanumeric, hyphens, underscores
        description: Optional[str] - Experiment description
            - Maximum 3072 characters
        tags: Optional[List[Tag]] - Resource tags
        sagemaker_session: Optional[Session] - SageMaker session

    Methods:
        create(experiment_name, description=None, tags=None, sagemaker_session=None) -> Experiment
            Create new experiment.
            
            Parameters:
                experiment_name: str - Experiment name (required)
                description: Optional[str] - Description
                tags: Optional[List[Tag]] - Tags
                sagemaker_session: Optional[Session] - Session
            
            Returns:
                Experiment: Created experiment
            
            Raises:
                ValueError: If experiment_name invalid
                ClientError: If experiment already exists
        
        load(experiment_name, sagemaker_session=None) -> Experiment
            Load existing experiment.
            
            Parameters:
                experiment_name: str - Experiment name (required)
                sagemaker_session: Optional[Session] - Session
            
            Returns:
                Experiment: Loaded experiment
            
            Raises:
                ClientError: If experiment doesn't exist
        
        list(sort_by="CreationTime", sort_order="Descending", max_results=100, 
             sagemaker_session=None) -> List[Experiment]
            List experiments.
            
            Parameters:
                sort_by: str - Sort field (default: "CreationTime")
                    - "CreationTime", "Name"
                sort_order: str - Sort order (default: "Descending")
                    - "Ascending", "Descending"
                max_results: int - Maximum results (default: 100, max: 100)
                sagemaker_session: Optional[Session] - Session
            
            Returns:
                List[Experiment]: Experiments list
        
        delete() -> None
            Delete the experiment.
            
            Raises:
                ClientError: If experiment has associated runs (delete runs first)

    Attributes:
        experiment_name: str - Experiment name
        experiment_arn: str - Experiment ARN
        description: Optional[str] - Experiment description
        creation_time: datetime - Creation timestamp
        created_by: Dict - Creator information
        last_modified_time: datetime - Last modification timestamp
        last_modified_by: Dict - Last modifier information
    
    Notes:
        - Experiments organize related runs/trials
        - Delete all runs before deleting experiment
        - Tags useful for cost tracking and organization
        - Cannot rename experiment after creation
    """

Usage:

from sagemaker.core.experiments import Experiment
from botocore.exceptions import ClientError

# Create experiment for project
try:
    experiment = Experiment.create(
        experiment_name="customer-churn-prediction",
        description="Experiments for customer churn prediction model",
        tags=[
            {"Key": "Project", "Value": "CustomerChurn"},
            {"Key": "Team", "Value": "DataScience"}
        ]
    )
    print(f"Experiment created: {experiment.experiment_arn}")
    
except ClientError as e:
    if e.response['Error']['Code'] == 'ResourceInUse':
        print("Experiment already exists, loading...")
        experiment = Experiment.load("customer-churn-prediction")
    else:
        raise

# List all experiments
experiments = Experiment.list(
    sort_by="CreationTime",
    sort_order="Descending",
    max_results=20
)

print(f"\nRecent experiments:")
for exp in experiments[:5]:
    print(f"  {exp.experiment_name} - {exp.description}")

# Delete experiment (after deleting all runs)
# experiment.delete()
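
Deleting an experiment that still has runs attached raises a ClientError, so the call is worth guarding. A minimal sketch, assuming the ClientError import from above and the experiment object created earlier:

try:
    experiment.delete()
    print("Experiment deleted")
except ClientError as e:
    # Fails while runs are still associated with the experiment;
    # delete the runs first, then retry.
    print(f"Delete failed: {e.response['Error']['Message']}")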

Run

Run management for individual training runs within experiments.

class Run:
    """
    Run management for tracking training executions.

    Parameters:
        experiment_name: str - Parent experiment name (required)
        run_name: Optional[str] - Run name (auto-generated if not provided)
            - Format: auto-generated includes timestamp
        sagemaker_session: Optional[Session] - SageMaker session

    Methods:
        log_parameter(name, value) -> None
            Log single parameter.
            
            Parameters:
                name: str - Parameter name (required)
                value: Union[str, int, float, bool] - Parameter value (required)
            
            Raises:
                ValueError: If value not JSON-serializable
        
        log_parameters(parameters) -> None
            Log multiple parameters.
            
            Parameters:
                parameters: Dict[str, Any] - Parameters dictionary (required)
        
        log_metric(name, value, step=None, timestamp=None) -> None
            Log metric value.
            
            Parameters:
                name: str - Metric name (required)
                value: float - Metric value (required)
                step: Optional[int] - Training step/epoch
                timestamp: Optional[datetime] - Timestamp
        
        log_metrics(metrics, step=None) -> None
            Log multiple metrics.
            
            Parameters:
                metrics: Dict[str, float] - Metrics dictionary (required)
                step: Optional[int] - Training step/epoch
        
        log_artifact(name, value, media_type="text/plain") -> None
            Log artifact.
            
            Parameters:
                name: str - Artifact name (required)
                value: str - Artifact value (required)
                media_type: str - Media type (default: "text/plain")
        
        log_file(file_path, name=None, media_type=None, is_output=True) -> None
            Log file as artifact.
            
            Parameters:
                file_path: str - Local file path (required)
                name: Optional[str] - Artifact name (default: filename)
                media_type: Optional[str] - Media type (auto-detected)
                is_output: bool - Is output artifact (default: True)
            
            Raises:
                FileNotFoundError: If file doesn't exist
        
        log_model(model_data_uri, model_type=None, framework=None, framework_version=None) -> None
            Log model artifact.
            
            Parameters:
                model_data_uri: str - S3 URI for model (required)
                model_type: Optional[str] - Model type
                framework: Optional[str] - Framework name
                framework_version: Optional[str] - Framework version
        
        wait() -> None
            Wait for run to complete (if associated with job).
        
        list(experiment_name, sort_by="CreationTime", sort_order="Descending", 
             max_results=100) -> List[Run]
            List runs in experiment.
            
            Parameters:
                experiment_name: str - Experiment name (required)
                sort_by: str - Sort field
                sort_order: str - Sort order
                max_results: int - Maximum results (1-100)
            
            Returns:
                List[Run]: Runs list

    Context Manager:
        Use with 'with' statement for automatic resource management and cleanup.

    Attributes:
        run_name: str - Run name
        experiment_name: str - Parent experiment name
        run_arn: str - Run ARN
        status: str - Run status

    Notes:
        - Use context manager for automatic cleanup
        - Log parameters before training
        - Log metrics during/after training
        - Log model and artifacts after training
        - Parameters immutable after logging
        - Metrics can be logged multiple times (time series)
    """

Usage:

from sagemaker.core.experiments import Run
import json

# Create and use a run with the context manager
# (assumes an existing SageMaker Session object bound to `session`)
with Run(
    experiment_name="customer-churn-prediction",
    run_name="xgboost-trial-1",
    sagemaker_session=session
) as run:
    # Log hyperparameters at start
    run.log_parameter("algorithm", "xgboost")
    run.log_parameter("learning_rate", 0.1)
    run.log_parameter("max_depth", 5)
    run.log_parameter("num_rounds", 100)
    
    # Or log all at once
    run.log_parameters({
        "min_child_weight": 3,
        "subsample": 0.8,
        "colsample_bytree": 0.8
    })
    
    # Train model (pseudo-code)
    model, history = train_xgboost_model()
    
    # Log metrics during training
    for epoch, metrics in enumerate(history):
        run.log_metrics({
            "train_loss": metrics["train_loss"],
            "train_accuracy": metrics["train_acc"],
            "val_loss": metrics["val_loss"],
            "val_accuracy": metrics["val_acc"]
        }, step=epoch)
    
    # Log final metrics
    run.log_metric("final_accuracy", 0.94)
    run.log_metric("final_f1", 0.92)
    run.log_metric("auc_roc", 0.96)
    
    # Log model
    model_uri = "s3://my-bucket/models/xgboost-model.tar.gz"
    run.log_model(
        model_data_uri=model_uri,
        model_type="xgboost",
        framework="xgboost",
        framework_version="1.7.3"
    )
    
    # Log artifacts
    run.log_file(
        file_path="confusion_matrix.png",
        name="confusion_matrix",
        media_type="image/png"
    )
    
    run.log_file(
        file_path="feature_importance.json",
        name="feature_importance",
        media_type="application/json"
    )
    
    # Log custom artifact
    config = {
        "preprocessing": "standard_scaler",
        "feature_selection": "top_20",
        "class_weights": {0: 1.0, 1: 2.5}
    }
    run.log_artifact(
        name="training_config",
        value=json.dumps(config),
        media_type="application/json"
    )

# Run automatically closed and finalized
print(f"Run completed: {run.run_name}")

Integration with Training

from sagemaker.train import ModelTrainer
from sagemaker.core.experiments import Run, Experiment
from botocore.exceptions import ClientError

# Create experiment if needed
try:
    experiment = Experiment.create(
        experiment_name="hyperparameter-search",
        description="Finding optimal hyperparameters for ResNet"
    )
except ClientError:
    experiment = Experiment.load("hyperparameter-search")

# Run training with experiment tracking
hyperparams_to_test = [
    {"learning_rate": 0.01, "batch_size": 32},
    {"learning_rate": 0.001, "batch_size": 64},
    {"learning_rate": 0.0001, "batch_size": 128}
]

best_accuracy = 0
best_run = None

for i, hyperparams in enumerate(hyperparams_to_test):
    with Run(
        experiment_name=experiment.experiment_name,
        run_name=f"trial-{i+1}"
    ) as run:
        # Log hyperparameters
        run.log_parameters(hyperparams)
        run.log_parameter("optimizer", "adam")
        run.log_parameter("epochs", 10)
        
        # Create and train model (role, train_data, val_data, and the Compute
        # config class are assumed to be defined/imported as in the training docs)
        trainer = ModelTrainer(
            training_image="pytorch-image",
            role=role,
            compute=Compute(
                instance_type="ml.p3.2xlarge",
                instance_count=1
            ),
            hyperparameters=hyperparams
        )
        
        trainer.train(input_data_config=[train_data, val_data])
        
        # Get metrics from training job
        job = trainer._latest_training_job
        final_metrics = job.final_metric_data_list
        
        # Log metrics
        for metric in final_metrics:
            metric_name = metric["MetricName"]
            metric_value = metric["Value"]
            run.log_metric(metric_name, metric_value)
            
            if metric_name == "validation:accuracy" and metric_value > best_accuracy:
                best_accuracy = metric_value
                best_run = run.run_name
        
        # Log model artifact
        run.log_model(
            model_data_uri=job.model_artifacts["S3ModelArtifacts"],
            model_type="pytorch",
            framework="pytorch",
            framework_version="2.0"
        )

print(f"\nBest run: {best_run} with accuracy: {best_accuracy}")

Integration with Pipelines

from sagemaker.mlops.workflow import Pipeline, TrainingStep, PipelineExperimentConfig
from sagemaker.core.workflow import ExecutionVariables

# Configure pipeline with experiment tracking
experiment_config = PipelineExperimentConfig(
    experiment_name="pipeline-experiment",
    trial_name=ExecutionVariables.PIPELINE_EXECUTION_ID  # Unique per execution
)

# Create pipeline
pipeline = Pipeline(
    name="training-pipeline",
    steps=[preprocess_step, train_step, evaluate_step],
    pipeline_experiment_config=experiment_config
)

# Each execution creates a new trial/run
execution1 = pipeline.start()  # Creates trial with execution ID 1
execution2 = pipeline.start()  # Creates trial with execution ID 2

# List runs created by pipeline
runs = Run.list(experiment_name="pipeline-experiment")
print(f"Total pipeline runs: {len(runs)}")

Advanced Usage

Compare Multiple Runs

from sagemaker.core.experiments import Run
import pandas as pd

# Get all runs from experiment
runs = Run.list(
    experiment_name="hyperparameter-search",
    sort_by="CreationTime",
    sort_order="Descending"
)

# Extract metrics and parameters
results = []
for run in runs:
    # Per-run parameters and metrics are retrieved via the SageMaker
    # describe APIs; the commented keys below are placeholders that the
    # retrieval step is expected to populate.
    run_details = {
        "run_name": run.run_name,
        "creation_time": run.creation_time,
        # "learning_rate": ..., "batch_size": ..., "validation_accuracy": ...
    }
    results.append(run_details)

# Create comparison DataFrame
df = pd.DataFrame(results)

# Find best run by metric (assumes the parameter/metric columns noted above
# have been populated in the DataFrame)
best_run = df.loc[df['validation_accuracy'].idxmax()]
print(f"\nBest run: {best_run['run_name']}")
print(f"Parameters: learning_rate={best_run['learning_rate']}, batch_size={best_run['batch_size']}")
print(f"Validation accuracy: {best_run['validation_accuracy']}")

# Visualize results
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.scatter(df['learning_rate'], df['validation_accuracy'], s=df['batch_size'])
plt.xlabel('Learning Rate')
plt.ylabel('Validation Accuracy')
plt.title('Hyperparameter Search Results')
plt.xscale('log')
plt.show()

Nested Runs for Ensemble Models

# Parent run for ensemble
with Run(
    experiment_name="ensemble-models",
    run_name="ensemble-voting-v1"
) as parent_run:
    parent_run.log_parameter("ensemble_type", "voting")
    parent_run.log_parameter("voting_strategy", "soft")
    parent_run.log_parameter("num_models", 3)
    
    models = ["xgboost", "random_forest", "neural_net"]
    model_scores = []
    
    # Child runs for individual models
    for i, model_type in enumerate(models):
        with Run(
            experiment_name="ensemble-models",
            run_name=f"ensemble-v1-model-{i}-{model_type}"
        ) as child_run:
            # Link to parent
            child_run.log_parameter("parent_run", parent_run.run_name)
            child_run.log_parameter("model_type", model_type)
            child_run.log_parameter("ensemble_index", i)
            
            # Train individual model
            model, accuracy = train_model(model_type)
            model_scores.append(accuracy)
            
            # Log child metrics (training_time is assumed to have been
            # measured around the train_model call above)
            child_run.log_metric("accuracy", accuracy)
            child_run.log_metric("training_time", training_time)
            
            # Log model
            child_run.log_model(
                model_data_uri=f"s3://bucket/models/{model_type}.tar.gz",
                model_type=model_type
            )
    
    # Log ensemble metrics
    ensemble_predictions = create_ensemble(models)
    ensemble_accuracy = evaluate_ensemble(ensemble_predictions)
    
    parent_run.log_metric("ensemble_accuracy", ensemble_accuracy)
    parent_run.log_metric("improvement_over_best", 
                         ensemble_accuracy - max(model_scores))
    parent_run.log_parameters({
        "model_1_accuracy": model_scores[0],
        "model_2_accuracy": model_scores[1],
        "model_3_accuracy": model_scores[2]
    })

print(f"Ensemble run completed: {parent_run.run_name}")

Hyperparameter Tuning Integration

from sagemaker.train.tuner import HyperparameterTuner
from sagemaker.core.experiments import Experiment, Run

# Create experiment for HPO
experiment = Experiment.create(
    experiment_name="hpo-experiment",
    description="Hyperparameter optimization for CNN"
)

# Each tuning trial is automatically tracked as a run
# (trainer and hyperparameter_ranges are assumed to be defined as in the
# training documentation)
tuner = HyperparameterTuner(
    model_trainer=trainer,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=3
)

# Start tuning
tuner.tune()

# Trials automatically logged as runs
runs = Run.list(experiment_name="hpo-experiment")
print(f"Total HPO trials: {len(runs)}")

# Each run contains:
# - Hyperparameter values
# - Training metrics
# - Model artifacts
# - Training job details

Logging Complex Artifacts

import matplotlib.pyplot as plt
import json
import numpy as np

with Run(experiment_name="model-analysis", run_name="visualization-run") as run:
    # Log configuration as JSON
    config = {
        "architecture": "resnet50",
        "preprocessing": {
            "normalization": "imagenet",
            "augmentation": ["flip", "rotate", "crop"]
        },
        "training": {
            "optimizer": "adam",
            "loss": "cross_entropy",
            "metrics": ["accuracy", "f1"]
        }
    }
    run.log_artifact(
        name="config",
        value=json.dumps(config, indent=2),
        media_type="application/json"
    )
    
    # Generate and log training curve (train_losses and val_losses are
    # assumed to come from the training loop)
    plt.figure(figsize=(10, 6))
    plt.plot(train_losses, label='Training Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.title('Training Progress')
    plt.savefig("loss_curve.png")
    run.log_file("loss_curve.png", name="training_curve", media_type="image/png")
    
    # Log confusion matrix
    plt.figure(figsize=(8, 8))
    plot_confusion_matrix(confusion_matrix)
    plt.savefig("confusion_matrix.png")
    run.log_file("confusion_matrix.png", media_type="image/png")
    
    # Log dataset statistics
    stats = {
        "total_samples": 50000,
        "class_distribution": {
            "class_0": 25000,
            "class_1": 15000,
            "class_2": 10000
        },
        "split": {
            "train": 0.7,
            "val": 0.15,
            "test": 0.15
        },
        "features": {
            "image_size": [224, 224, 3],
            "normalization": "imagenet"
        }
    }
    run.log_artifact(
        name="dataset_stats",
        value=json.dumps(stats, indent=2),
        media_type="application/json"
    )
    
    # Log feature importance
    feature_importance = calculate_feature_importance(model)
    run.log_artifact(
        name="feature_importance",
        value=json.dumps(feature_importance),
        media_type="application/json"
    )

Reproducibility Best Practices

import random
import numpy as np
import torch
import sys
import os

# Set all random seeds
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

seed = 42
set_seed(seed)

with Run(experiment_name="reproducible-experiment", run_name="trial-1") as run:
    # Log all environment details
    run.log_parameter("random_seed", seed)
    run.log_parameter("python_version", sys.version)
    run.log_parameter("torch_version", torch.__version__)
    run.log_parameter("numpy_version", np.__version__)
    run.log_parameter("cuda_version", torch.version.cuda if torch.cuda.is_available() else "none")
    
    # Log hardware info
    run.log_parameter("device", "cuda" if torch.cuda.is_available() else "cpu")
    if torch.cuda.is_available():
        run.log_parameter("gpu_name", torch.cuda.get_device_name(0))
        run.log_parameter("gpu_memory_gb", torch.cuda.get_device_properties(0).total_memory / 1e9)
    
    # Log data hash for verification
    import hashlib
    data_hash = hashlib.sha256(training_data.tobytes()).hexdigest()
    run.log_parameter("data_hash", data_hash)
    
    # Training code with deterministic behavior
    model = train_deterministic_model()
    
    # Log model hash
    model_hash = compute_model_hash(model)
    run.log_parameter("model_hash", model_hash)
    
    # Results fully reproducible with same seed

MLflow Integration

MLflow tracking is integrated automatically when the mlflow_resource_arn parameter is passed to evaluators and trainers.

# Create MLflow tracking server in SageMaker
import boto3

sm_client = boto3.client('sagemaker')

# Create tracking server
response = sm_client.create_mlflow_tracking_server(
    TrackingServerName='my-mlflow-server',
    ArtifactStoreUri='s3://my-bucket/mlflow',
    RoleArn='arn:aws:iam::123456789012:role/MLflowRole',
    AutomaticModelRegistration=True
)

mlflow_arn = response['TrackingServerArn']

# Use with evaluations (BenchMarkEvaluator is covered in the evaluation docs)
evaluator = BenchMarkEvaluator(
    benchmark="MMLU",
    model="my-model",
    mlflow_resource_arn=mlflow_arn,
    mlflow_experiment_name="model-evaluation",
    mlflow_run_name="mmlu-baseline"
)

# Results automatically logged to MLflow
execution = evaluator.evaluate()
execution.wait()

# Access via the MLflow UI or Python client
# (mlflow_tracking_server_url refers to the tracking URI of the server created above)
import mlflow

mlflow.set_tracking_uri(mlflow_tracking_server_url)
experiment = mlflow.get_experiment_by_name("model-evaluation")
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])
print(runs[['run_name', 'metrics.accuracy', 'params.model']])

Internal Classes

_Trial (Deprecated)

class _Trial:
    """
    Internal trial management (deprecated, use Run instead).

    Legacy class for backward compatibility with SDK V2.
    New code should use Run class which provides same functionality
    with improved API design.

    Notes:
        - Deprecated in SDK V3
        - Use Run for new code
        - Existing _Trial code continues to work
    """

_TrialComponent (Deprecated)

class _TrialComponent:
    """
    Internal trial component management (deprecated, use Run instead).

    Legacy class for backward compatibility with SDK V2.
    New code should use Run class.

    Notes:
        - Deprecated in SDK V3
        - Run class provides equivalent functionality
    """

_RunContext

class _RunContext:
    """
    Internal run context management.

    Manages run lifecycle and resource cleanup.
    Automatically used by Run context manager.

    Notes:
        - Internal implementation detail
        - Handles run lifecycle: create, log, finalize
        - Ensures proper cleanup on context exit
        - Not intended for direct use
    """

Validation and Constraints

Experiment Constraints

  • Experiment name: 1-120 characters, alphanumeric, hyphens, underscores
  • Description: Maximum 3072 characters
  • Maximum experiments per account: 5000
  • Tags: Maximum 50 per experiment
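
These limits can be checked client-side before calling Experiment.create(). A small validation sketch based only on the constraints listed above (the exact regex is an assumption consistent with "alphanumeric, hyphens, underscores"):

import re

EXPERIMENT_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,120}$")

def validate_experiment_inputs(experiment_name, description=None, tags=None):
    if not EXPERIMENT_NAME_PATTERN.match(experiment_name):
        raise ValueError("experiment_name must be 1-120 alphanumeric, hyphen, or underscore characters")
    if description is not None and len(description) > 3072:
        raise ValueError("description must be at most 3072 characters")
    if tags is not None and len(tags) > 50:
        raise ValueError("at most 50 tags per experiment")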

Run Constraints

  • Run name: Auto-generated or custom (1-120 characters)
  • Maximum parameters: 300 per run
  • Maximum metrics: 500,000 data points per run
  • Maximum artifacts: 30 per run
  • Parameter value length: Maximum 256 characters
  • Metric value: Must be numeric (float)

Logging Constraints

  • Parameter immutability: Cannot change after logging
  • Metric time series: Can log same metric multiple times with different steps
  • File size limit: 5 MB per artifact file
  • Concurrent logging: Thread-safe within single run context
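
Values can likewise be normalized before logging so they stay within the limits above. A hedged helper sketch (the coercion and truncation policy is illustrative, not SDK behavior):

def log_safely(run, parameters=None, metrics=None):
    # Parameter values must be str/int/float/bool and at most 256 characters
    for name, value in (parameters or {}).items():
        if not isinstance(value, (str, int, float, bool)):
            value = str(value)
        if isinstance(value, str) and len(value) > 256:
            value = value[:256]
        run.log_parameter(name, value)
    # Metric values must be numeric
    for name, value in (metrics or {}).items():
        run.log_metric(name, float(value))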

Common Error Scenarios

  1. Experiment Already Exists:
    • Cause: Creating an experiment with an existing name
    • Solution: Use Experiment.load() or catch the ResourceInUse error (see the sketch after this list)
  2. Run Not Finalized:
    • Cause: Accessing a run outside the context manager
    • Solution: Use with Run(...) as run: for automatic finalization
  3. Parameter Type Error:
    • Cause: Logging a non-serializable value
    • Solution: Convert to string, int, float, or bool
  4. File Not Found:
    • Cause: Calling log_file() with an invalid path
    • Solution: Verify the file exists before logging
  5. Metric Not Numeric:
    • Cause: Logging a string as a metric
    • Solution: Ensure metric values are float/int; use log_parameter for strings
  6. Run Name Collision:
    • Cause: Multiple runs with the same name in an experiment
    • Solution: Use auto-generated names or ensure uniqueness (see the sketch after this list)
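
Scenarios 1 and 6 can be handled defensively with a create-or-load helper and timestamped run names; a minimal sketch using only the APIs documented above:

from datetime import datetime, timezone
from botocore.exceptions import ClientError
from sagemaker.core.experiments import Experiment, Run

def get_or_create_experiment(name, description=None):
    try:
        return Experiment.create(experiment_name=name, description=description)
    except ClientError as e:
        if e.response["Error"]["Code"] == "ResourceInUse":
            return Experiment.load(name)
        raise

def unique_run_name(prefix):
    # A UTC timestamp suffix avoids run-name collisions within an experiment
    return f"{prefix}-{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M%S')}"

experiment = get_or_create_experiment("customer-churn-prediction")
with Run(experiment_name=experiment.experiment_name,
         run_name=unique_run_name("xgboost")) as run:
    run.log_parameter("algorithm", "xgboost")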