# SageMaker Python SDK

A comprehensive Python library for training and deploying machine learning models on Amazon SageMaker. It provides high-level abstractions and APIs for the complete machine learning workflow, including data preprocessing, model training, hyperparameter tuning, batch inference, and real-time endpoint deployment, across popular frameworks such as TensorFlow, PyTorch, Scikit-learn, XGBoost, and Hugging Face.

## Package Information

- **Package Name**: sagemaker
- **Language**: Python
- **Installation**: `pip install sagemaker`
- **Documentation**: https://sagemaker.readthedocs.io/

## Core Imports

```python
import sagemaker
```

Common session and role management:

```python
from sagemaker import Session, get_execution_role
```

Training and model deployment:

```python
from sagemaker.estimator import Estimator
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.inputs import TrainingInput
```

## Basic Usage

```python
from sagemaker import Session, get_execution_role
from sagemaker.sklearn import SKLearn

# Set up the SageMaker session and IAM execution role.
# get_execution_role() works inside SageMaker notebooks/Studio; elsewhere pass a role ARN.
sagemaker_session = Session()
role = get_execution_role()

# Create a scikit-learn estimator that runs train.py in the managed container.
# For deployment, train.py must also define model_fn() so the endpoint can load the model.
sklearn_estimator = SKLearn(
    entry_point="train.py",
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    instance_count=1,
    role=role,
    sagemaker_session=sagemaker_session,
)

# Train the model; the "training" channel is exposed to the script as SM_CHANNEL_TRAINING
sklearn_estimator.fit({"training": "s3://my-bucket/train-data"})

# Deploy the trained model to a real-time endpoint
predictor = sklearn_estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Make predictions (placeholder input; its shape must match what the trained model expects)
test_data = [[0.5, 1.2, 3.4]]
predictions = predictor.predict(test_data)

# Delete the endpoint to stop incurring charges
predictor.delete_endpoint()
```

## Architecture

The SageMaker Python SDK follows a layered architecture that hides the low-level Amazon SageMaker API calls:

- **Session Layer**: Manages AWS credentials, regions, and service configuration
- **Estimator Layer**: High-level training interfaces for different ML frameworks
- **Model Layer**: Model deployment and management abstractions
- **Predictor Layer**: Clients for invoking deployed endpoints (batch jobs use `Transformer` instead)
- **Processing Layer**: Data preprocessing and feature engineering jobs
- **Pipeline Layer**: End-to-end ML workflow orchestration

This design lets developers focus on ML logic while the SDK handles AWS service integration, resource provisioning, and deployment.

## Capabilities

### Core Training and Model Management

Fundamental classes for training models and managing deployments, including estimators, models, predictors, and session management. These form the foundation of the SageMaker workflow.

```python { .api }
import logging

class Estimator:
    def __init__(self, image_uri: str, role: str = None, instance_count: int = None,
                 instance_type: str = None, keep_alive_period_in_seconds: int = None,
                 volume_size: int = 30, max_run: int = 24 * 60 * 60, input_mode: str = "File",
                 output_path: str = None, base_job_name: str = None,
                 sagemaker_session: Session = None, hyperparameters: dict = None,
                 tags: list = None, subnets: list = None, security_group_ids: list = None,
                 **kwargs): ...
    def fit(self, inputs, wait: bool = True, logs: str = "All", job_name: str = None,
            experiment_config: dict = None): ...
    def deploy(self, initial_instance_count: int, instance_type: str, **kwargs) -> Predictor: ...

class Model:
    def __init__(self, image_uri: str = None, model_data: str = None, role: str = None,
                 predictor_cls: callable = None, env: dict = None, name: str = None,
                 vpc_config: dict = None, sagemaker_session: Session = None,
                 enable_network_isolation: bool = None, model_kms_key: str = None,
                 image_config: dict = None, source_dir: str = None, code_location: str = None,
                 entry_point: str = None, container_log_level: int = logging.INFO,
                 dependencies: list = None, git_config: dict = None, **kwargs): ...
    def deploy(self, initial_instance_count: int, instance_type: str, **kwargs) -> Predictor: ...

class Predictor:
    def predict(self, data, **kwargs): ...
    def delete_endpoint(self): ...

class Session:
    def __init__(self, boto_session=None, sagemaker_client=None, sagemaker_runtime_client=None,
                 sagemaker_featurestore_runtime_client=None, default_bucket: str = None,
                 settings=None, sagemaker_metrics_client=None, sagemaker_config: dict = None,
                 default_bucket_prefix: str = None): ...
    # bucket defaults to the session's default bucket; returns the S3 URI of the upload
    def upload_data(self, path: str, bucket: str = None, key_prefix: str = "data") -> str: ...

def get_execution_role(sagemaker_session: Session = None, use_default: bool = False) -> str: ...
```

[Core Training and Models](./core-training.md)

### Framework-Specific Training

Support for popular ML frameworks, including PyTorch, TensorFlow, Scikit-learn, XGBoost, Hugging Face, and MXNet. Each framework estimator runs your `entry_point` script inside a prebuilt, AWS-managed container for the requested framework version.

```python { .api }
# Framework estimators subclass sagemaker.estimator.Framework,
# a sibling of Estimator (both derive from EstimatorBase)

# PyTorch
class PyTorch(Framework):
    def __init__(self, entry_point: str, framework_version: str, py_version: str, **kwargs): ...

# TensorFlow
class TensorFlow(Framework):
    def __init__(self, entry_point: str, framework_version: str, py_version: str, **kwargs): ...

# Scikit-learn
class SKLearn(Framework):
    def __init__(self, entry_point: str, framework_version: str, **kwargs): ...

# XGBoost
class XGBoost(Framework):
    def __init__(self, entry_point: str, framework_version: str, **kwargs): ...

# Hugging Face
class HuggingFace(Framework):
    def __init__(self, py_version: str, entry_point: str, transformers_version: str = None,
                 pytorch_version: str = None, **kwargs): ...
```

[Framework Training](./framework-training.md)
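
Framework estimators run the `entry_point` script in "script mode": each key in the estimator's `hyperparameters` dict arrives as a `--key value` command-line flag, and container paths are exposed through `SM_*` environment variables (`SM_MODEL_DIR`, and `SM_CHANNEL_<NAME>` for each input channel). A minimal sketch of the receiving side of a `train.py`, assuming hypothetical hyperparameters `epochs` and `lr` and a channel named `training`. It uses only the standard library:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments a framework estimator passes to its entry_point."""
    parser = argparse.ArgumentParser()
    # Hyperparameters such as {"epochs": 5, "lr": 0.001} arrive as --epochs 5 --lr 0.001
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.01)
    # Where to save the model; SageMaker uploads this directory as model.tar.gz
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Local copy of the "training" channel passed to estimator.fit()
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    return parser.parse_args(argv)
```

The script then loads data from `args.train`, trains, and writes its artifacts under `args.model_dir`.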

### Amazon Built-in Algorithms

Pre-built, optimized algorithms for common ML tasks, including clustering, dimensionality reduction, classification, regression, and anomaly detection.

```python { .api }
# Built-in algorithm estimators subclass
# sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase

# Clustering
class KMeans(AmazonAlgorithmEstimatorBase):
    def __init__(self, role: str, instance_count: int, instance_type: str, k: int, **kwargs): ...

# Dimensionality Reduction
class PCA(AmazonAlgorithmEstimatorBase):
    def __init__(self, role: str, instance_count: int, instance_type: str, num_components: int, **kwargs): ...

# Classification/Regression
# predictor_type: "binary_classifier", "multiclass_classifier", or "regressor"
class LinearLearner(AmazonAlgorithmEstimatorBase):
    def __init__(self, role: str, instance_count: int, instance_type: str, predictor_type: str, **kwargs): ...

# Anomaly Detection
class RandomCutForest(AmazonAlgorithmEstimatorBase):
    def __init__(self, role: str, instance_count: int, instance_type: str, **kwargs): ...
```

[Amazon Algorithms](./amazon-algorithms.md)

### AutoML

Automated machine learning (Autopilot) for tabular data, image classification, text classification, and time series forecasting, with minimal configuration required.

```python { .api }
# AutoML v1
class AutoML:
    def __init__(self, role: str = None, target_attribute_name: str = None,
                 output_kms_key: str = None, output_path: str = None,
                 base_job_name: str = None, compression_type: str = None,
                 sagemaker_session: Session = None, volume_kms_key: str = None,
                 encrypt_inter_container_traffic: bool = None,
                 vpc_config: dict = None, problem_type: str = None,
                 max_candidates: int = None, **kwargs): ...
    def fit(self, inputs, wait: bool = True, logs: bool = True,
            job_name: str = None): ...

class AutoMLInput:
    def __init__(self, inputs, target_attribute_name: str, compression: str = None,
                 channel_type: str = None, content_type: str = None,
                 s3_data_type: str = None, sample_weight_attribute_name: str = None): ...

# AutoML v2
class AutoMLV2:
    # problem_config: a problem-type configuration such as AutoMLTabularConfig
    def __init__(self, problem_config, role: str = None, output_kms_key: str = None,
                 output_path: str = None, base_job_name: str = None,
                 sagemaker_session: Session = None, volume_kms_key: str = None,
                 encrypt_inter_container_traffic: bool = None, **kwargs): ...
    def fit(self, inputs, wait: bool = True, logs: bool = True,
            job_name: str = None): ...

class AutoMLDataChannel:
    def __init__(self, s3_data_source: str, target_attribute_name: str = None,
                 channel_type: str = None, content_type: str = None,
                 compression_type: str = None, sample_weight_attribute_name: str = None): ...

# Configuration classes
class AutoMLTabularConfig:
    def __init__(self, target_attribute_name: str, problem_type: str = None,
                 job_objective: dict = None, **kwargs): ...

class AutoMLTimeSeriesForecastingConfig:
    def __init__(self, forecast_frequency: str, forecast_horizon: int,
                 forecast_quantiles: list = None, **kwargs): ...
```

[AutoML](./automl.md)

### Model Serving and Inference

Model deployment options, including real-time endpoints, batch transform, serverless inference, and multi-model endpoints, with pluggable serializers and deserializers for request and response payloads.

```python { .api }
# Model deployment
class ModelBuilder:
    def __init__(self, **kwargs): ...
    def build(self, mode: Mode, role: str, sagemaker_session: Session) -> Model: ...

# Inference specification
class InferenceSpec:
    def load(self, model_dir: str): ...
    def invoke(self, input_object, model): ...

# Serializers (return the request body as a string)
class JSONSerializer(BaseSerializer):
    def serialize(self, data) -> str: ...

class CSVSerializer(BaseSerializer):
    def serialize(self, data) -> str: ...

# Deserializers
class JSONDeserializer(BaseDeserializer):
    def deserialize(self, stream, content_type: str): ...
```

[Model Serving](./model-serving.md)

### Data Processing

Data preprocessing capabilities, including built-in processing containers, custom processing jobs, and Spark integration for large-scale data transformation.

```python { .api }
from typing import List

class Processor:
    def __init__(self, role: str, image_uri: str, instance_count: int, instance_type: str, **kwargs): ...
    def run(self, inputs: List[ProcessingInput], outputs: List[ProcessingOutput], **kwargs): ...

class ScriptProcessor(Processor):
    def __init__(self, command: List[str], **kwargs): ...

# Framework processors
class PyTorchProcessor(Processor): ...
class SKLearnProcessor(Processor): ...
class PySparkProcessor(Processor): ...
class SparkJarProcessor(Processor): ...
```

[Data Processing](./data-processing.md)

### Model Monitoring

Comprehensive model monitoring including data quality, model quality, bias detection, and explainability analysis with scheduled monitoring jobs.

```python { .api }
class ModelMonitor:
    def __init__(self, role: str, **kwargs): ...
    def create_monitoring_schedule(self, **kwargs): ...

class DefaultModelMonitor(ModelMonitor): ...

class ModelBiasMonitor(ModelMonitor):
    def __init__(self, role: str, **kwargs): ...

class ModelExplainabilityMonitor(ModelMonitor):
    def __init__(self, role: str, **kwargs): ...

class DataCaptureConfig:
    def __init__(self, enable_capture: bool, sampling_percentage: int, **kwargs): ...
```

[Model Monitoring](./model-monitoring.md)

### Hyperparameter Tuning

Automated hyperparameter optimization with support for multiple search strategies, early stopping, and warm starting from previous tuning jobs.

```python { .api }
from typing import List

class HyperparameterTuner:
    def __init__(self, estimator: Estimator, objective_metric_name: str,
                 hyperparameter_ranges: dict, **kwargs): ...
    def fit(self, inputs, **kwargs): ...
    def deploy(self, initial_instance_count: int, instance_type: str, **kwargs) -> Predictor: ...

class IntegerParameter:
    def __init__(self, min_value: int, max_value: int): ...

class ContinuousParameter:
    def __init__(self, min_value: float, max_value: float): ...

class CategoricalParameter:
    def __init__(self, values: List[str]): ...
```

[Hyperparameter Tuning](./hyperparameter-tuning.md)

### Experiments and Tracking

Experiment management and tracking capabilities for organizing ML workflows, comparing runs, and tracking metrics across training jobs.

```python { .api }
from typing import List

class Experiment:
    def __init__(self, experiment_name: str, description: str = None, **kwargs): ...
    def create(self) -> dict: ...

class Run:
    def __init__(self, experiment_name: str, run_name: str = None,
                 sagemaker_session: Session = None, **kwargs): ...
    def log_parameter(self, name: str, value): ...
    def log_metric(self, name: str, value: float, step: int = None): ...

def load_run(sagemaker_session: Session = None, **kwargs) -> Run: ...
def list_runs(experiment_name: str = None, **kwargs) -> List[dict]: ...
```

[Experiments](./experiments.md)

### Debugging and Profiling

Comprehensive model debugging and performance profiling tools, including tensor analysis, system metrics collection, and framework-specific profiling.

```python { .api }
class ProfilerConfig:
    def __init__(self, s3_output_path: str = None, system_monitor_interval_millis: int = None, **kwargs): ...

class Profiler:
    def __init__(self, **kwargs): ...

class DebuggerHookConfig:
    def __init__(self, s3_output_path: str, **kwargs): ...

class Rule:
    def __init__(self, name: str, image_uri: str, **kwargs): ...

class ProfilerRule(Rule):
    def __init__(self, name: str, **kwargs): ...
```

[Debugging and Profiling](./debugging-profiling.md)

### Remote Functions

Execute Python functions remotely on SageMaker compute with automatic dependency management, data transfer, and result retrieval.

```python { .api }
# Usable bare (@remote) or with options (@remote(instance_type=...))
def remote(_func=None, *, instance_type: str = None, instance_count: int = 1,
           role: str = None, **kwargs): ...

@remote(instance_type="ml.m5.xlarge")
def remote_function(): ...

class RemoteExecutor:
    def __init__(self, *, max_parallel_jobs: int = 1, **kwargs): ...
    def submit(self, func, *args, **kwargs): ...  # returns a future; .result() fetches the return value
```

[Remote Functions](./remote-functions.md)

## Types

```python { .api }
from typing import List

# Training input configuration
class TrainingInput:
    def __init__(self, s3_data: str, s3_data_type: str = "S3Prefix", **kwargs): ...

# Processing input/output
class ProcessingInput:
    def __init__(self, source: str, destination: str, **kwargs): ...

class ProcessingOutput:
    # source: container path; destination: S3 URI the output is uploaded to
    def __init__(self, source: str, destination: str, **kwargs): ...

# Model metrics
class ModelMetrics:
    def __init__(self, model_statistics: MetricsSource = None,
                 model_constraints: MetricsSource = None, **kwargs): ...

class MetricsSource:
    def __init__(self, content_type: str, s3_uri: str): ...

# Network configuration
class NetworkConfig:
    def __init__(self, enable_network_isolation: bool = False,
                 security_group_ids: List[str] = None, **kwargs): ...

# Instance configuration
class InstanceConfig:
    def __init__(self, instance_type: str, instance_count: int = 1, **kwargs): ...
```