# SageMaker Python SDK

The SageMaker Python SDK is a Python library for training and deploying machine learning models on Amazon SageMaker. It provides high-level abstractions and APIs for the complete machine learning workflow, including data preprocessing, model training, hyperparameter tuning, batch inference, and real-time endpoint deployment, across popular frameworks such as TensorFlow, PyTorch, Scikit-learn, XGBoost, and Hugging Face.

## Package Information

- **Package Name**: sagemaker
- **Language**: Python
- **Installation**: `pip install sagemaker`
- **Documentation**: https://sagemaker.readthedocs.io/

## Core Imports

```python
import sagemaker
```

Common session and role management:

```python
from sagemaker import Session, get_execution_role
```

Training and model deployment:

```python
from sagemaker import Estimator, Model, Predictor
from sagemaker.inputs import TrainingInput
```

## Basic Usage

```python
import sagemaker
from sagemaker import Session, get_execution_role
from sagemaker.sklearn import SKLearn

# Set up SageMaker session and IAM role
sagemaker_session = Session()
role = get_execution_role()

# Create a scikit-learn estimator
sklearn_estimator = SKLearn(
    entry_point="train.py",
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    role=role,
    sagemaker_session=sagemaker_session
)

# Train the model
sklearn_estimator.fit({"training": "s3://my-bucket/train-data"})

# Deploy the model
predictor = sklearn_estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Make predictions (test_data is a placeholder for your own feature rows)
predictions = predictor.predict(test_data)

# Clean up
predictor.delete_endpoint()
```

## Architecture

The SageMaker Python SDK follows a layered architecture that abstracts AWS SageMaker complexity:

- **Session Layer**: Manages AWS credentials, regions, and service configurations
- **Estimator Layer**: High-level training interfaces for different ML frameworks
- **Model Layer**: Model deployment and management abstractions
- **Predictor Layer**: Real-time and batch inference clients
- **Processing Layer**: Data preprocessing and feature engineering jobs
- **Pipeline Layer**: End-to-end ML workflow orchestration

This design enables developers to focus on ML logic while the SDK handles AWS service integration, resource management, and deployment complexities.
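
One way to make the layering concrete is to map each layer to the SDK module that is assumed to host its main class. The sketch below is illustrative only: `LAYER_MODULES` and `available_layers` are hypothetical names that are not part of the SDK, and the check merely reports whether each module can be located in the current environment.

```python
import importlib.util

# Layer name -> module assumed to host that layer's main class.
LAYER_MODULES = {
    "Session": "sagemaker.session",
    "Estimator": "sagemaker.estimator",
    "Model": "sagemaker.model",
    "Predictor": "sagemaker.predictor",
    "Processing": "sagemaker.processing",
    "Pipeline": "sagemaker.workflow.pipeline",
}


def available_layers():
    """Report, per layer, whether its module can be found (no AWS calls are made)."""
    found = {}
    for layer, module in LAYER_MODULES.items():
        try:
            found[layer] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when the parent package `sagemaker` is not installed.
            found[layer] = False
    return found
```

Calling `available_layers()` in an environment without the SDK should simply report every layer as unavailable rather than fail.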

## Capabilities

### Core Training and Model Management

Fundamental classes for training models and managing deployments including estimators, models, predictors, and session management. These form the foundation of the SageMaker workflow.

```python { .api }
class Estimator:
    def __init__(self, image_uri: str, role: str = None, instance_count: int = None,
                 instance_type: str = None, keep_alive_period_in_seconds: int = None,
                 volume_size: int = 30, max_run: int = 24*60*60, input_mode: str = "File",
                 output_path: str = None, base_job_name: str = None,
                 sagemaker_session: Session = None, hyperparameters: dict = None,
                 tags: list = None, subnets: list = None, security_group_ids: list = None,
                 **kwargs): ...
    def fit(self, inputs, wait: bool = True, logs: str = "All", job_name: str = None,
            experiment_config: dict = None): ...
    def deploy(self, initial_instance_count: int, instance_type: str, **kwargs) -> Predictor: ...

class Model:
    def __init__(self, image_uri: str = None, model_data: str = None, role: str = None,
                 predictor_cls: callable = None, env: dict = None, name: str = None,
                 vpc_config: dict = None, sagemaker_session: Session = None,
                 enable_network_isolation: bool = None, model_kms_key: str = None,
                 image_config: dict = None, source_dir: str = None, code_location: str = None,
                 entry_point: str = None, container_log_level: int = logging.INFO,
                 dependencies: list = None, git_config: dict = None, **kwargs): ...
    def deploy(self, initial_instance_count: int, instance_type: str, **kwargs) -> Predictor: ...

class Predictor:
    def predict(self, data, **kwargs): ...
    def delete_endpoint(self): ...

class Session:
    def __init__(self, boto_session=None, sagemaker_client=None, sagemaker_runtime_client=None,
                 sagemaker_featurestore_runtime_client=None, default_bucket: str = None,
                 settings=None, sagemaker_metrics_client=None, sagemaker_config: dict = None,
                 default_bucket_prefix: str = None): ...
    # bucket falls back to the session's default bucket when omitted
    def upload_data(self, path: str, bucket: str = None, key_prefix: str = "data") -> str: ...

def get_execution_role(sagemaker_session: Session = None, use_default: bool = False) -> str: ...
```

[Core Training and Models](./core-training.md)
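
As a rough illustration of how the generic `Estimator` and `TrainingInput` fit together, the sketch below trains with a bring-your-own container. The helper names `s3_uri` and `train_with_custom_image`, the `models` and `data/train` prefixes, the `epochs` hyperparameter, and the instance type are all assumptions made for the example; the SDK import is deferred so the module loads even where the SDK is not installed.

```python
def s3_uri(bucket, *parts):
    """Join a bucket and key parts into an s3:// URI."""
    key = "/".join(p.strip("/") for p in parts if p)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


def train_with_custom_image(image_uri, role, bucket):
    """Train with a bring-your-own container; requires AWS credentials."""
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.m5.large",
        output_path=s3_uri(bucket, "models"),
        hyperparameters={"epochs": 10},  # hypothetical hyperparameter
    )
    train_input = TrainingInput(s3_uri(bucket, "data", "train"), content_type="text/csv")
    estimator.fit({"training": train_input})
    return estimator
```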

### Framework-Specific Training

Support for popular ML frameworks including PyTorch, TensorFlow, Scikit-learn, XGBoost, Hugging Face, and MXNet. Each framework provides optimized containers and training configurations.

```python { .api }
# PyTorch
class PyTorch(Estimator):
    def __init__(self, entry_point: str, framework_version: str, py_version: str, **kwargs): ...

# TensorFlow
class TensorFlow(Estimator):
    def __init__(self, entry_point: str, framework_version: str, py_version: str, **kwargs): ...

# Scikit-learn
class SKLearn(Estimator):
    def __init__(self, entry_point: str, framework_version: str, **kwargs): ...

# XGBoost
class XGBoost(Estimator):
    def __init__(self, entry_point: str, framework_version: str, **kwargs): ...

# Hugging Face
class HuggingFace(Estimator):
    def __init__(self, entry_point: str, transformers_version: str, pytorch_version: str, **kwargs): ...
```

[Framework Training](./framework-training.md)
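
Because the framework estimators share the same calling pattern, a small registry can select one by name. This is a sketch under assumptions: `FRAMEWORKS` and `make_estimator` are hypothetical helpers, and the module paths reflect the SDK's package layout as commonly imported (for example `from sagemaker.pytorch import PyTorch`).

```python
import importlib

# Framework key -> (module, class); module paths follow the SDK's package layout.
FRAMEWORKS = {
    "pytorch": ("sagemaker.pytorch", "PyTorch"),
    "tensorflow": ("sagemaker.tensorflow", "TensorFlow"),
    "sklearn": ("sagemaker.sklearn", "SKLearn"),
    "xgboost": ("sagemaker.xgboost", "XGBoost"),
    "huggingface": ("sagemaker.huggingface", "HuggingFace"),
}


def make_estimator(framework, **estimator_kwargs):
    """Instantiate the estimator class registered for `framework`."""
    try:
        module_name, class_name = FRAMEWORKS[framework.lower()]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    # Imported lazily so unknown names fail fast without needing the SDK.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(**estimator_kwargs)
```

A call such as `make_estimator("sklearn", entry_point="train.py", framework_version="1.2-1", instance_type="ml.m5.large", role=role)` would then mirror the Basic Usage example; the version arguments must name a combination for which a container image exists.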

### Amazon Built-in Algorithms

Pre-built, optimized algorithms for common ML tasks including clustering, dimensionality reduction, classification, regression, and anomaly detection.

```python { .api }
# Clustering
class KMeans(Estimator):
    def __init__(self, role: str, instance_count: int, instance_type: str, k: int, **kwargs): ...

# Dimensionality Reduction
class PCA(Estimator):
    def __init__(self, role: str, instance_count: int, instance_type: str, num_components: int, **kwargs): ...

# Classification/Regression
class LinearLearner(Estimator):
    def __init__(self, role: str, instance_count: int, instance_type: str, **kwargs): ...

# Anomaly Detection
class RandomCutForest(Estimator):
    def __init__(self, role: str, instance_count: int, instance_type: str, **kwargs): ...
```

[Amazon Algorithms](./amazon-algorithms.md)

### AutoML

Automated machine learning capabilities for tabular data, image classification, text classification, and time series forecasting with minimal configuration required.

```python { .api }
# AutoML v1
class AutoML:
    def __init__(self, role: str = None, target_attribute_name: str = None,
                 output_kms_key: str = None, output_path: str = None,
                 base_job_name: str = None, compression_type: str = None,
                 sagemaker_session: Session = None, volume_kms_key: str = None,
                 encrypt_inter_container_traffic: bool = None,
                 vpc_config: dict = None, problem_type: str = None,
                 max_candidates: int = None, **kwargs): ...
    def fit(self, inputs, wait: bool = True, logs: bool = True,
            job_name: str = None): ...

class AutoMLInput:
    def __init__(self, inputs, target_attribute_name: str, compression: str = None,
                 channel_type: str = None, content_type: str = None,
                 s3_data_type: str = None, sample_weight_attribute_name: str = None): ...

# AutoML v2
class AutoMLV2:
    def __init__(self, role: str = None, output_kms_key: str = None,
                 output_path: str = None, base_job_name: str = None,
                 sagemaker_session: Session = None, volume_kms_key: str = None,
                 encrypt_inter_container_traffic: bool = None, **kwargs): ...
    def fit(self, inputs, wait: bool = True, logs: bool = True,
            job_name: str = None): ...

class AutoMLDataChannel:
    def __init__(self, s3_data_source: str, target_attribute_name: str = None,
                 channel_type: str = None, content_type: str = None,
                 compression_type: str = None, sample_weight_attribute_name: str = None): ...

# Configuration classes
class AutoMLTabularConfig:
    def __init__(self, target_attribute_name: str, problem_type: str = None,
                 job_objective: dict = None, **kwargs): ...

class AutoMLTimeSeriesForecastingConfig:
    def __init__(self, forecast_frequency: str, forecast_horizon: int,
                 forecast_quantiles: list = None, **kwargs): ...
```

[AutoML](./automl.md)

### Model Serving and Inference

Comprehensive model deployment options including real-time endpoints, batch transform, serverless inference, and multi-model endpoints with custom serialization support.

```python { .api }
# Model deployment
class ModelBuilder:
    def __init__(self, **kwargs): ...
    def build(self, mode: Mode, role: str, sagemaker_session: Session) -> Model: ...

# Inference specification
class InferenceSpec:
    def load(self, model_dir: str): ...
    def invoke(self, input_object, model): ...

# Serializers
class JSONSerializer(BaseSerializer):
    def serialize(self, data) -> bytes: ...

class CSVSerializer(BaseSerializer):
    def serialize(self, data) -> bytes: ...

# Deserializers
class JSONDeserializer(BaseDeserializer):
    def deserialize(self, stream, content_type: str): ...
```

[Model Serving](./model-serving.md)
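
Conceptually, a serializer turns request data into bytes for a given content type, and a deserializer reverses this for the response stream. The standard-library sketch below mimics that round trip; the function names are hypothetical and the code is not the SDK's implementation, only an approximation of the idea.

```python
import csv
import io
import json


def to_json_payload(data):
    """Encode a Python object as UTF-8 JSON bytes (content type application/json)."""
    return json.dumps(data).encode("utf-8")


def to_csv_payload(rows):
    """Encode rows of values as CSV bytes (content type text/csv)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n").encode("utf-8")


def from_json_stream(stream):
    """Decode a JSON response body read from a binary file-like object."""
    return json.loads(stream.read().decode("utf-8"))
```

With the SDK itself, one would typically pass `serializer=JSONSerializer()` and `deserializer=JSONDeserializer()` (from `sagemaker.serializers` and `sagemaker.deserializers`) to `deploy` instead of hand-rolling the encoding.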

### Data Processing

Data preprocessing capabilities including built-in processing containers, custom processing jobs, and Spark integration for large-scale data transformation.

```python { .api }
class Processor:
    def __init__(self, role: str, image_uri: str, instance_count: int, instance_type: str, **kwargs): ...
    def run(self, inputs: List[ProcessingInput], outputs: List[ProcessingOutput], **kwargs): ...

class ScriptProcessor(Processor):
    def __init__(self, command: List[str], **kwargs): ...

# Framework processors
class PyTorchProcessor(Processor): ...
class SKLearnProcessor(Processor): ...
class SparkMLProcessor(Processor): ...
```

[Data Processing](./data-processing.md)
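
A processing job copies S3 inputs into the container, runs a script, and uploads whatever the script writes to the output directory. The sketch below assumes a hypothetical `preprocess.py` script, a caller-supplied container image, and the conventional `/opt/ml/processing` mount point; `container_path` and `run_preprocessing` are illustrative helpers, and the `destination` keyword for `ProcessingOutput` follows the SDK's published signature.

```python
PROCESSING_ROOT = "/opt/ml/processing"  # conventional mount point inside processing containers


def container_path(name):
    """Local path inside the container for a named input or output."""
    return f"{PROCESSING_ROOT}/{name.strip('/')}"


def run_preprocessing(role, image_uri, input_s3_uri, output_s3_uri):
    """Run a hypothetical preprocess.py as a processing job; requires AWS credentials."""
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

    processor = ScriptProcessor(
        role=role,
        image_uri=image_uri,
        command=["python3"],
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )
    processor.run(
        code="preprocess.py",
        inputs=[ProcessingInput(source=input_s3_uri, destination=container_path("input"))],
        outputs=[ProcessingOutput(source=container_path("output"), destination=output_s3_uri)],
    )
    return processor
```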

### Model Monitoring

Comprehensive model monitoring including data quality, model quality, bias detection, and explainability analysis with scheduled monitoring jobs.

```python { .api }
class ModelMonitor:
    def __init__(self, role: str, **kwargs): ...
    def create_monitoring_schedule(self, **kwargs): ...

class DefaultModelMonitor(ModelMonitor): ...

class ModelBiasMonitor(ModelMonitor):
    def __init__(self, role: str, **kwargs): ...

class ModelExplainabilityMonitor(ModelMonitor):
    def __init__(self, role: str, **kwargs): ...

class DataCaptureConfig:
    def __init__(self, enable_capture: bool, sampling_percentage: int, **kwargs): ...
```

[Model Monitoring](./model-monitoring.md)
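
Monitoring generally starts with capturing endpoint traffic, which `DataCaptureConfig` enables at deploy time. The following is a minimal sketch: `capture_settings` and `deploy_with_capture` are hypothetical helpers, and the `datacapture` prefix, the 20 percent sampling rate, and the instance type are assumptions chosen for illustration.

```python
def capture_settings(bucket, prefix="datacapture", sampling_percentage=20):
    """Validate and assemble keyword arguments for DataCaptureConfig."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "enable_capture": True,
        "sampling_percentage": sampling_percentage,
        "destination_s3_uri": f"s3://{bucket}/{prefix}",
    }


def deploy_with_capture(model, bucket):
    """Deploy `model` with request/response capture enabled; requires AWS credentials."""
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.model_monitor import DataCaptureConfig

    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",
        data_capture_config=DataCaptureConfig(**capture_settings(bucket)),
    )
```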

### Hyperparameter Tuning

Automated hyperparameter optimization with support for multiple search strategies, early stopping, and warm starting from previous tuning jobs.

```python { .api }
class HyperparameterTuner:
    def __init__(self, estimator: Estimator, objective_metric_name: str,
                 hyperparameter_ranges: dict, **kwargs): ...
    def fit(self, inputs, **kwargs): ...
    def deploy(self, initial_instance_count: int, instance_type: str, **kwargs) -> Predictor: ...

class IntegerParameter:
    def __init__(self, min_value: int, max_value: int): ...

class ContinuousParameter:
    def __init__(self, min_value: float, max_value: float): ...

class CategoricalParameter:
    def __init__(self, values: List[str]): ...
```

[Hyperparameter Tuning](./hyperparameter-tuning.md)
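
To show how the parameter classes feed `HyperparameterTuner`, the sketch below declares a search space as plain data, validates it, and only then builds SDK objects. The search space, the `validation:accuracy` metric name, its regex, and the job limits are assumptions for illustration; `validate_space` and `build_tuner` are hypothetical helpers.

```python
# Hypothetical search space: name -> (kind, *arguments).
SEARCH_SPACE = {
    "learning_rate": ("continuous", 1e-4, 1e-1),
    "max_depth": ("integer", 2, 10),
    "optimizer": ("categorical", ["sgd", "adam"]),
}


def validate_space(space):
    """Check each entry before any SDK object is created."""
    for name, (kind, *args) in space.items():
        if kind in ("continuous", "integer"):
            low, high = args
            if low >= high:
                raise ValueError(f"{name}: min_value must be below max_value")
        elif kind == "categorical":
            if not args or not args[0]:
                raise ValueError(f"{name}: needs at least one value")
        else:
            raise ValueError(f"{name}: unknown kind {kind!r}")
    return True


def build_tuner(estimator, space=SEARCH_SPACE):
    """Wrap `estimator` in a HyperparameterTuner; metric name and regex are assumptions."""
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.tuner import (CategoricalParameter, ContinuousParameter,
                                 HyperparameterTuner, IntegerParameter)

    kinds = {"continuous": ContinuousParameter, "integer": IntegerParameter,
             "categorical": CategoricalParameter}
    validate_space(space)
    ranges = {name: kinds[kind](*args) for name, (kind, *args) in space.items()}
    return HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:accuracy",
        hyperparameter_ranges=ranges,
        metric_definitions=[{"Name": "validation:accuracy", "Regex": "val_acc=([0-9\\.]+)"}],
        objective_type="Maximize",
        max_jobs=10,
        max_parallel_jobs=2,
    )
```

Calling `build_tuner(estimator).fit(inputs)` would then launch the tuning job with the same `inputs` the estimator's own `fit` accepts.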

### Experiments and Tracking

Experiment management and tracking capabilities for organizing ML workflows, comparing runs, and tracking metrics across training jobs.

```python { .api }
class Experiment:
    def __init__(self, experiment_name: str, description: str = None, **kwargs): ...
    def create(self) -> dict: ...

class Run:
    def __init__(self, experiment_name: str, sagemaker_session: Session = None): ...
    def log_parameter(self, name: str, value): ...
    def log_metric(self, name: str, value: float, step: int = None): ...

def load_run(sagemaker_session: Session = None, **kwargs) -> Run: ...
def list_runs(experiment_name: str = None, **kwargs) -> List[dict]: ...
```

[Experiments](./experiments.md)
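
A run is typically used as a context manager that records parameters once and metrics per step. The sketch below assumes a hypothetical `train:loss` metric name; `metric_points` and `log_training_run` are illustrative helpers rather than SDK functions.

```python
def metric_points(values, start_step=0):
    """Pair each metric value with its step index for logging."""
    return [(start_step + i, float(v)) for i, v in enumerate(values)]


def log_training_run(experiment_name, params, losses, session=None):
    """Record parameters and a per-epoch loss curve; requires AWS credentials."""
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.experiments.run import Run

    with Run(experiment_name=experiment_name, sagemaker_session=session) as run:
        for name, value in params.items():
            run.log_parameter(name, value)
        for step, loss in metric_points(losses):
            run.log_metric(name="train:loss", value=loss, step=step)
```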

### Debugging and Profiling

Comprehensive model debugging and performance profiling tools including tensor analysis, system metrics collection, and framework-specific profiling.

```python { .api }
class ProfilerConfig:
    def __init__(self, s3_output_path: str = None, profiling_interval_millis: int = None, **kwargs): ...

class Profiler:
    def __init__(self, **kwargs): ...

class DebuggerHookConfig:
    def __init__(self, s3_output_path: str, **kwargs): ...

class Rule:
    def __init__(self, name: str, image_uri: str, **kwargs): ...

class ProfilerRule(Rule):
    def __init__(self, name: str, **kwargs): ...
```

[Debugging and Profiling](./debugging-profiling.md)

### Remote Functions

Execute Python functions remotely on SageMaker compute with automatic dependency management, data transfer, and result retrieval.

```python { .api }
# Decorator: apply as @remote(instance_type=..., ...) above a function definition
def remote(_func=None, *, instance_type: str = None, instance_count: int = 1,
           role: str = None, **kwargs): ...

class RemoteExecutor:
    def __init__(self, **kwargs): ...
    def submit(self, func, *args, **kwargs): ...
```

[Remote Functions](./remote-functions.md)
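
The decorator can also be applied at call time, which keeps the function usable locally. In this sketch `divide` and `run_on_sagemaker` are hypothetical names and the instance type is an assumption; the role is expected to be resolvable from the environment when it is not passed explicitly.

```python
def divide(a, b):
    """Plain Python function; it runs locally unless wrapped by `remote`."""
    return a / b


def run_on_sagemaker(func, *args, instance_type="ml.m5.large", **kwargs):
    """Execute `func` as a SageMaker job and return its result; requires AWS credentials."""
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.remote_function import remote

    # remote(...) returns a decorator; applying it here is equivalent to
    # writing @remote(instance_type=...) above the function definition.
    return remote(instance_type=instance_type)(func)(*args, **kwargs)
```

`divide(10, 2)` runs in the local interpreter, while `run_on_sagemaker(divide, 10, 2)` would submit the same call as a job.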

## Types

```python { .api }
# Training input configuration
class TrainingInput:
    def __init__(self, s3_data: str, s3_data_type: str = "S3Prefix", **kwargs): ...

# Processing input/output
class ProcessingInput:
    def __init__(self, source: str, destination: str, **kwargs): ...

class ProcessingOutput:
    # destination is the S3 URI that the container-local source path is uploaded to
    def __init__(self, source: str, destination: str, **kwargs): ...

# Model metrics
class ModelMetrics:
    def __init__(self, model_statistics: MetricsSource = None,
                 model_constraints: MetricsSource = None, **kwargs): ...

class MetricsSource:
    def __init__(self, s3_uri: str, content_type: str): ...

# Network configuration
class NetworkConfig:
    def __init__(self, enable_network_isolation: bool = False,
                 security_group_ids: List[str] = None, **kwargs): ...

# Instance configuration
class InstanceConfig:
    def __init__(self, instance_type: str, instance_count: int = 1, **kwargs): ...
```