# Amazon Built-in Algorithms

Pre-built, optimized machine learning algorithms provided by Amazon SageMaker for common ML tasks including clustering, dimensionality reduction, classification, regression, and anomaly detection. These algorithms are optimized for performance and scalability on SageMaker infrastructure.

## Capabilities

### K-Means Clustering

Unsupervised learning algorithm for clustering data into k groups based on feature similarity.

```python { .api }
class KMeans(Estimator):
    """
    K-means clustering algorithm estimator.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - k (int): Number of clusters
    - init_method (str, optional): Initialization method ("random", "kmeans++")
    - local_init_method (str, optional): Local initialization method
    - distance_metric (str, optional): Distance metric ("squared_euclidean")
    - mini_batch_size (int, optional): Mini-batch size for mini-batch k-means
    """
    def __init__(self, role: str, instance_count: int, instance_type: str, k: int,
                 init_method: str = "random", **kwargs): ...

class KMeansModel(Model):
    """
    K-means model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class KMeansPredictor(Predictor):
    """
    K-means predictor for cluster assignment.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Predict cluster assignments for input data.

        Parameters:
        - data: Input data for clustering

        Returns:
        list: Cluster assignments and distances
        """
```

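To make the "cluster assignments and distances" returned by the predictor more concrete, here is a minimal pure-Python sketch of the nearest-centroid step using squared Euclidean distance. The names `squared_euclidean` and `assign_clusters` and the sample centroids are illustrative assumptions, not part of the SageMaker SDK, and the hosted endpoint's actual response format may differ.

```python
from typing import List, Sequence, Tuple


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared coordinate differences between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (closest centroid index, squared distance) for each point."""
    results = []
    for p in points:
        distances = [squared_euclidean(p, c) for c in centroids]
        best = min(range(len(centroids)), key=distances.__getitem__)
        results.append((best, distances[best]))
    return results


# Hypothetical centroids and points, for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 11.0)]
assignments = assign_clusters(points, centroids)
```

Training alternates this assignment step with recomputing each centroid as the mean of its assigned points; only the assignment step is needed at inference time.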
### Principal Component Analysis (PCA)

Dimensionality reduction algorithm that transforms data to a lower-dimensional space while preserving as much variance as possible.

```python { .api }
class PCA(Estimator):
    """
    Principal Component Analysis estimator.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_components (int): Number of principal components
    - algorithm_mode (str, optional): Algorithm mode ("regular", "randomized")
    - subtract_mean (bool, optional): Whether to subtract mean
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_components: int, algorithm_mode: str = "regular", **kwargs): ...

class PCAModel(Model):
    """
    PCA model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class PCAPredictor(Predictor):
    """
    PCA predictor for dimensionality reduction.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Transform data to principal component space.

        Parameters:
        - data: Input data for transformation

        Returns:
        list: Transformed data in PC space
        """
```

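As a rough illustration of what mean subtraction and projection onto a principal component involve, the sketch below estimates the top component with power iteration in pure Python. It is a conceptual example only: `fit_first_component` and `transform` are hypothetical helpers, and SageMaker's "regular" and "randomized" modes use their own implementations.

```python
import math
from typing import List, Tuple


def fit_first_component(
    rows: List[List[float]], iterations: int = 100
) -> Tuple[List[float], List[float]]:
    """Return (feature means, top principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center the data by subtracting the per-feature mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); requires at least two rows.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Arbitrary non-zero start vector; it must not be orthogonal to the answer.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current direction
        v = [x / norm for x in w]
    return means, v


def transform(rows: List[List[float]], means: List[float], component: List[float]) -> List[float]:
    """Project each mean-centered row onto the component (one score per row)."""
    return [
        sum((r[j] - means[j]) * component[j] for j in range(len(means)))
        for r in rows
    ]


# Toy data lying on the line y = x, for illustration only.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
means, component = fit_first_component(data)
scores = transform(data, means, component)
```

With `num_components` greater than one, the same projection is repeated for each additional orthogonal direction.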
### Linear Learner

Linear algorithm for classification and regression with support for multiple loss functions and regularization.

```python { .api }
class LinearLearner(Estimator):
    """
    Linear learning algorithm for classification and regression.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - predictor_type (str, optional): Predictor type ("binary_classifier", "multiclass_classifier", "regressor")
    - binary_classifier_model_selection_criteria (str, optional): Model selection criteria
    - target_recall (float, optional): Target recall for precision-recall optimization
    - target_precision (float, optional): Target precision for precision-recall optimization
    - positive_example_weight_mult (float, optional): Weight multiplier for positive examples
    - epochs (int, optional): Number of training epochs
    - use_bias (bool, optional): Whether to use bias term
    - num_models (int, optional): Number of parallel models to train
    - num_calibration_samples (int, optional): Number of samples for calibration
    - init_method (str, optional): Weight initialization method
    - init_scale (float, optional): Scale for weight initialization
    - init_sigma (float, optional): Standard deviation for weight initialization
    - init_bias (float, optional): Initial bias value
    - optimizer (str, optional): Optimization algorithm ("sgd", "adam", "rmsprop")
    - loss (str, optional): Loss function
    - wd (float, optional): Weight decay regularization
    - l1 (float, optional): L1 regularization
    - momentum (float, optional): Momentum for SGD
    - learning_rate (float, optional): Learning rate
    - beta_1 (float, optional): Beta1 parameter for Adam
    - beta_2 (float, optional): Beta2 parameter for Adam
    - bias_lr_mult (float, optional): Learning rate multiplier for bias
    - bias_wd_mult (float, optional): Weight decay multiplier for bias
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 predictor_type: str = "binary_classifier", **kwargs): ...

class LinearLearnerModel(Model):
    """
    Linear learner model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class LinearLearnerPredictor(Predictor):
    """
    Linear learner predictor for classification and regression.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Make predictions using linear model.

        Parameters:
        - data: Input features for prediction

        Returns:
        list: Predictions and confidence scores
        """
```

### Factorization Machines

Algorithm for sparse data problems that learns feature interactions automatically.

```python { .api }
class FactorizationMachines(Estimator):
    """
    Factorization Machines algorithm for sparse data.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - predictor_type (str, optional): Predictor type ("binary_classifier", "regressor")
    - num_factors (int, optional): Number of factorization factors
    - bias_lr (float, optional): Learning rate for bias term
    - linear_lr (float, optional): Learning rate for linear term
    - factors_lr (float, optional): Learning rate for factorization factors
    - bias_wd (float, optional): Weight decay for bias
    - linear_wd (float, optional): Weight decay for linear term
    - factors_wd (float, optional): Weight decay for factors
    - bias_init_method (str, optional): Bias initialization method
    - bias_init_scale (float, optional): Bias initialization scale
    - linear_init_method (str, optional): Linear term initialization method
    - linear_init_scale (float, optional): Linear term initialization scale
    - factors_init_method (str, optional): Factors initialization method
    - factors_init_scale (float, optional): Factors initialization scale
    - epochs (int, optional): Number of training epochs
    - clip_gradient (float, optional): Gradient clipping threshold
    - eps (float, optional): Epsilon for numerical stability
    - rescale_grad (float, optional): Gradient rescaling factor
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 predictor_type: str = "binary_classifier", **kwargs): ...

class FactorizationMachinesModel(Model):
    """
    Factorization Machines model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class FactorizationMachinesPredictor(Predictor):
    """
    Factorization Machines predictor.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Make predictions using factorization machines.

        Parameters:
        - data: Sparse input features

        Returns:
        list: Predictions
        """
```

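The learned feature interactions can be written out directly. The sketch below scores one sparse example with a second-order factorization machine, using the standard identity that reduces the pairwise term to time linear in the number of non-zero features. `fm_score` and the toy weights are hypothetical and only illustrate the model form; they are not SageMaker's implementation or API.

```python
from typing import Dict, List


def fm_score(x: Dict[int, float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """Second-order factorization machine score for one sparse example.

    x maps feature index -> value (non-zero entries only), w holds the
    linear weights, and v[i] is the factor vector of feature i.
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    num_factors = len(v[0])
    pairwise = 0.0
    for f in range(num_factors):
        # 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2) equals the sum
        # over pairs i < j of v_if * v_jf * x_i * x_j for this factor.
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Toy parameters for three features and two factors, for illustration only.
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = {0: 1.0, 2: 2.0}  # features 0 and 2 are active
score = fm_score(x, w0, w, v)
```

In this toy case the only interaction is between features 0 and 2, contributing their factor dot product times `x[0] * x[2]`. A regressor would use such a score directly, while a binary classifier would typically map it through a sigmoid.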
### Random Cut Forest

Unsupervised algorithm for anomaly detection that identifies outliers in data.

```python { .api }
class RandomCutForest(Estimator):
    """
    Random Cut Forest algorithm for anomaly detection.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_samples_per_tree (int, optional): Number of samples per tree
    - num_trees (int, optional): Number of trees in the forest
    - feature_dim (int, optional): Feature dimension
    - eval_metrics (list, optional): Evaluation metrics
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_samples_per_tree: int = None, **kwargs): ...

class RandomCutForestModel(Model):
    """
    Random Cut Forest model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class RandomCutForestPredictor(Predictor):
    """
    Random Cut Forest predictor for anomaly detection.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Detect anomalies in input data.

        Parameters:
        - data: Input data for anomaly detection

        Returns:
        list: Anomaly scores
        """
```

### Latent Dirichlet Allocation (LDA)

Topic modeling algorithm for discovering latent topics in document collections.

```python { .api }
class LDA(Estimator):
    """
    Latent Dirichlet Allocation for topic modeling.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_topics (int): Number of topics to discover
    - alpha0 (float, optional): Concentration parameter for document-topic distribution
    - max_restarts (int, optional): Maximum number of restarts
    - max_iterations (int, optional): Maximum number of iterations
    - tol (float, optional): Tolerance for convergence
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_topics: int, **kwargs): ...

class LDAModel(Model):
    """
    LDA model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class LDAPredictor(Predictor):
    """
    LDA predictor for topic inference.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Infer topic distributions for documents.

        Parameters:
        - data: Document data for topic inference

        Returns:
        list: Topic distributions
        """
```

### Neural Topic Model (NTM)

Neural network-based topic modeling algorithm for learning topic representations.

```python { .api }
class NTM(Estimator):
    """
    Neural Topic Model for topic modeling with neural networks.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_topics (int): Number of topics
    - feature_dim (int): Feature dimension (vocabulary size)
    - mini_batch_size (int, optional): Mini-batch size
    - epochs (int, optional): Number of training epochs
    - num_patience_epochs (int, optional): Early stopping patience
    - tolerance (float, optional): Tolerance for early stopping
    - learning_rate (float, optional): Learning rate
    - batch_norm (bool, optional): Use batch normalization
    - clip_gradient (float, optional): Gradient clipping threshold
    - weight_decay (float, optional): Weight decay regularization
    - latent_dim (int, optional): Latent dimension size
    - encoder_layers (str, optional): Encoder layer configuration
    - encoder_layers_activation (str, optional): Encoder activation function
    - optimizer (str, optional): Optimizer algorithm
    - rescale_gradient (float, optional): Gradient rescaling factor
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_topics: int, feature_dim: int, **kwargs): ...

class NTMModel(Model):
    """
    NTM model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class NTMPredictor(Predictor):
    """
    NTM predictor for topic inference.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Infer topic distributions using neural topic model.

        Parameters:
        - data: Document features for topic inference

        Returns:
        list: Topic distributions
        """
```

### K-Nearest Neighbors (KNN)

Non-parametric algorithm for classification and regression based on k nearest neighbors.

```python { .api }
class KNN(Estimator):
    """
    K-Nearest Neighbors algorithm for classification and regression.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - k (int): Number of nearest neighbors
    - predictor_type (str): Predictor type ("classifier", "regressor")
    - sample_size (int, optional): Training sample size
    - dimension_reduction_target (int, optional): Target dimensions after reduction
    - dimension_reduction_type (str, optional): Dimension reduction method ("sign", "fjlt")
    - index_metric (str, optional): Distance metric ("COSINE", "INNER_PRODUCT", "L2")
    - index_type (str, optional): Index type ("faiss.Flat", "faiss.IVFFlat", "faiss.IVFPQ")
    - faiss_index_ivf_nlists (int, optional): Number of inverted lists for IVF
    - faiss_index_pq_m (int, optional): Number of sub-quantizers for PQ
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 k: int, predictor_type: str, **kwargs): ...

class KNNModel(Model):
    """
    KNN model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class KNNPredictor(Predictor):
    """
    KNN predictor for classification and regression.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Make predictions using k-nearest neighbors.

        Parameters:
        - data: Input features for prediction

        Returns:
        list: Predictions and neighbor information
        """
```

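To show how `k` and `predictor_type` shape a prediction, the sketch below implements brute-force k-nearest neighbors with L2 distance in pure Python: a majority vote for a classifier and the mean label for a regressor. `knn_predict` and the toy data are hypothetical. The SageMaker algorithm instead builds a FAISS index and may sample or reduce dimensions first, so this is a conceptual illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(
    train: List[Tuple[Sequence[float], object]],
    query: Sequence[float],
    k: int,
    predictor_type: str = "classifier",
):
    """Predict a label for query from its k nearest training examples."""
    # Sort training examples by L2 distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in neighbors]
    if predictor_type == "regressor":
        return sum(labels) / len(labels)  # mean of the neighbor labels
    return Counter(labels).most_common(1)[0][0]  # majority vote


# Toy labeled points, for illustration only.
class_train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((1, 0), "a")]
predicted_class = knn_predict(class_train, (0.5, 0.5), k=3)

reg_train = [((0.0,), 1.0), ((1.0,), 3.0), ((10.0,), 100.0)]
predicted_value = knn_predict(reg_train, (0.4,), k=2, predictor_type="regressor")
```

Approximate index types such as IVF and PQ trade some of this exactness for faster lookups on large training sets.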
### Object2Vec

Algorithm for learning embeddings of objects such as sentences, customers, or products.

```python { .api }
class Object2Vec(Estimator):
    """
    Object2Vec algorithm for learning object embeddings.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - enc_dim (int): Encoder output dimension
    - mini_batch_size (int, optional): Mini-batch size
    - epochs (int, optional): Number of training epochs
    - early_stopping (bool, optional): Enable early stopping
    - patience (int, optional): Early stopping patience
    - tolerance (float, optional): Early stopping tolerance
    - dropout (float, optional): Dropout probability
    - weight_decay (float, optional): Weight decay regularization
    - bucket_width (int, optional): Bucket width for sequence padding
    - num_classes (int, optional): Number of classes for classification
    - mlp_layers (int, optional): Number of MLP layers
    - mlp_dim (int, optional): MLP layer dimension
    - mlp_activation (str, optional): MLP activation function
    - output_layer (str, optional): Output layer type
    - optimizer (str, optional): Optimizer algorithm
    - learning_rate (float, optional): Learning rate
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 enc_dim: int, **kwargs): ...

class Object2VecModel(Model):
    """
    Object2Vec model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...
```

### IP Insights

Unsupervised algorithm for learning usage patterns of IP addresses.

```python { .api }
class IPInsights(Estimator):
    """
    IP Insights algorithm for learning IP address usage patterns.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_entity_vectors (int): Number of entity vectors
    - vector_dim (int): Vector dimension
    - epochs (int, optional): Number of training epochs
    - learning_rate (float, optional): Learning rate
    - num_ip_encoder_layers (int, optional): Number of IP encoder layers
    - random_negative_sampling_rate (int, optional): Negative sampling rate
    - shuffled_negative_sampling_rate (int, optional): Shuffled negative sampling rate
    - weight_decay (float, optional): Weight decay regularization
    - batch_metrics_publish_interval (int, optional): Batch metrics publish interval
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_entity_vectors: int, vector_dim: int, **kwargs): ...

class IPInsightsModel(Model):
    """
    IP Insights model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class IPInsightsPredictor(Predictor):
    """
    IP Insights predictor for anomaly detection.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Detect anomalous IP address usage patterns.

        Parameters:
        - data: IP address and entity pairs

        Returns:
        list: Anomaly scores
        """
```

## Usage Examples

### K-Means Clustering

```python
from sagemaker.amazon.kmeans import KMeans

# Create K-means estimator (role is assumed to hold an IAM role ARN)
kmeans = KMeans(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    k=10,
    init_method="kmeans++",
    max_iterations=100
)

# Train the model. Built-in algorithm estimators train on RecordSet objects
# rather than raw S3 URIs; record_set() uploads train_data (assumed to be a
# 2-D NumPy float32 array) to S3 in RecordIO-protobuf format.
kmeans.fit(kmeans.record_set(train_data))

# Deploy for inference
kmeans_predictor = kmeans.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Make predictions
cluster_assignments = kmeans_predictor.predict(test_data)
```

### Linear Learner for Classification

```python
from sagemaker.amazon.linear_learner import LinearLearner

# Create linear learner estimator (role is assumed to hold an IAM role ARN)
linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    predictor_type="binary_classifier",
    num_models=32,
    use_bias=True,
    optimizer="adam",
    learning_rate=0.001
)

# Train the model. Built-in algorithm estimators train on RecordSet objects;
# train_features and train_labels are assumed to be NumPy float32 arrays.
linear.fit(linear.record_set(train_features, labels=train_labels))

# Deploy for inference
linear_predictor = linear.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Make predictions
predictions = linear_predictor.predict(test_data)
```

### Random Cut Forest for Anomaly Detection

```python
from sagemaker.amazon.randomcutforest import RandomCutForest

# Create Random Cut Forest estimator (role is assumed to hold an IAM role ARN)
rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    num_samples_per_tree=512,
    num_trees=50
)

# Train the model (unsupervised). Built-in algorithm estimators train on
# RecordSet objects; train_data is assumed to be a 2-D NumPy float32 array.
rcf.fit(rcf.record_set(train_data))

# Deploy for inference
rcf_predictor = rcf.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Detect anomalies
anomaly_scores = rcf_predictor.predict(test_data)
```
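Anomaly scores are relative, so a cutoff is usually needed to turn them into flags. One common heuristic, sketched below, marks any score more than three standard deviations above the mean. `flag_anomalies` is a hypothetical helper, and it assumes the scores have already been extracted from the predictor's response into a plain list of floats; the right cutoff depends on the data.

```python
import statistics
from typing import List


def flag_anomalies(scores: List[float], num_std: float = 3.0) -> List[bool]:
    """Flag scores above mean + num_std * (population) standard deviation."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    cutoff = mean + num_std * std
    return [s > cutoff for s in scores]


# Toy scores: twenty typical points and one clear outlier, for illustration only.
scores = [1.0] * 20 + [10.0]
flags = flag_anomalies(scores)
```

On very small batches a single outlier inflates the standard deviation enough to hide itself, so the cutoff is better computed from a larger reference set of scores.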