Tessl Tile for pypi/sagemaker@2.251.0

# Amazon Built-in Algorithms

Pre-built machine learning algorithms provided by Amazon SageMaker for common tasks, including clustering, dimensionality reduction, classification, regression, topic modeling, embedding learning, and anomaly detection. They are tuned for performance and scalability on SageMaker infrastructure.

## Capabilities

### K-Means Clustering

Unsupervised algorithm that partitions data into k clusters based on feature similarity.

```python { .api }
class KMeans(Estimator):
    """
    K-means clustering algorithm estimator.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - k (int): Number of clusters
    - init_method (str, optional): Initialization method ("random", "kmeans++")
    - local_init_method (str, optional): Local initialization method
    - distance_metric (str, optional): Distance metric ("squared_euclidean")
    - mini_batch_size (int, optional): Mini-batch size for mini-batch k-means
    """
    def __init__(self, role: str, instance_count: int, instance_type: str, k: int,
                 init_method: str = "random", **kwargs): ...

class KMeansModel(Model):
    """
    K-means model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class KMeansPredictor(Predictor):
    """
    K-means predictor for cluster assignment.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Predict cluster assignments for input data.
        
        Parameters:
        - data: Input data for clustering
        
        Returns:
            list: One Record per input row, labeled with closest_cluster
            and distance_to_cluster
        """
```
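
At inference time, the k-means endpoint reports the closest cluster for each input row and the distance to that cluster. The sketch below reproduces only that assignment step, locally and in pure Python. It assumes plain Euclidean distance and a hypothetical set of already-trained centroids, and it illustrates the computation rather than the service's implementation.

```python
import math

def assign_clusters(rows, centroids):
    """Return (closest_cluster, distance_to_cluster) for each row.

    Hypothetical helper: mirrors the shape of k-means inference output,
    using plain Euclidean distance (an assumption, not the service's code).
    """
    assignments = []
    for row in rows:
        # Distance from this row to every centroid
        distances = [math.dist(row, c) for c in centroids]
        best = min(range(len(centroids)), key=distances.__getitem__)
        assignments.append((best, distances[best]))
    return assignments

# Two hypothetical centroids in a 2-D feature space
centroids = [(0.0, 0.0), (10.0, 10.0)]
rows = [(1.0, 0.0), (9.0, 10.0), (3.0, 4.0)]
result = assign_clusters(rows, centroids)
```

Each row goes to the centroid with the smallest distance. The number of centroids corresponds to the `k` hyperparameter.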

### Principal Component Analysis (PCA)

Dimensionality reduction algorithm that projects data into a lower-dimensional space while retaining as much of the variance as possible.

```python { .api }
class PCA(Estimator):
    """
    Principal Component Analysis estimator.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_components (int): Number of principal components
    - algorithm_mode (str, optional): Algorithm mode ("regular", "randomized")
    - subtract_mean (bool, optional): Whether to subtract the mean of each feature
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_components: int, algorithm_mode: str = "regular", **kwargs): ...

class PCAModel(Model):
    """
    PCA model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class PCAPredictor(Predictor):
    """
    PCA predictor for dimensionality reduction.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Transform data to principal component space.
        
        Parameters:
        - data: Input data for transformation
        
        Returns:
            list: Transformed data in PC space
        """
```
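
Conceptually, PCA centers each feature (the step that `subtract_mean` controls) and then finds the directions of maximum variance. Prediction projects each row onto those directions. The minimal pure-Python sketch below does this for a single component using power iteration on the covariance matrix. Power iteration is a textbook method chosen here for brevity, not necessarily what the SageMaker algorithm uses internally, and all function names are hypothetical.

```python
def mean_center(rows):
    """Subtract each column's mean (what subtract_mean=True conceptually does)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(rows, means, component):
    """Project each row onto one principal component (a 1-D transform)."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r)))
            for r in rows]

# Points lying exactly on the line y = 2x
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered, means = mean_center(data)
pc1 = top_component(covariance(centered))
scores = project(data, means, pc1)
```

All points lie on the line y = 2x, so the first component points along (1, 2)/√5, and a single coordinate per row keeps all of the variance.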

### Linear Learner

Supervised linear algorithm for binary classification, multiclass classification, and regression, with support for multiple loss functions and regularization.

```python { .api }
class LinearLearner(Estimator):
    """
    Linear learning algorithm for classification and regression.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - predictor_type (str, optional): Predictor type ("binary_classifier", "multiclass_classifier", "regressor")
    - binary_classifier_model_selection_criteria (str, optional): Model selection criteria
    - target_recall (float, optional): Target recall for precision-recall optimization
    - target_precision (float, optional): Target precision for precision-recall optimization
    - positive_example_weight_mult (float, optional): Weight multiplier for positive examples
    - epochs (int, optional): Number of training epochs
    - use_bias (bool, optional): Whether to include a bias term
    - num_models (int, optional): Number of parallel models to train
    - num_calibration_samples (int, optional): Number of samples for calibration
    - init_method (str, optional): Weight initialization method
    - init_scale (float, optional): Scale for weight initialization
    - init_sigma (float, optional): Standard deviation for weight initialization
    - init_bias (float, optional): Initial bias value
    - optimizer (str, optional): Optimization algorithm ("sgd", "adam", "rmsprop")
    - loss (str, optional): Loss function
    - wd (float, optional): Weight decay (L2) regularization
    - l1 (float, optional): L1 regularization
    - momentum (float, optional): Momentum for SGD
    - learning_rate (float, optional): Learning rate
    - beta_1 (float, optional): Beta1 parameter for Adam
    - beta_2 (float, optional): Beta2 parameter for Adam
    - bias_lr_mult (float, optional): Learning rate multiplier for bias
    - bias_wd_mult (float, optional): Weight decay multiplier for bias
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 predictor_type: str = "binary_classifier", **kwargs): ...

class LinearLearnerModel(Model):
    """
    Linear learner model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class LinearLearnerPredictor(Predictor):
    """
    Linear learner predictor for classification and regression.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Make predictions using linear model.
        
        Parameters:
        - data: Input features for prediction
        
        Returns:
            list: One Record per input row, with predicted_label and score
            for classifiers (score only for regressors)
        """
```
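
With `predictor_type="binary_classifier"`, each prediction carries a score and a predicted label. The minimal sketch below shows how a trained linear model of this kind turns features into those two values. It assumes a logistic (sigmoid) output, hypothetical learned weights, and a fixed 0.5 threshold. The service may choose a different threshold according to `binary_classifier_model_selection_criteria`, so treat the threshold as an illustrative assumption.

```python
import math

def linear_binary_predict(rows, weights, bias=0.0, use_bias=True, threshold=0.5):
    """Return (score, predicted_label) pairs for a linear binary classifier.

    weights/bias are hypothetical learned parameters; score is the sigmoid
    of w . x (+ b when use_bias is True); the 0.5 threshold is an assumption.
    """
    out = []
    for x in rows:
        z = sum(w * xi for w, xi in zip(weights, x))
        if use_bias:
            z += bias
        score = 1.0 / (1.0 + math.exp(-z))
        out.append((score, 1 if score >= threshold else 0))
    return out

weights = [2.0, -1.0]
preds = linear_binary_predict([[1.0, 2.0], [3.0, 1.0], [0.0, 0.0]], weights, bias=-0.5)
```

Setting `use_bias=False` drops the intercept, which matches the intent of the `use_bias` parameter: the decision boundary is then forced through the origin.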

### Factorization Machines

Supervised algorithm for classification and regression on sparse, high-dimensional data that automatically learns pairwise feature interactions.

```python { .api }
class FactorizationMachines(Estimator):
    """
    Factorization Machines algorithm for sparse data.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - predictor_type (str, optional): Predictor type ("binary_classifier", "regressor")
    - num_factors (int, optional): Number of factorization factors
    - bias_lr (float, optional): Learning rate for bias term
    - linear_lr (float, optional): Learning rate for linear term
    - factors_lr (float, optional): Learning rate for factorization factors
    - bias_wd (float, optional): Weight decay for bias
    - linear_wd (float, optional): Weight decay for linear term
    - factors_wd (float, optional): Weight decay for factors
    - bias_init_method (str, optional): Bias initialization method
    - bias_init_scale (float, optional): Bias initialization scale
    - linear_init_method (str, optional): Linear term initialization method
    - linear_init_scale (float, optional): Linear term initialization scale
    - factors_init_method (str, optional): Factors initialization method
    - factors_init_scale (float, optional): Factors initialization scale
    - epochs (int, optional): Number of training epochs
    - clip_gradient (float, optional): Gradient clipping threshold
    - eps (float, optional): Epsilon for numerical stability
    - rescale_grad (float, optional): Gradient rescaling factor
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 predictor_type: str = "binary_classifier", **kwargs): ...

class FactorizationMachinesModel(Model):
    """
    Factorization Machines model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class FactorizationMachinesPredictor(Predictor):
    """
    Factorization Machines predictor.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Make predictions using factorization machines.
        
        Parameters:
        - data: Sparse input features
        
        Returns:
            list: Predictions
        """
```
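
A degree-2 factorization machine gives each feature a bias weight and a factor vector whose length is `num_factors`. It models the interaction between two features as the dot product of their factor vectors. The sketch below evaluates this standard FM model form in two ways, naively over all pairs and with the usual O(n·k) reformulation, and checks that they agree. The parameters are hypothetical, and this illustrates the model, not SageMaker's training code.

```python
from itertools import combinations

def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return linear + pairwise

def fm_predict_fast(x, w0, w, V):
    """Same value in O(n * k) using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

# Hypothetical model: 3 features, num_factors = 2
w0 = 0.1
w = [0.5, -0.2, 0.3]
V = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
x = [1.0, 0.0, 2.0]  # sparse-style input: feature 1 is absent
```

In the fast form, zero-valued features contribute nothing to either sum. The cost therefore grows with the number of non-zero features times `num_factors`, which is why FMs suit sparse inputs.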

### Random Cut Forest

Unsupervised algorithm for anomaly detection that identifies outliers in data.

```python { .api }
class RandomCutForest(Estimator):
    """
    Random Cut Forest algorithm for anomaly detection.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_samples_per_tree (int, optional): Number of samples per tree
    - num_trees (int, optional): Number of trees in the forest
    - feature_dim (int, optional): Feature dimension
    - eval_metrics (list, optional): Evaluation metrics
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_samples_per_tree: int = None, **kwargs): ...

class RandomCutForestModel(Model):
    """
    Random Cut Forest model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class RandomCutForestPredictor(Predictor):
    """
    Random Cut Forest predictor for anomaly detection.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Detect anomalies in input data.
        
        Parameters:
        - data: Input data for anomaly detection
        
        Returns:
            list: One Record per input row with an anomaly score;
            higher scores indicate more anomalous points
        """
```
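
The intuition behind Random Cut Forest is that anomalous points are separated from the rest of a sample by fewer random cuts. The sketch below is a deliberately simplified stand-in for the real algorithm. It builds random-cut trees, choosing each cut dimension with probability proportional to that dimension's range as RCF does, and it uses average isolation depth, where lower means more anomalous. The SageMaker algorithm computes its score differently and reports higher scores for more anomalous points. The helper names and data are hypothetical.

```python
import random

def isolation_depth(point, sample, rng, depth=0, max_depth=50):
    """Number of random cuts needed to separate `point` from the rest of `sample`.

    Each cut picks a dimension with probability proportional to its range in
    the current sample, then a cut value uniformly within that range.
    """
    if len(sample) <= 1 or depth >= max_depth:
        return depth
    dims = len(point)
    lows = [min(p[d] for p in sample) for d in range(dims)]
    highs = [max(p[d] for p in sample) for d in range(dims)]
    ranges = [hi - lo for hi, lo in zip(highs, lows)]
    if sum(ranges) == 0:  # remaining points are identical; cannot cut further
        return depth
    dim = rng.choices(range(dims), weights=ranges)[0]
    cut = rng.uniform(lows[dim], highs[dim])
    # Recurse into the side of the cut that contains the query point
    same_side = [p for p in sample if (p[dim] <= cut) == (point[dim] <= cut)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def average_depth(point, data, num_trees=50, num_samples_per_tree=32, seed=0):
    """Average isolation depth over random subsamples (lower = more anomalous)."""
    rng = random.Random(seed)
    depths = []
    for _ in range(num_trees):
        sample = rng.sample(data, min(num_samples_per_tree, len(data)))
        if point not in sample:
            sample.append(point)
        depths.append(isolation_depth(point, sample, rng))
    return sum(depths) / len(depths)

# 200 hypothetical inliers in the unit square plus one far-away point
gen = random.Random(42)
inliers = [(gen.random(), gen.random()) for _ in range(200)]
outlier = (100.0, 100.0)
data = inliers + [outlier]

outlier_depth = average_depth(outlier, data)
inlier_depth = average_depth(inliers[0], data)
```

The `num_trees` and `num_samples_per_tree` arguments play roles analogous to the hyperparameters of the same names. More trees average out the randomness of individual cuts, and the per-tree sample size bounds how deep a tree can grow.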

### Latent Dirichlet Allocation (LDA)

Unsupervised topic-modeling algorithm for discovering latent topics in a collection of documents.

```python { .api }
class LDA(Estimator):
    """
    Latent Dirichlet Allocation for topic modeling.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_topics (int): Number of topics to discover
    - alpha0 (float, optional): Concentration parameter for document-topic distribution
    - max_restarts (int, optional): Maximum number of restarts
    - max_iterations (int, optional): Maximum number of iterations
    - tol (float, optional): Tolerance for convergence
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_topics: int, **kwargs): ...

class LDAModel(Model):
    """
    LDA model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class LDAPredictor(Predictor):
    """
    LDA predictor for topic inference.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Infer topic distributions for documents.
        
        Parameters:
        - data: Document data for topic inference
        
        Returns:
            list: Topic distributions
        """
```
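
LDA models each document's topic proportions as a draw from a Dirichlet distribution, and `alpha0` sets how concentrated that prior is. The sketch below assumes that `alpha0` is the total concentration, split evenly across `num_topics`. It samples topic mixtures with the standard gamma-variate construction of a Dirichlet draw to show that small `alpha0` yields documents dominated by a few topics, while large `alpha0` yields near-uniform mixtures. The helper names are hypothetical.

```python
import random

def sample_topic_mixture(num_topics, alpha0, rng):
    """Draw one document's topic proportions from a symmetric Dirichlet.

    Assumption: each topic gets concentration alpha0 / num_topics, so the
    concentrations sum to alpha0. Normalized Gamma draws give a Dirichlet sample.
    """
    alpha = alpha0 / num_topics
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    return [d / total for d in draws]

def average_max_share(num_topics, alpha0, num_docs=2000, seed=7):
    """Mean weight of each document's dominant topic (higher = sparser mixtures)."""
    rng = random.Random(seed)
    return sum(max(sample_topic_mixture(num_topics, alpha0, rng))
               for _ in range(num_docs)) / num_docs

sparse = average_max_share(num_topics=10, alpha0=1.0)
uniform = average_max_share(num_topics=10, alpha0=100.0)
```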

### Neural Topic Model (NTM)

Neural network-based topic modeling algorithm for learning topic representations.

```python { .api }
class NTM(Estimator):
    """
    Neural Topic Model for topic modeling with neural networks.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_topics (int): Number of topics
    - feature_dim (int): Feature dimension (vocabulary size)
    - mini_batch_size (int, optional): Mini-batch size
    - epochs (int, optional): Number of training epochs
    - num_patience_epochs (int, optional): Early stopping patience
    - tolerance (float, optional): Tolerance for early stopping
    - learning_rate (float, optional): Learning rate
    - batch_norm (bool, optional): Use batch normalization
    - clip_gradient (float, optional): Gradient clipping threshold
    - weight_decay (float, optional): Weight decay regularization
    - latent_dim (int, optional): Latent dimension size
    - encoder_layers (str, optional): Encoder layer configuration
    - encoder_layers_activation (str, optional): Encoder activation function
    - optimizer (str, optional): Optimizer algorithm
    - rescale_gradient (float, optional): Gradient rescaling factor
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_topics: int, feature_dim: int, **kwargs): ...

class NTMModel(Model):
    """
    NTM model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class NTMPredictor(Predictor):
    """
    NTM predictor for topic inference.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Infer topic distributions using neural topic model.
        
        Parameters:
        - data: Document features for topic inference
        
        Returns:
            list: Topic distributions
        """
```
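
NTM's `feature_dim` is the vocabulary size, so each document is presented as a fixed-length vector of word counts (a bag of words). The minimal sketch below shows that preprocessing. It uses a hypothetical lowercase whitespace tokenizer and a vocabulary built from the corpus itself. Real pipelines usually also remove stop words and cap the vocabulary size.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct token to a column index (sorted for determinism)."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}

def to_bag_of_words(document, vocab):
    """Count vector of length len(vocab); tokens outside the vocabulary are dropped."""
    vec = [0] * len(vocab)
    for tok, count in Counter(document.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec

docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)
feature_dim = len(vocab)  # the value to pass as NTM's feature_dim
vectors = [to_bag_of_words(d, vocab) for d in docs]
```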

### K-Nearest Neighbors (KNN)

Non-parametric algorithm for classification and regression that predicts from the labels of the k nearest training points.

```python { .api }
class KNN(Estimator):
    """
    K-Nearest Neighbors algorithm for classification and regression.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - k (int): Number of nearest neighbors
    - predictor_type (str): Predictor type ("classifier", "regressor")
    - sample_size (int, optional): Training sample size
    - dimension_reduction_target (int, optional): Target dimensions after reduction
    - dimension_reduction_type (str, optional): Dimension reduction method ("sign", "fjlt")
    - index_metric (str, optional): Distance metric ("COSINE", "INNER_PRODUCT", "L2")
    - index_type (str, optional): Index type ("faiss.Flat", "faiss.IVFFlat", "faiss.IVFPQ")
    - faiss_index_ivf_nlists (int, optional): Number of inverted lists for IVF
    - faiss_index_pq_m (int, optional): Number of sub-quantizers for PQ
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 k: int, predictor_type: str, **kwargs): ...

class KNNModel(Model):
    """
    KNN model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class KNNPredictor(Predictor):
    """
    KNN predictor for classification and regression.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Make predictions using k-nearest neighbors.
        
        Parameters:
        - data: Input features for prediction
        
        Returns:
            list: Predictions and neighbor information
        """
```
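
With `index_type="faiss.Flat"`, the neighbor search is exact. A prediction therefore amounts to a brute-force nearest-neighbor lookup followed by a vote (classifier) or an average (regressor). The minimal sketch below implements that logic with L2 distance. The tie-breaking rule (ties go to the label of the closer neighbor) is an assumption and may differ from the service, and the helper name is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k, predictor_type="classifier"):
    """Exact (brute-force) k-NN with L2 distance.

    classifier: majority label among the k nearest points.
    regressor: mean label of the k nearest points.
    """
    neighbors = sorted(range(len(train_x)),
                       key=lambda i: math.dist(train_x[i], query))[:k]
    labels = [train_y[i] for i in neighbors]
    if predictor_type == "regressor":
        return sum(labels) / k
    # most_common keeps first-seen order among equal counts, so ties
    # go to the label of the closer neighbor (an assumed tie-break rule)
    return Counter(labels).most_common(1)[0][0]

train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]
label = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

The approximate index types (`faiss.IVFFlat`, `faiss.IVFPQ`) trade some accuracy in the neighbor search for speed on large datasets. The voting step stays the same.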

### Object2Vec

General-purpose neural embedding algorithm that learns vector representations of objects, such as sentences, customers, or products, from pairs of related inputs.

```python { .api }
class Object2Vec(Estimator):
    """
    Object2Vec algorithm for learning object embeddings.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - enc_dim (int): Encoder output dimension
    - mini_batch_size (int, optional): Mini-batch size
    - epochs (int, optional): Number of training epochs
    - early_stopping (bool, optional): Enable early stopping
    - patience (int, optional): Early stopping patience
    - tolerance (float, optional): Early stopping tolerance
    - dropout (float, optional): Dropout probability
    - weight_decay (float, optional): Weight decay regularization
    - bucket_width (int, optional): Bucket width for sequence padding
    - num_classes (int, optional): Number of classes for classification
    - mlp_layers (int, optional): Number of MLP layers
    - mlp_dim (int, optional): MLP layer dimension
    - mlp_activation (str, optional): MLP activation function
    - output_layer (str, optional): Output layer type
    - optimizer (str, optional): Optimizer algorithm
    - learning_rate (float, optional): Learning rate
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 enc_dim: int, **kwargs): ...

class Object2VecModel(Model):
    """
    Object2Vec model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...
```
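
Object2Vec trains on pairs of inputs. The minimal sketch below prepares training records in Python. It assumes the JSON Lines pair format described in the SageMaker Object2Vec documentation: one record per line, with integer token IDs under `in0` and `in1` and a `label`. The vocabulary, token IDs, and helper name are hypothetical.

```python
import json

def to_object2vec_record(tokens_a, tokens_b, label, vocab):
    """Encode a pair of token lists as one JSON Lines record.

    Assumed format: {"label": int, "in0": [ids...], "in1": [ids...]}.
    Tokens missing from the (hypothetical) vocab are skipped.
    """
    return json.dumps({
        "label": label,
        "in0": [vocab[t] for t in tokens_a if t in vocab],
        "in1": [vocab[t] for t in tokens_b if t in vocab],
    })

vocab = {"the": 1, "cat": 2, "sat": 3, "dog": 4, "ran": 5}
lines = [
    to_object2vec_record("the cat sat".split(), "the cat ran".split(), 1, vocab),
    to_object2vec_record("the cat sat".split(), "the dog ran".split(), 0, vocab),
]
jsonl = "\n".join(lines)  # contents of a training file, one pair per line
```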

### IP Insights

Unsupervised algorithm that learns usage patterns of IPv4 addresses by modeling associations between entities (such as user IDs or account numbers) and the IP addresses they use.

```python { .api }
class IPInsights(Estimator):
    """
    IP Insights algorithm for learning IP address usage patterns.
    
    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_entity_vectors (int): Number of entity vectors
    - vector_dim (int): Vector dimension
    - epochs (int, optional): Number of training epochs
    - learning_rate (float, optional): Learning rate
    - num_ip_encoder_layers (int, optional): Number of IP encoder layers
    - random_negative_sampling_rate (int, optional): Negative sampling rate
    - shuffled_negative_sampling_rate (int, optional): Shuffled negative sampling rate
    - weight_decay (float, optional): Weight decay regularization
    - batch_metrics_publish_interval (int, optional): Batch metrics publish interval
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_entity_vectors: int, vector_dim: int, **kwargs): ...

class IPInsightsModel(Model):
    """
    IP Insights model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class IPInsightsPredictor(Predictor):
    """
    IP Insights predictor for anomaly detection.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...
    
    def predict(self, data) -> list:
        """
        Detect anomalous IP address usage patterns.
        
        Parameters:
        - data: IP address and entity pairs
        
        Returns:
            list: Anomaly scores
        """
```
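
IP Insights learns from (entity, IP address) pairs. The minimal sketch below validates addresses with Python's standard `ipaddress` module and drops malformed rows before upload. It assumes a headerless two-column CSV layout (entity identifier, then IPv4 address) and IPv4-only support. The entity names, addresses, and helper name are hypothetical.

```python
import csv
import io
import ipaddress

def build_ipinsights_csv(pairs):
    """Write (entity, ip) rows as headerless CSV, skipping non-IPv4 rows.

    Returns the CSV text and the number of rows dropped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    dropped = 0
    for entity, ip in pairs:
        try:
            addr = ipaddress.IPv4Address(ip.strip())
        except ipaddress.AddressValueError:
            dropped += 1
            continue
        writer.writerow([entity, str(addr)])
    return buffer.getvalue(), dropped

events = [
    ("user_1", "192.0.2.10"),
    ("user_2", "198.51.100.7"),
    ("user_1", "not-an-ip"),
    ("user_3", "2001:db8::1"),  # IPv6 is rejected by IPv4Address
]
csv_text, dropped = build_ipinsights_csv(events)
```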
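
## Usage Examples

The estimators in `sagemaker.amazon` train on `RecordSet` objects rather than on dictionaries of S3 URIs. Each estimator's `record_set()` method converts a NumPy array (plus optional labels) to RecordIO-protobuf, uploads it to S3, and returns a `RecordSet` that `fit()` can consume.

### K-Means Clustering

```python
import numpy as np
import sagemaker
from sagemaker.amazon.kmeans import KMeans

# IAM role for training and hosting (get_execution_role() works inside
# SageMaker notebooks/Studio; elsewhere, pass your role ARN as a string)
role = sagemaker.get_execution_role()

# Example data: built-in algorithms expect float32 features
train_data = np.random.rand(1000, 20).astype("float32")
test_data = np.random.rand(10, 20).astype("float32")

# Create K-means estimator
kmeans = KMeans(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    k=10,
    init_method="kmeans++",
    max_iterations=100,
)

# Train the model: record_set() uploads the data and sets feature_dim
kmeans.fit(kmeans.record_set(train_data))

# Deploy for inference
kmeans_predictor = kmeans.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Make predictions: one Record per input row
results = kmeans_predictor.predict(test_data)
cluster_assignments = [r.label["closest_cluster"].float32_tensor.values[0] for r in results]
```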
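
### Linear Learner for Classification

```python
import numpy as np
import sagemaker
from sagemaker.amazon.linear_learner import LinearLearner

# IAM role (or pass your role ARN as a string outside SageMaker)
role = sagemaker.get_execution_role()

# Example data: float32 features and 0/1 labels
train_features = np.random.rand(1000, 20).astype("float32")
train_labels = np.random.randint(0, 2, size=1000).astype("float32")
test_data = np.random.rand(10, 20).astype("float32")

# Create linear learner estimator
linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    predictor_type="binary_classifier",
    num_models=32,
    use_bias=True,
    optimizer="adam",
    learning_rate=0.001,
)

# Train the model: supervised, so labels are written into the records
linear.fit(linear.record_set(train_features, labels=train_labels))

# Deploy for inference
linear_predictor = linear.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Make predictions: each Record carries predicted_label and score
results = linear_predictor.predict(test_data)
predictions = [r.label["predicted_label"].float32_tensor.values[0] for r in results]
scores = [r.label["score"].float32_tensor.values[0] for r in results]
```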

### Random Cut Forest for Anomaly Detection

```python
import numpy as np
import sagemaker
from sagemaker.amazon.randomcutforest import RandomCutForest

# IAM role (or pass your role ARN as a string outside SageMaker)
role = sagemaker.get_execution_role()

# Example data: float32 features, no labels
train_data = np.random.rand(1000, 4).astype("float32")
test_data = np.random.rand(10, 4).astype("float32")

# Create Random Cut Forest estimator
rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    num_samples_per_tree=512,
    num_trees=50,
)

# Train the model (unsupervised, so no labels are passed)
rcf.fit(rcf.record_set(train_data))

# Deploy for inference
rcf_predictor = rcf.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Detect anomalies: higher scores indicate more anomalous points
results = rcf_predictor.predict(test_data)
anomaly_scores = [r.label["score"].float32_tensor.values[0] for r in results]
```
