# Amazon Built-in Algorithms

Pre-built, optimized machine learning algorithms provided by Amazon SageMaker for common ML tasks including clustering, dimensionality reduction, classification, regression, and anomaly detection. These algorithms are optimized for performance and scalability on SageMaker infrastructure.

## Capabilities

### K-Means Clustering

Unsupervised learning algorithm for clustering data into k groups based on feature similarity.

```python { .api }
class KMeans(Estimator):
    """
    K-means clustering algorithm estimator.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - k (int): Number of clusters
    - init_method (str, optional): Initialization method ("random", "kmeans++")
    - local_init_method (str, optional): Local initialization method
    - distance_metric (str, optional): Distance metric ("squared_euclidean")
    - mini_batch_size (int, optional): Mini-batch size for mini-batch k-means
    """
    def __init__(self, role: str, instance_count: int, instance_type: str, k: int,
                 init_method: str = "random", **kwargs): ...

class KMeansModel(Model):
    """
    K-means model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class KMeansPredictor(Predictor):
    """
    K-means predictor for cluster assignment.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Predict cluster assignments for input data.

        Parameters:
        - data: Input data for clustering

        Returns:
        list: Cluster assignments and distances
        """
```
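
To make "cluster assignments and distances" concrete, the following is a minimal pure-Python sketch of one k-means iteration using squared Euclidean distance. It is not the SageMaker implementation; the helper names `squared_euclidean`, `assign_clusters`, and `update_centers` are hypothetical and chosen only for illustration.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points, centers):
    """Return (closest_cluster_index, squared_distance) for each point."""
    assignments = []
    for point in points:
        distances = [squared_euclidean(point, c) for c in centers]
        best = min(range(len(centers)), key=distances.__getitem__)
        assignments.append((best, distances[best]))
    return assignments


def update_centers(points, assignments, centers):
    """Move each center to the mean of its members; empty clusters stay put."""
    new_centers = []
    for idx, center in enumerate(centers):
        members = [p for p, (c, _) in zip(points, assignments) if c == idx]
        if not members:
            new_centers.append(list(center))
            continue
        dim = len(center)
        new_centers.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )
    return new_centers


# Illustrative data: two well-separated groups and two rough starting centers.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = [[1.0, 1.0], [9.0, 9.0]]

assignments = assign_clusters(points, centers)
centers = update_centers(points, assignments, centers)
print(assignments)
print(centers)
```

Repeating the assign and update steps until the assignments stop changing gives the classic Lloyd iteration; the `k` parameter above corresponds to the number of centers.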

### Principal Component Analysis (PCA)

Dimensionality reduction algorithm that transforms data to lower-dimensional space while preserving variance.

```python { .api }
class PCA(Estimator):
    """
    Principal Component Analysis estimator.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_components (int): Number of principal components
    - algorithm_mode (str, optional): Algorithm mode ("regular", "randomized")
    - subtract_mean (bool, optional): Whether to subtract mean
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_components: int, algorithm_mode: str = "regular", **kwargs): ...

class PCAModel(Model):
    """
    PCA model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class PCAPredictor(Predictor):
    """
    PCA predictor for dimensionality reduction.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Transform data to principal component space.

        Parameters:
        - data: Input data for transformation

        Returns:
        list: Transformed data in PC space
        """
```
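
As a rough illustration of what "transforming data to principal component space" means, the sketch below centers a small dataset, builds its covariance matrix, and finds the first principal component by power iteration. It is a teaching sketch under simplifying assumptions (one component, tiny dense data), not how SageMaker computes PCA; all function names are hypothetical.

```python
import math


def center_rows(rows):
    """Subtract the per-column mean (the effect of a `subtract_mean` option)."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    return [[r[d] - means[d] for d in range(dim)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def first_component(cov, iterations=100):
    """Power iteration: converges to the dominant eigenvector of `cov`.

    Assumes `cov` is non-zero and the start vector is not orthogonal
    to the dominant eigenvector.
    """
    vec = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(len(vec))) for i in range(len(cov))]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def project(rows, component):
    """Coordinate of each row along the given unit-length component."""
    return [sum(r[d] * component[d] for d in range(len(component))) for r in rows]


# Illustrative data lying exactly on the line y = x.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
centered = center_rows(data)
component = first_component(covariance(centered))
projected = project(centered, component)
print(component)
print(projected)
```

Because the sample points lie on a single line, one component captures all of the variance, so the two-dimensional rows reduce to one coordinate each without loss.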

### Linear Learner

Linear algorithm for classification and regression with support for multiple loss functions and regularization.

```python { .api }
class LinearLearner(Estimator):
    """
    Linear learning algorithm for classification and regression.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - predictor_type (str, optional): Predictor type ("binary_classifier", "multiclass_classifier", "regressor")
    - binary_classifier_model_selection_criteria (str, optional): Model selection criteria
    - target_recall (float, optional): Target recall for precision-recall optimization
    - target_precision (float, optional): Target precision for precision-recall optimization
    - positive_example_weight_mult (float, optional): Weight multiplier for positive examples
    - epochs (int, optional): Number of training epochs
    - use_bias (bool, optional): Whether to use bias term
    - num_models (int, optional): Number of parallel models to train
    - num_calibration_samples (int, optional): Number of samples for calibration
    - init_method (str, optional): Weight initialization method
    - init_scale (float, optional): Scale for weight initialization
    - init_sigma (float, optional): Standard deviation for weight initialization
    - init_bias (float, optional): Initial bias value
    - optimizer (str, optional): Optimization algorithm ("sgd", "adam", "rmsprop")
    - loss (str, optional): Loss function
    - wd (float, optional): Weight decay regularization
    - l1 (float, optional): L1 regularization
    - momentum (float, optional): Momentum for SGD
    - learning_rate (float, optional): Learning rate
    - beta_1 (float, optional): Beta1 parameter for Adam
    - beta_2 (float, optional): Beta2 parameter for Adam
    - bias_lr_mult (float, optional): Learning rate multiplier for bias
    - bias_wd_mult (float, optional): Weight decay multiplier for bias
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 predictor_type: str = "binary_classifier", **kwargs): ...

class LinearLearnerModel(Model):
    """
    Linear learner model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class LinearLearnerPredictor(Predictor):
    """
    Linear learner predictor for classification and regression.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Make predictions using linear model.

        Parameters:
        - data: Input features for prediction

        Returns:
        list: Predictions and confidence scores
        """
```
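
The difference between the `"regressor"` and `"binary_classifier"` predictor types can be sketched in a few lines: both compute a weighted sum of the features plus a bias, and the classifier additionally squashes that value with a sigmoid and applies a threshold. The code below is an illustrative sketch with made-up weights; the function names and the dictionary keys are assumptions for this example, not the documented response format.

```python
import math


def linear_score(weights, bias, features):
    """Raw linear model output: w . x + b (what a regressor would return)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_binary(weights, bias, features, threshold=0.5):
    """Logistic output and thresholded label for a binary classifier."""
    score = 1.0 / (1.0 + math.exp(-linear_score(weights, bias, features)))
    return {"score": score, "predicted_label": 1 if score >= threshold else 0}


# Hypothetical trained parameters for a two-feature model.
weights = [2.0, -1.0]
bias = 0.5

regression_output = linear_score(weights, bias, [1.0, 0.5])
positive = predict_binary(weights, bias, [1.0, 0.5])
negative = predict_binary(weights, bias, [-1.0, 1.0])
print(regression_output, positive, negative)
```

Raising `threshold` trades recall for precision, which is the kind of trade-off the `target_recall` and `target_precision` parameters are meant to steer.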

### Factorization Machines

Algorithm for sparse data problems that learns feature interactions automatically.

```python { .api }
class FactorizationMachines(Estimator):
    """
    Factorization Machines algorithm for sparse data.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - predictor_type (str, optional): Predictor type ("binary_classifier", "regressor")
    - num_factors (int, optional): Number of factorization factors
    - bias_lr (float, optional): Learning rate for bias term
    - linear_lr (float, optional): Learning rate for linear term
    - factors_lr (float, optional): Learning rate for factorization factors
    - bias_wd (float, optional): Weight decay for bias
    - linear_wd (float, optional): Weight decay for linear term
    - factors_wd (float, optional): Weight decay for factors
    - bias_init_method (str, optional): Bias initialization method
    - bias_init_scale (float, optional): Bias initialization scale
    - linear_init_method (str, optional): Linear term initialization method
    - linear_init_scale (float, optional): Linear term initialization scale
    - factors_init_method (str, optional): Factors initialization method
    - factors_init_scale (float, optional): Factors initialization scale
    - epochs (int, optional): Number of training epochs
    - clip_gradient (float, optional): Gradient clipping threshold
    - eps (float, optional): Epsilon for numerical stability
    - rescale_grad (float, optional): Gradient rescaling factor
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 predictor_type: str = "binary_classifier", **kwargs): ...

class FactorizationMachinesModel(Model):
    """
    Factorization Machines model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class FactorizationMachinesPredictor(Predictor):
    """
    Factorization Machines predictor.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Make predictions using factorization machines.

        Parameters:
        - data: Sparse input features
        """
```
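
"Learning feature interactions" refers to the standard second-order factorization machine model, in which every pair of features interacts through the dot product of two learned factor vectors of length `num_factors`. The sketch below evaluates that model two ways: by enumerating pairs, and with the well-known linear-time reformulation. Parameter values are made up and the function names are hypothetical; this is not the SageMaker implementation.

```python
def fm_predict_naive(w0, w, v, x):
    """w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, enumerating all pairs."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(w0, w, v, x):
    """Same value in O(n * k), using
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical parameters: 3 features, 2 factors per feature.
w0 = 0.1
w = [0.2, -0.3, 0.5]
v = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
x = [1.0, 1.0, 0.0]  # sparse input: the third feature is inactive

naive = fm_predict_naive(w0, w, v, x)
fast = fm_predict_fast(w0, w, v, x)
print(naive, fast)
```

Only active (non-zero) features contribute to either sum, which is why the model suits sparse inputs such as one-hot encoded user and item identifiers.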

### Random Cut Forest

Unsupervised algorithm for anomaly detection that identifies outliers in data.

```python { .api }
class RandomCutForest(Estimator):
    """
    Random Cut Forest algorithm for anomaly detection.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_samples_per_tree (int, optional): Number of samples per tree
    - num_trees (int, optional): Number of trees in the forest
    - feature_dim (int, optional): Feature dimension
    - eval_metrics (list, optional): Evaluation metrics
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_samples_per_tree: int = None, **kwargs): ...

class RandomCutForestModel(Model):
    """
    Random Cut Forest model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class RandomCutForestPredictor(Predictor):
    """
    Random Cut Forest predictor for anomaly detection.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Detect anomalies in input data.

        Parameters:
        - data: Input data for anomaly detection

        Returns:
        list: Anomaly scores
        """
```
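
The predictor returns scores rather than yes/no decisions, so a cutoff still has to be chosen. One common heuristic, shown here as an assumption rather than as part of the algorithm, is to flag scores more than three standard deviations above the mean. The helper `flag_anomalies` and the sample scores are hypothetical.

```python
import statistics


def flag_anomalies(scores, num_std=3.0):
    """Return (indices above the cutoff, cutoff) for mean + num_std * std."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)  # population standard deviation
    cutoff = mean + num_std * std
    flagged = [i for i, s in enumerate(scores) if s > cutoff]
    return flagged, cutoff


# Made-up anomaly scores: nineteen ordinary points and one clear outlier.
scores = [1.0] * 19 + [5.0]
flagged, cutoff = flag_anomalies(scores)
print(flagged, round(cutoff, 3))
```

The multiplier is a tuning choice: a smaller `num_std` catches more anomalies at the cost of more false alarms, and very small samples may never exceed a three-sigma cutoff at all.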

### Latent Dirichlet Allocation (LDA)

Topic modeling algorithm for discovering latent topics in document collections.

```python { .api }
class LDA(Estimator):
    """
    Latent Dirichlet Allocation for topic modeling.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_topics (int): Number of topics to discover
    - alpha0 (float, optional): Concentration parameter for document-topic distribution
    - max_restarts (int, optional): Maximum number of restarts
    - max_iterations (int, optional): Maximum number of iterations
    - tol (float, optional): Tolerance for convergence
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_topics: int, **kwargs): ...

class LDAModel(Model):
    """
    LDA model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class LDAPredictor(Predictor):
    """
    LDA predictor for topic inference.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Infer topic distributions for documents.

        Parameters:
        - data: Document data for topic inference

        Returns:
        list: Topic distributions
        """
```

### Neural Topic Model (NTM)

Neural network-based topic modeling algorithm for learning topic representations.

```python { .api }
class NTM(Estimator):
    """
    Neural Topic Model for topic modeling with neural networks.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_topics (int): Number of topics
    - feature_dim (int): Feature dimension (vocabulary size)
    - mini_batch_size (int, optional): Mini-batch size
    - epochs (int, optional): Number of training epochs
    - num_patience_epochs (int, optional): Early stopping patience
    - tolerance (float, optional): Tolerance for early stopping
    - learning_rate (float, optional): Learning rate
    - batch_norm (bool, optional): Use batch normalization
    - clip_gradient (float, optional): Gradient clipping threshold
    - weight_decay (float, optional): Weight decay regularization
    - latent_dim (int, optional): Latent dimension size
    - encoder_layers (str, optional): Encoder layer configuration
    - encoder_layers_activation (str, optional): Encoder activation function
    - optimizer (str, optional): Optimizer algorithm
    - rescale_gradient (float, optional): Gradient rescaling factor
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_topics: int, feature_dim: int, **kwargs): ...

class NTMModel(Model):
    """
    NTM model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class NTMPredictor(Predictor):
    """
    NTM predictor for topic inference.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Infer topic distributions using neural topic model.

        Parameters:
        - data: Document features for topic inference

        Returns:
        list: Topic distributions
        """
```
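
Since `feature_dim` is the vocabulary size, documents have to be turned into fixed-length vectors before training or inference. A minimal sketch of one plausible preparation step, bag-of-words counts over a shared vocabulary, is shown below. The whitespace tokenization and the helper names are assumptions made for illustration only.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_count_vector(document, vocabulary):
    """Bag-of-words counts; tokens missing from the vocabulary are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Illustrative corpus.
documents = ["the cat sat", "the dog sat on the mat"]
vocabulary = build_vocabulary(documents)
feature_dim = len(vocabulary)  # the value to pass as `feature_dim`
vectors = [to_count_vector(doc, vocabulary) for doc in documents]
print(feature_dim, vectors)
```

Every vector has exactly `feature_dim` entries, so the same vocabulary must be reused at inference time for the columns to line up with what the model saw during training.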

### K-Nearest Neighbors (KNN)

Non-parametric algorithm for classification and regression based on k nearest neighbors.

```python { .api }
class KNN(Estimator):
    """
    K-Nearest Neighbors algorithm for classification and regression.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - k (int): Number of nearest neighbors
    - predictor_type (str): Predictor type ("classifier", "regressor")
    - sample_size (int, optional): Training sample size
    - dimension_reduction_target (int, optional): Target dimensions after reduction
    - dimension_reduction_type (str, optional): Dimension reduction method ("sign", "fjlt")
    - index_metric (str, optional): Distance metric ("COSINE", "INNER_PRODUCT", "L2")
    - index_type (str, optional): Index type ("faiss.Flat", "faiss.IVFFlat", "faiss.IVFPQ")
    - faiss_index_ivf_nlists (int, optional): Number of inverted lists for IVF
    - faiss_index_pq_m (int, optional): Number of sub-quantizers for PQ
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 k: int, predictor_type: str, **kwargs): ...

class KNNModel(Model):
    """
    KNN model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class KNNPredictor(Predictor):
    """
    KNN predictor for classification and regression.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Make predictions using k-nearest neighbors.

        Parameters:
        - data: Input features for prediction

        Returns:
        list: Predictions and neighbor information
        """
```
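
The roles of `k` and `predictor_type` can be illustrated with a brute-force sketch: find the `k` closest training points under squared L2 distance, then take a majority vote for classification or an average for regression. This is a simplified stand-in for the indexed search the service performs; `knn_predict` is a hypothetical helper.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k, predictor_type="classifier"):
    """Brute-force k-NN: majority vote ("classifier") or mean ("regressor")."""
    scored = sorted(
        (
            (sum((a - b) ** 2 for a, b in zip(point, query)), label)
            for point, label in zip(train_points, train_labels)
        ),
        key=lambda pair: pair[0],  # sort by distance only; ties keep input order
    )
    nearest = [label for _, label in scored[:k]]
    if predictor_type == "regressor":
        return sum(nearest) / len(nearest)
    return Counter(nearest).most_common(1)[0][0]


# Illustrative training set with two separated groups.
train_points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
class_labels = ["a", "a", "a", "b", "b"]
numeric_targets = [1.0, 2.0, 3.0, 10.0, 12.0]

label_near_origin = knn_predict(train_points, class_labels, [0.5, 0.5], k=3)
label_far = knn_predict(train_points, class_labels, [5.0, 6.0], k=3)
value_near_origin = knn_predict(
    train_points, numeric_targets, [0.5, 0.5], k=3, predictor_type="regressor"
)
print(label_near_origin, label_far, value_near_origin)
```

Brute-force search costs one distance computation per training point per query, which is the motivation for approximate index types and dimension reduction on large datasets.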

### Object2Vec

Algorithm for learning embeddings of objects such as sentences, customers, or products.

```python { .api }
class Object2Vec(Estimator):
    """
    Object2Vec algorithm for learning object embeddings.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - enc_dim (int): Encoder output dimension
    - mini_batch_size (int, optional): Mini-batch size
    - epochs (int, optional): Number of training epochs
    - early_stopping (bool, optional): Enable early stopping
    - patience (int, optional): Early stopping patience
    - tolerance (float, optional): Early stopping tolerance
    - dropout (float, optional): Dropout probability
    - weight_decay (float, optional): Weight decay regularization
    - bucket_width (int, optional): Bucket width for sequence padding
    - num_classes (int, optional): Number of classes for classification
    - mlp_layers (int, optional): Number of MLP layers
    - mlp_dim (int, optional): MLP layer dimension
    - mlp_activation (str, optional): MLP activation function
    - output_layer (str, optional): Output layer type
    - optimizer (str, optional): Optimizer algorithm
    - learning_rate (float, optional): Learning rate
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 enc_dim: int, **kwargs): ...

class Object2VecModel(Model):
    """
    Object2Vec model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...
```

### IP Insights

Unsupervised algorithm for learning usage patterns of IP addresses.

```python { .api }
class IPInsights(Estimator):
    """
    IP Insights algorithm for learning IP address usage patterns.

    Parameters:
    - role (str): IAM role ARN
    - instance_count (int): Number of training instances
    - instance_type (str): EC2 instance type
    - num_entity_vectors (int): Number of entity vectors
    - vector_dim (int): Vector dimension
    - epochs (int, optional): Number of training epochs
    - learning_rate (float, optional): Learning rate
    - num_ip_encoder_layers (int, optional): Number of IP encoder layers
    - random_negative_sampling_rate (int, optional): Negative sampling rate
    - shuffled_negative_sampling_rate (int, optional): Shuffled negative sampling rate
    - weight_decay (float, optional): Weight decay regularization
    - batch_metrics_publish_interval (int, optional): Batch metrics publish interval
    """
    def __init__(self, role: str, instance_count: int, instance_type: str,
                 num_entity_vectors: int, vector_dim: int, **kwargs): ...

class IPInsightsModel(Model):
    """
    IP Insights model for deployment and inference.
    """
    def __init__(self, model_data: str, role: str, sagemaker_session: 'Session' = None): ...

class IPInsightsPredictor(Predictor):
    """
    IP Insights predictor for anomaly detection.
    """
    def __init__(self, endpoint_name: str, **kwargs): ...

    def predict(self, data) -> list:
        """
        Detect anomalous IP address usage patterns.

        Parameters:
        - data: IP address and entity pairs

        Returns:
        list: Anomaly scores
        """
```

## Usage Examples

### K-Means Clustering

```python
from sagemaker.amazon.kmeans import KMeans

# `role` (an IAM role ARN) and `test_data` (the inference payload) are assumed
# to be defined elsewhere.

# Create K-means estimator
kmeans = KMeans(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    k=10,
    init_method="kmeans++",
    max_iterations=100
)

# Train the model
# Note: depending on the SDK version, built-in algorithm estimators may expect
# RecordSet inputs (for example from `kmeans.record_set(...)`) rather than a
# dict of S3 URIs; check the input form your version accepts.
kmeans.fit({"training": "s3://my-bucket/training-data"})

# Deploy for inference
kmeans_predictor = kmeans.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Make predictions
cluster_assignments = kmeans_predictor.predict(test_data)
```

### Linear Learner for Classification

```python
from sagemaker.amazon.linear_learner import LinearLearner

# `role` (an IAM role ARN) and `test_data` (the inference payload) are assumed
# to be defined elsewhere.

# Create linear learner estimator
linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    predictor_type="binary_classifier",
    num_models=32,
    use_bias=True,
    optimizer="adam",
    learning_rate=0.001
)

# Train the model
linear.fit({"training": "s3://my-bucket/training-data"})

# Deploy for inference
linear_predictor = linear.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Make predictions
predictions = linear_predictor.predict(test_data)
```

### Random Cut Forest for Anomaly Detection

```python
from sagemaker.amazon.randomcutforest import RandomCutForest

# `role` (an IAM role ARN) and `test_data` (the inference payload) are assumed
# to be defined elsewhere.

# Create Random Cut Forest estimator
rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    num_samples_per_tree=512,
    num_trees=50
)

# Train the model (unsupervised)
rcf.fit({"training": "s3://my-bucket/training-data"})

# Deploy for inference
rcf_predictor = rcf.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

# Detect anomalies
anomaly_scores = rcf_predictor.predict(test_data)
```