Tessl Tile for pypi/autogluon@1.4.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

core.md features.md index.md multimodal.md tabular.md timeseries.md

multimodal.mddocs/

0
# Multimodal Machine Learning
1

2
Automated machine learning for heterogeneous data combining text, images, and tabular features. MultiModalPredictor supports diverse tasks including classification, regression, object detection, named entity recognition, semantic matching, and feature extraction using state-of-the-art foundation models.
3

4
## Capabilities
5

6
### MultiModalPredictor Class
7

8
Main predictor class for multimodal data that automatically handles different data modalities and task types with minimal configuration.
9

10
```python { .api }
11
class MultiModalPredictor:
12
    def __init__(
13
        self,
14
        label: str = None,
15
        problem_type: str = None,
16
        query: str = None,
17
        response: str = None,
18
        match_label = None,
19
        presets: str = None,
20
        eval_metric = None,
21
        hyperparameters: dict = None,
22
        path: str = None,
23
        verbosity: int = 2,
24
        num_classes: int = None,
25
        classes: list = None,
26
        warn_if_exist: bool = True,
27
        enable_progress_bar: bool = None,
28
        pretrained: bool = True,
29
        validation_metric: str = None,
30
        sample_data_path: str = None,
31
        use_ensemble: bool = False,
32
        ensemble_size: int = 2,
33
        ensemble_mode: str = "one_shot"
34
    ):
35
        """
36
        Initialize MultiModalPredictor for automated multimodal machine learning.
37
        
38
        Parameters:
39
        - label: Name of target column to predict
40
        - problem_type: Problem type ('binary', 'multiclass', 'regression', 'object_detection', 
41
          'ner', 'text_similarity', 'image_similarity', 'image_text_similarity', 
42
          'feature_extraction', 'zero_shot_image_classification', 'few_shot_classification',
43
          'semantic_segmentation')
44
        - query: Column name for query data in semantic matching tasks
45
        - response: Column name for response data in semantic matching tasks
46
        - match_label: Label indicating positive matches in semantic matching
47
        - presets: Quality presets ('best_quality', 'high_quality', 'medium_quality')
48
        - eval_metric: Evaluation metric for model selection
49
        - hyperparameters: Custom hyperparameter configurations
50
        - path: Directory to save models and artifacts
51
        - verbosity: Logging verbosity level (0-4)
52
        - num_classes: Number of classes for object detection
53
        - classes: Class names for object detection
54
        - warn_if_exist: Whether to warn if save path exists
55
        - enable_progress_bar: Show training progress bars
56
        - pretrained: Use pretrained model weights
57
        - validation_metric: Metric for validation and early stopping
58
        - sample_data_path: Path to sample data for inference shape
59
        - use_ensemble: Enable ensemble learning
60
        - ensemble_size: Number of models in ensemble
61
        - ensemble_mode: Ensemble construction mode ('one_shot', 'sequential')
62
        """
63
```
64

65
### Model Training
66

67
Train multimodal models on heterogeneous data with automatic preprocessing and model selection.
68

69
```python { .api }
70
def fit(
71
    self,
72
    train_data,
73
    presets: str = None,
74
    tuning_data = None,
75
    max_num_tuning_data: int = None,
76
    id_mappings: dict = None,
77
    time_limit: int = None,
78
    save_path: str = None,
79
    hyperparameters = None,
80
    column_types: dict = None,
81
    holdout_frac: float = None,
82
    teacher_predictor = None,
83
    seed: int = 0,
84
    standalone: bool = True,
85
    hyperparameter_tune_kwargs: dict = None,
86
    clean_ckpts: bool = True,
87
    predictions: list = None,
88
    labels = None,
89
    predictors: list = None
90
):
91
    """
92
    Fit MultiModalPredictor on multimodal training data.
93
    
94
    Parameters:
95
    - train_data: Training data (DataFrame with text, images, tabular columns)
96
    - presets: Quality/speed presets
97
    - tuning_data: Validation data for hyperparameter tuning
98
    - max_num_tuning_data: Maximum tuning samples for object detection
99
    - id_mappings: ID-to-content mappings for semantic matching
100
    - time_limit: Maximum training time in seconds
101
    - save_path: Directory to save models
102
    - hyperparameters: Custom hyperparameter configurations
103
    - column_types: Manual column type specifications
104
    - holdout_frac: Fraction of data for validation
105
    - teacher_predictor: Teacher model for knowledge distillation
106
    - seed: Random seed for reproducibility
107
    - standalone: Save complete model for offline deployment
108
    - hyperparameter_tune_kwargs: HPO configuration
109
    - clean_ckpts: Clean intermediate checkpoints
110
    - predictions: Pre-computed predictions for ensemble
111
    - labels: Pre-computed labels for ensemble
112
    - predictors: Pre-trained predictors for ensemble
113
    
114
    Returns:
115
    MultiModalPredictor: Fitted predictor instance
116
    """
117
```
118

119
### Prediction
120

121
Generate predictions for multimodal data across different task types.
122

123
```python { .api }
124
def predict(
125
    self,
126
    data,
127
    candidate_data = None,
128
    id_mappings: dict = None,
129
    as_pandas: bool = None,
130
    realtime: bool = False,
131
    save_results: bool = None,
132
    **kwargs
133
):
134
    """
135
    Generate predictions for multimodal data.
136
    
137
    Parameters:
138
    - data: Input data (DataFrame, dict, list, or file path)
139
    - candidate_data: Candidate data for semantic matching/retrieval
140
    - id_mappings: ID-to-content mappings
141
    - as_pandas: Return results as pandas DataFrame/Series
142
    - realtime: Use realtime inference optimization
143
    - save_results: Save prediction results to disk
144
    - **kwargs: Additional arguments (e.g., as_coco for object detection)
145
    
146
    Returns:
147
    Predictions in format appropriate for the task type
148
    """
149

150
def predict_proba(
151
    self,
152
    data,
153
    candidate_data = None,
154
    id_mappings: dict = None,
155
    as_pandas: bool = None,
156
    as_multiclass: bool = True,
157
    realtime: bool = False
158
):
159
    """
160
    Generate prediction probabilities for classification tasks.
161
    
162
    Parameters:
163
    - data: Input data
164
    - candidate_data: Candidate data for retrieval tasks
165
    - id_mappings: ID-to-content mappings
166
    - as_pandas: Return results as pandas DataFrame
167
    - as_multiclass: Return all class probabilities vs positive class only
168
    - realtime: Use realtime inference optimization
169
    
170
    Returns:
171
    Prediction probabilities as DataFrame or numpy array
172
    """
173
```
174

175
### Feature Extraction
176

177
Extract embeddings and features from multimodal data for downstream tasks.
178

179
```python { .api }
180
def extract_embedding(
181
    self,
182
    data,
183
    id_mappings: dict = None,
184
    return_masks: bool = False,
185
    as_tensor: bool = False,
186
    as_pandas: bool = False,
187
    realtime: bool = False,
188
    signature: str = None
189
):
190
    """
191
    Extract feature embeddings from multimodal data.
192
    
193
    Parameters:
194
    - data: Input data (DataFrame, dict, or list)
195
    - id_mappings: ID-to-content mappings
196
    - return_masks: Return attention masks for missing data
197
    - as_tensor: Return PyTorch tensors
198
    - as_pandas: Return pandas DataFrame
199
    - realtime: Use realtime inference optimization
200
    - signature: Signature type for semantic matching ('query' or 'response')
201
    
202
    Returns:
203
    Feature embeddings as numpy array, tensor, or DataFrame
204
    """
205
```
206

207
### Model Evaluation
208

209
Evaluate multimodal model performance with task-specific metrics.
210

211
```python { .api }
212
def evaluate(
213
    self,
214
    data,
215
    query_data: list = None,
216
    response_data: list = None,
217
    id_mappings: dict = None,
218
    metrics: list = None,
219
    chunk_size: int = 1024,
220
    similarity_type: str = "cosine",
221
    cutoffs: list = [1, 5, 10],
222
    label: str = None,
223
    return_pred: bool = False,
224
    realtime: bool = False,
225
    eval_tool: str = None,
226
    predictions: list = None,
227
    labels = None
228
):
229
    """
230
    Evaluate multimodal model performance.
231
    
232
    Parameters:
233
    - data: Test data (DataFrame, dict, list, or annotation file path)
234
    - query_data: Query data for ranking evaluation
235
    - response_data: Response data for ranking evaluation  
236
    - id_mappings: ID-to-content mappings
237
    - metrics: List of evaluation metrics
238
    - chunk_size: Batch size for similarity computation
239
    - similarity_type: Similarity function ('cosine', 'dot_prod')
240
    - cutoffs: Cutoff values for ranking metrics
241
    - label: Label column name
242
    - return_pred: Return individual predictions
243
    - realtime: Use realtime inference
244
    - eval_tool: Evaluation tool for object detection ('pycocotools', 'torchmetrics')
245
    - predictions: Pre-computed predictions
246
    - labels: Pre-computed labels
247
    
248
    Returns:
249
    dict: Evaluation metrics and optionally predictions
250
    """
251
```
252

253
### Model Management
254

255
Save, load, and export multimodal models for deployment.
256

257
```python { .api }
258
def save(self, path: str, standalone: bool = True):
259
    """
260
    Save trained predictor to disk.
261
    
262
    Parameters:
263
    - path: Directory to save predictor
264
    - standalone: Save complete model for offline deployment
265
    """
266

267
@classmethod
268
def load(
269
    cls,
270
    path: str,
271
    resume: bool = False,
272
    verbosity: int = 3
273
):
274
    """
275
    Load saved predictor from disk.
276
    
277
    Parameters:
278
    - path: Directory containing saved predictor
279
    - resume: Resume training from checkpoint
280
    - verbosity: Logging verbosity level
281
    
282
    Returns:
283
    MultiModalPredictor: Loaded predictor instance
284
    """
285

286
def export_onnx(
287
    self,
288
    data,
289
    path: str = None,
290
    batch_size: int = None,
291
    verbose: bool = False,
292
    opset_version: int = 16,
293
    truncate_long_and_double: bool = False
294
):
295
    """
296
    Export model to ONNX format for deployment.
297
    
298
    Parameters:
299
    - data: Sample data for tracing
300
    - path: Export path (if None, returns bytes)
301
    - batch_size: Batch size for export
302
    - verbose: Verbose export logging
303
    - opset_version: ONNX opset version
304
    - truncate_long_and_double: Truncate precision for compatibility
305
    
306
    Returns:
307
    Export path or ONNX model bytes
308
    """
309

310
def optimize_for_inference(self, providers: list = None):
311
    """
312
    Optimize model for faster inference using ONNX runtime.
313
    
314
    Parameters:
315
    - providers: ONNX execution providers
316
    
317
    Returns:
318
    Optimized ONNX module for inference
319
    """
320
```
321

322
### Advanced Features
323

324
Advanced functionality for specialized use cases and model analysis.
325

326
```python { .api }
327
def fit_summary(self, verbosity: int = 0, show_plot: bool = False):
328
    """
329
    Display training summary and model information.
330
    
331
    Parameters:
332
    - verbosity: Detail level (0-4)
333
    - show_plot: Show training plots
334
    
335
    Returns:
336
    dict: Training summary information
337
    """
338

339
def list_supported_models(self, pretrained: bool = True):
340
    """
341
    List supported models for the current problem type.
342
    
343
    Parameters:
344
    - pretrained: Show only models with pretrained weights
345
    
346
    Returns:
347
    list: Available model names
348
    """
349

350
def dump_model(self, save_path: str = None):
351
    """
352
    Export model weights and configs to local directory.
353
    
354
    Parameters:
355
    - save_path: Directory to save model files
356
    """
357

358
def set_num_gpus(self, num_gpus: int):
359
    """
360
    Set number of GPUs for training/inference.
361
    
362
    Parameters:
363
    - num_gpus: Number of GPUs to use
364
    """
365
```
366

367
### Properties
368

369
Access model and training information through properties.
370

371
```python { .api }
372
@property
373
def problem_type(self) -> str:
374
    """Type of ML problem (classification, object_detection, etc.)"""
375

376
@property
377
def label(self) -> str:
378
    """Name of target label column"""
379

380
@property
381
def eval_metric(self) -> str:
382
    """Evaluation metric used for model selection"""
383

384
@property
385
def class_labels(self) -> list:
386
    """Original class label names for classification"""
387

388
@property
389
def positive_class(self):
390
    """Positive class label for binary classification"""
391

392
@property
393
def total_parameters(self) -> int:
394
    """Total number of model parameters"""
395

396
@property
397
def trainable_parameters(self) -> int:
398
    """Number of trainable model parameters"""
399

400
@property
401
def model_size(self) -> float:
402
    """Model size in megabytes"""
403
```
404

405
## Usage Examples
406

407
### Text and Image Classification
408

409
```python
410
from autogluon.multimodal import MultiModalPredictor
411
import pandas as pd
412

413
# Prepare multimodal dataset
414
data = pd.DataFrame({
415
    'image_path': ['img1.jpg', 'img2.jpg', 'img3.jpg'],
416
    'text_content': ['Product description 1', 'Product description 2', 'Product description 3'],
417
    'price': [10.5, 25.0, 15.5],
418
    'category': ['A', 'B', 'A']
419
})
420

421
# Train multimodal classifier
422
predictor = MultiModalPredictor(
423
    label='category',
424
    problem_type='multiclass',
425
    presets='high_quality'
426
)
427

428
predictor.fit(
429
    data,
430
    time_limit=3600,
431
    column_types={
432
        'image_path': 'image_path',
433
        'text_content': 'text',
434
        'price': 'numerical'
435
    }
436
)
437

438
# Make predictions
439
predictions = predictor.predict(test_data)
440
probabilities = predictor.predict_proba(test_data)
441

442
# Extract embeddings
443
embeddings = predictor.extract_embedding(test_data)
444
print(f"Embedding shape: {embeddings.shape}")
445
```
446

447
### Object Detection
448

449
```python
450
# Object detection on image data
451
detector = MultiModalPredictor(
452
    problem_type='object_detection',
453
    presets='medium_quality',  
454
    classes=['person', 'car', 'bicycle'],  # Target classes
455
    path='./detection_models'
456
)
457

458
# Train on COCO-format data
459
detector.fit(
460
    train_data='train_annotations.json',  # COCO format
461
    time_limit=7200
462
)
463

464
# Predict bounding boxes
465
detections = detector.predict(
466
    'test_images/',
467
    save_results=True,
468
    as_coco=True  # Return COCO format results
469
)
470

471
# Evaluate with COCO metrics
472
metrics = detector.evaluate(
473
    'test_annotations.json',
474
    eval_tool='pycocotools'
475
)
476
print(f"mAP: {metrics['map']:.3f}")
477
```
478

479
### Semantic Text Matching
480

481
```python
482
# Text similarity for semantic matching
483
matcher = MultiModalPredictor(
484
    problem_type='text_similarity',
485
    query='question',
486
    response='answer',
487
    match_label='relevant'
488
)
489

490
# Train on query-response pairs
491
matcher.fit(qa_data, time_limit=1800)
492

493
# Find similar documents
494
query_data = ['What is machine learning?']
495
candidate_data = [
496
    'ML is a subset of AI',
497
    'Python is a programming language', 
498
    'Deep learning uses neural networks'
499
]
500

501
similarities = matcher.predict(
502
    data=query_data,
503
    candidate_data=candidate_data
504
)
505
print("Most similar:", candidate_data[similarities.argmax()])
506
```

Version

Tile

Files

multimodal.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

multimodal.mddocs/