# Pipelines

Pipelines are a high-level, task-oriented interface for common machine learning operations. They abstract away model selection, preprocessing, and postprocessing, giving direct access to pretrained models across text, vision, audio, and multimodal domains.

## Capabilities

### Pipeline Factory

Central factory function that creates task-specific pipeline instances with automatic model and tokenizer selection.

```python { .api }
def pipeline(
    task: str = None,
    model: str = None,
    config: str = None,
    tokenizer: str = None,
    feature_extractor: str = None,
    image_processor: str = None,
    processor: str = None,
    framework: str = None,
    revision: str = None,
    use_fast: bool = True,
    token: Union[str, bool] = None,
    device: Union[int, str, torch.device] = None,
    device_map: Union[str, Dict] = None,
    dtype: Union[str, torch.dtype] = "auto",
    trust_remote_code: bool = False,
    model_kwargs: Dict[str, Any] = None,
    pipeline_class = None,
    **kwargs
) -> Pipeline:
    """
    Create a pipeline for a specific ML task.

    Args:
        task: The task name (e.g., "text-classification", "question-answering")
        model: Model name or path (defaults to task-specific default)
        config: Configuration name or path (auto-detected if None)
        tokenizer: Tokenizer name or path (defaults to model's tokenizer)
        feature_extractor: Feature extractor for audio/vision tasks
        image_processor: Image processor for vision tasks
        processor: Processor combining tokenizer and feature extraction
        framework: "pt" (PyTorch), "tf" (TensorFlow), or auto-detect
        revision: Model revision/branch to use
        use_fast: Use fast tokenizer implementation when available
        token: Hugging Face authentication token
        device: Device for inference (int, str, or torch.device)
        device_map: Advanced device mapping for multi-GPU
        dtype: Data type for model weights ("auto", torch.float16, etc.)
        trust_remote_code: Allow custom code from model repos
        model_kwargs: Additional arguments for model initialization
        pipeline_class: Custom pipeline class to use

    Returns:
        Task-specific pipeline instance
    """
```
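
As a minimal sketch of how the factory arguments fit together, the helper below collects only the options a caller actually sets, so every omitted argument falls back to the factory default listed above. The names `factory_kwargs` and `load_pipeline` are hypothetical and not part of the library. `transformers` is imported lazily so the argument handling can be inspected without loading a model.

```python
from typing import Any, Dict, Optional


def factory_kwargs(
    task: str,
    model: Optional[str] = None,
    device: Optional[Any] = None,
    dtype: Optional[Any] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Build keyword arguments for pipeline(), dropping anything left as None."""
    candidates = {"task": task, "model": model, "device": device, "dtype": dtype, **extra}
    return {name: value for name, value in candidates.items() if value is not None}


def load_pipeline(**kwargs: Any):
    """Create the pipeline; assumes the transformers package is installed."""
    from transformers import pipeline  # imported lazily on purpose

    return pipeline(**kwargs)


kwargs = factory_kwargs("text-classification", device="cpu")
print(kwargs)  # {'task': 'text-classification', 'device': 'cpu'}
```

Calling `load_pipeline(**kwargs)` would then hand these arguments to the factory, which downloads the task's default model on first use.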

### Text Classification

Classify text into predefined categories with confidence scores and label mapping.

```python { .api }
class TextClassificationPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[str, List[str]],
        top_k: int = None,
        function_to_apply: str = "default"
    ) -> Union[Dict, List[Dict]]:
        """
        Classify input text(s).

        Args:
            inputs: Text string or list of strings to classify
            top_k: Return top-k predictions (None for all)
            function_to_apply: "default", "softmax", "sigmoid", or "none"

        Returns:
            Dictionary with 'label' and 'score' keys, or list of such dicts
        """
```

Usage example:
```python
from transformers import pipeline

classifier = pipeline("text-classification")
result = classifier("I love this movie!")
# Output: [{'label': 'POSITIVE', 'score': 0.9998}]

# Multi-class with top-k
classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-emotion")
results = classifier("I'm so excited about this!", top_k=3)
# Output: [{'label': 'joy', 'score': 0.8}, {'label': 'optimism', 'score': 0.15}, ...]
```
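
To make the `function_to_apply` and `top_k` options more concrete, here is a small pure-Python sketch of what they do to a model's raw logits. It is an illustration under simplifying assumptions, not the library's implementation: the helper names are hypothetical, and the "default" option (which depends on the model configuration) is not modeled.

```python
import math
from typing import Dict, List, Optional


def apply_function(logits: List[float], function_to_apply: str) -> List[float]:
    """Turn raw logits into scores with softmax, sigmoid, or no transformation."""
    if function_to_apply == "softmax":
        shift = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if function_to_apply == "sigmoid":
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"unsupported function_to_apply: {function_to_apply!r}")


def top_k_labels(
    logits: List[float],
    labels: List[str],
    function_to_apply: str = "softmax",
    top_k: Optional[int] = None,
) -> List[Dict]:
    """Score every label, sort by descending score, and keep the first top_k."""
    scores = apply_function(logits, function_to_apply)
    ranked = sorted(
        ({"label": label, "score": score} for label, score in zip(labels, scores)),
        key=lambda item: item["score"],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


ranked = top_k_labels([2.0, 0.5, -1.0], ["joy", "optimism", "anger"], top_k=2)
print([item["label"] for item in ranked])  # ['joy', 'optimism']
```

Softmax scores sum to 1 across labels, which suits single-label problems. Sigmoid scores each label independently, which suits multi-label problems.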

### Token Classification

Identify and classify individual tokens in text for named entity recognition, part-of-speech tagging, and other token-level tasks.

```python { .api }
class TokenClassificationPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[str, List[str]],
        aggregation_strategy: str = "simple",
        ignore_labels: List[str] = None
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Classify tokens in input text(s).

        Args:
            inputs: Text string or list of strings
            aggregation_strategy: "simple", "first", "average", "max", or "none"
            ignore_labels: Labels to filter out from results

        Returns:
            List of entity dictionaries with 'entity' ('entity_group' when aggregated),
            'score', 'word', 'start', 'end', and 'index' (only without aggregation)
        """
```

Usage example:
```python
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
entities = ner("Apple Inc. was founded by Steve Jobs in Cupertino.")
# Output: [
#   {'entity_group': 'ORG', 'score': 0.999, 'word': 'Apple Inc.', 'start': 0, 'end': 10},
#   {'entity_group': 'PER', 'score': 0.998, 'word': 'Steve Jobs', 'start': 26, 'end': 36},
#   {'entity_group': 'LOC', 'score': 0.992, 'word': 'Cupertino', 'start': 40, 'end': 49}
# ]
```
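
The effect of aggregation can be pictured with a simplified sketch that merges consecutive token predictions of the same entity type into one span. This is not the library's code: the function name `group_entities` is hypothetical, the input is assumed to be per-token dictionaries with BIO-style tags and no "O" tokens, and scores are simply averaged.

```python
from typing import Dict, List


def group_entities(tokens: List[Dict]) -> List[Dict]:
    """Merge consecutive tokens that share an entity type into one span.

    Each token dict is assumed to carry 'entity' (e.g. 'B-PER'), 'score',
    'word', 'start', and 'end'. A 'B-' tag always opens a new group.
    """
    groups: List[List[Dict]] = []
    for token in tokens:
        prefix, _, label = token["entity"].partition("-")
        same_type = bool(groups) and groups[-1][-1]["entity"].partition("-")[2] == label
        if same_type and prefix != "B":
            groups[-1].append(token)
        else:
            groups.append([token])
    return [
        {
            "entity_group": group[0]["entity"].partition("-")[2],
            "score": sum(t["score"] for t in group) / len(group),
            "word": " ".join(t["word"] for t in group),
            "start": group[0]["start"],
            "end": group[-1]["end"],
        }
        for group in groups
    ]


# Hand-written token predictions for illustration, not real model output.
tokens = [
    {"entity": "B-PER", "score": 0.99, "word": "Steve", "start": 26, "end": 31},
    {"entity": "I-PER", "score": 0.97, "word": "Jobs", "start": 32, "end": 36},
    {"entity": "B-LOC", "score": 0.99, "word": "Cupertino", "start": 40, "end": 49},
]
merged = group_entities(tokens)
print([m["word"] for m in merged])  # ['Steve Jobs', 'Cupertino']
```

The merged span keeps the `start` of its first token and the `end` of its last, so the offsets still index into the original input string.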

### Question Answering

Extract answers from context text given a question, with confidence scores and answer span positions.

```python { .api }
class QuestionAnsweringPipeline(Pipeline):
    def __call__(
        self,
        question: str,
        context: str,
        top_k: int = 1,
        doc_stride: int = 128,
        max_answer_len: int = 15,
        max_seq_len: int = 384,
        max_question_len: int = 64,
        handle_impossible_answer: bool = False
    ) -> Union[Dict, List[Dict]]:
        """
        Extract answers from context given a question.

        Args:
            question: Question to answer
            context: Context text containing the answer
            top_k: Number of answers to return
            doc_stride: Overlap between context chunks
            max_answer_len: Maximum answer length in tokens
            max_seq_len: Maximum sequence length
            max_question_len: Maximum question length
            handle_impossible_answer: Allow "impossible to answer" responses

        Returns:
            Dictionary with 'answer', 'score', 'start', 'end' keys
        """
```

Usage example:
```python
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="Where was Apple founded?",
    context="Apple Inc. was founded by Steve Jobs in Cupertino, California."
)
# Output: {'answer': 'Cupertino, California', 'score': 0.95, 'start': 40, 'end': 61}
```
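
When a context is longer than the model can read at once, it is split into chunks that overlap by `doc_stride` tokens so an answer straddling a boundary still appears whole in at least one chunk. The sketch below shows that windowing on token positions only. The function name `context_windows` and the window size of 320 tokens (roughly `max_seq_len` minus room for the question) are assumptions for illustration.

```python
from typing import List, Tuple


def context_windows(num_tokens: int, window: int, doc_stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token spans covering the context with overlapping windows."""
    if not 0 <= doc_stride < window:
        raise ValueError("doc_stride must be non-negative and smaller than the window")
    spans = []
    start = 0
    while True:
        end = min(start + window, num_tokens)
        spans.append((start, end))
        if end == num_tokens:
            break
        start += window - doc_stride  # step forward, keeping doc_stride tokens of overlap
    return spans


spans = context_windows(num_tokens=700, window=320, doc_stride=128)
print(spans)  # [(0, 320), (192, 512), (384, 700)]
```

Each chunk is scored separately and the best-scoring answer spans across chunks are returned, up to `top_k`.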

### Text Generation

Generate text continuations using autoregressive language models with extensive control over generation parameters.

```python { .api }
class TextGenerationPipeline(Pipeline):
    def __call__(
        self,
        text_inputs: Union[str, List[str]],
        return_full_text: bool = True,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Generate text continuations.

        Args:
            text_inputs: Input text(s) to continue
            return_full_text: Include input in output
            clean_up_tokenization_spaces: Clean tokenization artifacts
            **generate_kwargs: Additional generation parameters (max_length, temperature, etc.)

        Returns:
            List of dictionaries with 'generated_text' key
        """
```

Usage example:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "The future of artificial intelligence is",
    max_length=50,
    num_return_sequences=2,
    do_sample=True,  # sampling is needed for temperature and multiple distinct sequences
    temperature=0.8
)
# Output: [
#   {'generated_text': 'The future of artificial intelligence is bright and full of possibilities...'},
#   {'generated_text': 'The future of artificial intelligence is uncertain but promising...'}
# ]
```

### Text Summarization

Generate concise summaries of longer texts using sequence-to-sequence models.

```python { .api }
class SummarizationPipeline(Pipeline):
    def __call__(
        self,
        documents: Union[str, List[str]],
        return_text: bool = True,
        return_tensors: bool = False,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Summarize input documents.

        Args:
            documents: Text(s) to summarize
            return_text: Return text summaries
            return_tensors: Return tensor outputs
            clean_up_tokenization_spaces: Clean tokenization artifacts
            **generate_kwargs: Generation parameters (max_length, min_length, etc.)

        Returns:
            List of dictionaries with 'summary_text' key
        """
```
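
A possible way to call the summarization pipeline and unpack its results is sketched below. The helper names `summary_texts` and `summarize` are hypothetical, the length limits are arbitrary, and `summarize` needs `transformers` installed plus a model download, so only the result handling is exercised here on a hand-written result.

```python
from typing import Dict, List


def summary_texts(outputs: List[Dict]) -> List[str]:
    """Pull the 'summary_text' field out of each result dictionary."""
    return [item["summary_text"] for item in outputs]


def summarize(documents: List[str], max_length: int = 60, min_length: int = 10) -> List[str]:
    """Run the default summarization pipeline on a list of documents."""
    from transformers import pipeline  # imported lazily

    summarizer = pipeline("summarization")
    outputs = summarizer(documents, max_length=max_length, min_length=min_length)
    return summary_texts(outputs)


# Shape check on a hand-written result; the text is made up, not model output.
fake_outputs = [{"summary_text": "Pipelines wrap models."}]
print(summary_texts(fake_outputs))  # ['Pipelines wrap models.']
```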

### Translation

Translate text between languages using sequence-to-sequence models.

```python { .api }
class TranslationPipeline(Pipeline):
    def __call__(
        self,
        text: Union[str, List[str]],
        return_text: bool = True,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Translate input text.

        Args:
            text: Text(s) to translate
            return_text: Return translated text
            clean_up_tokenization_spaces: Clean tokenization artifacts
            **generate_kwargs: Generation parameters

        Returns:
            List of dictionaries with 'translation_text' key
        """
```

### Image Classification

Classify images into predefined categories with confidence scores.

```python { .api }
class ImageClassificationPipeline(Pipeline):
    def __call__(
        self,
        images: Union[str, "PIL.Image", List],
        top_k: int = 5
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Classify input image(s).

        Args:
            images: Image path, PIL Image, or list of images
            top_k: Number of top predictions to return

        Returns:
            List of dictionaries with 'label' and 'score' keys
        """
```

### Object Detection

Detect and locate objects in images with bounding boxes and confidence scores.

```python { .api }
class ObjectDetectionPipeline(Pipeline):
    def __call__(
        self,
        images: Union[str, "PIL.Image", List],
        threshold: float = 0.9
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Detect objects in image(s).

        Args:
            images: Image path, PIL Image, or list of images
            threshold: Confidence threshold for detections

        Returns:
            List of dictionaries with 'score', 'label', 'box' keys
        """
```
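
Detections often need a second pass after the pipeline returns, for example re-filtering at a stricter confidence or ranking by box size. The sketch below works on plain result dictionaries. It assumes each 'box' holds `xmin`, `ymin`, `xmax`, and `ymax` pixel coordinates, which the signature above does not spell out, and the helper names are hypothetical.

```python
from typing import Dict, List


def filter_detections(detections: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Keep detections scoring above the threshold, highest score first."""
    kept = [d for d in detections if d["score"] > threshold]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def box_area(box: Dict[str, int]) -> int:
    """Area in pixels of a box given as xmin/ymin/xmax/ymax (assumed layout)."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height


# Hand-written detections for illustration, not real model output.
detections = [
    {"score": 0.95, "label": "cat", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.42, "label": "dog", "box": {"xmin": 0, "ymin": 0, "xmax": 50, "ymax": 50}},
    {"score": 0.98, "label": "remote", "box": {"xmin": 5, "ymin": 5, "xmax": 25, "ymax": 15}},
]
confident = filter_detections(detections, threshold=0.9)
print([d["label"] for d in confident])  # ['remote', 'cat']
print(box_area(detections[0]["box"]))   # 20000
```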

### Automatic Speech Recognition

Convert speech audio to text with support for various audio formats and languages.

```python { .api }
class AutomaticSpeechRecognitionPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[np.ndarray, bytes, str],
        return_timestamps: Union[bool, str] = False,
        generate_kwargs: Dict = None
    ) -> Union[Dict, List[Dict]]:
        """
        Transcribe speech to text.

        Args:
            inputs: Audio array, bytes, or file path
            return_timestamps: Include timestamps (e.g. True or "word", depending on the model)
            generate_kwargs: Additional generation parameters

        Returns:
            Dictionary with 'text' key and optional timestamps
        """
```
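
When timestamps are requested, the transcription can be turned into caption-style lines. The sketch below assumes the result carries a 'chunks' list whose items have a 'text' string and a 'timestamp' pair of (start, end) seconds, with the end possibly missing on the last chunk. That layout and the helper name `format_chunks` are assumptions, since the section above only promises "optional timestamps".

```python
from typing import Dict, List


def format_chunks(result: Dict) -> List[str]:
    """Render timestamped chunks as '[start - end] text' lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        end_label = f"{end:.2f}s" if end is not None else "end"
        lines.append(f"[{start:.2f}s - {end_label}] {chunk['text'].strip()}")
    return lines


# Hand-written result for illustration, not real model output.
result = {
    "text": "hello world",
    "chunks": [
        {"text": " hello", "timestamp": (0.0, 0.5)},
        {"text": " world", "timestamp": (0.5, None)},
    ],
}
for line in format_chunks(result):
    print(line)
# [0.00s - 0.50s] hello
# [0.50s - end] world
```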

### Zero-Shot Classification

Classify text into arbitrary categories without task-specific training.

```python { .api }
class ZeroShotClassificationPipeline(Pipeline):
    def __call__(
        self,
        sequences: Union[str, List[str]],
        candidate_labels: List[str],
        hypothesis_template: str = "This example is {}.",
        multi_label: bool = False
    ) -> Union[Dict, List[Dict]]:
        """
        Classify text into arbitrary categories.

        Args:
            sequences: Text(s) to classify
            candidate_labels: Possible classification labels
            hypothesis_template: Template for label hypotheses
            multi_label: Allow multiple labels per input

        Returns:
            Dictionary with 'sequence', 'labels', 'scores' keys
        """
```

Usage example:
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "This is a movie review about a great film.",
    candidate_labels=["movie", "sports", "technology", "politics"]
)
# Output: {
#   'sequence': 'This is a movie review about a great film.',
#   'labels': ['movie', 'technology', 'politics', 'sports'],
#   'scores': [0.85, 0.08, 0.04, 0.03]
# }
```
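
The `hypothesis_template` parameter is easiest to understand by expanding it by hand: each candidate label is substituted into the template, and the resulting sentence is what the model compares against the input sequence. A minimal sketch of that expansion, with the hypothetical helper name `build_hypotheses`:

```python
from typing import List


def build_hypotheses(
    candidate_labels: List[str],
    hypothesis_template: str = "This example is {}.",
) -> List[str]:
    """Fill the template once per candidate label."""
    return [hypothesis_template.format(label) for label in candidate_labels]


hypotheses = build_hypotheses(["movie", "sports"])
print(hypotheses)  # ['This example is movie.', 'This example is sports.']

# A domain-specific template usually reads more naturally.
print(build_hypotheses(["politics"], "This text is about {}."))  # ['This text is about politics.']
```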

### Fill Mask

Predict masked tokens in text using masked language models.

```python { .api }
class FillMaskPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[str, List[str]],
        top_k: int = 5
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Fill masked tokens in text.

        Args:
            inputs: Text containing the model's mask token (e.g. [MASK] for BERT), or list of such texts
            top_k: Number of predictions per mask

        Returns:
            List of dictionaries with 'score', 'token', 'token_str', 'sequence' keys
        """
```

### Image Text To Text

Generate text descriptions from images with optional text prompts, supporting multimodal understanding tasks.

```python { .api }
class ImageTextToTextPipeline(Pipeline):
    def __call__(
        self,
        images,
        prompt: str = None,
        **kwargs
    ) -> Union[str, List[str]]:
        """
        Generate text from images with optional prompts.

        Args:
            images: Single image or list of images (PIL, numpy array, or paths)
            prompt: Optional text prompt to guide generation

        Returns:
            Generated text string or list of strings
        """
```

### Video Classification

Classify video content into predefined categories with temporal understanding.

```python { .api }
class VideoClassificationPipeline(Pipeline):
    def __call__(
        self,
        videos,
        top_k: int = 5
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Classify video content.

        Args:
            videos: Video file path(s) or video tensor(s)
            top_k: Number of top predictions to return

        Returns:
            List of classification results with 'label' and 'score'
        """
```

### Depth Estimation

Estimate depth information from single images for 3D scene understanding.

```python { .api }
class DepthEstimationPipeline(Pipeline):
    def __call__(
        self,
        images
    ) -> Union[Dict, List[Dict]]:
        """
        Estimate depth from images.

        Args:
            images: Single image or list of images

        Returns:
            Dictionary with 'predicted_depth' and 'depth' keys
        """
```

### Conversational

Engage in multi-turn conversations with context-aware response generation.

```python { .api }
class ConversationalPipeline(Pipeline):
    def __call__(
        self,
        conversations,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[Conversation, List[Conversation]]:
        """
        Generate conversational responses.

        Args:
            conversations: Conversation object(s) with history
            clean_up_tokenization_spaces: Remove extra spaces in output
            **generate_kwargs: Additional generation parameters

        Returns:
            Updated Conversation object(s) with new responses
        """
```

## Pipeline Base Class

All pipelines inherit from the base Pipeline class:

```python { .api }
class Pipeline:
    def __init__(
        self,
        model: PreTrainedModel,
        tokenizer: PreTrainedTokenizer = None,
        feature_extractor = None,
        modelcard: ModelCard = None,
        framework: str = None,
        task: str = "",
        args_parser = None,
        device: int = -1,
        torch_dtype = None,
        binary_output: bool = False
    )

    def save_pretrained(
        self,
        save_directory: str,
        safe_serialization: bool = True,
        **kwargs
    ) -> None:
        """Save pipeline components to directory."""

    def __call__(self, inputs, **kwargs):
        """Process inputs through the pipeline."""

    def predict(self, inputs, **kwargs):
        """Alias for __call__."""

    def transform(self, inputs, **kwargs):
        """Alias for __call__."""
```

## Available Pipeline Tasks

Complete list of supported pipeline tasks:

- **Text**: "text-classification", "token-classification", "question-answering", "fill-mask", "summarization", "translation", "text2text-generation", "text-generation", "zero-shot-classification", "conversational"
- **Vision**: "image-classification", "image-segmentation", "image-to-text", "image-to-image", "object-detection", "depth-estimation", "zero-shot-image-classification", "zero-shot-object-detection", "keypoint-matching", "mask-generation"
- **Audio**: "automatic-speech-recognition", "audio-classification", "text-to-audio", "zero-shot-audio-classification"
- **Video**: "video-classification"
- **Multimodal**: "visual-question-answering", "document-question-answering", "image-text-to-text", "feature-extraction"

When no model is specified, the factory selects a default model for the requested task.