# Pipelines

Pipelines provide a high-level, task-specific API for running machine learning models. The pipeline interface is the easiest way to use Transformers.js for most ML tasks: it handles model loading, preprocessing, and postprocessing automatically.

## Capabilities

### Main Pipeline Function

Creates a pipeline instance for a specific machine learning task. If no model is given, a default model for the task is selected.

```javascript { .api }
/**
 * Create a pipeline for a specific ML task
 * @param task - The task identifier (see supported tasks below)
 * @param model - Optional model name/path (uses default if not specified)
 * @param options - Configuration options for the pipeline
 * @returns Promise that resolves to a Pipeline instance
 */
async function pipeline(
  task: string,
  model?: string,
  options?: PipelineOptions
): Promise<Pipeline>;

interface PipelineOptions {
  /** Whether to use quantized version of the model (default: true) */
  quantized?: boolean;
  /** Callback function to track model download progress */
  progress_callback?: (progress: any) => void;
  /** Custom model configuration */
  config?: any;
  /** Directory to cache downloaded models */
  cache_dir?: string;
  /** Only use local files, don't download from remote */
  local_files_only?: boolean;
  /** Model revision/branch to use (default: 'main') */
  revision?: string;
  /** Specific model file name to use */
  model_file_name?: string;
}
```

**Usage Examples:**

```javascript
import { pipeline } from "@xenova/transformers";

// Basic usage with default model
const classifier = await pipeline("sentiment-analysis");
const result = await classifier("I love this library!");
// Output: [{ label: 'POSITIVE', score: 0.999 }]

// Custom model specification
const translator = await pipeline("translation", "Xenova/opus-mt-en-de");
const translation = await translator("Hello world");

// With custom options
const generator = await pipeline("text-generation", "gpt2", {
  quantized: false,
  progress_callback: (progress) => console.log(progress),
});
```
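
Because creating a pipeline loads (and possibly downloads) a model, applications often create each pipeline once and reuse it. The following is a minimal sketch of that pattern, not part of the library: `createLazyLoader` is a hypothetical helper, and `fakeFactory` stands in for `() => pipeline("sentiment-analysis")` so the example runs offline.

```javascript
// Hypothetical helper: wraps any async factory so that it runs at most once.
function createLazyLoader(factory) {
  let pending = null; // cached promise shared by every caller
  return function load() {
    if (pending === null) {
      pending = factory().catch((err) => {
        pending = null; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in for `() => pipeline("sentiment-analysis")` (assumption for this sketch).
let factoryCalls = 0;
const fakeFactory = async () => {
  factoryCalls += 1;
  return async (text) => [{ label: "POSITIVE", score: 1 }];
};

const getClassifier = createLazyLoader(fakeFactory);
const first = getClassifier();
const second = getClassifier(); // reuses the same pending promise
```

Caching the promise rather than the resolved pipeline means concurrent callers share one load instead of starting several.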
62
63
### Text Processing Tasks
64
65
#### Text Classification
66
67
Classify text into predefined categories (sentiment analysis, topic classification, etc.).
68
69
```javascript { .api }
70
interface TextClassificationPipeline {
71
(
72
texts: string | string[],
73
options?: {
74
top_k?: number;
75
function_to_apply?: string;
76
}
77
): Promise<Array<{
78
label: string;
79
score: number;
80
}>>;
81
}
82
```
83
84
**Supported Task Names:** `"text-classification"`, `"sentiment-analysis"`
85
86
**Usage Example:**
87
88
```javascript
89
const classifier = await pipeline("sentiment-analysis");
90
const results = await classifier(["I love this!", "This is terrible"]);
91
// Results: [
92
// [{ label: 'POSITIVE', score: 0.999 }],
93
// [{ label: 'NEGATIVE', score: 0.998 }]
94
// ]
95
```
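
The nesting of the result can depend on `top_k` and on whether a single string or an array was passed, so downstream code may want to normalize it. The sketch below assumes only the `{ label, score }` shape documented above; `topLabels` is a hypothetical helper, and the sample arrays are hand-written rather than real model output.

```javascript
// Hypothetical helper: returns the highest-scoring { label, score } per input,
// whether the pipeline returned a flat array or one array per input.
function topLabels(results) {
  return results.map((entry) => {
    const candidates = Array.isArray(entry) ? entry : [entry];
    return candidates.reduce((best, c) => (c.score > best.score ? c : best));
  });
}

// Hand-written sample data in both shapes (not real model output).
const nested = [
  [{ label: "POSITIVE", score: 0.999 }],
  [{ label: "NEGATIVE", score: 0.998 }],
];
const flat = [
  { label: "POSITIVE", score: 0.999 },
  { label: "NEGATIVE", score: 0.998 },
];

const fromNested = topLabels(nested).map((r) => r.label);
const fromFlat = topLabels(flat).map((r) => r.label);
```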

#### Token Classification

Classify individual tokens (Named Entity Recognition, Part-of-Speech tagging).

```javascript { .api }
interface TokenClassificationPipeline {
  (
    texts: string | string[],
    options?: {
      aggregation_strategy?: string;
      ignore_labels?: string[];
    }
  ): Promise<Array<{
    entity: string;
    score: number;
    index: number;
    word: string;
    start: number;
    end: number;
  }>>;
}
```

**Supported Task Names:** `"token-classification"`, `"ner"`
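
Token-level results are often merged into whole entities before use. The sketch below shows one way to do that by hand; it assumes BIO-style labels (`B-PER`, `I-PER`) and WordPiece-style `##` continuation pieces, neither of which is guaranteed for every model. `groupEntities` and the `entity_group` field name are hypothetical, and the sample tokens are hand-written.

```javascript
// Hypothetical helper: merges B-/I- tagged tokens into whole entities.
// Assumes BIO labels and WordPiece-style "##" continuation pieces.
function groupEntities(tokens) {
  const groups = [];
  for (const t of tokens) {
    const [prefix, type] = t.entity.includes("-")
      ? t.entity.split("-", 2)
      : ["B", t.entity];
    const last = groups[groups.length - 1];
    const continues =
      last !== undefined &&
      prefix === "I" &&
      last.type === type &&
      t.index === last.lastIndex + 1;
    if (continues) {
      // Glue sub-word pieces directly, separate whole words with a space.
      last.text += t.word.startsWith("##") ? t.word.slice(2) : " " + t.word;
      last.scores.push(t.score);
      last.lastIndex = t.index;
    } else {
      groups.push({ type, text: t.word, scores: [t.score], lastIndex: t.index });
    }
  }
  // Report the mean token score for each merged entity.
  return groups.map((g) => ({
    entity_group: g.type,
    word: g.text,
    score: g.scores.reduce((a, b) => a + b, 0) / g.scores.length,
  }));
}

// Hand-written sample tokens (not real model output).
const sampleTokens = [
  { entity: "B-PER", score: 0.99, index: 4, word: "Sarah" },
  { entity: "B-LOC", score: 0.98, index: 9, word: "New" },
  { entity: "I-LOC", score: 0.96, index: 10, word: "York" },
];
const entities = groupEntities(sampleTokens);
```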

#### Question Answering

Extract answers from context text based on questions.

```javascript { .api }
interface QuestionAnsweringPipeline {
  (
    question: string,
    context: string,
    options?: {
      top_k?: number;
    }
  ): Promise<{
    answer: string;
    score: number;
    start: number;
    end: number;
  }>;
}
```

**Supported Task Names:** `"question-answering"`

#### Fill Mask

Fill masked tokens in text.

```javascript { .api }
interface FillMaskPipeline {
  (
    texts: string | string[],
    options?: {
      top_k?: number;
    }
  ): Promise<Array<{
    score: number;
    token: number;
    token_str: string;
    sequence: string;
  }>>;
}
```

**Supported Task Names:** `"fill-mask"`

#### Text Generation

Generate text continuations from input prompts.

```javascript { .api }
interface TextGenerationPipeline {
  (
    texts: string | string[],
    options?: {
      max_new_tokens?: number;
      do_sample?: boolean;
      temperature?: number;
      top_k?: number;
      top_p?: number;
    }
  ): Promise<Array<{
    generated_text: string;
  }>>;
}
```

**Supported Task Names:** `"text-generation"`
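
To build intuition for `temperature` and `top_k`, the sketch below computes the distribution a sampler would draw from for a toy set of logits. It is a conceptual illustration under the usual definitions of these options, not the library's implementation; `samplingDistribution` is a hypothetical name. These options typically only matter when `do_sample` is `true`.

```javascript
// Conceptual sketch: softmax over logits after temperature scaling and top-k filtering.
function samplingDistribution(logits, { temperature = 1.0, top_k = 0 } = {}) {
  // Lower temperatures sharpen the distribution, higher ones flatten it.
  const scaled = logits.map((x) => x / temperature);

  // Keep only the top_k largest logits (0 keeps all; ties at the cutoff are kept).
  let cutoff = -Infinity;
  if (top_k > 0 && top_k < scaled.length) {
    cutoff = [...scaled].sort((a, b) => b - a)[top_k - 1];
  }

  // Numerically stable softmax over the surviving logits.
  const max = Math.max(...scaled);
  const weights = scaled.map((x) => (x >= cutoff ? Math.exp(x - max) : 0));
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => w / total);
}

const toyLogits = [2, 1, 0];
const baseline = samplingDistribution(toyLogits);
const sharper = samplingDistribution(toyLogits, { temperature: 0.5 });
const topTwo = samplingDistribution(toyLogits, { top_k: 2 });
```

With these toy logits, `sharper` puts more mass on the first token than `baseline`, and `topTwo` assigns zero probability to the third.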

#### Text-to-Text Generation

Generate text from text input (includes summarization, translation).

```javascript { .api }
interface Text2TextGenerationPipeline {
  (
    texts: string | string[],
    options?: {
      max_new_tokens?: number;
      do_sample?: boolean;
      temperature?: number;
    }
  ): Promise<Array<{
    generated_text: string;
  }>>;
}

interface SummarizationPipeline {
  (
    texts: string | string[],
    options?: {
      max_new_tokens?: number;
      min_new_tokens?: number;
    }
  ): Promise<Array<{
    summary_text: string;
  }>>;
}

interface TranslationPipeline {
  (
    texts: string | string[],
    options?: {
      max_new_tokens?: number;
    }
  ): Promise<Array<{
    translation_text: string;
  }>>;
}
```

**Supported Task Names:** `"text2text-generation"`, `"summarization"`, `"translation"`

#### Zero-Shot Classification

Classify text against candidate labels supplied at call time, without task-specific training examples.

```javascript { .api }
interface ZeroShotClassificationPipeline {
  (
    texts: string | string[],
    candidate_labels: string[],
    options?: {
      hypothesis_template?: string;
      multi_label?: boolean;
    }
  ): Promise<{
    sequence: string;
    labels: string[];
    scores: number[];
  }>;
}
```

**Supported Task Names:** `"zero-shot-classification"`
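
Zero-shot classifiers of this kind are usually built on natural language inference: each candidate label is turned into a hypothesis sentence via `hypothesis_template`, and the model scores how well the input text supports it. The sketch below illustrates only the template expansion step; `buildHypotheses` is a hypothetical helper, and the default template shown is an assumption based on the common Hugging Face default.

```javascript
// Hypothetical helper: expands each candidate label into a hypothesis sentence.
// The default template is an assumption, not confirmed by this document.
function buildHypotheses(candidateLabels, hypothesisTemplate = "This example is {}.") {
  // A replacer function avoids special handling of "$" sequences in labels.
  return candidateLabels.map((label) => hypothesisTemplate.replace("{}", () => label));
}

const hypotheses = buildHypotheses(["sports", "politics"]);
const custom = buildHypotheses(["urgent"], "The ticket priority is {}.");
```

A template that matches the domain of the input (for example, support tickets rather than generic "examples") can noticeably change the scores.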

#### Feature Extraction

Extract embeddings from text for similarity tasks.

```javascript { .api }
interface FeatureExtractionPipeline {
  (
    texts: string | string[],
    options?: {
      pooling?: string;
      normalize?: boolean;
      quantize?: boolean;
      precision?: string;
    }
  ): Promise<Tensor>;
}
```

**Supported Task Names:** `"feature-extraction"`, `"embeddings"`
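
The `pooling` and `normalize` options matter for similarity search: mean pooling collapses per-token vectors into one sentence vector, and L2 normalization makes the dot product equal to cosine similarity. The sketch below shows those steps on plain arrays, assuming the embeddings have already been converted from a tensor. The helper names are hypothetical, and real mean pooling would also skip padding tokens, which this sketch ignores.

```javascript
// Averages per-token vectors into a single sentence vector (ignores padding masks).
function meanPool(tokenVectors) {
  const dim = tokenVectors[0].length;
  const sum = new Array(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  }
  return sum.map((v) => v / tokenVectors.length);
}

// Scales a vector to unit length.
function l2Normalize(vec) {
  const norm = Math.hypot(...vec);
  return vec.map((v) => v / norm);
}

// For unit-length vectors, the dot product is the cosine similarity.
function dot(a, b) {
  return a.reduce((acc, v, i) => acc + v * b[i], 0);
}

// Toy 2-dimensional "token embeddings" (hand-written, not model output).
const sentenceA = l2Normalize(meanPool([[1, 2], [3, 4]]));
const sentenceB = l2Normalize(meanPool([[2, 3], [2, 3]]));
const similarity = dot(sentenceA, sentenceB);
```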

### Vision Processing Tasks

#### Image Classification

Classify images into predefined categories.

```javascript { .api }
interface ImageClassificationPipeline {
  (
    images: ImageInput | ImageInput[],
    options?: {
      top_k?: number;
    }
  ): Promise<Array<{
    label: string;
    score: number;
  }>>;
}
```

**Supported Task Names:** `"image-classification"`

#### Object Detection

Detect and locate objects in images.

```javascript { .api }
interface ObjectDetectionPipeline {
  (
    images: ImageInput | ImageInput[],
    options?: {
      threshold?: number;
      percentage?: boolean;
    }
  ): Promise<Array<{
    score: number;
    label: string;
    box: {
      xmin: number;
      ymin: number;
      xmax: number;
      ymax: number;
    };
  }>>;
}
```

**Supported Task Names:** `"object-detection"`
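
Detections usually need a little post-processing before they can be drawn. The sketch below filters by score and converts boxes to pixel coordinates, assuming that `percentage: true` yields coordinates as fractions between 0 and 1 of the image size (an assumption worth verifying against your library version). `keepConfident` and `toPixelBox` are hypothetical helpers, and the detections are hand-written.

```javascript
// Hypothetical helper: drops detections below a minimum score.
function keepConfident(detections, minScore) {
  return detections.filter((d) => d.score >= minScore);
}

// Hypothetical helper: converts a fractional box (0..1) to integer pixel coordinates.
function toPixelBox(box, width, height) {
  return {
    xmin: Math.round(box.xmin * width),
    ymin: Math.round(box.ymin * height),
    xmax: Math.round(box.xmax * width),
    ymax: Math.round(box.ymax * height),
  };
}

// Hand-written sample detections (not real model output).
const detections = [
  { score: 0.95, label: "cat", box: { xmin: 0.25, ymin: 0.5, xmax: 0.75, ymax: 1 } },
  { score: 0.3, label: "dog", box: { xmin: 0, ymin: 0, xmax: 0.1, ymax: 0.1 } },
];

const drawable = keepConfident(detections, 0.5).map((d) => ({
  label: d.label,
  box: toPixelBox(d.box, 640, 480),
}));
```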

#### Zero-Shot Object Detection

Detect objects in images using text descriptions.

```javascript { .api }
interface ZeroShotObjectDetectionPipeline {
  (
    images: ImageInput | ImageInput[],
    candidate_labels: string[],
    options?: {
      threshold?: number;
      percentage?: boolean;
    }
  ): Promise<Array<{
    score: number;
    label: string;
    box: {
      xmin: number;
      ymin: number;
      xmax: number;
      ymax: number;
    };
  }>>;
}
```

**Supported Task Names:** `"zero-shot-object-detection"`

#### Image Segmentation

Segment objects and regions in images.

```javascript { .api }
interface ImageSegmentationPipeline {
  (
    images: ImageInput | ImageInput[],
    options?: {
      threshold?: number;
      mask_threshold?: number;
      overlap_mask_area_threshold?: number;
    }
  ): Promise<Array<{
    score: number;
    label: string;
    mask: RawImage;
  }>>;
}
```

**Supported Task Names:** `"image-segmentation"`

#### Zero-Shot Image Classification

Classify images using text descriptions.

```javascript { .api }
interface ZeroShotImageClassificationPipeline {
  (
    images: ImageInput | ImageInput[],
    candidate_labels: string[],
    options?: {
      hypothesis_template?: string;
    }
  ): Promise<Array<{
    label: string;
    score: number;
  }>>;
}
```

**Supported Task Names:** `"zero-shot-image-classification"`

#### Image-to-Text

Generate text descriptions from images.

```javascript { .api }
interface ImageToTextPipeline {
  (
    images: ImageInput | ImageInput[],
    options?: {
      max_new_tokens?: number;
      do_sample?: boolean;
      temperature?: number;
    }
  ): Promise<Array<{
    generated_text: string;
  }>>;
}
```

**Supported Task Names:** `"image-to-text"`

#### Image-to-Image

Transform images (super-resolution, style transfer).

```javascript { .api }
interface ImageToImagePipeline {
  (
    images: ImageInput | ImageInput[]
  ): Promise<RawImage[]>;
}
```

**Supported Task Names:** `"image-to-image"`

#### Depth Estimation

Estimate depth maps from images.

```javascript { .api }
interface DepthEstimationPipeline {
  (
    images: ImageInput | ImageInput[]
  ): Promise<Array<{
    predicted_depth: Tensor;
    depth: RawImage;
  }>>;
}
```

**Supported Task Names:** `"depth-estimation"`

#### Image Feature Extraction

Extract embeddings from images.

```javascript { .api }
interface ImageFeatureExtractionPipeline {
  (
    images: ImageInput | ImageInput[],
    options?: {
      pool?: boolean;
      normalize?: boolean;
      quantize?: boolean;
      precision?: string;
    }
  ): Promise<Tensor>;
}
```

**Supported Task Names:** `"image-feature-extraction"`

### Audio Processing Tasks

#### Audio Classification

Classify audio content into categories.

```javascript { .api }
interface AudioClassificationPipeline {
  (
    audio: AudioInput | AudioInput[],
    options?: {
      top_k?: number;
    }
  ): Promise<Array<{
    label: string;
    score: number;
  }>>;
}
```

**Supported Task Names:** `"audio-classification"`

#### Zero-Shot Audio Classification

Classify audio using text descriptions.

```javascript { .api }
interface ZeroShotAudioClassificationPipeline {
  (
    audio: AudioInput | AudioInput[],
    candidate_labels: string[],
    options?: {
      hypothesis_template?: string;
    }
  ): Promise<Array<{
    label: string;
    score: number;
  }>>;
}
```

**Supported Task Names:** `"zero-shot-audio-classification"`

#### Automatic Speech Recognition

Convert speech to text.

```javascript { .api }
interface AutomaticSpeechRecognitionPipeline {
  (
    audio: AudioInput | AudioInput[],
    options?: {
      top_k?: number;
      hotwords?: string;
      language?: string;
      task?: string;
      return_timestamps?: boolean | string;
      chunk_length_s?: number;
      stride_length_s?: number;
    }
  ): Promise<{
    text: string;
    chunks?: Array<{
      text: string;
      timestamp: [number, number];
    }>;
  }>;
}
```

**Supported Task Names:** `"automatic-speech-recognition"`, `"asr"`
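
When `return_timestamps` is enabled, the `chunks` array maps naturally onto subtitle formats. The sketch below converts chunks of the documented shape into SubRip (SRT) text. `formatSrtTime` and `chunksToSrt` are hypothetical helpers, the sample chunks are hand-written, and the code assumes both timestamps are present numbers measured in seconds.

```javascript
// Formats a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
function formatSrtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Converts [{ text, timestamp: [start, end] }, ...] into SRT subtitle text.
function chunksToSrt(chunks) {
  return chunks
    .map((chunk, i) => {
      const [start, end] = chunk.timestamp;
      return `${i + 1}\n${formatSrtTime(start)} --> ${formatSrtTime(end)}\n${chunk.text.trim()}\n`;
    })
    .join("\n");
}

// Hand-written sample chunks (not real model output).
const sampleChunks = [
  { text: " Hello there.", timestamp: [0, 1.5] },
  { text: " How are you?", timestamp: [1.5, 3] },
];
const srt = chunksToSrt(sampleChunks);
```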

#### Text-to-Audio

Generate audio from text.

```javascript { .api }
interface TextToAudioPipeline {
  (
    texts: string | string[],
    options?: {
      speaker_embeddings?: Tensor;
    }
  ): Promise<{
    audio: Float32Array;
    sampling_rate: number;
  }>;
}
```

**Supported Task Names:** `"text-to-audio"`, `"text-to-speech"`
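
The returned `audio` is a raw waveform, so it has to be wrapped in a container format before it can be saved or played. The sketch below encodes it as a 16-bit PCM WAV file by hand, assuming a single channel and samples in the range -1 to 1. `encodeWav` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: encodes mono float samples in [-1, 1] as 16-bit PCM WAV bytes.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return new Uint8Array(buffer);
}

// Three hand-written samples standing in for `result.audio`.
const wavBytes = encodeWav(new Float32Array([0, 1, -1]), 16000);
```

In Node.js the bytes could then be written with `fs.writeFileSync("out.wav", wavBytes)`; in a browser they could be wrapped in a `Blob` with type `audio/wav`.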

### Multimodal Tasks

#### Document Question Answering

Answer questions about document images.

```javascript { .api }
interface DocumentQuestionAnsweringPipeline {
  (
    image: ImageInput,
    question: string,
    options?: {
      top_k?: number;
    }
  ): Promise<Array<{
    answer: string;
    score: number;
  }>>;
}
```

**Supported Task Names:** `"document-question-answering"`

## Types

```javascript { .api }
type ImageInput = string | RawImage | URL;
type AudioInput = string | URL | Float32Array | Float64Array;

interface Pipeline {
  (input: any, options?: any): Promise<any>;
  dispose(): Promise<void>;
}
```
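
Since `Pipeline` exposes `dispose()`, short-lived pipelines can be released once they are no longer needed. The sketch below shows one way to guarantee that with `try`/`finally`. `withPipeline` is a hypothetical helper, and `fakeCreate` stands in for a real `pipeline(...)` call so the example runs without downloading a model.

```javascript
// Hypothetical helper: creates a pipeline, runs `use`, and always disposes afterwards.
async function withPipeline(createPipeline, use) {
  const pipe = await createPipeline();
  try {
    return await use(pipe);
  } finally {
    await pipe.dispose(); // runs even if `use` throws
  }
}

// Stand-in for `() => pipeline("some-task")`: a callable object with dispose().
const lifecycle = [];
const fakeCreate = async () => {
  const pipe = async (input) => input.toUpperCase();
  pipe.dispose = async () => {
    lifecycle.push("disposed");
  };
  return pipe;
};

const outcome = withPipeline(fakeCreate, (pipe) => pipe("ok"));
```

For long-running applications that call the same pipeline repeatedly, keeping it alive is usually preferable to creating and disposing it on every request.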

## Supported Tasks Summary

| Task | Task Names | Input Type | Output Type |
|------|------------|------------|-------------|
| Text Classification | `text-classification`, `sentiment-analysis` | Text | Labels + Scores |
| Token Classification | `token-classification`, `ner` | Text | Token Labels |
| Question Answering | `question-answering` | Question + Context | Answer + Score |
| Fill Mask | `fill-mask` | Masked Text | Token Predictions |
| Text Generation | `text-generation` | Text Prompt | Generated Text |
| Text-to-Text Generation | `text2text-generation` | Text | Generated Text |
| Summarization | `summarization` | Text | Summary |
| Translation | `translation` | Text | Translated Text |
| Zero-Shot Classification | `zero-shot-classification` | Text + Labels | Classification |
| Feature Extraction | `feature-extraction`, `embeddings` | Text | Embeddings |
| Image Classification | `image-classification` | Image | Labels + Scores |
| Object Detection | `object-detection` | Image | Objects + Boxes |
| Zero-Shot Object Detection | `zero-shot-object-detection` | Image + Labels | Objects + Boxes |
| Image Segmentation | `image-segmentation` | Image | Segments + Masks |
| Zero-Shot Image Classification | `zero-shot-image-classification` | Image + Labels | Classification |
| Image-to-Text | `image-to-text` | Image | Generated Text |
| Image-to-Image | `image-to-image` | Image | Image |
| Depth Estimation | `depth-estimation` | Image | Depth Map |
| Image Feature Extraction | `image-feature-extraction` | Image | Embeddings |
| Audio Classification | `audio-classification` | Audio | Labels + Scores |
| Zero-Shot Audio Classification | `zero-shot-audio-classification` | Audio + Labels | Classification |
| Speech Recognition | `automatic-speech-recognition`, `asr` | Audio | Transcribed Text |
| Text-to-Audio | `text-to-audio`, `text-to-speech` | Text | Audio Waveform |
| Document QA | `document-question-answering` | Document + Question | Answer |