# Instructor Python Package

The instructor package provides structured output extraction from Large Language Models (LLMs) with Pydantic validation. It enables type-safe interactions with various LLM providers while maintaining consistent API patterns across different platforms.

## Package Information

- **Package Name**: instructor
- **Language**: Python
- **Version**: 1.11.3
- **Installation**: `pip install instructor`
- **Repository**: [GitHub](https://github.com/jxnl/instructor)

## Core Imports
```python { .api }
import instructor
from instructor import (
    Instructor, AsyncInstructor,
    from_openai, from_litellm, from_provider,
    Maybe, Partial, IterableModel, CitationMixin,
    Mode, Provider,
    patch, apatch,
    llm_validator, openai_moderation,
    BatchProcessor, BatchRequest, BatchJob,
    Image, Audio,
    generate_openai_schema, generate_anthropic_schema, generate_gemini_schema,
    OpenAISchema, openai_schema,
    FinetuneFormat, Instructions,
)
from instructor.core import hooks

# Conditional provider imports (require optional dependencies)
# from instructor import from_anthropic   # requires 'anthropic' package
# from instructor import from_gemini      # requires 'google-generativeai'
# from instructor import from_genai       # requires 'google-genai'
# from instructor import from_fireworks   # requires 'fireworks' package
# from instructor import from_cerebras    # requires 'cerebras' package
# from instructor import from_groq        # requires 'groq' package
# from instructor import from_mistral     # requires 'mistralai' package
# from instructor import from_cohere      # requires 'cohere' package
# from instructor import from_vertexai    # requires 'vertexai' + 'jsonref'
# from instructor import from_bedrock     # requires 'boto3' package
# from instructor import from_writer      # requires 'writerai' package
# from instructor import from_xai         # requires 'xai_sdk' package
# from instructor import from_perplexity  # requires 'openai' package

# Type imports for documentation
from typing import List, Dict, Any, Type, Optional, Union
from pydantic import BaseModel
from openai.types.chat import ChatCompletionMessageParam
```

## Basic Usage Example

```python { .api }
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Create client
client = instructor.from_openai(OpenAI())

# Define response model
class UserProfile(BaseModel):
    name: str
    age: int
    email: str

# Extract structured data
user = client.create(
    response_model=UserProfile,
    messages=[{"role": "user", "content": "Extract: John Doe, 25, john@example.com"}],
    model="gpt-4",
)
print(user.name)  # "John Doe"
```

## Capabilities

### Client Creation & Configuration

Create instructor clients for various LLM providers with type safety and validation:

```python { .api }
# OpenAI client
from instructor import Mode, from_openai
from openai import OpenAI, AsyncOpenAI

client = from_openai(OpenAI(), mode=Mode.TOOLS)
async_client = from_openai(AsyncOpenAI(), mode=Mode.TOOLS)

# Anthropic client (requires 'anthropic' package)
# from instructor import from_anthropic
# from anthropic import Anthropic
# client = from_anthropic(Anthropic(), mode=Mode.ANTHROPIC_TOOLS)

# Create a client from a "provider/model" string
from instructor import from_provider

client = from_provider("openai/gpt-4o-mini")
```

[Client Usage Documentation](./client-usage.md)

### Core Client Methods

Execute structured extractions with streaming, batching, and completion access:

```python { .api }
# Standard creation
result = client.create(
    response_model=MyModel,
    messages=[{"role": "user", "content": "..."}],
    model="gpt-4",
)

# Streaming partial results
for partial in client.create_partial(
    response_model=MyModel,
    messages=[{"role": "user", "content": "..."}],
    model="gpt-4",
):
    print(partial)

# Iterable extraction
for result in client.create_iterable(
    messages=[{"role": "user", "content": "..."}],
    response_model=MyModel,
    model="gpt-4",
):
    print(result)
```

[Client Usage Documentation](./client-usage.md)

### Provider Support

Support for multiple LLM providers with consistent APIs:

```python { .api }
# OpenAI
from instructor import from_openai
client = from_openai(OpenAI())

# Anthropic (requires 'anthropic' package)
# from instructor import from_anthropic
# client = from_anthropic(Anthropic())

# Google providers (require optional packages)
# from instructor import from_gemini, from_vertexai, from_genai
# client = from_gemini(genai_client)       # requires 'google-generativeai'
# client = from_vertexai(vertexai_client)  # requires 'vertexai' + 'jsonref'
# client = from_genai(genai_client)        # requires 'google-genai'

# LiteLLM (requires 'litellm' package); takes the completion function
import litellm
from instructor import from_litellm
client = from_litellm(litellm.completion)

# Other providers (require optional packages)
# from instructor import (
#     from_groq, from_mistral, from_cohere,
#     from_fireworks, from_cerebras, from_bedrock, from_writer,
#     from_xai, from_perplexity
# )
# client = from_groq(groq_client)              # requires 'groq'
# client = from_mistral(mistral_client)        # requires 'mistralai'
# client = from_cohere(cohere_client)          # requires 'cohere'
# client = from_fireworks(fireworks_client)    # requires 'fireworks'
# client = from_cerebras(cerebras_client)      # requires 'cerebras'
# client = from_bedrock(bedrock_client)        # requires 'boto3'
# client = from_writer(writer_client)          # requires 'writerai'
# client = from_xai(xai_client)                # requires 'xai_sdk'
# client = from_perplexity(perplexity_client)  # requires 'openai'
```

[Provider Documentation](./providers.md)

### DSL Components

Domain-specific language components for advanced extraction patterns:

```python { .api }
from instructor import Maybe, Partial, IterableModel, CitationMixin

# Optional extraction
OptionalUser = Maybe(UserProfile)

# Streaming validation
PartialUser = Partial[UserProfile]

# Multi-task extraction
TaskList = IterableModel(Task, name="TaskExtraction")

# Citation tracking
class CitedResponse(CitationMixin, BaseModel):
    content: str
    confidence: float
```
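
To make the `Maybe` pattern concrete, here is a rough standard-library sketch of the wrapper shape it generates. This is illustrative only: instructor builds a Pydantic model, not a dataclass, and the exact field names (`result`, `error`, `message`) are an assumption based on the optional-extraction pattern:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserProfile:
    name: str
    age: int

# Approximate shape of the model Maybe(UserProfile) generates:
# either a populated result, or an error flag with an explanation.
@dataclass
class MaybeUserProfile:
    result: Optional[UserProfile] = None
    error: bool = False
    message: Optional[str] = None

hit = MaybeUserProfile(result=UserProfile(name="John Doe", age=25))
miss = MaybeUserProfile(error=True, message="No user found in the text")
```

This lets a single `response_model` handle both "found" and "not found" outcomes without raising.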
[DSL Components Documentation](./dsl-components.md)

### Validation System

LLM-powered validation and content moderation, attached as Pydantic validators via `Annotated`:

```python { .api }
from typing import Annotated

import instructor
from instructor import llm_validator, openai_moderation
from openai import OpenAI
from pydantic import BaseModel, BeforeValidator

client = instructor.from_openai(OpenAI())

class ValidatedModel(BaseModel):
    # LLM-powered rule: validation fails with the model's explanation
    content: Annotated[
        str,
        BeforeValidator(llm_validator("Check if content is appropriate", client=client)),
    ]
    # OpenAI moderation endpoint flags content unsafe for all audiences
    safe_content: Annotated[str, BeforeValidator(openai_moderation(client=OpenAI()))]
```

[Validation Documentation](./validation.md)

### Batch Processing

Efficient batch processing for large-scale extractions:

```python { .api }
from instructor import BatchProcessor, BatchRequest, BatchJob

# Modern batch processing
processor = BatchProcessor("openai/gpt-4o-mini", MyModel)
batch_id = processor.submit_batch("batch_requests.jsonl")
results = processor.retrieve_results(batch_id)

# Legacy batch processing
results, errors = BatchJob.parse_from_file("batch_results.jsonl", MyModel)
```
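
The `submit_batch` call above reads a JSONL file of requests. A minimal sketch of building one with the standard library; the `custom_id`/`method`/`url`/`body` envelope follows OpenAI's Batch API format, and the file name and model are placeholders:

```python
import json

# One chat-completion request per line, in OpenAI's Batch API envelope
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["Extract: John Doe, 25", "Extract: Jane Roe, 31"])
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```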
[Batch Processing Documentation](./batch-processing.md)

### Schema Generation

Generate provider-specific schemas from Pydantic models:

```python { .api }
from instructor import (
    generate_openai_schema,
    generate_anthropic_schema,
    generate_gemini_schema,
    OpenAISchema,
    openai_schema,
)
from pydantic import BaseModel

# Schema via decorator (or subclass OpenAISchema directly)
@openai_schema
class MyModel(BaseModel):
    field: str

# Generate schemas (named so they don't shadow the imported helpers)
openai_fn_schema = generate_openai_schema(MyModel)
anthropic_fn_schema = generate_anthropic_schema(MyModel)
gemini_fn_schema = generate_gemini_schema(MyModel)
```
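
For orientation, the OpenAI-style function schema these helpers produce is a JSON-schema dict. Below is a hand-built sketch of the expected shape for `MyModel`; it is illustrative only, and the actual `generate_openai_schema` output may differ in details such as descriptions derived from docstrings:

```python
# Approximate shape of an OpenAI function schema for MyModel
# (hand-built for illustration; instructor derives this from the model)
my_model_schema = {
    "name": "MyModel",
    "description": "Correctly extracted `MyModel` with all required parameters",
    "parameters": {
        "type": "object",
        "properties": {"field": {"type": "string", "title": "Field"}},
        "required": ["field"],
    },
}
```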
[Schema Generation Documentation](./schema-generation.md)

### Multimodal Support

Handle images and audio in structured extractions:

```python { .api }
from instructor import Image, Audio

# Image handling
image = Image.from_url("https://example.com/image.jpg")
image = Image.from_path("/path/to/image.png")
image = Image.from_base64(base64_string)

# Convert for providers
openai_image = image.to_openai()
anthropic_image = image.to_anthropic()

# Audio handling
audio = Audio.from_path("/path/to/audio.wav")
openai_audio = audio.to_openai()
```
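
`Image.from_base64` expects base64-encoded image bytes. Producing such a string with the standard library (a tiny placeholder payload stands in for real image-file contents):

```python
import base64

# Placeholder bytes stand in for a real image file's contents
image_bytes = b"\x89PNG\r\n\x1a\n"  # PNG magic number only, for illustration
base64_string = base64.b64encode(image_bytes).decode("ascii")

# Encoding round-trips losslessly
assert base64.b64decode(base64_string) == image_bytes
```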
[Client Usage Documentation](./client-usage.md)

### Mode System & Configuration

Configure extraction modes for different providers and use cases:

```python { .api }
from instructor import Mode

# OpenAI modes
Mode.TOOLS           # Function calling (recommended)
Mode.TOOLS_STRICT    # Strict function calling
Mode.JSON            # JSON mode
Mode.JSON_O1         # JSON mode for O1 models
Mode.JSON_SCHEMA     # JSON schema mode
Mode.MD_JSON         # Markdown JSON mode
Mode.PARALLEL_TOOLS  # Parallel function calls

# Response API modes
Mode.RESPONSES_TOOLS                     # Response tools mode
Mode.RESPONSES_TOOLS_WITH_INBUILT_TOOLS  # Response tools with built-in tools

# XAI modes
Mode.XAI_JSON   # XAI JSON mode
Mode.XAI_TOOLS  # XAI tools mode

# Anthropic modes
Mode.ANTHROPIC_TOOLS            # Anthropic tools
Mode.ANTHROPIC_JSON             # Anthropic JSON
Mode.ANTHROPIC_REASONING_TOOLS  # Reasoning tools
Mode.ANTHROPIC_PARALLEL_TOOLS   # Parallel tools

# Provider-specific modes
Mode.MISTRAL_TOOLS   # Mistral tools
Mode.VERTEXAI_TOOLS  # Vertex AI tools
Mode.GEMINI_TOOLS    # Gemini tools
Mode.COHERE_TOOLS    # Cohere tools
```
[Modes and Configuration Documentation](./modes-and-configuration.md)

## Related Documentation

- [Client Usage](./client-usage.md) - Core client functionality and methods
- [Provider Support](./providers.md) - Provider-specific clients and configuration
- [DSL Components](./dsl-components.md) - DSL components (Maybe, Partial, IterableModel, etc.)
- [Validation System](./validation.md) - Validation system and LLM validators
- [Batch Processing](./batch-processing.md) - Batch processing functionality
- [Schema Generation](./schema-generation.md) - Schema generation utilities
- [Modes & Configuration](./modes-and-configuration.md) - Mode system and configuration options