
# Instructor Python Package

The instructor package provides structured output extraction from Large Language Models (LLMs) with Pydantic validation. It enables type-safe interactions with various LLM providers while maintaining consistent API patterns across different platforms.

## Package Information

- **Package Name**: instructor
- **Language**: Python
- **Version**: 1.11.3
- **Installation**: `pip install instructor`
- **Repository**: [GitHub](https://github.com/jxnl/instructor)

## Core Imports

```python { .api }
import instructor
from instructor import (
    Instructor, AsyncInstructor,
    from_openai, from_litellm, from_provider,
    Maybe, Partial, IterableModel, CitationMixin,
    Mode, Provider,
    patch, apatch,
    llm_validator, openai_moderation,
    BatchProcessor, BatchRequest, BatchJob,
    Image, Audio,
    generate_openai_schema, generate_anthropic_schema, generate_gemini_schema,
    OpenAISchema, openai_schema,
    FinetuneFormat, Instructions
)
from instructor.core import hooks

# Conditional provider imports (require optional dependencies)
# from instructor import from_anthropic   # requires 'anthropic' package
# from instructor import from_gemini      # requires 'google-generativeai'
# from instructor import from_genai       # requires 'google-genai'
# from instructor import from_fireworks   # requires 'fireworks' package
# from instructor import from_cerebras    # requires 'cerebras' package
# from instructor import from_groq        # requires 'groq' package
# from instructor import from_mistral     # requires 'mistralai' package
# from instructor import from_cohere      # requires 'cohere' package
# from instructor import from_vertexai    # requires 'vertexai' + 'jsonref'
# from instructor import from_bedrock     # requires 'boto3' package
# from instructor import from_writer     # requires 'writerai' package
# from instructor import from_xai         # requires 'xai_sdk' package
# from instructor import from_perplexity  # requires 'openai' package

# Type imports for documentation
from typing import List, Dict, Any, Type, Optional, Union
from pydantic import BaseModel
from openai.types.chat import ChatCompletionMessageParam
```

## Basic Usage Example

```python { .api }
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Create client
client = instructor.from_openai(OpenAI())

# Define response model
class UserProfile(BaseModel):
    name: str
    age: int
    email: str

# Extract structured data
user = client.create(
    response_model=UserProfile,
    messages=[{"role": "user", "content": "Extract: John Doe, 25, john@example.com"}],
    model="gpt-4"
)
print(user.name)  # "John Doe"
```
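The contract enforced above is ordinary Pydantic validation, so it can be exercised offline (a minimal sketch, assuming Pydantic v2). When the model's output fails this validation, instructor feeds the errors back to the LLM and retries:

```python
from pydantic import BaseModel, ValidationError

class UserProfile(BaseModel):
    name: str
    age: int
    email: str

# A well-formed payload parses into a typed object
user = UserProfile.model_validate(
    {"name": "John Doe", "age": 25, "email": "john@example.com"}
)

# A malformed payload raises ValidationError; these errors are what
# instructor uses to drive its retry loop
try:
    UserProfile.model_validate(
        {"name": "John Doe", "age": "not a number", "email": "j@example.com"}
    )
    failed = False
except ValidationError:
    failed = True
```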

## Capabilities

### Client Creation & Configuration

Create instructor clients for various LLM providers with type safety and validation:

```python { .api }
import instructor

# OpenAI client
from instructor import from_openai
from openai import OpenAI, AsyncOpenAI

client = from_openai(OpenAI(), mode=instructor.Mode.TOOLS)
async_client = from_openai(AsyncOpenAI(), mode=instructor.Mode.TOOLS)

# Anthropic client (requires 'anthropic' package)
# from instructor import from_anthropic
# from anthropic import Anthropic
# client = from_anthropic(Anthropic(), mode=instructor.Mode.ANTHROPIC_TOOLS)

# Create a client from a "provider/model" string
from instructor import from_provider

client = from_provider("openai/gpt-4o-mini")
```

[Client Usage Documentation](./client-usage.md)

### Core Client Methods

Execute structured extractions with streaming, batching, and completion access:

```python { .api }
# Standard creation
result = client.create(
    response_model=MyModel,
    messages=[{"role": "user", "content": "..."}],
    model="gpt-4"
)

# Streaming partial results
for partial in client.create_partial(
    response_model=MyModel,
    messages=[{"role": "user", "content": "..."}],
    model="gpt-4"
):
    print(partial)

# Iterable extraction
for result in client.create_iterable(
    messages=[{"role": "user", "content": "..."}],
    response_model=MyModel,
    model="gpt-4"
):
    print(result)
```

[Client Usage Documentation](./client-usage.md)
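Conceptually, `create_partial` yields successive snapshots of the response model in which fields fill in as tokens arrive. A hand-rolled sketch of that shape (all fields made optional, as `Partial` does; the `PartialReport` model and its values are illustrative):

```python
from typing import Optional
from pydantic import BaseModel

# All fields optional so an incomplete snapshot is still a valid instance
class PartialReport(BaseModel):
    title: Optional[str] = None
    summary: Optional[str] = None

# Simulated stream: each snapshot carries strictly more content than the last
snapshots = [
    PartialReport(),
    PartialReport(title="Q3 report"),
    PartialReport(title="Q3 report", summary="Revenue grew 12%"),
]
final = snapshots[-1]
```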

### Provider Support

Support for multiple LLM providers with consistent APIs:

```python { .api }
# OpenAI
from instructor import from_openai
from openai import OpenAI

client = from_openai(OpenAI())

# Anthropic (requires 'anthropic' package)
# from instructor import from_anthropic
# client = from_anthropic(Anthropic())

# Google providers (require optional packages)
# from instructor import from_gemini, from_vertexai, from_genai
# client = from_gemini(genai_client)      # requires 'google-generativeai'
# client = from_vertexai(vertexai_client) # requires 'vertexai' + 'jsonref'
# client = from_genai(genai_client)       # requires 'google-genai'

# LiteLLM (import is always available; usage requires 'litellm').
# Pass the completion function, not a client object.
from instructor import from_litellm
import litellm

client = from_litellm(litellm.completion)

# Other providers (require optional packages)
# from instructor import (
#     from_groq, from_mistral, from_cohere,
#     from_fireworks, from_cerebras, from_bedrock, from_writer,
#     from_xai, from_perplexity
# )
# client = from_groq(groq_client)             # requires 'groq'
# client = from_mistral(mistral_client)       # requires 'mistralai'
# client = from_cohere(cohere_client)         # requires 'cohere'
# client = from_fireworks(fireworks_client)   # requires 'fireworks'
# client = from_cerebras(cerebras_client)     # requires 'cerebras'
# client = from_bedrock(bedrock_client)       # requires 'boto3'
# client = from_writer(writer_client)         # requires 'writerai'
# client = from_xai(xai_client)               # requires 'xai_sdk'
# client = from_perplexity(perplexity_client) # requires 'openai'
```

[Provider Documentation](./providers.md)

### DSL Components

Domain-specific language components for advanced extraction patterns:

```python { .api }
from pydantic import BaseModel
from instructor import Maybe, Partial, IterableModel, CitationMixin

# Optional extraction: "not found" becomes a valid result
OptionalUser = Maybe(UserProfile)

# Streaming validation
PartialUser = Partial[UserProfile]

# Multi-task extraction
TaskList = IterableModel(Task, name="TaskExtraction")

# Citation tracking
class CitedResponse(CitationMixin, BaseModel):
    content: str
    confidence: float
```

[DSL Components Documentation](./dsl-components.md)
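`Maybe(UserProfile)` generates a wrapper model whose fields let the LLM report that no user was found instead of inventing one. A hand-rolled sketch of roughly that wrapper shape, using plain Pydantic (field names follow instructor's `result`/`error`/`message` convention; the sample values are illustrative):

```python
from typing import Optional
from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int

# Sketch of the wrapper Maybe(UserProfile) produces
class MaybeUserProfile(BaseModel):
    result: Optional[UserProfile] = None
    error: bool = False
    message: Optional[str] = None

found = MaybeUserProfile(result=UserProfile(name="Ada", age=36))
missing = MaybeUserProfile(error=True, message="No user mentioned in the text")
```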

### Validation System

LLM-powered validation and content moderation. Validators attach to fields through Pydantic's `Annotated` / `BeforeValidator` mechanism, and both helpers take a client:

```python { .api }
from typing import Annotated
from pydantic import BaseModel, BeforeValidator
from instructor import llm_validator, openai_moderation

class ValidatedModel(BaseModel):
    # User content, checked by an LLM validator
    content: Annotated[
        str,
        BeforeValidator(llm_validator("Check if content is appropriate", client=client)),
    ]
    # Content safe for all audiences, checked by the OpenAI moderation endpoint
    safe_content: Annotated[
        str,
        BeforeValidator(openai_moderation(client=client)),
    ]
```

[Validation Documentation](./validation.md)
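`llm_validator` plugs into the same Pydantic hook as any local validator, so the mechanism can be seen offline with a plain callable in `BeforeValidator` (a local stand-in; `reject_empty` and `Comment` are illustrative names):

```python
from typing import Annotated
from pydantic import BaseModel, BeforeValidator, ValidationError

# A local stand-in for llm_validator: raise ValueError to reject a value
def reject_empty(v: str) -> str:
    if not v.strip():
        raise ValueError("content must not be empty")
    return v

class Comment(BaseModel):
    text: Annotated[str, BeforeValidator(reject_empty)]

ok = Comment(text="hello")
try:
    Comment(text="   ")
    rejected = False
except ValidationError:
    rejected = True
```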

### Batch Processing

Efficient batch processing for large-scale extractions:

```python { .api }
from instructor import BatchProcessor, BatchRequest, BatchJob

# Modern batch processing
processor = BatchProcessor("openai/gpt-4o-mini", MyModel)
batch_id = processor.submit_batch("batch_requests.jsonl")
results = processor.retrieve_results(batch_id)

# Legacy batch processing
results, errors = BatchJob.parse_from_file("batch_results.jsonl", MyModel)
```

[Batch Processing Documentation](./batch-processing.md)
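The `batch_requests.jsonl` file above holds one request per line. A sketch of a single line in the shape of OpenAI's Batch API request format (`custom_id`/`method`/`url`/`body`); the ids and message content are illustrative:

```python
import json

# One batch request in OpenAI's Batch API shape
request = {
    "custom_id": "req-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Extract: Jane, 31"}],
    },
}

# A JSONL line is just a compact JSON object; one per request
line = json.dumps(request)
decoded = json.loads(line)
```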

### Schema Generation

Generate provider-specific schemas from Pydantic models:

```python { .api }
from pydantic import BaseModel
from instructor import (
    generate_openai_schema,
    generate_anthropic_schema,
    generate_gemini_schema,
    OpenAISchema,
    openai_schema
)

# Generate provider-specific schemas
openai_tool_schema = generate_openai_schema(MyModel)
anthropic_tool_schema = generate_anthropic_schema(MyModel)
gemini_tool_schema = generate_gemini_schema(MyModel)

# Schema via decorator...
@openai_schema
class DecoratedModel(BaseModel):
    field: str

# ...or via the base class
class SubclassedModel(OpenAISchema):
    field: str
```

[Schema Generation Documentation](./schema-generation.md)
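The generators above build on Pydantic's own JSON Schema export; the raw schema they transform can be inspected directly without instructor (`MyModel` and its field are illustrative):

```python
from pydantic import BaseModel, Field

class MyModel(BaseModel):
    """Extracted record."""
    field: str = Field(description="A single string value")

# Provider-specific generators wrap this schema in each API's tool format
schema = MyModel.model_json_schema()
field_type = schema["properties"]["field"]["type"]
required = schema["required"]
```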

### Multimodal Support

Handle images and audio in structured extractions:

```python { .api }
from instructor import Image, Audio

# Image handling
image = Image.from_url("https://example.com/image.jpg")
image = Image.from_path("/path/to/image.png")
image = Image.from_base64(base64_string)

# Convert for providers
openai_image = image.to_openai()
anthropic_image = image.to_anthropic()

# Audio handling
audio = Audio.from_path("/path/to/audio.wav")
openai_audio = audio.to_openai()
```

[Client Usage Documentation](./client-usage.md)
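For reference, `Image.from_base64` consumes a plain base64 string of the raw image bytes. OpenAI-style image content parts typically carry such data as a data URL; a sketch with a stub byte string standing in for a real image (the content-part shape is an assumption, not instructor's exact output):

```python
import base64

raw = b"\x89PNG\r\n\x1a\n"  # stub bytes standing in for a real PNG
b64 = base64.b64encode(raw).decode("ascii")

# Data-URL form commonly used in OpenAI-style image content parts
data_url = f"data:image/png;base64,{b64}"
round_trip = base64.b64decode(b64)
```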

### Mode System & Configuration

Configure extraction modes for different providers and use cases:

```python { .api }
from instructor import Mode

# OpenAI modes
Mode.TOOLS          # Function calling (recommended)
Mode.TOOLS_STRICT   # Strict function calling
Mode.JSON           # JSON mode
Mode.JSON_O1        # JSON mode for O1 models
Mode.JSON_SCHEMA    # JSON schema mode
Mode.MD_JSON        # Markdown JSON mode
Mode.PARALLEL_TOOLS # Parallel function calls

# Responses API modes
Mode.RESPONSES_TOOLS                    # Response tools mode
Mode.RESPONSES_TOOLS_WITH_INBUILT_TOOLS # Response tools with built-in tools

# xAI modes
Mode.XAI_JSON  # xAI JSON mode
Mode.XAI_TOOLS # xAI tools mode

# Anthropic modes
Mode.ANTHROPIC_TOOLS           # Anthropic tools
Mode.ANTHROPIC_JSON            # Anthropic JSON
Mode.ANTHROPIC_REASONING_TOOLS # Reasoning tools
Mode.ANTHROPIC_PARALLEL_TOOLS  # Parallel tools

# Provider-specific modes
Mode.MISTRAL_TOOLS  # Mistral tools
Mode.VERTEXAI_TOOLS # Vertex AI tools
Mode.GEMINI_TOOLS   # Gemini tools
Mode.COHERE_TOOLS   # Cohere tools
```

[Modes and Configuration Documentation](./modes-and-configuration.md)

## Related Documentation

- [Client Usage](./client-usage.md) - Core client functionality and methods
- [Provider Support](./providers.md) - Provider-specific clients and configuration
- [DSL Components](./dsl-components.md) - DSL components (Maybe, Partial, IterableModel, etc.)
- [Validation System](./validation.md) - Validation system and LLM validators
- [Batch Processing](./batch-processing.md) - Batch processing functionality
- [Schema Generation](./schema-generation.md) - Schema generation utilities
- [Modes & Configuration](./modes-and-configuration.md) - Mode system and configuration options