# OCR (Optical Character Recognition)

Extract text and structured data from documents and images using optical character recognition. The OCR API accepts PDFs and images (JPEG, PNG, TIFF) and returns the extracted text for each page along with layout and position information.

## Capabilities

### Document Processing

`process` runs OCR on a single document and returns per-page text, page dimensions, and detected images.

```python { .api }
def process(
    model: str,
    document: Document,
    pages: Optional[List[int]] = None,
    **kwargs
) -> OCRResponse:
    """
    Process a document with OCR.

    Parameters:
    - model: OCR model identifier
    - document: Document to process (image or PDF)
    - pages: Optional list of page numbers to process

    Returns:
    OCRResponse with extracted text and structure information
    """
```
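
Since `pages` takes an explicit list of page numbers, it can help to build that list from a compact range string. The sketch below is one possible approach; `parse_page_spec` is a hypothetical helper and not part of the SDK, and it assumes page numbers are plain positive integers.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-3,7" into a sorted list of unique page numbers.

    Hypothetical helper for illustration; not part of the OCR client.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

The result could then be passed as `pages=parse_page_spec("1-3,7")`.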

## Usage Examples

### Process a PDF Document

```python
from mistralai import Mistral
from mistralai.models import Document

client = Mistral(api_key="your-api-key")

# Read the PDF and wrap its bytes in a Document
with open("document.pdf", "rb") as f:
    document = Document(
        type="application/pdf",
        data=f.read()
    )

response = client.ocr.process(
    model="ocr-model",
    document=document,
    pages=[1, 2, 3]  # Process first 3 pages
)

# Print the text and dimensions of each returned page
for page in response.pages:
    print(f"Page {page.page_number}:")
    print(f"Text: {page.text}")
    print(f"Dimensions: {page.dimensions.width}x{page.dimensions.height}")
    print()
```
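
When the per-page output is not needed, the page texts can be merged into one string. The following is a minimal sketch under stated assumptions: `PageText` is a hypothetical stand-in that mirrors only the `page_number` and `text` fields of `OCRPageObject`, and `join_pages` is an illustrative helper, not an SDK function.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageText:
    """Hypothetical stand-in exposing the two OCRPageObject fields used here."""
    page_number: int
    text: str


def join_pages(pages: Iterable[PageText], separator: str = "\n\n") -> str:
    """Concatenate page text in page order, skipping pages with no text."""
    ordered = sorted(pages, key=lambda p: p.page_number)
    return separator.join(p.text.strip() for p in ordered if p.text.strip())
```

Sorting by `page_number` keeps the output stable even if pages arrive out of order.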

### Process with Structure Analysis

```python
# Reuses `client` and `document` from the previous example
response = client.ocr.process(
    model="ocr-model",
    document=document
)

# Access structured information
for page in response.pages:
    print(f"Page {page.page_number}:")

    # List detected images, if any, with their size and position
    for image in page.images:
        print(f" Image: {image.width}x{image.height} at ({image.x}, {image.y})")

    # Summarize the text content
    print(f" Text content: {len(page.text)} characters")
    print(f" Preview: {page.text[:200]}...")
```
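
The image boxes and page dimensions can be combined, for example to estimate how much of a page is covered by images. This sketch assumes that image coordinates and page dimensions are expressed in the same unit, which the source does not state. `ImageBox` and `image_coverage` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ImageBox:
    """Hypothetical stand-in with the same fields as OCRImageObject."""
    x: float
    y: float
    width: float
    height: float


def image_coverage(page_width: float, page_height: float,
                   images: Iterable[ImageBox]) -> float:
    """Return the fraction of the page area taken up by image bounding boxes.

    Overlapping boxes are counted more than once, so the result can exceed 1.0.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        raise ValueError("page dimensions must be positive")
    return sum(img.width * img.height for img in images) / page_area
```

A high value may indicate a page that is mostly figures, where the extracted text is likely to be short.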

## Types

### Request Types

```python { .api }
class OCRRequest:
    model: str
    document: Document
    pages: Optional[List[int]]

class Document:
    type: str  # MIME type (e.g., "application/pdf", "image/jpeg")
    data: bytes  # Document content as bytes
```

### Response Types

```python { .api }
class OCRResponse:
    id: str
    object: str
    model: str
    pages: List[OCRPageObject]
    usage: Optional[OCRUsageInfo]

class OCRPageObject:
    page_number: int
    text: str
    dimensions: OCRPageDimensions
    images: List[OCRImageObject]

class OCRPageDimensions:
    width: float
    height: float

class OCRImageObject:
    x: float
    y: float
    width: float
    height: float

class OCRUsageInfo:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
```
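
Because `usage` is optional, code that aggregates token counts across several responses needs to tolerate `None`. A minimal sketch follows; `Usage` is a hypothetical stand-in mirroring the fields of `OCRUsageInfo`, and `sum_total_tokens` is an illustrative helper rather than part of the SDK.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    """Hypothetical stand-in with the same fields as OCRUsageInfo."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def sum_total_tokens(usages: Iterable[Optional[Usage]]) -> int:
    """Add up total_tokens, ignoring responses that carry no usage info."""
    return sum(u.total_tokens for u in usages if u is not None)
```

In practice the argument would be something like `(r.usage for r in responses)`.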

## Supported Formats

### Document Types

- **PDF**: Multi-page PDF documents
- **Images**: JPEG, PNG, and TIFF
- **Scanned Documents**: Digital scans of physical documents
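
The formats listed above can be mapped to the MIME type that a `Document` expects. The sketch below shows one way to do this from a file extension; the extension table and the `mime_type_for` helper are assumptions made for illustration, not something the API provides.

```python
from pathlib import Path

# Assumed mapping from file extension to MIME type for the listed formats
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_MIME[suffix]
    except KeyError:
        raise ValueError(f"unsupported document type: {suffix or path!r}") from None
```

The returned string could be used as the `type` field when constructing a `Document`, which lets unsupported files be rejected before any request is sent.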

### Output Information

- **Text Content**: Extracted text in reading order
- **Layout Information**: Page dimensions and structure
- **Image Detection**: Embedded images and their positions
- **Coordinate Information**: Position data for text elements

## Best Practices

### Document Quality

- Use high-resolution images for better accuracy
- Ensure good contrast between text and background
- Minimize skew and rotation in source documents
- Prefer clean, well-lit scans
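
Resolution can be checked before upload by estimating the scan's DPI from its pixel width and the physical page width. This is a rough sketch: the 300 DPI threshold is a commonly cited rule of thumb for OCR and an assumption here, since the source gives no figure, and both function names are hypothetical.

```python
MIN_DPI = 300.0  # assumed threshold; the source does not specify one


def scan_dpi(pixel_width: int, physical_width_inches: float) -> float:
    """Estimate dots per inch from pixel width and physical page width."""
    if physical_width_inches <= 0:
        raise ValueError("physical width must be positive")
    return pixel_width / physical_width_inches


def is_high_resolution(pixel_width: int, physical_width_inches: float,
                       min_dpi: float = MIN_DPI) -> bool:
    """Return True when the estimated DPI meets the chosen threshold."""
    return scan_dpi(pixel_width, physical_width_inches) >= min_dpi
```

For example, a US Letter page (8.5 inches wide) scanned at 2550 pixels wide works out to 300 DPI.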

### Processing Optimization

- Pass `pages` to process only the pages you need in large documents, which reduces processing time
- Consider document orientation and layout complexity
- Test with representative samples before batch processing
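
For large documents, page ranges can be split into fixed-size batches so that each request stays small. The sketch below assumes 1-based page numbers, mirroring the `pages=[1, 2, 3]` example; `page_batches` is a hypothetical helper, and the batch size is left to the caller.

```python
from typing import Iterator, List


def page_batches(total_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of 1-based page numbers, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        yield list(range(start, end + 1))
```

Each yielded list could be passed as the `pages` argument of a separate `process` call, which also makes it straightforward to retry a single failed batch.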