# OCR (Optical Character Recognition)

The OCR API extracts text and structured data from documents and images using optical character recognition. It accepts PDFs and common image formats and returns the text of each page together with page dimensions and the positions of embedded images.

## Capabilities

### Document Processing

Process documents and images to extract text and structural information.

```python { .api }
def process(
    model: str,
    document: Document,
    pages: Optional[List[int]] = None,
    **kwargs
) -> OCRResponse:
    """
    Process a document with OCR.

    Parameters:
    - model: OCR model identifier
    - document: Document to process (image or PDF)
    - pages: Optional list of page numbers to process

    Returns:
    OCRResponse with extracted text and structure information
    """
```

## Usage Examples

### Process Image Document

```python
from mistralai import Mistral
from mistralai.models import Document

client = Mistral(api_key="your-api-key")

# Read a PDF from disk and wrap its bytes in a Document
with open("document.pdf", "rb") as f:
    document = Document(
        type="application/pdf",
        data=f.read()
    )

response = client.ocr.process(
    model="ocr-model",
    document=document,
    pages=[1, 2, 3]  # Process first 3 pages
)

# Extract text from all pages
for page in response.pages:
    print(f"Page {page.page_number}:")
    print(f"Text: {page.text}")
    print(f"Dimensions: {page.dimensions.width}x{page.dimensions.height}")
    print()
```

### Process with Structure Analysis

```python
# Reuses `client` and `document` from the previous example.
# Process the whole document (no page filter) and analyze structure
response = client.ocr.process(
    model="ocr-model",
    document=document
)

# Access structured information
for page in response.pages:
    print(f"Page {page.page_number}:")

    # Extract images if present
    for image in page.images:
        print(f"  Image: {image.width}x{image.height} at ({image.x}, {image.y})")

    # Get text content
    print(f"  Text content: {len(page.text)} characters")
    print(f"  Preview: {page.text[:200]}...")
```

## Types

### Request Types

```python { .api }
class OCRRequest:
    model: str
    document: Document
    pages: Optional[List[int]]

class Document:
    type: str    # MIME type (e.g., "application/pdf", "image/jpeg")
    data: bytes  # Document content as bytes
```

### Response Types

```python { .api }
class OCRResponse:
    id: str
    object: str
    model: str
    pages: List[OCRPageObject]
    usage: Optional[OCRUsageInfo]

class OCRPageObject:
    page_number: int
    text: str
    dimensions: OCRPageDimensions
    images: List[OCRImageObject]

class OCRPageDimensions:
    width: float
    height: float

class OCRImageObject:
    x: float
    y: float
    width: float
    height: float

class OCRUsageInfo:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
```
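
A common follow-up is to flatten the per-page results into one string. The sketch below is one possible way to do that, not part of the library: `Page` is a hypothetical stand-in that mirrors only the `page_number` and `text` fields of `OCRPageObject`, and `join_page_text` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for OCRPageObject; only the fields used here.
@dataclass
class Page:
    page_number: int
    text: str


def join_page_text(pages: List[Page], separator: str = "\n\n") -> str:
    """Concatenate page text in ascending page_number order."""
    ordered = sorted(pages, key=lambda p: p.page_number)
    return separator.join(p.text for p in ordered)


# Pages may arrive in any order; the helper sorts them first.
pages = [Page(2, "Second page."), Page(1, "First page.")]
print(join_page_text(pages))
```

Because the helper reads only `page_number` and `text`, it should work unchanged on any object that exposes those two attributes.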

## Supported Formats

### Document Types

- **PDF**: Multi-page PDF documents
- **Images**: JPEG, PNG, TIFF formats
- **Scanned Documents**: Digital scans of physical documents
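
Since `Document.type` expects a MIME type, it can help to derive it from the file extension before reading the file. The following is a minimal sketch under that assumption; `MIME_TYPES` and `mime_type_for` are illustrative names, and the map covers only the formats listed above.

```python
from pathlib import Path

# Extension-to-MIME map for the formats listed above (illustrative, not exhaustive).
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported document format: {path!r}")
    return MIME_TYPES[suffix]


print(mime_type_for("invoice.PDF"))
print(mime_type_for("scans/page-01.tiff"))
```

An explicit map is used here instead of the standard `mimetypes` module so that unsupported formats fail early rather than being passed to the API.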

### Output Information

- **Text Content**: Extracted text with reading order
- **Layout Information**: Page dimensions and structure
- **Image Detection**: Embedded images and their positions
- **Coordinate Information**: Position data for text elements
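
Image positions and page dimensions can be combined, for example to estimate how much of a page is occupied by images. This sketch assumes that image coordinates and page dimensions share the same units, which the type definitions do not state; `Box` is a hypothetical stand-in for the `OCRImageObject` fields, and `image_coverage` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Box:
    """Hypothetical stand-in for the OCRImageObject fields."""
    x: float
    y: float
    width: float
    height: float


def image_coverage(images: Iterable[Box], page_width: float, page_height: float) -> float:
    """Fraction of the page area covered by image boxes.

    Overlapping boxes are counted more than once, so the result can exceed 1.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        raise ValueError("page dimensions must be positive")
    return sum(img.width * img.height for img in images) / page_area


# One 100x50 image on a 200x100 page covers a quarter of the page.
print(image_coverage([Box(0, 0, 100, 50)], page_width=200, page_height=100))
```

A high coverage value may indicate a page that is mostly figures, where the extracted text alone gives an incomplete picture.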

## Best Practices

### Document Quality

- Use high-resolution images for better accuracy
- Ensure good contrast between text and background
- Minimize skew and rotation in source documents
- Use clean, well-lit scans

### Processing Optimization

- Specify page ranges for large documents to reduce processing time
- Consider document orientation and layout complexity
- Test with representative samples before batch processing
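
One way to apply the page-range advice is to split a large document into fixed-size page batches and process them one request at a time. The helper below is a sketch, not part of the library: `page_batches` is an illustrative name, and it assumes 1-based page numbers, as in the `pages=[1, 2, 3]` example.

```python
from typing import Iterator, List


def page_batches(total_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive 1-based page-number lists of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, total_pages + 1, batch_size):
        stop = min(start + batch_size, total_pages + 1)
        yield list(range(start, stop))


for batch in page_batches(total_pages=7, batch_size=3):
    print(batch)
# [1, 2, 3]
# [4, 5, 6]
# [7]
```

Each yielded list can be passed as the `pages` argument of `process`, which also makes it straightforward to run a small first batch as the representative sample before committing to the full document.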