CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/pypi-mistralai

Python Client SDK for the Mistral AI API with chat completions, embeddings, fine-tuning, and agent capabilities.

Pending
Overview
Eval results
Files

ocr.mddocs/

OCR (Optical Character Recognition)

Process documents and images to extract text and structured data using optical character recognition. The OCR API can analyze various document formats and extract text with position information.

Capabilities

Document Processing

Process documents and images to extract text and structural information.

def process(
    model: str,
    document: Document,
    pages: Optional[List[int]] = None,
    **kwargs
) -> OCRResponse:
    """
    Process a document with OCR.

    Parameters:
    - model: OCR model identifier
    - document: Document to process (image or PDF)
    - pages: Optional list of page numbers to process

    Returns:
    OCRResponse with extracted text and structure information
    """

Usage Examples

Process Image Document

from mistralai import Mistral
from mistralai.models import Document

client = Mistral(api_key="your-api-key")

# Process an image document
with open("document.pdf", "rb") as f:
    document = Document(
        type="application/pdf",
        data=f.read()
    )

response = client.ocr.process(
    model="ocr-model",
    document=document,
    pages=[1, 2, 3]  # Process first 3 pages
)

# Extract text from all pages
for page in response.pages:
    print(f"Page {page.page_number}:")
    print(f"Text: {page.text}")
    print(f"Dimensions: {page.dimensions.width}x{page.dimensions.height}")
    print()

Process with Structure Analysis

# Process document and analyze structure
response = client.ocr.process(
    model="ocr-model",
    document=document
)

# Access structured information
for page in response.pages:
    print(f"Page {page.page_number}:")
    
    # Extract images if present
    for image in page.images:
        print(f"  Image: {image.width}x{image.height} at ({image.x}, {image.y})")
    
    # Get text content
    print(f"  Text content: {len(page.text)} characters")
    print(f"  Preview: {page.text[:200]}...")

Types

Request Types

class OCRRequest:
    model: str
    document: Document
    pages: Optional[List[int]]

class Document:
    type: str  # MIME type (e.g., "application/pdf", "image/jpeg")
    data: bytes  # Document content as bytes

Response Types

class OCRResponse:
    id: str
    object: str
    model: str
    pages: List[OCRPageObject]
    usage: Optional[OCRUsageInfo]

class OCRPageObject:
    page_number: int
    text: str
    dimensions: OCRPageDimensions
    images: List[OCRImageObject]

class OCRPageDimensions:
    width: float
    height: float

class OCRImageObject:
    x: float
    y: float
    width: float
    height: float

class OCRUsageInfo:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

Supported Formats

Document Types

  • PDF: Multi-page PDF documents
  • Images: JPEG, PNG, TIFF formats
  • Scanned Documents: Digital scans of physical documents

Output Information

  • Text Content: Extracted text with reading order
  • Layout Information: Page dimensions and structure
  • Image Detection: Embedded images and their positions
  • Coordinate Information: Position data for text elements

Best Practices

Document Quality

  • Use high-resolution images for better accuracy
  • Ensure good contrast between text and background
  • Minimize skew and rotation in source documents
  • Clean, well-lit scans produce better results

Processing Optimization

  • Specify page ranges for large documents to reduce processing time
  • Consider document orientation and layout complexity
  • Test with representative samples before batch processing

Install with Tessl CLI

npx tessl i tessl/pypi-mistralai

docs

agents.md

audio.md

batch.md

beta.md

chat-completions.md

classification.md

embeddings.md

files.md

fim.md

fine-tuning.md

index.md

models.md

ocr.md

tile.json