
# Document Analysis Operations

Core document-processing operations: analyzing a single document, analyzing documents in batches, and classifying documents. These operations work with both prebuilt models (layout, invoice, receipt, and others) and custom models, and support optional features such as high-resolution OCR, language detection, and structured data extraction.

## Capabilities

### Single Document Analysis

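Analyzes a single document with the specified model to extract text, tables, key-value pairs, and structured data. The call returns an `AnalyzeDocumentLROPoller`, a long-running-operation (LRO) poller that also exposes operation metadata through its `details` property.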

```python { .api }
def begin_analyze_document(
    model_id: str,
    body: Union[AnalyzeDocumentRequest, JSON, IO[bytes]],
    *,
    pages: Optional[str] = None,
    locale: Optional[str] = None,
    string_index_type: Optional[Union[str, StringIndexType]] = None,
    features: Optional[List[Union[str, DocumentAnalysisFeature]]] = None,
    query_fields: Optional[List[str]] = None,
    output_content_format: Optional[Union[str, DocumentContentFormat]] = None,
    output: Optional[List[Union[str, AnalyzeOutputOption]]] = None,
    **kwargs: Any
) -> AnalyzeDocumentLROPoller[AnalyzeResult]:
    """
    Analyzes document with the specified model.

    Parameters:
    - model_id (str): Model ID for analysis (e.g., "prebuilt-layout", "prebuilt-invoice")
    - body: Document data as AnalyzeDocumentRequest, JSON dict, or file bytes
    - pages (str, optional): Page range specification (e.g., "1-3,5")
    - locale (str, optional): Locale hint for better recognition
    - string_index_type (StringIndexType, optional): Character indexing scheme
    - features (List[DocumentAnalysisFeature], optional): Additional features to enable
    - query_fields (List[str], optional): Custom field extraction queries
    - output_content_format (DocumentContentFormat, optional): Content format (text/markdown)
    - output (List[AnalyzeOutputOption], optional): Additional outputs (pdf/figures)

    Returns:
        AnalyzeDocumentLROPoller[AnalyzeResult]: Enhanced poller with operation metadata
    """
```
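The `pages` parameter takes a comma-separated list of 1-based page numbers and ranges, such as the `"1-3,5"` in the docstring. As a sketch, the hypothetical helper `format_pages` below builds that string from arbitrary page numbers by collapsing consecutive runs; it is not part of the SDK and assumes the service accepts any string in this form.

```python
from typing import Iterable, List


def format_pages(page_numbers: Iterable[int]) -> str:
    """Collapse 1-based page numbers into a range string such as "1-3,5" (hypothetical helper)."""
    pages = sorted(set(page_numbers))
    if any(p < 1 for p in pages):
        raise ValueError("page numbers are 1-based")
    parts: List[str] = []
    i = 0
    while i < len(pages):
        start = pages[i]
        # Extend the run while the next page number is consecutive
        while i + 1 < len(pages) and pages[i + 1] == pages[i] + 1:
            i += 1
        end = pages[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


print(format_pages([5, 1, 2, 3]))  # 1-3,5
```

For example, `pages=format_pages([1, 2, 3, 5])` restricts analysis to pages 1 through 3 and page 5. An empty input yields an empty string, so pass `pages=None` instead when every page should be analyzed.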

Usage example:

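```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Replace the placeholders with your resource endpoint and key
client = DocumentIntelligenceClient(
    endpoint="<your-endpoint>",
    credential=AzureKeyCredential("<your-key>"),
)

# Analyze with file upload (the file is sent by the initial request)
with open("document.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        model_id="prebuilt-layout",
        body=f,
        features=["languages", "barcodes"],
        output_content_format="markdown",
    )
result = poller.result()

# Access operation metadata
operation_id = poller.details["operation_id"]

# Analyze with custom fields; query fields require the "queryFields" feature
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-invoice",
        f,
        features=["queryFields"],
        query_fields=["Tax ID", "Purchase Order"],
    )
result = poller.result()
```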
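With `query_fields`, the extracted values come back as fields of the analyzed documents in `result.documents`. The sketch below gathers them into a plain dict. It is duck-typed and assumes each document exposes a `fields` mapping whose values carry `content` and `confidence` attributes, as the SDK's `DocumentField` model does; `collect_query_fields` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

FieldValue = Tuple[Optional[str], Optional[float]]


def collect_query_fields(
    documents: Optional[Iterable[Any]], names: Iterable[str]
) -> Dict[str, FieldValue]:
    """Map each requested field name to (content, confidence) from the first document that has it."""
    wanted = list(names)
    found: Dict[str, FieldValue] = {}
    for doc in documents or []:
        # Assumed: `fields` is a dict-like mapping of field name -> DocumentField
        fields = getattr(doc, "fields", None) or {}
        for name in wanted:
            field = fields.get(name)
            if name not in found and field is not None:
                found[name] = (field.content, field.confidence)
    return found
```

A call such as `collect_query_fields(result.documents, ["Tax ID", "Purchase Order"])` returns only the fields the service actually extracted; missing names are simply absent from the dict.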

### Batch Document Analysis

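Processes multiple documents in a single long-running operation. Documents are read from Azure Blob Storage, selected either by container (`AzureBlobContentSource`) or by an explicit file list (`AzureBlobFileListContentSource`), and results are written to the container given by `result_container_url`.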

```python { .api }
def begin_analyze_batch_documents(
    model_id: str,
    body: Union[AnalyzeBatchDocumentsRequest, JSON, IO[bytes]],
    **kwargs: Any
) -> LROPoller[AnalyzeBatchResult]:
    """
    Analyzes multiple documents in batch.

    Parameters:
    - model_id (str): Model ID for batch analysis
    - body: Batch request with Azure Blob source configuration

    Returns:
        LROPoller[AnalyzeBatchResult]: Batch operation poller
    """
```
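Because `body` also accepts a JSON dict, a batch request can be built without the model classes. This sketch assumes the service's camelCase REST field names (`azureBlobSource` with `containerUrl` and `prefix`, `azureBlobFileListSource` with `containerUrl` and `fileList`, `resultContainerUrl`, `resultPrefix`, `overwriteExisting`), which correspond to the snake_case attributes of `AnalyzeBatchDocumentsRequest` under Request Types. `build_batch_request` is hypothetical, and the container URLs presumably must be readable (and, for results, writable) by the service, for example through SAS tokens.

```python
from typing import Any, Dict, Optional


def build_batch_request(
    source_container_url: str,
    result_container_url: str,
    *,
    prefix: Optional[str] = None,
    file_list: Optional[str] = None,
    result_prefix: Optional[str] = None,
    overwrite_existing: bool = False,
) -> Dict[str, Any]:
    """Build a JSON body for begin_analyze_batch_documents (camelCase REST names assumed)."""
    if prefix is not None and file_list is not None:
        raise ValueError("use either prefix or file_list, not both")
    body: Dict[str, Any] = {
        "resultContainerUrl": result_container_url,
        "overwriteExisting": overwrite_existing,
    }
    if file_list is not None:
        # Assumed: file_list names a blob in the source container that lists the documents
        body["azureBlobFileListSource"] = {
            "containerUrl": source_container_url,
            "fileList": file_list,
        }
    else:
        # Otherwise process every blob in the container, optionally filtered by prefix
        source: Dict[str, Any] = {"containerUrl": source_container_url}
        if prefix is not None:
            source["prefix"] = prefix
        body["azureBlobSource"] = source
    if result_prefix is not None:
        body["resultPrefix"] = result_prefix
    return body
```

The dict would then be passed as the body, e.g. `client.begin_analyze_batch_documents("prebuilt-layout", build_batch_request("<source-url>", "<result-url>", prefix="invoices/"))`.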

### Batch Results Management

Lists the batch operations run against a model, resumes a batch operation from a continuation token, and deletes stored batch results.

```python { .api }
def list_analyze_batch_results(
    model_id: str,
    *,
    skip: Optional[int] = None,
    top: Optional[int] = None,
    **kwargs: Any
) -> Iterable[AnalyzeBatchOperation]:
    """
    Lists batch analysis operations for the specified model.

    Parameters:
    - model_id (str): Model ID to filter operations
    - skip (int, optional): Number of operations to skip
    - top (int, optional): Maximum operations to return

    Returns:
        Iterable[AnalyzeBatchOperation]: Paginated batch operations
    """

def get_analyze_batch_result(
    continuation_token: str,
    **kwargs: Any
) -> LROPoller[AnalyzeBatchResult]:
    """
    Continues batch analysis operation from continuation token.

    Parameters:
    - continuation_token (str): Continuation token for resuming batch operation

    Returns:
        LROPoller[AnalyzeBatchResult]: Batch operation poller
    """

def delete_analyze_batch_result(
    model_id: str,
    result_id: str,
    **kwargs: Any
) -> None:
    """
    Deletes batch analysis result.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Batch operation result ID to delete
    """
```
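Operations returned by `list_analyze_batch_results` can be filtered client-side, for example to find failed runs before inspecting their `error`. This duck-typed sketch relies only on the `operation_id`, `status`, and `last_updated_date_time` attributes of `AnalyzeBatchOperation` shown under Response Types, and assumes the status values are the service's lowercase strings (such as `"failed"`), which a `str`-based enum compares equal to. `failed_operation_ids` is hypothetical.

```python
from typing import Any, Iterable, List


def failed_operation_ids(operations: Iterable[Any]) -> List[str]:
    """Return the IDs of failed batch operations, most recently updated first."""
    # Assumed: status compares equal to the plain string "failed"
    failed = [op for op in operations if op.status == "failed"]
    failed.sort(key=lambda op: op.last_updated_date_time, reverse=True)
    return [op.operation_id for op in failed]
```

Usage would look like `failed_operation_ids(client.list_analyze_batch_results("prebuilt-layout"))`; the iterable is consumed once, so fetch it again if the full list is needed afterwards.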

### Document Classification

Classifies a document with a trained classifier to determine its document type, for example so that each document can be routed to the appropriate extraction model.

```python { .api }
def begin_classify_document(
    classifier_id: str,
    body: Union[ClassifyDocumentRequest, JSON, IO[bytes]],
    *,
    string_index_type: Optional[Union[str, StringIndexType]] = None,
    split_mode: Optional[Union[str, SplitMode]] = None,
    pages: Optional[str] = None,
    **kwargs: Any
) -> LROPoller[AnalyzeResult]:
    """
    Classifies document using specified classifier.

    Parameters:
    - classifier_id (str): Document classifier ID
    - body: Document data as ClassifyDocumentRequest, JSON dict, or file bytes
    - string_index_type (StringIndexType, optional): Character indexing scheme
    - split_mode (SplitMode, optional): Document splitting behavior
    - pages (str, optional): Page range specification

    Returns:
        LROPoller[AnalyzeResult]: Classification result poller
    """
```
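A classification result lists each detected document in `result.documents`. To route pages to a downstream model, the sketch below groups page numbers by document type. It assumes each entry exposes `doc_type` and `bounding_regions` whose items carry a 1-based `page_number`, as the SDK's `AnalyzedDocument` and `BoundingRegion` models do; `pages_by_doc_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Set


def pages_by_doc_type(documents: Optional[Iterable[Any]]) -> Dict[str, List[int]]:
    """Group the page numbers covered by each classified document under its doc_type."""
    grouped: Dict[str, Set[int]] = defaultdict(set)
    for doc in documents or []:
        # A document may span several pages; each bounding region names one page
        for region in doc.bounding_regions or []:
            grouped[doc.doc_type].add(region.page_number)
    return {doc_type: sorted(pages) for doc_type, pages in grouped.items()}
```

With `split_mode` enabled, a single upload can yield several documents of the same type; this sketch merges their pages, so keep the per-document lists instead if each split document must be processed separately.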

### Analysis Result Retrieval

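Retrieves additional outputs of a completed analysis, namely the searchable PDF and extracted figure images, and deletes stored results. The PDF and figure outputs are only produced when they were requested through the `output` parameter at analysis time.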

```python { .api }
def get_analyze_result_pdf(
    model_id: str,
    result_id: str,
    **kwargs: Any
) -> Iterator[bytes]:
    """
    Gets analysis result as searchable PDF.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Analysis result ID

    Returns:
        Iterator[bytes]: PDF content stream
    """

def get_analyze_result_figure(
    model_id: str,
    result_id: str,
    figure_id: str,
    **kwargs: Any
) -> Iterator[bytes]:
    """
    Gets extracted figure as image.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Analysis result ID
    - figure_id (str): Figure identifier

    Returns:
        Iterator[bytes]: Image content stream
    """

def delete_analyze_result(
    model_id: str,
    result_id: str,
    **kwargs: Any
) -> None:
    """
    Deletes analysis result.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Analysis result ID to delete
    """
```
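Both retrieval methods return the payload as an iterator of byte chunks, so it can be written to disk without holding the whole file in memory. `save_stream` below is a hypothetical helper; it assumes, as in the SDK's samples, that the `result_id` is the operation ID from `poller.details["operation_id"]` of an analysis that requested the matching `output` option.

```python
from pathlib import Path
from typing import Iterable, Union


def save_stream(chunks: Iterable[bytes], path: Union[str, Path]) -> int:
    """Write byte chunks (e.g. from get_analyze_result_pdf) to path and return the byte count."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total
```

For example: `save_stream(client.get_analyze_result_pdf(model_id=result.model_id, result_id=poller.details["operation_id"]), "result.pdf")`.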

## Request Types

```python { .api }
class AnalyzeDocumentRequest:
    """Request for single document analysis."""
    url_source: Optional[str]
    base64_source: Optional[str]
    pages: Optional[str]
    locale: Optional[str]
    string_index_type: Optional[StringIndexType]
    features: Optional[List[DocumentAnalysisFeature]]
    query_fields: Optional[List[str]]
    output_content_format: Optional[DocumentContentFormat]
    output: Optional[List[AnalyzeOutputOption]]

class AnalyzeBatchDocumentsRequest:
    """Request for batch document analysis."""
    azure_blob_source: Optional[AzureBlobContentSource]
    azure_blob_file_list_source: Optional[AzureBlobFileListContentSource]
    result_container_url: str
    result_prefix: Optional[str]
    overwrite_existing: Optional[bool]
    pages: Optional[str]
    locale: Optional[str]
    string_index_type: Optional[StringIndexType]
    features: Optional[List[DocumentAnalysisFeature]]
    query_fields: Optional[List[str]]
    output_content_format: Optional[DocumentContentFormat]
    output: Optional[List[AnalyzeOutputOption]]

class ClassifyDocumentRequest:
    """Request for document classification."""
    url_source: Optional[str]
    base64_source: Optional[str]
    pages: Optional[str]
    string_index_type: Optional[StringIndexType]
    split_mode: Optional[SplitMode]
```
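Instead of uploading a stream, `body` can be a JSON dict carrying either a URL or inline base64 content. This sketch assumes the REST field names `urlSource` and `base64Source` behind the `url_source` and `base64_source` attributes above, and that exactly one of them should be set; `build_analyze_body` is hypothetical.

```python
import base64
from typing import Any, Dict, Optional


def build_analyze_body(*, url: Optional[str] = None, data: Optional[bytes] = None) -> Dict[str, Any]:
    """Build an AnalyzeDocumentRequest-shaped JSON body from a URL or raw bytes."""
    if (url is None) == (data is None):
        raise ValueError("pass exactly one of url or data")
    if data is not None:
        # Inline content is sent as standard base64 text
        return {"base64Source": base64.b64encode(data).decode("ascii")}
    return {"urlSource": url}
```

Usage might look like `client.begin_analyze_document("prebuilt-layout", build_analyze_body(url="https://example.com/sample.pdf"))`. Inline base64 grows the payload by about a third, so a URL source is usually preferable for large files.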

## Response Types

```python { .api }
class AnalyzeResult:
    """Main analysis result containing extracted content and metadata."""
    api_version: Optional[str]
    model_id: str
    string_index_type: Optional[StringIndexType]
    content: Optional[str]
    pages: Optional[List[DocumentPage]]
    paragraphs: Optional[List[DocumentParagraph]]
    tables: Optional[List[DocumentTable]]
    figures: Optional[List[DocumentFigure]]
    sections: Optional[List[DocumentSection]]
    key_value_pairs: Optional[List[DocumentKeyValuePair]]
    styles: Optional[List[DocumentStyle]]
    languages: Optional[List[DocumentLanguage]]
    documents: Optional[List[AnalyzedDocument]]
    warnings: Optional[List[DocumentIntelligenceWarning]]

class AnalyzeBatchResult:
    """Results from batch document analysis."""
    succeeded_count: int
    failed_count: int
    skipped_count: int
    details: List[AnalyzeBatchOperationDetail]

class AnalyzeBatchOperation:
    """Batch operation metadata and status."""
    operation_id: str
    status: DocumentIntelligenceOperationStatus
    created_date_time: datetime
    last_updated_date_time: datetime
    percent_completed: Optional[int]
    result: Optional[AnalyzeBatchResult]
    error: Optional[DocumentIntelligenceError]
```
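Each entry in `AnalyzeResult.tables` stores its cells as a flat list. The sketch below rebuilds a row-major grid from one table. It is duck-typed and assumes the attribute names of the SDK's `DocumentTable` and `DocumentTableCell` models (`row_count`, `column_count`, `cells`, `row_index`, `column_index`, `content`, and optional `row_span`/`column_span`). Copying a spanned cell's text into every position it covers is one design choice among several; `table_to_grid` is hypothetical.

```python
from typing import Any, List


def table_to_grid(table: Any) -> List[List[str]]:
    """Expand a DocumentTable-like object into a row_count x column_count grid of strings."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        # Spans are assumed optional; a missing or None span means 1
        row_span = getattr(cell, "row_span", None) or 1
        col_span = getattr(cell, "column_span", None) or 1
        # Copy the cell's text into every grid position it covers, clipped to the table size
        for r in range(cell.row_index, min(cell.row_index + row_span, table.row_count)):
            for c in range(cell.column_index, min(cell.column_index + col_span, table.column_count)):
                grid[r][c] = cell.content
    return grid
```

A grid produced this way can be fed straight to `csv.writer(...).writerows(grid)` or a dataframe constructor.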

## Enhanced LRO Poller

```python { .api }
class AnalyzeDocumentLROPoller(LROPoller[AnalyzeResult]):
    """Enhanced poller for document analysis operations."""

    @property
    def details(self) -> Dict[str, Any]:
        """
        Returns operation metadata including operation_id.

        Returns:
            Dict containing operation_id extracted from Operation-Location header
        """

    @classmethod
    def from_continuation_token(
        cls,
        polling_method: PollingMethod,
        continuation_token: str,
        **kwargs: Any
    ) -> "AnalyzeDocumentLROPoller[AnalyzeResult]":
        """Resume operation from continuation token."""
```
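Since `AnalyzeDocumentLROPoller` derives from `LROPoller`, it also has the standard `azure.core` poller methods `done()`, `status()`, `wait()`, and `result()`. The sketch below polls in fixed steps, reports the status between steps, and gives up after a deadline. It is duck-typed so any object with those four methods works; `wait_with_progress` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable


def wait_with_progress(
    poller: Any,
    *,
    interval: float = 5.0,
    deadline: float = 600.0,
    report: Callable[[str], None] = print,
) -> Any:
    """Poll until done, reporting status each step; raise TimeoutError after `deadline` seconds."""
    start = time.monotonic()
    while not poller.done():
        if time.monotonic() - start > deadline:
            raise TimeoutError(f"operation still {poller.status()} after {deadline} s")
        report(poller.status())
        poller.wait(interval)  # blocks for at most `interval` seconds
    return poller.result()
```

Unlike a bare `poller.result()`, this surfaces intermediate status (for logging or a progress bar); `poller.result(timeout=...)` is the simpler choice when only a deadline is needed.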

## Client Utility Methods

```python { .api }
def send_request(
    request: HttpRequest,
    *,
    stream: bool = False,
    **kwargs: Any
) -> HttpResponse:
    """
    Sends custom HTTP request using the client's pipeline.

    Parameters:
    - request (HttpRequest): HTTP request to send
    - stream (bool): Whether to stream the response

    Returns:
        HttpResponse: Raw HTTP response
    """

def close() -> None:
    """Close the client and release resources."""
```