# Form Recognition (Legacy API)

The legacy Form Recognizer API (v2.0, v2.1) offers traditional form processing: prebuilt models for common document types and basic custom form training. It is still supported, but the modern Document Analysis API is recommended for new applications.

## Capabilities

### Receipt Recognition

Extracts key information from receipts, including merchant details, transaction amounts, dates, and line items, using the prebuilt receipt model.

```python { .api }
def begin_recognize_receipts(receipt: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize receipt data from documents.

    Parameters:
    - receipt: Receipt document as bytes or file stream
    - locale: Optional locale hint (e.g., "en-US")
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted receipt data
    """

def begin_recognize_receipts_from_url(receipt_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize receipt data from document URL.

    Parameters:
    - receipt_url: Publicly accessible URL to receipt document
    - locale: Optional locale hint
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted receipt data
    """
```

#### Usage Example

```python
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

# Placeholder values: substitute your own resource endpoint and key
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
client = FormRecognizerClient(endpoint, AzureKeyCredential("key"))

# From local file
with open("receipt.jpg", "rb") as receipt_file:
    poller = client.begin_recognize_receipts(receipt_file, locale="en-US")
    receipts = poller.result()

# Access extracted data
for receipt in receipts:
    merchant_name = receipt.fields.get("MerchantName")
    if merchant_name:
        print(f"Merchant: {merchant_name.value}")

    total = receipt.fields.get("Total")
    if total:
        print(f"Total: {total.value}")

    # Access line items
    items = receipt.fields.get("Items")
    if items:
        for item in items.value:
            name = item.value.get("Name")
            price = item.value.get("TotalPrice")
            if name and price:
                print(f"Item: {name.value} - ${price.value}")
```

### Business Card Recognition

Extracts contact information from business cards, including names, job titles, organizations, phone numbers, and email addresses.

```python { .api }
def begin_recognize_business_cards(business_card: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract business card information.

    Parameters:
    - business_card: Business card document as bytes or file stream
    - locale: Optional locale hint
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with contact information
    """

def begin_recognize_business_cards_from_url(business_card_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract business card information from URL.

    Parameters:
    - business_card_url: Publicly accessible URL to business card
    - locale: Optional locale hint
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with contact information
    """
```
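
A minimal sketch of reading contact details out of a recognized business card. `summarize_business_card` is a hypothetical helper, not part of the SDK, and the field names (`ContactNames`, `FirstName`, `LastName`, `JobTitles`, `Emails`) are assumed from the v2.1 prebuilt business card model, so check them against the service version you use.

```python
def summarize_business_card(card):
    """Collect names, job titles, and emails from one recognized business card.

    `card` is expected to behave like a RecognizedForm: a `.fields` dict whose
    values expose `.value`. Field names are assumptions (v2.1 prebuilt model).
    """
    def values_of(field_name):
        field = card.fields.get(field_name)
        # List-valued fields hold a list of field objects, each with a .value
        return [item.value for item in field.value] if field and field.value else []

    names = []
    for contact in values_of("ContactNames"):
        # Each contact name is a dict of sub-fields such as FirstName/LastName
        parts = [contact[key].value for key in ("FirstName", "LastName") if key in contact]
        names.append(" ".join(parts))

    return {
        "names": names,
        "job_titles": values_of("JobTitles"),
        "emails": values_of("Emails"),
    }

# Possible usage with a real client (not executed here):
# with open("card.jpg", "rb") as card_file:
#     cards = client.begin_recognize_business_cards(card_file).result()
# for card in cards:
#     print(summarize_business_card(card))
```

Returning plain lists keeps the helper tolerant of cards where a field is absent, which is common for scanned cards.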

### Invoice Recognition

Processes invoices to extract vendor information, customer details, invoice amounts, due dates, and line item details.

```python { .api }
def begin_recognize_invoices(invoice: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract invoice information using prebuilt model.

    Parameters:
    - invoice: Invoice document as bytes or file stream
    - locale: Optional locale hint
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with invoice data
    """

def begin_recognize_invoices_from_url(invoice_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract invoice information from URL.

    Parameters:
    - invoice_url: Publicly accessible URL to invoice document
    - locale: Optional locale hint
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with invoice data
    """
```
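
The following sketch shows one way to pull header values and line items out of a recognized invoice. The helper names are hypothetical, and the field names (`VendorName`, `DueDate`, `InvoiceTotal`, `Items`, `Description`, `Amount`) are assumed from the v2.1 prebuilt invoice model rather than taken from this page.

```python
def invoice_line_items(invoice):
    """Return (description, amount) pairs from the invoice's Items field."""
    items = invoice.fields.get("Items")
    if not items or not items.value:
        return []
    rows = []
    for item in items.value:
        cells = item.value  # dict of sub-fields for one line item
        description = cells.get("Description")
        amount = cells.get("Amount")
        rows.append((
            description.value if description else None,
            amount.value if amount else None,
        ))
    return rows


def summarize_invoice(invoice):
    """Gather a few header fields plus line items; missing fields become None."""
    def value_of(name):
        field = invoice.fields.get(name)
        return field.value if field else None

    return {
        "vendor": value_of("VendorName"),
        "due_date": value_of("DueDate"),
        "total": value_of("InvoiceTotal"),
        "items": invoice_line_items(invoice),
    }

# Possible usage with a real client (not executed here):
# invoices = client.begin_recognize_invoices_from_url(invoice_url).result()
# print(summarize_invoice(invoices[0]))
```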

### Identity Document Recognition

Extracts information from identity documents such as driver's licenses and passports, including personal details, document numbers, and expiration dates.

```python { .api }
def begin_recognize_identity_documents(identity_document: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract identity document information.

    Parameters:
    - identity_document: ID document as bytes or file stream
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with identity information
    """

def begin_recognize_identity_documents_from_url(identity_document_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract identity document information from URL.

    Parameters:
    - identity_document_url: Publicly accessible URL to ID document
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with identity information
    """
```
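
As a rough illustration, the sketch below summarizes a recognized identity document and checks whether it has expired. `describe_id_document` is a hypothetical helper; the field names (`FirstName`, `LastName`, `DocumentNumber`, `DateOfExpiration`) and the assumption that date fields arrive as `datetime.date` values should be verified against your service version.

```python
from datetime import date


def describe_id_document(id_document, today=None):
    """Summarize holder name, document number, and expiry state.

    Field names are assumptions based on the v2.1 prebuilt ID model;
    any field that is missing yields None in the result.
    """
    def value_of(name):
        field = id_document.fields.get(name)
        return field.value if field else None

    expires = value_of("DateOfExpiration")  # assumed to be a datetime.date
    today = today or date.today()
    holder = " ".join(filter(None, [value_of("FirstName"), value_of("LastName")]))
    return {
        "holder": holder or None,
        "document_number": value_of("DocumentNumber"),
        "expired": (expires < today) if expires is not None else None,
    }

# Possible usage with a real client (not executed here):
# with open("license.jpg", "rb") as id_file:
#     documents = client.begin_recognize_identity_documents(id_file).result()
# print(describe_id_document(documents[0]))
```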

### Content Recognition

Extracts layout information including text, tables, and selection marks without using a specific model. Useful for general document layout analysis.

```python { .api }
def begin_recognize_content(form: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[FormPage]]:
    """
    Extract layout information from documents.

    Parameters:
    - form: Document as bytes or file stream
    - language: Language code for text recognition
    - pages: Specific page numbers to analyze
    - reading_order: Reading order algorithm
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[FormPage] with layout information
    """

def begin_recognize_content_from_url(form_url: str, **kwargs) -> LROPoller[List[FormPage]]:
    """
    Extract layout information from document URL.

    Parameters:
    - form_url: Publicly accessible URL to document
    - language: Language code for text recognition
    - pages: Specific page numbers to analyze
    - reading_order: Reading order algorithm

    Returns:
    LROPoller that yields List[FormPage] with layout information
    """
```
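
A short sketch of working with the returned pages: joining a page's text lines and arranging a table's cells into rows. The helper names are hypothetical, and the attribute names (`lines`, `tables`, `row_count`, `column_count`, `cells`, `row_index`, `column_index`, `text`) are assumed to match the SDK's `FormPage`, `FormTable`, and `FormTableCell` models.

```python
def page_text(page):
    """Join the text of every recognized line on a page."""
    return "\n".join(line.text for line in (page.lines or []))


def table_to_grid(table):
    """Arrange a table's cells into a row-major list of lists.

    Cells the service did not report are left as empty strings.
    """
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.text
    return grid

# Possible usage with a real client (not executed here):
# with open("form.pdf", "rb") as form_file:
#     pages = client.begin_recognize_content(form_file).result()
# for page in pages:
#     print(page_text(page))
#     for table in page.tables:
#         print(table_to_grid(table))
```

This sketch ignores row and column spans, so a merged cell only fills its top-left position.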

### Custom Form Recognition

Uses custom-trained models to extract information from domain-specific forms and documents.

```python { .api }
def begin_recognize_custom_forms(model_id: str, form: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize forms using custom trained model.

    Parameters:
    - model_id: ID of custom trained model
    - form: Form document as bytes or file stream
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted custom form data
    """

def begin_recognize_custom_forms_from_url(model_id: str, form_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize forms from URL using custom model.

    Parameters:
    - model_id: ID of custom trained model
    - form_url: Publicly accessible URL to form document
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted custom form data
    """
```

236

237

#### Custom Form Usage Example

238

239

```python
# Recognize custom form
model_id = "your-custom-model-id"

with open("custom_form.pdf", "rb") as form_file:
    poller = client.begin_recognize_custom_forms(model_id, form_file)
    forms = poller.result()

# Process results
for form in forms:
    print(f"Form type: {form.form_type}")
    print(f"Confidence: {form.form_type_confidence}")

    for field_name, field in form.fields.items():
        print(f"{field_name}: {field.value} (confidence: {field.confidence})")
```
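
Because each field carries a confidence score, a common follow-up is to route low-confidence values to manual review. The sketch below is one way to do that; `split_by_confidence` is a hypothetical helper and the 0.8 threshold is an arbitrary illustrative value, not a recommendation from the service.

```python
def split_by_confidence(form, threshold=0.8):
    """Partition a form's fields into accepted and needs-review dicts.

    Fields without a confidence score are treated as 0.0 and sent to review.
    """
    accepted, needs_review = {}, {}
    for name, field in form.fields.items():
        confidence = field.confidence if field.confidence is not None else 0.0
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.value
    return accepted, needs_review

# Possible usage with the `forms` list from the example above (not executed here):
# for form in forms:
#     accepted, needs_review = split_by_confidence(form)
#     print("Accepted:", accepted)
#     print("Needs review:", needs_review)
```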

## FormRecognizerClient

```python { .api }
class FormRecognizerClient:
    """
    Client for analyzing forms using Form Recognizer API v2.1 and below.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, TokenCredential],
        **kwargs
    ):
        """
        Initialize FormRecognizerClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential
        - api_version: API version (default: FormRecognizerApiVersion.V2_1)
        """

    def close(self) -> None:
        """Close client and release resources."""

# Async version (in the published package it is imported as
# azure.ai.formrecognizer.aio.FormRecognizerClient)
class AsyncFormRecognizerClient:
    """
    Async client for analyzing forms using Form Recognizer API v2.1 and below.

    Provides the same methods as FormRecognizerClient but with async/await support.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, AsyncTokenCredential],
        **kwargs
    ):
        """
        Initialize AsyncFormRecognizerClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential (must support async operations)
        - api_version: API version (default: FormRecognizerApiVersion.V2_1)
        """

    async def begin_recognize_receipts(self, receipt: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_receipts_from_url(self, receipt_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_business_cards(self, business_card: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_business_cards_from_url(self, business_card_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_identity_documents(self, identity_document: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_identity_documents_from_url(self, identity_document_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_invoices(self, invoice: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_invoices_from_url(self, invoice_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_content(self, form: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[FormPage]]: ...
    async def begin_recognize_content_from_url(self, form_url: str, **kwargs) -> AsyncLROPoller[List[FormPage]]: ...
    async def begin_recognize_custom_forms(self, model_id: str, form: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_custom_forms_from_url(self, model_id: str, form_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...

    async def close(self) -> None:
        """Close client and release resources."""
```
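
A minimal sketch of the async flow, in which both starting the operation and fetching its result are awaited. It assumes the async client is imported from `azure.ai.formrecognizer.aio` under the name `FormRecognizerClient` and supports `async with`; `recognize_receipt_async` and `main` are hypothetical names, and the endpoint and key are placeholders.

```python
import asyncio


async def recognize_receipt_async(client, receipt_bytes):
    """Start receipt recognition on an async client and return the forms."""
    # Awaiting begin_* starts the operation; awaiting result() waits for it
    poller = await client.begin_recognize_receipts(receipt_bytes)
    return await poller.result()


async def main(endpoint, key, path):
    # Local imports keep the helper above importable without the SDK installed
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer.aio import FormRecognizerClient

    with open(path, "rb") as receipt_file:
        data = receipt_file.read()

    # `async with` closes the client when the block exits
    async with FormRecognizerClient(endpoint, AzureKeyCredential(key)) as client:
        for receipt in await recognize_receipt_async(client, data):
            total = receipt.fields.get("Total")
            print(total.value if total else "no total found")

# Possible invocation (not executed here):
# asyncio.run(main("https://<your-resource>.cognitiveservices.azure.com/", "<key>", "receipt.jpg"))
```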

## Common Parameters

### Content Types

```python { .api }
class FormContentType(str, Enum):
    APPLICATION_PDF = "application/pdf"
    IMAGE_JPEG = "image/jpeg"
    IMAGE_PNG = "image/png"
    IMAGE_TIFF = "image/tiff"
    IMAGE_BMP = "image/bmp"
```
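
When the document type is known from the file name, the `content_type` keyword can be set explicitly. The sketch below maps file extensions to the MIME strings listed above; `content_type_for` and the extension table are illustrative assumptions, and it relies on the enum members being plain strings so that the string values can be passed directly.

```python
from pathlib import Path

# Extension-to-MIME mapping mirroring the FormContentType values above
_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
}


def content_type_for(path):
    """Return the MIME type for a supported file extension, else raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in _CONTENT_TYPES:
        raise ValueError(f"Unsupported file type for {str(path)!r}")
    return _CONTENT_TYPES[suffix]

# Possible usage with a real client (not executed here):
# with open("receipt.jpg", "rb") as receipt_file:
#     poller = client.begin_recognize_receipts(
#         receipt_file, content_type=content_type_for("receipt.jpg")
#     )
```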

### Language Codes

Common values for the `locale` keyword, which hints at the document's language and region:

- `"en-US"` - English (United States)
- `"en-AU"` - English (Australia)
- `"en-CA"` - English (Canada)
- `"en-GB"` - English (Great Britain)
- `"en-IN"` - English (India)

## Error Handling

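```python { .api }
from azure.core.exceptions import HttpResponseError

try:
    poller = client.begin_recognize_receipts(receipt_data)
    result = poller.result()
except HttpResponseError as e:
    # Service failures surface as HttpResponseError; FormRecognizerError is a
    # result model (code, message), not an exception class.
    print(f"Recognition failed: {e.status_code} - {e.message}")
    error = getattr(e, "error", None)  # parsed service error, may be None
    if error is not None:
        print(f"Error code: {error.code}")
        for detail in error.details or []:
            print(f"Detail: {detail}")
```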

## Polling Operations

All recognition operations return Long Running Operation (LRO) pollers:

```python
# Start operation
poller = client.begin_recognize_receipts(receipt_data)

# Check status
print(f"Status: {poller.status()}")

# Wait for completion (blocking)
result = poller.result()

# Wait with a timeout (in seconds) instead of blocking indefinitely
result = poller.result(timeout=300)  # 5 minute timeout
```
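
For long documents it can help to report progress while waiting instead of blocking in a single call. The sketch below loops on the poller's `done()`, `status()`, and `wait()` methods, which are assumed to behave as in `azure.core.polling.LROPoller`; `wait_with_progress` and its default intervals are hypothetical.

```python
import time


def wait_with_progress(poller, interval=2.0, max_wait=300.0, report=print):
    """Report the poller's status until it finishes, then return its result.

    Raises TimeoutError if the operation is not done within `max_wait` seconds.
    """
    deadline = time.monotonic() + max_wait
    while not poller.done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation still {poller.status()} after {max_wait} seconds")
        report(f"Status: {poller.status()}")
        poller.wait(interval)  # blocks for at most `interval` seconds
    return poller.result()

# Possible usage with a real client (not executed here):
# poller = client.begin_recognize_receipts(receipt_data)
# receipts = wait_with_progress(poller)
```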