# usaddress

A Python library for parsing unstructured United States address strings into labeled address components. It uses a conditional random field (CRF) model, a probabilistic sequence-labeling technique from NLP, to make an educated guess for each token, so it can handle challenging inputs where rule-based parsers typically fail.

## Package Information

- **Package Name**: usaddress
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install usaddress`
- **Version**: 0.5.16
- **Dependencies**: python-crfsuite>=0.7, probableparsing

## Core Imports

```python
import usaddress
```

## Basic Usage

```python
import usaddress

# Example address string
addr = '123 Main St. Suite 100 Chicago, IL'

# Parse method: split address into components and label each
# Returns list of (token, label) tuples
parsed = usaddress.parse(addr)
# Output: [('123', 'AddressNumber'), ('Main', 'StreetName'), ('St.', 'StreetNamePostType'),
#          ('Suite', 'OccupancyType'), ('100', 'OccupancyIdentifier'),
#          ('Chicago,', 'PlaceName'), ('IL', 'StateName')]

# Tag method: merge consecutive components and return address type
# Returns (dict, address_type) tuple
tagged, address_type = usaddress.tag(addr)
# Output: ({'AddressNumber': '123', 'StreetName': 'Main',
#           'StreetNamePostType': 'St.', 'OccupancyType': 'Suite',
#           'OccupancyIdentifier': '100', 'PlaceName': 'Chicago',
#           'StateName': 'IL'}, 'Street Address')
```
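
To make the relationship between the two outputs concrete, the sketch below merges a list of (token, label) pairs into a dict the way the comments above describe: consecutive tokens with the same label are joined, and trailing commas are stripped. This is an illustration, not the library's implementation. `merge_labeled_tokens` is a hypothetical helper, and the sample pairs are copied from the `parse()` output shown above rather than computed, so the snippet runs without usaddress installed.

```python
def merge_labeled_tokens(pairs):
    """Hypothetical helper (not part of usaddress): merge (token, label) pairs into a dict."""
    merged = {}
    previous_label = None
    for token, label in pairs:
        token = token.strip(" ,;")  # drop separators left attached to tokens
        if not token:
            continue
        if label == previous_label:
            # Same label as the token just before: extend that component
            merged[label] = f"{merged[label]} {token}"
        elif label in merged:
            # Label seen earlier but not adjacent: a dict cannot hold both values
            raise ValueError(f"repeated label: {label}")
        else:
            merged[label] = token
        previous_label = label
    return merged


# Sample pairs copied from the parse() output above
sample = [
    ("123", "AddressNumber"), ("Main", "StreetName"), ("St.", "StreetNamePostType"),
    ("Suite", "OccupancyType"), ("100", "OccupancyIdentifier"),
    ("Chicago,", "PlaceName"), ("IL", "StateName"),
]
print(merge_labeled_tokens(sample))
```

The `ValueError` branch mirrors the situation in which a dict keyed by label cannot represent the address, which is the case the documentation associates with `RepeatedLabelError`.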

## Capabilities

### Address Parsing

Core functionality for parsing US addresses into labeled components using probabilistic models.

```python { .api }
def parse(address_string: str) -> list[tuple[str, str]]:
    """
    Split an address string into components, and label each component.

    Args:
        address_string (str): The address to parse

    Returns:
        list[tuple[str, str]]: List of (token, label) pairs
    """

def tag(address_string: str, tag_mapping=None) -> tuple[dict[str, str], str]:
    """
    Parse and merge consecutive components & strip commas.
    Also return an address type ('Street Address', 'Intersection', 'PO Box', or 'Ambiguous').

    Because this method returns a dict with labels as keys, it will throw a
    RepeatedLabelError when multiple areas of an address have the same label.

    Args:
        address_string (str): The address to parse
        tag_mapping (dict, optional): Optional mapping to remap labels to custom format

    Returns:
        tuple[dict[str, str], str]: (tagged_address_dict, address_type)
    """
```

#### Usage Examples

```python
import usaddress

# Basic parsing - get individual tokens with labels
tokens = usaddress.parse("1600 Pennsylvania Avenue NW Washington DC 20500")
for token, label in tokens:
    print(f"{token}: {label}")

# Advanced tagging - get consolidated address components
address, addr_type = usaddress.tag("1600 Pennsylvania Avenue NW Washington DC 20500")
print(f"Address type: {addr_type}")
print(f"Street number: {address.get('AddressNumber', 'N/A')}")
print(f"Street name: {address.get('StreetName', 'N/A')}")

# Custom label mapping
mapping = {'StreetName': 'Street', 'AddressNumber': 'Number'}
address, addr_type = usaddress.tag("123 Main St", tag_mapping=mapping)
print(address)  # Uses custom labels
```

### Address Tokenization

Low-level tokenization functionality for splitting addresses into unlabeled tokens.

```python { .api }
def tokenize(address_string: str) -> list[str]:
    """
    Split each component of an address into a list of unlabeled tokens.

    Args:
        address_string (str): The address to tokenize

    Returns:
        list[str]: The tokenized address components
    """
```

#### Usage Examples

```python
import usaddress

# Tokenize without labeling
tokens = usaddress.tokenize("123 Main St. Apt 4B")
print(tokens)  # ['123', 'Main', 'St.', 'Apt', '4B']
```

### Feature Extraction

Functions for extracting machine learning features from address tokens.

```python { .api }
def tokenFeatures(token: str) -> Feature:
    """
    Return a Feature dict with attributes that describe a token.

    Args:
        token (str): The token to analyze

    Returns:
        Feature: Dict with attributes describing the token
            (abbrev, digits, word, trailing.zeros, length, endsinpunc,
            directional, street_name, has.vowels)
    """

def tokens2features(address: list[str]) -> list[Feature]:
    """
    Turn every token into a Feature dict, and return a list of each token as a Feature.
    Each attribute in a Feature describes the corresponding token.

    Args:
        address (list[str]): The address as a list of tokens

    Returns:
        list[Feature]: A list of all tokens with feature details and context
    """
```

#### Usage Examples

```python
import usaddress

# Extract features for a single token
features = usaddress.tokenFeatures("123")
print(features['digits'])  # 'all_digits'
print(features['length'])  # 'd:3'

# Extract features for all tokens with context
tokens = ["123", "Main", "St."]
features_list = usaddress.tokens2features(tokens)
print(features_list[0]['next']['word'])        # 'main'
print(features_list[1]['previous']['digits'])  # 'all_digits'
```
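
The `previous` and `next` keys used above carry the neighboring token's features, which gives a sequence model local context. The sketch below shows one way such context could be attached. It is a simplified illustration, not the library's code: `simple_features` and `add_context` are hypothetical names, and only two made-up features are computed.

```python
def simple_features(token):
    """Hypothetical, reduced feature dict for one token (not usaddress.tokenFeatures)."""
    cleaned = token.strip(".,").lower()
    return {
        "word": cleaned if not cleaned.isdigit() else False,
        "is_digits": cleaned.isdigit(),
    }


def add_context(tokens):
    """Attach copies of each neighbor's features under 'previous' and 'next'."""
    base = [simple_features(t) for t in tokens]
    with_context = []
    for i, features in enumerate(base):
        entry = dict(features)
        if i > 0:
            entry["previous"] = dict(base[i - 1])
        if i < len(base) - 1:
            entry["next"] = dict(base[i + 1])
        with_context.append(entry)
    return with_context


features_list = add_context(["123", "Main", "St."])
print(features_list[0]["next"]["word"])           # main
print(features_list[1]["previous"]["is_digits"])  # True
```

Copying the neighbor's dict, rather than referencing it, keeps the context one level deep instead of nesting the whole sequence.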

### Utility Functions

Helper functions for analyzing token characteristics.

```python { .api }
def digits(token: str) -> typing.Literal["all_digits", "some_digits", "no_digits"]:
    """
    Identify whether the token string is all digits, has some digits, or has no digits.

    Args:
        token (str): The token to parse

    Returns:
        str: Label denoting digit presence ('all_digits', 'some_digits', 'no_digits')
    """

def trailingZeros(token: str) -> str:
    """
    Return any trailing zeros found at the end of a token.
    If none are found, then return an empty string.

    Args:
        token (str): The token to search for zeros

    Returns:
        str: The trailing zeros found, if any. Otherwise, an empty string.
    """
```

#### Usage Examples

```python
import usaddress

# Analyze digit content
print(usaddress.digits("123"))   # 'all_digits'
print(usaddress.digits("12th"))  # 'some_digits'
print(usaddress.digits("Main"))  # 'no_digits'

# Find trailing zeros
print(usaddress.trailingZeros("1200"))  # '00'
print(usaddress.trailingZeros("123"))   # ''
```
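
For readers curious how helpers like these can be written, here is a small standalone sketch that reproduces the behavior documented above. It is not the library's source, and the names `classify_digits` and `trailing_zeros` are hypothetical.

```python
import re


def classify_digits(token):
    """Hypothetical stand-in: return 'all_digits', 'some_digits', or 'no_digits'."""
    if token.isdigit():
        return "all_digits"
    if any(ch.isdigit() for ch in token):
        return "some_digits"
    return "no_digits"


def trailing_zeros(token):
    """Hypothetical stand-in: return the run of zeros ending the token, or ''."""
    match = re.search(r"0+$", token)
    return match.group(0) if match else ""


print(classify_digits("12th"))  # some_digits
print(trailing_zeros("1200"))   # 00
```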

### Address Component Labels

Constants defining the complete set of address component labels used by the parser.

```python { .api }
LABELS: list[str]
```

The complete list of 26 address component labels, based on the United States Thoroughfare, Landmark, and Postal Address Data Standard:

- **Address Number Components**: AddressNumberPrefix, AddressNumber, AddressNumberSuffix
- **Street Name Components**: StreetNamePreModifier, StreetNamePreDirectional, StreetNamePreType, StreetName, StreetNamePostType, StreetNamePostDirectional
- **Subaddress Components**: SubaddressType, SubaddressIdentifier, BuildingName
- **Occupancy Components**: OccupancyType, OccupancyIdentifier
- **Intersection Components**: CornerOf, IntersectionSeparator
- **Location Components**: LandmarkName, PlaceName, StateName, ZipCode
- **USPS Box Components**: USPSBoxType, USPSBoxID, USPSBoxGroupType, USPSBoxGroupID
- **Other Components**: Recipient, NotAddress
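
The grouping above can be captured as a lookup table, which may be handy when post-processing parser output. This is only a sketch: `LABEL_GROUPS`, `GROUP_BY_LABEL`, and `group_of` are hypothetical names introduced here, and the group keys are shortened versions of the headings in the list. The label strings themselves are copied from that list.

```python
# Hypothetical grouping table; label names copied from the list above
LABEL_GROUPS = {
    "Address Number": ["AddressNumberPrefix", "AddressNumber", "AddressNumberSuffix"],
    "Street Name": [
        "StreetNamePreModifier", "StreetNamePreDirectional", "StreetNamePreType",
        "StreetName", "StreetNamePostType", "StreetNamePostDirectional",
    ],
    "Subaddress": ["SubaddressType", "SubaddressIdentifier", "BuildingName"],
    "Occupancy": ["OccupancyType", "OccupancyIdentifier"],
    "Intersection": ["CornerOf", "IntersectionSeparator"],
    "Location": ["LandmarkName", "PlaceName", "StateName", "ZipCode"],
    "USPS Box": ["USPSBoxType", "USPSBoxID", "USPSBoxGroupType", "USPSBoxGroupID"],
    "Other": ["Recipient", "NotAddress"],
}

# Reverse index: label -> group name
GROUP_BY_LABEL = {
    label: group for group, labels in LABEL_GROUPS.items() for label in labels
}


def group_of(label):
    """Return the group a label belongs to, or 'Unknown' for unrecognized labels."""
    return GROUP_BY_LABEL.get(label, "Unknown")


print(group_of("ZipCode"))    # Location
print(group_of("USPSBoxID"))  # USPS Box
```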

### Reference Data

Built-in reference data for address parsing and feature extraction.

```python { .api }
DIRECTIONS: set[str]
STREET_NAMES: set[str]
PARENT_LABEL: str
GROUP_LABEL: str
```

#### Usage Examples

```python
import usaddress

token = "NW"  # example token to check

# Check if token is a direction
if token.lower() in usaddress.DIRECTIONS:
    print("This is a directional")

# Check if token is a street type
if token.lower() in usaddress.STREET_NAMES:
    print("This is a street type")

# Access all available labels
print(f"Total labels: {len(usaddress.LABELS)}")
for label in usaddress.LABELS:
    print(label)
```

## Types

```python { .api }
Feature = dict[str, typing.Union[str, bool, "Feature"]]

class RepeatedLabelError(probableparsing.RepeatedLabelError):
    """
    Exception raised when tag() encounters repeated labels that cannot be merged.

    Attributes:
        REPO_URL (str): "https://github.com/datamade/usaddress/issues/new"
        DOCS_URL (str): "https://usaddress.readthedocs.io/"
    """
```

## Error Handling

The `tag()` function can raise a `RepeatedLabelError` when multiple areas of an address have the same label and cannot be concatenated. This typically indicates either:

1. The input string is not a valid address
2. Some tokens were labeled incorrectly by the model

```python
import usaddress

try:
    address, addr_type = usaddress.tag("123 Main St 456 Oak Ave")
except usaddress.RepeatedLabelError as e:
    print(f"Ambiguous address: {e}")
    # Fall back to parse() for detailed token analysis
    tokens = usaddress.parse("123 Main St 456 Oak Ave")
```
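
When processing many addresses, the same fallback can be wrapped in a small helper so callers always get a result back. The sketch below is one possible shape, not a recommended or official pattern. `tag_or_parse` is a hypothetical name, and the tagging function, parsing function, and exception class are passed in as arguments so the snippet does not depend on the library being installed.

```python
def tag_or_parse(address, tag_fn, parse_fn, error_cls):
    """Hypothetical wrapper: try tag_fn, fall back to parse_fn on error_cls.

    Returns ('tagged', dict, address_type) on success,
    or ('parsed', list_of_pairs, None) when tagging fails.
    """
    try:
        tagged, address_type = tag_fn(address)
    except error_cls:
        # Tagging could not build a dict; keep the token-level labels instead
        return "parsed", parse_fn(address), None
    return "tagged", tagged, address_type
```

In real use this would be called as `tag_or_parse(addr, usaddress.tag, usaddress.parse, usaddress.RepeatedLabelError)`, with the first element of the result telling the caller which shape came back.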

## Address Types

The `tag()` function returns one of four address types:

- **"Street Address"**: Standard street address with AddressNumber
- **"Intersection"**: Street intersection without AddressNumber
- **"PO Box"**: Postal box address with USPSBoxID
- **"Ambiguous"**: Cannot be classified into other categories
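
The four rules above can be read as a decision procedure over the set of labels found in an address. The sketch below illustrates that reading under stated assumptions and is not the library's code: `classify_address_type` is a hypothetical name, and it assumes an intersection is signaled by the presence of an `IntersectionSeparator` label.

```python
def classify_address_type(labels):
    """Hypothetical classifier: map the labels found in an address to an address type."""
    labels = set(labels)
    # Assumption: an IntersectionSeparator label marks an intersection
    is_intersection = "IntersectionSeparator" in labels
    if "AddressNumber" in labels and not is_intersection:
        return "Street Address"
    if is_intersection and "AddressNumber" not in labels:
        return "Intersection"
    if "USPSBoxID" in labels:
        return "PO Box"
    return "Ambiguous"


print(classify_address_type(["AddressNumber", "StreetName", "StreetNamePostType"]))  # Street Address
print(classify_address_type(["USPSBoxType", "USPSBoxID"]))                           # PO Box
```

The order of the checks matters in this sketch: an address with both an `AddressNumber` and a `USPSBoxID` is classified as a street address because that rule is tested first.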