Tessl Tile for pypi/usaddress@0.5.0

# usaddress

A Python library for parsing unstructured United States address strings into address components using statistical NLP methods (conditional random fields). Because the parser is probabilistic, it makes educated guesses when identifying address components, even in challenging cases where rule-based parsers typically fail.

## Package Information

- **Package Name**: usaddress
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install usaddress`
- **Version**: 0.5.16
- **Dependencies**: python-crfsuite>=0.7, probableparsing

## Core Imports

```python
import usaddress
```

## Basic Usage

```python
import usaddress

# Example address string
addr = '123 Main St. Suite 100 Chicago, IL'

# Parse method: split address into components and label each
# Returns list of (token, label) tuples
parsed = usaddress.parse(addr)
# Output: [('123', 'AddressNumber'), ('Main', 'StreetName'), ('St.', 'StreetNamePostType'),
#          ('Suite', 'OccupancyType'), ('100', 'OccupancyIdentifier'),
#          ('Chicago,', 'PlaceName'), ('IL', 'StateName')]

# Tag method: merge consecutive components and return address type
# Returns (dict, address_type) tuple
tagged, address_type = usaddress.tag(addr)
# Output: ({'AddressNumber': '123', 'StreetName': 'Main',
#          'StreetNamePostType': 'St.', 'OccupancyType': 'Suite',
#          'OccupancyIdentifier': '100', 'PlaceName': 'Chicago',
#          'StateName': 'IL'}, 'Street Address')
```

## Capabilities

### Address Parsing

Core functionality for parsing US addresses into labeled components using probabilistic models.

```python { .api }
def parse(address_string: str) -> list[tuple[str, str]]:
    """
    Split an address string into components, and label each component.

    Args:
        address_string (str): The address to parse

    Returns:
        list[tuple[str, str]]: List of (token, label) pairs
    """

def tag(address_string: str, tag_mapping=None) -> tuple[dict[str, str], str]:
    """
    Parse and merge consecutive components & strip commas.
    Also return an address type ('Street Address', 'Intersection', 'PO Box', or 'Ambiguous').

    Because this method returns a dict with labels as keys, it will throw a
    RepeatedLabelError when multiple areas of an address have the same label.

    Args:
        address_string (str): The address to parse
        tag_mapping (dict, optional): Optional mapping to remap labels to custom format

    Returns:
        tuple[dict[str, str], str]: (tagged_address_dict, address_type)
    """
```

#### Usage Examples

```python
import usaddress

# Basic parsing - get individual tokens with labels
tokens = usaddress.parse("1600 Pennsylvania Avenue NW Washington DC 20500")
for token, label in tokens:
    print(f"{token}: {label}")

# Advanced tagging - get consolidated address components
address, addr_type = usaddress.tag("1600 Pennsylvania Avenue NW Washington DC 20500")
print(f"Address type: {addr_type}")
print(f"Street number: {address.get('AddressNumber', 'N/A')}")
print(f"Street name: {address.get('StreetName', 'N/A')}")

# Custom label mapping
mapping = {'StreetName': 'Street', 'AddressNumber': 'Number'}
address, addr_type = usaddress.tag("123 Main St", tag_mapping=mapping)
print(address)  # Uses custom labels
```
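
To make the difference between `parse()` and `tag()` more concrete, here is a minimal sketch of the merging step described in the `tag()` docstring: consecutive tokens that share a label are joined, attached commas are stripped, and an optional mapping renames labels. This is not the library's implementation. `merge_labeled_tokens` is a hypothetical helper, the `ValueError` stands in for `RepeatedLabelError`, and the input pairs are hand-written in the shape `parse()` returns (they are illustrative, not actual model output), so the sketch runs without the model.

```python
def merge_labeled_tokens(pairs, tag_mapping=None):
    """Hypothetical helper: merge consecutive same-label tokens into a dict."""
    merged = {}
    previous_label = None
    for token, label in pairs:
        # Assumption: labels missing from the mapping are kept unchanged.
        if tag_mapping and label in tag_mapping:
            label = tag_mapping[label]
        token = token.strip(" ,;")  # drop separators attached to the token
        if not token:
            continue
        if label == previous_label:
            # Same label as the previous token: extend the current component.
            merged[label] += " " + token
        elif label in merged:
            # The label reappears after a different label: cannot be a dict key twice.
            raise ValueError(f"repeated label: {label}")
        else:
            merged[label] = token
        previous_label = label
    return merged


# Hand-written (token, label) pairs in the shape parse() returns.
pairs = [
    ("123", "AddressNumber"),
    ("Main", "StreetName"),
    ("St.", "StreetNamePostType"),
    ("New", "PlaceName"),
    ("York,", "PlaceName"),
    ("NY", "StateName"),
]
result = merge_labeled_tokens(pairs, tag_mapping={"StreetName": "Street"})
print(result)
```

The two `PlaceName` tokens collapse into a single `'New York'` entry, while a label that reappears after a different label triggers the error path, which is the situation the `RepeatedLabelError` note in the docstring describes.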

### Address Tokenization

Low-level tokenization functionality for splitting addresses into unlabeled tokens.

```python { .api }
def tokenize(address_string: str) -> list[str]:
    """
    Split each component of an address into a list of unlabeled tokens.

    Args:
        address_string (str): The address to tokenize

    Returns:
        list[str]: The tokenized address components
    """
```

#### Usage Examples

```python
import usaddress

# Tokenize without labeling
tokens = usaddress.tokenize("123 Main St. Apt 4B")
print(tokens)  # ['123', 'Main', 'St.', 'Apt', '4B']
```

### Feature Extraction

Functions for extracting machine learning features from address tokens.

```python { .api }
def tokenFeatures(token: str) -> Feature:
    """
    Return a Feature dict with attributes that describe a token.

    Args:
        token (str): The token to analyze

    Returns:
        Feature: Dict with attributes describing the token
                (abbrev, digits, word, trailing.zeros, length, endsinpunc,
                 directional, street_name, has.vowels)
    """

def tokens2features(address: list[str]) -> list[Feature]:
    """
    Turn every token into a Feature dict, and return a list of each token as a Feature.
    Each attribute in a Feature describes the corresponding token.

    Args:
        address (list[str]): The address as a list of tokens

    Returns:
        list[Feature]: A list of all tokens with feature details and context
    """
```

#### Usage Examples

```python
import usaddress

# Extract features for a single token
features = usaddress.tokenFeatures("123")
print(features['digits'])  # 'all_digits'
print(features['length'])  # 'd:3'

# Extract features for all tokens with context
tokens = ["123", "Main", "St."]
features_list = usaddress.tokens2features(tokens)
print(features_list[0]['next']['word'])  # 'main'
print(features_list[1]['previous']['digits'])  # 'all_digits'
```

### Utility Functions

Helper functions for analyzing token characteristics.

```python { .api }
def digits(token: str) -> typing.Literal["all_digits", "some_digits", "no_digits"]:
    """
    Identify whether the token string is all digits, has some digits, or has no digits.

    Args:
        token (str): The token to parse

    Returns:
        str: Label denoting digit presence ('all_digits', 'some_digits', 'no_digits')
    """

def trailingZeros(token: str) -> str:
    """
    Return any trailing zeros found at the end of a token.
    If none are found, then return an empty string.

    Args:
        token (str): The token to search for zeros

    Returns:
        str: The trailing zeros found, if any. Otherwise, an empty string.
    """
```

#### Usage Examples

```python
import usaddress

# Analyze digit content
print(usaddress.digits("123"))     # 'all_digits'
print(usaddress.digits("12th"))    # 'some_digits'
print(usaddress.digits("Main"))    # 'no_digits'

# Find trailing zeros
print(usaddress.trailingZeros("1200"))  # '00'
print(usaddress.trailingZeros("123"))   # ''
```

### Address Component Labels

Constants defining the complete set of address component labels used by the parser.

```python { .api }
LABELS: list[str]
```

The complete list of 26 address component labels based on the United States Thoroughfare, Landmark, and Postal Address Data Standard:

- **Address Number Components**: AddressNumberPrefix, AddressNumber, AddressNumberSuffix
- **Street Name Components**: StreetNamePreModifier, StreetNamePreDirectional, StreetNamePreType, StreetName, StreetNamePostType, StreetNamePostDirectional
- **Subaddress Components**: SubaddressType, SubaddressIdentifier, BuildingName
- **Occupancy Components**: OccupancyType, OccupancyIdentifier
- **Intersection Components**: CornerOf, IntersectionSeparator
- **Location Components**: LandmarkName, PlaceName, StateName, ZipCode
- **USPS Box Components**: USPSBoxType, USPSBoxID, USPSBoxGroupType, USPSBoxGroupID
- **Other Components**: Recipient, NotAddress
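
When post-processing parser output, it can help to know which group a label belongs to. The sketch below encodes the grouping listed above as a plain dictionary and derives a reverse lookup from it. The grouping is this document's presentation, not something the library is known to expose; `LABEL_GROUPS` and `GROUP_OF` are hypothetical names introduced only for illustration.

```python
# Grouping copied from the list above (a documentation convenience, not a library constant).
LABEL_GROUPS = {
    "Address Number": ["AddressNumberPrefix", "AddressNumber", "AddressNumberSuffix"],
    "Street Name": [
        "StreetNamePreModifier", "StreetNamePreDirectional", "StreetNamePreType",
        "StreetName", "StreetNamePostType", "StreetNamePostDirectional",
    ],
    "Subaddress": ["SubaddressType", "SubaddressIdentifier", "BuildingName"],
    "Occupancy": ["OccupancyType", "OccupancyIdentifier"],
    "Intersection": ["CornerOf", "IntersectionSeparator"],
    "Location": ["LandmarkName", "PlaceName", "StateName", "ZipCode"],
    "USPS Box": ["USPSBoxType", "USPSBoxID", "USPSBoxGroupType", "USPSBoxGroupID"],
    "Other": ["Recipient", "NotAddress"],
}

# Reverse lookup: label -> group name.
GROUP_OF = {
    label: group
    for group, labels in LABEL_GROUPS.items()
    for label in labels
}

print(len(GROUP_OF))         # number of distinct labels in the list above
print(GROUP_OF["ZipCode"])   # group that ZipCode belongs to
```

With `usaddress` installed, comparing `set(GROUP_OF)` against `set(usaddress.LABELS)` would be a quick way to check that the grouping has not drifted from the installed version.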

### Reference Data

Built-in reference data for address parsing and feature extraction.

```python { .api }
DIRECTIONS: set[str]
STREET_NAMES: set[str]
PARENT_LABEL: str
GROUP_LABEL: str
```

#### Usage Examples

```python
import usaddress

token = "NW"  # example token to look up

# Check if token is a direction
if token.lower() in usaddress.DIRECTIONS:
    print("This is a directional")

# Check if token is a street type
if token.lower() in usaddress.STREET_NAMES:
    print("This is a street type")

# Access all available labels
print(f"Total labels: {len(usaddress.LABELS)}")
for label in usaddress.LABELS:
    print(label)
```

## Types

```python { .api }
Feature = dict[str, typing.Union[str, bool, "Feature"]]

class RepeatedLabelError(probableparsing.RepeatedLabelError):
    """
    Exception raised when tag() encounters repeated labels that cannot be merged.

    Attributes:
        REPO_URL (str): "https://github.com/datamade/usaddress/issues/new"
        DOCS_URL (str): "https://usaddress.readthedocs.io/"
    """
```

## Error Handling

The `tag()` function raises a `RepeatedLabelError` when multiple areas of an address have the same label and cannot be concatenated. This typically indicates either:

1. The input string is not a valid address
2. Some tokens were labeled incorrectly by the model

```python
import usaddress

try:
    address, addr_type = usaddress.tag("123 Main St 456 Oak Ave")
except usaddress.RepeatedLabelError as e:
    print(f"Ambiguous address: {e}")
    # Fall back to parse() for detailed token analysis
    tokens = usaddress.parse("123 Main St 456 Oak Ave")
```
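
After falling back to `parse()`, one possible next step is to find out which labels repeat and to collect every token under its label. The sketch below shows one way to do that on a list of `(token, label)` pairs. `repeated_labels` and `group_by_label` are hypothetical helpers, not part of usaddress, and the pairs are hand-written for illustration (they are not actual model output for this string).

```python
from collections import defaultdict


def repeated_labels(pairs):
    """Return labels that occur in more than one non-adjacent run, in order."""
    seen, repeated, previous = set(), [], None
    for _, label in pairs:
        # A label counts as repeated when it comes back after a different label.
        if label != previous and label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
        previous = label
    return repeated


def group_by_label(pairs):
    """Collect every token under its label, keeping duplicates as a list."""
    grouped = defaultdict(list)
    for token, label in pairs:
        grouped[label].append(token.strip(","))
    return dict(grouped)


# Hand-written pairs in the shape parse() returns (illustrative labels only).
pairs = [
    ("123", "AddressNumber"), ("Main", "StreetName"), ("St", "StreetNamePostType"),
    ("456", "AddressNumber"), ("Oak", "StreetName"), ("Ave", "StreetNamePostType"),
]
print(repeated_labels(pairs))
print(group_by_label(pairs))
```

Keeping lists per label avoids losing information, at the cost of leaving it to the caller to decide whether the input was really two addresses or a single mislabeled one.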

## Address Types

The `tag()` function returns one of four address types:

- **"Street Address"**: Standard street address with AddressNumber
- **"Intersection"**: Street intersection without AddressNumber
- **"PO Box"**: Postal box address with USPSBoxID
- **"Ambiguous"**: Cannot be classified into other categories
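
The four categories above can be read as a small decision rule over the set of labels found in an address. The following is a sketch of that rule under stated assumptions, not the library's code: `classify_address` is a hypothetical helper, and treating the presence of an `IntersectionSeparator` label as the marker of an intersection is an assumption made for this example.

```python
def classify_address(labels):
    """Hypothetical helper mirroring the four address types listed above."""
    labels = set(labels)
    # Assumption: an IntersectionSeparator label is what marks an intersection.
    is_intersection = "IntersectionSeparator" in labels
    if "AddressNumber" in labels and not is_intersection:
        return "Street Address"
    if is_intersection and "AddressNumber" not in labels:
        return "Intersection"
    if "USPSBoxID" in labels:
        return "PO Box"
    return "Ambiguous"


print(classify_address(["AddressNumber", "StreetName", "StreetNamePostType"]))
print(classify_address(["StreetName", "IntersectionSeparator", "StreetName"]))
print(classify_address(["USPSBoxType", "USPSBoxID"]))
print(classify_address(["PlaceName", "StateName"]))
```

The order of the checks matters in this sketch: an input containing both an `AddressNumber` and an `IntersectionSeparator` matches neither of the first two branches and falls through to `"Ambiguous"` unless it also carries a `USPSBoxID`.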
