# Common Parser Expressions

Pre-built parser expressions for frequently used patterns, including numeric types, identifiers, network addresses, and dates, plus parse actions for data conversion. The `pyparsing_common` class is a namespace of ready-to-use parsers and parse actions; its members are class attributes, so it is never instantiated.

## Capabilities

### Numeric Parsers

Parser expressions for various numeric formats. Each one converts its matched text to a Python `int` or `float`.

```python { .api }
class pyparsing_common:
    """Namespace class containing common parser expressions."""

    # Integer parsers
    integer: ParserElement         # Unsigned integer, returned as int
    signed_integer: ParserElement  # Integer with optional leading +/-, returned as int
    hex_integer: ParserElement     # Hexadecimal digits without a "0x" prefix, returned as int

    # Floating point parsers
    real: ParserElement            # Floating point number
    sci_real: ParserElement        # Scientific notation (1.23e-4)
    number: ParserElement          # Any number, returned as int or float
    fnumber: ParserElement         # Any number returned as float
    ieee_float: ParserElement      # IEEE float including NaN, inf

    # Fraction parsers
    fraction: ParserElement        # Fraction (n/d), returned as float
    mixed_integer: ParserElement   # Mixed number (1-1/2), returned as float
```

**Usage examples:**
```python
from pyparsing import pyparsing_common as ppc

# Parse integers
result = ppc.integer.parse_string("12345")  # -> [12345]

# Parse signed numbers
result = ppc.signed_integer.parse_string("-42")  # -> [-42]

# Parse hexadecimal (digits only; a "0x" prefix is not part of the expression)
result = ppc.hex_integer.parse_string("FF")  # -> [255]

# Parse floating point
result = ppc.real.parse_string("3.14159")  # -> [3.14159]

# Parse scientific notation
result = ppc.sci_real.parse_string("6.02e23")  # -> [6.02e+23]

# Parse fractions (the result is the quotient as a float)
result = ppc.fraction.parse_string("3/4")  # -> [0.75]

# Parse mixed numbers (integer part plus fraction, as a float)
result = ppc.mixed_integer.parse_string("2-1/3")  # -> [2.3333333333333335]
```
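The difference between `number` and `fnumber` is easiest to see side by side. The following is a small sketch, assuming pyparsing 3.x is installed; the sample inputs are illustrative only.

```python
from pyparsing import pyparsing_common as ppc

samples = ["42", "-3.5", "1e3"]

# `number` returns an int for integer text and a float otherwise
number_values = [ppc.number.parse_string(text)[0] for text in samples]

# `fnumber` accepts the same text but always returns a float
fnumber_values = [ppc.fnumber.parse_string(text)[0] for text in samples]

for text, n, f in zip(samples, number_values, fnumber_values):
    print(f"{text!r}: number -> {n!r}, fnumber -> {f!r}")
```

Choosing `fnumber` is convenient when downstream code expects a uniform type; `number` preserves the distinction between integers and reals.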

### String and Identifier Parsers

Parsers for common string patterns and programming language constructs.

```python { .api }
class pyparsing_common:
    # Programming identifiers
    identifier: ParserElement            # Programming language identifier

    # List parsers
    comma_separated_list: ParserElement  # Comma-separated values
```

**Usage examples:**
```python
# Parse identifiers
result = ppc.identifier.parse_string("variable_name")  # -> ['variable_name']
result = ppc.identifier.parse_string("_private")  # -> ['_private']

# Parse CSV data (the items come back as a flat list of strings)
csv_data = ppc.comma_separated_list.parse_string("apple,banana,cherry")
# -> ['apple', 'banana', 'cherry']
```

### Network Address Parsers

Parsers for various network address formats.

```python { .api }
class pyparsing_common:
    # IP addresses
    ipv4_address: ParserElement  # IPv4 address (192.168.1.1)
    ipv6_address: ParserElement  # IPv6 address

    # Other network formats
    mac_address: ParserElement   # MAC address (AA:BB:CC:DD:EE:FF)
    url: ParserElement           # HTTP/HTTPS/FTP URLs
```

**Usage examples:**
```python
# Parse IPv4 addresses (the whole address is returned as one string)
result = ppc.ipv4_address.parse_string("192.168.1.1")
# -> ['192.168.1.1']

# Parse URLs
result = ppc.url.parse_string("https://www.example.com/path")
# -> ['https://www.example.com/path']

# Parse MAC addresses
result = ppc.mac_address.parse_string("AA:BB:CC:DD:EE:FF")
# -> ['AA:BB:CC:DD:EE:FF']
```
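The address parsers can also pull matches out of free-form text rather than parsing a whole string. A minimal sketch using `search_string`, assuming pyparsing 3.x; the log text is made up for illustration.

```python
from pyparsing import pyparsing_common as ppc

log_text = "connection from 10.0.0.1 forwarded to 192.168.1.20"

# search_string scans the text and returns one result per address found
matches = ppc.ipv4_address.search_string(log_text)

# Each result holds the matched address as its only token
addresses = [match[0] for match in matches]
print(addresses)
```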

### Date and Time Parsers

Parsers for ISO8601 date and datetime formats.

```python { .api }
class pyparsing_common:
    # Date/time parsers
    iso8601_date: ParserElement      # ISO8601 date (YYYY-MM-DD)
    iso8601_datetime: ParserElement  # ISO8601 datetime
```

**Usage examples:**
```python
# Parse ISO8601 dates (returned as a string until a conversion action is attached)
result = ppc.iso8601_date.parse_string("2023-12-25")
# -> ['2023-12-25']

# Parse ISO8601 datetime
result = ppc.iso8601_datetime.parse_string("2023-12-25T10:30:00Z")
# -> ['2023-12-25T10:30:00Z']
```
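Although the date is returned as a single string token, the individual fields can be read by name from the parse results. A short sketch, assuming pyparsing 3.x and its `year`, `month`, and `day` results names on `iso8601_date`:

```python
from pyparsing import pyparsing_common as ppc

result = ppc.iso8601_date.parse_string("2023-12-25")

# The full match is the only token in the list
full_text = result[0]

# The individual fields are available by name, still as strings
year, month, day = result["year"], result["month"], result["day"]
print(full_text, year, month, day)
```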

### UUID Parser

Parser for Universally Unique Identifiers.

```python { .api }
class pyparsing_common:
    # UUID parser
    uuid: ParserElement  # UUID format
```

**Usage example:**
```python
# Parse UUIDs
import uuid

# Work on a copy so the shared ppc.uuid expression is left unchanged
uuid_expr = ppc.uuid.copy().set_parse_action(lambda t: uuid.UUID(t[0]))
result = uuid_expr.parse_string("12345678-1234-5678-1234-567812345678")
# -> [UUID('12345678-1234-5678-1234-567812345678')]
```

### Parse Actions for Data Conversion

Parse actions for converting parsed tokens to specific data types. `convert_to_integer` and `convert_to_float` are themselves parse actions and are passed uncalled; `convert_to_date` and `convert_to_datetime` are factories that take a `strptime` format and return a parse action.

```python { .api }
class pyparsing_common:
    # Parse actions: pass these directly, without calling them
    convert_to_integer: Callable  # Convert each token to int
    convert_to_float: Callable    # Convert each token to float

    @staticmethod
    def convert_to_date(fmt: str = "%Y-%m-%d") -> Callable:
        """Create parse action to convert the first token to a datetime.date."""

    @staticmethod
    def convert_to_datetime(fmt: str = "%Y-%m-%dT%H:%M:%S.%f") -> Callable:
        """Create parse action to convert the first token to a datetime.datetime."""
```

**Usage examples:**
```python
from pyparsing import Regex, Word, nums

# Convert to integers
int_parser = Word(nums).set_parse_action(ppc.convert_to_integer)
result = int_parser.parse_string("42")  # -> [42] (int, not string)

# Convert to floats
float_parser = Regex(r'\d+\.\d+').set_parse_action(ppc.convert_to_float)
result = float_parser.parse_string("3.14")  # -> [3.14] (float)

# Convert to date objects (copy first so ppc.iso8601_date itself is not modified)
date_parser = ppc.iso8601_date.copy().set_parse_action(ppc.convert_to_date())
result = date_parser.parse_string("2023-12-25")  # -> [datetime.date(2023, 12, 25)]

# Convert to datetime objects; the format must match the input text exactly
datetime_parser = ppc.iso8601_datetime.copy().set_parse_action(
    ppc.convert_to_datetime("%Y-%m-%dT%H:%M:%S")
)
result = datetime_parser.parse_string("2023-12-25T10:30:00")
# -> [datetime.datetime(2023, 12, 25, 10, 30)]
```
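`convert_to_date` is not tied to the ISO8601 expressions: it can be attached to any expression whose first token matches the given `strptime` format. The sketch below assumes a hypothetical day/month/year input format and shows that text which is not a real calendar date is rejected with a `ParseException`.

```python
from datetime import date

from pyparsing import ParseException, Regex
from pyparsing import pyparsing_common as ppc

# Hypothetical day/month/year input format, chosen for illustration
dmy_date = Regex(r"\d{2}/\d{2}/\d{4}").set_parse_action(
    ppc.convert_to_date("%d/%m/%Y")
)

parsed = dmy_date.parse_string("25/12/2023")[0]
print(parsed)  # a datetime.date object

# Text that matches the regex but is not a real date is rejected
try:
    dmy_date.parse_string("31/02/2023")
    rejected = False
except ParseException:
    rejected = True
print(rejected)
```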

### Text Processing Actions

Parse actions for common text processing operations. All three are passed to `set_parse_action` uncalled.

```python { .api }
class pyparsing_common:
    @staticmethod
    def strip_html_tags(s: str, l: int, tokens: ParseResults):
        """Parse action to remove HTML tags from the first token."""

    # Parse actions: pass these directly, without calling them
    upcase_tokens: Callable    # Convert tokens to uppercase
    downcase_tokens: Callable  # Convert tokens to lowercase
```

**Usage examples:**
```python
from pyparsing import SkipTo, Word, alphas, make_html_tags

# Strip HTML tags from the body of a <td> cell
td, td_end = make_html_tags("td")
cell = td + SkipTo(td_end).set_parse_action(ppc.strip_html_tags)("body") + td_end
result = cell.parse_string("<td>See the <b>docs</b> page</td>")
# result.body -> 'See the docs page'

# Convert to uppercase
upper_parser = Word(alphas).set_parse_action(ppc.upcase_tokens)
result = upper_parser.parse_string("hello")  # -> ['HELLO']

# Convert to lowercase
lower_parser = Word(alphas).set_parse_action(ppc.downcase_tokens)
result = lower_parser.parse_string("WORLD")  # -> ['world']
```

### Usage Patterns

Common patterns for using `pyparsing_common` expressions.

**Complete number parsing:**
```python
# Parse any numeric format. Each alternative already converts its own match
# to int or float, so no extra parse action is needed.
any_number = ppc.sci_real | ppc.real | ppc.signed_integer

# ppc.number is the built-in equivalent of the expression above
typed_number = ppc.number
result = typed_number.parse_string("1e3")  # -> [1000.0]
result = typed_number.parse_string("-7")   # -> [-7]
```
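
**Configuration file parsing:**
```python
from pyparsing import Group, OneOrMore, QuotedString, Word, alphanums

# Parse configuration entries. ppc.uuid is tried before ppc.number so that
# the leading digits of a UUID are not consumed as an integer.
config_value = (ppc.uuid | ppc.number | QuotedString('"') | Word(alphanums))
config_entry = Group(ppc.identifier + "=" + config_value)
config_file = OneOrMore(config_entry)
```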
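To show a configuration grammar of this kind end to end, the sketch below runs it over a small sample and collects the entries into a dict. It assumes pyparsing 3.x; the sample keys and values are invented, `Suppress` is used to drop the `=` token, and `ppc.uuid` is deliberately tried before `ppc.number` so a UUID that starts with digits is not read as an integer.

```python
from pyparsing import Group, OneOrMore, QuotedString, Suppress, Word, alphanums
from pyparsing import pyparsing_common as ppc

# Alternatives are tried left to right, so the most specific comes first
config_value = ppc.uuid | ppc.number | QuotedString('"') | Word(alphanums)
config_entry = Group(ppc.identifier + Suppress("=") + config_value)
config_file = OneOrMore(config_entry)

# Hypothetical configuration text for illustration
sample = """
name = "demo server"
port = 8080
ratio = 0.75
mode = fast
id = 12345678-1234-5678-1234-567812345678
"""

# Each grouped entry unpacks into a (key, value) pair
entries = config_file.parse_string(sample, parse_all=True)
settings = {key: value for key, value in entries}
print(settings)
```

Numeric values arrive already converted (`port` as an `int`, `ratio` as a `float`), while the quoted string loses its surrounding quotes.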

**Network log parsing:**
```python
from pyparsing import QuotedString

# Parse access log entries
log_entry = (ppc.ipv4_address +
             QuotedString('"') +  # User agent
             ppc.iso8601_datetime +
             ppc.integer)  # Response code
```
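Attaching results names makes the parsed fields easier to read back. The following sketch applies the same sequence of expressions to one sample line, assuming pyparsing 3.x; the names `client`, `agent`, `timestamp`, and `status` and the log line itself are hypothetical.

```python
from pyparsing import QuotedString
from pyparsing import pyparsing_common as ppc

# Calling an expression with a string returns a named copy of it
log_entry = (
    ppc.ipv4_address("client")
    + QuotedString('"')("agent")
    + ppc.iso8601_datetime("timestamp")
    + ppc.integer("status")
)

line = '192.168.1.1 "Mozilla/5.0" 2023-12-25T10:30:00Z 200'
entry = log_entry.parse_string(line, parse_all=True)

# Fields are available both positionally and by name
print(entry["client"], entry["agent"], entry["timestamp"], entry["status"])
```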
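
**Data validation with parse actions:**
```python
from pyparsing import Combine, OneOrMore, Word, alphanums

# Match email-like patterns. Domain labels exclude "." so the parser
# can still see the dots that separate them.
domain_label = Word(alphanums + "-")
email_pattern = Combine(Word(alphanums + "._") + "@" +
                        domain_label + OneOrMore("." + domain_label))

# Reject matches whose last label is not 2 to 4 characters long
validated_email = email_pattern.copy().add_condition(
    lambda t: 2 <= len(t[0].rsplit(".", 1)[-1]) <= 4,
    message="top-level domain must be 2-4 characters",
)
```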