# Data Type Utilities

Utility functions for parsing and converting Xero's custom data formats and JSON structures. These utilities ensure proper data type conversion, date parsing, and RFC3339 compliance for all Xero API responses.

## Core Imports

```python
import decimal
import re
from abc import ABC
from datetime import date, datetime, time, timedelta, timezone
from typing import Any, Iterable, Mapping, MutableMapping, Optional
import pendulum
import requests
```

## Capabilities

### Date Parsing Functions

Xero uses multiple date formats that require specialized parsing to ensure compatibility with downstream systems.

#### Date Parser

```python { .api }
def parse_date(value: str) -> Optional[datetime]:
    """
    Parse Xero date strings in various formats to datetime objects.

    Supports multiple date formats used by Xero API including:
    - .NET JSON format: "/Date(1419937200000+0000)/"
    - ISO 8601 format: "2023-08-15T14:30:25Z"
    - Partial ISO format: "2023-08-15T14:30:25"

    Parameters:
    - value: String containing date in any supported format

    Returns:
    datetime object in UTC timezone, or None if parsing fails

    Examples:
    - parse_date("/Date(1419937200000+0000)/") -> datetime(2014, 12, 30, 11, 0)
    - parse_date("2023-08-15T14:30:25Z") -> datetime(2023, 8, 15, 14, 30, 25)
    - parse_date("invalid-date") -> None
    """
```

### JSON Processing Functions

Custom JSON processing to handle Xero's data structures and ensure RFC3339 compliance.

#### JSON Object Hook

```python { .api }
def _json_load_object_hook(_dict: dict) -> dict:
    """
    JSON parse hook to convert Xero date formats to RFC3339 strings.

    Automatically processes dictionary objects during JSON parsing
    to identify and convert date fields from Xero's formats to
    standardized RFC3339 format for downstream compatibility.

    Parameters:
    - _dict: Dictionary object from JSON parsing containing potential date fields

    Returns:
    Modified dictionary with converted date strings in RFC3339 format

    Date Field Patterns:
    - Fields ending in "Date", "DateUTC", or containing "Date" substring
    - Common fields: UpdatedDateUTC, CreatedDateUTC, DueDateString, etc.

    Conversion Examples:
    - "/Date(1419937200000+0000)/" -> "2014-12-30T11:00:00+00:00"
    - "2023-08-15T14:30:25" -> "2023-08-15T14:30:25+00:00"
    """
```
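
As a rough illustration of the hook mechanism described above, the following sketch uses only the standard library. The name `sketch_date_hook` is hypothetical and is not the connector's implementation; it assumes that only keys containing `Date` are inspected, that only the .NET JSON form is converted, and that the offset suffix is ignored.

```python
import json
import re
from datetime import datetime, timezone

# Matches the .NET JSON form; the optional offset suffix is ignored in this sketch.
_NET_DATE = re.compile(r"/Date\((\d+)(?:[+-]\d{4})?\)/")


def sketch_date_hook(obj: dict) -> dict:
    """Hypothetical stand-in for an object hook; not the connector's code."""
    for key, value in obj.items():
        if "Date" not in key or not isinstance(value, str):
            continue
        match = _NET_DATE.fullmatch(value)
        if match:
            # Epoch milliseconds -> timezone-aware UTC datetime -> ISO string
            instant = datetime.fromtimestamp(int(match.group(1)) / 1000, tz=timezone.utc)
            obj[key] = instant.isoformat()
    return obj


record = json.loads(
    '{"Name": "Sample", "UpdatedDateUTC": "/Date(1419937200000+0000)/"}',
    object_hook=sketch_date_hook,
)
print(record["UpdatedDateUTC"])  # 2014-12-30T11:00:00+00:00
```

Replacing values for existing keys while iterating is safe here because the dictionary's size never changes.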

## Date Format Support

### .NET JSON Date Format

Xero's legacy .NET JSON date format requires special parsing:

```python
# .NET JSON date format pattern
NET_JSON_PATTERN = r"/Date\((\d+)([\+\-]\d{4})?\)/"

# Examples of .NET JSON dates from Xero:
NET_JSON_EXAMPLES = [
    "/Date(1419937200000+0000)/",  # UTC timestamp with timezone
    "/Date(1419937200000)/",       # UTC timestamp without timezone
    "/Date(1419937200000-0500)/",  # Timestamp with negative timezone offset
]

# Parsed results (all converted to UTC):
PARSED_RESULTS = [
    "2014-12-30T11:00:00+00:00",  # December 30, 2014 11:00 AM UTC
    "2014-12-30T11:00:00+00:00",  # Same timestamp, assumed UTC
    "2014-12-30T16:00:00+00:00",  # Adjusted for -0500 timezone offset
]
```
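
To make the pattern above concrete, here is a minimal sketch that splits a .NET JSON date into its epoch-millisecond instant and its raw offset suffix. The helper name `split_net_json_date` is an assumption for illustration; how a non-zero offset is then applied is left to the connector and is deliberately not modeled here.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

NET_DATE_REGEX = re.compile(r"/Date\((\d+)([\+\-]\d{4})?\)/")


def split_net_json_date(value: str) -> Optional[Tuple[datetime, Optional[str]]]:
    """Return (UTC datetime from the epoch millis, raw offset or None).

    Hypothetical helper: it only decomposes the string and does not
    apply the offset to the instant.
    """
    match = NET_DATE_REGEX.fullmatch(value)
    if match is None:
        return None
    millis, offset = match.groups()
    instant = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return instant, offset


print(split_net_json_date("/Date(1419937200000+0000)/"))
print(split_net_json_date("/Date(1419937200000)/"))
print(split_net_json_date("/Date(invalid)/"))  # None
```

The first call yields the instant 2014-12-30 11:00:00 UTC together with the suffix `"+0000"`; the second yields the same instant with `None` as the offset.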

### ISO 8601 Date Format

Standard ISO date formats are also supported:

```python
# ISO 8601 format examples
ISO_8601_EXAMPLES = [
    "2023-08-15T14:30:25Z",       # Full UTC format with Z suffix
    "2023-08-15T14:30:25+00:00",  # Full UTC format with +00:00 offset
    "2023-08-15T14:30:25",        # No timezone designator (assumed UTC)
    "2023-08-15T14:30:25.123Z",   # With milliseconds
]
```
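
One way to normalize the ISO 8601 variants listed above is sketched below with the standard library only. The function name `parse_iso_utc` is hypothetical, and the sketch assumes that a value without a timezone designator should be treated as UTC, as the examples state. The `Z` suffix is rewritten first because `datetime.fromisoformat` only accepts it from Python 3.11 onward.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso_utc(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 string into a UTC-aware datetime, or None."""
    if value.endswith("Z"):
        # Older Python versions reject "Z", so spell out the offset.
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: naive values are already in UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


for example in (
    "2023-08-15T14:30:25Z",
    "2023-08-15T14:30:25+00:00",
    "2023-08-15T14:30:25",
    "2023-08-15T14:30:25.123Z",
):
    print(parse_iso_utc(example).isoformat())
```

The first three inputs all normalize to `2023-08-15T14:30:25+00:00`; the last keeps its fractional seconds.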

## Usage Examples

### Manual Date Parsing

```python
from source_xero.streams import parse_date
from datetime import datetime

# Parse various date formats
net_date = parse_date("/Date(1419937200000+0000)/")
iso_date = parse_date("2023-08-15T14:30:25Z")
partial_date = parse_date("2023-08-15T14:30:25")

print(f".NET date: {net_date}")  # 2014-12-30 11:00:00
print(f"ISO date: {iso_date}")  # 2023-08-15 14:30:25
print(f"Partial date: {partial_date}")  # 2023-08-15 14:30:25

# Handle invalid dates
invalid_date = parse_date("not-a-date")
print(f"Invalid date: {invalid_date}")  # None
```

### JSON Processing with Date Conversion

```python
import json
from source_xero.streams import _json_load_object_hook

# Raw JSON response from Xero API
xero_json = '''
{
    "ContactID": "12345678-1234-1234-1234-123456789012",
    "Name": "Sample Customer",
    "UpdatedDateUTC": "/Date(1419937200000+0000)/",
    "CreatedDateUTC": "2023-08-15T14:30:25Z",
    "EmailAddress": "customer@example.com"
}
'''

# Parse with automatic date conversion
parsed_data = json.loads(xero_json, object_hook=_json_load_object_hook)

print(f"Contact: {parsed_data['Name']}")
print(f"Updated: {parsed_data['UpdatedDateUTC']}")  # Converted to RFC3339
print(f"Created: {parsed_data['CreatedDateUTC']}")  # Already RFC3339
```

### Stream Response Processing

```python
from typing import Iterable, Mapping

import requests

from source_xero.streams import _json_load_object_hook

# This processing happens automatically in all Xero streams
def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
    """Example of how streams use the date utilities internally."""

    response_data = response.json(object_hook=_json_load_object_hook)
    data_field = self.data_field()

    # Extract records from response
    if data_field in response_data:
        records = response_data[data_field]
        # All date fields are now in RFC3339 format
        return records
    else:
        return []
```

## Date Field Identification

### Common Date Fields

The JSON object hook automatically processes these common date fields:

```python
COMMON_DATE_FIELDS = [
    "UpdatedDateUTC",       # Most common cursor field
    "CreatedDateUTC",       # Alternative cursor field
    "DueDateString",        # Invoice due dates
    "DateString",           # Transaction dates
    "FullyPaidOnDate",      # Payment completion dates
    "ExpectedArrivalDate",  # Purchase order dates
    "DeliveryDate",         # Delivery scheduling
    "PaymentDueDate",       # Payment deadlines
    "InvoiceDate",          # Invoice issue dates
    "LastLoginDate",        # User activity tracking
]
```

### Field Detection Logic

```python
# Date field detection patterns
def is_date_field(field_name: str) -> bool:
    """
    Determine if a field name likely contains date data.

    Detection criteria:
    - Field name ends with "Date" or "DateUTC"
    - Field name contains "Date" substring
    - Known date field patterns from Xero API
    """

    date_patterns = [
        field_name.endswith('Date'),
        field_name.endswith('DateUTC'),
        'Date' in field_name,  # substring check; also covers the suffix checks
        field_name.endswith('DateString')
    ]

    return any(date_patterns)
```
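
Because the detection above is based on names alone, it is a heuristic: a key can contain `Date` without holding a date value. The sketch below illustrates this with a hypothetical helper, `candidate_date_fields`, and a made-up record; `DateFormatHint` is an invented field name used only to show a false positive and is not taken from the Xero API.

```python
from typing import Dict, List


def candidate_date_fields(record: Dict[str, object]) -> List[str]:
    """Return keys whose name contains 'Date' and whose value is a string."""
    return [
        key
        for key, value in record.items()
        if "Date" in key and isinstance(value, str)
    ]


record = {
    "InvoiceID": "abc",
    "DueDateString": "2023-08-15T00:00:00",
    "UpdatedDateUTC": "/Date(1419937200000+0000)/",
    "DateFormatHint": "dd/MM/yyyy",  # hypothetical field, not a date value
    "AmountDue": 10.5,
}

print(candidate_date_fields(record))
# ['DueDateString', 'UpdatedDateUTC', 'DateFormatHint']
```

A name match therefore only nominates a field; the value still has to parse as a date before it is converted.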

## Error Handling

### Date Parsing Errors

The date parser handles various error conditions gracefully:

```python
from source_xero.streams import parse_date

# Error handling examples
ERROR_CASES = {
    "Invalid .NET format": "/Date(invalid)/",
    "Malformed timestamp": "/Date(abc123+0000)/",
    "Invalid ISO format": "2023-13-45T25:70:99Z",
    "Empty string": "",
    "None value": None,
    "Non-string input": 12345
}

# All error cases return None without raising exceptions
for case, value in ERROR_CASES.items():
    result = parse_date(value)
    assert result is None, f"{case} should return None"
```

### JSON Processing Errors

The JSON object hook handles the following edge cases:

- **Non-string values**: Skips non-string values in date fields
- **Missing fields**: Gracefully handles missing date fields
- **Nested objects**: Recursively processes nested date fields
- **Array processing**: Handles date fields within array elements
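
The nested-object and array cases in the list above follow from how the standard `json` decoder works: it calls `object_hook` once for every JSON object it finishes decoding, innermost first. The sketch below demonstrates this with a hypothetical `recording_hook` that only records which objects it sees; the payload is made up for illustration.

```python
import json

visited = []


def recording_hook(obj: dict) -> dict:
    """Record the (sorted) keys of every object the decoder hands over."""
    visited.append(sorted(obj))
    return obj


payload = '{"Invoices": [{"Date": "a", "Contact": {"UpdatedDateUTC": "b"}}]}'
json.loads(payload, object_hook=recording_hook)

print(visited)
# [['UpdatedDateUTC'], ['Contact', 'Date'], ['Invoices']]
```

Since every nested dictionary passes through the hook on its own, a hook only needs to handle the keys of the single dictionary it receives.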

## Performance Considerations

### Regex Compilation

Date parsing uses compiled regex patterns for efficiency:

```python
import re

# Pre-compiled regex for .NET JSON dates
NET_DATE_REGEX = re.compile(r"/Date\((\d+)([\+\-]\d{4})?\)/")

# Compiled once at import time and reused for all parsing operations
# Skips the per-call pattern-cache lookup done by re.match(pattern, ...)
```

### Caching Strategy

Date parsing could benefit from caching for repeated values:

```python
# Potential optimization for repeated date values
from datetime import datetime
from functools import lru_cache
from typing import Optional

from source_xero.streams import parse_date

@lru_cache(maxsize=1000)
def cached_parse_date(value: str) -> Optional[datetime]:
    """Cached version of parse_date for performance optimization."""
    return parse_date(value)
```
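
To see what such a cache would buy, the sketch below wraps a stand-in parser in `lru_cache` and inspects the hit and miss counters. The function `cached_parse` is hypothetical and uses `datetime.fromisoformat` in place of `parse_date` so the example runs on its own.

```python
from datetime import datetime
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=1000)
def cached_parse(value: str) -> Optional[datetime]:
    """Stand-in parser; failed parses (None) are cached as well."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


values = [
    "2023-08-15T14:30:25",
    "2023-08-15T14:30:25",
    "not-a-date",
    "2023-08-15T14:30:25",
]
results = [cached_parse(v) for v in values]

info = cached_parse.cache_info()
print(info.hits, info.misses, info.currsize)  # 2 2 2
```

Caching is safe to share across callers here because `datetime` objects are immutable; the benefit depends on how often identical strings recur in a sync.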

### Memory Usage

The utilities are designed for minimal memory overhead:

- **Stream Processing**: Processes one record at a time
- **No Global State**: Functions are stateless and thread-safe
- **Garbage Collection**: Temporary objects are quickly released
- **Efficient Patterns**: Uses efficient regex and string operations

## Integration Notes

### Airbyte CDK Compatibility

The utilities integrate seamlessly with Airbyte CDK:

- **Stream Interface**: Used automatically by all stream classes
- **Type Consistency**: Ensures consistent datetime handling
- **Error Handling**: Follows Airbyte error handling patterns
- **Logging**: Compatible with Airbyte's logging framework

### Downstream Compatibility

Converted dates work with common data processing tools:

- **Data Warehouses**: RFC3339 format is widely supported
- **Analytics Tools**: Standard datetime format for analysis
- **ETL Pipelines**: Consistent format reduces transformation overhead
- **JSON Serialization**: Compatible with standard JSON libraries
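
As a small check of the serialization point above, the sketch below round-trips a record with an already converted date through the standard `json` module and parses the string back with `datetime.fromisoformat`. The record is made up for illustration and reuses the sample contact ID from earlier in this document.

```python
import json
from datetime import datetime, timezone

record = {
    "ContactID": "12345678-1234-1234-1234-123456789012",
    "UpdatedDateUTC": "2014-12-30T11:00:00+00:00",
}

# RFC3339 strings are plain JSON strings, so no custom encoder is needed.
serialized = json.dumps(record)
restored = json.loads(serialized)

# A downstream consumer can recover a timezone-aware datetime directly.
updated = datetime.fromisoformat(restored["UpdatedDateUTC"])
print(updated == datetime(2014, 12, 30, 11, 0, tzinfo=timezone.utc))  # True
```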