# Data Processing

Specialized utilities for handling Xero-specific data formats and custom record extraction. These components convert Xero's .NET JSON date strings (for example `/Date(1419937200000+0000)/`) into ISO 8601 timestamps. They also provide a record extractor that pulls records from a configurable field path in the API response and applies that date conversion automatically.

## Capabilities

### Date Parsing Utilities

Static utility class that parses Xero's .NET JSON date strings (and ISO 8601 strings) into Python `datetime` objects carrying UTC timezone information, and rewrites dates inside nested API payloads as ISO 8601 strings.

```python { .api }
from dataclasses import dataclass, InitVar
from datetime import datetime
from typing import Any, List, Mapping, Union

import requests
from airbyte_cdk.sources.declarative.decoders.decoder import Decoder
from airbyte_cdk.sources.declarative.decoders.json_decoder import JsonDecoder
from airbyte_cdk.sources.declarative.extractors.record_extractor import RecordExtractor
from airbyte_cdk.sources.declarative.interpolation import InterpolatedString
from airbyte_cdk.sources.declarative.types import Config


class ParseDates:
    """
    Static utility class for parsing Xero date formats.

    Xero uses .NET JSON date strings in the format "/Date(timestamp±offset)/",
    where timestamp is milliseconds since the Unix epoch and the optional
    ±HHMM offset is a timezone offset.
    """

    @staticmethod
    def parse_date(value):
        """
        Parse a Xero date string into a Python datetime object.

        Supports both .NET JSON format and standard ISO 8601 format:
        - .NET format: "/Date(1419937200000+0000)/"
        - ISO format: "2014-12-30T07:00:00Z"

        Args:
            value (str): Date string in Xero .NET JSON format or ISO 8601 format

        Returns:
            datetime or None: Parsed datetime with UTC timezone, or None if parsing fails.
            For .NET dates the offset is added to the UTC instant: 1580628711500 ms
            is 07:31:51.5 UTC, so "+0300" yields 10:31:51.5, still labeled UTC.

        Examples:
            >>> ParseDates.parse_date("/Date(1419937200000+0000)/")
            datetime.datetime(2014, 12, 30, 11, 0, tzinfo=datetime.timezone.utc)

            >>> ParseDates.parse_date("/Date(1580628711500+0300)/")
            datetime.datetime(2020, 2, 2, 10, 31, 51, 500000, tzinfo=datetime.timezone.utc)

            >>> ParseDates.parse_date("not a date") is None
            True
        """

    @staticmethod
    def convert_dates(obj):
        """
        Recursively convert all Xero date strings in a nested data structure.

        Performs in-place conversion of date strings to ISO 8601 format.
        Searches through dictionaries and lists recursively to find and
        convert any date strings.

        Args:
            obj (dict or list): Data structure containing potential date strings.
                The object is modified in place; the method returns None.

        Side Effects:
            - Converts .NET JSON dates to ISO 8601 strings
            - Ensures all dates have UTC timezone information
            - Truncates converted timestamps to whole seconds
            - Preserves non-date data unchanged

        Examples:
            >>> data = {
            ...     "UpdatedDate": "/Date(1419937200000+0000)/",
            ...     "Amount": 100.50,
            ...     "Items": [{"Date": "/Date(1580628711500+0300)/"}],
            ... }
            >>> ParseDates.convert_dates(data)
            >>> data["UpdatedDate"]
            '2014-12-30T11:00:00+00:00'
            >>> data["Amount"]
            100.5
            >>> data["Items"][0]["Date"]
            '2020-02-02T10:31:51+00:00'
        """
```

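The .NET format described in the `ParseDates` docstring can be parsed with a regular expression. The block below is a minimal stdlib-only sketch, not the `source_xero` implementation: the function name `parse_xero_date`, both regexes, and the handling of ISO strings with a non-UTC offset are assumptions. It follows the documented examples, including adding the `±HHMM` offset to the UTC instant while keeping the UTC label.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed shape of a .NET JSON date: "/Date(<millis>)/" with an optional +HHMM or -HHMM offset.
_NET_DATE = re.compile(r"^/Date\((-?\d+)(?:([+-])(\d{2})(\d{2}))?\)/$")
# Assumed ISO 8601 shape, e.g. "2014-12-30T07:00:00Z"; requiring the time part keeps
# plain numbers or IDs from being mistaken for dates.
_ISO_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_xero_date(value: str) -> Optional[datetime]:
    """Hypothetical sketch of ParseDates.parse_date: a UTC-labeled datetime, or None."""
    match = _NET_DATE.match(value)
    if match:
        millis, sign, hh, mm = match.groups()
        # Exact integer arithmetic: milliseconds since the epoch as a UTC instant.
        result = _EPOCH + timedelta(milliseconds=int(millis))
        if sign:
            # Follow the documented examples: the offset shifts the wall-clock
            # time and the result keeps its UTC label.
            delta = timedelta(hours=int(hh), minutes=int(mm))
            result = result + delta if sign == "+" else result - delta
        return result
    if _ISO_DATETIME.match(value):
        try:
            # Normalize "Z" because fromisoformat only accepts it on Python 3.11+.
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None  # e.g. a fractional-second length older Pythons reject
        # Assumption: naive values are treated as UTC, aware ones converted to UTC.
        if parsed.tzinfo is None:
            return parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None
```

Using `timedelta(milliseconds=...)` rather than `datetime.fromtimestamp(millis / 1000)` keeps the millisecond part exact instead of routing it through float division.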

### Custom Record Extractor

Dataclass-based record extractor that extends Airbyte's `RecordExtractor`: it decodes the HTTP response, selects records through a configurable field path, and converts the Xero dates they contain to ISO 8601.

```python { .api }
@dataclass
class CustomExtractor(RecordExtractor):
    """
    Custom record extractor for Xero API responses with date parsing.

    Extracts records from HTTP responses using configurable field paths
    and automatically converts Xero date formats to ISO 8601.
    """

    field_path: List[Union[InterpolatedString, str]]
    """
    Path to extract records from the response JSON.
    Supports nested paths and wildcards for complex data structures.
    Each element can be a string or an InterpolatedString for dynamic values.
    """

    config: Config
    """
    Configuration object containing connection and extraction parameters.
    Used for interpolating dynamic values in field_path.
    """

    parameters: InitVar[Mapping[str, Any]]
    """
    Initialization parameters passed during object creation.
    Used to configure the InterpolatedString objects in field_path.
    """

    decoder: Decoder = JsonDecoder(parameters={})
    """
    Response decoder for converting the HTTP response to Python objects.
    Defaults to JsonDecoder for JSON API responses.
    """

    def __post_init__(self, parameters: Mapping[str, Any]):
        """
        Initialize InterpolatedString objects in field_path after creation.

        Args:
            parameters: Parameters for configuring dynamic string interpolation
        """

    def extract_records(self, response: requests.Response) -> List[Mapping[str, Any]]:
        """
        Extract and process records from an HTTP response.

        Decodes the response, extracts records using the configured field path,
        applies date format conversion, and returns the processed records.

        Args:
            response: HTTP response object containing JSON data

        Returns:
            List[Mapping[str, Any]]: List of extracted records with converted dates,
            or an empty list if no records are found or extraction fails

        Processing Steps:
            1. Decode the HTTP response using the configured decoder
            2. Extract records using field_path (supports nested paths and wildcards)
            3. Apply date format conversion using ParseDates.convert_dates()
            4. Return the list of processed records

        Examples:
            # Response: {"BankTransactions": [{"ID": "123", "Date": "/Date(1419937200000)/"}]}
            # field_path: ["BankTransactions"]
            # Returns: [{"ID": "123", "Date": "2014-12-30T11:00:00+00:00"}]
        """
```

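Steps 2 to 4 of `extract_records` can be sketched without the Airbyte CDK by working on an already-decoded JSON body. This is a hypothetical illustration, not the connector's code: `extract_from_body` and `_walk` are invented names, each `"*"` segment is assumed to fan out over one level of list items or dict values, a missing path yields an empty list (matching the docstring's "empty list if no records are found"), and the `convert` callback stands in for `ParseDates.convert_dates`.

```python
from typing import Any, Callable, List, Sequence


def _walk(node: Any, path: Sequence[str]) -> List[Any]:
    """Collect every value reached by following `path`; a "*" segment fans out over children."""
    if not path:
        return [node]
    head, rest = path[0], path[1:]
    if head == "*":
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            children = []
        return [found for child in children for found in _walk(child, rest)]
    if isinstance(node, dict) and head in node:
        return _walk(node[head], rest)
    return []  # Missing key: nothing to extract.


def extract_from_body(
    body: Any,
    field_path: Sequence[str],
    convert: Callable[[Any], None] = lambda obj: None,
) -> List[Any]:
    """Hypothetical sketch of extract_records steps 2-4 on an already-decoded body."""
    # Step 2: resolve the field path (an empty path selects the whole body).
    if not field_path:
        extracted = body
    elif "*" in field_path:
        extracted = _walk(body, field_path)
    else:
        matches = _walk(body, field_path)
        extracted = matches[0] if matches else []
    # Step 3: convert dates in place, e.g. with ParseDates.convert_dates.
    convert(extracted)
    # Step 4: always hand back a list of records.
    if isinstance(extracted, list):
        return extracted
    return [extracted] if extracted else []
```

Passing `convert=ParseDates.convert_dates` would mirror the docstring example: a body of `{"BankTransactions": [...]}` with `field_path=["BankTransactions"]` returns that list with its dates rewritten in place.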

## Usage Examples

### Basic Date Parsing

```python
from source_xero.components import ParseDates

# Parse individual date strings
xero_date = "/Date(1419937200000+0000)/"
parsed = ParseDates.parse_date(xero_date)
print(parsed)  # 2014-12-30 11:00:00+00:00

# Handle timezone offsets: +0300 is added to 07:31:51.5 UTC
date_with_offset = "/Date(1580628711500+0300)/"
parsed_offset = ParseDates.parse_date(date_with_offset)
print(parsed_offset)  # 2020-02-02 10:31:51.500000+00:00

# Handle invalid dates gracefully
invalid_date = "not a date"
result = ParseDates.parse_date(invalid_date)
print(result)  # None
```

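The timestamps in these examples can be checked with plain standard-library arithmetic, independent of `source_xero`. The snippet converts each millisecond timestamp to a UTC instant, then applies the `+0300` offset the way the documented examples imply (the offset is added and the UTC label is kept).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 1419937200000 ms after the epoch, with a +0000 offset (no shift).
no_offset = EPOCH + timedelta(milliseconds=1419937200000)

# 1580628711500 ms after the epoch is 07:31:51.5 UTC ...
instant = EPOCH + timedelta(milliseconds=1580628711500)
# ... and adding the +03:00 offset gives the documented 10:31:51.5.
with_offset = instant + timedelta(hours=3)

print(no_offset)    # 2014-12-30 11:00:00+00:00
print(instant)      # 2020-02-02 07:31:51.500000+00:00
print(with_offset)  # 2020-02-02 10:31:51.500000+00:00
```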

### Bulk Date Conversion

```python
from source_xero.components import ParseDates

# Convert dates in nested data structures
bank_transaction = {
    "BankTransactionID": "12345",
    "Date": "/Date(1419937200000+0000)/",
    "UpdatedDateUTC": "/Date(1580628711500+0300)/",
    "Amount": 150.75,
    "LineItems": [
        {
            "LineItemID": "67890",
            "UpdatedDate": "/Date(1419937200000+0000)/",
            "Amount": 75.50,
        }
    ],
}

# Convert all dates in place (convert_dates returns None)
ParseDates.convert_dates(bank_transaction)
print(bank_transaction)
# Output (reformatted for readability):
# {
#     "BankTransactionID": "12345",
#     "Date": "2014-12-30T11:00:00+00:00",
#     "UpdatedDateUTC": "2020-02-02T10:31:51+00:00",
#     "Amount": 150.75,
#     "LineItems": [
#         {
#             "LineItemID": "67890",
#             "UpdatedDate": "2014-12-30T11:00:00+00:00",
#             "Amount": 75.5
#         }
#     ]
# }
```

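The in-place, recursive walk that `convert_dates` performs can be sketched as follows. This is an illustration under assumptions, not the connector's code: `convert_dates_sketch` and `_to_iso` are hypothetical names, only the .NET `/Date(...)/` form is recognized (the ISO 8601 branch is omitted for brevity), and results are formatted with `isoformat(timespec="seconds")`, which truncates to whole seconds as in the output above.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

_NET_DATE = re.compile(r"^/Date\((-?\d+)(?:([+-])(\d{2})(\d{2}))?\)/$")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _to_iso(value: str) -> Optional[str]:
    """Return an ISO 8601 string for a .NET date, or None for anything else."""
    match = _NET_DATE.match(value)
    if not match:
        return None
    millis, sign, hh, mm = match.groups()
    result = _EPOCH + timedelta(milliseconds=int(millis))
    if sign:
        # Same offset handling as the documented examples: shift, keep UTC label.
        delta = timedelta(hours=int(hh), minutes=int(mm))
        result = result + delta if sign == "+" else result - delta
    return result.isoformat(timespec="seconds")


def convert_dates_sketch(obj: Any) -> None:
    """Rewrite .NET date strings in nested dicts/lists in place; other values are untouched."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return  # Scalars are left alone.
    # Snapshot the pairs so assignments below never disturb the iteration.
    for key, value in list(items):
        if isinstance(value, str):
            converted = _to_iso(value)
            if converted is not None:
                obj[key] = converted
        elif isinstance(value, (dict, list)):
            convert_dates_sketch(value)
```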

### Custom Extractor Usage

```python
import requests
from source_xero.components import CustomExtractor

# Create a custom extractor for bank transactions
extractor = CustomExtractor(
    field_path=["BankTransactions"],
    config={"tenant_id": "your-tenant-id"},
    parameters={},
)

# Build a stand-in response (normally returned by the API call);
# setting the private _content attribute is a common testing shortcut.
response = requests.Response()
response._content = b'{"BankTransactions": [{"ID": "123", "Date": "/Date(1419937200000)/"}]}'
response.status_code = 200

# Extract records with automatic date conversion
records = extractor.extract_records(response)
print(records)
# Output: [{'ID': '123', 'Date': '2014-12-30T11:00:00+00:00'}]
```

### Integration with Manifest Configuration

The `CustomExtractor` is plugged into the declarative manifest as a custom record extractor:

```yaml
# From manifest.yaml
selector:
  type: RecordSelector
  extractor:
    type: CustomRecordExtractor
    class_name: source_xero.components.CustomExtractor
    field_path: ["{{ parameters.extractor_path }}"]
```

This allows streams to specify their extraction path dynamically:

```yaml
bank_transactions_stream:
  $parameters:
    extractor_path: "BankTransactions"  # Extracts from response.BankTransactions
```

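To make the `$parameters` flow concrete, here is a toy resolver that substitutes `{{ parameters.<name> }}` placeholders in a field path. It is a deliberately simplified stand-in, not the CDK mechanism: the Airbyte CDK evaluates these strings through `InterpolatedString`, which supports far more than this regex. The name `resolve_field_path` is hypothetical, and raising `KeyError` for an undefined parameter is a choice made for this sketch only.

```python
import re
from typing import Any, List, Mapping

# Matches "{{ parameters.<name> }}" with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*parameters\.(\w+)\s*\}\}")


def resolve_field_path(field_path: List[str], parameters: Mapping[str, Any]) -> List[str]:
    """Replace each {{ parameters.name }} with str(parameters[name]); KeyError if undefined."""
    return [
        _PLACEHOLDER.sub(lambda m: str(parameters[m.group(1)]), segment)
        for segment in field_path
    ]


# The stream's $parameters supply extractor_path to the shared selector definition.
print(resolve_field_path(["{{ parameters.extractor_path }}"], {"extractor_path": "BankTransactions"}))
# ['BankTransactions']
```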

## Error Handling

The data processing components are designed to degrade gracefully rather than raise on unexpected input:

- **Date Parsing**: Invalid date strings return `None` rather than raising exceptions.
- **Nested Conversion**: Missing or null values in nested structures are skipped safely.
- **Type Safety**: Only string values are passed to the date parser; dictionaries and lists are traversed recursively, and all other values are left unchanged.
- **Response Extraction**: Missing fields and empty responses yield an empty list of records.
- **Path Resolution**: When an extraction path does not exist, the extractor falls back to an empty result instead of failing.