# Data Processing

Specialized utilities for handling Xero-specific data formats and custom record extraction. These components handle the conversion of Xero's .NET JSON date formats to standard ISO 8601 timestamps and provide custom record extraction with automatic field path resolution.

## Capabilities

### Date Parsing Utilities

Utility class for parsing and converting Xero's .NET JSON date format to standard Python datetime objects with proper timezone handling.

```python { .api }
from datetime import datetime
from typing import List, Union, Mapping, Any
from dataclasses import dataclass, InitVar
import requests
from airbyte_cdk.sources.declarative.extractors.record_extractor import RecordExtractor
from airbyte_cdk.sources.declarative.interpolation import InterpolatedString
from airbyte_cdk.sources.declarative.decoders.decoder import Decoder
from airbyte_cdk.sources.declarative.decoders.json_decoder import JsonDecoder
from airbyte_cdk.sources.declarative.types import Config

class ParseDates:
    """
    Static utility class for parsing Xero date formats.

    Xero uses .NET JSON date strings in the format "/Date(timestamp±offset)/"
    where timestamp is milliseconds since epoch and offset is timezone.
    """

    @staticmethod
    def parse_date(value):
        """
        Parse a Xero date string into a Python datetime object.

        Supports both .NET JSON format and standard ISO 8601 format:
        - .NET format: "/Date(1419937200000+0000)/"
        - ISO format: "2014-12-30T07:00:00Z"

        Args:
            value (str): Date string in Xero format or ISO 8601 format

        Returns:
            datetime or None: Parsed datetime with UTC timezone, or None if parsing fails

        Examples:
            >>> ParseDates.parse_date("/Date(1419937200000+0000)/")
            datetime.datetime(2014, 12, 30, 11, 0, tzinfo=datetime.timezone.utc)

            >>> ParseDates.parse_date("/Date(1580628711500+0300)/")
            datetime.datetime(2020, 2, 2, 10, 31, 51, 500000, tzinfo=datetime.timezone.utc)

            >>> ParseDates.parse_date("not a date") is None
            True
        """

    @staticmethod
    def convert_dates(obj):
        """
        Recursively convert all Xero date strings in a nested data structure.

        Performs in-place conversion of date strings to ISO 8601 format.
        Searches through dictionaries and lists recursively to find and
        convert any date strings.

        Args:
            obj (dict or list): Data structure containing potential date strings
                Modifies the object in-place

        Side Effects:
            - Converts .NET JSON dates to ISO 8601 strings
            - Ensures all dates have UTC timezone information
            - Preserves non-date data unchanged

        Examples:
            >>> data = {
            ...     "UpdatedDate": "/Date(1419937200000+0000)/",
            ...     "Amount": 100.50,
            ...     "Items": [{"Date": "/Date(1580628711500+0300)/"}]
            ... }
            >>> ParseDates.convert_dates(data)
            >>> print(data)
            {
                "UpdatedDate": "2014-12-30T11:00:00+00:00",
                "Amount": 100.50,
                "Items": [{"Date": "2020-02-02T10:31:51+00:00"}]
            }
        """
```
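
To make the "/Date(timestamp±offset)/" format concrete, here is a minimal standard-library sketch of how such a string could be parsed. It is not the connector's implementation: the function name `parse_dotnet_date` and the regular expression are hypothetical, and the offset handling (adding the offset to the epoch time and keeping the UTC label) is an assumption inferred from the documented `+0300` example.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical pattern: milliseconds, then an optional signed HHMM offset.
_DOTNET_DATE = re.compile(r"/Date\((-?\d+)(?:([+-])(\d{2})(\d{2}))?\)/")


def parse_dotnet_date(value: str) -> Optional[datetime]:
    """Parse "/Date(ms±HHMM)/" into a UTC-labelled datetime, or return None."""
    match = _DOTNET_DATE.search(value)
    if match is None:
        return None
    millis, sign, hours, minutes = match.groups()
    # Integer-based timedelta avoids float rounding in the millisecond part.
    result = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=int(millis))
    if sign is not None:
        # Assumption: the offset is added to the epoch time and the result
        # keeps the UTC label, as the documented example suggests.
        shift = timedelta(hours=int(hours), minutes=int(minutes))
        result = result + shift if sign == "+" else result - shift
    return result


print(parse_dotnet_date("/Date(1580628711500+0300)/"))  # 2020-02-02 10:31:51.500000+00:00
print(parse_dotnet_date("not a date"))                   # None
```

Returning `None` instead of raising keeps the caller free to leave non-date strings untouched, which is the behavior the API above describes.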

### Custom Record Extractor

Dataclass-based record extractor that extends Airbyte's RecordExtractor with automatic date conversion for Xero API responses.

```python { .api }
@dataclass
class CustomExtractor(RecordExtractor):
    """
    Custom record extractor for Xero API responses with date parsing.

    Extracts records from HTTP responses using configurable field paths
    and automatically converts Xero date formats to ISO 8601.
    """

    field_path: List[Union[InterpolatedString, str]]
    """
    Path to extract records from the response JSON.
    Supports nested paths and wildcards for complex data structures.
    Each element can be a string or InterpolatedString for dynamic values.
    """

    config: Config
    """
    Configuration object containing connection and extraction parameters.
    Used for interpolating dynamic values in field paths.
    """

    parameters: InitVar[Mapping[str, Any]]
    """
    Initialization parameters passed during object creation.
    Used to configure InterpolatedString objects in field_path.
    """

    decoder: Decoder = JsonDecoder(parameters={})
    """
    Response decoder for converting HTTP response to Python objects.
    Defaults to JsonDecoder for JSON API responses.
    """

    def __post_init__(self, parameters: Mapping[str, Any]):
        """
        Initialize InterpolatedString objects in field_path after creation.

        Args:
            parameters: Parameters for configuring dynamic string interpolation
        """

    def extract_records(self, response: requests.Response) -> List[Mapping[str, Any]]:
        """
        Extract and process records from HTTP response.

        Decodes the response, extracts records using the configured field path,
        applies date format conversion, and returns processed records.

        Args:
            response: HTTP response object containing JSON data

        Returns:
            List[Mapping[str, Any]]: List of extracted records with converted dates
                Empty list if no records found or extraction fails

        Processing Steps:
            1. Decode HTTP response using configured decoder
            2. Extract records using field_path (supports nested paths and wildcards)
            3. Apply date format conversion using ParseDates.convert_dates()
            4. Return list of processed records

        Examples:
            # Response: {"BankTransactions": [{"ID": "123", "Date": "/Date(1419937200000)/"}]}
            # field_path: ["BankTransactions"]
            # Returns: [{"ID": "123", "Date": "2014-12-30T11:00:00+00:00"}]
        """
```
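
The path-walking step in the processing list above can be illustrated with plain dictionaries. The helper below, `extract_by_path`, is hypothetical and deliberately simplified: it handles literal keys only (no wildcards, no interpolation) and is meant as a sketch of the "empty list when nothing is found" contract, not as the connector's code.

```python
import json
from typing import Any, List, Mapping


def extract_by_path(body: Any, field_path: List[str]) -> List[Mapping[str, Any]]:
    """Walk field_path through decoded JSON; return [] when a key is missing."""
    current = body
    for key in field_path:
        if not isinstance(current, dict) or key not in current:
            return []
        current = current[key]
    if isinstance(current, list):
        return current
    # A single object is wrapped in a list; None or an empty value yields [].
    return [current] if current else []


payload = json.loads('{"BankTransactions": [{"ID": "123"}]}')
print(extract_by_path(payload, ["BankTransactions"]))  # [{'ID': '123'}]
print(extract_by_path(payload, ["Invoices"]))          # []
```

Always returning a list means downstream code can iterate over the result without checking for a missing or null path first.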

## Usage Examples

### Basic Date Parsing

```python
from source_xero.components import ParseDates

# Parse individual date strings
xero_date = "/Date(1419937200000+0000)/"
parsed = ParseDates.parse_date(xero_date)
print(parsed)  # 2014-12-30 11:00:00+00:00

# Handle timezone offsets
date_with_offset = "/Date(1580628711500+0300)/"
parsed_offset = ParseDates.parse_date(date_with_offset)
print(parsed_offset)  # 2020-02-02 10:31:51.500000+00:00

# Handle invalid dates gracefully
invalid_date = "not a date"
result = ParseDates.parse_date(invalid_date)
print(result)  # None
```

### Bulk Date Conversion

```python
from source_xero.components import ParseDates

# Convert dates in nested data structures
bank_transaction = {
    "BankTransactionID": "12345",
    "Date": "/Date(1419937200000+0000)/",
    "UpdatedDateUTC": "/Date(1580628711500+0300)/",
    "Amount": 150.75,
    "LineItems": [
        {
            "LineItemID": "67890",
            "UpdatedDate": "/Date(1419937200000+0000)/",
            "Amount": 75.50
        }
    ]
}

# Convert all dates in-place
ParseDates.convert_dates(bank_transaction)
print(bank_transaction)
# Output:
# {
#     "BankTransactionID": "12345",
#     "Date": "2014-12-30T11:00:00+00:00",
#     "UpdatedDateUTC": "2020-02-02T10:31:51+00:00",
#     "Amount": 150.75,
#     "LineItems": [
#         {
#             "LineItemID": "67890",
#             "UpdatedDate": "2014-12-30T11:00:00+00:00",
#             "Amount": 75.50
#         }
#     ]
# }
```

### Custom Extractor Usage

```python
from source_xero.components import CustomExtractor
from airbyte_cdk.sources.declarative.interpolation import InterpolatedString
import requests

# Create custom extractor for bank transactions
extractor = CustomExtractor(
    field_path=["BankTransactions"],
    config={"tenant_id": "your-tenant-id"},
    parameters={}
)

# Mock response (would come from actual API call)
response = requests.Response()
response._content = b'{"BankTransactions": [{"ID": "123", "Date": "/Date(1419937200000)/"}]}'
response.status_code = 200

# Extract records with automatic date conversion
records = extractor.extract_records(response)
print(records)
# Output: [{"ID": "123", "Date": "2014-12-30T11:00:00+00:00"}]
```

### Integration with Manifest Configuration

The CustomExtractor is used within the declarative manifest configuration:

```yaml
# From manifest.yaml
selector:
  type: RecordSelector
  extractor:
    type: CustomRecordExtractor
    class_name: source_xero.components.CustomExtractor
    field_path: ["{{ parameters.extractor_path }}"]
```

This allows streams to specify their extraction path dynamically:

```yaml
bank_transactions_stream:
  $parameters:
    extractor_path: "BankTransactions" # Extracts from response.BankTransactions
```
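
As a rough illustration of how a `{{ parameters.extractor_path }}` placeholder ends up as a concrete path segment, the sketch below substitutes stream parameters into a field path with a regular expression. This is only a stand-in for the CDK's template interpolation: the helper `resolve_field_path` and its pattern are hypothetical and cover just the simple `parameters.<name>` form.

```python
import re
from typing import Any, List, Mapping

# Hypothetical pattern matching "{{ parameters.<name> }}" with optional spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*parameters\.(\w+)\s*\}\}")


def resolve_field_path(field_path: List[str], parameters: Mapping[str, Any]) -> List[str]:
    """Replace each {{ parameters.<name> }} placeholder with its value."""
    return [
        _PLACEHOLDER.sub(lambda m: str(parameters[m.group(1)]), segment)
        for segment in field_path
    ]


print(resolve_field_path(
    ["{{ parameters.extractor_path }}"],
    {"extractor_path": "BankTransactions"},
))
# ['BankTransactions']
```

In this sketch a placeholder that names an unknown parameter raises `KeyError`, which surfaces a misconfigured stream early rather than silently extracting nothing.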

## Error Handling

The data processing components handle malformed or missing input as follows:

- **Date Parsing**: Invalid date strings return None rather than raising exceptions
- **Nested Conversion**: Safely handles missing or null values in nested structures
- **Type Safety**: Checks data types before attempting conversion
- **Response Extraction**: Gracefully handles missing fields and empty responses
- **Path Resolution**: Uses default values when extraction paths don't exist
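
The type checks and null handling listed above can be sketched with the standard library alone. The function `convert_in_place` below is a hypothetical, simplified stand-in for the recursive conversion: it recognizes only offset-free "/Date(ms)/" strings and is meant to show how non-string and null values pass through untouched.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any

_DATE = re.compile(r"/Date\((-?\d+)\)/")  # simplified: no offset handling
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_in_place(obj: Any) -> None:
    """Rewrite "/Date(ms)/" strings inside dicts and lists as ISO 8601 text."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))
    else:
        return  # None, numbers, and bare strings are not containers: nothing to do
    for key, value in items:
        if isinstance(value, str):
            match = _DATE.fullmatch(value)
            if match:
                moment = _EPOCH + timedelta(milliseconds=int(match.group(1)))
                obj[key] = moment.isoformat(timespec="seconds")
        else:
            convert_in_place(value)  # recurses into containers, ignores the rest


record = {"Date": "/Date(1656792775000)/", "Note": None, "Lines": [{"Qty": 2}, "n/a"]}
convert_in_place(record)
print(record)
# {'Date': '2022-07-02T20:12:55+00:00', 'Note': None, 'Lines': [{'Qty': 2}, 'n/a']}
```

Checking the type before every operation is what lets a single pass run over arbitrary API payloads without raising on nulls or numbers.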