# Data Type Support

This document provides a comprehensive reference for the data type mappings between Avro and Spark SQL, including support for complex nested structures, logical types, and custom types.

## Type Mapping Overview

### Primitive Types

| Avro Type | Spark SQL Type | Notes |
|-----------|----------------|-------|
| `null` | `NullType` | Represents null values |
| `boolean` | `BooleanType` | Boolean values |
| `int` | `IntegerType` | 32-bit signed integer |
| `long` | `LongType` | 64-bit signed integer |
| `float` | `FloatType` | Single-precision floating point |
| `double` | `DoubleType` | Double-precision floating point |
| `bytes` | `BinaryType` | Variable-length byte array |
| `string` | `StringType` | UTF-8 string |

### Complex Types

| Avro Type | Spark SQL Type | Notes |
|-----------|----------------|-------|
| `record` | `StructType` | Structured record with named fields |
| `enum` | `StringType` | Enumeration converted to string |
| `array` | `ArrayType` | Variable-length array |
| `map` | `MapType` | Key-value mapping (keys must be strings) |
| `union` | `StructType` or nullable type | Depends on union composition |
| `fixed` | `BinaryType` | Fixed-length byte array |

## Primitive Type Examples

### Basic Type Conversion

```scala
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.avro.Schema

// Boolean type
val booleanSchema = Schema.create(Schema.Type.BOOLEAN)
val sparkBoolean = SchemaConverters.toSqlType(booleanSchema)
// Result: BooleanType, nullable = false

// Integer types
val intSchema = Schema.create(Schema.Type.INT)
val longSchema = Schema.create(Schema.Type.LONG)
val sparkInt = SchemaConverters.toSqlType(intSchema) // IntegerType
val sparkLong = SchemaConverters.toSqlType(longSchema) // LongType

// Floating point types
val floatSchema = Schema.create(Schema.Type.FLOAT)
val doubleSchema = Schema.create(Schema.Type.DOUBLE)
val sparkFloat = SchemaConverters.toSqlType(floatSchema) // FloatType
val sparkDouble = SchemaConverters.toSqlType(doubleSchema) // DoubleType

// String and binary types
val stringSchema = Schema.create(Schema.Type.STRING)
val bytesSchema = Schema.create(Schema.Type.BYTES)
val sparkString = SchemaConverters.toSqlType(stringSchema) // StringType
val sparkBytes = SchemaConverters.toSqlType(bytesSchema) // BinaryType
```

### Reverse Conversion (Spark to Avro)

```scala
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types._

// Convert Spark types to Avro
val avroInt = SchemaConverters.toAvroType(IntegerType) // int
val avroLong = SchemaConverters.toAvroType(LongType) // long
val avroString = SchemaConverters.toAvroType(StringType) // string
val avroBinary = SchemaConverters.toAvroType(BinaryType) // bytes

// Nullable types become unions with null
val avroNullableString = SchemaConverters.toAvroType(StringType, nullable = true)
// Result: ["null", "string"]
```

## Logical Types

### Date and Timestamp Types

```scala { .api }
// Spark SQL types with Avro logical type equivalents
DateType <-> int (date logical type)
TimestampType <-> long (timestamp-micros logical type)
TimestampNTZType <-> long (local-timestamp-micros logical type)
```

**Examples:**
```scala
// Date type conversion
val avroDateJson = """
  {
    "type": "int",
    "logicalType": "date"
  }
"""
val dateSchema = new Schema.Parser().parse(avroDateJson)
val sparkDate = SchemaConverters.toSqlType(dateSchema)
// Result: DateType, nullable = false

// Timestamp type conversion
val avroTimestampJson = """
  {
    "type": "long",
    "logicalType": "timestamp-micros"
  }
"""
val timestampSchema = new Schema.Parser().parse(avroTimestampJson)
val sparkTimestamp = SchemaConverters.toSqlType(timestampSchema)
// Result: TimestampType, nullable = false

// Local timestamp (no time zone)
val avroLocalTimestampJson = """
  {
    "type": "long",
    "logicalType": "local-timestamp-micros"
  }
"""
val localTimestampSchema = new Schema.Parser().parse(avroLocalTimestampJson)
val sparkLocalTimestamp = SchemaConverters.toSqlType(localTimestampSchema)
// Result: TimestampNTZType, nullable = false
```
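The reverse direction can be checked the same way. A minimal sketch (reusing the imports above) of how Spark date and timestamp types are expected to come back with their logical type annotations attached:

```scala
// Sketch: converting Spark temporal types back to Avro.
val avroDate = SchemaConverters.toAvroType(DateType)
// Expected: int with "date" logical type

val avroTimestamp = SchemaConverters.toAvroType(TimestampType)
// Expected: long with "timestamp-micros" logical type
```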

### Decimal Types

```scala { .api }
DecimalType(precision, scale) <-> bytes/fixed (decimal logical type)
```

**Examples:**
```scala
// Avro decimal using bytes
val avroDecimalBytesJson = """
  {
    "type": "bytes",
    "logicalType": "decimal",
    "precision": 10,
    "scale": 2
  }
"""
val decimalBytesSchema = new Schema.Parser().parse(avroDecimalBytesJson)
val sparkDecimal = SchemaConverters.toSqlType(decimalBytesSchema)
// Result: DecimalType(10, 2), nullable = false

// Avro decimal using fixed
val avroDecimalFixedJson = """
  {
    "type": "fixed",
    "name": "DecimalFixed",
    "size": 8,
    "logicalType": "decimal",
    "precision": 18,
    "scale": 4
  }
"""
val decimalFixedSchema = new Schema.Parser().parse(avroDecimalFixedJson)
val sparkDecimalFixed = SchemaConverters.toSqlType(decimalFixedSchema)
// Result: DecimalType(18, 4), nullable = false

// Convert a Spark decimal to Avro
val sparkDecimalType = DecimalType(precision = 12, scale = 3)
val avroDecimalSchema = SchemaConverters.toAvroType(sparkDecimalType)
// Result: fixed type with decimal logical type
```

## Complex Types

### Record Types (Structs)

```scala
val avroRecordJson = """
  {
    "type": "record",
    "name": "Person",
    "namespace": "com.example",
    "fields": [
      {"name": "id", "type": "long"},
      {"name": "firstName", "type": "string"},
      {"name": "lastName", "type": "string"},
      {"name": "age", "type": ["null", "int"], "default": null},
      {"name": "email", "type": ["null", "string"], "default": null}
    ]
  }
"""

val recordSchema = new Schema.Parser().parse(avroRecordJson)
val sparkStruct = SchemaConverters.toSqlType(recordSchema)

sparkStruct.dataType match {
  case StructType(fields) =>
    fields.foreach { field =>
      println(s"${field.name}: ${field.dataType}, nullable = ${field.nullable}")
    }
    // Output:
    // id: LongType, nullable = false
    // firstName: StringType, nullable = false
    // lastName: StringType, nullable = false
    // age: IntegerType, nullable = true
    // email: StringType, nullable = true
}
```

### Array Types

```scala
// Simple array
val avroArrayJson = """
  {
    "type": "array",
    "items": "string"
  }
"""
val arraySchema = new Schema.Parser().parse(avroArrayJson)
val sparkArray = SchemaConverters.toSqlType(arraySchema)
// Result: ArrayType(StringType, containsNull = false)

// Array of nullable items
val avroNullableArrayJson = """
  {
    "type": "array",
    "items": ["null", "string"]
  }
"""
val nullableArraySchema = new Schema.Parser().parse(avroNullableArrayJson)
val sparkNullableArray = SchemaConverters.toSqlType(nullableArraySchema)
// Result: ArrayType(StringType, containsNull = true)

// Array of records
val avroRecordArrayJson = """
  {
    "type": "array",
    "items": {
      "type": "record",
      "name": "Item",
      "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"}
      ]
    }
  }
"""
val recordArraySchema = new Schema.Parser().parse(avroRecordArrayJson)
val sparkRecordArray = SchemaConverters.toSqlType(recordArraySchema)
// Result: ArrayType(StructType(...), containsNull = false)
```

### Map Types

```scala
// Simple map (string keys to string values)
val avroMapJson = """
  {
    "type": "map",
    "values": "string"
  }
"""
val mapSchema = new Schema.Parser().parse(avroMapJson)
val sparkMap = SchemaConverters.toSqlType(mapSchema)
// Result: MapType(StringType, StringType, valueContainsNull = false)

// Map with nullable values
val avroNullableMapJson = """
  {
    "type": "map",
    "values": ["null", "int"]
  }
"""
val nullableMapSchema = new Schema.Parser().parse(avroNullableMapJson)
val sparkNullableMap = SchemaConverters.toSqlType(nullableMapSchema)
// Result: MapType(StringType, IntegerType, valueContainsNull = true)

// Map with complex values
val avroComplexMapJson = """
  {
    "type": "map",
    "values": {
      "type": "record",
      "name": "Value",
      "fields": [
        {"name": "count", "type": "int"},
        {"name": "total", "type": "double"}
      ]
    }
  }
"""
val complexMapSchema = new Schema.Parser().parse(avroComplexMapJson)
val sparkComplexMap = SchemaConverters.toSqlType(complexMapSchema)
// Result: MapType(StringType, StructType(...), valueContainsNull = false)
```
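Going the other way, a brief sketch (assuming the `SchemaConverters` and `org.apache.spark.sql.types._` imports from earlier sections) of how Spark map types translate back. Since Avro maps only allow string keys, a map keyed by any other type has no Avro equivalent:

```scala
// A map with nullable values is expected to round-trip to an Avro map
// whose value schema is a ["null", "int"] union
val sparkMapType = MapType(StringType, IntegerType, valueContainsNull = true)
val avroMap = SchemaConverters.toAvroType(sparkMapType)
// Expected: map with ["null", "int"] values

// A map with non-string keys (e.g. MapType(IntegerType, StringType))
// cannot be represented in Avro and is expected to be rejected by the converter
```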

## Union Types

### Nullable Unions

```scala
// Simple nullable union
val nullableUnionJson = """
  {
    "type": "record",
    "name": "Record",
    "fields": [
      {"name": "id", "type": "long"},
      {"name": "optionalValue", "type": ["null", "string"]}
    ]
  }
"""
val nullableUnionSchema = new Schema.Parser().parse(nullableUnionJson)
val sparkNullableUnion = SchemaConverters.toSqlType(nullableUnionSchema)
// Field "optionalValue" becomes StringType with nullable = true
```

### Complex Unions

```scala
// Multi-type union (converted to a struct)
val complexUnionJson = """
  {
    "type": "record",
    "name": "Event",
    "fields": [
      {"name": "id", "type": "long"},
      {"name": "data", "type": [
        {
          "type": "record",
          "name": "TextEvent",
          "fields": [{"name": "message", "type": "string"}]
        },
        {
          "type": "record",
          "name": "NumericEvent",
          "fields": [{"name": "value", "type": "double"}]
        }
      ]}
    ]
  }
"""

val complexUnionSchema = new Schema.Parser().parse(complexUnionJson)

// Default conversion (positional field names)
val defaultUnionStruct = SchemaConverters.toSqlType(complexUnionSchema)
// Field "data" becomes StructType with fields: member0, member1

// Stable identifier conversion
val stableUnionStruct = SchemaConverters.toSqlType(
  complexUnionSchema,
  useStableIdForUnionType = true
)
// Field "data" becomes StructType with fields: member_textevent, member_numericevent
```

### Union Type Compatibility

```scala
// Compatible primitive unions
val primitiveUnionJson = """
  {
    "type": ["int", "long"]
  }
"""
val primitiveUnionSchema = new Schema.Parser().parse(primitiveUnionJson)
val sparkPrimitiveUnion = SchemaConverters.toSqlType(primitiveUnionSchema)
// Result: LongType (widened to accommodate both int and long)

val floatUnionJson = """
  {
    "type": ["float", "double"]
  }
"""
val floatUnionSchema = new Schema.Parser().parse(floatUnionJson)
val sparkFloatUnion = SchemaConverters.toSqlType(floatUnionSchema)
// Result: DoubleType (widened to accommodate both float and double)
```

## Special Interval Types

### Interval Type Support

```scala { .api }
YearMonthIntervalType <-> int (with catalyst type property)
DayTimeIntervalType <-> long (with catalyst type property)
```

**Examples:**
```scala
import org.apache.spark.sql.types._

// Convert interval types to Avro (preserves type information)
val yearMonthInterval = YearMonthIntervalType()
val avroYearMonth = SchemaConverters.toAvroType(yearMonthInterval)
// Result: {"type": "int", "spark.sql.catalyst.type": "interval year to month"}

val dayTimeInterval = DayTimeIntervalType()
val avroDayTime = SchemaConverters.toAvroType(dayTimeInterval)
// Result: {"type": "long", "spark.sql.catalyst.type": "interval day to second"}

// When converting back, the catalyst type property is used
val restoredYearMonth = SchemaConverters.toSqlType(avroYearMonth)
// Result: YearMonthIntervalType (not IntegerType)
```

## Custom Type Handling

### User-Defined Types

```scala
// Custom UDT support through SQL types
import org.apache.spark.sql.types._

class CustomPointUDT extends UserDefinedType[CustomPoint] {
  override def sqlType: DataType = StructType(Seq(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = false)
  ))

  // Implementation details...
}

// The UDT is converted based on its sqlType
val customUDT = new CustomPointUDT()
val avroCustomType = SchemaConverters.toAvroType(customUDT)
// Result: record type with x, y fields
```

### Enum Type Handling

```scala
val avroEnumJson = """
  {
    "type": "enum",
    "name": "Color",
    "symbols": ["RED", "GREEN", "BLUE"]
  }
"""
val enumSchema = new Schema.Parser().parse(avroEnumJson)
val sparkEnum = SchemaConverters.toSqlType(enumSchema)
// Result: StringType (enum values become strings)
```
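The mapping is lossy in reverse: a `StringType` column converts to a plain Avro `string`, not back to an `enum`. To write a true enum, the target Avro schema can be supplied explicitly through the data source's `avroSchema` option. A hedged sketch (the DataFrame `df`, its `color` column, and the output path are illustrative assumptions):

```scala
// Assumes a DataFrame `df` with a string column "color" whose values are
// limited to the enum symbols; values outside the symbol set fail the write.
val enumWriterSchema = """
  {
    "type": "record",
    "name": "Paint",
    "fields": [
      {"name": "color",
       "type": {"type": "enum", "name": "Color",
                "symbols": ["RED", "GREEN", "BLUE"]}}
    ]
  }
"""
df.write
  .format("avro")
  .option("avroSchema", enumWriterSchema)
  .save("/tmp/paint-avro")
```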

## Data Type Validation

### Supported Type Checking

```scala
import org.apache.spark.sql.avro.AvroUtils
import org.apache.spark.sql.types._

// Check whether a Spark SQL type is supported for Avro conversion
val supportedTypes = List(
  IntegerType,
  StringType,
  StructType(Seq(StructField("id", LongType))),
  ArrayType(StringType),
  MapType(StringType, IntegerType)
)

supportedTypes.foreach { dataType =>
  val isSupported = AvroUtils.supportsDataType(dataType)
  println(s"$dataType supported: $isSupported")
}

// Unsupported types report false
val unsupportedTypes = List(
  CalendarIntervalType, // Not supported
  ObjectType(classOf[String]) // Not supported
)
unsupportedTypes.foreach { dataType =>
  println(s"$dataType supported: ${AvroUtils.supportsDataType(dataType)}")
}
```

### Schema Compatibility Checking

```scala
def checkTypeCompatibility(
    writerType: DataType,
    readerType: DataType
): Boolean = {
  DataType.equalsIgnoreCompatibleNullability(writerType, readerType)
}

// Example compatibility checks
val writerSchema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = false)
))

val readerSchema = StructType(Seq(
  StructField("id", LongType, nullable = true), // Compatible (nullable widening)
  StructField("name", StringType, nullable = false),
  StructField("email", StringType, nullable = true) // New optional field
))

val compatible = checkTypeCompatibility(writerSchema, readerSchema)
// Result: false. equalsIgnoreCompatibleNullability is a strict structural check,
// so the added "email" field fails it; adding optional fields is handled by
// Avro schema evolution at read time, not by this comparison.
```
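Because the structural check above rejects any added field, an evolution-friendly comparison has to walk the reader's fields itself. A minimal sketch (not a Spark API; `isCompatible` and its rules are assumptions here) that tolerates new nullable reader fields:

```scala
// Hypothetical helper: every reader field must either exist in the writer
// schema with an equal type, or be nullable (a newly added optional field).
def isCompatible(writer: StructType, reader: StructType): Boolean = {
  val writerFields = writer.fields.map(f => f.name -> f).toMap
  reader.fields.forall { rf =>
    writerFields.get(rf.name) match {
      case Some(wf) => wf.dataType == rf.dataType // same type; nullability aside
      case None     => rf.nullable                // new fields must be optional
    }
  }
}
```

This deliberately ignores top-level nullability widening; nested-struct recursion could be added the same way.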

## Performance Considerations

### Type Conversion Performance

```scala
// Efficient type patterns for high-performance scenarios

// 1. Prefer primitive types over complex unions
val efficientSchema = StructType(Seq(
  StructField("id", LongType),
  StructField("value", StringType)
))

// 2. Use appropriate precision for decimals
val efficientDecimal = DecimalType(precision = 18, scale = 2) // Common precision
val inefficientDecimal = DecimalType(precision = 38, scale = 18) // Maximum precision

// 3. Consider array element nullability
val efficientArray = ArrayType(StringType, containsNull = false) // Faster
val nullableArray = ArrayType(StringType, containsNull = true) // Slower due to null checks
```

### Memory Usage Optimization

```scala
// Optimize for memory usage with appropriate types
val memoryEfficientSchema = StructType(Seq(
  StructField("id", IntegerType), // Use int instead of long if range permits
  StructField("flag", BooleanType), // Boolean more efficient than string
  StructField("score", FloatType), // Float instead of double if precision permits
  StructField("data", BinaryType) // Binary for large opaque data
))
```
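One way to sanity-check such choices is to convert the schema and inspect the Avro output directly. A brief sketch, assuming the `SchemaConverters` import from earlier (the `recordName` value is arbitrary):

```scala
// Convert the struct and pretty-print the resulting Avro record schema
val avroOut = SchemaConverters.toAvroType(memoryEfficientSchema, recordName = "Metrics")
println(avroOut.toString(true))
```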