# Apache Spark Avro Data Source

The Apache Spark Avro Data Source provides built-in support for reading and writing Apache Avro data in Apache Spark SQL. This library enables seamless conversion between Spark's Catalyst data types and Avro format, supporting schema evolution, complex nested data structures, and efficient serialization for large-scale data processing.

## Package Information

- **Name**: spark-avro_2.13
- **Type**: Scala library
- **Language**: Scala (compatible with Scala 2.12 and 2.13)
- **Platform**: JVM
- **Latest Version**: 3.5.6
- **Language Bindings**: Also available in Python (PySpark), Java, and R

### Installation

For Maven:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-avro_2.13</artifactId>
  <version>3.5.6</version>
</dependency>
```

For SBT:

```scala
libraryDependencies += "org.apache.spark" %% "spark-avro" % "3.5.6"
```
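
Alternatively, the package can be pulled in at launch time with Spark's `--packages` flag, which resolves the artifact from Maven Central (`my_app.jar` below is a placeholder for your application jar):

```bash
# Fetch spark-avro and its transitive dependencies at submission time
spark-submit --packages org.apache.spark:spark-avro_2.13:3.5.6 my_app.jar
```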

## Core Imports

```scala
// For reading/writing Avro files as DataFrames
import org.apache.spark.sql.SparkSession

// For binary Avro data conversion functions
import org.apache.spark.sql.avro.functions._

// For schema conversion utilities
import org.apache.spark.sql.avro.SchemaConverters

// For DataFrame and Column operations
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._
```

## Basic Usage

### Reading Avro Files

```scala
val spark = SparkSession.builder()
  .appName("Avro Example")
  .getOrCreate()

// Read Avro files into a DataFrame
val df = spark.read
  .format("avro")
  .load("path/to/avro/files")

df.show()
```
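
The reader also honors Spark's generic file-source options. As a small sketch (the directory path is assumed), `pathGlobFilter` restricts the read to files whose names match a glob pattern:

```scala
// Only read files in the directory whose names match "*.avro"
val filtered = spark.read
  .format("avro")
  .option("pathGlobFilter", "*.avro")
  .load("path/to/avro/dir")
```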

### Writing Avro Files

```scala
// Write DataFrame to Avro format
df.write
  .format("avro")
  .option("compression", "snappy")
  .save("path/to/output")
```
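
Like other file-based sources, Avro output can be laid out in partition directories with the standard `partitionBy` call; the `year` column below is an assumed field of the example DataFrame:

```scala
// Write one subdirectory per distinct value of the assumed `year` column
df.write
  .format("avro")
  .partitionBy("year")
  .save("path/to/partitioned/output")
```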

### Converting Binary Avro Data

```scala
import org.apache.spark.sql.avro.functions._
import org.apache.spark.sql.functions.{col, struct}

val avroSchema = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
"""

// Convert binary Avro data to Catalyst format
val decodedDF = df.select(
  from_avro(col("avro_data"), avroSchema).as("decoded")
)

// Convert Catalyst data to binary Avro format
val encodedDF = df.select(
  to_avro(struct(col("id"), col("name"))).as("avro_data")
)
```

## Architecture

The Spark Avro connector consists of several key components:

- **Data Source Integration**: FileFormat (V1) and DataSourceV2 implementations for seamless Spark SQL integration
- **Schema Conversion**: Bidirectional conversion between Avro schemas and Spark SQL schemas
- **Binary Conversion Functions**: High-level functions for converting between binary Avro data and Catalyst values
- **Serialization/Deserialization**: Efficient conversion between Avro `GenericRecord` and Spark `InternalRow` formats

## Capabilities

### File Operations

Read and write Avro files with automatic schema inference and configurable options.

```scala
// Key APIs for file operations
def format(source: String): DataFrameReader // Use "avro" as the source
def option(key: String, value: String): DataFrameReader
def load(paths: String*): DataFrame
def save(path: String): Unit
```

[File Operations Documentation](./file-operations.md)

### Binary Data Conversion

Convert between binary Avro data and Spark DataFrame columns using built-in functions.

```scala
// Key APIs for binary conversion
def from_avro(data: Column, jsonFormatSchema: String): Column
def from_avro(data: Column, jsonFormatSchema: String, options: java.util.Map[String, String]): Column
def to_avro(data: Column): Column
def to_avro(data: Column, jsonFormatSchema: String): Column
```

[Binary Conversion Documentation](./binary-conversion.md)
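
The `options` overload of `from_avro` controls parse behavior. In the sketch below (reusing the `avro_data` column and `avroSchema` string from the basic usage example), setting `"mode"` to `"FAILFAST"` makes malformed records fail the query instead of decoding to null, which is the default `PERMISSIVE` behavior:

```scala
import java.util.Collections

import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

// FAILFAST: throw on malformed Avro records rather than returning null
val strictDF = df.select(
  from_avro(
    col("avro_data"),
    avroSchema,
    Collections.singletonMap("mode", "FAILFAST")
  ).as("decoded")
)
```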

### Schema Conversion

Convert between Avro schemas and Spark SQL schemas for interoperability.

```scala
// Key APIs for schema conversion
def toSqlType(avroSchema: Schema): SchemaType
def toSqlType(avroSchema: Schema, options: Map[String, String]): SchemaType
def toAvroType(catalystType: DataType, nullable: Boolean, recordName: String, nameSpace: String): Schema

case class SchemaType(dataType: DataType, nullable: Boolean)
```

[Schema Conversion Documentation](./schema-conversion.md)
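
As a sketch of these utilities (the `User` record is an illustrative schema, not part of the API), the snippet below converts a parsed Avro schema to a Spark SQL type and builds an Avro record schema back from a Catalyst `StructType`:

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Avro -> Spark SQL: yields a SchemaType wrapping a StructType(id: long, name: string)
val avroSchema = new Schema.Parser().parse(
  """{"type": "record", "name": "User", "fields": [
    |  {"name": "id", "type": "long"},
    |  {"name": "name", "type": "string"}
    |]}""".stripMargin
)
val sqlType = SchemaConverters.toSqlType(avroSchema)

// Spark SQL -> Avro: builds a record schema named "User"
val catalystType = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = false)
))
val avroRecord = SchemaConverters.toAvroType(catalystType, nullable = false, recordName = "User")
```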

### Configuration Options

Comprehensive configuration options for reading, writing, and data conversion operations.

```scala
// Example configuration usage
val df = spark.read
  .format("avro")
  .option("avroSchema", customSchemaJson)
  .option("mode", "PERMISSIVE")
  .option("positionalFieldMatching", "true")
  .load("path/to/files")
```

[Configuration Options Documentation](./configuration.md)

### Data Type Support

Support for all Spark SQL data types including complex nested structures, logical types, and custom types.

[Data Type Support Documentation](./data-types.md)
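
As an illustrative sketch (column names are assumed), nested records, arrays, and maps round-trip through the Avro format without extra configuration:

```scala
import org.apache.spark.sql.functions.{array, col, lit, map, struct}

// Assemble nested structures from an assumed flat DataFrame df(id, name)
val nested = df.select(
  struct(col("id"), col("name")).as("user"),       // written as an Avro record
  array(lit(1), lit(2), lit(3)).as("scores"),      // written as an Avro array
  map(lit("lang"), lit("scala")).as("attributes")  // written as an Avro map
)

nested.write.format("avro").save("path/to/nested/output")
```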