# Apache Spark Tags

Apache Spark Tags provides Java and Scala annotations for marking API stability and categorizing tests within the Apache Spark ecosystem. The library lets Spark organize its tests consistently, communicate API stability clearly to users, and record the release in which each feature was introduced.

## Package Information

- **Package Name**: spark-tags_2.12 (the Scala 2.13 build is published as spark-tags_2.13)
- **Package Type**: maven
- **Group ID**: org.apache.spark
- **Language**: Java/Scala
- **Installation**: Add the Maven dependency:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-tags_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```

For Gradle:

```gradle
implementation 'org.apache.spark:spark-tags_2.12:3.5.6'
```

## Core Imports

```java
// API stability annotations
import org.apache.spark.annotation.Stable;
import org.apache.spark.annotation.Unstable;
import org.apache.spark.annotation.Experimental;
import org.apache.spark.annotation.Evolving;
import org.apache.spark.annotation.DeveloperApi;
import org.apache.spark.annotation.Private;
import org.apache.spark.annotation.AlphaComponent;

// Test category annotations
import org.apache.spark.tags.ExtendedSQLTest;
import org.apache.spark.tags.DockerTest;
import org.apache.spark.tags.SlowHiveTest;
```

For Scala:

```scala
// Since is private[spark], so it can only be used from code under the org.apache.spark package
import org.apache.spark.annotation.Since
```
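
## Basic Usage

```java
// Each public class below belongs in its own .java file.
import org.apache.spark.annotation.DeveloperApi;
import org.apache.spark.annotation.Experimental;
import org.apache.spark.annotation.Stable;
import org.apache.spark.tags.DockerTest;
import org.apache.spark.tags.ExtendedSQLTest;
import org.junit.Test; // assumes JUnit 4 on the test classpath

// Mark a stable API
@Stable
public class MyStableAPI {
  @Stable
  public void stableMethod() {
    // Implementation
  }
}

// Mark an experimental feature that also exposes a developer-only method
@Experimental
public class NewFeature {
  @DeveloperApi
  public void advancedConfiguration() {
    // Developer-only API
  }
}

// Categorize tests: the class-level tag covers every test in the class
@ExtendedSQLTest
public class MyExtendedSQLTest {
  @Test
  @DockerTest
  public void testWithDocker() {
    // Test requiring Docker
  }
}
```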

## Architecture

Apache Spark Tags is organized into two annotation packages:

- **API Stability Package** (`org.apache.spark.annotation`): Annotations that mark API stability and intended audience. The package contains "Spark annotations to mark an API experimental or intended only for advanced usages by developers", and these annotations are reflected in the generated Scala and Java docs.
- **Test Category Package** (`org.apache.spark.tags`): ScalaTest tag annotations for categorizing and filtering tests.

The API stability package also contains `Since`, a Scala-only annotation that records the Spark version in which a feature was introduced.

## Capabilities

### API Stability Annotations

Annotations for marking the stability and intended audience of APIs within the Apache Spark ecosystem.

```java { .api }
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface Stable {}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface Unstable {}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface Experimental {}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface Evolving {}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface DeveloperApi {}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface Private {}

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER,
    ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE})
public @interface AlphaComponent {}
```

**Stability Levels (roughly from most to least stable):**

- **@Stable**: APIs that retain source and binary compatibility within a major release. They can change between major releases (e.g., 1.0 to 2.0).
- **@Evolving**: APIs meant to evolve toward becoming stable, but not stable yet. They can change between feature releases (e.g., 2.1 to 2.2).
- **@Experimental**: Experimental user-facing APIs that might change or be removed in minor versions of Spark, or be adopted as first-class Spark APIs. **Note**: If a Scaladoc comment immediately precedes this annotation, the comment's first line must be ":: Experimental ::" with no trailing blank line, because of a known issue where Scaladoc displays only the annotation or the comment, whichever comes first.
- **@Unstable**: APIs with no guarantee of stability. Unannotated classes are considered Unstable.

**Audience Annotations:**

- **@DeveloperApi**: Lower-level, unstable APIs intended for developers. They might change or be removed in minor versions of Spark.
- **@Private**: Classes considered private to Spark's internals, with a high likelihood of changing in future versions. Use it only when standard Java/Scala visibility modifiers are insufficient; in particular, Java has no equivalent of Scala's `private[spark]`.
- **@AlphaComponent**: New Spark components that may have unstable APIs.
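
Because the stability annotations are retained at runtime, tooling can read them back through reflection. The following is a minimal sketch, not part of spark-tags: `StabilityReport` and `stabilityOf` are hypothetical names. It matches annotations by their fully qualified Spark names (so it compiles without spark-tags on the classpath), ranks them in the order listed above, and reports a class carrying none of them as Unstable.

```java
import java.lang.annotation.Annotation;
import java.util.List;

// Hypothetical helper, not part of spark-tags.
final class StabilityReport {
  // Spark's stability annotations, most stable first (order taken from the list above).
  private static final List<String> LEVELS = List.of(
      "org.apache.spark.annotation.Stable",
      "org.apache.spark.annotation.Evolving",
      "org.apache.spark.annotation.Experimental",
      "org.apache.spark.annotation.Unstable");

  private StabilityReport() {}

  /** Returns the simple name of the highest-ranked stability annotation declared on {@code type}. */
  static String stabilityOf(Class<?> type) {
    // Only annotations declared directly on the class are inspected; the
    // declarations above do not use @Inherited, so subclasses do not inherit them.
    Annotation[] declared = type.getDeclaredAnnotations();
    for (String level : LEVELS) {
      for (Annotation a : declared) {
        if (a.annotationType().getName().equals(level)) {
          return level.substring(level.lastIndexOf('.') + 1);
        }
      }
    }
    // Unannotated classes are considered Unstable.
    return "Unstable";
  }
}
```

For a JDK class such as `String`, which carries none of these annotations, `stabilityOf` returns `"Unstable"`; for a class annotated with Spark's `@Stable`, it returns `"Stable"`.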
150
151
**Usage Examples:**
152
153
```java
154
// Stable public API
155
@Stable
156
public class DataFrameReader {
157
@Stable
158
public Dataset<Row> json(String path) {
159
// Implementation
160
}
161
}
162
163
// Experimental feature
164
@Experimental
165
public class MLPipeline {
166
@Experimental
167
public Model fit(Dataset<?> dataset) {
168
// Implementation
169
}
170
}
171
172
// Developer-only API
173
@DeveloperApi
174
public class InternalCacheManager {
175
@Private
176
void clearInternalCache() {
177
// Internal implementation
178
}
179
}
180
181
// Alpha component
182
@AlphaComponent
183
public class GraphXExperimental {
184
@Unstable
185
public Graph<VD, ED> processGraph() {
186
// Implementation
187
}
188
}
189
```

### Version Tracking

A Scala annotation that records the Spark version in which an entity was introduced.

```scala { .api }
private[spark] class Since(version: String) extends StaticAnnotation
```

**Usage:**

```scala
// Since is private[spark], so it can only be applied to code inside the
// org.apache.spark package; the package and object names below are hypothetical.
package org.apache.spark.example

import org.apache.spark.annotation.Since

object NewFeatures {
  @Since("3.0.0")
  def newSparkFeature(): Unit = {
    // Implementation introduced in Spark 3.0.0
  }
}

@Since("2.4.0")
class FeatureFromTwoFour {
  @Since("2.4.5")
  def methodAddedInPatch(): String = "result"
}
```

The `Since` annotation is private to the Spark package (`private[spark]`) and is used internally to track API evolution. Unlike the `@since` JavaDoc tag, it does not require explicit JavaDoc and works for overridden methods that inherit API documentation directly from their parents. Its limitation is that it does not appear in the generated Java API documentation.

### Test Category Annotations

ScalaTest tag annotations for categorizing and filtering test execution in the Apache Spark test suite.

```java { .api }
@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface ExtendedSQLTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface ExtendedHiveTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface ExtendedYarnTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface ExtendedLevelDBTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface SlowSQLTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface SlowHiveTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface DockerTest {}

@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface ChromeUITest {}
```
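
Each test category annotation is a Java annotation meta-annotated with ScalaTest's `@TagAnnotation`, and it can be applied to test classes or individual test methods. ScalaTest identifies a tag by the annotation's fully qualified name (for example, `org.apache.spark.tags.DockerTest`), which is what enables selective test execution. In Spark's own build, these annotations are compiled from the tags module's test sources and consumed through its test-jar (`<type>test-jar</type>`), so they may not be available from the main `spark-tags` jar alone.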
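
Assuming a runner recognizes a tag by checking that the annotation's own type carries `@TagAnnotation`, runtime retention is what makes that check possible. The following JDK-only sketch is a simplified model of the discovery step, not ScalaTest's code: `TagMarker` is a hypothetical stand-in for `TagAnnotation`, `FakeDockerTest` stands in for a Spark tag, and `TagDiscovery` is a hypothetical helper that collects fully qualified tag names from a class.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of tag discovery; not ScalaTest's implementation.
final class TagDiscovery {
  // Hypothetical stand-in for org.scalatest.TagAnnotation.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.ANNOTATION_TYPE)
  @interface TagMarker {}

  // Hypothetical stand-in for a tag such as org.apache.spark.tags.DockerTest.
  @TagMarker
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.METHOD, ElementType.TYPE})
  @interface FakeDockerTest {}

  // A suite carrying one tag and one ordinary (non-tag) annotation.
  @FakeDockerTest
  @Deprecated
  static final class ExampleSuite {}

  private TagDiscovery() {}

  /** Returns the fully qualified names of the tag annotations on {@code suite}. */
  static Set<String> classTags(Class<?> suite) {
    Set<String> tags = new TreeSet<>();
    for (Annotation a : suite.getAnnotations()) {
      Class<? extends Annotation> type = a.annotationType();
      // Keep only annotations whose own type is marked as a tag.
      if (type.isAnnotationPresent(TagMarker.class)) {
        tags.add(type.getName());
      }
    }
    return tags;
  }
}
```

`@Deprecated` is also visible at runtime, but it is not reported because its type lacks the tag marker.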

**Test Categories:**

- **@ExtendedSQLTest**: Extended SQL functionality tests
- **@ExtendedHiveTest**: Extended Hive integration tests
- **@ExtendedYarnTest**: Extended YARN cluster tests
- **@ExtendedLevelDBTest**: Extended LevelDB storage tests
- **@SlowSQLTest**: SQL tests that take significant time to execute
- **@SlowHiveTest**: Hive tests that take significant time to execute
- **@DockerTest**: Tests requiring Docker containers
- **@ChromeUITest**: UI tests requiring the Chrome browser

**Usage Examples:**

```java
// Each public class below belongs in its own .java file.
import org.apache.spark.tags.ChromeUITest;
import org.apache.spark.tags.DockerTest;
import org.apache.spark.tags.ExtendedHiveTest;
import org.apache.spark.tags.ExtendedSQLTest;
import org.apache.spark.tags.SlowHiveTest;
import org.apache.spark.tags.SlowSQLTest;
import org.junit.Test; // assumes JUnit 4 on the test classpath

// Categorize a test class
@ExtendedSQLTest
public class ComplexSQLQueryTest {
  @Test
  public void testComplexJoins() {
    // Test implementation
  }
}

// Categorize individual test methods
public class MixedTestSuite {
  @Test
  @SlowSQLTest
  public void testLargeDatasetQuery() {
    // Slow SQL test
  }

  @Test
  @DockerTest
  public void testWithExternalDatabase() {
    // Test requiring Docker
  }

  @Test
  @ChromeUITest
  public void testWebUI() {
    // UI test requiring Chrome
  }
}

// Multiple annotations on one class
@ExtendedHiveTest
@SlowHiveTest
public class HeavyHiveProcessingTest {
  // Extended and slow Hive tests
}
```

**Test Filtering:**

In Spark's Maven build, the `test.include.tags` and `test.exclude.tags` properties take comma-separated, fully qualified annotation names to run or skip specific test categories:

```bash
# Run only extended SQL tests
mvn test -Dtest.include.tags=org.apache.spark.tags.ExtendedSQLTest

# Exclude slow tests
mvn test -Dtest.exclude.tags=org.apache.spark.tags.SlowSQLTest,org.apache.spark.tags.SlowHiveTest

# Run only Docker tests
mvn test -Dtest.include.tags=org.apache.spark.tags.DockerTest
```
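
Assuming these properties are forwarded to ScalaTest's include and exclude tag options, selection follows ScalaTest's rule: when an include set is given, a test runs only if it carries at least one included tag, and a test carrying any excluded tag is skipped. The sketch below models that rule in Java. `TagFilter` is a hypothetical class, not ScalaTest's API, and tags are compared by fully qualified annotation name.

```java
import java.util.Set;

// Hypothetical model of include/exclude tag filtering; not ScalaTest's API.
final class TagFilter {
  private final Set<String> include; // empty means "no include filter"
  private final Set<String> exclude;

  TagFilter(Set<String> include, Set<String> exclude) {
    this.include = Set.copyOf(include);
    this.exclude = Set.copyOf(exclude);
  }

  /** Decides whether a test carrying {@code testTags} should run. */
  boolean shouldRun(Set<String> testTags) {
    boolean included = include.isEmpty() || testTags.stream().anyMatch(include::contains);
    boolean excluded = testTags.stream().anyMatch(exclude::contains);
    return included && !excluded;
  }
}
```

Under this rule, the "Exclude slow tests" filter skips a test tagged with both `ExtendedHiveTest` and `SlowHiveTest`, while an untagged test still runs.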

## Type Definitions

The Java annotations are marker annotations (they declare no elements) that serve as metadata for the Java/Scala compilers and for runtime reflection. The exception is the Scala `Since` annotation, which takes a single `version` string. The Java annotations are built from the standard annotation types:

```java { .api }
// Standard annotation imports used by the Java annotations
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// ScalaTest import for test annotations
import org.scalatest.TagAnnotation;
```

**Retention Policies:**
- `RetentionPolicy.RUNTIME`: Annotations are available at runtime via reflection
- All Java annotations in this library use RUNTIME retention; `Since` is a Scala `StaticAnnotation`, not a Java annotation, so Java retention policies do not apply to it

**Target Elements:**
- API stability annotations can be applied to: TYPE, FIELD, METHOD, PARAMETER, CONSTRUCTOR, LOCAL_VARIABLE, PACKAGE
- Test category annotations can be applied to: METHOD, TYPE only
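
To see what RUNTIME retention provides, the following self-contained sketch declares two hypothetical stand-in annotations: `RuntimeMarker`, retained at runtime like the Java annotations in this library, and `ClassFileMarker`, which uses CLASS retention for contrast. It then checks which of the two reflection can see on an annotated class. `RetentionDemo` and `visibleAtRuntime` are illustrative names only.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-ins used only to contrast retention policies.
final class RetentionDemo {
  // Declared like the annotations in this library: visible to reflection at runtime.
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD})
  @interface RuntimeMarker {}

  // CLASS retention: recorded in the class file but not exposed through reflection.
  @Retention(RetentionPolicy.CLASS)
  @Target({ElementType.TYPE, ElementType.METHOD})
  @interface ClassFileMarker {}

  @RuntimeMarker
  @ClassFileMarker
  static final class Annotated {}

  private RetentionDemo() {}

  /** Reports whether reflection can see {@code marker} on the annotated class. */
  static boolean visibleAtRuntime(Class<? extends Annotation> marker) {
    return Annotated.class.isAnnotationPresent(marker);
  }
}
```

Only `RuntimeMarker` is reported as present. This is why tools and test runners that inspect annotations reflectively depend on RUNTIME retention.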

**Scala Annotation Features:**
```scala { .api }
package org.apache.spark.annotation

import scala.annotation.StaticAnnotation
import scala.annotation.meta._

// Meta-annotations for Scala annotation targets
@param @field @getter @setter @beanGetter @beanSetter
private[spark] class Since(version: String) extends StaticAnnotation
```

The meta-annotations on `Since` control where the compiler copies the annotation when it is placed on a constructor parameter or field: to the parameter, the underlying field, the getter and setter, and any generated bean accessors. Without them, Scala attaches an annotation on a constructor parameter only to the parameter itself.