Describes: mavenpkg:maven/org.apache.spark/spark-kvstore_2.13@3.5.x


tessl/maven-org-apache-spark--spark-kvstore_2-13

tessl install tessl/maven-org-apache-spark--spark-kvstore_2-13@3.5.0

Local key/value store abstraction for Apache Spark with thread-safe operations, automatic serialization, and indexing capabilities


Storage Implementations

Apache Spark KVStore provides three distinct storage implementations, each optimized for different use cases: InMemoryStore for high-performance temporary storage, LevelDB for reliable persistent storage, and RocksDB for high-throughput persistent storage with advanced compression.

Imports

import org.apache.spark.util.kvstore.InMemoryStore;
import org.apache.spark.util.kvstore.KVStore;
import org.apache.spark.util.kvstore.LevelDB;
import org.apache.spark.util.kvstore.RocksDB;
import org.apache.spark.util.kvstore.KVStoreSerializer;
import org.apache.spark.util.kvstore.UnsupportedStoreVersionException;
import java.io.File;
import java.util.List;

Capabilities

InMemoryStore

High-performance in-memory implementation that keeps all data deserialized in memory. Ideal for temporary caching, session storage, and development/testing scenarios.

public class InMemoryStore implements KVStore {
    public InMemoryStore();
}

Usage Example:

// Create in-memory store
KVStore store = new InMemoryStore();

// Use normally - all data kept in memory
User user = new User("user123", "Alice", "Engineering");
store.write(user);
User retrieved = store.read(User.class, "user123");

// Data is lost when store is closed or application exits
store.close();

Characteristics:

  • Performance: Fastest read/write operations, no serialization overhead
  • Memory Usage: Stores objects in deserialized form, higher memory consumption
  • Persistence: No persistence - data lost on close/restart
  • Indexing: No pre-built sorted indexes; values are sorted each time a view is iterated, which is slower for large datasets
  • Thread Safety: Full thread safety with concurrent collections
  • Use Cases: Caching, temporary storage, development, testing
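
The trade-off between cheap writes and sort-on-iteration can be illustrated with a small stand-alone sketch. This is not the code of Spark's InMemoryStore: `TinyMemoryStore`, `write`, `read`, and `sortedBy` are hypothetical names chosen for illustration, and the only assumption carried over from the list above is that values live in a concurrent map as ordinary objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only; not the Spark InMemoryStore implementation.
final class TinyMemoryStore<K, V> {
    // Values are held as live objects, so reads and writes need no serialization.
    private final Map<K, V> data = new ConcurrentHashMap<>();

    void write(K key, V value) {
        data.put(key, value); // replaces any previous value stored under the key
    }

    V read(K key) {
        return data.get(key);
    }

    // No sorted index is maintained: every call copies and sorts all values,
    // which costs O(n log n) per iteration.
    <C extends Comparable<C>> List<V> sortedBy(Function<V, C> indexValue) {
        List<V> copy = new ArrayList<>(data.values());
        copy.sort(Comparator.comparing(indexValue));
        return copy;
    }
}
```

Writes stay cheap because nothing is re-indexed, while each sorted read pays the full sorting cost. That is why this style of store suits small or short-lived data sets.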

LevelDB Implementation

Persistent storage using Google's LevelDB with Jackson JSON serialization and GZIP compression. Provides reliable storage with good performance characteristics.

public class LevelDB implements KVStore {
    public LevelDB(File path) throws Exception;
    public LevelDB(File path, KVStoreSerializer serializer) throws Exception;
    public void writeAll(List<?> values) throws Exception;
    
    static final long STORE_VERSION = 1L;
}

Usage Example:

// Create LevelDB store with default serializer
File dbPath = new File("/path/to/leveldb");
KVStore store = new LevelDB(dbPath);

// Or, instead of the line above, pass a custom serializer
// (MyCustomSerializer stands for your own KVStoreSerializer subclass):
// KVStore store = new LevelDB(dbPath, new MyCustomSerializer());

// Use normally - data persisted to disk
User user = new User("user123", "Alice", "Engineering");
store.write(user);

// Data survives application restarts
store.close();

// Reopen same database
KVStore reopened = new LevelDB(dbPath);
User retrieved = reopened.read(User.class, "user123"); // Still there!
reopened.close();

Characteristics:

  • Performance: Good read/write performance with efficient key-value operations
  • Storage: JSON serialization with GZIP compression
  • Persistence: Data is stored on disk and survives application restarts; LevelDB protects writes with a write-ahead log
  • Memory Usage: Efficient memory usage with LRU caching
  • Thread Safety: Full thread safety with proper locking
  • Compatibility: Version checking prevents incompatible store access
  • Use Cases: Application state, metrics storage, local databases
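
The "JSON serialization with GZIP compression" storage format can be sketched with JDK classes alone. The example below compresses a JSON string that has already been produced; in KVStore the JSON comes from Jackson, which is left out here to keep the sketch dependency-free. `GzipCodec` and its two methods are hypothetical names, not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing the compress-on-write / decompress-on-read idea.
final class GzipCodec {
    // Encodes the JSON text as UTF-8 and GZIP-compresses it.
    static byte[] compress(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP trailer is written when the stream closes, so read the bytes afterwards.
        return bytes.toByteArray();
    }

    // Reverses compress(): inflates the bytes and decodes them as UTF-8.
    static String decompress(byte[] stored) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For very small values the GZIP header and trailer can make the stored bytes larger than the input. The space savings mostly show up on larger or repetitive objects.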

Configuration:

  • Database files created automatically if path doesn't exist
  • Uses LevelDB JNI native libraries for optimal performance
  • Automatic compression reduces disk space usage
  • Built-in store version compatibility checking
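
The store version check mentioned above follows a common pattern: stamp a new database with the current version and refuse to open one stamped with a different version. The sketch below illustrates that pattern with a plain map standing in for the on-disk database. `VersionCheckSketch`, `VERSION_KEY`, and `IncompatibleVersionException` are hypothetical names; the library itself signals a mismatch with `UnsupportedStoreVersionException`.

```java
import java.util.Map;

// Hypothetical illustration of a version check on open; not the library's code.
final class VersionCheckSketch {
    static final long STORE_VERSION = 1L;
    static final String VERSION_KEY = "__version__"; // assumed metadata key

    static final class IncompatibleVersionException extends RuntimeException {
        IncompatibleVersionException(long found) {
            super("Store version " + found + " does not match " + STORE_VERSION);
        }
    }

    // `metadata` stands in for the persistent database being opened.
    static void checkOrStamp(Map<String, Long> metadata) {
        Long found = metadata.get(VERSION_KEY);
        if (found == null) {
            // Fresh database: record the version used to create it.
            metadata.put(VERSION_KEY, STORE_VERSION);
        } else if (found != STORE_VERSION) {
            // Existing database written in another format: refuse to open it.
            throw new IncompatibleVersionException(found);
        }
    }
}
```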

RocksDB Implementation

High-performance persistent storage using Facebook's RocksDB with advanced compression algorithms and optimized block-based storage format.

public class RocksDB implements KVStore {
    public RocksDB(File path) throws Exception;
    public RocksDB(File path, KVStoreSerializer serializer) throws Exception;
    public void writeAll(List<?> values) throws Exception;
    
    static final long STORE_VERSION = 1L;
}

Usage Example:

import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Create RocksDB store
File dbPath = new File("/path/to/rocksdb");
KVStore store = new RocksDB(dbPath);

// High-throughput operations
for (int i = 0; i < 100000; i++) {
    User user = new User("user" + i, "User " + i, "Department" + (i % 10));
    store.write(user);
}

// Efficient bulk removal by natural key ("__main__" is the value of KVIndex.NATURAL_INDEX_NAME)
List<String> userIds = IntStream.range(0, 1000)
    .mapToObj(i -> "user" + i)
    .collect(Collectors.toList());
store.removeAllByIndexValues(User.class, "__main__", userIds);

store.close();

Characteristics:

  • Performance: Highest throughput for write-heavy workloads
  • Storage: Advanced compression (LZ4 + ZSTD) with optimized block format
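  • Persistence: Data is stored on disk with write-ahead logging; because writes use sync=false (see Configuration), the most recent writes may be lost if the machine crashes
  • Memory Usage: Managed by RocksDB itself; the constructors shown above expose no memory tuning parameters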
  • Thread Safety: Optimized for concurrent access patterns
  • Scalability: Handles very large datasets efficiently
  • Use Cases: High-volume logging, time-series data, large-scale caching

Configuration:

  • Compression: LZ4 for general levels, ZSTD for bottom level (maximum compression)
  • Bloom Filters: Enabled for faster key lookups
  • Block Format: Version 5 block format with optimized indexing
  • Write Options: Non-synced writes (sync=false); a write returns without forcing the log to disk, trading some crash durability for speed

Bulk Operations

Both LevelDB and RocksDB implementations provide optimized bulk write operations for improved performance when storing multiple objects at once.

public void writeAll(List<?> values) throws Exception;

Usage Example:

import java.util.Arrays;
import java.util.List;

// Create sample data
List<User> users = Arrays.asList(
    new User("user1", "Alice", "Engineering"),
    new User("user2", "Bob", "Marketing"), 
    new User("user3", "Carol", "Engineering")
);

// Bulk write using LevelDB
LevelDB levelDbStore = new LevelDB(new File("/path/to/leveldb"));
levelDbStore.writeAll(users);

// Bulk write using RocksDB  
RocksDB rocksDbStore = new RocksDB(new File("/path/to/rocksdb"));
rocksDbStore.writeAll(users);

// Close stores
levelDbStore.close();
rocksDbStore.close();

Performance Benefits:

  • Batch Processing: Reduces overhead of individual write operations
  • Transaction Optimization: Groups writes into fewer storage transactions
  • Index Efficiency: Reduces index update overhead through batching

Parameters:

  • values: List of objects to write in bulk (all objects must have natural key annotations)

Exceptions:

  • Exception: For serialization errors or storage backend issues. Writing an object whose natural key already exists replaces the stored value rather than raising a conflict

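Note: writeAll is declared on the LevelDB and RocksDB classes, not on the KVStore interface, so it must be called through a LevelDB or RocksDB reference as in the example above. InMemoryStore has no writeAll method; use individual write() calls instead.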
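
When the data to store is too large to pass comfortably as one list, one option is to split it into fixed-size batches and hand each batch to writeAll. The helper below is a stdlib-only sketch of that splitting step. `Batches` and `partition` are hypothetical names, and a suitable batch size is an assumption to tune for your workload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive batches for bulk writes.
final class Batches {
    // Returns sublists of at most `batchSize` elements, preserving order.
    static <T> List<List<T>> partition(List<T> values, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < values.size(); start += batchSize) {
            int end = Math.min(start + batchSize, values.size());
            // Copy the slice so each batch stays valid if the source list changes later.
            batches.add(new ArrayList<>(values.subList(start, end)));
        }
        return batches;
    }
}
```

Each returned batch could then be passed to `levelDbStore.writeAll(batch)` or `rocksDbStore.writeAll(batch)`. An empty input produces no batches, so no bulk write is issued for it.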

Choosing the Right Implementation

Performance Comparison

| Feature | InMemoryStore | LevelDB | RocksDB |
|---|---|---|---|
| Read Speed | Fastest | Fast | Fast |
| Write Speed | Fastest | Good | Excellent |
| Memory Usage | High | Low | Low |
| Disk Usage | None | Medium | Low (compressed) |
| Startup Time | Instant | Fast | Medium |
| Large Datasets | Limited by RAM | Good | Excellent |

Use Case Guidelines

Choose InMemoryStore when:

  • Data fits comfortably in memory
  • Maximum read/write performance required
  • Temporary storage or caching scenarios
  • Development and testing environments
  • Data doesn't need to survive application restarts

Choose LevelDB when:

  • Need reliable persistent storage
  • Moderate data sizes (< 100GB)
  • Balanced read/write workloads
  • Simple deployment requirements
  • Prefer stable, well-tested storage engine

Choose RocksDB when:

  • High-volume write workloads
  • Large datasets (100GB+)
  • Need advanced compression to save disk space
  • Write-heavy applications (logging, metrics, time-series)
  • Can handle slightly more complex deployment
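
The guidelines above can be condensed into a small decision helper. This is only a sketch of the reasoning, not a rule enforced by the library. `StoreChoice`, `Backend`, and the two flags are hypothetical, and the sketch assumes that data which does not need to survive a restart also fits in memory.

```java
// Hypothetical decision helper mirroring the use case guidelines.
final class StoreChoice {
    enum Backend { IN_MEMORY, LEVELDB, ROCKSDB }

    static Backend choose(boolean mustSurviveRestart, boolean writeHeavyOrVeryLarge) {
        if (!mustSurviveRestart) {
            // Temporary data: keep it deserialized in memory for speed.
            return Backend.IN_MEMORY;
        }
        // Persistent data: prefer RocksDB for write-heavy or very large workloads.
        return writeHeavyOrVeryLarge ? Backend.ROCKSDB : Backend.LEVELDB;
    }
}
```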

Storage Backend Configuration

Custom Serialization

All persistent implementations support custom serializers for specialized encoding requirements:

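import com.fasterxml.jackson.core.JsonGenerator;
import java.text.SimpleDateFormat;

public class CustomSerializer extends KVStoreSerializer {
    public CustomSerializer() {
        super();
        // Configure the inherited Jackson ObjectMapper (the `mapper` field) for specific needs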
        mapper.configure(JsonGenerator.Feature.WRITE_NUMBERS_AS_STRINGS, true);
        mapper.setDateFormat(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));
    }
}

KVStore store = new RocksDB(dbPath, new CustomSerializer());

Error Handling

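// KVStore is Closeable, so try-with-resources releases the store once the block exits
try (KVStore store = new LevelDB(new File("/invalid/path"))) {
    // Use the store here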
} catch (UnsupportedStoreVersionException e) {
    // Store created with incompatible version
    System.err.println("Database version incompatible");
} catch (Exception e) {
    // Other initialization errors (disk full, permissions, etc.)
    System.err.println("Failed to open database: " + e.getMessage());
}

Best Practices

  1. Resource Management: Always close stores to prevent resource leaks
  2. Path Management: Use absolute paths for persistent stores
  3. Error Handling: Catch UnsupportedStoreVersionException for version conflicts
  4. Concurrent Access: Don't open the same database path from multiple processes
  5. Backup Strategy: Implement regular backups for critical persistent data
  6. Monitoring: Monitor disk space usage for persistent implementations
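
As a sketch of the path management advice, the helper below turns a possibly relative location into an absolute, normalized `File` and makes sure its parent directory exists before the path is handed to a persistent store. `StorePaths` and `prepare` are hypothetical names, and whether a store needs the parent directory to exist beforehand is an assumption made here for safety.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for preparing a database location.
final class StorePaths {
    // Resolves `location` to an absolute, normalized path and creates its parent directories.
    static File prepare(String location) {
        Path absolute = Paths.get(location).toAbsolutePath().normalize();
        Path parent = absolute.getParent();
        try {
            if (parent != null) {
                Files.createDirectories(parent); // no-op if the directories already exist
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The database directory itself is left for the store to create.
        return absolute.toFile();
    }
}
```

The returned `File` could then be passed to `new LevelDB(...)` or `new RocksDB(...)`. Calling `getUsableSpace()` on its parent directory gives a simple reading for the disk space monitoring item.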