Version

Workspace: tessl
Visibility: Public
Created: 4 months ago
Last updated: 4 months ago
Describes: pkg:maven/org.apache.spark/spark-kvstore_2.13@3.5.x

tessl/maven-org-apache-spark--spark-kvstore_2-13

tessl install tessl/maven-org-apache-spark--spark-kvstore_2-13@3.5.0

Local key/value store abstraction for Apache Spark with thread-safe operations, automatic serialization, and indexing capabilities

Storage Implementations

Apache Spark KVStore provides three distinct storage implementations, each optimized for different use cases: InMemoryStore for high-performance temporary storage, LevelDB for reliable persistent storage, and RocksDB for high-throughput persistent storage with advanced compression.

Imports

import org.apache.spark.util.kvstore.KVStore;
import org.apache.spark.util.kvstore.InMemoryStore;
import org.apache.spark.util.kvstore.LevelDB;
import org.apache.spark.util.kvstore.RocksDB;
import org.apache.spark.util.kvstore.KVStoreSerializer;
import org.apache.spark.util.kvstore.UnsupportedStoreVersionException;
import java.io.File;
import java.util.List;

Capabilities

InMemoryStore

High-performance in-memory implementation that keeps all data deserialized in memory. Ideal for temporary caching, session storage, and development/testing scenarios.

public class InMemoryStore implements KVStore {
    public InMemoryStore();
}

Usage Example:

// Create in-memory store
KVStore store = new InMemoryStore();

// Use normally - all data kept in memory
User user = new User("user123", "Alice", "Engineering");
store.write(user);
User retrieved = store.read(User.class, "user123");

// Data is lost when store is closed or application exits
store.close();

Characteristics:

Performance: Fastest read/write operations, no serialization overhead
Memory Usage: Stores objects in deserialized form, higher memory consumption
Persistence: No persistence - data lost on close/restart
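Indexing: No precomputed indices; stored data is copied and sorted each time an indexed field is iterated (saves memory, but iteration gets slower as the dataset grows)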
Thread Safety: Full thread safety with concurrent collections
Use Cases: Caching, temporary storage, development, testing
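
The sort-on-iteration trade-off described above can be illustrated with a small stand-alone sketch. This is not Spark's InMemoryStore; the class SortOnIterationSketch, its Person type, and the viewByName method are hypothetical names used only to show why writes stay cheap while each sorted view costs a full copy and sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT Spark's InMemoryStore: only the sort-on-iteration idea.
class SortOnIterationSketch {
    // Hypothetical value type; id plays the role of the natural key.
    static final class Person {
        final String id;
        final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Values are kept as live objects in a concurrent map, keyed by natural key.
    private final Map<String, Person> data = new ConcurrentHashMap<>();

    // A write is a single map update; no secondary index is maintained.
    void write(Person person) {
        data.put(person.id, person);
    }

    Person read(String id) {
        return data.get(id);
    }

    // Every call copies all values and sorts the copy: O(n log n) per iteration.
    List<Person> viewByName() {
        List<Person> copy = new ArrayList<>(data.values());
        copy.sort(Comparator.comparing((Person p) -> p.name));
        return copy;
    }
}
```

Calling viewByName() repeatedly repeats the copy and the sort, which is why a store built this way suits small or short-lived datasets better than large ones that are iterated often.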

LevelDB Implementation

Persistent storage using Google's LevelDB with Jackson JSON serialization and GZIP compression. Provides reliable storage with good performance characteristics.

public class LevelDB implements KVStore {
    public LevelDB(File path) throws Exception;
    public LevelDB(File path, KVStoreSerializer serializer) throws Exception;
    public void writeAll(List<?> values) throws Exception;
    
    static final long STORE_VERSION = 1L;
}

Usage Example:

// Create LevelDB store with default serializer
File dbPath = new File("/path/to/leveldb");
KVStore store = new LevelDB(dbPath);

// Or, instead of the line above, pass a custom serializer
// (MyCustomSerializer stands for your own KVStoreSerializer subclass):
// KVStore store = new LevelDB(dbPath, new MyCustomSerializer());

// Use normally - data persisted to disk
User user = new User("user123", "Alice", "Engineering");
store.write(user);

// Data survives application restarts
store.close();

// Reopen same database
KVStore reopened = new LevelDB(dbPath);
User retrieved = reopened.read(User.class, "user123"); // Still there!
reopened.close();

Characteristics:

Performance: Good read/write performance with efficient key-value operations
Storage: JSON serialization with GZIP compression
Persistence: Data survives process restarts; writes go through LevelDB's write-ahead log
Memory Usage: Efficient memory usage with LRU caching
Thread Safety: Full thread safety with proper locking
Compatibility: Version checking prevents incompatible store access
Use Cases: Application state, metrics storage, local databases

Configuration:

Database files created automatically if path doesn't exist
Uses LevelDB JNI native libraries for optimal performance
Automatic compression reduces disk space usage
Built-in store version compatibility checking
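
To make the "JSON serialization with GZIP compression" storage format more concrete, here is a minimal stand-alone sketch of the encoding step using only the JDK. It is not KVStoreSerializer itself: the class GzipJsonSketch and its two methods are hypothetical, and the JSON text is assumed to have been produced already (by Jackson, in the real serializer).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the "JSON text, then GZIP" encoding; not Spark code.
class GzipJsonSketch {
    // Compress already-serialized JSON text into the bytes that would be stored.
    static byte[] compress(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the GZIP trailer has been written.
        return bytes.toByteArray();
    }

    // Reverse step: stored bytes back to JSON text, ready for deserialization.
    static String decompress(byte[] stored) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

GZIP adds a fixed header and trailer, so very small values may not shrink at all; the saving shows up on larger or repetitive JSON documents.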

RocksDB Implementation

High-performance persistent storage using Facebook's RocksDB with advanced compression algorithms and optimized block-based storage format.

public class RocksDB implements KVStore {
    public RocksDB(File path) throws Exception;
    public RocksDB(File path, KVStoreSerializer serializer) throws Exception;
    public void writeAll(List<?> values) throws Exception;
    
    static final long STORE_VERSION = 1L;
}

Usage Example:

import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Create RocksDB store
File dbPath = new File("/path/to/rocksdb");
KVStore store = new RocksDB(dbPath);

// High-throughput operations
for (int i = 0; i < 100000; i++) {
    User user = new User("user" + i, "User " + i, "Department" + (i % 10));
    store.write(user);
}

// Remove many entries in one call; "__main__" is the natural index name
// (KVIndex.NATURAL_INDEX_NAME)
List<String> userIds = IntStream.range(0, 1000)
    .mapToObj(i -> "user" + i)
    .collect(Collectors.toList());
store.removeAllByIndexValues(User.class, "__main__", userIds);

store.close();

Characteristics:

Performance: Highest throughput for write-heavy workloads
Storage: Advanced compression (LZ4 + ZSTD) with optimized block format
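Persistence: Write-ahead logging; because writes are not synced to disk (sync=false, see Configuration), the most recent writes can be lost if the machine crashes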
Memory Usage: Highly configurable memory management
Thread Safety: Optimized for concurrent access patterns
Scalability: Handles very large datasets efficiently
Use Cases: High-volume logging, time-series data, large-scale caching

Configuration:

Compression: LZ4 for general levels, ZSTD for bottom level (maximum compression)
Bloom Filters: Enabled for faster key lookups
Block Format: Version 5 block format with optimized indexing
Write Options: Async writes (sync=false) for better performance

Bulk Operations

Both LevelDB and RocksDB implementations provide optimized bulk write operations for improved performance when storing multiple objects at once.

public void writeAll(List<?> values) throws Exception;

Usage Example:

import java.util.Arrays;
import java.util.List;

// Create sample data
List<User> users = Arrays.asList(
    new User("user1", "Alice", "Engineering"),
    new User("user2", "Bob", "Marketing"), 
    new User("user3", "Carol", "Engineering")
);

// Bulk write using LevelDB
LevelDB levelDbStore = new LevelDB(new File("/path/to/leveldb"));
levelDbStore.writeAll(users);

// Bulk write using RocksDB  
RocksDB rocksDbStore = new RocksDB(new File("/path/to/rocksdb"));
rocksDbStore.writeAll(users);

// Close stores
levelDbStore.close();
rocksDbStore.close();

Performance Benefits:

Batch Processing: Reduces overhead of individual write operations
Transaction Optimization: Groups writes into fewer storage transactions
Index Efficiency: Reduces index update overhead through batching

Parameters:

values: List of objects to write in bulk (all objects must have natural key annotations)

Exceptions:

Exception: For serialization errors or storage backend issues. Writing an object whose natural key already exists replaces the stored value rather than raising a conflict.

Note: InMemoryStore does not provide a specific writeAll method but can achieve similar functionality through multiple individual write() calls.
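
The batching benefit listed above can be sketched without any storage engine. The example below assumes a bulk write groups values by class and commits one batch per class. That grouping strategy, the CountingBackend stand-in, and the class BatchWriteSketch are all assumptions made for illustration, not the LevelDB or RocksDB implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of why batching reduces storage transactions; not Spark code.
class BatchWriteSketch {
    // Stand-in backend that only counts commits and stored values.
    static final class CountingBackend {
        int commits = 0;
        int valuesStored = 0;

        void commit(List<?> batch) {
            commits++;
            valuesStored += batch.size();
        }
    }

    // Baseline: one commit per value.
    static void writeOneByOne(CountingBackend backend, List<?> values) {
        for (Object value : values) {
            backend.commit(Collections.singletonList(value));
        }
    }

    // Batched: group values by class, then one commit per class (assumed strategy).
    static void writeAll(CountingBackend backend, List<?> values) {
        if (values == null || values.isEmpty()) {
            throw new IllegalArgumentException("Non-empty values required.");
        }
        Map<Class<?>, List<Object>> byType = new LinkedHashMap<>();
        for (Object value : values) {
            byType.computeIfAbsent(value.getClass(), k -> new ArrayList<>()).add(value);
        }
        for (List<Object> batch : byType.values()) {
            backend.commit(batch);
        }
    }
}
```

With five values of two different classes, the one-by-one path performs five commits while the batched path performs two, and both store all five values.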

Choosing the Right Implementation

Performance Comparison

Feature	InMemoryStore	LevelDB	RocksDB
Read Speed	Fastest	Fast	Fast
Write Speed	Fastest	Good	Excellent
Memory Usage	High	Low	Low
Disk Usage	None	Medium	Low (compressed)
Startup Time	Instant	Fast	Medium
Large Datasets	Limited by RAM	Good	Excellent

Use Case Guidelines

Choose InMemoryStore when:

Data fits comfortably in memory
Maximum read/write performance required
Temporary storage or caching scenarios
Development and testing environments
Data doesn't need to survive application restarts

Choose LevelDB when:

Need reliable persistent storage
Moderate data sizes (< 100GB)
Balanced read/write workloads
Simple deployment requirements
Prefer stable, well-tested storage engine

Choose RocksDB when:

High-volume write workloads
Large datasets (100GB+)
Need advanced compression to save disk space
Write-heavy applications (logging, metrics, time-series)
Can handle slightly more complex deployment
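
The guidelines above can be condensed into a small decision helper. This is only one possible reading of them: the class StoreChoiceSketch, its parameters, and the use of 100 GB as a hard cut-off are assumptions taken from the rough figures in the lists, not rules defined by the library.

```java
// Hypothetical decision helper that encodes the guidelines above; not part of the KVStore API.
class StoreChoiceSketch {
    enum Backend { IN_MEMORY, LEVELDB, ROCKSDB }

    static Backend choose(boolean mustSurviveRestart,
                          boolean fitsInMemory,
                          boolean writeHeavy,
                          long expectedSizeGb) {
        // Temporary data that fits in RAM: fastest option, nothing is persisted.
        if (!mustSurviveRestart && fitsInMemory) {
            return Backend.IN_MEMORY;
        }
        // Write-heavy workloads or large datasets (100 GB and up): RocksDB.
        if (writeHeavy || expectedSizeGb >= 100) {
            return Backend.ROCKSDB;
        }
        // Moderate sizes with balanced read/write load: LevelDB.
        return Backend.LEVELDB;
    }
}
```

For example, choose(true, false, false, 20) selects LEVELDB, while the same call with writeHeavy set to true selects ROCKSDB.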

Storage Backend Configuration

Custom Serialization

All persistent implementations support custom serializers for specialized encoding requirements:

import com.fasterxml.jackson.core.JsonGenerator;
import java.text.SimpleDateFormat;

public class CustomSerializer extends KVStoreSerializer {
    public CustomSerializer() {
        super();
        // mapper is the protected Jackson ObjectMapper inherited from KVStoreSerializer
        mapper.configure(JsonGenerator.Feature.WRITE_NUMBERS_AS_STRINGS, true);
        mapper.setDateFormat(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));
    }
}

KVStore store = new RocksDB(dbPath, new CustomSerializer());

Error Handling

try (KVStore store = new LevelDB(new File("/invalid/path"))) {
    // Use the store here; try-with-resources closes it on exit
} catch (UnsupportedStoreVersionException e) {
    // Existing store was written with an incompatible store version
    System.err.println("Database version incompatible");
} catch (Exception e) {
    // Other initialization errors (disk full, permissions, etc.)
    System.err.println("Failed to open database: " + e.getMessage());
}

Best Practices

Resource Management: Always close stores to prevent resource leaks
Path Management: Use absolute paths for persistent stores
Error Handling: Catch UnsupportedStoreVersionException for version conflicts
Concurrent Access: Don't open the same database path from multiple processes
Backup Strategy: Implement regular backups for critical persistent data
Monitoring: Monitor disk space usage for persistent implementations
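
The first two practices can be sketched with plain JDK code. KVStore extends java.io.Closeable, so a real store can be used in try-with-resources the same way as the stand-in below. The class StoreLifecycleSketch, its FakeStore, and both helper methods are hypothetical and exist only to show the pattern.

```java
import java.io.Closeable;
import java.io.File;

// Hypothetical sketch of two practices: always close the store, and use absolute paths.
class StoreLifecycleSketch {
    // Stand-in for a store; it only records whether close() was called.
    static final class FakeStore implements Closeable {
        boolean closed = false;

        void write(Object value) {
            if (value == null) {
                throw new IllegalArgumentException("null value");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // try-with-resources closes the store on success and on failure alike.
    static boolean writeAndClose(FakeStore store, Object value) {
        try (FakeStore s = store) {
            s.write(value);
            return true;
        } catch (IllegalArgumentException e) {
            // The store has already been closed by the time this handler runs.
            return false;
        }
    }

    // Resolve a configured location to an absolute directory before opening a store on it.
    static File absoluteStoreDir(String configuredPath) {
        return new File(configuredPath).getAbsoluteFile();
    }
}
```

Resolving the path once at startup avoids opening a different directory by accident when the process is launched from another working directory.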
