
tessl/maven-com-github-luben--zstd-jni

JNI bindings for the Zstd native library, which provides a fast, high-ratio lossless compression algorithm for Java and all JVM languages.


Dictionary-Based Compression

Dictionary-based compression improves compression ratios for similar data, especially many small payloads, by priming the compressor with a pre-trained dictionary. Zstd-jni accepts dictionaries either as raw byte arrays or as pre-compiled dictionary objects (ZstdDictCompress and ZstdDictDecompress) that can be reused across calls.

Capabilities

Dictionary Compression with Byte Arrays

Compress data using raw dictionary byte arrays.

/**
 * Compresses data using a byte array dictionary
 * @param src source data to compress
 * @param dict dictionary data
 * @param level compression level (1-22)
 * @return compressed data as byte array
 */
public static byte[] compressUsingDict(byte[] src, byte[] dict, int level);

/**
 * Compresses source into destination buffer using dictionary
 * @param dst destination buffer (must be sized using compressBound)
 * @param src source data to compress
 * @param dict dictionary data
 * @param level compression level (1-22)
 * @return number of bytes written to dst, or an error code (check with Zstd.isError)
 */
public static long compress(byte[] dst, byte[] src, byte[] dict, int level);

/**
 * ByteBuffer compression with byte array dictionary
 * @param dstBuf destination buffer (must be direct)
 * @param srcBuf source buffer (must be direct)
 * @param dict dictionary data
 * @param level compression level (1-22)
 * @return number of bytes written to destination
 */
public static int compress(ByteBuffer dstBuf, ByteBuffer srcBuf, byte[] dict, int level);

/**
 * ByteBuffer compression with dictionary, returns new buffer
 * @param srcBuf source buffer (must be direct)
 * @param dict dictionary data
 * @param level compression level (1-22)
 * @return new direct ByteBuffer containing compressed data
 */
public static ByteBuffer compress(ByteBuffer srcBuf, byte[] dict, int level);

Usage Examples:

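import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sample data for training. Three samples are shown for brevity only:
// real training needs many representative samples (typically hundreds or more)
// and is likely to fail on a handful like this.
String[] samples = {
    "The quick brown fox jumps over the lazy dog",
    "The lazy dog sleeps under the quick brown fox",
    "A quick brown fox and a lazy dog are friends"
};
byte[][] sampleBytes = Arrays.stream(samples)
    .map(s -> s.getBytes(StandardCharsets.UTF_8))
    .toArray(byte[][]::new);

// Train dictionary into a buffer of the maximum desired size
byte[] dictionary = new byte[1024];
long dictSize = Zstd.trainFromBuffer(sampleBytes, dictionary);
if (Zstd.isError(dictSize)) {
    throw new RuntimeException("Dictionary training failed: " + Zstd.getErrorName(dictSize));
}
// Trim the buffer to the number of bytes actually written
dictionary = Arrays.copyOf(dictionary, (int) dictSize);

// Compress with dictionary
byte[] data = "The quick brown fox runs fast".getBytes(StandardCharsets.UTF_8);
byte[] compressed = Zstd.compressUsingDict(data, dictionary, 6);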
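
The destination-buffer overload returns a status code instead of a trimmed array, so the caller has to size the buffer and check the result. The following is a minimal sketch of that pattern, assuming `Zstd.compressBound` for sizing and a short raw-content dictionary (plain, untrained bytes, which zstd accepts as a dictionary). The class and method names are illustrative and not part of zstd-jni.

```java
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative helper class; not part of zstd-jni.
class DictCompressIntoBuffer {
    // Compresses src with dict into a worst-case-sized buffer, then trims to the bytes written.
    public static byte[] compressWithDict(byte[] src, byte[] dict, int level) {
        byte[] dst = new byte[(int) Zstd.compressBound(src.length)];
        long written = Zstd.compress(dst, src, dict, level);
        if (Zstd.isError(written)) {
            throw new IllegalStateException("Compression failed: " + Zstd.getErrorName(written));
        }
        return Arrays.copyOf(dst, (int) written);
    }

    public static void main(String[] args) {
        // Assumption: a raw-content dictionary (untrained bytes) stands in for a trained one.
        byte[] dict = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] data = "The quick brown fox runs fast".getBytes(StandardCharsets.UTF_8);

        byte[] compressed = compressWithDict(data, dict, 6);
        byte[] restored = Zstd.decompress(compressed, dict, data.length);
        System.out.println("round trip ok: " + Arrays.equals(data, restored));
    }
}
```

Trimming with `Arrays.copyOf` matters because the buffer is sized for the worst case, and only the first `written` bytes form the compressed frame.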

Dictionary Compression with Dictionary Objects

Use pre-compiled dictionary objects for better performance when reusing dictionaries.

/**
 * Compresses data using a pre-compiled compression dictionary
 * @param src source data to compress
 * @param dict pre-compiled compression dictionary
 * @return compressed data as byte array
 */
public static byte[] compress(byte[] src, ZstdDictCompress dict);

/**
 * ByteBuffer compression with pre-compiled dictionary
 * @param dstBuf destination buffer (must be direct)
 * @param srcBuf source buffer (must be direct)
 * @param dict pre-compiled compression dictionary
 * @return number of bytes written to destination
 */
public static int compress(ByteBuffer dstBuf, ByteBuffer srcBuf, ZstdDictCompress dict);

/**
 * ByteBuffer compression with dictionary, returns new buffer
 * @param srcBuf source buffer (must be direct)
 * @param dict pre-compiled compression dictionary
 * @return new direct ByteBuffer containing compressed data
 */
public static ByteBuffer compress(ByteBuffer srcBuf, ZstdDictCompress dict);

Usage Examples:

import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import java.nio.charset.StandardCharsets;

// Compile the dictionary once at level 6; `dictionary` is the byte[] trained above
try (ZstdDictCompress dict = new ZstdDictCompress(dictionary, 6)) {
    // Compress multiple pieces of data without re-processing the dictionary each time
    byte[] data1 = "The quick brown fox".getBytes(StandardCharsets.UTF_8);
    byte[] data2 = "The lazy dog sleeps".getBytes(StandardCharsets.UTF_8);

    byte[] compressed1 = Zstd.compress(data1, dict);
    byte[] compressed2 = Zstd.compress(data2, dict);
}
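
The ByteBuffer overloads listed above require direct buffers, which is easy to get wrong when the data starts out in a heap array. The sketch below shows one way to do it, assuming the returned buffers are positioned at the start of their data and using a raw-content dictionary for brevity. `DirectBufferDictSketch`, `toDirect`, and `roundTrip` are hypothetical names introduced for illustration.

```java
import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import com.github.luben.zstd.ZstdDictDecompress;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative helper class; not part of zstd-jni.
class DirectBufferDictSketch {
    // Copies a heap array into a direct buffer that is ready for reading.
    public static ByteBuffer toDirect(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        buf.flip();
        return buf;
    }

    // Compresses and decompresses through direct buffers with pre-compiled dictionaries.
    public static byte[] roundTrip(byte[] data, byte[] rawDict) {
        try (ZstdDictCompress cdict = new ZstdDictCompress(rawDict, 6);
             ZstdDictDecompress ddict = new ZstdDictDecompress(rawDict)) {
            ByteBuffer compressed = Zstd.compress(toDirect(data), cdict);
            ByteBuffer restored = Zstd.decompress(compressed, ddict, data.length);
            // Assumption: the returned buffer starts at position 0 with the data readable.
            byte[] out = new byte[data.length];
            restored.get(out);
            return out;
        }
    }

    public static void main(String[] args) {
        byte[] dict = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] data = "The lazy dog sleeps".getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + Arrays.equals(data, roundTrip(data, dict)));
    }
}
```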

Dictionary Decompression with Byte Arrays

Decompress data that was compressed with a dictionary, supplying the same dictionary as a raw byte array.

/**
 * Decompresses data using a byte array dictionary
 * @param src compressed data
 * @param dict dictionary data (same as used for compression)
 * @param originalSize size of original uncompressed data
 * @return decompressed data as byte array
 */
public static byte[] decompress(byte[] src, byte[] dict, int originalSize);

/**
 * Decompresses source into destination buffer using dictionary
 * @param dst destination buffer (must be sized to original size)
 * @param src compressed data
 * @param dict dictionary data
 * @return number of bytes written to dst, or an error code (check with Zstd.isError)
 */
public static long decompress(byte[] dst, byte[] src, byte[] dict);

/**
 * ByteBuffer decompression with byte array dictionary
 * @param dstBuf destination buffer (must be direct)
 * @param srcBuf source buffer (must be direct)
 * @param dict dictionary data
 * @return number of bytes written to destination
 */
public static int decompress(ByteBuffer dstBuf, ByteBuffer srcBuf, byte[] dict);

/**
 * ByteBuffer decompression with dictionary, returns new buffer
 * @param srcBuf source buffer (must be direct)
 * @param dict dictionary data
 * @param originalSize size of original uncompressed data
 * @return new direct ByteBuffer containing decompressed data
 */
public static ByteBuffer decompress(ByteBuffer srcBuf, byte[] dict, int originalSize);
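
Decompression with a byte array dictionary mirrors the compression side: the caller supplies the original size and, for the destination-buffer overload, checks the returned code. A minimal sketch under the same assumptions as before (raw-content dictionary, illustrative class and method names that are not part of zstd-jni):

```java
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative helper class; not part of zstd-jni.
class DictDecompressSketch {
    // Decompresses into a buffer of the known original size and validates the result.
    public static byte[] decompressWithDict(byte[] compressed, byte[] dict, int originalSize) {
        byte[] dst = new byte[originalSize];
        long written = Zstd.decompress(dst, compressed, dict);
        if (Zstd.isError(written)) {
            throw new IllegalStateException("Decompression failed: " + Zstd.getErrorName(written));
        }
        if (written != originalSize) {
            throw new IllegalStateException("Expected " + originalSize + " bytes, got " + written);
        }
        return dst;
    }

    public static void main(String[] args) {
        // Assumption: untrained bytes used as a raw-content dictionary.
        byte[] dict = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] data = "The quick brown fox runs fast".getBytes(StandardCharsets.UTF_8);

        byte[] compressed = Zstd.compressUsingDict(data, dict, 6);
        byte[] restored = decompressWithDict(compressed, dict, data.length);
        System.out.println("round trip ok: " + Arrays.equals(data, restored));
    }
}
```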

Dictionary Decompression with Dictionary Objects

Use pre-compiled decompression dictionary objects for better performance when decompressing many items with the same dictionary.

/**
 * Decompresses data using a pre-compiled decompression dictionary
 * @param src compressed data
 * @param dict pre-compiled decompression dictionary
 * @param originalSize size of original uncompressed data
 * @return decompressed data as byte array
 */
public static byte[] decompress(byte[] src, ZstdDictDecompress dict, int originalSize);

/**
 * ByteBuffer decompression with pre-compiled dictionary
 * @param dstBuf destination buffer (must be direct)
 * @param srcBuf source buffer (must be direct) 
 * @param dict pre-compiled decompression dictionary
 * @return number of bytes written to destination
 */
public static int decompress(ByteBuffer dstBuf, ByteBuffer srcBuf, ZstdDictDecompress dict);

/**
 * ByteBuffer decompression with dictionary, returns new buffer
 * @param srcBuf source buffer (must be direct)
 * @param dict pre-compiled decompression dictionary
 * @param originalSize size of original uncompressed data
 * @return new direct ByteBuffer containing decompressed data
 */
public static ByteBuffer decompress(ByteBuffer srcBuf, ZstdDictDecompress dict, int originalSize);

Usage Examples:

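import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictDecompress;

// Create a pre-compiled decompression dictionary from the same bytes used for compression
try (ZstdDictDecompress dict = new ZstdDictDecompress(dictionary)) {
    // compressed1/compressed2 were produced with the matching compression dictionary;
    // originalSize1/originalSize2 are the uncompressed lengths, which the caller must track
    byte[] decompressed1 = Zstd.decompress(compressed1, dict, originalSize1);
    byte[] decompressed2 = Zstd.decompress(compressed2, dict, originalSize2);
}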
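
Because these decompression methods take the original size as a parameter, the caller needs some way to carry it alongside the compressed bytes. One possible approach, sketched here as an assumption rather than anything zstd-jni prescribes, is a 4-byte length prefix. `SizePrefixedDictCodec` and its methods are hypothetical names, and a raw-content dictionary is used for brevity.

```java
import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import com.github.luben.zstd.ZstdDictDecompress;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative framing: [4-byte original length][zstd frame]. Not a zstd-jni format.
class SizePrefixedDictCodec {
    public static byte[] encode(byte[] data, ZstdDictCompress cdict) {
        byte[] compressed = Zstd.compress(data, cdict);
        return ByteBuffer.allocate(4 + compressed.length)
                .putInt(data.length)
                .put(compressed)
                .array();
    }

    public static byte[] decode(byte[] framed, ZstdDictDecompress ddict) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        int originalSize = buf.getInt();
        byte[] compressed = new byte[buf.remaining()];
        buf.get(compressed);
        return Zstd.decompress(compressed, ddict, originalSize);
    }

    // Convenience for trying the codec end to end with one dictionary.
    public static byte[] roundTrip(byte[] data, byte[] rawDict) {
        try (ZstdDictCompress cdict = new ZstdDictCompress(rawDict, 3);
             ZstdDictDecompress ddict = new ZstdDictDecompress(rawDict)) {
            return decode(encode(data, cdict), ddict);
        }
    }

    public static void main(String[] args) {
        byte[] dict = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] data = "The lazy dog sleeps".getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + Arrays.equals(data, roundTrip(data, dict)));
    }
}
```

The prefix adds four bytes per item, which is noticeable for very small payloads; whether that trade-off is acceptable depends on the workload.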

Dictionary Training

Create optimized dictionaries from sample data.

/**
 * Creates a dictionary from sample data
 * @param samples array of sample byte arrays representing typical data
 * @param dictBuffer buffer to store the created dictionary
 * @return size of dictionary written to buffer, or an error code (check with Zstd.isError)
 */
public static long trainFromBuffer(byte[][] samples, byte[] dictBuffer);

Usage Examples:

import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Collect sample data representative of what you'll compress.
// Three samples are shown for brevity; real training needs many more
// (typically hundreds or more) and is likely to fail on a handful like this.
List<String> sampleTexts = Arrays.asList(
    "Sample text with common patterns",
    "Another sample with similar patterns",
    "More sample text following the same structure"
);

byte[][] samples = sampleTexts.stream()
    .map(s -> s.getBytes(StandardCharsets.UTF_8))
    .toArray(byte[][]::new);

// Train dictionary (size should be much smaller than total sample size)
byte[] dictBuffer = new byte[4096]; // 4KB dictionary
long dictSize = Zstd.trainFromBuffer(samples, dictBuffer);

if (Zstd.isError(dictSize)) {
    throw new RuntimeException("Dictionary training failed: " + Zstd.getErrorName(dictSize));
}

// Trim dictionary to actual size
byte[] dictionary = Arrays.copyOf(dictBuffer, (int) dictSize);
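
Training generally needs a sample set much larger than the dictionary itself. The sketch below generates a few thousand synthetic JSON-like records so that training has enough material, then checks that a new record round-trips with the trained dictionary. The record layout, field values, sample count, and the `DictTrainingSketch` class are all assumptions made for illustration; real samples should come from the actual workload.

```java
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Illustrative helper class; not part of zstd-jni.
class DictTrainingSketch {
    // Builds synthetic records that share structure but differ in values.
    public static byte[][] makeSamples(int count) {
        Random rnd = new Random(42); // fixed seed keeps the samples reproducible
        String[] statuses = {"active", "suspended", "pending"};
        String[] regions = {"eu-west-1", "us-east-1", "ap-south-1"};
        byte[][] samples = new byte[count][];
        for (int i = 0; i < count; i++) {
            String record = "{\"id\":" + i
                    + ",\"user\":\"user" + rnd.nextInt(100_000)
                    + "\",\"status\":\"" + statuses[rnd.nextInt(statuses.length)]
                    + "\",\"region\":\"" + regions[rnd.nextInt(regions.length)] + "\"}";
            samples[i] = record.getBytes(StandardCharsets.UTF_8);
        }
        return samples;
    }

    // Trains a dictionary of at most maxDictSize bytes and trims it to the size written.
    public static byte[] train(byte[][] samples, int maxDictSize) {
        byte[] buffer = new byte[maxDictSize];
        long size = Zstd.trainFromBuffer(samples, buffer);
        if (Zstd.isError(size)) {
            throw new IllegalStateException("Training failed: " + Zstd.getErrorName(size));
        }
        return Arrays.copyOf(buffer, (int) size);
    }

    public static void main(String[] args) {
        byte[] dictionary = train(makeSamples(2000), 4096);
        byte[] record = "{\"id\":7,\"user\":\"user7\",\"status\":\"active\",\"region\":\"eu-west-1\"}"
                .getBytes(StandardCharsets.UTF_8);

        byte[] compressed = Zstd.compressUsingDict(record, dictionary, 3);
        byte[] restored = Zstd.decompress(compressed, dictionary, record.length);
        System.out.println("dictionary bytes: " + dictionary.length);
        System.out.println("round trip ok: " + Arrays.equals(record, restored));
    }
}
```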

Dictionary Objects

/**
 * Pre-compiled compression dictionary for efficient reuse
 */
class ZstdDictCompress implements Closeable {
    /**
     * Creates compression dictionary from byte array
     * @param dict dictionary data
     * @param level compression level to compile into dictionary
     */
    public ZstdDictCompress(byte[] dict, int level);
    
    /**
     * Creates compression dictionary from byte array segment
     * @param dict dictionary data buffer
     * @param offset offset in buffer
     * @param length number of bytes to use
     * @param level compression level to compile into dictionary
     */
    public ZstdDictCompress(byte[] dict, int offset, int length, int level);
    
    /**
     * Releases native dictionary resources
     */
    public void close();
}

/**
 * Pre-compiled decompression dictionary for efficient reuse
 */
class ZstdDictDecompress implements Closeable {
    /**
     * Creates decompression dictionary from byte array
     * @param dict dictionary data
     */
    public ZstdDictDecompress(byte[] dict);
    
    /**
     * Creates decompression dictionary from byte array segment  
     * @param dict dictionary data buffer
     * @param offset offset in buffer
     * @param length number of bytes to use
     */
    public ZstdDictDecompress(byte[] dict, int offset, int length);
    
    /**
     * Releases native dictionary resources
     */
    public void close();
}
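
The offset/length constructors are useful when the dictionary bytes sit inside a larger array, for example after a file header. A minimal sketch, assuming a made-up 16-byte header followed by a raw-content dictionary; `DictSegmentSketch` and its method are hypothetical names:

```java
import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import com.github.luben.zstd.ZstdDictDecompress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative helper class; not part of zstd-jni.
class DictSegmentSketch {
    // Builds both dictionary objects from the same slice of a container array.
    public static byte[] roundTripWithSegment(byte[] container, int offset, int length, byte[] data) {
        try (ZstdDictCompress cdict = new ZstdDictCompress(container, offset, length, 3);
             ZstdDictDecompress ddict = new ZstdDictDecompress(container, offset, length)) {
            byte[] compressed = Zstd.compress(data, cdict);
            return Zstd.decompress(compressed, ddict, data.length);
        }
    }

    public static void main(String[] args) {
        byte[] dictBytes = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        int headerLength = 16; // hypothetical header that precedes the dictionary
        byte[] container = new byte[headerLength + dictBytes.length];
        System.arraycopy(dictBytes, 0, container, headerLength, dictBytes.length);

        byte[] data = "The quick brown fox runs fast".getBytes(StandardCharsets.UTF_8);
        byte[] restored = roundTripWithSegment(container, headerLength, dictBytes.length, data);
        System.out.println("round trip ok: " + Arrays.equals(data, restored));
    }
}
```

Using the segment constructors avoids copying the dictionary into a separate array before compiling it.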

Performance Tips

  • Dictionary reuse: Use pre-compiled dictionary objects (ZstdDictCompress/ZstdDictDecompress) when compressing or decompressing many items with the same dictionary
  • Dictionary size: A dictionary of about 100KB or less is sufficient for most use cases
  • Training data: Use many representative samples that closely match your actual compression workload
  • Memory management: Always close dictionary objects (for example with try-with-resources) to free native memory
  • Compression improvement: Dictionaries work best on small data items that share repeated patterns or a similar structure
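
The effect of a dictionary on small, similarly structured data can be checked by comparing compressed sizes with and without it. The sketch below does this for a single short record, assuming a raw-content dictionary that contains a record of the same shape. The record contents and the `DictSizeComparison` class are illustrative, and actual savings depend on the data.

```java
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;

// Illustrative helper class; not part of zstd-jni.
class DictSizeComparison {
    public static int sizeWithoutDict(byte[] data, int level) {
        return Zstd.compress(data, level).length;
    }

    public static int sizeWithDict(byte[] data, byte[] dict, int level) {
        return Zstd.compressUsingDict(data, dict, level).length;
    }

    public static void main(String[] args) {
        // Assumption: the dictionary is an untrained record with the same structure as the data.
        byte[] dict = ("{\"id\":0,\"user\":\"user0\",\"status\":\"active\","
                + "\"region\":\"eu-west-1\",\"plan\":\"standard\"}").getBytes(StandardCharsets.UTF_8);
        byte[] data = ("{\"id\":1042,\"user\":\"user1042\",\"status\":\"active\","
                + "\"region\":\"eu-west-1\",\"plan\":\"standard\"}").getBytes(StandardCharsets.UTF_8);

        System.out.println("original bytes:     " + data.length);
        System.out.println("without dictionary: " + sizeWithoutDict(data, 3));
        System.out.println("with dictionary:    " + sizeWithDict(data, dict, 3));
    }
}
```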

Install with Tessl CLI

npx tessl i tessl/maven-com-github-luben--zstd-jni
