CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/maven-it-unimi-dsi--fastutil

High-performance type-specific collections extending the Java Collections Framework with memory-efficient primitive containers and big data structures.

Pending
Overview
Eval results
Files

big-data-structures.mddocs/

Big Data Structures

Collections with 64-bit indices supporting datasets larger than 2^31 elements. These structures enable handling of massive datasets that exceed the limitations of standard Java collections, which are constrained by 32-bit integer indices.

Overview

Big data structures in FastUtil are designed to handle collections with more than Integer.MAX_VALUE (2,147,483,647) elements. They use 64-bit long indices and sizes, enabling collections that can theoretically contain up to 2^63 elements, limited primarily by available memory.

Key features:

  • 64-bit Indexing: Uses long indices instead of int
  • Segmented Storage: Internal array-of-arrays structure for efficient memory management
  • Compatible Interfaces: Extend standard collection interfaces where possible
  • Specialized Operations: Optimized algorithms for big data operations

Capabilities

Size64 Interface

Foundation interface providing 64-bit size information for collections that may exceed Integer.MAX_VALUE elements.

/**
 * Interface for collections supporting 64-bit sizes
 */
public interface Size64 {
    /**
     * Get the actual size as a long value
     * @return the number of elements as long
     */
    long size64();
    
    /**
     * Get size capped at Integer.MAX_VALUE (deprecated for big collections)
     * @return size capped at Integer.MAX_VALUE
     * @deprecated Use size64() for accurate size information
     */
    @Deprecated
    default int size() {
        return (int) Math.min(size64(), Integer.MAX_VALUE);
    }
    
    /**
     * Get size as long from any collection
     * @param c the collection
     * @return the size as long
     */
    static long sizeOf(Collection<?> c) {
        return c instanceof Size64 ? ((Size64) c).size64() : c.size();
    }
    
    /**
     * Get size as long from any map
     * @param m the map
     * @return the size as long
     */
    static long sizeOf(Map<?,?> m) {
        return m instanceof Size64 ? ((Size64) m).size64() : m.size();
    }
}

Usage Examples:

import it.unimi.dsi.fastutil.Size64;
import it.unimi.dsi.fastutil.objects.*;

// Working with potentially big collections
ObjectBigList<String> bigList = new ObjectBigArrayBigList<>();
// ... add billions of elements ...

// Safe size checking
long actualSize = bigList.size64();
if (actualSize > Integer.MAX_VALUE) {
    System.out.println("Collection has " + actualSize + " elements (exceeds int range)");
} else {
    System.out.println("Collection has " + actualSize + " elements");
}

// Generic size checking
long size = Size64.sizeOf(someCollection); // Works with any collection

BigList Interface

List interface supporting 64-bit indices, enabling lists with more than 2^31 elements.

/**
 * List with 64-bit indices supporting massive datasets
 * @param <K> the type of elements
 */
public interface BigList<K> extends Collection<K>, Size64 {
    /**
     * Get element at 64-bit index
     * @param index the 64-bit index
     * @return the element at index
     * @throws IndexOutOfBoundsException if index is invalid
     */
    K get(long index);
    
    /**
     * Set element at 64-bit index
     * @param index the 64-bit index
     * @param element the element to set
     * @return the previous element at index
     * @throws IndexOutOfBoundsException if index is invalid
     */
    K set(long index, K element);
    
    /**
     * Insert element at 64-bit index
     * @param index the 64-bit index
     * @param element the element to insert
     * @throws IndexOutOfBoundsException if index is invalid
     */
    void add(long index, K element);
    
    /**
     * Remove element at 64-bit index
     * @param index the 64-bit index
     * @return the removed element
     * @throws IndexOutOfBoundsException if index is invalid
     */
    K remove(long index);
    
    /**
     * Find first occurrence of element (returns long index)
     * @param o the element to find
     * @return the 64-bit index, or -1 if not found
     */
    long indexOf(Object o);
    
    /**
     * Find last occurrence of element (returns long index)
     * @param o the element to find
     * @return the 64-bit index, or -1 if not found
     */
    long lastIndexOf(Object o);
    
    /**
     * Set the size of the list (truncate or extend with nulls)
     * @param size the new size
     */
    void size(long size);
    
    /**
     * Get big list iterator
     * @return BigListIterator for this list
     */
    BigListIterator<K> listIterator();
    
    /**
     * Get big list iterator starting at 64-bit index
     * @param index the starting 64-bit index
     * @return BigListIterator starting at index
     */
    BigListIterator<K> listIterator(long index);
    
    /**
     * Get sublist view with 64-bit indices
     * @param from start index (inclusive)
     * @param to end index (exclusive)
     * @return BigList view of the specified range
     */
    BigList<K> subList(long from, long to);
    
    /**
     * Get elements as array (limited by array size constraints)
     * @param a the array to fill (if large enough)
     * @return array containing the elements
     */
    <T> T[] toArray(T[] a);
}

Usage Examples:

import it.unimi.dsi.fastutil.objects.*;

// Create a big list that can grow beyond Integer.MAX_VALUE
ObjectBigList<String> bigData = new ObjectBigArrayBigList<>();

// Add billions of elements (example with smaller numbers for demo)
for (long i = 0; i < 3_000_000_000L; i++) {
    bigData.add("Element " + i);
    if (i % 100_000_000 == 0) {
        System.out.println("Added " + i + " elements, size: " + bigData.size64());
    }
}

// Access elements using 64-bit indices
String element = bigData.get(2_500_000_000L);
bigData.set(2_500_000_000L, "Updated element");

// Find elements (returns long index)
long index = bigData.indexOf("Element 1000000000");
if (index >= 0) {
    System.out.println("Found element at index: " + index);
}

// Sublist operations with big indices
BigList<String> subset = bigData.subList(1_000_000_000L, 1_000_001_000L);
System.out.println("Subset size: " + subset.size64());

BigListIterator Interface

Bidirectional iterator for BigList supporting 64-bit index navigation.

/**
 * Bidirectional iterator for BigList with 64-bit indices
 * @param <K> the type of elements
 */
public interface BigListIterator<K> extends BidirectionalIterator<K> {
    /**
     * Get next 64-bit index
     * @return the next index as long
     */
    long nextIndex();
    
    /**
     * Get previous 64-bit index
     * @return the previous index as long
     */
    long previousIndex();
    
    /**
     * Add element at current position
     * @param k the element to add
     */
    void add(K k);
    
    /**
     * Set element at current position
     * @param k the element to set
     */
    void set(K k);
    
    /**
     * Skip forward efficiently
     * @param n number of elements to skip
     * @return number of elements actually skipped
     */
    default long skip(long n) {
        long i = 0;
        while (i < n && hasNext()) {
            next();
            i++;
        }
        return i;
    }
    
    /**
     * Skip backward efficiently
     * @param n number of elements to skip backward
     * @return number of elements actually skipped
     */
    default long back(long n) {
        long i = 0;
        while (i < n && hasPrevious()) {
            previous();
            i++;
        }
        return i;
    }
}

BigArrays Utility Interface

Constants and utilities for managing segmented big arrays that support 64-bit indexing.

/**
 * Utilities for big arrays (array-of-arrays supporting 64-bit indices)
 */
public interface BigArrays {
    /** Size of each segment in big arrays (2^27 = 134,217,728) */
    long SEGMENT_SIZE = 1L << 27;
    
    /** Shift amount for segment calculations */
    int SEGMENT_SHIFT = 27;
    
    /** Mask for offset within segment */
    int SEGMENT_MASK = (1 << 27) - 1;
    
    /**
     * Calculate segment index from 64-bit index
     * @param index the 64-bit index
     * @return the segment index
     */
    static int segment(long index) {
        return (int) (index >>> SEGMENT_SHIFT);
    }
    
    /**
     * Calculate displacement within segment from 64-bit index
     * @param index the 64-bit index
     * @return the offset within segment
     */
    static int displacement(long index) {
        return (int) (index & SEGMENT_MASK);
    }
    
    /**
     * Calculate 64-bit index from segment and displacement
     * @param segment the segment index
     * @param displacement the offset within segment
     * @return the combined 64-bit index
     */
    static long index(int segment, int displacement) {
        return ((long) segment << SEGMENT_SHIFT) + displacement;
    }
}

Usage Examples:

import it.unimi.dsi.fastutil.BigArrays;

// Understanding big array structure
long bigIndex = 200_000_000_000L; // 200 billion

int segment = BigArrays.segment(bigIndex);           // Which segment
int displacement = BigArrays.displacement(bigIndex); // Offset within segment

System.out.println("Index " + bigIndex + " is in segment " + segment + 
                   " at displacement " + displacement);

// Reconstruct index
long reconstructed = BigArrays.index(segment, displacement);
assert reconstructed == bigIndex;

BigSwapper Interface

Functional interface for swapping elements at 64-bit positions, used by big sorting algorithms.

/**
 * Functional interface for swapping elements at 64-bit positions
 */
@FunctionalInterface
public interface BigSwapper {
    /**
     * Swap elements at 64-bit positions a and b
     * @param a first 64-bit position
     * @param b second 64-bit position
     */
    void swap(long a, long b);
}

Big Array Operations

Static utility methods for operations on big arrays (arrays-of-arrays supporting 64-bit indexing).

/**
 * Utility methods for big array operations
 */
public class ObjectBigArrays {
    /**
     * Create a new big array with specified length
     * @param length the 64-bit length
     * @return new big array
     */
    public static <K> K[][] newBigArray(long length);
    
    /**
     * Get element from big array at 64-bit index
     * @param array the big array
     * @param index the 64-bit index
     * @return the element at index
     */
    public static <K> K get(K[][] array, long index);
    
    /**
     * Set element in big array at 64-bit index
     * @param array the big array
     * @param index the 64-bit index
     * @param value the value to set
     */
    public static <K> void set(K[][] array, long index, K value);
    
    /**
     * Get the length of a big array
     * @param array the big array
     * @return the 64-bit length
     */
    public static <K> long length(K[][] array);
    
    /**
     * Copy elements within big array
     * @param array the big array
     * @param srcPos source 64-bit position
     * @param destPos destination 64-bit position
     * @param length number of elements to copy
     */
    public static <K> void copy(K[][] array, long srcPos, long destPos, long length);
    
    /**
     * Fill range of big array with value
     * @param array the big array
     * @param from start 64-bit index (inclusive)
     * @param to end 64-bit index (exclusive)
     * @param value the value to fill
     */
    public static <K> void fill(K[][] array, long from, long to, K value);
    
    /**
     * Quick sort big array range
     * @param array the big array to sort
     * @param from start 64-bit index (inclusive)
     * @param to end 64-bit index (exclusive)
     * @param comp comparator for elements
     */
    public static <K> void quickSort(K[][] array, long from, long to, Comparator<K> comp);
}

// Similar utility classes exist for primitive types:
// IntBigArrays, LongBigArrays, DoubleBigArrays, etc.

Usage Examples:

import it.unimi.dsi.fastutil.objects.*;

// Create big array for massive dataset
String[][] bigArray = ObjectBigArrays.newBigArray(5_000_000_000L);

// Set and get elements using 64-bit indices
ObjectBigArrays.set(bigArray, 3_000_000_000L, "Huge index element");
String value = ObjectBigArrays.get(bigArray, 3_000_000_000L);

// Fill a range with default values
ObjectBigArrays.fill(bigArray, 1_000_000_000L, 1_000_001_000L, "default");

// Sort a portion of the big array
ObjectBigArrays.quickSort(bigArray, 0, 1_000_000, String::compareTo);

// Get total length
long totalElements = ObjectBigArrays.length(bigArray);
System.out.println("Big array contains " + totalElements + " elements");

Implementation Classes

Big List Implementations

ObjectBigArrayBigList<K>

  • Resizable big list backed by segmented arrays
  • Supports random access and efficient append operations
  • Memory usage scales smoothly with size

Type-Specific Big Lists

  • IntBigArrayBigList, LongBigArrayBigList, etc. for primitive types
  • Avoid boxing overhead while providing 64-bit indexing

Performance Characteristics

  • Memory Efficiency: Segmented storage avoids huge contiguous allocations
  • Access Time: O(1) random access using segment calculation
  • Insertion/Deletion: O(n) in worst case, but efficient for end operations
  • Iteration: Optimized iterators with efficient skip operations

Usage Guidelines

  1. When to Use Big Collections:

    • Collections expected to exceed 2^31 elements
    • Processing massive datasets from databases or files
    • Scenarios where 32-bit indices are insufficient
  2. Memory Considerations:

    • Each segment can be garbage collected independently
    • More efficient than attempting huge single arrays
    • Monitor memory usage and consider processing in chunks
  3. Performance Tips:

    • Use bulk operations when possible
    • Prefer sequential access patterns for better cache performance
    • Consider parallel processing for large ranges

Install with Tessl CLI

npx tessl i tessl/maven-it-unimi-dsi--fastutil

docs

big-data-structures.md

core-utilities.md

index.md

io-streams.md

priority-queues-stacks.md

type-specific-collections.md

tile.json