0
# Big Data Structures
1
2
Collections with 64-bit indices supporting datasets larger than 2^31 elements. These structures enable handling of massive datasets that exceed the limitations of standard Java collections, which are constrained by 32-bit integer indices.
3
4
## Overview
5
6
Big data structures in FastUtil are designed to handle collections with more than `Integer.MAX_VALUE` (2,147,483,647) elements. They use 64-bit `long` indices and sizes, enabling collections that can theoretically contain up to 2^63 elements, limited primarily by available memory.
7
8
Key features:
9
- **64-bit Indexing**: Uses `long` indices instead of `int`
10
- **Segmented Storage**: Internal array-of-arrays structure for efficient memory management
11
- **Compatible Interfaces**: Extend standard collection interfaces where possible
12
- **Specialized Operations**: Optimized algorithms for big data operations
13
14
## Capabilities
15
16
### Size64 Interface
17
18
Foundation interface providing 64-bit size information for collections that may exceed `Integer.MAX_VALUE` elements.
19
20
```java { .api }
21
/**
22
* Interface for collections supporting 64-bit sizes
23
*/
24
public interface Size64 {
25
/**
26
* Get the actual size as a long value
27
* @return the number of elements as long
28
*/
29
long size64();
30
31
/**
32
* Get size capped at Integer.MAX_VALUE (deprecated for big collections)
33
* @return size capped at Integer.MAX_VALUE
34
* @deprecated Use size64() for accurate size information
35
*/
36
@Deprecated
37
default int size() {
38
return (int) Math.min(size64(), Integer.MAX_VALUE);
39
}
40
41
/**
42
* Get size as long from any collection
43
* @param c the collection
44
* @return the size as long
45
*/
46
static long sizeOf(Collection<?> c) {
47
return c instanceof Size64 ? ((Size64) c).size64() : c.size();
48
}
49
50
/**
51
* Get size as long from any map
52
* @param m the map
53
* @return the size as long
54
*/
55
static long sizeOf(Map<?,?> m) {
56
return m instanceof Size64 ? ((Size64) m).size64() : m.size();
57
}
58
}
59
```
60
61
**Usage Examples:**
62
63
```java
64
import it.unimi.dsi.fastutil.Size64;
65
import it.unimi.dsi.fastutil.objects.*;
66
67
// Working with potentially big collections
68
ObjectBigList<String> bigList = new ObjectBigArrayBigList<>();
69
// ... add billions of elements ...
70
71
// Safe size checking
72
long actualSize = bigList.size64();
73
if (actualSize > Integer.MAX_VALUE) {
74
System.out.println("Collection has " + actualSize + " elements (exceeds int range)");
75
} else {
76
System.out.println("Collection has " + actualSize + " elements");
77
}
78
79
// Generic size checking
80
long size = Size64.sizeOf(someCollection); // Works with any collection
81
```
82
83
### BigList Interface
84
85
List interface supporting 64-bit indices, enabling lists with more than 2^31 elements.
86
87
```java { .api }
88
/**
89
* List with 64-bit indices supporting massive datasets
90
* @param <K> the type of elements
91
*/
92
public interface BigList<K> extends Collection<K>, Size64 {
93
/**
94
* Get element at 64-bit index
95
* @param index the 64-bit index
96
* @return the element at index
97
* @throws IndexOutOfBoundsException if index is invalid
98
*/
99
K get(long index);
100
101
/**
102
* Set element at 64-bit index
103
* @param index the 64-bit index
104
* @param element the element to set
105
* @return the previous element at index
106
* @throws IndexOutOfBoundsException if index is invalid
107
*/
108
K set(long index, K element);
109
110
/**
111
* Insert element at 64-bit index
112
* @param index the 64-bit index
113
* @param element the element to insert
114
* @throws IndexOutOfBoundsException if index is invalid
115
*/
116
void add(long index, K element);
117
118
/**
119
* Remove element at 64-bit index
120
* @param index the 64-bit index
121
* @return the removed element
122
* @throws IndexOutOfBoundsException if index is invalid
123
*/
124
K remove(long index);
125
126
/**
127
* Find first occurrence of element (returns long index)
128
* @param o the element to find
129
* @return the 64-bit index, or -1 if not found
130
*/
131
long indexOf(Object o);
132
133
/**
134
* Find last occurrence of element (returns long index)
135
* @param o the element to find
136
* @return the 64-bit index, or -1 if not found
137
*/
138
long lastIndexOf(Object o);
139
140
/**
141
* Set the size of the list (truncate or extend with nulls)
142
* @param size the new size
143
*/
144
void size(long size);
145
146
/**
147
* Get big list iterator
148
* @return BigListIterator for this list
149
*/
150
BigListIterator<K> listIterator();
151
152
/**
153
* Get big list iterator starting at 64-bit index
154
* @param index the starting 64-bit index
155
* @return BigListIterator starting at index
156
*/
157
BigListIterator<K> listIterator(long index);
158
159
/**
160
* Get sublist view with 64-bit indices
161
* @param from start index (inclusive)
162
* @param to end index (exclusive)
163
* @return BigList view of the specified range
164
*/
165
BigList<K> subList(long from, long to);
166
167
/**
168
* Get elements as array (limited by array size constraints)
169
* @param a the array to fill (if large enough)
170
* @return array containing the elements
171
*/
172
<T> T[] toArray(T[] a);
173
}
174
```
175
176
**Usage Examples:**
177
178
```java
179
import it.unimi.dsi.fastutil.objects.*;
180
181
// Create a big list that can grow beyond Integer.MAX_VALUE
182
ObjectBigList<String> bigData = new ObjectBigArrayBigList<>();
183
184
// Add billions of elements (example with smaller numbers for demo)
185
for (long i = 0; i < 3_000_000_000L; i++) {
186
bigData.add("Element " + i);
187
if (i % 100_000_000 == 0) {
188
System.out.println("Added " + i + " elements, size: " + bigData.size64());
189
}
190
}
191
192
// Access elements using 64-bit indices
193
String element = bigData.get(2_500_000_000L);
194
bigData.set(2_500_000_000L, "Updated element");
195
196
// Find elements (returns long index)
197
long index = bigData.indexOf("Element 1000000000");
198
if (index >= 0) {
199
System.out.println("Found element at index: " + index);
200
}
201
202
// Sublist operations with big indices
203
BigList<String> subset = bigData.subList(1_000_000_000L, 1_000_001_000L);
204
System.out.println("Subset size: " + subset.size64());
205
```
206
207
### BigListIterator Interface
208
209
Bidirectional iterator for BigList supporting 64-bit index navigation.
210
211
```java { .api }
212
/**
213
* Bidirectional iterator for BigList with 64-bit indices
214
* @param <K> the type of elements
215
*/
216
public interface BigListIterator<K> extends BidirectionalIterator<K> {
217
/**
218
* Get next 64-bit index
219
* @return the next index as long
220
*/
221
long nextIndex();
222
223
/**
224
* Get previous 64-bit index
225
* @return the previous index as long
226
*/
227
long previousIndex();
228
229
/**
230
* Add element at current position
231
* @param k the element to add
232
*/
233
void add(K k);
234
235
/**
236
* Set element at current position
237
* @param k the element to set
238
*/
239
void set(K k);
240
241
/**
242
* Skip forward efficiently
243
* @param n number of elements to skip
244
* @return number of elements actually skipped
245
*/
246
default long skip(long n) {
247
long i = 0;
248
while (i < n && hasNext()) {
249
next();
250
i++;
251
}
252
return i;
253
}
254
255
/**
256
* Skip backward efficiently
257
* @param n number of elements to skip backward
258
* @return number of elements actually skipped
259
*/
260
default long back(long n) {
261
long i = 0;
262
while (i < n && hasPrevious()) {
263
previous();
264
i++;
265
}
266
return i;
267
}
268
}
269
```
270
271
### BigArrays Utility Interface
272
273
Constants and utilities for managing segmented big arrays that support 64-bit indexing.
274
275
```java { .api }
276
/**
277
* Utilities for big arrays (array-of-arrays supporting 64-bit indices)
278
*/
279
public interface BigArrays {
280
/** Size of each segment in big arrays (2^27 = 134,217,728) */
281
long SEGMENT_SIZE = 1L << 27;
282
283
/** Shift amount for segment calculations */
284
int SEGMENT_SHIFT = 27;
285
286
/** Mask for offset within segment */
287
int SEGMENT_MASK = (1 << 27) - 1;
288
289
/**
290
* Calculate segment index from 64-bit index
291
* @param index the 64-bit index
292
* @return the segment index
293
*/
294
static int segment(long index) {
295
return (int) (index >>> SEGMENT_SHIFT);
296
}
297
298
/**
299
* Calculate displacement within segment from 64-bit index
300
* @param index the 64-bit index
301
* @return the offset within segment
302
*/
303
static int displacement(long index) {
304
return (int) (index & SEGMENT_MASK);
305
}
306
307
/**
308
* Calculate 64-bit index from segment and displacement
309
* @param segment the segment index
310
* @param displacement the offset within segment
311
* @return the combined 64-bit index
312
*/
313
static long index(int segment, int displacement) {
314
return ((long) segment << SEGMENT_SHIFT) + displacement;
315
}
316
}
317
```
318
319
**Usage Examples:**
320
321
```java
322
import it.unimi.dsi.fastutil.BigArrays;
323
324
// Understanding big array structure
325
long bigIndex = 200_000_000_000L; // 200 billion
326
327
int segment = BigArrays.segment(bigIndex); // Which segment
328
int displacement = BigArrays.displacement(bigIndex); // Offset within segment
329
330
System.out.println("Index " + bigIndex + " is in segment " + segment +
331
" at displacement " + displacement);
332
333
// Reconstruct index
334
long reconstructed = BigArrays.index(segment, displacement);
335
assert reconstructed == bigIndex;
336
```
337
338
### BigSwapper Interface
339
340
Functional interface for swapping elements at 64-bit positions, used by big sorting algorithms.
341
342
```java { .api }
343
/**
344
* Functional interface for swapping elements at 64-bit positions
345
*/
346
@FunctionalInterface
347
public interface BigSwapper {
348
/**
349
* Swap elements at 64-bit positions a and b
350
* @param a first 64-bit position
351
* @param b second 64-bit position
352
*/
353
void swap(long a, long b);
354
}
355
```
356
357
### Big Array Operations
358
359
Static utility methods for operations on big arrays (arrays-of-arrays supporting 64-bit indexing).
360
361
```java { .api }
362
/**
363
* Utility methods for big array operations
364
*/
365
public class ObjectBigArrays {
366
/**
367
* Create a new big array with specified length
368
* @param length the 64-bit length
369
* @return new big array
370
*/
371
public static <K> K[][] newBigArray(long length);
372
373
/**
374
* Get element from big array at 64-bit index
375
* @param array the big array
376
* @param index the 64-bit index
377
* @return the element at index
378
*/
379
public static <K> K get(K[][] array, long index);
380
381
/**
382
* Set element in big array at 64-bit index
383
* @param array the big array
384
* @param index the 64-bit index
385
* @param value the value to set
386
*/
387
public static <K> void set(K[][] array, long index, K value);
388
389
/**
390
* Get the length of a big array
391
* @param array the big array
392
* @return the 64-bit length
393
*/
394
public static <K> long length(K[][] array);
395
396
/**
397
* Copy elements within big array
398
* @param array the big array
399
* @param srcPos source 64-bit position
400
* @param destPos destination 64-bit position
401
* @param length number of elements to copy
402
*/
403
public static <K> void copy(K[][] array, long srcPos, long destPos, long length);
404
405
/**
406
* Fill range of big array with value
407
* @param array the big array
408
* @param from start 64-bit index (inclusive)
409
* @param to end 64-bit index (exclusive)
410
* @param value the value to fill
411
*/
412
public static <K> void fill(K[][] array, long from, long to, K value);
413
414
/**
415
* Quick sort big array range
416
* @param array the big array to sort
417
* @param from start 64-bit index (inclusive)
418
* @param to end 64-bit index (exclusive)
419
* @param comp comparator for elements
420
*/
421
public static <K> void quickSort(K[][] array, long from, long to, Comparator<K> comp);
422
}
423
424
// Similar utility classes exist for primitive types:
425
// IntBigArrays, LongBigArrays, DoubleBigArrays, etc.
426
```
427
428
**Usage Examples:**
429
430
```java
431
import it.unimi.dsi.fastutil.objects.*;
432
433
// Create big array for massive dataset
434
String[][] bigArray = ObjectBigArrays.newBigArray(5_000_000_000L);
435
436
// Set and get elements using 64-bit indices
437
ObjectBigArrays.set(bigArray, 3_000_000_000L, "Huge index element");
438
String value = ObjectBigArrays.get(bigArray, 3_000_000_000L);
439
440
// Fill a range with default values
441
ObjectBigArrays.fill(bigArray, 1_000_000_000L, 1_000_001_000L, "default");
442
443
// Sort a portion of the big array
444
ObjectBigArrays.quickSort(bigArray, 0, 1_000_000, String::compareTo);
445
446
// Get total length
447
long totalElements = ObjectBigArrays.length(bigArray);
448
System.out.println("Big array contains " + totalElements + " elements");
449
```
450
451
## Implementation Classes
452
453
### Big List Implementations
454
455
**ObjectBigArrayBigList<K>**
456
- Resizable big list backed by segmented arrays
457
- Supports random access and efficient append operations
458
- Memory usage scales smoothly with size
459
460
**Type-Specific Big Lists**
461
- **`IntBigArrayBigList`**, **`LongBigArrayBigList`**, etc. for primitive types
462
- Avoid boxing overhead while providing 64-bit indexing
463
464
### Performance Characteristics
465
466
- **Memory Efficiency**: Segmented storage avoids huge contiguous allocations
467
- **Access Time**: O(1) random access using segment calculation
468
- **Insertion/Deletion**: O(n) in worst case, but efficient for end operations
469
- **Iteration**: Optimized iterators with efficient skip operations
470
471
### Usage Guidelines
472
473
1. **When to Use Big Collections**:
474
- Collections expected to exceed 2^31 elements
475
- Processing massive datasets from databases or files
476
- Scenarios where 32-bit indices are insufficient
477
478
2. **Memory Considerations**:
479
- Each segment can be garbage collected independently
480
- More efficient than attempting huge single arrays
481
- Monitor memory usage and consider processing in chunks
482
483
3. **Performance Tips**:
484
- Use bulk operations when possible
485
- Prefer sequential access patterns for better cache performance
486
- Consider parallel processing for large ranges