or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

big-data-structures.mdcore-utilities.mdindex.mdio-streams.mdpriority-queues-stacks.mdtype-specific-collections.md

big-data-structures.mddocs/

0

# Big Data Structures

1

2

Collections with 64-bit indices supporting datasets larger than 2^31 elements. These structures enable handling of massive datasets that exceed the limitations of standard Java collections, which are constrained by 32-bit integer indices.

3

4

## Overview

5

6

Big data structures in FastUtil are designed to handle collections with more than `Integer.MAX_VALUE` (2,147,483,647) elements. They use 64-bit `long` indices and sizes, enabling collections that can theoretically contain up to 2^63 elements, limited primarily by available memory.

7

8

Key features:

9

- **64-bit Indexing**: Uses `long` indices instead of `int`

10

- **Segmented Storage**: Internal array-of-arrays structure for efficient memory management

11

- **Compatible Interfaces**: Extend standard collection interfaces where possible

12

- **Specialized Operations**: Optimized algorithms for big data operations

13

14

## Capabilities

15

16

### Size64 Interface

17

18

Foundation interface providing 64-bit size information for collections that may exceed `Integer.MAX_VALUE` elements.

19

20

```java { .api }

21

/**

22

* Interface for collections supporting 64-bit sizes

23

*/

24

public interface Size64 {

25

/**

26

* Get the actual size as a long value

27

* @return the number of elements as long

28

*/

29

long size64();

30

31

/**

32

* Get size capped at Integer.MAX_VALUE (deprecated for big collections)

33

* @return size capped at Integer.MAX_VALUE

34

* @deprecated Use size64() for accurate size information

35

*/

36

@Deprecated

37

default int size() {

38

return (int) Math.min(size64(), Integer.MAX_VALUE);

39

}

40

41

/**

42

* Get size as long from any collection

43

* @param c the collection

44

* @return the size as long

45

*/

46

static long sizeOf(Collection<?> c) {

47

return c instanceof Size64 ? ((Size64) c).size64() : c.size();

48

}

49

50

/**

51

* Get size as long from any map

52

* @param m the map

53

* @return the size as long

54

*/

55

static long sizeOf(Map<?,?> m) {

56

return m instanceof Size64 ? ((Size64) m).size64() : m.size();

57

}

58

}

59

```

60

61

**Usage Examples:**

62

63

```java

64

import it.unimi.dsi.fastutil.Size64;

65

import it.unimi.dsi.fastutil.objects.*;

66

67

// Working with potentially big collections

68

ObjectBigList<String> bigList = new ObjectBigArrayBigList<>();

69

// ... add billions of elements ...

70

71

// Safe size checking

72

long actualSize = bigList.size64();

73

if (actualSize > Integer.MAX_VALUE) {

74

System.out.println("Collection has " + actualSize + " elements (exceeds int range)");

75

} else {

76

System.out.println("Collection has " + actualSize + " elements");

77

}

78

79

// Generic size checking

80

long size = Size64.sizeOf(someCollection); // Works with any collection

81

```

82

83

### BigList Interface

84

85

List interface supporting 64-bit indices, enabling lists with more than 2^31 elements.

86

87

```java { .api }

88

/**

89

* List with 64-bit indices supporting massive datasets

90

* @param <K> the type of elements

91

*/

92

public interface BigList<K> extends Collection<K>, Size64 {

93

/**

94

* Get element at 64-bit index

95

* @param index the 64-bit index

96

* @return the element at index

97

* @throws IndexOutOfBoundsException if index is invalid

98

*/

99

K get(long index);

100

101

/**

102

* Set element at 64-bit index

103

* @param index the 64-bit index

104

* @param element the element to set

105

* @return the previous element at index

106

* @throws IndexOutOfBoundsException if index is invalid

107

*/

108

K set(long index, K element);

109

110

/**

111

* Insert element at 64-bit index

112

* @param index the 64-bit index

113

* @param element the element to insert

114

* @throws IndexOutOfBoundsException if index is invalid

115

*/

116

void add(long index, K element);

117

118

/**

119

* Remove element at 64-bit index

120

* @param index the 64-bit index

121

* @return the removed element

122

* @throws IndexOutOfBoundsException if index is invalid

123

*/

124

K remove(long index);

125

126

/**

127

* Find first occurrence of element (returns long index)

128

* @param o the element to find

129

* @return the 64-bit index, or -1 if not found

130

*/

131

long indexOf(Object o);

132

133

/**

134

* Find last occurrence of element (returns long index)

135

* @param o the element to find

136

* @return the 64-bit index, or -1 if not found

137

*/

138

long lastIndexOf(Object o);

139

140

/**

141

* Set the size of the list (truncate or extend with nulls)

142

* @param size the new size

143

*/

144

void size(long size);

145

146

/**

147

* Get big list iterator

148

* @return BigListIterator for this list

149

*/

150

BigListIterator<K> listIterator();

151

152

/**

153

* Get big list iterator starting at 64-bit index

154

* @param index the starting 64-bit index

155

* @return BigListIterator starting at index

156

*/

157

BigListIterator<K> listIterator(long index);

158

159

/**

160

* Get sublist view with 64-bit indices

161

* @param from start index (inclusive)

162

* @param to end index (exclusive)

163

* @return BigList view of the specified range

164

*/

165

BigList<K> subList(long from, long to);

166

167

/**

168

* Get elements as array (limited by array size constraints)

169

* @param a the array to fill (if large enough)

170

* @return array containing the elements

171

*/

172

<T> T[] toArray(T[] a);

173

}

174

```

175

176

**Usage Examples:**

177

178

```java

179

import it.unimi.dsi.fastutil.objects.*;

180

181

// Create a big list that can grow beyond Integer.MAX_VALUE

182

ObjectBigList<String> bigData = new ObjectBigArrayBigList<>();

183

184

// Add billions of elements (example with smaller numbers for demo)

185

for (long i = 0; i < 3_000_000_000L; i++) {

186

bigData.add("Element " + i);

187

if (i % 100_000_000 == 0) {

188

System.out.println("Added " + i + " elements, size: " + bigData.size64());

189

}

190

}

191

192

// Access elements using 64-bit indices

193

String element = bigData.get(2_500_000_000L);

194

bigData.set(2_500_000_000L, "Updated element");

195

196

// Find elements (returns long index)

197

long index = bigData.indexOf("Element 1000000000");

198

if (index >= 0) {

199

System.out.println("Found element at index: " + index);

200

}

201

202

// Sublist operations with big indices

203

BigList<String> subset = bigData.subList(1_000_000_000L, 1_000_001_000L);

204

System.out.println("Subset size: " + subset.size64());

205

```

206

207

### BigListIterator Interface

208

209

Bidirectional iterator for BigList supporting 64-bit index navigation.

210

211

```java { .api }

212

/**

213

* Bidirectional iterator for BigList with 64-bit indices

214

* @param <K> the type of elements

215

*/

216

public interface BigListIterator<K> extends BidirectionalIterator<K> {

217

/**

218

* Get next 64-bit index

219

* @return the next index as long

220

*/

221

long nextIndex();

222

223

/**

224

* Get previous 64-bit index

225

* @return the previous index as long

226

*/

227

long previousIndex();

228

229

/**

230

* Add element at current position

231

* @param k the element to add

232

*/

233

void add(K k);

234

235

/**

236

* Set element at current position

237

* @param k the element to set

238

*/

239

void set(K k);

240

241

/**

242

* Skip forward efficiently

243

* @param n number of elements to skip

244

* @return number of elements actually skipped

245

*/

246

default long skip(long n) {

247

long i = 0;

248

while (i < n && hasNext()) {

249

next();

250

i++;

251

}

252

return i;

253

}

254

255

/**

256

* Skip backward efficiently

257

* @param n number of elements to skip backward

258

* @return number of elements actually skipped

259

*/

260

default long back(long n) {

261

long i = 0;

262

while (i < n && hasPrevious()) {

263

previous();

264

i++;

265

}

266

return i;

267

}

268

}

269

```

270

271

### BigArrays Utility Interface

272

273

Constants and utilities for managing segmented big arrays that support 64-bit indexing.

274

275

```java { .api }

276

/**

277

* Utilities for big arrays (array-of-arrays supporting 64-bit indices)

278

*/

279

public interface BigArrays {

280

/** Size of each segment in big arrays (2^27 = 134,217,728) */

281

long SEGMENT_SIZE = 1L << 27;

282

283

/** Shift amount for segment calculations */

284

int SEGMENT_SHIFT = 27;

285

286

/** Mask for offset within segment */

287

int SEGMENT_MASK = (1 << 27) - 1;

288

289

/**

290

* Calculate segment index from 64-bit index

291

* @param index the 64-bit index

292

* @return the segment index

293

*/

294

static int segment(long index) {

295

return (int) (index >>> SEGMENT_SHIFT);

296

}

297

298

/**

299

* Calculate displacement within segment from 64-bit index

300

* @param index the 64-bit index

301

* @return the offset within segment

302

*/

303

static int displacement(long index) {

304

return (int) (index & SEGMENT_MASK);

305

}

306

307

/**

308

* Calculate 64-bit index from segment and displacement

309

* @param segment the segment index

310

* @param displacement the offset within segment

311

* @return the combined 64-bit index

312

*/

313

static long index(int segment, int displacement) {

314

return ((long) segment << SEGMENT_SHIFT) + displacement;

315

}

316

}

317

```

318

319

**Usage Examples:**

320

321

```java

322

import it.unimi.dsi.fastutil.BigArrays;

323

324

// Understanding big array structure

325

long bigIndex = 200_000_000_000L; // 200 billion

326

327

int segment = BigArrays.segment(bigIndex); // Which segment

328

int displacement = BigArrays.displacement(bigIndex); // Offset within segment

329

330

System.out.println("Index " + bigIndex + " is in segment " + segment +

331

" at displacement " + displacement);

332

333

// Reconstruct index

334

long reconstructed = BigArrays.index(segment, displacement);

335

assert reconstructed == bigIndex;

336

```

337

338

### BigSwapper Interface

339

340

Functional interface for swapping elements at 64-bit positions, used by big sorting algorithms.

341

342

```java { .api }

343

/**

344

* Functional interface for swapping elements at 64-bit positions

345

*/

346

@FunctionalInterface

347

public interface BigSwapper {

348

/**

349

* Swap elements at 64-bit positions a and b

350

* @param a first 64-bit position

351

* @param b second 64-bit position

352

*/

353

void swap(long a, long b);

354

}

355

```

356

357

### Big Array Operations

358

359

Static utility methods for operations on big arrays (arrays-of-arrays supporting 64-bit indexing).

360

361

```java { .api }

362

/**

363

* Utility methods for big array operations

364

*/

365

public class ObjectBigArrays {

366

/**

367

* Create a new big array with specified length

368

* @param length the 64-bit length

369

* @return new big array

370

*/

371

public static <K> K[][] newBigArray(long length);

372

373

/**

374

* Get element from big array at 64-bit index

375

* @param array the big array

376

* @param index the 64-bit index

377

* @return the element at index

378

*/

379

public static <K> K get(K[][] array, long index);

380

381

/**

382

* Set element in big array at 64-bit index

383

* @param array the big array

384

* @param index the 64-bit index

385

* @param value the value to set

386

*/

387

public static <K> void set(K[][] array, long index, K value);

388

389

/**

390

* Get the length of a big array

391

* @param array the big array

392

* @return the 64-bit length

393

*/

394

public static <K> long length(K[][] array);

395

396

/**

397

* Copy elements within big array

398

* @param array the big array

399

* @param srcPos source 64-bit position

400

* @param destPos destination 64-bit position

401

* @param length number of elements to copy

402

*/

403

public static <K> void copy(K[][] array, long srcPos, long destPos, long length);

404

405

/**

406

* Fill range of big array with value

407

* @param array the big array

408

* @param from start 64-bit index (inclusive)

409

* @param to end 64-bit index (exclusive)

410

* @param value the value to fill

411

*/

412

public static <K> void fill(K[][] array, long from, long to, K value);

413

414

/**

415

* Quick sort big array range

416

* @param array the big array to sort

417

* @param from start 64-bit index (inclusive)

418

* @param to end 64-bit index (exclusive)

419

* @param comp comparator for elements

420

*/

421

public static <K> void quickSort(K[][] array, long from, long to, Comparator<K> comp);

422

}

423

424

// Similar utility classes exist for primitive types:

425

// IntBigArrays, LongBigArrays, DoubleBigArrays, etc.

426

```

427

428

**Usage Examples:**

429

430

```java

431

import it.unimi.dsi.fastutil.objects.*;

432

433

// Create big array for massive dataset

434

String[][] bigArray = ObjectBigArrays.newBigArray(5_000_000_000L);

435

436

// Set and get elements using 64-bit indices

437

ObjectBigArrays.set(bigArray, 3_000_000_000L, "Huge index element");

438

String value = ObjectBigArrays.get(bigArray, 3_000_000_000L);

439

440

// Fill a range with default values

441

ObjectBigArrays.fill(bigArray, 1_000_000_000L, 1_000_001_000L, "default");

442

443

// Sort a portion of the big array

444

ObjectBigArrays.quickSort(bigArray, 0, 1_000_000, String::compareTo);

445

446

// Get total length

447

long totalElements = ObjectBigArrays.length(bigArray);

448

System.out.println("Big array contains " + totalElements + " elements");

449

```

450

451

## Implementation Classes

452

453

### Big List Implementations

454

455

**ObjectBigArrayBigList<K>**

456

- Resizable big list backed by segmented arrays

457

- Supports random access and efficient append operations

458

- Memory usage scales smoothly with size

459

460

**Type-Specific Big Lists**

461

- **`IntBigArrayBigList`**, **`LongBigArrayBigList`**, etc. for primitive types

462

- Avoid boxing overhead while providing 64-bit indexing

463

464

### Performance Characteristics

465

466

- **Memory Efficiency**: Segmented storage avoids huge contiguous allocations

467

- **Access Time**: O(1) random access using segment calculation

468

- **Insertion/Deletion**: O(n) in worst case, but efficient for end operations

469

- **Iteration**: Optimized iterators with efficient skip operations

470

471

### Usage Guidelines

472

473

1. **When to Use Big Collections**:

474

- Collections expected to exceed 2^31 elements

475

- Processing massive datasets from databases or files

476

- Scenarios where 32-bit indices are insufficient

477

478

2. **Memory Considerations**:

479

- Each segment can be garbage collected independently

480

- More efficient than attempting huge single arrays

481

- Monitor memory usage and consider processing in chunks

482

483

3. **Performance Tips**:

484

- Use bulk operations when possible

485

- Prefer sequential access patterns for better cache performance

486

- Consider parallel processing for large ranges