# Chunk Store Operations

Configurable chunked storage for large datasets, with custom serialization strategies and date-based chunking. Supports append/update operations, audit trails, and flexible data organization patterns for datasets that don't fit in memory.

## Capabilities

### ChunkStore Class

Chunked storage system for large datasets with configurable chunking strategies and serialization options.

```python { .api }
class ChunkStore:
    """
    Chunked storage for large datasets with configurable organization.

    Provides flexible data chunking strategies, custom serialization,
    and efficient append/update operations for datasets that exceed
    memory capacity or require specific organization patterns.
    """
```
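
A `ChunkStore` is not constructed directly; it is returned by an `Arctic` connection for libraries initialized with the `CHUNK_STORE` type. A minimal sketch, assuming a MongoDB instance on localhost (the full setup appears under Usage Examples below):

```python
from arctic import Arctic, CHUNK_STORE

arctic_conn = Arctic('mongodb://localhost:27017')
arctic_conn.initialize_library('chunks', lib_type=CHUNK_STORE)

# Indexing the connection returns the ChunkStore instance for the library
chunk_lib = arctic_conn['chunks']
```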

### Symbol Management

Operations for managing chunked data symbols including listing, existence checking, and renaming.

```python { .api }
def list_symbols(self, partial_match=None):
    """
    List available symbols with optional pattern matching.

    Parameters:
    - partial_match: Partial string to match symbol names

    Returns:
    List of symbol names in the chunk store
    """

def has_symbol(self, symbol):
    """
    Check if symbol exists in chunk store.

    Parameters:
    - symbol: Symbol name to check

    Returns:
    bool: True if symbol exists
    """

def rename(self, from_symbol, to_symbol, audit=None):
    """
    Rename symbol in chunk store.

    Parameters:
    - from_symbol: Current symbol name
    - to_symbol: New symbol name
    - audit: Optional audit metadata for the operation

    Raises:
    - NoDataFoundException: If source symbol doesn't exist
    - ArcticException: If target symbol already exists
    """

def delete(self, symbol, chunk_range=None, audit=None):
    """
    Delete symbol or specific chunk range.

    Parameters:
    - symbol: Symbol name to delete
    - chunk_range: Specific chunk range to delete (default: all chunks)
    - audit: Optional audit metadata for the operation

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """
```
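
The usage examples below don't exercise `partial_match` or chunk-range deletion, so here is a brief sketch of both; it assumes the `chunk_lib` handle from above and a date-chunked symbol like the `large_dataset` built under Usage Examples:

```python
from datetime import datetime
from arctic.date import DateRange

# List only symbols whose names contain 'daily'
daily_symbols = chunk_lib.list_symbols(partial_match='daily')

# Delete a single day of chunks, leaving the rest of the symbol intact
stale_range = DateRange(datetime(2020, 1, 1), datetime(2020, 1, 2))
chunk_lib.delete('large_dataset', chunk_range=stale_range,
                 audit={'operation': 'partial_delete', 'reason': 'bad_data'})
```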

### Read Operations

Methods for retrieving chunked data with range filtering and chunk iteration.

```python { .api }
def read(self, symbol, chunk_range=None, filter_data=True, **kwargs):
    """
    Read chunked data with optional filtering and range selection.

    Parameters:
    - symbol: Symbol name to read
    - chunk_range: Specific chunk range to read (default: all chunks)
    - filter_data: Apply data filtering during read (default: True)
    - **kwargs: Additional read parameters for chunker/serializer

    Returns:
    Reconstructed data object (type depends on serializer)

    Raises:
    - NoDataFoundException: If symbol or chunk range doesn't exist
    """

def read_metadata(self, symbol):
    """
    Read symbol metadata without loading chunk data.

    Parameters:
    - symbol: Symbol name

    Returns:
    dict: Symbol metadata including chunk information

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """

def read_audit_log(self, symbol=None):
    """
    Read audit log for chunk operations.

    Parameters:
    - symbol: Filter by specific symbol (default: all symbols)

    Returns:
    List of audit log entries with operation details
    """
```
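
One subtlety worth illustrating: with `filter_data=True` the result is trimmed to exactly the requested range, while `filter_data=False` returns every chunk that intersects the range, skipping the trimming step but possibly including extra rows from partially covered chunks. A sketch under the same assumptions as the usage examples below (daily chunks on `large_dataset`):

```python
from datetime import datetime
from arctic.date import DateRange

rng = DateRange(datetime(2020, 1, 15, 12), datetime(2020, 1, 15, 18))

# Trimmed to exactly the requested six hours
exact = chunk_lib.read('large_dataset', chunk_range=rng)

# The whole intersecting chunk (here: the full day of Jan 15)
whole = chunk_lib.read('large_dataset', chunk_range=rng, filter_data=False)
assert len(whole) >= len(exact)
```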

### Write Operations

Methods for storing chunked data with configurable chunking strategies and metadata support.

```python { .api }
def write(self, symbol, item, metadata=None, chunker=None, audit=None, **kwargs):
    """
    Write data using specified chunking strategy.

    Parameters:
    - symbol: Symbol name to write
    - item: Data to store (DataFrame, array, or custom object)
    - metadata: Optional metadata dictionary
    - chunker: Chunking strategy object (default: DateChunker)
    - audit: Optional audit metadata for the operation
    - **kwargs: Additional parameters for chunker/serializer

    Raises:
    - QuotaExceededException: If write would exceed storage quota
    - ArcticException: If chunking or serialization fails
    """

def append(self, symbol, item, upsert=False, metadata=None, audit=None, **kwargs):
    """
    Append data to an existing symbol, or create it if it doesn't exist.

    Parameters:
    - symbol: Symbol name
    - item: Data to append
    - upsert: Create symbol if it doesn't exist (default: False)
    - metadata: Optional metadata dictionary
    - audit: Optional audit metadata for the operation
    - **kwargs: Additional parameters for append operation

    Returns:
    Operation result information

    Raises:
    - NoDataFoundException: If symbol doesn't exist and upsert=False
    - OverlappingDataException: If appended data overlaps existing chunks
    """

def update(self, symbol, item, metadata=None, chunk_range=None,
           upsert=False, audit=None, **kwargs):
    """
    Update data in specific chunk range.

    Parameters:
    - symbol: Symbol name
    - item: Data to update with
    - metadata: Optional metadata dictionary
    - chunk_range: Specific chunks to update (default: auto-detect)
    - upsert: Create symbol if it doesn't exist (default: False)
    - audit: Optional audit metadata for the operation
    - **kwargs: Additional parameters for update operation

    Returns:
    Operation result information

    Raises:
    - NoDataFoundException: If symbol doesn't exist and upsert=False
    """

def write_metadata(self, symbol, metadata):
    """
    Write metadata for symbol without changing data.

    Parameters:
    - symbol: Symbol name
    - metadata: Metadata dictionary to write

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """
```
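
The usage examples below always append to a symbol that already exists; `upsert=True` lets one code path handle both the first write and later appends. A short sketch (symbol name and DataFrame shape are illustrative):

```python
import pandas as pd
import numpy as np

def store_batch(lib, symbol, batch):
    # upsert=True creates the symbol on the first batch instead of
    # raising NoDataFoundException
    lib.append(symbol, batch, upsert=True,
               audit={'operation': 'batch_ingest'})

batch = pd.DataFrame({'value1': np.random.randn(10)},
                     index=pd.date_range('2022-01-01', periods=10, freq='min'))
store_batch(chunk_lib, 'incremental_dataset', batch)
```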

### Chunk Management

Methods for managing and iterating over data chunks with range queries.

```python { .api }
def get_chunk_ranges(self, symbol, chunk_range=None, reverse=False):
    """
    Get chunk ranges for symbol.

    Parameters:
    - symbol: Symbol name
    - chunk_range: Filter to specific range (default: all chunks)
    - reverse: Return ranges in reverse order (default: False)

    Returns:
    List of chunk range objects

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """

def iterator(self, symbol, chunk_range=None, **kwargs):
    """
    Create iterator over symbol chunks.

    Parameters:
    - symbol: Symbol name
    - chunk_range: Iterate over specific range (default: all chunks)
    - **kwargs: Additional iterator parameters

    Returns:
    Generator yielding (chunk_range, data) tuples

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """

def reverse_iterator(self, symbol, chunk_range=None, **kwargs):
    """
    Create reverse iterator over symbol chunks.

    Parameters:
    - symbol: Symbol name
    - chunk_range: Iterate over specific range (default: all chunks)
    - **kwargs: Additional iterator parameters

    Returns:
    Generator yielding (chunk_range, data) tuples in reverse order

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """
```
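
`reverse_iterator` does not appear in the usage examples, so here is a short sketch: fetching the most recent chunk without reading the whole symbol, relying on the `(chunk_range, data)` tuple contract documented above:

```python
# Walk chunks newest-first and stop after the first one
latest_range, latest_chunk = next(chunk_lib.reverse_iterator('large_dataset'))
print(f"Most recent chunk {latest_range}: {latest_chunk.shape}")
```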

### Information and Statistics

Methods for retrieving detailed information about symbols and storage statistics.

```python { .api }
def get_info(self, symbol):
    """
    Get detailed information about symbol's chunks and storage.

    Parameters:
    - symbol: Symbol name

    Returns:
    dict: Comprehensive information including chunk counts, sizes, ranges

    Raises:
    - NoDataFoundException: If symbol doesn't exist
    """

def stats(self):
    """
    Get chunk store statistics and performance metrics.

    Returns:
    dict: Statistics including symbol counts, storage usage, chunk distribution
    """
```
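
Combining `list_symbols` with `get_info` gives a quick store-wide inventory; the exact keys in each info dict depend on the chunker and serializer, so this sketch just prints whatever each symbol reports:

```python
# Summarize every symbol in the store
for sym in chunk_lib.list_symbols():
    print(f"{sym}: {chunk_lib.get_info(sym)}")
```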

## Chunking Strategies

### DateChunker

Default chunking strategy that organizes data by date ranges.

```python { .api }
class DateChunker:
    """
    Date-based chunking strategy for time series data.

    Automatically partitions data based on date boundaries,
    enabling efficient date range queries and updates.
    """
    TYPE = 'date'
```

### PassthroughChunker

Simple chunking strategy that stores data as a single chunk.

```python { .api }
class PassthroughChunker:
    """
    Pass-through chunking strategy with no automatic partitioning.

    Stores data as provided without automatic chunking,
    suitable for data that doesn't benefit from partitioning.
    """
    TYPE = 'passthrough'
```
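
Chunk granularity is chosen per write: the usage examples below pass `chunk_size='1D'` for daily chunks, and coarser frequencies follow the same pattern. A sketch using monthly chunks (treat the `'M'` frequency string as an assumption; accepted values depend on the chunker implementation), reusing the `chunk_lib` handle and `large_data` DataFrame from the usage examples:

```python
# Monthly chunks: fewer, larger chunks suit coarse range queries
chunk_lib.write('monthly_chunks', large_data,
                chunker=DateChunker(),
                chunk_size='M',  # assumed monthly frequency string
                metadata={'chunking': 'monthly'})
```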

## Usage Examples

### Basic Chunked Storage

```python
from arctic import Arctic, CHUNK_STORE
from arctic.chunkstore.date_chunker import DateChunker
import pandas as pd
import numpy as np

# Setup chunk store
arctic_conn = Arctic('mongodb://localhost:27017')
arctic_conn.initialize_library('chunks', CHUNK_STORE)
chunk_lib = arctic_conn['chunks']

# Create large dataset
dates = pd.date_range('2020-01-01', periods=1000000, freq='min')
large_data = pd.DataFrame({
    'value1': np.random.randn(1000000),
    'value2': np.random.randn(1000000),
    'category': np.random.choice(['A', 'B', 'C'], 1000000)
}, index=dates)

# Write with date-based chunking
metadata = {'source': 'simulation', 'data_type': 'time_series'}
chunk_lib.write('large_dataset', large_data,
                metadata=metadata,
                chunker=DateChunker())
```

### Reading and Chunk Iteration

```python
from arctic.date import DateRange
from datetime import datetime

# Read entire dataset (automatically reconstructed from chunks)
full_data = chunk_lib.read('large_dataset')
print(f"Full dataset shape: {full_data.shape}")

# Read specific date range
jan_range = DateRange(datetime(2020, 1, 1), datetime(2020, 2, 1))
jan_data = chunk_lib.read('large_dataset', chunk_range=jan_range)
print(f"January data shape: {jan_data.shape}")

# Iterate over chunks
for chunk_range, chunk_data in chunk_lib.iterator('large_dataset'):
    print(f"Chunk {chunk_range}: {chunk_data.shape}")
    # Process chunk individually to save memory

# Get chunk information
chunks = chunk_lib.get_chunk_ranges('large_dataset')
print(f"Total chunks: {len(chunks)}")
for chunk in chunks[:5]:  # First 5 chunks
    print(f"Chunk range: {chunk}")
```
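
Extending the iteration pattern above, a chunk-at-a-time aggregation keeps memory bounded even when the full dataset would not fit in RAM; the `value1` column is the one created in the previous example:

```python
# Compute a global mean of 'value1' without loading all chunks at once
total, count = 0.0, 0
for chunk_range, chunk_data in chunk_lib.iterator('large_dataset'):
    total += chunk_data['value1'].sum()
    count += len(chunk_data)
print(f"Mean value1 across {count} rows: {total / count:.4f}")
```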

### Append and Update Operations

```python
# Create additional data to append; it must start after the existing
# data ends, since appends that overlap existing chunks raise
# OverlappingDataException
new_dates = pd.date_range(dates[-1] + pd.Timedelta(minutes=1),
                          periods=100000, freq='min')
new_data = pd.DataFrame({
    'value1': np.random.randn(100000),
    'value2': np.random.randn(100000),
    'category': np.random.choice(['A', 'B', 'C'], 100000)
}, index=new_dates)

# Append new data
chunk_lib.append('large_dataset', new_data,
                 audit={'operation': 'monthly_update', 'user': 'system'})

# Update specific chunk range
update_range = DateRange(datetime(2020, 1, 15), datetime(2020, 1, 16))
update_data = chunk_lib.read('large_dataset', chunk_range=update_range)
update_data['value1'] *= 1.1  # Apply 10% adjustment

chunk_lib.update('large_dataset', update_data,
                 chunk_range=update_range,
                 audit={'operation': 'correction', 'reason': 'data_adjustment'})
```
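
`update` also accepts `upsert=True`, which the example above doesn't show; a brief sketch where a single call either patches existing chunks or creates the symbol outright (the target symbol name here is illustrative):

```python
# Patch-or-create in one call: no prior has_symbol check required
chunk_lib.update('corrections_dataset', update_data,
                 upsert=True,
                 audit={'operation': 'patch', 'reason': 'late_arrivals'})
```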

### Symbol Management and Metadata

```python
# List all symbols
symbols = chunk_lib.list_symbols()
print(f"Available symbols: {symbols}")

# Check if symbol exists
exists = chunk_lib.has_symbol('large_dataset')
print(f"Symbol exists: {exists}")

# Get detailed symbol information
info = chunk_lib.get_info('large_dataset')
print(f"Symbol info: {info}")

# Read metadata
metadata = chunk_lib.read_metadata('large_dataset')
print(f"Metadata: {metadata}")

# Update metadata
chunk_lib.write_metadata('large_dataset', {
    'source': 'simulation',
    'data_type': 'time_series',
    'last_updated': datetime.now().isoformat(),
    'version': '2.0'
})

# Rename symbol
chunk_lib.rename('large_dataset', 'historical_data',
                 audit={'operation': 'rename', 'reason': 'restructuring'})
```

### Audit Trail and Statistics

```python
# Read audit log
audit_entries = chunk_lib.read_audit_log('historical_data')
for entry in audit_entries[-10:]:  # Last 10 entries
    print(f"{entry['date']}: {entry['operation']}")

# Get store statistics
stats = chunk_lib.stats()
print(f"Store statistics: {stats}")

# Clean up
chunk_lib.delete('historical_data',
                 audit={'operation': 'cleanup', 'reason': 'demo_complete'})
```

### Advanced Chunking Strategies

```python
from arctic.chunkstore.passthrough_chunker import PassthroughChunker

# Use passthrough chunker for non-time-series data
static_data = pd.DataFrame({
    'id': range(10000),
    'name': [f'item_{i}' for i in range(10000)],
    'value': np.random.randn(10000)
})

chunk_lib.write('static_reference', static_data,
                chunker=PassthroughChunker(),
                metadata={'type': 'reference_data'})

# Custom chunking parameters
chunk_lib.write('custom_chunks', large_data,
                chunker=DateChunker(),
                chunk_size='1D',  # Daily chunks
                metadata={'chunking': 'daily'})
```