# BSON Store Operations

Raw MongoDB document storage providing a full PyMongo interface for direct database operations. It supports custom document structures, aggregation pipelines, and advanced MongoDB features within Arctic's framework, giving maximum flexibility with unstructured or semi-structured data.

## Capabilities

### BSONStore Class

Direct MongoDB document storage with a complete PyMongo interface for custom data structures and advanced operations.

```python { .api }
class BSONStore:
    """
    Raw BSON document storage with full MongoDB interface.

    Provides direct access to MongoDB operations within Arctic's
    framework, enabling custom document structures, complex queries,
    and advanced MongoDB features for unstructured data.
    """
```

### Document Query Operations

Standard MongoDB query operations for finding and retrieving documents.

```python { .api }
def find(self, *args, **kwargs):
    """
    Find documents matching criteria (PyMongo interface).

    Parameters:
    - filter: Query filter dictionary (default: {})
    - projection: Fields to include/exclude
    - limit: Maximum number of documents to return
    - sort: Sort specification
    - **kwargs: Additional PyMongo find parameters

    Returns:
    pymongo.cursor.Cursor: Cursor over matching documents
    """

def find_one(self, *args, **kwargs):
    """
    Find single document matching criteria.

    Parameters:
    - filter: Query filter dictionary (default: {})
    - projection: Fields to include/exclude
    - **kwargs: Additional PyMongo find_one parameters

    Returns:
    dict or None: First matching document or None if not found
    """

def count(self, filter, **kwargs):
    """
    Count documents matching filter criteria.

    Parameters:
    - filter: Query filter dictionary
    - **kwargs: Additional count parameters

    Returns:
    int: Number of matching documents
    """

def distinct(self, key, **kwargs):
    """
    Get distinct values for specified field.

    Parameters:
    - key: Field name to get distinct values for
    - filter: Optional query filter
    - **kwargs: Additional distinct parameters

    Returns:
    List of distinct values
    """
```
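Because `find`, `find_one`, and `count` all take a plain dictionary as the query filter, a filter can be assembled with ordinary Python before it reaches the store. The sketch below shows one way to do that. `build_event_filter` is a hypothetical helper (it is not part of Arctic or PyMongo), and the `event` and `timestamp` field names are assumed for illustration only.

```python
from datetime import datetime, timedelta


def build_event_filter(events=None, since=None, until=None):
    """Build a MongoDB-style filter dict (hypothetical helper, not an Arctic API)."""
    query = {}
    if events:
        # $in matches documents whose 'event' equals any of the listed values
        query['event'] = {'$in': list(events)}
    time_range = {}
    if since is not None:
        time_range['$gte'] = since  # inclusive lower bound
    if until is not None:
        time_range['$lt'] = until   # exclusive upper bound
    if time_range:
        query['timestamp'] = time_range
    return query


end = datetime(2024, 1, 2)
query = build_event_filter(['purchase', 'refund'],
                           since=end - timedelta(days=1), until=end)
print(query)
```

The resulting dictionary could then be passed as the `filter` argument, for example `doc_store.find(query)`, assuming `doc_store` is an initialized BSON store library.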

### Document Write Operations

Methods for inserting, updating, and replacing documents in the collection.

```python { .api }
def insert_one(self, document, **kwargs):
    """
    Insert single document into collection.

    Parameters:
    - document: Document dictionary to insert
    - **kwargs: Additional insert parameters

    Returns:
    pymongo.results.InsertOneResult: Insert operation result

    Raises:
    - pymongo.errors.DuplicateKeyError: If document violates unique constraints
    """

def insert_many(self, documents, **kwargs):
    """
    Insert multiple documents into collection.

    Parameters:
    - documents: List of document dictionaries to insert
    - ordered: Whether to stop on first error (default: True)
    - **kwargs: Additional insert parameters

    Returns:
    pymongo.results.InsertManyResult: Insert operation result
    """

def update_one(self, filter, update, **kwargs):
    """
    Update single document matching filter.

    Parameters:
    - filter: Query filter to match document
    - update: Update operations to apply
    - upsert: Create document if not found (default: False)
    - **kwargs: Additional update parameters

    Returns:
    pymongo.results.UpdateResult: Update operation result
    """

def update_many(self, filter, update, **kwargs):
    """
    Update all documents matching filter.

    Parameters:
    - filter: Query filter to match documents
    - update: Update operations to apply
    - upsert: Create document if not found (default: False)
    - **kwargs: Additional update parameters

    Returns:
    pymongo.results.UpdateResult: Update operation result
    """

def replace_one(self, filter, replacement, **kwargs):
    """
    Replace single document matching filter.

    Parameters:
    - filter: Query filter to match document
    - replacement: New document to replace with
    - upsert: Create document if not found (default: False)
    - **kwargs: Additional replace parameters

    Returns:
    pymongo.results.UpdateResult: Replace operation result
    """
```

### Document Delete Operations

Methods for removing documents from the collection.

```python { .api }
def delete_one(self, filter, **kwargs):
    """
    Delete single document matching filter.

    Parameters:
    - filter: Query filter to match document
    - **kwargs: Additional delete parameters

    Returns:
    pymongo.results.DeleteResult: Delete operation result
    """

def delete_many(self, filter, **kwargs):
    """
    Delete all documents matching filter.

    Parameters:
    - filter: Query filter to match documents
    - **kwargs: Additional delete parameters

    Returns:
    pymongo.results.DeleteResult: Delete operation result
    """
```
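An empty filter (`{}`) matches every document, so passing one to `delete_many` would empty the collection. The sketch below is one possible guard against doing that by accident. `require_non_empty_filter` is a hypothetical helper introduced here for illustration, not something Arctic or PyMongo provides.

```python
def require_non_empty_filter(query):
    """Return the filter unchanged, refusing empty ones (hypothetical helper)."""
    if not query:
        # {} (or None) would match every document in the collection
        raise ValueError('refusing to delete with an empty filter')
    return query


delete_filter = require_non_empty_filter({'event': 'temp_event'})
print(delete_filter)
```

In use, the checked filter would be handed to the store, for example `doc_store.delete_many(delete_filter)`, assuming `doc_store` is an initialized BSON store library.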

### Advanced Query Operations

Methods for atomic operations and complex document modifications.

```python { .api }
def find_one_and_replace(self, filter, replacement, **kwargs):
    """
    Find document and replace it atomically.

    Parameters:
    - filter: Query filter to match document
    - replacement: New document to replace with
    - return_document: Return original or updated document
    - upsert: Create document if not found (default: False)
    - **kwargs: Additional parameters

    Returns:
    dict or None: Original or updated document based on return_document
    """

def find_one_and_update(self, filter, update, **kwargs):
    """
    Find document and update it atomically.

    Parameters:
    - filter: Query filter to match document
    - update: Update operations to apply
    - return_document: Return original or updated document
    - upsert: Create document if not found (default: False)
    - **kwargs: Additional parameters

    Returns:
    dict or None: Original or updated document based on return_document
    """

def find_one_and_delete(self, filter, **kwargs):
    """
    Find document and delete it atomically.

    Parameters:
    - filter: Query filter to match document
    - **kwargs: Additional parameters

    Returns:
    dict or None: Deleted document or None if not found
    """
```

### Bulk Operations

Methods for executing multiple operations efficiently in batches.

```python { .api }
def bulk_write(self, requests, **kwargs):
    """
    Execute multiple write operations in batch.

    Parameters:
    - requests: List of write operation objects
    - ordered: Execute operations in order (default: True)
    - **kwargs: Additional bulk write parameters

    Returns:
    pymongo.results.BulkWriteResult: Bulk operation result
    """
```
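When the list of write operations is very large, it can help to submit it in fixed-size batches so memory stays bounded and progress can be reported between calls. The sketch below shows one way to split an iterable into batches. `chunked` is a hypothetical helper written for this example, and the string values stand in for real PyMongo request objects such as `InsertOne` or `UpdateOne`.

```python
from itertools import islice


def chunked(requests, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError('size must be at least 1')
    iterator = iter(requests)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in values; real code would hold pymongo write operation objects here
operations = ['op1', 'op2', 'op3', 'op4', 'op5']
batches = list(chunked(operations, 2))
print(batches)
```

Each batch could then be passed to the store in turn, for example `doc_store.bulk_write(batch)`, assuming `doc_store` is an initialized BSON store library.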

### Aggregation Operations

Methods for complex data processing and analysis using the MongoDB aggregation framework.

```python { .api }
def aggregate(self, pipeline, **kwargs):
    """
    Execute aggregation pipeline for complex data processing.

    Parameters:
    - pipeline: List of aggregation stage dictionaries
    - allowDiskUse: Allow disk usage for large operations
    - **kwargs: Additional aggregation parameters

    Returns:
    pymongo.command_cursor.CommandCursor: Cursor over aggregation results
    """
```
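To make the pipeline semantics concrete, the pure-Python sketch below mirrors what a `$group` stage with `{'$sum': 1}` followed by a descending `$sort` would compute. It runs on made-up in-memory documents rather than a live collection, so it illustrates the logic only and says nothing about how MongoDB executes the stages.

```python
from collections import Counter

# Made-up sample documents, for illustration only
docs = [
    {'event': 'login'},
    {'event': 'purchase', 'amount': 99.99},
    {'event': 'login'},
]

# Roughly equivalent pipeline:
# [{'$group': {'_id': '$event', 'count': {'$sum': 1}}},
#  {'$sort': {'count': -1}}]
counts = Counter(doc['event'] for doc in docs)
grouped = [{'_id': event, 'count': n} for event, n in counts.most_common()]
print(grouped)
```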

### Index Management

Methods for creating and managing database indexes for query performance.

```python { .api }
def create_index(self, keys, **kwargs):
    """
    Create database index for improved query performance.

    Parameters:
    - keys: Index specification (field names and direction)
    - unique: Create unique index (default: False)
    - sparse: Create sparse index (default: False)
    - **kwargs: Additional index parameters

    Returns:
    str: Index name
    """

def drop_index(self, index_or_name):
    """
    Drop database index.

    Parameters:
    - index_or_name: Index name or specification to drop
    """

def index_information(self):
    """
    Get information about all indexes on collection.

    Returns:
    dict: Index information including names, keys, and options
    """
```
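The dictionary returned by `index_information` maps each index name to its options, with the indexed fields under `'key'` as a list of `(field, direction)` pairs. The sketch below looks up an index by its key specification. `find_index_name` is a hypothetical helper, and the sample dictionary is hand-written to resemble typical output rather than captured from a real collection.

```python
def find_index_name(index_info, keys):
    """Return the name of the index whose key spec equals `keys`, else None.

    Hypothetical helper; `index_info` is shaped like index_information() output.
    """
    wanted = [tuple(pair) for pair in keys]
    for name, info in index_info.items():
        if [tuple(pair) for pair in info['key']] == wanted:
            return name
    return None


# Hand-written sample resembling index_information() output
sample_info = {
    '_id_': {'v': 2, 'key': [('_id', 1)]},
    'user_id_1_timestamp_-1': {'v': 2, 'key': [('user_id', 1), ('timestamp', -1)]},
}

name = find_index_name(sample_info, [('user_id', 1), ('timestamp', -1)])
print(name)
```

A check like this could be used to skip a `create_index` call when an equivalent index is already present.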

### Store Management

Methods for managing the BSON store and getting statistics.

```python { .api }
def enable_sharding(self):
    """
    Enable MongoDB sharding for the collection.

    Enables horizontal scaling across multiple MongoDB instances
    for handling large document collections.
    """

def stats(self):
    """
    Get BSON store statistics and performance metrics.

    Returns:
    dict: Collection statistics including document counts, storage size
    """
```

## Usage Examples

### Basic Document Operations

```python
from arctic import Arctic, BSON_STORE
from datetime import datetime
import pymongo

# Setup BSON store
arctic_conn = Arctic('mongodb://localhost:27017')
arctic_conn.initialize_library('documents', BSON_STORE)
doc_store = arctic_conn['documents']

# Insert documents
documents = [
    {
        'user_id': 'user123',
        'event': 'login',
        'timestamp': datetime.now(),
        'metadata': {'ip': '192.168.1.1', 'device': 'mobile'}
    },
    {
        'user_id': 'user456',
        'event': 'purchase',
        'timestamp': datetime.now(),
        'amount': 99.99,
        'product_id': 'prod789'
    }
]

# Insert single document (PyMongo adds an '_id' key to the dict in place)
result = doc_store.insert_one(documents[0])
print(f"Inserted document ID: {result.inserted_id}")

# Insert the remaining documents; re-inserting documents[0] would now
# fail with a duplicate '_id'
result = doc_store.insert_many(documents[1:])
print(f"Inserted {len(result.inserted_ids)} documents")
```

### Querying Documents

```python
# Find all documents
all_docs = list(doc_store.find())
print(f"Total documents: {len(all_docs)}")

# Find with filter
login_events = list(doc_store.find({'event': 'login'}))
print(f"Login events: {len(login_events)}")

# Find with projection (specific fields only)
user_events = list(doc_store.find(
    {'user_id': 'user123'},
    {'event': 1, 'timestamp': 1, '_id': 0}
))

# Find one document
recent_purchase = doc_store.find_one(
    {'event': 'purchase'},
    sort=[('timestamp', pymongo.DESCENDING)]
)

# Count documents
purchase_count = doc_store.count({'event': 'purchase'})
print(f"Total purchases: {purchase_count}")

# Get distinct values
unique_events = doc_store.distinct('event')
print(f"Event types: {unique_events}")
```

### Updating Documents

```python
# Update single document
doc_store.update_one(
    {'user_id': 'user123', 'event': 'login'},
    {'$set': {'processed': True}}
)

# Update multiple documents
doc_store.update_many(
    {'event': 'login'},
    {'$set': {'category': 'authentication'}}
)

# Upsert (insert if not exists)
doc_store.update_one(
    {'user_id': 'user999', 'event': 'signup'},
    {
        '$set': {
            'timestamp': datetime.now(),
            'source': 'web'
        }
    },
    upsert=True
)

# Replace entire document
doc_store.replace_one(
    {'user_id': 'user456'},
    {
        'user_id': 'user456',
        'event': 'purchase',
        'timestamp': datetime.now(),
        'amount': 149.99,
        'product_id': 'prod999',
        'status': 'completed'
    }
)
```

### Advanced Operations

```python
# Atomic find and modify
updated_doc = doc_store.find_one_and_update(
    {'user_id': 'user123'},
    {'$inc': {'login_count': 1}},
    return_document=pymongo.ReturnDocument.AFTER,
    upsert=True
)

# Atomic find and delete
deleted_doc = doc_store.find_one_and_delete(
    {'event': 'temp_event'}
)

# Bulk operations
from pymongo import InsertOne, UpdateOne, DeleteOne

bulk_ops = [
    InsertOne({'user_id': 'bulk1', 'event': 'test'}),
    UpdateOne({'user_id': 'bulk1'}, {'$set': {'processed': True}}),
    DeleteOne({'user_id': 'old_user'})
]

result = doc_store.bulk_write(bulk_ops)
print(f"Bulk operations result: {result.bulk_api_result}")
```

### Aggregation Pipeline

```python
# Complex aggregation for analytics
pipeline = [
    # Group by event type and count
    {
        '$group': {
            '_id': '$event',
            'count': {'$sum': 1},
            'avg_amount': {'$avg': '$amount'}
        }
    },
    # Gather the per-event rows into one document to get the grand total
    {
        '$group': {
            '_id': None,
            'total': {'$sum': '$count'},
            'rows': {'$push': '$$ROOT'}
        }
    },
    {'$unwind': '$rows'},
    # Add computed fields (each event's share of all events)
    {
        '$project': {
            '_id': '$rows._id',
            'count': '$rows.count',
            'avg_amount': '$rows.avg_amount',
            'percentage': {
                '$multiply': [
                    {'$divide': ['$rows.count', '$total']},
                    100
                ]
            }
        }
    },
    # Sort by count descending
    {
        '$sort': {'count': -1}
    }
]

results = list(doc_store.aggregate(pipeline))
for result in results:
    print(f"Event: {result['_id']}, Count: {result['count']}")

# Time-based aggregation
daily_stats = list(doc_store.aggregate([
    {
        '$group': {
            '_id': {
                'year': {'$year': '$timestamp'},
                'month': {'$month': '$timestamp'},
                'day': {'$dayOfMonth': '$timestamp'}
            },
            'events': {'$sum': 1},
            'revenue': {'$sum': '$amount'}
        }
    },
    {
        '$sort': {'_id': 1}
    }
]))
```

### Index Management

```python
# Create indexes for better query performance
doc_store.create_index('user_id')
doc_store.create_index('event')
doc_store.create_index([('timestamp', pymongo.DESCENDING)])

# Create compound index
doc_store.create_index([
    ('user_id', pymongo.ASCENDING),
    ('timestamp', pymongo.DESCENDING)
])

# Create unique index
doc_store.create_index('transaction_id', unique=True)

# Get index information
indexes = doc_store.index_information()
for index_name, index_info in indexes.items():
    print(f"Index: {index_name}, Keys: {index_info['key']}")

# Enable sharding for large collections
doc_store.enable_sharding()
```

### Complex Queries and Filtering

```python
from datetime import datetime, timedelta

# Date range queries
yesterday = datetime.now() - timedelta(days=1)
recent_docs = list(doc_store.find({
    'timestamp': {'$gte': yesterday}
}))

# Complex filters with operators
filtered_docs = list(doc_store.find({
    '$and': [
        {'event': {'$in': ['purchase', 'refund']}},
        {'amount': {'$gt': 50}},
        {'timestamp': {'$gte': yesterday}}
    ]
}))

# Text search (requires text index)
doc_store.create_index([('description', 'text')])
text_results = list(doc_store.find({
    '$text': {'$search': 'important event'}
}))

# Regular expression queries
pattern_docs = list(doc_store.find({
    'user_id': {'$regex': '^user[0-9]+$'}
}))

# Array and nested document queries
nested_docs = list(doc_store.find({
    'metadata.device': 'mobile',
    'tags': {'$in': ['premium', 'vip']}
}))
```