# Deep Lake

Deep Lake is a database for AI powered by a storage format optimized for deep-learning applications. It provides comprehensive dataset management, querying capabilities, and seamless integration with popular ML frameworks, enabling both data storage/retrieval for LLM applications and dataset management for deep learning model training.

## Package Information

- **Package Name**: deeplake
- **Language**: Python
- **Installation**: `pip install deeplake`

## Core Imports

```python
import deeplake
```

Common type imports:

```python
from deeplake import types
from deeplake.types import Image, Text, Embedding, Array
```

Schema template imports:

```python
from deeplake.schemas import TextEmbeddings, COCOImages
```

## Basic Usage

```python
import deeplake

# Create a new dataset
dataset = deeplake.create("./my_dataset")

# Add columns with types
dataset.add_column("images", deeplake.types.Image())
dataset.add_column("labels", deeplake.types.Text())
dataset.add_column("embeddings", deeplake.types.Embedding(size=768))

# Append a row of data
dataset.append({
    "images": "path/to/image.jpg",
    "labels": "cat",
    "embeddings": [0.1] * 768,  # placeholder 768-dimensional vector
})

# Commit changes
dataset.commit("Added initial data")

# Query data using TQL (Tensor Query Language)
results = dataset.query("SELECT * WHERE labels == 'cat'")
for row in results:
    print(row["labels"])

# Open existing dataset
dataset = deeplake.open("./my_dataset")
print(f"Dataset has {len(dataset)} rows")

# Framework integration
def my_transform(sample):
    return sample  # replace with real preprocessing

pytorch_dataloader = dataset.pytorch(transform=my_transform)
tensorflow_dataset = dataset.tensorflow()
```

## Architecture

Deep Lake's architecture centers around datasets as the primary abstraction, with the following key components:

- **Dataset/DatasetView**: Core data containers supporting CRUD operations, version control, and framework integration
- **Column/ColumnView**: Typed columns storing homogeneous data with optional indexing for performance
- **Row/RowView**: Individual record access with dictionary-like interfaces
- **Schema**: Type definitions and column specifications for data validation
- **Type System**: Rich type hierarchy supporting ML data types (Image, Embedding, Video, etc.)
- **Storage Layer**: Multi-cloud storage abstraction with built-in compression and lazy loading
- **Query Engine**: TQL (Tensor Query Language) for complex data filtering and aggregation
- **Version Control**: Git-like branching, tagging, and commit history for dataset evolution

This design enables Deep Lake to handle data of any size in a serverless manner while maintaining unified access through a single API, supporting all data types (embeddings, audio, text, videos, images, PDFs, annotations) with data versioning and lineage capabilities.

## Capabilities

### Dataset Management

Core functionality for creating, opening, deleting, and copying datasets with support for various storage backends and comprehensive lifecycle management.

```python { .api }
def create(url: str, creds: Optional[Dict[str, str]] = None, token: Optional[str] = None, schema: Optional[Schema] = None) -> Dataset: ...
def open(url: str, creds: Optional[Dict[str, str]] = None, token: Optional[str] = None) -> Dataset: ...
def open_read_only(url: str, creds: Optional[Dict[str, str]] = None, token: Optional[str] = None) -> ReadOnlyDataset: ...
def delete(url: str, creds: Optional[Dict[str, str]] = None, token: Optional[str] = None) -> None: ...
def exists(url: str, creds: Optional[Dict[str, str]] = None, token: Optional[str] = None) -> bool: ...
def copy(src: str, dst: str, src_creds: Optional[Dict[str, str]] = None, dst_creds: Optional[Dict[str, str]] = None, token: Optional[str] = None) -> None: ...
```

[Dataset Management](./dataset-management.md)

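A minimal lifecycle sketch using only the functions above; the local paths are placeholders.

```python
import deeplake

# Create the dataset only if nothing exists at the target URL yet
if not deeplake.exists("./lifecycle_demo"):
    deeplake.create("./lifecycle_demo")

# Reopen for writing, or read-only for safe shared access
ds = deeplake.open("./lifecycle_demo")
ro = deeplake.open_read_only("./lifecycle_demo")

# Copy to a second location, then remove the original
deeplake.copy("./lifecycle_demo", "./lifecycle_backup")
deeplake.delete("./lifecycle_demo")
```
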
### Data Access and Manipulation

Row- and column-based data access patterns with comprehensive indexing, slicing, and batch operations for efficient data manipulation.

```python { .api }
class Dataset:
    def __getitem__(self, key: Union[int, slice, str]) -> Union[Row, RowRange, Column]: ...
    def append(self, data: Dict[str, Any]) -> None: ...
    def add_column(self, name: str, dtype: Type) -> None: ...
    def remove_column(self, name: str) -> None: ...

class Column:
    def __getitem__(self, key: Union[int, slice, List[int]]) -> Any: ...
    def __setitem__(self, key: Union[int, slice, List[int]], value: Any) -> None: ...
```

[Data Access](./data-access.md)

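The signatures above imply the following indexing patterns; this sketch assumes the dataset created in Basic Usage.

```python
import deeplake

ds = deeplake.open("./my_dataset")

# Row access: an integer returns a Row, a slice returns a RowRange
first_row = ds[0]
first_ten = ds[0:10]

# Column access: a string key returns a Column
labels = ds["labels"]

# Columns accept integer, slice, and list-of-int indexing
print(labels[0])
subset = labels[0:100]
picked = labels[[1, 5, 42]]

# Columns are writable in place
labels[0] = "dog"
```
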
### Query System

TQL (Tensor Query Language) for complex data filtering, aggregation, and transformation with SQL-like syntax optimized for tensor operations.

```python { .api }
def query(query: str, token: Optional[str] = None, creds: Optional[Dict[str, str]] = None) -> DatasetView: ...
def prepare_query(query: str, token: Optional[str] = None, creds: Optional[Dict[str, str]] = None) -> Executor: ...
def explain_query(query: str, token: Optional[str] = None, creds: Optional[Dict[str, str]] = None) -> ExplainQueryResult: ...

class Executor:
    def run_single(self, parameters: Dict[str, Any]) -> DatasetView: ...
    def run_batch(self, parameters: List[Dict[str, Any]]) -> List[DatasetView]: ...
```

[Query System](./query-system.md)

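A sketch of the three entry points; referencing a stored dataset by quoted URL inside `FROM` and the empty parameters dict are assumptions for illustration.

```python
import deeplake

# Ad-hoc query; TQL refers to stored datasets by quoted URL
view = deeplake.query('SELECT * FROM "./my_dataset" WHERE labels == \'cat\'')
print(len(view))

# Inspect the plan before running an expensive query
print(deeplake.explain_query('SELECT * FROM "./my_dataset" WHERE labels == \'cat\''))

# Precompile once, execute repeatedly
executor = deeplake.prepare_query('SELECT * FROM "./my_dataset"')
result = executor.run_single({})  # no bound parameters in this sketch
```
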
### Type System

Rich type hierarchy supporting all ML data types including images, embeddings, audio, video, geometric data, and custom structures with compression and indexing options.

```python { .api }
class Image:
    def __init__(self, dtype: str = "uint8", sample_compression: str = "png"): ...

class Embedding:
    def __init__(self, size: Optional[int] = None, dtype: str = "float32", index_type: Optional[IndexType] = None): ...

class Text:
    def __init__(self, index_type: Optional[TextIndexType] = None): ...

class Array:
    def __init__(self, dtype: DataType, dimensions: Optional[int] = None, shape: Optional[List[int]] = None): ...
```

[Type System](./type-system.md)

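Column declarations drawn from the constructors above; the column names, the `"uint8"` dtype string passed to `Array`, and the shape values are placeholders.

```python
import deeplake

ds = deeplake.create("./typed_dataset")

# Image column with per-sample PNG compression (the default shown above)
ds.add_column("photo", deeplake.types.Image(dtype="uint8", sample_compression="png"))

# Fixed-size embedding column; index_type is optional
ds.add_column("vector", deeplake.types.Embedding(size=384))

# Text column; pass index_type to enable text indexing
ds.add_column("caption", deeplake.types.Text())

# Generic n-dimensional array with an explicit shape
ds.add_column("mask", deeplake.types.Array("uint8", shape=[512, 512]))
```
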
### Version Control

Git-like version control with branching, tagging, commit history, and merge operations for dataset evolution and collaboration.

```python { .api }
class Dataset:
    def commit(self, message: str = "") -> str: ...
    def branch(self, name: str) -> Branch: ...
    def tag(self, name: str, message: str = "") -> Tag: ...
    def push(self) -> None: ...
    def pull(self) -> None: ...

class Branch:
    def open(self) -> Dataset: ...
    def delete(self) -> None: ...
    def rename(self, new_name: str) -> None: ...
```

[Version Control](./version-control.md)

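A sketch of the flow the API above suggests; branch and tag names are placeholders, and `push`/`pull` only apply to datasets that have a remote copy.

```python
import deeplake

ds = deeplake.open("./my_dataset")

# Record the current state; commit returns the new version id
version_id = ds.commit("Labeled first batch")

# Branch off to experiment without touching the main history
experiment = ds.branch("relabel-experiment")
ds_experiment = experiment.open()

# Tag a known-good version for later reference
ds.tag("v1.0", "First curated release")

# Synchronize with the remote copy, where one exists
ds.push()
ds.pull()
```
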
### Storage System

Multi-cloud storage abstraction supporting local filesystem, S3, GCS, Azure with built-in compression, encryption, and performance optimization.

```python { .api }
class Reader:
    def get(self, path: str) -> bytes: ...
    def list(self, path: str = "") -> List[str]: ...
    def subdir(self, path: str) -> Reader: ...

class Writer:
    def set(self, path: str, data: bytes) -> None: ...
    def remove(self, path: str) -> None: ...
    def subdir(self, path: str) -> Writer: ...
```

[Storage System](./storage-system.md)

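The section does not show how `Reader`/`Writer` instances are obtained, so this sketch takes them as arguments and only demonstrates the traversal pattern the interfaces define.

```python
def mirror(reader, writer, prefix: str = "") -> None:
    """Copy every object under `prefix` from one store to another.

    `reader` and `writer` are the Reader/Writer interfaces above; how
    they are constructed is backend-specific and not shown here.
    """
    for path in reader.list(prefix):
        writer.set(path, reader.get(path))

def mirror_subtree(reader, writer, subdir: str) -> None:
    """Mirror a single sub-tree using scoped Reader/Writer views."""
    mirror(reader.subdir(subdir), writer.subdir(subdir))
```
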
### Data Import and Export

Comprehensive data import/export capabilities supporting various formats including Parquet, CSV, COCO datasets, and custom data ingestion pipelines.

```python { .api }
def from_parquet(url_or_bytes: Union[str, bytes]) -> ReadOnlyDataset: ...
def from_csv(url_or_bytes: Union[str, bytes]) -> ReadOnlyDataset: ...
def from_coco(images_directory: str, annotation_files: List[str], dest: str, dest_creds: Optional[Dict[str, str]] = None) -> Dataset: ...

class DatasetView:
    def to_csv(self, path: str) -> None: ...
```

[Data Import/Export](./data-import-export.md)

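A sketch of the ingestion helpers; all file paths are placeholders, and calling `to_csv` on the loaded dataset assumes read-only datasets expose the `DatasetView` export surface.

```python
import deeplake

# Columnar files load as read-only datasets
parquet_ds = deeplake.from_parquet("./data/records.parquet")
csv_ds = deeplake.from_csv("./data/records.csv")

# COCO-style ingestion materializes a new dataset at `dest`
coco_ds = deeplake.from_coco(
    images_directory="./coco/images",
    annotation_files=["./coco/annotations/instances_train.json"],
    dest="./coco_dataset",
)

# Export back to CSV (assumes the view export API is available here)
parquet_ds.to_csv("./exports/records.csv")
```
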
### Framework Integration

Seamless integration with PyTorch and TensorFlow for training and inference workflows with optimized data loading and transformation pipelines.

```python { .api }
class DatasetView:
    def pytorch(self, transform: Optional[Callable[[Any], Any]] = None) -> Any: ...
    def tensorflow(self) -> Any: ...
    def batches(self, batch_size: int = 1) -> Iterator[Dict[str, Any]]: ...
```

[Framework Integration](./framework-integration.md)

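A training-oriented sketch; the sample structure handed to `transform` (a dict keyed by column name) is an assumption, as is the column layout of each batch.

```python
import deeplake

ds = deeplake.open("./my_dataset")

# PyTorch: optional per-sample transform applied at load time
def to_pair(sample):
    # assumes samples arrive as dicts keyed by column name
    return sample["images"], sample["labels"]

loader = ds.pytorch(transform=to_pair)

# TensorFlow: hand the dataset to tf.data-style pipelines
tf_ds = ds.tensorflow()

# Framework-agnostic batching: dicts of column -> batch values
for batch in ds.batches(batch_size=32):
    print(len(batch["labels"]))
```
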
### Error Handling

Comprehensive exception handling for various failure scenarios including authentication, authorization, storage, dataset operations, and data validation with detailed error information for debugging and recovery.

```python { .api }
class AuthenticationError:
    """Authentication failed or credentials invalid."""

class AuthorizationError:
    """User lacks permissions for requested operation."""

class NotFoundError:
    """Requested dataset or resource not found."""

class StorageAccessDenied:
    """Access denied to storage location."""

class BranchExistsError:
    """Branch with given name already exists."""

class ColumnAlreadyExistsError:
    """Column with given name already exists."""
```

[Error Handling](./error-handling.md)

### Schema Templates

Pre-defined schema templates for common ML use cases including text embeddings, COCO datasets, and custom schema creation patterns.

```python { .api }
class TextEmbeddings:
    def __init__(self, embedding_size: int, quantize: bool = False): ...

class COCOImages:
    def __init__(self, embedding_size: int, quantize: bool = False, objects: bool = True, keypoints: bool = False, stuffs: bool = False): ...
```

[Schema Templates](./schema-templates.md)

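Templates plug into the `schema` parameter of `deeplake.create` shown under Dataset Management; the paths and embedding size here are placeholders.

```python
import deeplake
from deeplake.schemas import TextEmbeddings

# Create a vector-search-ready dataset from the template
ds = deeplake.create("./rag_store", schema=TextEmbeddings(embedding_size=768))

# Quantization trades some recall for a smaller index
ds_q = deeplake.create("./rag_store_q", schema=TextEmbeddings(embedding_size=768, quantize=True))
```
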
### Client and Configuration

Client management, telemetry, and configuration utilities for Deep Lake integration and monitoring.

```python { .api }
class Client:
    """Deep Lake client for dataset operations and authentication."""

class TelemetryClient:
    """Telemetry client for usage tracking and analytics."""

def client() -> Client:
    """Get current Deep Lake client instance."""

def telemetry_client() -> TelemetryClient:
    """Get current telemetry client instance."""

def disconnect() -> None:
    """Disconnect from Deep Lake services."""
```

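A minimal sketch of the module-level accessors above.

```python
import deeplake

# Handles to the active clients
c = deeplake.client()
t = deeplake.telemetry_client()

# Tear down connections, e.g. at process exit
deeplake.disconnect()
```
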
### Utilities and Helpers

Utility functions and helper classes for data generation, caching, and system optimization.

```python { .api }
class Random:
    """Random data generation utilities."""

def random() -> Random:
    """Get random data generator instance."""

def _create_global_cache() -> None:
    """Create global cache for performance optimization."""

def __prepare_atfork() -> None:
    """Prepare Deep Lake for fork-based multiprocessing."""
```

## Types

### Core Dataset Classes

```python { .api }
class Dataset:
    """Primary mutable dataset class for read-write operations."""
    name: str
    description: str
    metadata: Metadata
    schema: Schema
    version: Version
    history: History
    branches: Branches
    tags: Tags

class ReadOnlyDataset:
    """Read-only dataset access."""
    name: str
    description: str
    metadata: ReadOnlyMetadata
    schema: SchemaView
    version: Version
    history: History
    branches: BranchesView
    tags: TagsView

class DatasetView:
    """Query result view of dataset."""
    schema: SchemaView
```

### Schema Classes

```python { .api }
class Schema:
    """Dataset schema management."""
    columns: List[ColumnDefinition]

class ColumnDefinition:
    """Column schema information."""
    name: str
    dtype: Type
```

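Schema introspection with the classes above, assuming the dataset from Basic Usage.

```python
import deeplake

ds = deeplake.open("./my_dataset")

# Each ColumnDefinition carries a name and a dtype
for column in ds.schema.columns:
    print(f"{column.name}: {column.dtype}")
```
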
### Version Control Classes

```python { .api }
class Version:
    """Single version information."""
    id: str
    message: str
    timestamp: str
    client_timestamp: str

class Branch:
    """Dataset branch management."""
    id: str
    name: str
    timestamp: str
    base: str

class Tag:
    """Dataset tag management."""
    id: str
    name: str
    message: str
    version: str
    timestamp: str
```

### Async Classes

```python { .api }
class Future[T]:
    """Asynchronous operation result."""
    def result(self) -> T: ...
    def is_completed(self) -> bool: ...
    def cancel(self) -> bool: ...

class FutureVoid:
    """Asynchronous void operation."""
    def wait(self) -> None: ...
    def is_completed(self) -> bool: ...
    def cancel(self) -> bool: ...
```

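The stubs do not say which Deep Lake calls return futures, so these helpers only show the completion and cancellation pattern the interfaces define; `futures` stands in for the output of whatever async API produced them.

```python
def drain(futures):
    """Collect results from a list of Future[T] objects, blocking as needed."""
    return [f.result() for f in futures]

def cancel_pending(futures):
    """Cancel any future that has not completed yet."""
    for f in futures:
        if not f.is_completed():
            f.cancel()
```
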