# Deltalake

Native Delta Lake Python binding based on delta-rs, with Pandas integration. It provides high-performance operations on Delta Lake tables and integrates with the broader Python data ecosystem, supporting ACID transactions, time-travel queries, schema evolution, and multiple storage backends.

## Package Information

- **Package Name**: deltalake
- **Language**: Python
- **Installation**: `pip install deltalake`
- **Optional Dependencies**: `pip install deltalake[pandas,pyarrow]`

## Version Information

```python { .api }
__version__: str  # Python package version
def rust_core_version() -> str: ...  # version of the underlying Rust core
```

Access package version information to check compatibility and track installed versions.

## Core Imports

```python
from deltalake import DeltaTable
```

Complete import for all functionality:

```python
from deltalake import (
    DeltaTable,
    Metadata,
    write_deltalake,
    convert_to_deltalake,
    QueryBuilder,
    Schema,
    Field,
    DataType,
    WriterProperties,
    BloomFilterProperties,
    ColumnProperties,
    CommitProperties,
    PostCommitHookProperties,
    TableFeatures,
    Transaction,
    __version__,
    rust_core_version,
)
```

## Basic Usage

```python
from deltalake import DeltaTable, write_deltalake
import pandas as pd

# Reading a Delta table
dt = DeltaTable("path/to/delta-table")
df = dt.to_pandas()
print(f"Table is at version {dt.version()} with {len(dt.files())} files")

# Writing data to a Delta table
data = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})

write_deltalake("path/to/new-table", data)

# Updating records
dt = DeltaTable("path/to/new-table")
dt.update(
    predicate="age < 30",
    new_values={"name": "Updated Name"}
)

# Time travel
dt.load_as_version(0)  # load the first version
older_df = dt.to_pandas()
```

## Architecture

Delta Lake provides ACID transactions on top of object storage through a transaction log that tracks all changes to table metadata and data files. The deltalake package exposes this functionality through several key components:

- **DeltaTable**: Main interface for reading, writing, and managing Delta tables
- **Transaction System**: ACID guarantees with optimistic concurrency control
- **Schema Evolution**: Support for adding/modifying columns while maintaining compatibility
- **Time Travel**: Access to historical versions of data
- **Storage Backends**: Support for local filesystem, S3, Azure Blob Storage, and Google Cloud Storage

The Rust-based core (delta-rs) provides high-performance operations, while the Python binding offers seamless integration with pandas, PyArrow, and the broader Python data ecosystem.

## Capabilities

### Table Operations

Core table management including creation, reading, and metadata access. The DeltaTable class provides the primary interface for interacting with Delta Lake tables.

```python { .api }
class DeltaTable:
    def __init__(
        self,
        table_uri: str | Path,
        version: int | None = None,
        storage_options: dict[str, str] | None = None,
        without_files: bool = False,
        log_buffer_size: int | None = None
    ): ...

    @classmethod
    def create(
        cls,
        table_uri: str | Path,
        schema: Schema,
        mode: Literal["error", "append", "overwrite", "ignore"] = "error",
        partition_by: list[str] | str | None = None,
        storage_options: dict[str, str] | None = None
    ) -> DeltaTable: ...

    @staticmethod
    def is_deltatable(table_uri: str, storage_options: dict[str, str] | None = None) -> bool: ...
```

[Table Operations](./table-operations.md)

### Data Reading and Conversion

Converting Delta tables to various formats including pandas DataFrames, PyArrow tables, and streaming readers for efficient data processing.

```python { .api }
def to_pandas(
    self,
    columns: list[str] | None = None,
    filesystem: Any | None = None
) -> pd.DataFrame: ...

def to_pyarrow_table(
    self,
    columns: list[str] | None = None,
    filesystem: Any | None = None
) -> pyarrow.Table: ...

def to_pyarrow_dataset(
    self,
    partitions: list[tuple[str, str, Any]] | None = None,
    filesystem: Any | None = None
) -> pyarrow.dataset.Dataset: ...
```

[Data Reading](./data-reading.md)

### Writing and Data Modification

Functions for writing data to Delta tables and modifying existing records through update, delete, and merge operations.

```python { .api }
def write_deltalake(
    table_or_uri: str | Path | DeltaTable,
    data: Any,
    *,
    partition_by: list[str] | str | None = None,
    mode: Literal["error", "append", "overwrite", "ignore"] = "error",
    schema_mode: Literal["merge", "overwrite"] | None = None,
    storage_options: dict[str, str] | None = None,
    writer_properties: WriterProperties | None = None
) -> None: ...

def update(
    self,
    updates: dict[str, str] | None = None,
    new_values: dict[str, Any] | None = None,
    predicate: str | None = None,
    writer_properties: WriterProperties | None = None
) -> dict[str, Any]: ...

def delete(self, predicate: str | None = None) -> dict[str, Any]: ...
```

[Writing and Modification](./writing-modification.md)

### Schema Management

Schema definition, evolution, and type system for Delta Lake tables including field definitions and data types.

```python { .api }
class Schema:
    def __init__(self, fields: list[Field]): ...
    @property
    def fields(self) -> list[Field]: ...

class Field:
    def __init__(self, name: str, data_type: DataType, nullable: bool = True, metadata: dict | None = None): ...
    @property
    def name(self) -> str: ...
    @property
    def data_type(self) -> DataType: ...

# Data types: PrimitiveType, ArrayType, MapType, StructType
DataType = Union[PrimitiveType, ArrayType, MapType, StructType]
```

[Schema Management](./schema-management.md)

### Transaction Management

Transaction properties, commit configurations, and ACID transaction control for ensuring data consistency.

```python { .api }
class CommitProperties:
    def __init__(
        self,
        max_retry_commit_attempts: int | None = None,
        app_metadata: dict[str, Any] | None = None
    ): ...

class PostCommitHookProperties:
    def __init__(
        self,
        create_checkpoint: bool = True,
        cleanup_expired_logs: bool | None = None
    ): ...

class Transaction:
    def commit(
        self,
        actions: list[Any],
        commit_properties: CommitProperties | None = None,
        post_commit_hook_properties: PostCommitHookProperties | None = None
    ) -> int: ...
```

[Transaction Management](./transaction-management.md)

### Query Operations

SQL querying capabilities using Apache DataFusion integration for running analytical queries on Delta tables.

```python { .api }
class QueryBuilder:
    def __init__(self): ...
    def register(self, table_name: str, delta_table: DeltaTable) -> QueryBuilder: ...
    def execute(self, sql: str) -> RecordBatchReader: ...
```

[Query Operations](./query-operations.md)

### Table Maintenance

Operations for table optimization, vacuum cleanup, and checkpoint management to maintain table performance and storage efficiency.

```python { .api }
def vacuum(
    self,
    retention_hours: int | None = None,
    dry_run: bool = True,
    enforce_retention_duration: bool = True
) -> list[str]: ...

def create_checkpoint(self) -> None: ...
def cleanup_metadata(self) -> None: ...
def optimize(self) -> TableOptimizer: ...
```

[Table Maintenance](./table-maintenance.md)