# Deltalake

Native Delta Lake Python bindings based on delta-rs, with pandas integration. Provides high-performance operations on Delta Lake tables and integrates with the Python data ecosystem, enabling ACID transactions, time travel queries, schema evolution, and efficient data storage across multiple storage backends.

## Package Information

- **Package Name**: deltalake
- **Language**: Python
- **Installation**: `pip install deltalake`
- **Optional Dependencies**: `pip install deltalake[pandas,pyarrow]`

## Version Information

```python { .api }
__version__: str  # Python package version
def rust_core_version() -> str: ...  # Underlying Rust core version
```

Access package version information to check compatibility and track installed versions.
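
As a minimal sketch of such a compatibility check, the helpers below compare a version string against a required minimum. `version_tuple`, `is_at_least`, and `describe_versions` are hypothetical names introduced here, not part of deltalake. `describe_versions` accepts `rust_core_version` as either a plain string or a callable, on the assumption that its exact form may differ between releases.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading integer of each dotted component, e.g. '1.0.0rc1' -> (1, 0, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Return True when `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


def describe_versions() -> str:
    """Report the installed binding and Rust core versions (requires deltalake)."""
    import deltalake  # imported lazily so the helpers above work without it

    core = deltalake.rust_core_version
    core = core() if callable(core) else core  # tolerate attribute or function
    return f"deltalake {deltalake.__version__} (Rust core {core})"
```

For instance, `is_at_least(deltalake.__version__, "0.18.0")` could gate a code path that needs a newer release; the threshold is purely illustrative. Pre-release suffixes are ignored, so `1.0.0rc1` compares equal to `1.0.0`.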

## Core Imports

```python
from deltalake import DeltaTable
```

Complete import for all functionality:

```python
from deltalake import (
    DeltaTable,
    Metadata,
    write_deltalake,
    convert_to_deltalake,
    QueryBuilder,
    Schema,
    Field,
    DataType,
    WriterProperties,
    BloomFilterProperties,
    ColumnProperties,
    CommitProperties,
    PostCommitHookProperties,
    TableFeatures,
    Transaction,
    __version__,
    rust_core_version
)
```

## Basic Usage

```python
from deltalake import DeltaTable, write_deltalake
import pandas as pd

# Reading a Delta table
dt = DeltaTable("path/to/delta-table")
df = dt.to_pandas()
# version() is the current version number, not a count of versions
print(f"Table is at version {dt.version()} with {len(dt.files())} files")

# Writing data to a Delta table
data = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})

write_deltalake("path/to/new-table", data)

# Updating records
dt = DeltaTable("path/to/new-table")
dt.update(
    predicate="age < 30",
    new_values={"name": "Updated Name"}
)

# Time travel
dt.load_as_version(0)  # Load the first version (version numbers start at 0)
older_df = dt.to_pandas()
```

## Architecture

Delta Lake provides ACID transactions on top of object storage through a transaction log that tracks all changes to table metadata and data files. The deltalake package exposes this functionality through several key components:

- **DeltaTable**: Main interface for reading, writing, and managing Delta tables
- **Transaction System**: ACID guarantees with optimistic concurrency control
- **Schema Evolution**: Support for adding/modifying columns while maintaining compatibility
- **Time Travel**: Access to historical versions of data
- **Storage Backends**: Support for local filesystem, S3, Azure Blob Storage, and Google Cloud Storage

The Rust-based core (delta-rs) provides high-performance operations while the Python binding offers seamless integration with pandas, PyArrow, and the broader Python data ecosystem.
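
To make the transaction-log idea concrete, here is a toy in-memory model. It is a conceptual sketch only: `ToyLog` and `CommitConflict` are hypothetical names, and nothing here reflects how delta-rs actually stores or reads its log. It shows two properties described above: a commit succeeds only if no other writer got there first (optimistic concurrency), and any earlier version can be rebuilt by replaying the log (time travel).

```python
from __future__ import annotations

from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer already claimed the next version."""


@dataclass
class ToyLog:
    """In-memory stand-in for a table's ordered commit log."""

    commits: list[dict] = field(default_factory=list)

    def latest_version(self) -> int:
        return len(self.commits) - 1  # -1 means the table does not exist yet

    def commit(self, read_version: int, add=(), remove=()) -> int:
        # Optimistic concurrency: succeed only if nobody has committed
        # since this writer last read the table.
        if read_version != self.latest_version():
            raise CommitConflict(
                f"read version {read_version}, but log is at {self.latest_version()}"
            )
        self.commits.append({"add": list(add), "remove": list(remove)})
        return self.latest_version()

    def files_at(self, version: int | None = None) -> set[str]:
        # Time travel: replay add/remove actions up to the requested version.
        last = self.latest_version() if version is None else version
        live: set[str] = set()
        for entry in self.commits[: last + 1]:
            live |= set(entry["add"])
            live -= set(entry["remove"])
        return live


log = ToyLog()
v0 = log.commit(read_version=-1, add=["part-0.parquet"])
v1 = log.commit(read_version=v0, add=["part-1.parquet"], remove=["part-0.parquet"])
print(log.files_at(0))  # files visible at the first version
print(log.files_at())   # files visible at the latest version
```

A writer that still believes the table is at version 0 would get a `CommitConflict` here and would have to re-read the log before trying again.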

## Capabilities

### Table Operations

Core table management including creation, reading, and metadata access. The DeltaTable class provides the primary interface for interacting with Delta Lake tables.

```python { .api }
class DeltaTable:
    def __init__(
        self,
        table_uri: str | Path,
        version: int | None = None,
        storage_options: dict[str, str] | None = None,
        without_files: bool = False,
        log_buffer_size: int | None = None
    ): ...

    @classmethod
    def create(
        cls,
        table_uri: str | Path,
        schema: Schema,
        mode: Literal["error", "append", "overwrite", "ignore"] = "error",
        partition_by: list[str] | str | None = None,
        storage_options: dict[str, str] | None = None
    ) -> DeltaTable: ...

    @staticmethod
    def is_deltatable(table_uri: str, storage_options: dict[str, str] | None = None) -> bool: ...
```

[Table Operations](./table-operations.md)
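
One way these three entry points might be combined is an "open or create" helper, sketched below using only the signatures listed above. `open_or_create` is a hypothetical name; `table_cls` is expected to be `deltalake.DeltaTable` and is a parameter only so the sketch can be exercised without a real table.

```python
def open_or_create(table_cls, table_uri, schema, partition_by=None):
    """Open the table at `table_uri`, creating an empty one if none exists."""
    if table_cls.is_deltatable(table_uri):
        return table_cls(table_uri)
    # mode="error" makes create() fail instead of touching a table that
    # another writer created between the check above and this call.
    return table_cls.create(
        table_uri,
        schema=schema,
        mode="error",
        partition_by=partition_by,
    )
```

With the real library this would be called as `open_or_create(DeltaTable, "path/to/table", schema)`.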

### Data Reading and Conversion

Converting Delta tables to various formats including pandas DataFrames, PyArrow tables, and streaming readers for efficient data processing.

```python { .api }
def to_pandas(
    self,
    columns: list[str] | None = None,
    filesystem: Any | None = None
) -> pd.DataFrame: ...

def to_pyarrow_table(
    self,
    columns: list[str] | None = None,
    filesystem: Any | None = None
) -> pyarrow.Table: ...

def to_pyarrow_dataset(
    self,
    partitions: list[tuple[str, str, Any]] | None = None,
    filesystem: Any | None = None
) -> pyarrow.dataset.Dataset: ...
```

[Data Reading](./data-reading.md)
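
The `partitions` argument of `to_pyarrow_dataset` takes `(column, operator, value)` tuples. The sketch below builds equality filters and reads only the matching partitions. `equality_filters` and `read_partition` are hypothetical helpers, and converting every value to a string is an assumption about how partition values are compared. `Dataset.to_table(columns=...)` and `Table.to_pandas()` are PyArrow calls.

```python
def equality_filters(**partition_values) -> list[tuple[str, str, str]]:
    """Build filters such as [("year", "=", "2024")] from keyword arguments."""
    # Assumption: partition values are matched as strings, so convert up front.
    return [(column, "=", str(value)) for column, value in partition_values.items()]


def read_partition(dt, columns=None, **partition_values):
    """Read only the matching partitions of `dt` (a DeltaTable) into pandas."""
    dataset = dt.to_pyarrow_dataset(partitions=equality_filters(**partition_values))
    # `columns` limits which columns PyArrow materializes.
    return dataset.to_table(columns=columns).to_pandas()
```

A call such as `read_partition(dt, columns=["id", "name"], year=2024)` assumes the table is partitioned by a `year` column, which is illustrative here.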

### Writing and Data Modification

Functions for writing data to Delta tables and modifying existing records through update, delete, and merge operations.

```python { .api }
def write_deltalake(
    table_or_uri: str | Path | DeltaTable,
    data: Any,
    *,
    partition_by: list[str] | str | None = None,
    mode: Literal["error", "append", "overwrite", "ignore"] = "error",
    schema_mode: Literal["merge", "overwrite"] | None = None,
    storage_options: dict[str, str] | None = None,
    writer_properties: WriterProperties | None = None
) -> None: ...

def update(
    self,
    updates: dict[str, str] | None = None,
    new_values: dict[str, Any] | None = None,
    predicate: str | None = None,
    writer_properties: WriterProperties | None = None
) -> dict[str, Any]: ...

def delete(self, predicate: str | None = None) -> dict[str, Any]: ...
```

[Writing and Modification](./writing-modification.md)
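
`update` accepts two kinds of assignment: `updates` maps columns to SQL expression strings, while `new_values` maps columns to plain Python values. The sketch below uses both, followed by a `delete`. `age_up_and_prune` is a hypothetical helper, and the `name` and `age` columns are assumed to exist, as in the Basic Usage table.

```python
def age_up_and_prune(dt, min_age: int) -> list[dict]:
    """Run two updates and one delete on `dt` (a DeltaTable), collecting the results."""
    results = []
    # `updates`: the right-hand side is a SQL expression evaluated per row.
    results.append(dt.update(predicate="age IS NOT NULL", updates={"age": "age + 1"}))
    # `new_values`: the right-hand side is a literal Python value.
    results.append(dt.update(predicate="name = 'Bob'", new_values={"name": "Robert"}))
    # int() keeps the interpolated predicate strictly numeric.
    results.append(dt.delete(predicate=f"age < {int(min_age)}"))
    return results
```

Each call returns the dictionary described in the signatures above, so the helper collects them for logging or inspection.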

### Schema Management

Schema definition, evolution, and type system for Delta Lake tables including field definitions and data types.

```python { .api }
class Schema:
    def __init__(self, fields: list[Field]): ...
    @property
    def fields(self) -> list[Field]: ...

class Field:
    def __init__(self, name: str, data_type: DataType, nullable: bool = True, metadata: dict | None = None): ...
    @property
    def name(self) -> str: ...
    @property
    def data_type(self) -> DataType: ...

# Data types: PrimitiveType, ArrayType, MapType, StructType
DataType = Union[PrimitiveType, ArrayType, MapType, StructType]
```

[Schema Management](./schema-management.md)
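
A struct of named, typed, nullable fields can also be written down in the JSON form that the Delta protocol uses for schemas. The sketch below builds that JSON for flat schemas of primitive types using only the standard library. `schema_json` and `PRIMITIVE_TYPES` are hypothetical names, and the type list covers only the common unparameterized primitives.

```python
import json

# Common unparameterized primitive type names from the Delta protocol.
PRIMITIVE_TYPES = {
    "string", "long", "integer", "short", "byte", "float",
    "double", "boolean", "binary", "date", "timestamp",
}


def schema_json(columns: dict[str, str], required: tuple[str, ...] = ()) -> str:
    """Serialize a flat column -> primitive type mapping as a Delta struct schema."""
    fields = []
    for name, type_name in columns.items():
        if type_name not in PRIMITIVE_TYPES:
            raise ValueError(f"unsupported primitive type: {type_name!r}")
        fields.append({
            "name": name,
            "type": type_name,
            "nullable": name not in required,  # required columns are non-nullable
            "metadata": {},
        })
    return json.dumps({"type": "struct", "fields": fields})


print(schema_json({"id": "long", "name": "string"}, required=("id",)))
```

Assuming the installed release exposes a JSON constructor such as `Schema.from_json`, the resulting string could be turned into a `Schema`; otherwise the same fields can be built directly with `Field` objects.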

### Transaction Management

Transaction properties, commit configurations, and ACID transaction control for ensuring data consistency.

```python { .api }
class CommitProperties:
    def __init__(
        self,
        max_retry_commit_attempts: int | None = None,
        app_metadata: dict[str, Any] | None = None
    ): ...

class PostCommitHookProperties:
    def __init__(
        self,
        create_checkpoint: bool = True,
        cleanup_expired_logs: bool | None = None
    ): ...

class Transaction:
    def commit(
        self,
        actions: list[Any],
        commit_properties: CommitProperties | None = None,
        post_commit_hook_properties: PostCommitHookProperties | None = None
    ) -> int: ...
```

[Transaction Management](./transaction-management.md)
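
To illustrate what a commit retry limit bounds, the sketch below retries a commit callback until it succeeds or the attempts run out. It is a conceptual illustration, not deltalake's implementation; `Conflict`, `commit_with_retries`, and `flaky_commit` are hypothetical names.

```python
class Conflict(Exception):
    """Raised by a commit attempt that lost the race to another writer."""


def commit_with_retries(try_commit, max_attempts: int = 3) -> int:
    """Call `try_commit(attempt)` until it returns a version or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return try_commit(attempt)
        except Conflict as error:
            # A real writer would re-read the log here before retrying.
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def flaky_commit(attempt: int) -> int:
    # Pretend the first two attempts collide with another writer.
    if attempt < 3:
        raise Conflict("another writer committed first")
    return 7  # made-up version number for the demonstration


print(commit_with_retries(flaky_commit, max_attempts=3))
```

With `max_attempts=2` the same callback exhausts its retries and the helper raises `RuntimeError`, which is the trade-off a retry limit controls: bounded latency in exchange for occasional failed writes under contention.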

### Query Operations

SQL querying capabilities using Apache DataFusion integration for running analytical queries on Delta tables.

```python { .api }
class QueryBuilder:
    def __init__(self): ...
    def register(self, table_name: str, delta_table: DeltaTable) -> QueryBuilder: ...
    def execute(self, sql: str) -> RecordBatchReader: ...
```

[Query Operations](./query-operations.md)
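
Because `register` returns the builder, several tables can be registered before a single `execute` call. The sketch below wraps that pattern; `run_sql` is a hypothetical helper that relies only on the two methods listed above and returns whatever `execute` returns.

```python
def run_sql(builder, tables: dict, sql: str):
    """Register each DeltaTable under its name, then execute `sql`."""
    for name, table in tables.items():
        # Rebind in case register() returns a new builder rather than itself.
        builder = builder.register(name, table)
    return builder.execute(sql)
```

With the real library this might be called as `run_sql(QueryBuilder(), {"people": dt}, "SELECT name FROM people WHERE age >= 30")`, where the `people` table name and its columns are illustrative. How the result is materialized depends on the reader type returned by the installed release.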

### Table Maintenance

Operations for table optimization, vacuum cleanup, and checkpoint management to maintain table performance and storage efficiency.

```python { .api }
def vacuum(
    self,
    retention_hours: int | None = None,
    dry_run: bool = True,
    enforce_retention_duration: bool = True
) -> list[str]: ...

def create_checkpoint(self) -> None: ...
def cleanup_metadata(self) -> None: ...
def optimize(self) -> TableOptimizer: ...
```

[Table Maintenance](./table-maintenance.md)
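
Since `vacuum` defaults to `dry_run=True`, a cautious workflow can preview the files first and delete only when the list looks reasonable. The sketch below follows that pattern; `guarded_vacuum`, the 168-hour retention, and the `max_deletions` threshold are illustrative assumptions rather than library defaults.

```python
def guarded_vacuum(dt, retention_hours: int = 168, max_deletions: int = 1000) -> list[str]:
    """Preview a vacuum on `dt` (a DeltaTable) and run it only within a safety limit."""
    # Dry run: returns the files that would be removed without deleting them.
    candidates = dt.vacuum(retention_hours=retention_hours, dry_run=True)
    if len(candidates) > max_deletions:
        raise RuntimeError(
            f"{len(candidates)} files would be deleted; limit is {max_deletions}"
        )
    if not candidates:
        return []  # nothing to clean up, so skip the second call
    return dt.vacuum(retention_hours=retention_hours, dry_run=False)
```

Files removed by a vacuum can no longer be reached through time travel, so keeping the preview step separate from the deletion is a deliberate choice here.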