# Time Travel and History

Delta Lake tables provide version control capabilities, including time travel queries, table restoration, and full history exploration. Both version-based and timestamp-based addressing are supported, enabling data lineage analysis and recovery.

## Capabilities

### Table History

Explore the complete transaction history of a Delta table.

```python { .api }
class DeltaTable:
    def history(self, limit: Optional[int] = None) -> DataFrame:
        """
        Get table commit history in reverse chronological order.

        Parameters:
        - limit: Optional maximum number of commits to return

        Returns:
        DataFrame with commit history including version, timestamp, operation, etc.
        """
```

```scala { .api }
class DeltaTable {
  def history(): DataFrame
  def history(limit: Int): DataFrame
}
```

The history DataFrame includes the following columns:

- `version`: Table version number
- `timestamp`: Commit timestamp
- `operation`: Operation type (WRITE, UPDATE, DELETE, MERGE, etc.)
- `operationParameters`: Operation-specific parameters
- `readVersion`: Table version read during the operation
- `isBlindAppend`: Whether the operation was append-only
- `operationMetrics`: Metrics such as the number of files and rows written
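
Outside of Spark, the shape of these history rows can be illustrated with plain dictionaries — a minimal sketch in which the sample rows and the helper are invented for illustration only:

```python
# Minimal sketch: history rows as plain dicts, mirroring the columns above.
# The sample data and helper are illustrative only, not a Delta Lake API.
from typing import Dict, List

history_rows: List[Dict] = [
    {"version": 2, "operation": "MERGE", "operationMetrics": {"numOutputRows": "50"}},
    {"version": 1, "operation": "WRITE", "operationMetrics": {"numOutputRows": "200"}},
    {"version": 0, "operation": "WRITE", "operationMetrics": {"numOutputRows": "100"}},
]

def total_rows_written(rows: List[Dict], operation: str) -> int:
    """Sum numOutputRows across commits of a given operation type."""
    return sum(
        int(r["operationMetrics"].get("numOutputRows", 0))
        for r in rows
        if r["operation"] == operation
    )

print(total_rows_written(history_rows, "WRITE"))  # 300
```

Note that metric values arrive as strings in `operationMetrics`, so they need casting before arithmetic.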

### Table Restoration

Restore a table to a previous version or timestamp.

```python { .api }
class DeltaTable:
    def restoreToVersion(self, version: int) -> DataFrame:
        """
        Restore table to a specific version number.

        Parameters:
        - version: Target version number to restore to

        Returns:
        DataFrame with restoration metrics
        """

    def restoreToTimestamp(self, timestamp: str) -> DataFrame:
        """
        Restore table to a specific timestamp.

        Parameters:
        - timestamp: Target timestamp in format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'

        Returns:
        DataFrame with restoration metrics
        """
```

```scala { .api }
class DeltaTable {
  def restoreToVersion(version: Long): DataFrame
  def restoreToTimestamp(timestamp: String): DataFrame
}
```
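
The two accepted timestamp formats correspond to the `datetime.strptime` patterns below. A minimal, Spark-free sketch for validating input before calling `restoreToTimestamp` (the helper name is hypothetical, not part of the Delta API):

```python
# Hypothetical helper: validate a restore timestamp against the two
# accepted formats ('yyyy-MM-dd' and 'yyyy-MM-dd HH:mm:ss') before
# passing it to restoreToTimestamp().
from datetime import datetime

def is_valid_restore_timestamp(ts: str) -> bool:
    for fmt in ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S"):
        try:
            datetime.strptime(ts, fmt)
            return True
        except ValueError:
            continue
    return False

print(is_valid_restore_timestamp("2023-12-01"))           # True
print(is_valid_restore_timestamp("2023-12-01 09:00:00"))  # True
print(is_valid_restore_timestamp("09:00:00"))             # False
```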

### Time Travel Queries

Query historical versions of a table using DataFrame reader options or SQL syntax.

```python
# Time travel with the DataFrame API
df = spark.read.format("delta").option("versionAsOf", 5).load("/path/to/table")
df = spark.read.format("delta").option("timestampAsOf", "2023-01-01").load("/path/to/table")

# Time travel with SQL
spark.sql("SELECT * FROM delta.`/path/to/table` VERSION AS OF 5")
spark.sql("SELECT * FROM delta.`/path/to/table` TIMESTAMP AS OF '2023-01-01'")
```

### Clone Operations

Create table clones at specific versions or timestamps.

```python { .api }
class DeltaTable:
    def clone(
        self,
        target: str,
        is_shallow: bool,
        replace: bool = False,
        properties: Optional[Dict[str, str]] = None
    ) -> DeltaTable:
        """
        Clone table to a destination.

        Parameters:
        - target: Target path or table name for the clone
        - is_shallow: True for shallow clone, False for deep clone
        - replace: Whether to replace an existing target
        - properties: Optional table properties to override

        Returns:
        DeltaTable instance for the cloned table
        """

    def cloneAtVersion(
        self,
        version: int,
        target: str,
        is_shallow: bool,
        replace: bool = False,
        properties: Optional[Dict[str, str]] = None
    ) -> DeltaTable:
        """
        Clone table at a specific version.

        Parameters:
        - version: Source version to clone from
        - target: Target path or table name
        - is_shallow: Clone type (shallow vs deep)
        - replace: Whether to replace an existing target
        - properties: Optional table properties to override

        Returns:
        DeltaTable instance for the cloned table
        """

    def cloneAtTimestamp(
        self,
        timestamp: str,
        target: str,
        is_shallow: bool,
        replace: bool = False,
        properties: Optional[Dict[str, str]] = None
    ) -> DeltaTable:
        """
        Clone table at a specific timestamp.

        Parameters:
        - timestamp: Source timestamp to clone from
        - target: Target path or table name
        - is_shallow: Clone type (shallow vs deep)
        - replace: Whether to replace an existing target
        - properties: Optional table properties to override

        Returns:
        DeltaTable instance for the cloned table
        """
```

```scala { .api }
class DeltaTable {
  def clone(target: String, isShallow: Boolean): DeltaTable
  def clone(target: String, isShallow: Boolean, replace: Boolean): DeltaTable
  def clone(
      target: String,
      isShallow: Boolean,
      replace: Boolean,
      properties: Map[String, String]
  ): DeltaTable

  def cloneAtVersion(version: Long, target: String, isShallow: Boolean): DeltaTable
  def cloneAtVersion(
      version: Long,
      target: String,
      isShallow: Boolean,
      replace: Boolean
  ): DeltaTable
  def cloneAtVersion(
      version: Long,
      target: String,
      isShallow: Boolean,
      replace: Boolean,
      properties: Map[String, String]
  ): DeltaTable

  def cloneAtTimestamp(timestamp: String, target: String, isShallow: Boolean): DeltaTable
  def cloneAtTimestamp(
      timestamp: String,
      target: String,
      isShallow: Boolean,
      replace: Boolean
  ): DeltaTable
  def cloneAtTimestamp(
      timestamp: String,
      target: String,
      isShallow: Boolean,
      replace: Boolean,
      properties: Map[String, String]
  ): DeltaTable
}
```

## Usage Examples

### Exploring Table History

```python
from pyspark.sql.functions import col

# Get the full history
history_df = delta_table.history()
history_df.select("version", "timestamp", "operation", "operationParameters").show()

# Get the last 10 commits
recent_history = delta_table.history(10)
recent_history.show()

# Analyze specific operations
history_df.filter(col("operation") == "MERGE").show()
```

### Time Travel Queries

```python
# Query the table as of version 5
old_df = spark.read.format("delta").option("versionAsOf", 5).load("/path/to/table")

# Query the table state as of a given date
yesterday_df = spark.read.format("delta").option("timestampAsOf", "2023-12-01").load("/path/to/table")

# Compare current vs. historical row counts
current_count = delta_table.toDF().count()
historical_count = spark.read.format("delta").option("versionAsOf", 0).load("/path/to/table").count()
print(f"Rows added: {current_count - historical_count}")
```

### Table Restoration

```python
# Restore to a specific version (e.g., before bad data was written)
restore_metrics = delta_table.restoreToVersion(10)
restore_metrics.show()

# Restore to a timestamp (e.g., the state from this morning)
restore_metrics = delta_table.restoreToTimestamp("2023-12-01 09:00:00")
restore_metrics.show()
```

### Table Cloning

```python
# Create a shallow clone for testing
test_table = delta_table.clone(
    target="/path/to/test/table",
    is_shallow=True,
    replace=True
)

# Clone a specific version for analysis
analysis_table = delta_table.cloneAtVersion(
    version=5,
    target="analysis_db.temp_table",
    is_shallow=False,
    properties={"owner": "data_team"}
)

# Clone yesterday's state for comparison
comparison_table = delta_table.cloneAtTimestamp(
    timestamp="2023-12-01",
    target="/path/to/comparison",
    is_shallow=True
)
```

### Data Lineage Analysis

```python
from pyspark.sql.functions import col

# Track changes over time
history = delta_table.history()

# Find when the schema changed (e.g., a column was added)
schema_changes = history.filter(
    col("operationParameters.newSchema").isNotNull()
).select("version", "timestamp", "operationParameters.newSchema")

# Analyze write patterns
write_operations = history.filter(col("operation").isin(["WRITE", "APPEND"]))
write_operations.select(
    "version",
    "timestamp",
    col("operationMetrics.numFiles").alias("files_written"),
    col("operationMetrics.numOutputRows").alias("rows_written")
).show()
```

## Clone Types

### Shallow Clone

- Copies only metadata and references to the source's data files
- Fast and storage-efficient
- Changes to the source's data files can affect the clone
- Ideal for testing and experimentation

### Deep Clone

- Copies both metadata and data files
- Independent of the source table
- Requires more storage and time to create
- Ideal for production backups and branching
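
The distinction can be sketched without Spark: a shallow clone keeps references to the source's data files, while a deep clone copies them. A purely illustrative model follows; none of these names are Delta Lake APIs.

```python
# Purely illustrative model of shallow vs. deep clone semantics;
# none of these names are Delta Lake APIs.
import copy

source = {"metadata": {"name": "events"}, "data_files": ["f1.parquet", "f2.parquet"]}

# Shallow clone: new metadata, but the SAME data-file references.
shallow = {"metadata": dict(source["metadata"]), "data_files": source["data_files"]}

# Deep clone: independent copies of metadata AND data files.
deep = copy.deepcopy(source)

# A change to the source's data files is visible through the shallow clone...
source["data_files"].append("f3.parquet")
print("f3.parquet" in shallow["data_files"])  # True
# ...but not through the deep clone.
print("f3.parquet" in deep["data_files"])     # False
```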

## Time Travel Limitations

- History retention is controlled by `delta.logRetentionDuration` (default: 30 days)
- Data files are retained according to `delta.deletedFileRetentionDuration` (default: 7 days)
- Time travel cannot reach versions beyond the available transaction logs
- Vacuum operations may remove data files needed for time travel
- Querying very old versions can be slow due to metadata reconstruction overhead
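
The retention window above can be checked with plain date arithmetic. A minimal sketch with a hypothetical helper (actual reachability also depends on vacuum activity and checkpoint state):

```python
# Hypothetical helper: check whether a commit timestamp is likely still
# reachable by time travel, given a retention window (default mirrors
# delta.logRetentionDuration's 30 days). Actual availability also depends
# on vacuum activity and checkpoint state.
from datetime import datetime, timedelta

def within_retention(commit_ts: datetime, now: datetime,
                     retention: timedelta = timedelta(days=30)) -> bool:
    """True if the commit is newer than the retention cutoff."""
    return now - commit_ts <= retention

now = datetime(2023, 12, 1)
print(within_retention(datetime(2023, 11, 15), now))  # True  (16 days old)
print(within_retention(datetime(2023, 10, 1), now))   # False (61 days old)
```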