# Time Travel and History

Version control capabilities for Delta Lake tables including time travel queries, table restoration, and comprehensive history exploration. Supports both version-based and timestamp-based operations for data lineage and recovery.

## Capabilities

### Table History

Explore the complete transaction history of Delta tables.

```python { .api }
class DeltaTable:
    def history(self, limit: Optional[int] = None) -> DataFrame:
        """
        Get table commit history in reverse chronological order.

        Parameters:
        - limit: Optional maximum number of commits to return

        Returns:
        DataFrame with commit history including version, timestamp, operation, etc.
        """
```

```scala { .api }
class DeltaTable {
  def history(): DataFrame
  def history(limit: Int): DataFrame
}
```

History DataFrame contains columns:
- `version`: Table version number
- `timestamp`: Commit timestamp
- `operation`: Operation type (WRITE, UPDATE, DELETE, MERGE, etc.)
- `operationParameters`: Operation-specific parameters
- `readVersion`: Version read during operation
- `isBlindAppend`: Whether operation was append-only
- `operationMetrics`: Metrics like number of files, rows, etc.
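
As a rough illustration of how these columns can be used, the sketch below tallies commits by `operation` and counts blind appends. It assumes the history rows have already been collected into plain dictionaries (for example with `[r.asDict() for r in delta_table.history().collect()]`). The helper name `summarize_history` and the sample rows are hypothetical.

```python
from collections import Counter


def summarize_history(rows):
    """Count commits per operation and the number of blind appends."""
    per_operation = Counter(row["operation"] for row in rows)
    blind_appends = sum(1 for row in rows if row.get("isBlindAppend"))
    return per_operation, blind_appends


# Hypothetical rows shaped like the history DataFrame (newest first).
sample_rows = [
    {"version": 2, "operation": "MERGE", "isBlindAppend": False},
    {"version": 1, "operation": "WRITE", "isBlindAppend": True},
    {"version": 0, "operation": "WRITE", "isBlindAppend": True},
]

per_operation, blind_appends = summarize_history(sample_rows)
print(dict(per_operation), blind_appends)
```

Collecting to the driver is reasonable here because the history is usually small; for very long histories, pass a `limit` to `history()` first.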

### Table Restoration

Restore tables to previous versions or timestamps.

```python { .api }
class DeltaTable:
    def restoreToVersion(self, version: int) -> DataFrame:
        """
        Restore table to specific version number.

        Parameters:
        - version: Target version number to restore to

        Returns:
        DataFrame with restoration metrics
        """

    def restoreToTimestamp(self, timestamp: str) -> DataFrame:
        """
        Restore table to specific timestamp.

        Parameters:
        - timestamp: Target timestamp in format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'

        Returns:
        DataFrame with restoration metrics
        """
```

```scala { .api }
class DeltaTable {
  def restoreToVersion(version: Long): DataFrame
  def restoreToTimestamp(timestamp: String): DataFrame
}
```
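
Because `restoreToTimestamp` takes a string in one of two formats, it can help to normalize the value before calling it. The following is a minimal sketch, not part of the Delta Lake API: `normalize_restore_timestamp` is a hypothetical helper, and it assumes that a date-only value is meant as midnight of that day.

```python
from datetime import date, datetime

# The two string formats listed for restoreToTimestamp, as strptime patterns.
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def normalize_restore_timestamp(value):
    """Return value as a 'yyyy-MM-dd HH:mm:ss' string, or raise ValueError."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")
    if isinstance(value, date):
        return value.strftime("%Y-%m-%d 00:00:00")
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unsupported timestamp format: {value!r}")


print(normalize_restore_timestamp("2023-12-01"))
print(normalize_restore_timestamp(datetime(2023, 12, 1, 9, 0, 0)))
```

The normalized string could then be passed on, for example `delta_table.restoreToTimestamp(normalize_restore_timestamp(ts))`.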

### Time Travel Queries

Query historical versions of tables using the DataFrame reader options or SQL syntax.

```python
# Time travel with DataFrame API
df = spark.read.format("delta").option("versionAsOf", 5).load("/path/to/table")
df = spark.read.format("delta").option("timestampAsOf", "2023-01-01").load("/path/to/table")

# Time travel with SQL
spark.sql("SELECT * FROM delta.`/path/to/table` VERSION AS OF 5")
spark.sql("SELECT * FROM delta.`/path/to/table` TIMESTAMP AS OF '2023-01-01'")
```
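
When the version or timestamp is only known at run time, the SQL form has to be assembled as a string. The sketch below shows one way to do that for path-based tables. `time_travel_sql` is a hypothetical helper, and it performs no escaping, so it is only suitable for trusted inputs.

```python
def time_travel_sql(path, version=None, timestamp=None):
    """Build a SELECT over a path-based Delta table at one version or timestamp."""
    # Exactly one of the two selectors must be supplied.
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        clause = f"VERSION AS OF {int(version)}"
    else:
        clause = f"TIMESTAMP AS OF '{timestamp}'"
    return f"SELECT * FROM delta.`{path}` {clause}"


print(time_travel_sql("/path/to/table", version=5))
print(time_travel_sql("/path/to/table", timestamp="2023-01-01"))
```

The resulting string can be handed to `spark.sql(...)` as in the examples above.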

### Clone Operations

Create table clones at specific versions or timestamps.

```python { .api }
class DeltaTable:
    def clone(
        self,
        target: str,
        is_shallow: bool,
        replace: bool = False,
        properties: Optional[Dict[str, str]] = None
    ) -> DeltaTable:
        """
        Clone table to destination.

        Parameters:
        - target: Target path or table name for clone
        - is_shallow: True for shallow clone, False for deep clone
        - replace: Whether to replace existing target
        - properties: Optional table properties to override

        Returns:
        DeltaTable instance for cloned table
        """

    def cloneAtVersion(
        self,
        version: int,
        target: str,
        is_shallow: bool,
        replace: bool = False,
        properties: Optional[Dict[str, str]] = None
    ) -> DeltaTable:
        """
        Clone table at specific version.

        Parameters:
        - version: Source version to clone from
        - target: Target path or table name
        - is_shallow: Clone type (shallow vs deep)
        - replace: Whether to replace existing target
        - properties: Optional table properties to override

        Returns:
        DeltaTable instance for cloned table
        """

    def cloneAtTimestamp(
        self,
        timestamp: str,
        target: str,
        is_shallow: bool,
        replace: bool = False,
        properties: Optional[Dict[str, str]] = None
    ) -> DeltaTable:
        """
        Clone table at specific timestamp.

        Parameters:
        - timestamp: Source timestamp to clone from
        - target: Target path or table name
        - is_shallow: Clone type (shallow vs deep)
        - replace: Whether to replace existing target
        - properties: Optional table properties to override

        Returns:
        DeltaTable instance for cloned table
        """
```

```scala { .api }
class DeltaTable {
  def clone(target: String, isShallow: Boolean): DeltaTable
  def clone(target: String, isShallow: Boolean, replace: Boolean): DeltaTable
  def clone(
    target: String,
    isShallow: Boolean,
    replace: Boolean,
    properties: Map[String, String]
  ): DeltaTable

  def cloneAtVersion(version: Long, target: String, isShallow: Boolean): DeltaTable
  def cloneAtVersion(
    version: Long,
    target: String,
    isShallow: Boolean,
    replace: Boolean
  ): DeltaTable
  def cloneAtVersion(
    version: Long,
    target: String,
    isShallow: Boolean,
    replace: Boolean,
    properties: Map[String, String]
  ): DeltaTable

  def cloneAtTimestamp(timestamp: String, target: String, isShallow: Boolean): DeltaTable
  def cloneAtTimestamp(
    timestamp: String,
    target: String,
    isShallow: Boolean,
    replace: Boolean
  ): DeltaTable
  def cloneAtTimestamp(
    timestamp: String,
    target: String,
    isShallow: Boolean,
    replace: Boolean,
    properties: Map[String, String]
  ): DeltaTable
}
```

## Usage Examples

### Exploring Table History

```python
from pyspark.sql.functions import col

# Get full history
history_df = delta_table.history()
history_df.select("version", "timestamp", "operation", "operationParameters").show()

# Get last 10 commits
recent_history = delta_table.history(10)
recent_history.show()

# Analyze specific operations
history_df.filter(col("operation") == "MERGE").show()
```

### Time Travel Queries

```python
# Query the table as of version 5 (an absolute version number, not "5 versions ago")
old_df = spark.read.format("delta").option("versionAsOf", 5).load("/path/to/table")

# Query the table state as of a given date (e.g., yesterday's date)
yesterday_df = spark.read.format("delta").option("timestampAsOf", "2023-12-01").load("/path/to/table")

# Compare current vs historical data (net change in row count since version 0)
current_count = delta_table.toDF().count()
historical_count = spark.read.format("delta").option("versionAsOf", 0).load("/path/to/table").count()
print(f"Net row change: {current_count - historical_count}")
```
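
To build intuition for how a timestamp maps to a version, the sketch below models the usual rule: a timestamp selects the latest commit made at or before it. This is a simplified pure-Python model and not Delta Lake's implementation, which applies its own validation (for example for timestamps outside the available history). The function `version_as_of` and the commit list are hypothetical.

```python
from datetime import datetime


def version_as_of(commits, timestamp):
    """Return the latest version committed at or before timestamp."""
    eligible = [version for version, committed_at in commits if committed_at <= timestamp]
    if not eligible:
        raise ValueError("timestamp is earlier than the first available commit")
    return max(eligible)


# Hypothetical (version, commit time) pairs, as could be read from history().
commits = [
    (0, datetime(2023, 11, 29, 8, 0)),
    (1, datetime(2023, 11, 30, 12, 0)),
    (2, datetime(2023, 12, 1, 9, 30)),
]

# Midnight on 2023-12-01 falls after version 1 and before version 2.
print(version_as_of(commits, datetime(2023, 12, 1)))  # → 1
```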

### Table Restoration

```python
# Restore to specific version (e.g., before bad data was written)
restore_metrics = delta_table.restoreToVersion(10)
restore_metrics.show()

# Restore to timestamp (e.g., state from this morning)
restore_metrics = delta_table.restoreToTimestamp("2023-12-01 09:00:00")
restore_metrics.show()
```

### Table Cloning

```python
# Create shallow clone for testing
test_table = delta_table.clone(
    target="/path/to/test/table",
    is_shallow=True,
    replace=True
)

# Clone specific version for analysis
analysis_table = delta_table.cloneAtVersion(
    version=5,
    target="analysis_db.temp_table",
    is_shallow=False,
    properties={"owner": "data_team"}
)

# Clone yesterday's state for comparison
comparison_table = delta_table.cloneAtTimestamp(
    timestamp="2023-12-01",
    target="/path/to/comparison",
    is_shallow=True
)
```

### Data Lineage Analysis

```python
from pyspark.sql.functions import col

# Track changes over time
history = delta_table.history()

# Find commits whose operation parameters record a new schema
schema_changes = history.filter(
    col("operationParameters.newSchema").isNotNull()
).select("version", "timestamp", "operationParameters.newSchema")

# Analyze write patterns
write_operations = history.filter(col("operation").isin(["WRITE", "APPEND"]))
write_operations.select(
    "version",
    "timestamp",
    col("operationMetrics.numFiles").alias("files_written"),
    col("operationMetrics.numOutputRows").alias("rows_written")
).show()
```
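
One detail worth keeping in mind when aggregating metrics: `operationMetrics` is a string-to-string map, so values need converting before arithmetic. The sketch below sums one metric across commits on the driver side. It assumes rows were collected into dictionaries; `total_metric` and the sample rows are hypothetical.

```python
def total_metric(rows, metric):
    """Sum one operationMetrics entry across rows, skipping commits that lack it."""
    total = 0
    for row in rows:
        metrics = row.get("operationMetrics") or {}
        if metric in metrics:
            # Metric values are stored as strings, so convert before adding.
            total += int(metrics[metric])
    return total


# Hypothetical rows shaped like collected history entries.
sample_rows = [
    {"version": 2, "operation": "WRITE",
     "operationMetrics": {"numFiles": "4", "numOutputRows": "1000"}},
    {"version": 1, "operation": "DELETE",
     "operationMetrics": {"numDeletedRows": "10"}},
    {"version": 0, "operation": "WRITE",
     "operationMetrics": {"numFiles": "2", "numOutputRows": "500"}},
]

print(total_metric(sample_rows, "numOutputRows"))  # → 1500
```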

## Clone Types

### Shallow Clone

- Copies only metadata and references to data files
- Fast and storage-efficient
- Depends on the source table's data files: if they are removed (for example by `VACUUM` on the source), the clone can no longer read them
- Ideal for testing and experimentation

### Deep Clone

- Copies both metadata and data files
- Independent of source table
- Requires more storage and time
- Ideal for production backups and branching

## Time Travel Limitations

- History retention is controlled by `delta.logRetentionDuration` (default: 30 days)
- Data files are retained based on `delta.deletedFileRetentionDuration` (default: 7 days)
- Cannot time travel beyond the available transaction logs
- `VACUUM` may remove data files that older versions need for time travel
- Reading very old versions can be slower due to metadata overhead
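
The two retention settings above can be combined into a conservative rule of thumb: a version younger than both windows should still be reachable. The sketch below encodes that check using the default durations listed above. It is only an estimate under those assumptions, since older versions often remain readable until `VACUUM` actually runs, and `safely_within_retention` is a hypothetical helper.

```python
from datetime import datetime, timedelta

LOG_RETENTION = timedelta(days=30)          # delta.logRetentionDuration default
DELETED_FILE_RETENTION = timedelta(days=7)  # delta.deletedFileRetentionDuration default


def safely_within_retention(commit_time, now,
                            log_retention=LOG_RETENTION,
                            file_retention=DELETED_FILE_RETENTION):
    """Conservative check: True if the commit is younger than both windows."""
    age = now - commit_time
    return age <= min(log_retention, file_retention)


now = datetime(2023, 12, 10)
print(safely_within_retention(datetime(2023, 12, 5), now))   # → True (5 days old)
print(safely_within_retention(datetime(2023, 11, 20), now))  # → False (20 days old)
```

If a table overrides either property, pass the configured durations instead of relying on the defaults.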