tessl/pypi-sqllineage

SQL Lineage Analysis Tool powered by Python

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/sqllineage@1.5.x

To install, run

npx @tessl/cli install tessl/pypi-sqllineage@1.5.0

# SQLLineage

SQLLineage is a Python library that analyzes SQL statements to extract data lineage information, identifying source and target tables without requiring deep knowledge of SQL parsing. It leverages pluggable parser libraries (sqlfluff and sqlparse) to parse SQL commands, analyzes the AST, stores lineage information in a graph using networkx, and provides human-readable results.

## Package Information

- **Package Name**: sqllineage
- **Language**: Python
- **Installation**: `pip install sqllineage`
- **Version**: 1.5.4

## Core Imports

```python
from sqllineage.runner import LineageRunner
```

Common imports for working with models:

```python
from sqllineage.core.models import Table, Column, Schema
```

For metadata providers:

```python
from sqllineage.core.metadata.dummy import DummyMetaDataProvider
from sqllineage.core.metadata.sqlalchemy import SQLAlchemyMetaDataProvider
```

## Basic Usage

```python
from sqllineage.runner import LineageRunner

# Basic table-level lineage analysis
sql = """
INSERT INTO target_table
SELECT a.col1, b.col2
FROM source_table_a a
JOIN source_table_b b ON a.id = b.id
"""

# Create runner and analyze SQL
runner = LineageRunner(sql)

# Get lineage results
print("Source tables:", [str(table) for table in runner.source_tables])
print("Target tables:", [str(table) for table in runner.target_tables])

# Column-level lineage: each entry is a tuple of columns forming a path
# from a source column to a target column (it may include intermediate columns)
for path in runner.get_column_lineage():
    print(" -> ".join(str(col) for col in path))

# Launch web visualization
runner.draw()
```

## Architecture

SQLLineage uses a pluggable parser architecture with configurable metadata providers:

- **LineageRunner**: Main entry point that orchestrates parsing and analysis
- **Parser Layer**: Pluggable SQL parser backends (sqlfluff for modern dialects, sqlparse for legacy)
- **Model Layer**: Data classes representing tables, columns, schemas, and relationships
- **Metadata Layer**: Pluggable providers for schema information (dummy, SQLAlchemy, custom)
- **Visualization Layer**: Web interface and export formats for lineage graphs

This design enables support for 20+ SQL dialects while providing both programmatic and interactive interfaces for data lineage analysis.
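
To make the parser/dialect relationship concrete, here is a minimal sketch built on `LineageRunner.supported_dialects()`. It assumes the returned dictionary maps a parser name to the dialects that parser handles; `parser_for_dialect` and `describe_dialect` are hypothetical helper names, not part of sqllineage.

```python
from typing import Dict, List, Optional


def parser_for_dialect(
    dialects_by_parser: Dict[str, List[str]], dialect: str
) -> Optional[str]:
    """Return the first parser whose dialect list contains `dialect`, if any."""
    for parser, dialects in dialects_by_parser.items():
        if dialect in dialects:
            return parser
    return None


def describe_dialect(dialect: str) -> str:
    """Hypothetical helper: report which parser backend would handle a dialect."""
    # Imported lazily so parser_for_dialect works without sqllineage installed.
    from sqllineage.runner import LineageRunner

    parser = parser_for_dialect(LineageRunner.supported_dialects(), dialect)
    return f"{dialect}: {parser or 'unsupported'}"
```

Calling `describe_dialect("ansi")` is one way to check, before constructing a runner, whether a dialect name is recognized by the installed version.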

## Capabilities

### SQL Analysis and Lineage Runner

Core functionality for analyzing SQL statements and extracting lineage information. The LineageRunner class provides the main programmatic interface with support for multiple SQL dialects, metadata integration, and various output formats.

```python { .api }
class LineageRunner:
    def __init__(
        self,
        sql: str,
        dialect: str = "ansi",
        metadata_provider: MetaDataProvider = DummyMetaDataProvider(),
        verbose: bool = False,
        silent_mode: bool = False,
        draw_options: Optional[Dict[str, Any]] = None,
        file_path: str = "."
    ): ...

    @property
    def source_tables(self) -> List[Table]: ...

    @property
    def target_tables(self) -> List[Table]: ...

    @property
    def intermediate_tables(self) -> List[Table]: ...

    def get_column_lineage(
        self,
        exclude_path_ending_in_subquery: bool = True,
        exclude_subquery_columns: bool = False
    ) -> List[Tuple[Column, Column]]: ...

    def print_column_lineage(self) -> None: ...

    def print_table_lineage(self) -> None: ...

    def statements(self) -> List[str]: ...

    def to_cytoscape(self, level: LineageLevel = LineageLevel.TABLE) -> List[Dict[str, Dict[str, str]]]: ...

    @staticmethod
    def supported_dialects() -> Dict[str, List[str]]: ...

    def draw(self) -> None: ...
```

[SQL Analysis and Lineage Runner](./lineage-runner.md)
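
As a rough sketch of how the runner interface might be used on a multi-statement script, the helper below collects the three table-level properties into a plain dictionary. `format_tables` and `summarize_lineage` are hypothetical names introduced for illustration; only the `LineageRunner` constructor and its `source_tables`, `intermediate_tables`, and `target_tables` properties come from the API listed above.

```python
from typing import Dict, Iterable, List


def format_tables(tables: Iterable[object]) -> List[str]:
    """Render table-like objects as sorted, de-duplicated strings."""
    return sorted({str(table) for table in tables})


def summarize_lineage(sql: str, dialect: str = "ansi") -> Dict[str, List[str]]:
    """Hypothetical helper: collect table-level lineage for a SQL script."""
    # Imported lazily so format_tables stays usable without sqllineage installed.
    from sqllineage.runner import LineageRunner

    runner = LineageRunner(sql, dialect=dialect)
    return {
        "source": format_tables(runner.source_tables),
        "intermediate": format_tables(runner.intermediate_tables),
        "target": format_tables(runner.target_tables),
    }
```

For a script that writes a staging table in one statement and reads it in the next, the staging table would be expected to appear under `"intermediate"` rather than as a source or target.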

### Data Models

Core data classes representing SQL entities like tables, columns, schemas, and subqueries. These models provide the foundation for lineage analysis and include support for complex SQL constructs like CTEs, subqueries, and cross-schema references.

```python { .api }
class Table:
    def __init__(self, name: str, schema: Schema = Schema(), **kwargs): ...

class Column:
    def __init__(self, name: str, **kwargs): ...

class Schema:
    def __init__(self, name: Optional[str] = None): ...

class SubQuery:
    def __init__(self, subquery: Any, subquery_raw: str, alias: Optional[str]): ...
```

[Data Models](./data-models.md)
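
The sketch below shows one way the model objects might be consumed downstream: grouping tables by schema. It assumes each table exposes `schema` and `raw_name` attributes and that a qualified name such as `"sales.orders"` passed to `Table` is split into schema and table; `group_by_schema` and `demo` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_schema(tables: Iterable[object]) -> Dict[str, List[str]]:
    """Group table-like objects by the string form of their schema."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for table in tables:
        # Assumes `schema` and `raw_name` attributes, as on sqllineage's Table.
        grouped[str(table.schema)].append(table.raw_name)
    return {schema: sorted(names) for schema, names in grouped.items()}


def demo() -> Dict[str, List[str]]:
    """Build a few Table objects by hand and group them."""
    # Imported lazily so group_by_schema works without sqllineage installed.
    from sqllineage.core.models import Table

    tables = [Table("sales.orders"), Table("sales.customers"), Table("reporting.daily")]
    return group_by_schema(tables)
```

The same helper could be applied to `runner.source_tables` to see which schemas a script reads from.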

### Metadata Providers

Pluggable interfaces for providing schema and table metadata to enhance lineage analysis. Supports both simple dictionary-based metadata and database introspection via SQLAlchemy.

```python { .api }
class MetaDataProvider:
    def get_table_columns(self, table: Table, **kwargs) -> List[Column]: ...

class DummyMetaDataProvider(MetaDataProvider):
    def __init__(self, metadata: Optional[Dict[str, List[str]]] = None): ...

class SQLAlchemyMetaDataProvider(MetaDataProvider):
    def __init__(self, url: str, engine_kwargs: Optional[Dict[str, Any]] = None): ...
```

[Metadata Providers](./metadata-providers.md)
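
A minimal sketch of the dictionary-based provider follows. It assumes the `DummyMetaDataProvider` dictionary is keyed by `"schema.table"` with a list of column names as the value; `flatten_metadata` and `build_runner` are hypothetical helpers, and the nested catalog layout is purely illustrative.

```python
from typing import Dict, List, Mapping


def flatten_metadata(
    catalog: Mapping[str, Mapping[str, List[str]]]
) -> Dict[str, List[str]]:
    """Turn {schema: {table: [columns]}} into {"schema.table": [columns]}."""
    return {
        f"{schema}.{table}": list(columns)
        for schema, tables in catalog.items()
        for table, columns in tables.items()
    }


def build_runner(sql: str, catalog: Mapping[str, Mapping[str, List[str]]]):
    """Hypothetical helper: create a LineageRunner backed by in-memory metadata."""
    # Imported lazily so flatten_metadata works without sqllineage installed.
    from sqllineage.core.metadata.dummy import DummyMetaDataProvider
    from sqllineage.runner import LineageRunner

    provider = DummyMetaDataProvider(flatten_metadata(catalog))
    return LineageRunner(sql, metadata_provider=provider)
```

Supplying column metadata this way is mainly useful for queries where column names cannot be read from the SQL text alone, such as `SELECT *`.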

### Configuration Management

Thread-safe configuration system for customizing SQLLineage behavior including default schemas, dialect-specific parsing options, and integration settings.

```python { .api }
class _SQLLineageConfigLoader:
    def __call__(self, **kwargs) -> "_SQLLineageConfigLoader": ...
    def __enter__(self) -> None: ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...

# Global configuration instance
SQLLineageConfig = _SQLLineageConfigLoader()
```

[Configuration Management](./configuration.md)
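
The `__call__`, `__enter__`, and `__exit__` methods suggest a scoped-override pattern: call the loader with keyword overrides, then use the result in a `with` block. The toy class below is not sqllineage's implementation; it only sketches how such a callable context manager can work. `ToyConfigLoader`, its `get` method, and the `DEFAULT_SCHEMA` key are illustrative assumptions, and this sketch makes no attempt at thread safety.

```python
from typing import Any, Dict


class ToyConfigLoader:
    """Toy illustration of a callable context manager for scoped overrides."""

    def __init__(self, defaults: Dict[str, Any]) -> None:
        self._defaults = dict(defaults)
        self._pending: Dict[str, Any] = {}
        self._active: Dict[str, Any] = {}

    def __call__(self, **kwargs: Any) -> "ToyConfigLoader":
        # Reject keys that have no default, then stage the overrides.
        unknown = sorted(set(kwargs) - set(self._defaults))
        if unknown:
            raise KeyError(f"unknown config keys: {unknown}")
        self._pending = dict(kwargs)
        return self

    def __enter__(self) -> None:
        # Activate the staged overrides for the duration of the block.
        self._active, self._pending = self._pending, {}

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Drop the overrides so lookups fall back to the defaults again.
        self._active = {}

    def get(self, key: str) -> Any:
        """Return the active override for `key`, or its default."""
        return self._active.get(key, self._defaults[key])
```

With the real library, the analogous call would presumably look like `with SQLLineageConfig(DEFAULT_SCHEMA="analytics"): ...`; the configuration reference lists the keys that are actually supported.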

### CLI Interface

Command-line interface for analyzing SQL files and generating lineage reports, with options for different output formats and visualization modes.

```python { .api }
def main(args=None) -> None:
    """The command line interface entry point"""
```

Console script entry point: `sqllineage`

[CLI Interface](./cli-interface.md)
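
Because `main` accepts an argument list, the CLI can also be driven in-process. The sketch below assumes the flags `-e` (inline SQL), `-l` (lineage level, `table` or `column`), and `--dialect`; verify them against `sqllineage --help` for the installed version. `build_cli_args` and `run_cli` are hypothetical helpers.

```python
from typing import List, Optional


def build_cli_args(
    sql: str, level: str = "table", dialect: Optional[str] = None
) -> List[str]:
    """Assemble an argument list for the sqllineage CLI (flag names are assumptions)."""
    if level not in ("table", "column"):
        raise ValueError(f"unsupported lineage level: {level}")
    args = ["-e", sql, "-l", level]
    if dialect is not None:
        args += ["--dialect", dialect]
    return args


def run_cli(sql: str, level: str = "table") -> None:
    """Invoke the CLI entry point in-process instead of spawning `sqllineage`."""
    # Imported lazily so build_cli_args works without sqllineage installed.
    from sqllineage.cli import main

    main(build_cli_args(sql, level))
```

Building the argument list separately keeps the flag handling testable without running any analysis.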

### Visualization and Export

Web-based visualization and export capabilities for lineage graphs, including support for Cytoscape.js format and interactive browser interface.

```python { .api }
def draw_lineage_graph(**kwargs) -> None: ...

def to_cytoscape(graph: DiGraph, compound=False) -> List[Dict[str, Dict[str, Any]]]: ...
```

[Visualization and Export](./visualization-export.md)
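
To illustrate working with the exported graph, the sketch below separates Cytoscape.js-style elements into nodes and edges and serializes a runner's output as JSON. It assumes each element has a `"data"` dictionary and that edge elements carry `"source"` and `"target"` keys, which is the usual Cytoscape.js convention; `split_elements` and `export_lineage` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List, Tuple

Element = Dict[str, Dict[str, Any]]


def split_elements(
    elements: List[Element],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Separate Cytoscape.js-style elements into node ids and (source, target) edges."""
    nodes: List[str] = []
    edges: List[Tuple[str, str]] = []
    for element in elements:
        data = element["data"]
        if "source" in data and "target" in data:
            edges.append((data["source"], data["target"]))
        else:
            nodes.append(data["id"])
    return nodes, edges


def export_lineage(sql: str) -> str:
    """Hypothetical helper: dump table-level lineage as a JSON string."""
    # Imported lazily so split_elements works without sqllineage installed.
    from sqllineage.runner import LineageRunner

    return json.dumps(LineageRunner(sql).to_cytoscape(), indent=2)
```

The JSON string can then be handed to any front end that understands Cytoscape.js elements, without starting the built-in web interface.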

## Exception Handling

```python { .api }
class SQLLineageException(Exception):
    """Base Exception for SQLLineage"""

class UnsupportedStatementException(SQLLineageException):
    """Raised for SQL statement that SQLLineage doesn't support analyzing"""

class InvalidSyntaxException(SQLLineageException):
    """Raised for SQL statement that parser cannot parse"""

class MetaDataProviderException(SQLLineageException):
    """Raised for MetaDataProvider errors"""

class ConfigException(SQLLineageException):
    """Raised for configuration errors"""
```

## Constants and Types

```python { .api }
# Package constants
NAME: str = "sqllineage"
VERSION: str = "1.5.4"
DEFAULT_DIALECT: str = "ansi"
DEFAULT_HOST: str = "localhost"
DEFAULT_PORT: int = 5000
STATIC_FOLDER: str = "build"

# Lineage levels
class LineageLevel:
    TABLE = "table"
    COLUMN = "column"

# Node and edge types for graph analysis
class NodeTag:
    READ = "read"
    WRITE = "write"
    CTE = "cte"
    DROP = "drop"
    SOURCE_ONLY = "source_only"
    TARGET_ONLY = "target_only"
    SELFLOOP = "selfloop"

class EdgeTag:
    INDEX = "index"

class EdgeType:
    LINEAGE = "lineage"
    RENAME = "rename"
    HAS_COLUMN = "has_column"
    HAS_ALIAS = "has_alias"
```