# SQLLineage

SQLLineage is a Python library that analyzes SQL statements to extract data lineage: the source, target, and intermediate tables a statement touches, as well as column-level lineage. Users get this information without needing to understand how SQL is parsed. SQLLineage delegates parsing to pluggable parser libraries (sqlfluff and sqlparse), analyzes the resulting AST, stores the lineage in a networkx graph, and presents the results in human-readable form.

## Package Information

- **Package Name**: sqllineage
- **Language**: Python
- **Installation**: `pip install sqllineage`
- **Version**: 1.5.4

## Core Imports

```python
from sqllineage.runner import LineageRunner
```

Common imports for working with models:

```python
from sqllineage.core.models import Table, Column, Schema
```

For metadata providers:

```python
from sqllineage.core.metadata.dummy import DummyMetaDataProvider
from sqllineage.core.metadata.sqlalchemy import SQLAlchemyMetaDataProvider
```

## Basic Usage

```python
from sqllineage.runner import LineageRunner

# Basic table-level lineage analysis
sql = """
INSERT INTO target_table
SELECT a.col1, b.col2
FROM source_table_a a
JOIN source_table_b b ON a.id = b.id
"""

# Create the runner for this SQL (default dialect: "ansi")
runner = LineageRunner(sql)

# Table-level lineage results
print("Source tables:", [str(table) for table in runner.source_tables])
print("Target tables:", [str(table) for table in runner.target_tables])

# Column-level lineage: each item is a path of Column objects,
# starting at the source column and ending at the target column
for path in runner.get_column_lineage():
    print(f"{path[0]} -> {path[-1]}")

# Launch the web visualization (starts a local web server; the call blocks until interrupted)
runner.draw()
```

## Architecture

SQLLineage uses a pluggable parser architecture with configurable metadata providers:

- **LineageRunner**: Main entry point that orchestrates parsing and analysis
- **Parser Layer**: Pluggable SQL parser backends (sqlfluff for modern dialects, sqlparse for legacy)
- **Model Layer**: Data classes representing tables, columns, schemas, and relationships
- **Metadata Layer**: Pluggable providers for schema information (dummy, SQLAlchemy, custom)
- **Visualization Layer**: Web interface and export formats for lineage graphs

Through the sqlfluff backend, this design supports 20+ SQL dialects, and lineage results can be consumed either programmatically or interactively (CLI and web visualization).

## Capabilities

### SQL Analysis and Lineage Runner

Core functionality for analyzing SQL statements and extracting lineage information. The LineageRunner class provides the main programmatic interface, with support for multiple SQL dialects, metadata integration, and several output formats.

```python { .api }
class LineageRunner:
    def __init__(
        self,
        sql: str,
        dialect: str = "ansi",
        metadata_provider: MetaDataProvider = DummyMetaDataProvider(),
        verbose: bool = False,
        silent_mode: bool = False,
        draw_options: Optional[Dict[str, Any]] = None,
        file_path: str = "."
    ): ...

    @property
    def source_tables(self) -> List[Table]: ...

    @property
    def target_tables(self) -> List[Table]: ...

    @property
    def intermediate_tables(self) -> List[Table]: ...

    def get_column_lineage(
        self,
        exclude_path_ending_in_subquery: bool = True,
        exclude_subquery_columns: bool = False
    ) -> List[Tuple[Column, Column]]: ...

    def print_column_lineage(self) -> None: ...

    def print_table_lineage(self) -> None: ...

    def statements(self) -> List[str]: ...

    def to_cytoscape(self, level: LineageLevel = LineageLevel.TABLE) -> List[Dict[str, Dict[str, str]]]: ...

    @staticmethod
    def supported_dialects() -> Dict[str, List[str]]: ...

    def draw(self) -> None: ...
```

[SQL Analysis and Lineage Runner](./lineage-runner.md)

### Data Models

Core data classes representing SQL entities like tables, columns, schemas, and subqueries. These models provide the foundation for lineage analysis and include support for complex SQL constructs like CTEs, subqueries, and cross-schema references.

```python { .api }
class Table:
    def __init__(self, name: str, schema: Schema = Schema(), **kwargs): ...

class Column:
    def __init__(self, name: str, **kwargs): ...

class Schema:
    def __init__(self, name: Optional[str] = None): ...

class SubQuery:
    def __init__(self, subquery: Any, subquery_raw: str, alias: Optional[str]): ...
```

[Data Models](./data-models.md)

### Metadata Providers

Pluggable interfaces for providing schema and table metadata to enhance lineage analysis. Supports both simple dictionary-based metadata and database introspection via SQLAlchemy.

```python { .api }
class MetaDataProvider:
    def get_table_columns(self, table: Table, **kwargs) -> List[Column]: ...

class DummyMetaDataProvider(MetaDataProvider):
    def __init__(self, metadata: Optional[Dict[str, List[str]]] = None): ...

class SQLAlchemyMetaDataProvider(MetaDataProvider):
    def __init__(self, url: str, engine_kwargs: Optional[Dict[str, Any]] = None): ...
```

[Metadata Providers](./metadata-providers.md)

### Configuration Management

Thread-safe configuration system for customizing SQLLineage behavior including default schemas, dialect-specific parsing options, and integration settings.

```python { .api }
class _SQLLineageConfigLoader:
    def __call__(self, **kwargs) -> "_SQLLineageConfigLoader": ...
    def __enter__(self) -> None: ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...

# Global configuration instance
SQLLineageConfig = _SQLLineageConfigLoader()
```

[Configuration Management](./configuration.md)

### CLI Interface

Command-line interface for analyzing SQL files and generating lineage reports, with options for different output formats and visualization modes.

```python { .api }
def main(args=None) -> None:
    """The command line interface entry point"""
```

Console script entry point: `sqllineage`

[CLI Interface](./cli-interface.md)

### Visualization and Export

Web-based visualization and export capabilities for lineage graphs, including export to the Cytoscape.js format and an interactive browser interface.

```python { .api }
def draw_lineage_graph(**kwargs) -> None: ...

def to_cytoscape(graph: DiGraph, compound=False) -> List[Dict[str, Dict[str, Any]]]: ...
```

[Visualization and Export](./visualization-export.md)

## Exception Handling

```python { .api }
class SQLLineageException(Exception):
    """Base Exception for SQLLineage"""

class UnsupportedStatementException(SQLLineageException):
    """Raised for SQL statement that SQLLineage doesn't support analyzing"""

class InvalidSyntaxException(SQLLineageException):
    """Raised for SQL statement that parser cannot parse"""

class MetaDataProviderException(SQLLineageException):
    """Raised for MetaDataProvider errors"""

class ConfigException(SQLLineageException):
    """Raised for configuration errors"""
```

## Constants and Types

```python { .api }
# Package constants
NAME: str = "sqllineage"
VERSION: str = "1.5.4"
DEFAULT_DIALECT: str = "ansi"
DEFAULT_HOST: str = "localhost"
DEFAULT_PORT: int = 5000
STATIC_FOLDER: str = "build"

# Lineage levels
class LineageLevel:
    TABLE = "table"
    COLUMN = "column"

# Node and edge types for graph analysis
class NodeTag:
    READ = "read"
    WRITE = "write"
    CTE = "cte"
    DROP = "drop"
    SOURCE_ONLY = "source_only"
    TARGET_ONLY = "target_only"
    SELFLOOP = "selfloop"

class EdgeTag:
    INDEX = "index"

class EdgeType:
    LINEAGE = "lineage"
    RENAME = "rename"
    HAS_COLUMN = "has_column"
    HAS_ALIAS = "has_alias"
```