# Lineage Entities

Lineage entities model files, users, tables, columns, and tags, with template field support for dynamic content rendering. They provide structured representations of data assets and their relationships for lineage tracking.

## Capabilities

### File Entity

Represents file-based data assets in lineage tracking.

```python { .api }
class File:
    """
    File entity for lineage tracking.

    Attributes:
        url (str): File URL or path
        type_hint (str | None): Optional type hint for the file
        template_fields (tuple): Fields that support templating - ("url",)
    """

    def __init__(self, url: str, type_hint: str | None = None): ...
```
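
To illustrate what a `template_fields` declaration implies, here is a minimal sketch. `FileSketch` and `render_fields` are hypothetical stand-ins, not the provider's classes, and plain string replacement is used only to keep the example free of dependencies; Airflow's own rendering is Jinja-based.

```python
from dataclasses import dataclass, replace
from typing import ClassVar, Dict, Optional, Tuple


@dataclass
class FileSketch:
    """Stand-in mirroring the documented File attributes (hypothetical)."""

    url: str
    type_hint: Optional[str] = None
    # Names of the attributes that may contain template placeholders
    template_fields: ClassVar[Tuple[str, ...]] = ("url",)


def render_fields(entity, context: Dict[str, str]):
    """Return a copy of the entity with {{ key }} placeholders filled in.

    Only attributes listed in template_fields are touched.
    """
    updates = {}
    for field_name in entity.template_fields:
        value = getattr(entity, field_name)
        for key, substitution in context.items():
            value = value.replace("{{ " + key + " }}", substitution)
        updates[field_name] = value
    return replace(entity, **updates)


rendered = render_fields(
    FileSketch(url="s3://data-lake/users/{{ ds }}/users.parquet", type_hint="parquet"),
    {"ds": "2024-01-01"},
)
print(rendered.url)
```

Fields that are not listed in `template_fields`, such as `type_hint` here, pass through unchanged.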

### User Entity

Represents users associated with data assets for ownership and access tracking.

```python { .api }
class User:
    """
    User entity for lineage tracking.

    Attributes:
        email (str): User email address
        first_name (str | None): Optional first name
        last_name (str | None): Optional last name
        template_fields (tuple): Fields that support templating - ("email", "first_name", "last_name")
    """

    def __init__(
        self,
        email: str,
        first_name: str | None = None,
        last_name: str | None = None
    ): ...
```

### Tag Entity

Represents tags or classifications applied to data assets.

```python { .api }
class Tag:
    """
    Tag/classification entity for data assets.

    Attributes:
        tag_name (str): Name of the tag
        template_fields (tuple): Fields that support templating - ("tag_name",)
    """

    def __init__(self, tag_name: str): ...
```

### Column Entity

Represents individual columns within table entities.

```python { .api }
class Column:
    """
    Table column entity for lineage tracking.

    Attributes:
        name (str): Column name
        description (str | None): Optional column description
        data_type (str): Column data type
        tags (list[Tag]): List of tags applied to the column
        template_fields (tuple): Fields that support templating - ("name", "description", "data_type", "tags")
    """

    def __init__(
        self,
        name: str,
        description: str | None = None,
        data_type: str = "",
        tags: list[Tag] | None = None
    ): ...
```
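
Because each column carries its own list of tags, a common task is finding the columns that share a classification. The sketch below shows one way to do that; `TagSketch`, `ColumnSketch`, and `columns_with_tag` are hypothetical stand-ins written for illustration, not part of the provider.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TagSketch:
    """Stand-in for the documented Tag entity (hypothetical)."""

    tag_name: str


@dataclass
class ColumnSketch:
    """Stand-in mirroring the documented Column attributes (hypothetical)."""

    name: str
    description: Optional[str] = None
    data_type: str = ""
    # default_factory gives every column its own list instead of a shared one
    tags: List[TagSketch] = field(default_factory=list)


def columns_with_tag(columns: List[ColumnSketch], tag_name: str) -> List[str]:
    """Return the names of columns that carry the given tag."""
    return [
        column.name
        for column in columns
        if any(tag.tag_name == tag_name for tag in column.tags)
    ]


columns = [
    ColumnSketch(name="user_id", data_type="INTEGER"),
    ColumnSketch(name="email", data_type="VARCHAR(255)", tags=[TagSketch("PII")]),
]
pii_columns = columns_with_tag(columns, "PII")
print(pii_columns)
```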

### Table Entity

Represents database tables or structured datasets in lineage tracking.

```python { .api }
class Table:
    """
    Table entity for lineage tracking.

    Attributes:
        database (str): Database name
        cluster (str): Cluster or server name
        name (str): Table name
        tags (list[Tag]): List of tags applied to the table
        description (str | None): Optional table description
        columns (list[Column]): List of table columns
        owners (list[User]): List of table owners
        extra (dict[str, Any]): Additional metadata
        type_hint (str | None): Optional type hint
        template_fields (tuple): Fields that support templating
    """

    def __init__(
        self,
        database: str,
        cluster: str,
        name: str,
        tags: list[Tag] | None = None,
        description: str | None = None,
        columns: list[Column] | None = None,
        owners: list[User] | None = None,
        extra: dict[str, Any] | None = None,
        type_hint: str | None = None
    ): ...
```
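
A table entity nests columns and owners, so it is often handy to flatten it before logging or displaying it. The following is a sketch under stated assumptions: the `*Sketch` classes are hypothetical stand-ins carrying a subset of the documented attributes, and the dotted `cluster.database.name` form is an illustrative convention, not something the provider defines.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserSketch:
    """Stand-in for the documented User entity (hypothetical)."""

    email: str


@dataclass
class ColumnSketch:
    """Stand-in for the documented Column entity (hypothetical)."""

    name: str
    data_type: str = ""


@dataclass
class TableSketch:
    """Stand-in with a subset of the documented Table attributes (hypothetical)."""

    database: str
    cluster: str
    name: str
    columns: List[ColumnSketch] = field(default_factory=list)
    owners: List[UserSketch] = field(default_factory=list)


def summarize(table: TableSketch) -> Dict[str, object]:
    """Flatten a table entity into a small dictionary."""
    return {
        # Illustrative naming convention only
        "qualified_name": f"{table.cluster}.{table.database}.{table.name}",
        "columns": [column.name for column in table.columns],
        "owners": [owner.email for owner in table.owners],
    }


users = TableSketch(
    database="analytics",
    cluster="prod-cluster",
    name="users",
    columns=[ColumnSketch("user_id", "INTEGER"), ColumnSketch("email", "VARCHAR(255)")],
    owners=[UserSketch("data-team@company.com")],
)
summary = summarize(users)
print(summary["qualified_name"])
```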

### Utility Functions

Helper functions for lineage entity management.

```python { .api }
def default_if_none(arg: bool | None) -> bool:
    """
    Return default value when argument is None.

    Args:
        arg (bool | None): Boolean value or None

    Returns:
        bool: False if arg is None, otherwise arg
    """
```
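
The contract described in the docstring can be sketched in a few lines. `default_if_none_sketch` is a hypothetical re-implementation written from the documented behavior, not the provider's source.

```python
from typing import Optional


def default_if_none_sketch(arg: Optional[bool]) -> bool:
    """Mirror the documented contract: None becomes False, booleans pass through."""
    return False if arg is None else arg


results = [default_if_none_sketch(value) for value in (None, True, False)]
print(results)
```

This kind of helper is useful for optional boolean flags where "not set" should behave the same as "off".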

### Hook Integration

Provides a hook lineage collector that works across Airflow versions.

```python { .api }
def get_hook_lineage_collector():
    """
    Get version-compatible hook lineage collector.

    Returns hook lineage collector appropriate for current Airflow version.
    In Airflow 3.0+, returns airflow.lineage.hook.get_hook_lineage_collector()
    In Airflow < 3.0, returns compatibility wrapper with asset/dataset renaming

    Returns:
        Hook lineage collector with version-appropriate asset/dataset methods
    """
```
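
The asset/dataset renaming described above is an adapter pattern: callers use asset-named methods, and on older versions a wrapper forwards them to dataset-named ones. The sketch below illustrates the idea with hypothetical stand-in classes and a hypothetical `wrap_for_version` helper; it does not reproduce the provider's actual wrapper or the real collector signatures.

```python
class NativeCollectorSketch:
    """Stand-in for a collector that already uses asset naming (hypothetical)."""

    def __init__(self):
        self.inputs = []

    def add_input_asset(self, context, uri):
        self.inputs.append((context, uri))


class LegacyCollectorSketch:
    """Stand-in for an older collector that only knows dataset naming (hypothetical)."""

    def __init__(self):
        self.inputs = []

    def add_input_dataset(self, context, uri):
        self.inputs.append((context, uri))


class AssetNamingAdapter:
    """Expose asset-named methods on top of a dataset-named collector."""

    def __init__(self, legacy):
        self._legacy = legacy

    def add_input_asset(self, context, uri):
        # Forward the call under the older method name
        self._legacy.add_input_dataset(context, uri)


def wrap_for_version(major_version, native, legacy):
    """Return the native collector on 3.0+, otherwise the wrapped legacy one."""
    return native if major_version >= 3 else AssetNamingAdapter(legacy)


legacy = LegacyCollectorSketch()
collector = wrap_for_version(2, native=NativeCollectorSketch(), legacy=legacy)
collector.add_input_asset(context="my_hook", uri="s3://bucket/key")
print(legacy.inputs)
```

Calling code stays identical on both versions, which is the point of fetching the collector through a compatibility function rather than importing it directly.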

## Usage Examples

```python
from airflow.operators.python import PythonOperator
from airflow.providers.common.compat.lineage.entities import (
    Column,
    File,
    Table,
    Tag,
    User,
    default_if_none,
)
from airflow.providers.common.compat.lineage.hook import get_hook_lineage_collector

# Create lineage entities
owner = User(
    email="data-team@company.com",
    first_name="Data",
    last_name="Team",
)

# Keyword arguments keep entity construction explicit
pii_tag = Tag(tag_name="PII")
sensitive_tag = Tag(tag_name="SENSITIVE")

# Define table columns
user_id_col = Column(
    name="user_id",
    description="Unique user identifier",
    data_type="INTEGER",
    tags=[],
)

email_col = Column(
    name="email",
    description="User email address",
    data_type="VARCHAR(255)",
    tags=[pii_tag, sensitive_tag],
)

# Create table entity
users_table = Table(
    database="analytics",
    cluster="prod-cluster",
    name="users",
    description="User information table",
    columns=[user_id_col, email_col],
    owners=[owner],
    tags=[sensitive_tag],
    extra={"partition_key": "created_date"},
)

# Create file entity; "url" is a template field, so "{{ ds }}" can be rendered
data_file = File(
    url="s3://data-lake/users/{{ ds }}/users.parquet",
    type_hint="parquet",
)


# Use in operators with lineage tracking
def process_data_with_lineage(**context):
    # Get the version-compatible lineage collector
    collector = get_hook_lineage_collector()

    # The collector records assets by URI through its asset methods rather than
    # taking entity objects. Hooks normally pass themselves as `context`;
    # passing the task here is an assumption made for illustration.
    output_uri = f"s3://data-lake/users/{context['ds']}/users.parquet"
    collector.add_output_asset(context=context["task"], uri=output_uri)

    # Process data
    print(f"Processing {users_table.name} to {output_uri}")


lineage_task = PythonOperator(
    task_id="process_with_lineage",
    python_callable=process_data_with_lineage,
    # Entities attached as inlets/outlets carry the lineage metadata;
    # their template fields will be rendered with the Airflow context
    inlets=[users_table],
    outlets=[data_file],
)

# Utility usage
include_metadata = default_if_none(None)  # Returns False
```