# Lineage Entities

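Data lineage entities for files, users, tables, columns, and tags. Each entity lists its templatable attributes in `template_fields`, so values such as a file URL can contain Jinja expressions (for example `{{ ds }}`) that are rendered with the Airflow context. Together, these entities give structured representations of data assets and their relationships for lineage tracking.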
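
To make the template-field idea concrete, here is a minimal, self-contained sketch. It assumes simplified dataclass stand-ins (`FileSketch`, `TagSketch`, `ColumnSketch`) and a hypothetical `render_template_fields` helper that substitutes only `{{ key }}` placeholders from a plain dict. Airflow itself renders templates with Jinja and the task's runtime context, so treat this as an illustration of walking `template_fields`, not as Airflow's renderer.

```python
import re
from dataclasses import dataclass, field
from typing import Any, ClassVar, Optional


# Simplified stand-ins for the entity classes (hypothetical, for illustration only)
@dataclass
class FileSketch:
    url: str
    type_hint: Optional[str] = None
    template_fields: ClassVar[tuple] = ("url",)


@dataclass
class TagSketch:
    tag_name: str
    template_fields: ClassVar[tuple] = ("tag_name",)


@dataclass
class ColumnSketch:
    name: str
    data_type: str = ""
    tags: list = field(default_factory=list)
    template_fields: ClassVar[tuple] = ("name", "data_type", "tags")


# Matches "{{ key }}" placeholders only; real Airflow rendering uses full Jinja
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def _render_value(value: Any, context: dict) -> Any:
    if isinstance(value, str):
        # Substitute known keys and leave unknown placeholders untouched
        return _PLACEHOLDER.sub(lambda m: str(context.get(m.group(1), m.group(0))), value)
    if isinstance(value, list):
        return [_render_value(item, context) for item in value]
    if hasattr(value, "template_fields"):
        # Nested entity (e.g. a tag inside a column's tags): render its own fields
        render_template_fields(value, context)
    return value


def render_template_fields(obj: Any, context: dict) -> Any:
    """Render, in place, every attribute named in obj.template_fields (hypothetical helper)."""
    for name in obj.template_fields:
        setattr(obj, name, _render_value(getattr(obj, name), context))
    return obj


data_file = render_template_fields(
    FileSketch(url="s3://data-lake/users/{{ ds }}/users.parquet", type_hint="parquet"),
    {"ds": "2024-01-01"},
)
print(data_file.url)  # s3://data-lake/users/2024-01-01/users.parquet
```

In this sketch, `type_hint` is left alone because it is not listed in `template_fields`, while tags nested inside a column are rendered recursively because `tags` is listed for the column.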

## Capabilities

### File Entity

Represents a file-based data asset in lineage tracking, identified by a URL or path.

```python { .api }
class File:
    """
    File entity for lineage tracking.

    Attributes:
        url (str): File URL or path
        type_hint (str | None): Optional type hint for the file
        template_fields (tuple): Fields that support templating - ("url",)
    """

    def __init__(self, url: str, type_hint: str | None = None): ...
```

### User Entity

Represents a user associated with a data asset, typically as an owner (see `Table.owners`), for ownership and access tracking.

```python { .api }
class User:
    """
    User entity for lineage tracking.

    Attributes:
        email (str): User email address
        first_name (str | None): Optional first name
        last_name (str | None): Optional last name
        template_fields (tuple): Fields that support templating - ("email", "first_name", "last_name")
    """

    def __init__(
        self,
        email: str,
        first_name: str | None = None,
        last_name: str | None = None,
    ): ...
```

### Tag Entity

Represents a tag or classification (for example, `PII`) applied to a table or column.

```python { .api }
class Tag:
    """
    Tag/classification entity for data assets.

    Attributes:
        tag_name (str): Name of the tag
        template_fields (tuple): Fields that support templating - ("tag_name",)
    """

    def __init__(self, tag_name: str): ...
```

### Column Entity

Represents a single column of a table entity; a `Table` holds its columns in `Table.columns`.

```python { .api }
class Column:
    """
    Table column entity for lineage tracking.

    Attributes:
        name (str): Column name
        description (str | None): Optional column description
        data_type (str): Column data type
        tags (list[Tag]): List of tags applied to the column
        template_fields (tuple): Fields that support templating - ("name", "description", "data_type", "tags")
    """

    def __init__(
        self,
        name: str,
        description: str | None = None,
        data_type: str = "",
        tags: list[Tag] | None = None,
    ): ...
```

### Table Entity

Represents a database table or other structured dataset in lineage tracking, together with its columns, owners, tags, and free-form metadata.

```python { .api }
from typing import Any


class Table:
    """
    Table entity for lineage tracking.

    Attributes:
        database (str): Database name
        cluster (str): Cluster or server name
        name (str): Table name
        tags (list[Tag]): List of tags applied to the table
        description (str | None): Optional table description
        columns (list[Column]): List of table columns
        owners (list[User]): List of table owners
        extra (dict[str, Any]): Additional metadata
        type_hint (str | None): Optional type hint
        template_fields (tuple): Fields that support templating
    """

    def __init__(
        self,
        database: str,
        cluster: str,
        name: str,
        tags: list[Tag] | None = None,
        description: str | None = None,
        columns: list[Column] | None = None,
        owners: list[User] | None = None,
        extra: dict[str, Any] | None = None,
        type_hint: str | None = None,
    ): ...
```
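
Because a table carries its own `tags` and also nests columns that carry `tags`, finding everywhere a classification applies means walking both levels. The sketch below is a hedged illustration: `TagSketch`, `ColumnSketch`, `TableSketch`, and the helper `find_tagged` are hypothetical stand-ins that mirror only the attributes listed above, and the `cluster.database.table[.column]` naming is an assumption chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class TagSketch:
    tag_name: str


@dataclass
class ColumnSketch:
    name: str
    data_type: str = ""
    tags: list = field(default_factory=list)


@dataclass
class TableSketch:
    database: str
    cluster: str
    name: str
    tags: list = field(default_factory=list)
    columns: list = field(default_factory=list)


def find_tagged(table: TableSketch, tag_name: str) -> list:
    """Return qualified names of the table and columns carrying tag_name (hypothetical helper)."""
    qualified = f"{table.cluster}.{table.database}.{table.name}"
    hits = []
    # Table-level classification
    if any(tag.tag_name == tag_name for tag in table.tags):
        hits.append(qualified)
    # Column-level classifications
    for column in table.columns:
        if any(tag.tag_name == tag_name for tag in column.tags):
            hits.append(f"{qualified}.{column.name}")
    return hits


pii = TagSketch("PII")
users = TableSketch(
    database="analytics",
    cluster="prod-cluster",
    name="users",
    columns=[ColumnSketch("user_id", "INTEGER"), ColumnSketch("email", "VARCHAR(255)", [pii])],
)
print(find_tagged(users, "PII"))  # ['prod-cluster.analytics.users.email']
```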

### Utility Functions

Helper function for normalizing optional boolean values.

```python { .api }
def default_if_none(arg: bool | None) -> bool:
    """
    Return False when the argument is None; otherwise return the argument.

    Args:
        arg (bool | None): Boolean value or None

    Returns:
        bool: False if arg is None, otherwise arg
    """
```
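
The documented contract is small enough to restate as a standalone function. This is a local re-implementation for illustration, not an import of the provider's helper; it shows how a tri-state `bool | None` option collapses to a plain `bool`.

```python
from typing import Optional


def default_if_none(arg: Optional[bool]) -> bool:
    """Local re-implementation of the documented behavior: None becomes False."""
    return False if arg is None else arg


# An unset (None) option is treated as False; explicit values pass through
for value in (None, True, False):
    print(value, "->", default_if_none(value))
```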

### Hook Integration

Provides a hook lineage collector that behaves consistently across Airflow versions.

```python { .api }
def get_hook_lineage_collector():
    """
    Get a version-compatible hook lineage collector.

    Returns the hook lineage collector appropriate for the running Airflow version:
    in Airflow 3.0+, the result of airflow.lineage.hook.get_hook_lineage_collector();
    in Airflow < 3.0, a compatibility wrapper that applies the asset/dataset renaming
    (Airflow 3.0 renamed datasets to assets).

    Returns:
        Hook lineage collector with version-appropriate asset/dataset methods
    """
```
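
The asset/dataset renaming idea can be illustrated with a toy wrapper. Everything here is a hypothetical stand-in: `LegacyCollector` only mimics "dataset" naming, and `with_asset_names` shows one way to expose an asset-named method and keyword on such an object. The provider's real wrapper may differ in detail.

```python
import functools
from typing import Any, Callable, Optional


class LegacyCollector:
    """Toy collector that only knows "dataset" naming (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.inputs: list = []

    def add_input_dataset(self, context: Any, uri: str, dataset_kwargs: Optional[dict] = None) -> None:
        self.inputs.append({"context": context, "uri": uri, "kwargs": dataset_kwargs or {}})


def _rename_asset_kwargs(func: Callable[..., Any]) -> Callable[..., Any]:
    """Translate the asset_kwargs keyword into the older dataset_kwargs name."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if "asset_kwargs" in kwargs:
            kwargs["dataset_kwargs"] = kwargs.pop("asset_kwargs")
        return func(*args, **kwargs)

    return wrapper


def with_asset_names(collector: Any) -> Any:
    """Give a dataset-named collector an asset-named method (hypothetical helper)."""
    if not hasattr(collector, "add_input_asset"):
        collector.add_input_asset = _rename_asset_kwargs(collector.add_input_dataset)
    return collector


collector = with_asset_names(LegacyCollector())
collector.add_input_asset(context="example-hook", uri="s3://bucket/key", asset_kwargs={"owner": "data-team"})
print(collector.inputs)
```

Callers written against the asset names keep working on the older object, which is the point of wrapping rather than branching on the Airflow version at every call site.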

## Usage Examples

```python
from airflow.providers.common.compat.lineage.entities import (
    Column,
    File,
    Table,
    Tag,
    User,
    default_if_none,
)
from airflow.providers.common.compat.lineage.hook import get_hook_lineage_collector

try:
    # Airflow 3: PythonOperator ships in the standard provider
    from airflow.providers.standard.operators.python import PythonOperator
except ImportError:
    # Airflow 2.x
    from airflow.operators.python import PythonOperator

# Create lineage entities (arguments passed by keyword)
owner = User(
    email="data-team@company.com",
    first_name="Data",
    last_name="Team",
)

pii_tag = Tag(tag_name="PII")
sensitive_tag = Tag(tag_name="SENSITIVE")

# Define table columns
user_id_col = Column(
    name="user_id",
    description="Unique user identifier",
    data_type="INTEGER",
    tags=[],
)

email_col = Column(
    name="email",
    description="User email address",
    data_type="VARCHAR(255)",
    tags=[pii_tag, sensitive_tag],
)

# Create table entity
users_table = Table(
    database="analytics",
    cluster="prod-cluster",
    name="users",
    description="User information table",
    columns=[user_id_col, email_col],
    owners=[owner],
    tags=[sensitive_tag],
    extra={"partition_key": "created_date"},
)

# Create file entity; url is a template field, so "{{ ds }}" can be filled in from the Airflow context
data_file = File(
    url="s3://data-lake/users/{{ ds }}/users.parquet",
    type_hint="parquet",
)


def process_data_with_lineage(**context):
    # The hook lineage collector records assets that hooks report through
    # add_input_asset()/add_output_asset(); it has no add_input()/add_output()
    # methods for entity objects, so the entities are attached as inlets/outlets below.
    collector = get_hook_lineage_collector()
    print(f"Using lineage collector: {type(collector).__name__}")

    # Process data
    print(f"Processing {users_table.name} to {data_file.url}")


lineage_task = PythonOperator(
    task_id="process_with_lineage",
    python_callable=process_data_with_lineage,
    # Lineage entities for this task
    inlets=[users_table],
    outlets=[data_file],
)

# Utility usage
include_metadata = default_if_none(None)  # Returns False
```