# Apache Airflow Papermill Provider

Apache Airflow provider for executing Jupyter notebooks with Papermill. This provider enables data teams to integrate notebook-based analytics and machine learning workflows into their Airflow DAGs by executing parameterized Jupyter notebooks through the Papermill library.

## Package Information

- **Package Name**: apache-airflow-providers-papermill
- **Language**: Python
- **Installation**: `pip install apache-airflow-providers-papermill`

## Core Imports

```python
from airflow.providers.papermill.operators.papermill import PapermillOperator
```

For lineage tracking:

```python
from airflow.providers.papermill.operators.papermill import NoteBook
```

## Basic Usage

```python
from airflow import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator
from airflow.utils.dates import days_ago
from datetime import timedelta

# Define the DAG
dag = DAG(
    'example_papermill',
    default_args={'owner': 'airflow'},
    schedule_interval='0 0 * * *',
    start_date=days_ago(2),
    dagrun_timeout=timedelta(minutes=60),
)

# Execute a notebook with parameters
run_notebook = PapermillOperator(
    task_id="run_analysis_notebook",
    input_nb="/path/to/analysis.ipynb",
    output_nb="/path/to/output-{{ execution_date }}.ipynb",
    parameters={"date": "{{ execution_date }}", "source": "airflow"},
    dag=dag,
)
```

## Capabilities

### Notebook Execution

Execute Jupyter notebooks through Papermill with parameter injection and lineage tracking support.

```python { .api }
class PapermillOperator(BaseOperator):
    """
    Executes a Jupyter notebook through Papermill that is annotated with parameters.

    :param input_nb: input notebook (can also be a NoteBook or a File inlet)
    :type input_nb: str
    :param output_nb: output notebook (can also be a NoteBook or File outlet)
    :type output_nb: str
    :param parameters: the notebook parameters to set
    :type parameters: dict
    """

    supports_lineage = True

    @apply_defaults
    def __init__(
        self,
        *,
        input_nb: Optional[str] = None,
        output_nb: Optional[str] = None,
        parameters: Optional[Dict] = None,
        **kwargs,
    ) -> None: ...

    def execute(self, context): ...
```

### Lineage Entity

Represents Jupyter notebooks for Airflow lineage tracking.

```python { .api }
@attr.s(auto_attribs=True)
class NoteBook(File):
    """Jupyter notebook"""

    type_hint: Optional[str] = "jupyter_notebook"
    parameters: Optional[Dict] = {}
    meta_schema: str = __name__ + '.NoteBook'
```

## Types

```python { .api }
from typing import Dict, Optional

import attr
import papermill as pm

from airflow.lineage.entities import File
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
```

## Advanced Usage Examples

### Template Variables

Use Airflow's templating system for dynamic notebook paths and parameters:

```python
run_notebook = PapermillOperator(
    task_id="daily_report",
    input_nb="/notebooks/daily_report_template.ipynb",
    output_nb="/reports/daily_report_{{ ds }}.ipynb",
    parameters={
        "report_date": "{{ ds }}",
        "execution_time": "{{ execution_date }}",
        "run_id": "{{ run_id }}",
    },
)
```
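Airflow renders these Jinja templates against the run's context before the operator executes. The effect can be simulated with a plain string substitution (a hypothetical stand-in for Airflow's actual Jinja rendering, shown only to illustrate what the operator receives; `ds` is the logical date as `YYYY-MM-DD`):

```python
from datetime import date

# Hypothetical stand-in for Airflow's Jinja rendering of templated fields;
# real Airflow uses Jinja2 and a much richer template context.
def render(template: str, context: dict) -> str:
    for key, value in context.items():
        template = template.replace("{{ " + key + " }}", str(value))
    return template

context = {"ds": date(2024, 1, 1).isoformat(), "run_id": "manual__2024-01-01"}
output_nb = render("/reports/daily_report_{{ ds }}.ipynb", context)
print(output_nb)  # → /reports/daily_report_2024-01-01.ipynb
```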

### Lineage Tracking

The operator automatically creates lineage entities that can be used by downstream tasks:

```python
from airflow.lineage import AUTO
from airflow.operators.python import PythonOperator

def process_notebook_output(inlets, **context):
    # Access the output notebook through lineage
    notebook_path = inlets[0].url
    # Process the executed notebook...

process_task = PythonOperator(
    task_id='process_output',
    python_callable=process_notebook_output,
    inlets=AUTO,  # Automatically detects upstream notebook outputs
)

run_notebook >> process_task
```

### Multiple Notebook Execution

Execute multiple notebooks in sequence by setting up multiple inlets and outlets:

```python
# Note: this pattern requires careful setup of inlets/outlets.
# The operator executes notebooks in pairs (inlets[i] -> outlets[i]).
multi_notebook = PapermillOperator(
    task_id="run_multiple_notebooks",
    # Set up inlets and outlets manually for multiple notebooks
    dag=dag,
)
# Additional configuration is needed for multiple notebook execution.
```
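The pairwise matching described above can be sketched with a small helper (`pair_notebooks` is a hypothetical illustration of the pairing rule, not part of the provider's API):

```python
# Hypothetical helper illustrating the pairing rule: inlets[i] runs into outlets[i].
def pair_notebooks(inlets, outlets):
    if len(inlets) != len(outlets):
        raise ValueError("Each input notebook needs a matching output notebook")
    return list(zip(inlets, outlets))

pairs = pair_notebooks(
    ["/nb/extract.ipynb", "/nb/report.ipynb"],
    ["/out/extract.ipynb", "/out/report.ipynb"],
)
print(pairs)
# → [('/nb/extract.ipynb', '/out/extract.ipynb'), ('/nb/report.ipynb', '/out/report.ipynb')]
```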

## Error Handling

The operator performs validation during execution:

- Raises `ValueError` ("Input notebook or output notebook is not specified") if inlets or outlets are not properly configured
- Papermill execution errors propagate as task failures
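The validation described above can be sketched as follows (`validate_notebooks` is a hypothetical function mirroring the behavior, not the operator's actual code):

```python
# Hypothetical sketch of the validation described above: execution fails
# fast when either notebook path is missing.
def validate_notebooks(input_nb, output_nb):
    if not input_nb or not output_nb:
        raise ValueError("Input notebook or output notebook is not specified")

try:
    validate_notebooks(input_nb="/path/to/analysis.ipynb", output_nb=None)
except ValueError as err:
    print(err)  # → Input notebook or output notebook is not specified
```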

## Integration Notes

### Papermill Configuration

The operator calls `papermill.execute_notebook()` with these settings:

- `progress_bar=False` - Disables progress display for cleaner logs
- `report_mode=True` - Enables report generation mode
- Parameters are passed through for notebook injection
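The call pattern can be sketched as follows, with a stub standing in for `papermill.execute_notebook()` so the snippet is self-contained (a simplification of the flow described above, not the provider's source):

```python
# Stub standing in for papermill.execute_notebook(); it records the call
# so we can inspect what the operator would forward.
calls = []

def execute_notebook_stub(input_path, output_path, parameters=None, **kwargs):
    calls.append({"input": input_path, "output": output_path,
                  "parameters": parameters, **kwargs})

# Simplified execute flow: forward paths and parameters with the fixed settings.
def execute(input_nb, output_nb, parameters):
    execute_notebook_stub(
        input_nb,
        output_nb,
        parameters=parameters,
        progress_bar=False,  # cleaner task logs
        report_mode=True,    # report generation mode
    )

execute("/nb/in.ipynb", "/nb/out.ipynb", {"date": "2024-01-01"})
print(calls[0]["report_mode"])  # → True
```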

### Airflow Features

- **Templating**: All string parameters support Airflow's Jinja templating
- **Lineage**: Automatic lineage tracking through `NoteBook` entities
- **XCom**: Can be used with XCom for passing data between tasks
- **Retries**: Supports standard Airflow retry mechanisms
- **Connections**: Can use Airflow connections for remote notebook storage
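For example, retries can be configured through `default_args` like for any other operator (the values below are illustrative, not recommendations):

```python
from datetime import timedelta

# Illustrative retry configuration; these are standard Airflow default_args
# keys honored by any operator, including PapermillOperator.
default_args = {
    "owner": "airflow",
    "retries": 3,                          # re-run a failed notebook up to 3 times
    "retry_delay": timedelta(minutes=5),   # wait between attempts
}
print(default_args["retries"])  # → 3
```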

## Migration Notes

The legacy import path is deprecated and issues a warning:

```python
# DEPRECATED - issues a deprecation warning
from airflow.operators.papermill_operator import PapermillOperator
```

Use the current import path:

```python
# Current import path
from airflow.providers.papermill.operators.papermill import PapermillOperator
```