# GitHub Operators

`GithubOperator` executes GitHub API operations as Airflow tasks. It looks up the requested method by name on the PyGithub client at run time, supports templated method arguments, and can pass the result through an optional processing function.

## Capabilities

### GithubOperator

Generic operator for executing any GitHub API method through the PyGithub client.

```python { .api }
class GithubOperator(BaseOperator):
    """
    Interact and perform actions on GitHub API.

    This operator is designed to use GitHub's Python SDK: https://github.com/PyGithub/PyGithub
    Executes any method available on the PyGithub client with provided arguments.
    """

    # Template fields for dynamic argument substitution
    template_fields = ("github_method_args",)

    def __init__(
        self,
        *,
        github_method: str,
        github_conn_id: str = "github_default",
        github_method_args: dict | None = None,
        result_processor: Callable | None = None,
        **kwargs,
    ) -> None:
        """
        Initialize GitHub operator.

        Parameters:
        - github_method: Method name from PyGithub client to be called
        - github_conn_id: Reference to pre-defined GitHub Connection
        - github_method_args: Method parameters for the github_method (templated)
        - result_processor: Function to further process the response from GitHub API
        - **kwargs: Additional BaseOperator parameters
        """

    def execute(self, context: Context) -> Any:
        """
        Execute GitHub method with provided arguments.

        Creates GithubHook, gets client, and calls specified method.
        Optionally processes results through result_processor function.

        Parameters:
        - context: Airflow task execution context

        Returns:
        Any: Result from GitHub API method, optionally processed

        Raises:
        AirflowException: If GitHub operation fails or method doesn't exist
        """
```
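
The dispatch described in `execute` (look up a method by name, call it with keyword arguments, optionally post-process the result) can be illustrated with a small standalone sketch. This is not the provider's source code: `FakeGithubClient` and `call_github_method` are hypothetical names invented for illustration, and `RuntimeError` stands in for `AirflowException` so that the example runs without Airflow or PyGithub installed.

```python
from typing import Any, Callable, Optional


class FakeGithubClient:
    """Hypothetical stand-in for the PyGithub client; not part of any library."""

    def get_repo(self, full_name_or_id: str) -> dict:
        return {"full_name": full_name_or_id, "stars": 42}


def call_github_method(
    client: Any,
    github_method: str,
    github_method_args: Optional[dict] = None,
    result_processor: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Look up `github_method` on `client` by name, call it, and post-process."""
    try:
        # getattr raises AttributeError if the client has no such method
        method = getattr(client, github_method)
        result = method(**(github_method_args or {}))
        if result_processor is not None:
            return result_processor(result)
        return result
    except Exception as exc:
        # Assumption: RuntimeError replaces AirflowException to stay stdlib-only
        raise RuntimeError(f"GitHub operator error: {exc}") from exc


client = FakeGithubClient()

# Without a processor the raw return value comes back unchanged
raw = call_github_method(client, "get_repo", {"full_name_or_id": "apache/airflow"})

# With a processor only the processed value is returned
stars = call_github_method(
    client,
    "get_repo",
    {"full_name_or_id": "apache/airflow"},
    result_processor=lambda repo: repo["stars"],
)
```

Because the method is resolved by name at call time, a misspelled `github_method` only fails when the task runs, not when the DAG is parsed.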
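
## Usage Examples

### Basic API Calls

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.github.operators.github import GithubOperator

# Minimal DAG that the examples in this section attach their tasks to.
# The dag_id is illustrative. `schedule` requires Airflow 2.4+;
# older releases use `schedule_interval` instead.
dag = DAG(
    dag_id='github_examples',
    start_date=datetime(2024, 1, 1),
    schedule=None,
)

# Get the authenticated user
get_user = GithubOperator(
    task_id='get_github_user',
    github_method='get_user',
    dag=dag
)

# Get a specific repository
get_repo = GithubOperator(
    task_id='get_repository',
    github_method='get_repo',
    github_method_args={'full_name_or_id': 'apache/airflow'},
    dag=dag
)
```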

### Repository Operations

```python
# List user repositories
list_repos = GithubOperator(
    task_id='list_repositories',
    github_method='get_user',
    result_processor=lambda user: [repo.name for repo in user.get_repos()],
    dag=dag
)

# Get repository issues
get_issues = GithubOperator(
    task_id='get_repo_issues',
    github_method='get_repo',
    github_method_args={'full_name_or_id': 'apache/airflow'},
    result_processor=lambda repo: list(repo.get_issues(state='open')),
    dag=dag
)

# Get repository tags
list_tags = GithubOperator(
    task_id='list_repo_tags',
    github_method='get_repo',
    github_method_args={'full_name_or_id': 'apache/airflow'},
    result_processor=lambda repo: [tag.name for tag in repo.get_tags()],
    dag=dag
)
```

### Organization Operations

```python
# Get organization
get_org = GithubOperator(
    task_id='get_organization',
    github_method='get_organization',
    github_method_args={'login': 'apache'},
    dag=dag
)

# List organization repositories
org_repos = GithubOperator(
    task_id='list_org_repos',
    github_method='get_organization',
    github_method_args={'login': 'apache'},
    result_processor=lambda org: [repo.name for repo in org.get_repos()],
    dag=dag
)
```

### Templated Parameters

```python
# Use templated arguments with Airflow context
templated_operation = GithubOperator(
    task_id='templated_github_call',
    github_method='get_repo',
    github_method_args={
        'full_name_or_id': '{{ dag_run.conf["repo_name"] }}'  # Templated
    },
    result_processor=lambda repo: repo.stargazers_count,
    dag=dag
)
```

### Custom Result Processing

```python
import logging

def process_repo_info(repo):
    """Custom processor to extract and log repository information."""
    info = {
        'name': repo.name,
        'stars': repo.stargazers_count,
        'forks': repo.forks_count,
        'language': repo.language,
        'open_issues': repo.open_issues_count
    }
    # Lazy %-style formatting defers string building until the record is emitted
    logging.info("Repository info: %s", info)
    return info

analyze_repo = GithubOperator(
    task_id='analyze_repository',
    github_method='get_repo',
    github_method_args={'full_name_or_id': 'apache/airflow'},
    result_processor=process_repo_info,
    dag=dag
)
```

### Complex Workflows

```python
def get_recent_releases(repo):
    """Get releases from the last 30 days."""
    from datetime import datetime, timedelta, timezone

    # Use an aware UTC timestamp so the comparison does not depend on local time
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=30)
    recent_releases = []

    for release in repo.get_releases():
        created = release.created_at
        # Depending on the PyGithub version, created_at may be naive (UTC)
        # or timezone-aware; normalize before comparing
        if created.tzinfo is None:
            created = created.replace(tzinfo=timezone.utc)
        if created >= cutoff_date:
            recent_releases.append({
                'tag': release.tag_name,
                'name': release.name,
                'created': created.isoformat()
            })

    return recent_releases

recent_releases = GithubOperator(
    task_id='get_recent_releases',
    github_method='get_repo',
    github_method_args={'full_name_or_id': 'apache/airflow'},
    result_processor=get_recent_releases,
    dag=dag
)
```

## Available GitHub Methods

The operator can call any method available on the PyGithub `Github` client. Common methods include:

### User/Authentication Methods
- `get_user()`: Get the authenticated user
- `get_user(login)`: Get a specific user by login
- `search_users(query)`: Search users

### Repository Methods
- `get_repo(full_name_or_id)`: Get a specific repository
- `search_repositories(query)`: Search repositories

### Organization Methods
- `get_organization(login)`: Get an organization

Many more methods are available, as provided by the PyGithub SDK.

## Error Handling

The operator catches GitHub API exceptions and re-raises them as `AirflowException`:

```python
from airflow.exceptions import AirflowException

# `operator` is a GithubOperator instance and `context` an Airflow task context,
# for example when calling execute() directly in a test.
try:
    result = operator.execute(context)
except AirflowException as e:
    # Handle GitHub API failures
    if "404" in str(e):
        print("Resource not found")
    elif "403" in str(e):
        print("Access forbidden - check token permissions")
```
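
A plain substring check such as `"404" in str(e)` can also match unrelated digits in the message, for example a numeric ID that happens to contain `404`. A slightly stricter variant is sketched below. It assumes the HTTP status appears in the error text as a standalone three-digit number; that message format is an assumption, not something guaranteed here. `extract_status_code` and `describe_github_failure` are hypothetical helper names, not part of Airflow or PyGithub.

```python
import re
from typing import Optional


def extract_status_code(error: Exception) -> Optional[int]:
    """Return the first standalone three-digit HTTP status in the error text, if any."""
    # \b on both sides rejects digits embedded in longer numbers such as "14045"
    match = re.search(r"\b([1-5]\d{2})\b", str(error))
    return int(match.group(1)) if match else None


def describe_github_failure(error: Exception) -> str:
    """Map an error to a short human-readable description."""
    status = extract_status_code(error)
    if status == 404:
        return "Resource not found"
    if status in (401, 403):
        return "Access denied - check token permissions"
    if status is None:
        return "Unknown failure"
    return f"GitHub API returned HTTP {status}"


# The message below is made up for illustration
example = Exception('error: 404 {"message": "Not Found"}')
description = describe_github_failure(example)
```

A standalone number in the message can still be mistaken for a status code, so this reduces false matches without eliminating them.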

## Return Values

- **Without result_processor**: Returns the raw PyGithub object (Repository, User, etc.)
- **With result_processor**: Returns whatever the processor function returns
- **On error**: Raises `AirflowException` with GitHub error details
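
Airflow pushes a task's return value to XCom by default, and raw PyGithub objects are generally not JSON-serializable. Assuming the default XCom backend, a `result_processor` that returns plain Python types is therefore usually the safer choice. The sketch below shows such a processor. `repo_to_dict` is a hypothetical name, and `SimpleNamespace` stands in for a PyGithub `Repository` so the example runs without network access.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict


def repo_to_dict(repo: Any) -> Dict[str, Any]:
    """Copy a few attributes from a repository-like object into plain Python types."""
    return {
        "name": repo.name,
        "stars": repo.stargazers_count,
        "open_issues": repo.open_issues_count,
    }


# Hypothetical stand-in for a PyGithub Repository, used only to exercise the function
fake_repo = SimpleNamespace(name="airflow", stargazers_count=100, open_issues_count=7)

summary = repo_to_dict(fake_repo)

# json.dumps succeeds because only str and int values remain
serialized = json.dumps(summary)
```

A function like this can be passed directly as `result_processor=repo_to_dict` when `github_method='get_repo'`.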