# IMAP Hooks

IMAP server connectivity with SSL/TLS support, email searching, attachment detection, and secure file downloads. The `ImapHook` provides a low-level interface for interacting with IMAP email servers.

## Capabilities

### ImapHook Class

Main hook class for IMAP server connections, with automatic connection management and context manager support.

```python { .api }
class ImapHook:
    """
    Hook for connecting to mail servers using the IMAP protocol.

    Parameters:
    - imap_conn_id (str): Connection ID for IMAP server credentials

    Attributes:
    - conn_name_attr (str): "imap_conn_id"
    - default_conn_name (str): "imap_default"
    - conn_type (str): "imap"
    - hook_name (str): "IMAP"
    """

    def __init__(self, imap_conn_id: str = "imap_default") -> None: ...
```

### Connection Management

Methods for establishing and managing IMAP server connections with automatic SSL/TLS configuration.

```python { .api }
def get_conn(self) -> ImapHook:
    """
    Log in to the mail server and return the authorized hook instance.

    Returns:
    ImapHook: Authorized hook object ready for operations

    Note: Use as a context manager with the 'with' statement for automatic cleanup
    """

def __enter__(self) -> ImapHook:
    """Context manager entry - returns connected hook instance."""

def __exit__(self, exc_type, exc_val, exc_tb):
    """Context manager exit - automatically logs out from mail server."""
```

**Usage Example:**

```python
from airflow.providers.imap.hooks.imap import ImapHook

# Recommended: use as a context manager for automatic connection cleanup
with ImapHook(imap_conn_id="my_imap_conn") as hook:
    # The connection is established on entry and closed on exit
    # Match any PDF attachment with a regular expression
    attachments = hook.retrieve_mail_attachments(r".*\.pdf$", check_regex=True)

# Manual connection management (not recommended)
hook = ImapHook(imap_conn_id="my_imap_conn")
connected_hook = hook.get_conn()
# Remember to call hook.mail_client.logout() manually
```

### Attachment Detection

Methods for checking whether email attachments exist, by exact name or by regular expression.

```python { .api }
def has_mail_attachment(
    self,
    name: str,
    *,
    check_regex: bool = False,
    mail_folder: str = "INBOX",
    mail_filter: str = "All"
) -> bool:
    """
    Check if mail folder contains attachments with the given name.

    Parameters:
    - name (str): Attachment name to search for
    - check_regex (bool): If True, treat name as regular expression pattern
    - mail_folder (str): Mail folder to search in (default: "INBOX")
    - mail_filter (str): IMAP search filter (default: "All")

    Returns:
    bool: True if attachment found, False otherwise
    """
```

**Usage Example:**

```python
from airflow.providers.imap.hooks.imap import ImapHook

with ImapHook() as hook:
    # Check for exact attachment name
    has_report = hook.has_mail_attachment("daily_report.csv")

    # Check using regex pattern
    has_any_csv = hook.has_mail_attachment(
        r".*\.csv$",
        check_regex=True
    )

    # Search in specific folder with date filter
    has_recent = hook.has_mail_attachment(
        "invoice.pdf",
        mail_folder="Business",
        mail_filter='(SINCE "01-Jan-2024")'
    )
```
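
The difference between exact and regex matching can be sketched with the standard library alone. The helper `name_matches` below is hypothetical and not part of the provider; it assumes that regex patterns are applied with `re.match`, which anchors only at the start of the filename.

```python
import re


def name_matches(pattern: str, filename: str, check_regex: bool = False) -> bool:
    """Hypothetical helper: compare an attachment filename with a name or pattern."""
    if check_regex:
        # re.match anchors at the start of the string only (assumption of this sketch)
        return re.match(pattern, filename) is not None
    # Without check_regex the name must be identical to the filename
    return pattern == filename


print(name_matches("daily_report.csv", "daily_report.csv"))             # True
print(name_matches(r".*\.csv$", "daily_report.csv", check_regex=True))  # True
print(name_matches(r".*\.csv$", "daily_report.csv"))                    # False
```

The last call shows why a pattern passed without `check_regex=True` finds nothing: it is compared literally against the filename.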

### Attachment Retrieval

Methods for retrieving attachment data as in-memory content for processing.

```python { .api }
def retrieve_mail_attachments(
    self,
    name: str,
    *,
    check_regex: bool = False,
    latest_only: bool = False,
    mail_folder: str = "INBOX",
    mail_filter: str = "All",
    not_found_mode: str = "raise",
) -> list[tuple[str, bytes]]:
    """
    Retrieve attachment data from emails matching the criteria.

    Parameters:
    - name (str): Attachment name to search for
    - check_regex (bool): If True, treat name as regular expression
    - latest_only (bool): If True, return only the first matching attachment
    - mail_folder (str): Mail folder to search in (default: "INBOX")
    - mail_filter (str): IMAP search filter (default: "All")
    - not_found_mode (str): Error handling mode ("raise", "warn", "ignore")

    Returns:
    list[tuple[str, bytes]]: List of (filename, payload) tuples containing attachment data
    """
```

**Usage Example:**

```python
from airflow.providers.imap.hooks.imap import ImapHook

with ImapHook() as hook:
    # Retrieve all CSV attachments (the name is a regular expression here)
    attachments = hook.retrieve_mail_attachments(r".*\.csv$", check_regex=True)
    for filename, payload in attachments:
        print(f"Found attachment: {filename}, size: {len(payload)} bytes")

    # Get only the latest matching attachment
    latest = hook.retrieve_mail_attachments(
        "report.xlsx",
        latest_only=True,
        not_found_mode="warn"  # Just log a warning if not found
    )
```

### Attachment Download

Methods for downloading attachments to the local filesystem with security protections.

```python { .api }
def download_mail_attachments(
    self,
    name: str,
    local_output_directory: str,
    *,
    check_regex: bool = False,
    latest_only: bool = False,
    mail_folder: str = "INBOX",
    mail_filter: str = "All",
    not_found_mode: str = "raise",
) -> None:
    """
    Download attachments from emails to local directory.

    Parameters:
    - name (str): Attachment name to search for
    - local_output_directory (str): Local directory path for downloads
    - check_regex (bool): If True, treat name as regular expression
    - latest_only (bool): If True, download only the first matching attachment
    - mail_folder (str): Mail folder to search in (default: "INBOX")
    - mail_filter (str): IMAP search filter (default: "All")
    - not_found_mode (str): Error handling mode ("raise", "warn", "ignore")

    Security Features:
    - Prevents directory traversal attacks (blocks "../" in filenames)
    - Blocks symlink creation for security
    - Validates output directory paths
    """
```

**Usage Example:**

```python
import os
from datetime import date, timedelta

from airflow.providers.imap.hooks.imap import ImapHook

with ImapHook() as hook:
    # Create download directory
    download_dir = "/tmp/email_attachments"
    os.makedirs(download_dir, exist_ok=True)

    # IMAP SINCE takes an absolute DD-Mon-YYYY date, so compute it.
    # %b is locale-dependent; IMAP expects English month abbreviations.
    since = (date.today() - timedelta(days=7)).strftime("%d-%b-%Y")

    # Download all PDF attachments from the last 7 days
    hook.download_mail_attachments(
        name=r".*\.pdf$",
        local_output_directory=download_dir,
        check_regex=True,
        mail_filter=f'(SINCE "{since}")'
    )

    # Download only the latest report
    hook.download_mail_attachments(
        name="daily_report.xlsx",
        local_output_directory=download_dir,
        latest_only=True,
        not_found_mode="ignore"  # Continue processing even if not found
    )
```
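
The security features listed for `download_mail_attachments` can be illustrated with a small path check. This is a sketch of one possible approach on a POSIX filesystem, not the provider's implementation; `is_safe_target` is a hypothetical name.

```python
import os


def is_safe_target(output_dir: str, filename: str) -> bool:
    """Hypothetical check: reject names that escape output_dir or hit a symlink."""
    base = os.path.realpath(output_dir)
    target = os.path.join(base, filename)
    # Refuse to write through an existing symlink at the target path
    if os.path.islink(target):
        return False
    # Resolve ".." segments, then require the result to stay below the base
    resolved = os.path.realpath(target)
    return resolved != base and os.path.commonpath([base, resolved]) == base
```

A caller would run this check on every attachment filename before opening the output file, and skip or report any name that fails it.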

## Helper Classes

### Mail Class

Helper class for parsing and working with individual email messages.

```python { .api }
class Mail:
    """
    Helper class for working with mail messages from imaplib.

    Parameters:
    - mail_body (str): Raw email message body from IMAP server
    """

    def __init__(self, mail_body: str) -> None: ...

    def has_attachments(self) -> bool:
        """
        Check if the email message contains attachments.

        Returns:
        bool: True if email has attachments, False otherwise
        """

    def get_attachments_by_name(
        self,
        name: str,
        check_regex: bool,
        find_first: bool = False
    ) -> list[tuple[str, bytes]]:
        """
        Extract attachments from email by name pattern.

        Parameters:
        - name (str): Attachment name or pattern to match
        - check_regex (bool): If True, use regex matching
        - find_first (bool): If True, return only first match

        Returns:
        list[tuple[str, bytes]]: List of (filename, payload) tuples
        """
```

### MailPart Class

Helper class for working with individual email message parts and attachments.

```python { .api }
class MailPart:
    """
    Wrapper for individual email parts with attachment functionality.

    Parameters:
    - part: Email message part from email.message
    """

    def __init__(self, part) -> None: ...

    def is_attachment(self) -> bool:
        """
        Check if the message part is a valid attachment.

        Returns:
        bool: True if part is an attachment, False otherwise
        """

    def has_matching_name(self, name: str) -> re.Match[str] | None:
        """
        Check if attachment name matches regex pattern.

        Parameters:
        - name (str): Regular expression pattern to match

        Returns:
        re.Match[str] | None: Match object if pattern matches, None otherwise
        """

    def has_equal_name(self, name: str) -> bool:
        """
        Check if attachment name equals the given name exactly.

        Parameters:
        - name (str): Exact name to match

        Returns:
        bool: True if names are equal, False otherwise
        """

    def get_file(self) -> tuple[str, bytes]:
        """
        Extract filename and payload from attachment.

        Returns:
        tuple[str, bytes]: (filename, payload) where payload is decoded bytes
        """
```
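
To make the roles of `Mail` and `MailPart` concrete, the following sketch does the equivalent work with only the standard library `email` package. It is an illustration under assumptions, not the helpers' actual code; `list_attachments` is a hypothetical function, and it treats a part as an attachment when its Content-Disposition is `attachment`.

```python
from email import message_from_bytes
from email.message import EmailMessage


def list_attachments(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, payload) for every attachment part in a raw message."""
    message = message_from_bytes(raw)
    found = []
    for part in message.walk():
        # Attachment parts carry a Content-Disposition of "attachment"
        if part.get_content_disposition() == "attachment":
            # decode=True reverses the transfer encoding (base64 here)
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


# Build a small message with one attachment to exercise the helper
msg = EmailMessage()
msg["Subject"] = "Daily report"
msg.set_content("See attached.")
msg.add_attachment(
    b"a,b\n1,2\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

print(list_attachments(msg.as_bytes()))  # [('report.csv', b'a,b\n1,2\n')]
```

The returned tuples have the same `(filename, payload)` shape that `get_file` and `get_attachments_by_name` are documented to return.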

## IMAP Search Filters

The `mail_filter` parameter supports IMAP search criteria. Common examples:

```python
from datetime import date, timedelta

# All messages (default)
mail_filter = "All"

# Messages from specific sender
mail_filter = 'FROM "sender@example.com"'

# Messages with specific subject
mail_filter = 'SUBJECT "Monthly Report"'

# Messages since specific date
mail_filter = 'SINCE "01-Jan-2024"'

# Messages from last N days: IMAP has no relative-date syntax,
# so compute an absolute DD-Mon-YYYY date first
since = (date.today() - timedelta(days=7)).strftime("%d-%b-%Y")
mail_filter = f'SINCE "{since}"'

# Unread messages only
mail_filter = "UNSEEN"

# Combine multiple criteria
mail_filter = 'FROM "reports@company.com" SINCE "01-Jan-2024" UNSEEN'
```
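
Because the IMAP `SINCE` key takes an absolute date with an English month abbreviation, a "last N days" filter has to be built by the caller. The helper below is a hypothetical sketch that avoids locale-dependent formatting by using a fixed month table; `since_days_ago` and its `today` parameter are assumptions introduced for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# English month abbreviations, independent of the process locale
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def since_days_ago(days: int, today: Optional[date] = None) -> str:
    """Build an IMAP SINCE criterion for the date `days` days before `today`."""
    start = (today or date.today()) - timedelta(days=days)
    return f'SINCE "{start.day:02d}-{_MONTHS[start.month - 1]}-{start.year}"'


print(since_days_ago(7, today=date(2024, 1, 8)))  # SINCE "01-Jan-2024"
```

The result can be passed as `mail_filter` directly or combined with other criteria, for example `f'UNSEEN {since_days_ago(7)}'`.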

## Connection Security

### SSL/TLS Configuration

The hook configures SSL/TLS from the connection's Extra field. `use_ssl` enables SSL/TLS (default: `true`), and `ssl_context` selects the SSL context (`"default"` or `"none"`):

```json
{
    "use_ssl": true,
    "ssl_context": "default"
}
```

- **"default"**: Uses `ssl.create_default_context()` for secure connections
- **"none"**: Disables certificate verification (not recommended for production)
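
The two `ssl_context` values can be pictured as a small mapping onto the standard `ssl` module. This is a sketch under assumptions rather than the hook's code: `resolve_ssl_context` is hypothetical, and returning `None` for `"none"` assumes the IMAP client then falls back to its own unverified default.

```python
import ssl
from typing import Optional


def resolve_ssl_context(setting: str) -> Optional[ssl.SSLContext]:
    """Hypothetical mapping of the ssl_context setting to an SSL context."""
    if setting == "default":
        # Verifies certificates and hostnames against the system trust store
        return ssl.create_default_context()
    if setting == "none":
        # No context supplied: certificate verification is left off (assumption)
        return None
    raise RuntimeError(f"Invalid ssl_context value: {setting!r}")


context = resolve_ssl_context("default")
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```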

### Global Configuration

The SSL context can also be configured globally in the Airflow configuration:

```ini
[imap]
ssl_context = default

# Or fall back to the email section
[email]
ssl_context = default
```
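
The fallback order between the two sections can be sketched with the standard library `configparser`. Airflow uses its own configuration object, so this only illustrates the lookup order described above; `lookup_ssl_context` is a hypothetical helper.

```python
from configparser import ConfigParser
from typing import Optional


def lookup_ssl_context(config: ConfigParser) -> Optional[str]:
    """Prefer [imap] ssl_context, then [email] ssl_context, else None."""
    for section in ("imap", "email"):
        # fallback=None also covers a missing section
        value = config.get(section, "ssl_context", fallback=None)
        if value is not None:
            return value
    return None


config = ConfigParser()
config.read_string("[email]\nssl_context = default\n")
print(lookup_ssl_context(config))  # default
```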

## Error Handling

### Exception Types

```python
from airflow.exceptions import AirflowException
from airflow.providers.imap.hooks.imap import ImapHook

try:
    with ImapHook() as hook:
        attachments = hook.retrieve_mail_attachments("report.csv")
except AirflowException:
    # Handle case where no attachments were found
    print("No matching attachments found")
except RuntimeError as e:
    # Handle SSL configuration or connection errors
    print(f"Connection error: {e}")
```

### Error Modes

The `not_found_mode` parameter controls how the hook handles missing attachments:

- **"raise"** (default): Raises `AirflowException` if no attachments are found
- **"warn"**: Logs a warning message and continues execution
- **"ignore"**: Silently returns an empty list if no attachments are found

```python
from airflow.providers.imap.hooks.imap import ImapHook

# Different error handling approaches
with ImapHook() as hook:
    # Strict mode - will raise exception if not found
    attachments = hook.retrieve_mail_attachments("critical_report.csv")

    # Lenient mode - will continue even if not found
    optional_attachments = hook.retrieve_mail_attachments(
        "optional_data.csv",
        not_found_mode="ignore"
    )
```
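
The three modes amount to a small dispatch on a string. The sketch below shows that logic in isolation; it is not the hook's code. `AttachmentNotFound` stands in for `AirflowException` so the example runs without Airflow, and rejecting unknown modes with `ValueError` is a choice made for this sketch.

```python
import logging

log = logging.getLogger(__name__)


class AttachmentNotFound(Exception):
    """Stand-in for AirflowException so the sketch runs without Airflow."""


def handle_not_found(mode: str, message: str) -> None:
    """Hypothetical dispatch for the three documented not_found_mode values."""
    if mode == "raise":
        raise AttachmentNotFound(message)
    if mode == "warn":
        log.warning(message)
    elif mode != "ignore":
        # Unknown modes are rejected here; this is an assumption of the sketch
        raise ValueError(f"Unknown not_found_mode: {mode!r}")


handle_not_found("warn", "No attachment named report.csv")  # logs and returns
handle_not_found("ignore", "No attachment named report.csv")  # returns silently
```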