# Datetime Feature Extraction

Transformers for extracting meaningful features from datetime variables, including time components, periods, and date-related boolean flags, to capture temporal patterns in machine learning models.

## Capabilities

### Datetime Features Extractor

Extracts date and time features from datetime variables, creating multiple new features from each datetime column.

```python { .api }
class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True,
                 missing_values='raise', dayfirst=False, yearfirst=False, utc=None):
        """
        Initialize DatetimeFeatures.

        Parameters:
        - variables (list): List of datetime variables to extract features from. If None, auto-detects datetime columns
        - features_to_extract (list/str): Specific features to extract or 'all' for all available features
        - drop_original (bool): Whether to drop original datetime variables after extraction
        - missing_values (str): How to handle missing values - 'raise' or 'ignore'
        - dayfirst (bool): Parse dates with day first (DD/MM/YYYY format)
        - yearfirst (bool): Parse dates with year first (YYYY/MM/DD format)
        - utc (bool): Return UTC DatetimeIndex. If None, keeps original timezone
        """

    def fit(self, X, y=None):
        """
        Validate datetime variables and features to extract.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Extract datetime features and add to dataframe.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with extracted datetime features
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""
```

## Supported Datetime Features

### Time Period Features
- `"month"`: Month of the year (1-12)
- `"quarter"`: Quarter of the year (1-4)
- `"semester"`: Semester of the year (1-2)
- `"year"`: Year value
- `"week"`: Week of the year (1-52/53)

### Date Position Features
- `"day_of_week"`: Day of the week (0=Monday, 6=Sunday)
- `"day_of_month"`: Day of the month (1-31)
- `"day_of_year"`: Day of the year (1-365/366)

### Time Component Features
- `"hour"`: Hour of the day (0-23)
- `"minute"`: Minute of the hour (0-59)
- `"second"`: Second of the minute (0-59)

### Boolean Date Features
- `"weekend"`: Whether date falls on weekend (Saturday/Sunday)
- `"month_start"`: Whether date is first day of month
- `"month_end"`: Whether date is last day of month
- `"quarter_start"`: Whether date is first day of quarter
- `"quarter_end"`: Whether date is last day of quarter
- `"year_start"`: Whether date is first day of year
- `"year_end"`: Whether date is last day of year
- `"leap_year"`: Whether year is a leap year

### Calendar Properties
- `"days_in_month"`: Number of days in the month (28-31)
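
As a cross-check on the definitions above, the following sketch computes several of the listed features for a single date using only the Python standard library. It illustrates what the feature names mean and is not how `DatetimeFeatures` is implemented. The helper name `describe_date` is hypothetical, and the semester rule (January to June is 1, July to December is 2) is an assumption consistent with the 1-2 range listed above.

```python
import calendar
from datetime import date


def describe_date(d: date) -> dict:
    """Compute a handful of calendar features for one date (hypothetical helper)."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return {
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "semester": 1 if d.month <= 6 else 2,  # assumed: Jan-Jun = 1, Jul-Dec = 2
        "day_of_week": d.weekday(),            # 0 = Monday, 6 = Sunday
        "day_of_year": d.timetuple().tm_yday,
        "weekend": d.weekday() >= 5,
        "month_start": d.day == 1,
        "month_end": d.day == days_in_month,
        "quarter_start": d.day == 1 and d.month in (1, 4, 7, 10),
        "leap_year": calendar.isleap(d.year),
        "days_in_month": days_in_month,
    }


features = describe_date(date(2024, 2, 29))
print(features)
```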

**Usage Example**:

```python
from feature_engine.datetime import DatetimeFeatures
import pandas as pd

# Sample datetime data
dates = pd.date_range('2023-01-01', periods=100, freq='D')
data = {
    'transaction_date': dates,
    # Lowercase 'h' is the hourly alias; uppercase 'H' is deprecated in recent pandas
    'created_at': pd.date_range('2023-01-01 09:00:00', periods=100, freq='h'),
    'amount': range(100)
}
df = pd.DataFrame(data)

# Extract common datetime features
extractor = DatetimeFeatures(
    features_to_extract=['month', 'day_of_week', 'hour', 'weekend'],
    drop_original=False
)
df_enhanced = extractor.fit_transform(df)

# Extract all available features
extractor_all = DatetimeFeatures(features_to_extract='all')
df_all_features = extractor_all.fit_transform(df)

# Access extracted feature information
print(extractor.variables_)             # Datetime variables processed
print(extractor.features_to_extract_)   # Features that were extracted
```

## Usage Patterns

### Time Series Feature Engineering

```python
import pandas as pd
import numpy as np
from feature_engine.datetime import DatetimeFeatures

# Create time series data
dates = pd.date_range('2022-01-01', '2023-12-31', freq='D')
ts_data = {
    'date': dates,
    'sales': np.random.normal(1000, 200, len(dates)) +
             100 * np.sin(2 * np.pi * dates.dayofyear / 365),  # Seasonal pattern
    'temperature': 20 + 10 * np.sin(2 * np.pi * dates.dayofyear / 365)
}
df_ts = pd.DataFrame(ts_data)

# Extract comprehensive datetime features for time series analysis
ts_extractor = DatetimeFeatures(
    variables=['date'],
    features_to_extract=[
        'month', 'quarter', 'day_of_week', 'day_of_month',
        'weekend', 'month_start', 'month_end', 'quarter_start', 'quarter_end'
    ]
)
df_ts_enhanced = ts_extractor.fit_transform(df_ts)

print(f"Original columns: {len(df_ts.columns)}")
print(f"Enhanced columns: {len(df_ts_enhanced.columns)}")
print("New datetime features:", [col for col in df_ts_enhanced.columns if col not in df_ts.columns])
```

### E-commerce Transaction Analysis

```python
import numpy as np
import pandas as pd
from feature_engine.datetime import DatetimeFeatures

# E-commerce transaction data
transaction_data = {
    'order_date': pd.date_range('2023-01-01', periods=1000, freq='h'),
    'customer_id': np.random.randint(1, 100, 1000),
    'order_amount': np.random.normal(150, 50, 1000)
}
df_ecommerce = pd.DataFrame(transaction_data)

# Extract business-relevant datetime features
business_extractor = DatetimeFeatures(
    features_to_extract=[
        'month', 'day_of_week', 'hour',
        'weekend', 'month_start', 'month_end'
    ],
    drop_original=True
)
df_business = business_extractor.fit_transform(df_ecommerce)

# Now we can analyze patterns like:
# - Monthly seasonality (month feature)
# - Day of week effects (day_of_week, weekend features)
# - Hourly patterns (hour feature)
# - End/start of month effects (month_start, month_end features)
```

### Multi-timezone Datetime Processing

```python
import numpy as np
import pandas as pd
from feature_engine.datetime import DatetimeFeatures

# Data with timezone-aware datetimes
utc_dates = pd.date_range('2023-01-01', periods=100, freq='6h', tz='UTC')
est_dates = utc_dates.tz_convert('US/Eastern')

multi_tz_data = {
    'utc_timestamp': utc_dates,
    'local_timestamp': est_dates,
    'value': np.random.randn(100)
}
df_tz = pd.DataFrame(multi_tz_data)

# Extract features preserving timezone info
tz_extractor = DatetimeFeatures(
    features_to_extract=['hour', 'day_of_week'],
    utc=None  # Preserve original timezone
)
df_tz_features = tz_extractor.fit_transform(df_tz)

# Compare UTC vs local time features
print("UTC hours:", df_tz_features['utc_timestamp_hour'].unique())
print("Local hours:", df_tz_features['local_timestamp_hour'].unique())
```
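
To make the UTC-versus-local comparison concrete, here is a standard-library sketch showing that one instant yields different `hour` and `day_of_week` values depending on the timezone in which it is expressed. It uses a fixed UTC-5 offset as a stand-in for US/Eastern in winter (an assumption, since real zones also observe daylight saving time) and does not call `DatetimeFeatures`.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for US/Eastern standard time (assumption: no DST handling).
EASTERN_STD = timezone(timedelta(hours=-5), name="EST")

instant_utc = datetime(2023, 1, 2, 0, 0, tzinfo=timezone.utc)  # Monday 00:00 UTC
instant_local = instant_utc.astimezone(EASTERN_STD)            # same instant, local wall clock

for label, ts in [("utc", instant_utc), ("local", instant_local)]:
    print(label, "hour:", ts.hour, "day_of_week:", ts.weekday())
```

Because both columns describe the same instants, features such as `weekend` can disagree between them near midnight, so it is worth choosing the timezone that matches the behavior being modeled.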

### Custom Feature Selection

```python
import numpy as np
import pandas as pd
from feature_engine.datetime import DatetimeFeatures

# Financial data requiring specific datetime features
financial_data = {
    'trade_date': pd.bdate_range('2023-01-01', periods=250),  # Business days only
    'stock_price': 100 + np.random.randn(250).cumsum(),
    'volume': np.random.randint(1000, 10000, 250)
}
df_financial = pd.DataFrame(financial_data)

# Extract only relevant features for financial analysis
financial_extractor = DatetimeFeatures(
    variables=['trade_date'],
    features_to_extract=[
        'month', 'quarter', 'day_of_week',
        'month_start', 'month_end', 'quarter_start', 'quarter_end'
    ]
)
df_financial_enhanced = financial_extractor.fit_transform(df_financial)

# Features useful for:
# - Monthly/quarterly reporting periods
# - Day of week trading patterns (Monday effect, Friday effect)
# - Period start/end effects (window dressing, rebalancing)
```
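
One point worth checking with business-day data: going by the definition listed earlier ("last day of month"), `month_end` refers to the calendar month, so a month whose last calendar day falls on a weekend would have no trading day flagged. The standard-library sketch below lists those months for 2023. It is a sanity check under that assumption, not output from `DatetimeFeatures`.

```python
import calendar
from datetime import date

year = 2023
weekend_month_ends = []
for month in range(1, 13):
    last_day = calendar.monthrange(year, month)[1]   # number of days in the month
    if date(year, month, last_day).weekday() >= 5:   # 5 = Saturday, 6 = Sunday
        weekend_month_ends.append(month)

# Months in which no business day coincides with the calendar month end
print(weekend_month_ends)
```

If the last trading day of each period matters, a separate flag derived from the data itself (for example, the latest `trade_date` within each month) may be a better fit.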

### Handling Different Date Formats

```python
import pandas as pd
from feature_engine.datetime import DatetimeFeatures

# Data with various date formats
mixed_format_data = {
    'date_american': ['01/15/2023', '02/20/2023', '03/25/2023'],  # MM/DD/YYYY
    'date_european': ['15/01/2023', '20/02/2023', '25/03/2023'],  # DD/MM/YYYY
    'date_iso': ['2023-01-15', '2023-02-20', '2023-03-25'],       # YYYY-MM-DD
    'value': [100, 200, 300]
}
df_formats = pd.DataFrame(mixed_format_data)

# Convert to datetime with appropriate parsing
df_formats['date_american'] = pd.to_datetime(df_formats['date_american'], format='%m/%d/%Y')
df_formats['date_european'] = pd.to_datetime(df_formats['date_european'], dayfirst=True)
df_formats['date_iso'] = pd.to_datetime(df_formats['date_iso'])

# Extract features from properly parsed dates
format_extractor = DatetimeFeatures(
    features_to_extract=['month', 'day_of_month', 'year']
)
df_formats_enhanced = format_extractor.fit_transform(df_formats)
```
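
The reason for parsing each column explicitly is that slash-separated dates can be ambiguous. As a plain-Python illustration (using `datetime.strptime` rather than pandas), the same string produces different `month` and `day_of_month` values depending on the assumed order:

```python
from datetime import datetime

raw = "03/04/2023"  # ambiguous: March 4 or April 3?

month_first = datetime.strptime(raw, "%m/%d/%Y")  # American order
day_first = datetime.strptime(raw, "%d/%m/%Y")    # European order

print("month-first:", month_first.month, month_first.day)
print("day-first:  ", day_first.month, day_first.day)
```

A value such as `25/03/2023` is unambiguous only because 25 cannot be a month, so rows with day values of 12 or less are the ones to spot-check after parsing.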

### Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from feature_engine.imputation import MeanMedianImputer
from feature_engine.datetime import DatetimeFeatures
from feature_engine.encoding import OneHotEncoder

# Preprocessing pipeline with datetime feature extraction.
# `df` is the DataFrame built in the Usage Example above.
datetime_pipeline = Pipeline([
    ('datetime_features', DatetimeFeatures(
        features_to_extract=['month', 'day_of_week', 'hour', 'weekend']
    )),
    ('imputer', MeanMedianImputer()),  # Handle any missing values in numerical columns
    # The extracted features are numeric, and OneHotEncoder selects only
    # object/categorical columns by default, so name the columns to encode
    # and set ignore_format=True.
    ('encoder', OneHotEncoder(
        variables=['transaction_date_day_of_week', 'created_at_day_of_week'],
        ignore_format=True
    ))
])

df_processed = datetime_pipeline.fit_transform(df)
```

## Feature Naming Convention

`DatetimeFeatures` creates new columns following the pattern `{original_column_name}_{feature_name}`.

Examples:
- `transaction_date` + `month` → `transaction_date_month`
- `created_at` + `hour` → `created_at_hour`
- `timestamp` + `weekend` → `timestamp_weekend`
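
As a small sketch of this convention, the expected output column names can be generated ahead of time, for instance to select them in a later pipeline step. The helper `expected_feature_names` is hypothetical and simply applies the `{original_column_name}_{feature_name}` pattern; the column order produced by the transformer itself is not assumed here.

```python
def expected_feature_names(variables, features):
    """Apply the {variable}_{feature} pattern (hypothetical helper)."""
    return [f"{var}_{feat}" for var in variables for feat in features]


names = expected_feature_names(
    ["transaction_date", "created_at"],
    ["month", "hour", "weekend"],
)
print(names)
```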

## Common Attributes

After fitting, `DatetimeFeatures` exposes these attributes:

- `variables_` (list): Datetime variables from which features are extracted
- `features_to_extract_` (list): Features extracted from each datetime variable
- `n_features_in_` (int): Number of columns in the training set

The transformer handles pandas datetime types directly and can also parse string dates during `transform`, provided the parsing parameters (`dayfirst`, `yearfirst`, `utc`) match the data.