
# Datetime Feature Extraction


Transformers for extracting features from datetime variables, including time components, calendar periods, and boolean date flags, so that machine learning models can capture temporal patterns.


## Capabilities


### Datetime Features Extractor


Extracts date and time features from datetime variables, creating multiple new features from each datetime column.

```python { .api }
class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True,
                 missing_values='raise', dayfirst=False, yearfirst=False, utc=None):
        """
        Initialize DatetimeFeatures.

        Parameters:
        - variables (list): List of datetime variables to extract features from. If None, auto-detects datetime columns
        - features_to_extract (list/str): Specific features to extract or 'all' for all available features
        - drop_original (bool): Whether to drop original datetime variables after extraction
        - missing_values (str): How to handle missing values - 'raise' or 'ignore'
        - dayfirst (bool): Parse dates with day first (DD/MM/YYYY format)
        - yearfirst (bool): Parse dates with year first (YYYY/MM/DD format)
        - utc (bool): Return UTC DatetimeIndex. If None, keeps original timezone
        """

    def fit(self, X, y=None):
        """
        Validate datetime variables and features to extract.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Extract datetime features and add to dataframe.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with extracted datetime features
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""
```

## Supported Datetime Features

### Time Period Features
- `"month"`: Month of the year (1-12)
- `"quarter"`: Quarter of the year (1-4)
- `"semester"`: Semester of the year (1-2)
- `"year"`: Year value
- `"week"`: Week of the year (1-52/53)

### Date Position Features
- `"day_of_week"`: Day of the week (0=Monday, 6=Sunday)
- `"day_of_month"`: Day of the month (1-31)
- `"day_of_year"`: Day of the year (1-365/366)

### Time Component Features
- `"hour"`: Hour of the day (0-23)
- `"minute"`: Minute of the hour (0-59)
- `"second"`: Second of the minute (0-59)

### Boolean Date Features
- `"weekend"`: Whether date falls on weekend (Saturday/Sunday)
- `"month_start"`: Whether date is first day of month
- `"month_end"`: Whether date is last day of month
- `"quarter_start"`: Whether date is first day of quarter
- `"quarter_end"`: Whether date is last day of quarter
- `"year_start"`: Whether date is first day of year
- `"year_end"`: Whether date is last day of year
- `"leap_year"`: Whether year is a leap year

### Calendar Properties
- `"days_in_month"`: Number of days in the month (28-31)
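As a rough illustration of what several of these features mean, the sketch below derives them for a single date using only the Python standard library. It is not how `DatetimeFeatures` computes them internally; the `describe_date` helper is hypothetical, its dictionary keys are chosen to mirror the feature names listed above, and the transformer itself may encode the boolean flags differently (for example as 0/1 integers).

```python
import calendar
from datetime import date


def describe_date(d: date) -> dict:
    """Derive a handful of calendar features for one date (illustrative only)."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return {
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "semester": 1 if d.month <= 6 else 2,
        "day_of_week": d.weekday(),            # 0=Monday, 6=Sunday
        "day_of_year": d.timetuple().tm_yday,
        "weekend": d.weekday() >= 5,           # Saturday or Sunday
        "month_start": d.day == 1,
        "month_end": d.day == days_in_month,
        "quarter_end": d.month % 3 == 0 and d.day == days_in_month,
        "leap_year": calendar.isleap(d.year),
        "days_in_month": days_in_month,
    }


# 31 March 2024 is a Sunday that closes both the month and the first quarter
features = describe_date(date(2024, 3, 31))
print(features)
```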


**Usage Example**:

```python
from feature_engine.datetime import DatetimeFeatures
import pandas as pd

# Sample datetime data
dates = pd.date_range('2023-01-01', periods=100, freq='D')
data = {
    'transaction_date': dates,
    # Lowercase 'h' (hourly); the uppercase 'H' alias is deprecated in recent pandas
    'created_at': pd.date_range('2023-01-01 09:00:00', periods=100, freq='h'),
    'amount': range(100)
}
df = pd.DataFrame(data)

# Extract common datetime features
extractor = DatetimeFeatures(
    features_to_extract=['month', 'day_of_week', 'hour', 'weekend'],
    drop_original=False
)
df_enhanced = extractor.fit_transform(df)

# Extract all available features
extractor_all = DatetimeFeatures(features_to_extract='all')
df_all_features = extractor_all.fit_transform(df)

# Access extracted feature information
print(extractor.variables_)            # Datetime variables processed
print(extractor.features_to_extract_)  # Features that were extracted
```


## Usage Patterns


### Time Series Feature Engineering

```python
import pandas as pd
import numpy as np
from feature_engine.datetime import DatetimeFeatures

# Create time series data
dates = pd.date_range('2022-01-01', '2023-12-31', freq='D')
# Yearly sine wave shared by both synthetic series
seasonal = np.sin(2 * np.pi * dates.dayofyear.to_numpy() / 365)
ts_data = {
    'date': dates,
    'sales': np.random.normal(1000, 200, len(dates)) + 100 * seasonal,  # Seasonal pattern
    'temperature': 20 + 10 * seasonal
}
df_ts = pd.DataFrame(ts_data)

# Extract comprehensive datetime features for time series analysis
ts_extractor = DatetimeFeatures(
    variables=['date'],
    features_to_extract=[
        'month', 'quarter', 'day_of_week', 'day_of_month',
        'weekend', 'month_start', 'month_end', 'quarter_start', 'quarter_end'
    ]
)
df_ts_enhanced = ts_extractor.fit_transform(df_ts)

print(f"Original columns: {len(df_ts.columns)}")
print(f"Enhanced columns: {len(df_ts_enhanced.columns)}")
print("New datetime features:", [col for col in df_ts_enhanced.columns if col not in df_ts.columns])
```


### E-commerce Transaction Analysis

```python
import pandas as pd
import numpy as np
from feature_engine.datetime import DatetimeFeatures

# E-commerce transaction data
transaction_data = {
    # Lowercase 'h' (hourly); the uppercase 'H' alias is deprecated in recent pandas
    'order_date': pd.date_range('2023-01-01', periods=1000, freq='h'),
    'customer_id': np.random.randint(1, 100, 1000),
    'order_amount': np.random.normal(150, 50, 1000)
}
df_ecommerce = pd.DataFrame(transaction_data)

# Extract business-relevant datetime features
business_extractor = DatetimeFeatures(
    features_to_extract=[
        'month', 'day_of_week', 'hour',
        'weekend', 'month_start', 'month_end'
    ],
    drop_original=True
)
df_business = business_extractor.fit_transform(df_ecommerce)

# Now we can analyze patterns like:
# - Monthly seasonality (month feature)
# - Day of week effects (day_of_week, weekend features)
# - Hourly patterns (hour feature)
# - End/start of month effects (month_start, month_end features)
```


### Multi-timezone Datetime Processing

```python
import pandas as pd
import numpy as np
from feature_engine.datetime import DatetimeFeatures

# Data with timezone-aware datetimes
# Lowercase '6h'; the uppercase 'H' alias is deprecated in recent pandas
utc_dates = pd.date_range('2023-01-01', periods=100, freq='6h', tz='UTC')
est_dates = utc_dates.tz_convert('US/Eastern')

multi_tz_data = {
    'utc_timestamp': utc_dates,
    'local_timestamp': est_dates,
    'value': np.random.randn(100)
}
df_tz = pd.DataFrame(multi_tz_data)

# Extract features preserving timezone info
tz_extractor = DatetimeFeatures(
    features_to_extract=['hour', 'day_of_week'],
    utc=None  # Preserve original timezone
)
df_tz_features = tz_extractor.fit_transform(df_tz)

# Compare UTC vs local time features
print("UTC hours:", df_tz_features['utc_timestamp_hour'].unique())
print("Local hours:", df_tz_features['local_timestamp_hour'].unique())
```
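The two hour columns differ because the same instant has a different wall-clock hour in each timezone. A minimal standard-library sketch of that effect follows. It uses a fixed UTC-5 offset as a stand-in for US/Eastern in winter, an assumption made so the example does not depend on a timezone database, and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for US/Eastern standard time
# (assumption: daylight saving time is ignored to keep this self-contained)
eastern_standard = timezone(timedelta(hours=-5), name="EST")

# Four instants, six hours apart, starting at midnight UTC
utc_instants = [
    datetime(2023, 1, 1, tzinfo=timezone.utc) + timedelta(hours=6 * i)
    for i in range(4)
]
# The same instants expressed as local wall-clock time
local_instants = [t.astimezone(eastern_standard) for t in utc_instants]

utc_hours = [t.hour for t in utc_instants]
local_hours = [t.hour for t in local_instants]

print("UTC hours:  ", utc_hours)    # [0, 6, 12, 18]
print("Local hours:", local_hours)  # [19, 1, 7, 13]
```

An hour-of-day feature extracted from local time therefore describes user-facing behaviour (for example evening activity), while the UTC hour describes server time.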


### Custom Feature Selection

```python
import pandas as pd
import numpy as np
from feature_engine.datetime import DatetimeFeatures

# Financial data requiring specific datetime features
financial_data = {
    'trade_date': pd.bdate_range('2023-01-01', periods=250),  # Business days only
    'stock_price': 100 + np.random.randn(250).cumsum(),
    'volume': np.random.randint(1000, 10000, 250)
}
df_financial = pd.DataFrame(financial_data)

# Extract only relevant features for financial analysis
financial_extractor = DatetimeFeatures(
    variables=['trade_date'],
    features_to_extract=[
        'month', 'quarter', 'day_of_week',
        'month_start', 'month_end', 'quarter_start', 'quarter_end'
    ]
)
df_financial_enhanced = financial_extractor.fit_transform(df_financial)

# Features useful for:
# - Monthly/quarterly reporting periods
# - Day of week trading patterns (Monday effect, Friday effect)
# - Period start/end effects (window dressing, rebalancing)
```


### Handling Different Date Formats

```python
import pandas as pd
from feature_engine.datetime import DatetimeFeatures

# Data with various date formats
mixed_format_data = {
    'date_american': ['01/15/2023', '02/20/2023', '03/25/2023'],  # MM/DD/YYYY
    'date_european': ['15/01/2023', '20/02/2023', '25/03/2023'],  # DD/MM/YYYY
    'date_iso': ['2023-01-15', '2023-02-20', '2023-03-25'],       # YYYY-MM-DD
    'value': [100, 200, 300]
}
df_formats = pd.DataFrame(mixed_format_data)

# Convert to datetime with appropriate parsing
df_formats['date_american'] = pd.to_datetime(df_formats['date_american'], format='%m/%d/%Y')
df_formats['date_european'] = pd.to_datetime(df_formats['date_european'], dayfirst=True)
df_formats['date_iso'] = pd.to_datetime(df_formats['date_iso'])

# Extract features from properly parsed dates
format_extractor = DatetimeFeatures(
    features_to_extract=['month', 'day_of_month', 'year']
)
df_formats_enhanced = format_extractor.fit_transform(df_formats)
```


### Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from feature_engine.imputation import MeanMedianImputer
from feature_engine.datetime import DatetimeFeatures
from feature_engine.encoding import OneHotEncoder

# Preprocessing pipeline with datetime feature extraction
# (df is the sample dataframe built in the usage example)
datetime_pipeline = Pipeline([
    ('datetime_features', DatetimeFeatures(
        features_to_extract=['month', 'day_of_week', 'hour', 'weekend']
    )),
    ('imputer', MeanMedianImputer()),  # Handle any missing values in numerical columns
    # The extracted features are numeric, so name the columns to encode and set
    # ignore_format=True; by default OneHotEncoder only accepts categorical columns
    ('encoder', OneHotEncoder(
        variables=['transaction_date_day_of_week', 'created_at_day_of_week'],
        ignore_format=True
    ))
])

df_processed = datetime_pipeline.fit_transform(df)
```


## Feature Naming Convention


DatetimeFeatures creates new columns following the pattern: `{original_column_name}_{feature_name}`


Examples:

- `transaction_date` + `month` → `transaction_date_month`
- `created_at` + `hour` → `created_at_hour`
- `timestamp` + `weekend` → `timestamp_weekend`
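The naming pattern makes it easy to predict which columns to expect after a transform. The helper below is a hypothetical sketch (it is not part of feature_engine) that simply applies the `{original_column_name}_{feature_name}` pattern; the order in which the transformer actually appends columns is not assumed here.

```python
def expected_feature_columns(variables, features):
    """Build the column names implied by the '{variable}_{feature}' pattern."""
    return [f"{var}_{feat}" for var in variables for feat in features]


columns = expected_feature_columns(
    ["transaction_date", "created_at"],
    ["month", "hour"],
)
print(columns)
# ['transaction_date_month', 'transaction_date_hour',
#  'created_at_month', 'created_at_hour']
```

A list like this can be compared against the transformed dataframe's columns, for instance to check that a downstream step selects names that really exist.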


## Common Attributes


DatetimeFeatures has these fitted attributes:


- `variables_` (list): Datetime variables from which features will be extracted
- `features_to_extract_` (list): Features that will be extracted from each datetime variable
- `n_features_in_` (int): Number of features in training set


The transformer handles pandas datetime columns directly and can parse string dates during `transform` when suitable parsing parameters (`dayfirst`, `yearfirst`, `utc`) are provided.
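Parsing parameters matter because some date strings are ambiguous. The standard-library sketch below (not feature_engine code) shows the two readings of one string that a day-first setting chooses between. The explicit format strings are used here only to make each interpretation visible.

```python
from datetime import datetime

raw = "03/04/2023"  # ambiguous: 3 April or 4 March?

day_first = datetime.strptime(raw, "%d/%m/%Y")    # DD/MM/YYYY reading
month_first = datetime.strptime(raw, "%m/%d/%Y")  # MM/DD/YYYY reading

print(day_first.date())    # 2023-04-03
print(month_first.date())  # 2023-03-04
```

Any month or day-of-month feature extracted from such a column depends on which reading is chosen, so it is worth checking a few parsed values before fitting.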