# Feast

Feast (Feature Store) is an open-source feature store for machine learning that lets ML platform teams manage features consistently across training and serving environments. It provides an offline store for processing historical data at scale, a low-latency online store for real-time predictions, and a feature server for serving pre-computed features.

## Package Information

- **Package Name**: feast
- **Language**: Python
- **Installation**: `pip install feast`

## Core Imports

```python
import feast
from feast import FeatureStore
```

Common imports for feature definitions:

```python
from feast import (
    Entity,
    FeatureView,
    BatchFeatureView,
    OnDemandFeatureView,
    StreamFeatureView,
    FeatureService,
    Feature,
    Field,
    FileSource,
    ValueType,
    RepoConfig,
    Project,
)
```

Data source imports:

```python
from feast import (
    BigQuerySource,
    RedshiftSource,
    SnowflakeSource,
    AthenaSource,
    KafkaSource,
    KinesisSource,
    PushSource,
    RequestSource,
)
```

Vector store imports:

```python
from feast import FeastVectorStore
```

## Basic Usage

```python
from datetime import timedelta

import pandas as pd
from feast import Entity, FeatureStore, FeatureView, Field, FileSource, ValueType
from feast.types import Float64, Int64, String  # Field dtypes come from feast.types

# Initialize feature store from repo directory
fs = FeatureStore(repo_path=".")

# Define an entity
customer = Entity(
    name="customer",
    value_type=ValueType.INT64,
    description="Customer identifier",
)

# Define a data source
customer_source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
)

# Define a feature view
customer_fv = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="income", dtype=Float64),
        Field(name="city", dtype=String),
    ],
    source=customer_source,
)

# Apply definitions to registry
fs.apply([customer, customer_fv])

# Get historical features for training.
# Each row pairs an entity key with the timestamp at which features are looked up.
entity_df = pd.DataFrame({
    "customer": [1001, 1002, 1003],
    "event_timestamp": [
        pd.Timestamp("2023-01-01"),
        pd.Timestamp("2023-01-02"),
        pd.Timestamp("2023-01-03"),
    ],
})

training_df = fs.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:age", "customer_features:income"],
).to_df()

# Get online features for serving
online_features = fs.get_online_features(
    features=["customer_features:age", "customer_features:income"],
    entity_rows=[{"customer": 1001}],
)
```

## Architecture

Feast's architecture consists of five main components:

- **Feature Store**: Central orchestrator that manages the feature store lifecycle
- **Registry**: Centralized metadata store tracking all feature definitions and their lineage
- **Offline Store**: Scalable storage and compute engine for historical feature processing
- **Online Store**: Low-latency key-value store optimized for real-time feature serving
- **Feature Server**: HTTP/gRPC service providing standardized feature access APIs

This architecture helps teams avoid data leakage through point-in-time correct joins, decouple ML from the underlying data infrastructure, and keep models portable across environments, while supporting multiple data sources and deployment scenarios.
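
To make point-in-time correctness concrete, here is a minimal standard-library sketch of the idea. It is not Feast's implementation: the `point_in_time_lookup` helper and the sample history are hypothetical, and the TTL check assumes a row is usable only if it is no older than `ttl` at lookup time.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_lookup(rows, as_of, ttl):
    """Return the newest value with timestamp <= as_of and age <= ttl, else None.

    rows: list of (timestamp, value) tuples sorted by timestamp.
    """
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of) - 1  # last row at or before as_of
    if idx < 0:
        return None  # only later data exists; using it would leak the future
    ts, value = rows[idx]
    if as_of - ts > ttl:
        return None  # the newest eligible row is older than the TTL
    return value


# Hypothetical feature history for a single customer.
age_history = [
    (datetime(2023, 1, 1), 34),
    (datetime(2023, 1, 3), 35),
]

# Looking up as of Jan 2 must not see the Jan 3 row.
print(point_in_time_lookup(age_history, datetime(2023, 1, 2), timedelta(days=2)))
```

A training row timestamped January 2 only ever sees the January 1 value, which is what keeps later information out of the training set.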

## Capabilities

### Feature Store Management

Core feature store operations including initialization, configuration, and lifecycle management. The `FeatureStore` class serves as the primary interface for all feature operations.

```python { .api }
class FeatureStore:
    def __init__(self, repo_path: Optional[str] = None, config: Optional[RepoConfig] = None): ...
    def apply(self, objects: List[Union[Entity, FeatureView, FeatureService]]): ...
    def get_historical_features(self, entity_df: pd.DataFrame, features: List[str]) -> RetrievalJob: ...
    def get_online_features(self, features: List[str], entity_rows: List[Dict[str, Any]]) -> OnlineResponse: ...
    def materialize(self, start_date: datetime, end_date: datetime, feature_views: Optional[List[str]] = None): ...
```

[Feature Store](./feature-store.md)
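
Conceptually, materialization copies the most recent feature values in a time window from the offline store into the online store. The sketch below illustrates that idea with plain Python data structures; `materialize_sketch` and the sample rows are hypothetical, and the inclusive window boundaries are an assumption rather than documented Feast behavior.

```python
from datetime import datetime


def materialize_sketch(offline_rows, start_date, end_date):
    """Keep the newest feature row per entity key inside [start_date, end_date].

    offline_rows: iterable of (entity_key, timestamp, features_dict) tuples.
    Returns a dict standing in for a key-value online store.
    """
    newest = {}
    for key, ts, features in offline_rows:
        if not (start_date <= ts <= end_date):
            continue  # outside the materialization window
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, features)
    return {key: features for key, (_, features) in newest.items()}


# Hypothetical offline history.
offline_rows = [
    (1001, datetime(2023, 1, 1), {"age": 34}),
    (1001, datetime(2023, 1, 2), {"age": 35}),
    (1002, datetime(2023, 1, 1), {"age": 50}),
    (1001, datetime(2023, 1, 10), {"age": 36}),  # after the window
]

online_store = materialize_sketch(
    offline_rows, datetime(2023, 1, 1), datetime(2023, 1, 3)
)
print(online_store)
```

After this step an online lookup is a single key read, which is why serving can stay low-latency regardless of how much history the offline store holds.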

### Entity Management

Entity definitions that establish the primary keys and identifiers around which features are organized. Entities define collections of related features and enable proper joining across different data sources.

```python { .api }
class Entity:
    def __init__(self, name: str, value_type: ValueType, join_key: Optional[str] = None, description: str = "", tags: Optional[Dict[str, str]] = None): ...

class ValueType(Enum):
    UNKNOWN = 0
    BYTES = 1
    STRING = 2
    INT32 = 3
    INT64 = 4
    DOUBLE = 5
    FLOAT = 6
    BOOL = 7
```

[Entities](./entities.md)

### Feature View Definitions

Feature view types that define how features are computed, stored, and served. Different view types support various feature engineering patterns from batch processing to real-time transformations.

```python { .api }
class FeatureView:
    def __init__(self, name: str, entities: List[Union[Entity, str]], schema: List[Field], source: DataSource, ttl: Optional[timedelta] = None): ...

class BatchFeatureView:
    def __init__(self, name: str, entities: List[Union[Entity, str]], schema: List[Field], source: DataSource): ...

class OnDemandFeatureView:
    def __init__(self, name: str, sources: Dict[str, Union[FeatureView, FeatureService]], udf: PythonTransformation): ...
```

[Feature Views](./feature-views.md)
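
The real-time transformation pattern can be pictured as a plain function that derives new values at request time from already-retrieved features plus data supplied with the request. The sketch below does not use Feast's classes; the `transform` function, the `bonus` request field, and the `income_with_bonus` output are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


def transform(inputs: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Derive a new column from a stored feature and a request-time value.

    inputs is column-oriented: each key maps to one value per entity row.
    """
    return {
        "income_with_bonus": [
            income + bonus
            for income, bonus in zip(inputs["income"], inputs["bonus"])
        ]
    }


# "income" would come from a feature view, "bonus" from the request itself.
derived = transform({"income": [50000.0, 62000.0], "bonus": [1000.0, 0.0]})
print(derived)
```

Because the function only depends on its inputs, the same logic can be applied when building training data and when serving, which avoids training/serving skew for derived features.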

### Data Source Connectors

Data source implementations for connecting to various storage systems and streaming platforms. Each connector provides optimized access patterns for different data infrastructure scenarios.

```python { .api }
class FileSource:
    def __init__(self, path: str, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None): ...

class BigQuerySource:
    def __init__(self, table: str, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None): ...

class KafkaSource:
    def __init__(self, kafka_bootstrap_servers: str, message_format: StreamFormat, topic: str): ...
```

[Data Sources](./data-sources.md)

### CLI Operations

Command-line interface for managing feature store operations, deployments, and development workflows.

```bash
feast init PROJECT_NAME             # Initialize new project
feast apply                         # Apply feature definitions
feast materialize START_TS END_TS   # Materialize features to online store for a time range
feast serve                         # Start feature server
```

[CLI Operations](./cli-operations.md)
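
Once `feast serve` is running, clients fetch features over HTTP. The sketch below only builds such a request without sending it. It assumes the server exposes a `/get-online-features` endpoint that accepts a JSON body with `features` and `entities` keys and listens on port 6566; check both against the documentation for your Feast version. The `build_online_features_request` helper is hypothetical.

```python
import json
import urllib.request


def build_online_features_request(base_url, features, entities):
    """Build (but do not send) a POST request for online features."""
    body = json.dumps({"features": features, "entities": entities}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/get-online-features",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_online_features_request(
    "http://localhost:6566",  # assumed default host and port
    ["customer_features:age", "customer_features:income"],
    {"customer": [1001]},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running server:
# urllib.request.urlopen(request).read()
```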

### Vector Store Operations

Vector store functionality for RAG (Retrieval-Augmented Generation) applications and semantic search using feature store infrastructure.

```python { .api }
class FeastVectorStore:
    def __init__(self, repo_path: str, rag_view: FeatureView, features: List[str]): ...
    def query(self, query_vector: Optional[np.ndarray] = None, query_string: Optional[str] = None, top_k: int = 10) -> OnlineResponse: ...
```

[Vector Store](./vector-store.md)
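
A vector query with `top_k` boils down to ranking stored embeddings by similarity to the query vector. The sketch below shows that ranking with cosine similarity in pure Python. It is an illustration of the concept only: the `cosine_similarity` and `top_k_documents` helpers and the sample embeddings are hypothetical, and the metric a real deployment uses depends on the configured online store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_documents(query_vector, documents, k=10):
    """Return the ids of the k documents most similar to query_vector."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical two-dimensional embeddings.
documents = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
}

print(top_k_documents([1.0, 0.1], documents, k=2))
```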

## Types

```python { .api }
@dataclass
class Field:
    name: str
    dtype: FeastType  # a type from feast.types, e.g. Int64, Float64, String
    description: str = ""
    tags: Optional[Dict[str, str]] = None

class RepoConfig:
    def __init__(self, registry: str, project: str, provider: str): ...

class OnlineResponse:
    def to_dict(self) -> Dict[str, List[Any]]: ...
    def to_df(self) -> pd.DataFrame: ...

class RetrievalJob:
    def to_df(self) -> pd.DataFrame: ...
    def to_arrow(self) -> pa.Table: ...

class Project:
    name: str
    description: str
    tags: Dict[str, str]

class Permission:
    name: str
    types: List[str]
    policy: str

class SavedDataset:
    name: str
    features: List[str]
    join_keys: List[str]
    storage: SavedDatasetStorage

class ValidationReference:
    name: str
    dataset: SavedDataset

class LoggingSource:
    def __init__(self, name: str, source_type: str): ...

class LoggingConfig:
    destination: str
    format: str
```
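
`OnlineResponse.to_dict()` is typed above as `Dict[str, List[Any]]`, which is a column-oriented layout: one list per feature, with one element per requested entity row. The sketch below pivots such a dictionary into per-entity records. The `columns_to_rows` helper and the sample values are hypothetical; a hand-written dictionary stands in for a real response.

```python
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {"col": [v1, v2]} into [{"col": v1}, {"col": v2}]."""
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip(*(columns[name] for name in names))
    ]


# Stand-in for the result of OnlineResponse.to_dict().
response_dict = {
    "customer": [1001, 1002],
    "age": [34, 50],
    "income": [52000.0, 61000.0],
}

rows = columns_to_rows(response_dict)
print(rows)
```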