tessl/pypi-feast

Python SDK for Feast - an open source feature store for machine learning that manages features for both training and serving environments.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/feast@0.53.x

To install, run

npx @tessl/cli install tessl/pypi-feast@0.53.0

# Feast

Feast (Feature Store) is an open-source feature store for machine learning that lets ML platform teams manage features consistently across training and serving environments. It provides an offline store for processing historical data at scale, a low-latency online store for real-time predictions, and a feature server for serving pre-computed features.

## Package Information

- **Package Name**: feast
- **Language**: Python
- **Installation**: `pip install feast`

## Core Imports

```python
import feast
from feast import FeatureStore
```

Common imports for feature definitions:

```python
from feast import (
    Entity,
    FeatureView,
    BatchFeatureView,
    OnDemandFeatureView,
    StreamFeatureView,
    FeatureService,
    Feature,
    Field,
    FileSource,
    ValueType,
    RepoConfig,
    Project,
)
```

Data source imports:

```python
from feast import (
    BigQuerySource,
    RedshiftSource,
    SnowflakeSource,
    AthenaSource,
    KafkaSource,
    KinesisSource,
    PushSource,
    RequestSource,
)
```

Vector store imports:

```python
from feast import FeastVectorStore
```

## Basic Usage

```python
from datetime import timedelta

import pandas as pd
from feast import Entity, FeatureStore, FeatureView, Field, FileSource, ValueType
from feast.types import Float64, Int64, String

# Initialize feature store from repo directory
fs = FeatureStore(repo_path=".")

# Define an entity
customer = Entity(
    name="customer",
    value_type=ValueType.INT64,
    description="Customer identifier",
)

# Define a data source
customer_source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
)

# Define a feature view (Field dtypes come from feast.types, not ValueType)
customer_fv = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="income", dtype=Float64),
        Field(name="city", dtype=String),
    ],
    source=customer_source,
)

# Apply definitions to registry
fs.apply([customer, customer_fv])

# Get historical features for training
entity_df = pd.DataFrame({
    "customer": [1001, 1002, 1003],
    "event_timestamp": [
        pd.Timestamp("2023-01-01"),
        pd.Timestamp("2023-01-02"),
        pd.Timestamp("2023-01-03"),
    ],
})

training_df = fs.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:age", "customer_features:income"],
).to_df()

# Get online features for serving
# (values are available only after the feature view has been materialized)
online_features = fs.get_online_features(
    features=["customer_features:age", "customer_features:income"],
    entity_rows=[{"customer": 1001}],
)
```

## Architecture

Feast's architecture has five key components:

- **Feature Store**: Central orchestrator managing the complete feature store lifecycle
- **Registry**: Centralized metadata store tracking all feature definitions and their lineage
- **Offline Store**: Scalable storage and compute engine for historical feature processing
- **Online Store**: Low-latency key-value store optimized for real-time feature serving
- **Feature Server**: HTTP/gRPC service providing standardized feature access APIs

This architecture enables teams to prevent data leakage through point-in-time correctness, decouple ML from data infrastructure, and ensure model portability across environments while supporting multiple data sources and deployment scenarios.
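
To make "point-in-time correctness" concrete, here is a minimal pure-Python sketch of the idea, not Feast's actual implementation. For each entity and timestamp it returns the newest feature row observed at or before that timestamp and no older than the TTL, so later values never leak into training data. The helper name `point_in_time_lookup` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, entity_key, as_of, ttl):
    """Return the newest row for entity_key with as_of - ttl <= timestamp <= as_of.

    Hypothetical helper for illustration; returns None when no row qualifies.
    """
    candidates = [
        row
        for row in feature_rows
        if row["customer"] == entity_key
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    return max(candidates, key=lambda r: r["event_timestamp"], default=None)


# Made-up feature rows standing in for an offline store table
feature_rows = [
    {"customer": 1001, "event_timestamp": datetime(2023, 1, 1, 8), "income": 50000.0},
    {"customer": 1001, "event_timestamp": datetime(2023, 1, 2, 8), "income": 52000.0},
    {"customer": 1002, "event_timestamp": datetime(2022, 12, 1, 8), "income": 41000.0},
]

ttl = timedelta(days=1)

# The 2023-01-02 row is ignored because it lies in the future relative to as_of
row = point_in_time_lookup(feature_rows, 1001, datetime(2023, 1, 1, 12), ttl)

# Customer 1002's only row is older than the TTL, so nothing is returned
stale = point_in_time_lookup(feature_rows, 1002, datetime(2023, 1, 3), ttl)
```

In this sketch the TTL plays the same role as the `ttl` argument on a feature view: it bounds how far back from the entity timestamp a feature value may be taken.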

## Capabilities

### Feature Store Management

Core feature store operations including initialization, configuration, and lifecycle management. The `FeatureStore` class serves as the primary interface for all feature operations.

```python { .api }
class FeatureStore:
    def __init__(self, repo_path: Optional[str] = None, config: Optional[RepoConfig] = None): ...
    def apply(self, objects: List[Union[Entity, FeatureView, FeatureService]]): ...
    def get_historical_features(self, entity_df: pd.DataFrame, features: List[str]) -> RetrievalJob: ...
    def get_online_features(self, features: List[str], entity_rows: List[Dict[str, Any]]) -> OnlineResponse: ...
    def materialize(self, start_date: datetime, end_date: datetime, feature_views: Optional[List[str]] = None): ...
```

[Feature Store](./feature-store.md)
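
The relationship between `materialize` and `get_online_features` can be sketched in plain Python. This is a conceptual model under stated assumptions, not Feast's implementation: the online store is modeled as a dict keyed by entity id, and the helpers `materialize_sketch` and `get_online_sketch` are hypothetical names.

```python
from datetime import datetime


def materialize_sketch(offline_rows, online_store, start_date, end_date, key="customer"):
    """Copy the newest row per entity inside [start_date, end_date] into online_store."""
    for row in offline_rows:
        if not (start_date <= row["event_timestamp"] <= end_date):
            continue
        current = online_store.get(row[key])
        # Only overwrite when the incoming row is at least as recent
        if current is None or current["event_timestamp"] <= row["event_timestamp"]:
            online_store[row[key]] = row
    return online_store


def get_online_sketch(online_store, entity_rows, features, key="customer"):
    """Return one list of values per feature; unknown entities yield None."""
    result = {name: [] for name in features}
    for entity in entity_rows:
        stored = online_store.get(entity[key])
        for name in features:
            result[name].append(stored[name] if stored else None)
    return result


# Made-up rows standing in for the offline store
offline_rows = [
    {"customer": 1001, "event_timestamp": datetime(2023, 1, 1), "age": 30},
    {"customer": 1001, "event_timestamp": datetime(2023, 1, 2), "age": 31},
    {"customer": 1002, "event_timestamp": datetime(2023, 1, 5), "age": 45},
]

store = materialize_sketch(offline_rows, {}, datetime(2023, 1, 1), datetime(2023, 1, 3))
online = get_online_sketch(store, [{"customer": 1001}, {"customer": 1002}], ["age"])
```

Customer 1002 has no row inside the materialization window, so its online lookup comes back empty. This is why online reads depend on materialization having covered the relevant time range.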

### Entity Management

Entity definitions that establish the primary keys and identifiers around which features are organized. Entities define collections of related features and enable proper joining across different data sources.

```python { .api }
class Entity:
    def __init__(self, name: str, join_keys: Optional[List[str]] = None, value_type: Optional[ValueType] = None, description: str = "", tags: Optional[Dict[str, str]] = None): ...

class ValueType(Enum):
    UNKNOWN = 0
    BYTES = 1
    STRING = 2
    INT32 = 3
    INT64 = 4
    DOUBLE = 5
    FLOAT = 6
    BOOL = 7
```

[Entities](./entities.md)

### Feature View Definitions

Feature view types that define how features are computed, stored, and served. Different view types support various feature engineering patterns from batch processing to real-time transformations.

```python { .api }
class FeatureView:
    def __init__(self, name: str, entities: List[Union[Entity, str]], schema: List[Field], source: DataSource, ttl: Optional[timedelta] = None): ...

class BatchFeatureView:
    def __init__(self, name: str, entities: List[Union[Entity, str]], schema: List[Field], source: DataSource): ...

class OnDemandFeatureView:
    def __init__(self, name: str, sources: Dict[str, Union[FeatureView, FeatureService]], udf: PythonTransformation): ...
```

[Feature Views](./feature-views.md)
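
As a rough illustration of the "real-time transformation" pattern, the sketch below derives a feature at request time from a stored feature value plus data supplied with the request. It is plain Python rather than the Feast API; the function name, the `loan_amount` request field, and the `income_to_loan_ratio` output are assumptions made up for this example.

```python
def on_demand_transform(stored_features, request_data):
    """Combine a precomputed feature with request-time input (hypothetical example)."""
    ratio = stored_features["income"] / request_data["loan_amount"]
    return {"income_to_loan_ratio": ratio}


# Stored value as it might come back from an online lookup (made-up numbers)
stored = {"income": 52000.0}
# Value known only when the prediction request arrives
request = {"loan_amount": 13000.0}

derived = on_demand_transform(stored, request)
```

The design point is that the derived value is never stored: it is recomputed on every request, so it can depend on inputs that do not exist until serving time.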

### Data Source Connectors

Data source implementations for connecting to various storage systems and streaming platforms. Each connector provides optimized access patterns for different data infrastructure scenarios.

```python { .api }
class FileSource:
    def __init__(self, path: str, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None): ...

class BigQuerySource:
    def __init__(self, table: str, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None): ...

class KafkaSource:
    def __init__(self, kafka_bootstrap_servers: str, message_format: StreamFormat, topic: str): ...
```

[Data Sources](./data-sources.md)

### CLI Operations

Command-line interface for managing feature store operations, deployments, and development workflows. The CLI provides essential tools for feature engineering teams.

```bash
feast init PROJECT_NAME            # Initialize new project
feast apply                        # Apply feature definitions
feast materialize START_TS END_TS  # Materialize features to online store
feast serve                        # Start feature server
```

[CLI Operations](./cli-operations.md)

### Vector Store Operations

Vector store functionality for RAG (Retrieval-Augmented Generation) applications and semantic search using feature store infrastructure.

```python { .api }
class FeastVectorStore:
    def __init__(self, repo_path: str, rag_view: FeatureView, features: List[str]): ...
    def query(self, query_vector: Optional[np.ndarray] = None, query_string: Optional[str] = None, top_k: int = 10) -> OnlineResponse: ...
```

[Vector Store](./vector-store.md)
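
What a `top_k` vector query computes can be sketched without any vector database. The example below ranks documents by cosine similarity to a query vector in pure Python. It is an illustration only: the similarity metric, the helper names, and the sample documents are assumptions, and a real deployment would delegate the search to the configured online store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_sketch(query_vector, documents, k=10):
    """Return the k (score, text) pairs most similar to query_vector, best first."""
    scored = [
        (cosine_similarity(query_vector, doc["embedding"]), doc["text"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up two-dimensional embeddings for illustration
documents = [
    {"text": "refund policy", "embedding": [1.0, 0.0]},
    {"text": "shipping times", "embedding": [0.0, 1.0]},
    {"text": "return window", "embedding": [0.9, 0.1]},
]

results = top_k_sketch([1.0, 0.0], documents, k=2)
top_texts = [text for _, text in results]
```

In a RAG setting the returned texts would be passed to the language model as context; here they are simply collected in `top_texts`.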

## Types

```python { .api }
@dataclass
class Field:
    name: str
    dtype: FeastType  # a type from feast.types, e.g. Int64, Float64, String
    description: str = ""
    tags: Optional[Dict[str, str]] = None

class RepoConfig:
    def __init__(self, registry: str, project: str, provider: str): ...

class OnlineResponse:
    def to_dict(self) -> Dict[str, List[Any]]: ...
    def to_df(self) -> pd.DataFrame: ...

class RetrievalJob:
    def to_df(self) -> pd.DataFrame: ...
    def to_arrow(self) -> pa.Table: ...

class Project:
    name: str
    description: str
    tags: Dict[str, str]

class Permission:
    name: str
    types: List[str]
    policy: str

class SavedDataset:
    name: str
    features: List[str]
    join_keys: List[str]
    storage: SavedDatasetStorage

class ValidationReference:
    name: str
    dataset: SavedDataset

class LoggingSource:
    def __init__(self, name: str, source_type: str): ...

class LoggingConfig:
    destination: str
    format: str
```