Kubernetes Custom Resource Definition for serving predictive and generative machine learning models with high abstraction interfaces and features like GPU autoscaling, scale to zero, and canary rollouts.
pkg:github/kserve/kserve@0.15.x
npx @tessl/cli install tessl/github-kserve@0.15.00
# KServe

KServe is a Kubernetes-native model serving platform that deploys ML models to production through Custom Resource Definitions (CRDs). It provides a unified interface for serving predictive and generative models, with GPU autoscaling, scale-to-zero, canary rollouts, and multi-framework support.

## Package Information

- **Package Name**: kserve
- **Package Type**: Python SDK
- **Language**: Python
- **Installation**: `pip install kserve`
- **Requirements**: Python >=3.9,<3.13, Kubernetes cluster
- **Optional Features**: `pip install kserve[storage]` for cloud storage, `pip install kserve[ray]` for Ray integration

## Core Imports

```python
import kserve
```

Common imports for model serving:

```python
from kserve import Model, ModelServer
```

For client operations:

```python
from kserve import InferenceRESTClient, InferenceGRPCClient, KServeClient
```

For protocol types:

```python
from kserve import InferRequest, InferResponse, InferInput, InferOutput
```

## Basic Usage

### Creating a Custom Model Server

```python
from kserve import Model, ModelServer


class MyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.ready = False

    def load(self):
        # Load your model here
        # self.model = load_model("path/to/model")
        self.ready = True
        return self.ready

    async def predict(self, payload, headers=None):
        # `headers` is optional; it is declared so the method also works
        # when the server passes request headers along with the payload.
        # Implement prediction logic
        # result = self.model.predict(payload)
        return {"predictions": "example_result"}


if __name__ == "__main__":
    model = MyModel("my-model")
    model.load()  # mark the model ready before the server starts
    ModelServer().start([model])
```

### Making Inference Requests

```python
import asyncio

import numpy as np
from kserve import InferenceRESTClient, InferRequest, InferInput


async def main():
    client = InferenceRESTClient("http://localhost:8080")

    # Create input data (placeholder tensor matching the declared shape and datatype)
    data_array = np.zeros((1, 784), dtype=np.float32)
    input_data = InferInput(name="data", shape=[1, 784], datatype="FP32")
    input_data.set_data_from_numpy(data_array)

    # Create inference request
    request = InferRequest(model_name="my-model", inputs=[input_data])

    # Make prediction
    response = await client.infer(request)
    predictions = response.outputs[0].as_numpy()


asyncio.run(main())
```

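For reference, a request like the one above corresponds to a JSON body in the Open Inference Protocol (V2) that KServe model servers accept over REST. The sketch below builds that body by hand using only the standard library. `build_v2_payload` and `infer_path` are hypothetical helper names, not part of the kserve package, and the tensor values are placeholders.

```python
import json
from typing import Any, Dict, List


def build_v2_payload(name: str, shape: List[int], datatype: str,
                     data: List[float]) -> Dict[str, Any]:
    """Build an Open Inference Protocol (V2) request body for one input tensor."""
    # The flat data list must contain exactly prod(shape) elements.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"expected {expected} values for shape {shape}, got {len(data)}")
    return {"inputs": [{"name": name, "shape": shape, "datatype": datatype, "data": data}]}


def infer_path(model_name: str) -> str:
    """Return the V2 REST inference path for a model (hypothetical helper)."""
    return f"/v2/models/{model_name}/infer"


payload = build_v2_payload("data", [1, 784], "FP32", [0.0] * 784)
body = json.dumps(payload)  # string that would be sent as the POST body
print(infer_path("my-model"))
print(len(payload["inputs"][0]["data"]))
```

The resulting `body` would be sent as a POST to the path returned by `infer_path` on the model server's host.
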
### Deploying Models with Kubernetes

```python
from kserve import KServeClient, V1beta1InferenceService, V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec, V1beta1SKLearnSpec

# Create Kubernetes client
client = KServeClient()

# Define inference service
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata={"name": "sklearn-iris", "namespace": "default"},
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    )
)

# Deploy the service
client.create(isvc, namespace="default")
```

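The Python objects above describe an ordinary Kubernetes resource. As a rough sketch, the equivalent manifest can be written as a plain dictionary and dumped to JSON. `build_isvc_manifest` is a hypothetical helper introduced only for illustration; the field names follow the `serving.kserve.io/v1beta1` schema used in the example above.

```python
import json
from typing import Any, Dict


def build_isvc_manifest(name: str, namespace: str, storage_uri: str) -> Dict[str, Any]:
    """Return an InferenceService manifest with a scikit-learn predictor (hypothetical helper)."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"predictor": {"sklearn": {"storageUri": storage_uri}}},
    }


manifest = build_isvc_manifest(
    "sklearn-iris",
    "default",
    "gs://kfserving-examples/models/sklearn/1.0/model",
)
print(json.dumps(manifest, indent=2))
```

Writing the JSON to a file and applying it with `kubectl apply -f` is one way to create the same resource without the Python client.
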
## Architecture

KServe consists of several key components that work together to provide a complete ML serving solution:

### Core Components

- **Model**: Abstract base class for implementing custom serving logic with lifecycle management
- **ModelServer**: Production-ready HTTP/gRPC server with multi-processing, health checks, and metrics
- **ModelRepository**: Registry for managing multiple models with loading/unloading capabilities
- **Clients**: High-level REST and gRPC clients for inference requests
- **Protocol**: Standardized data types for inference requests and responses

### Kubernetes Integration

- **InferenceService**: Main Kubernetes custom resource for deploying models
- **ServingRuntime**: Defines the model serving container and runtime configuration
- **TrainedModel**: Manages model artifacts and versioning
- **InferenceGraph**: Orchestrates multi-model inference pipelines

### Framework Support

KServe provides built-in support for popular ML frameworks through specialized servers:

- TensorFlow Serving, PyTorch TorchServe, Scikit-learn, XGBoost, LightGBM
- ONNX Runtime, NVIDIA Triton, PMML, PaddlePaddle
- HuggingFace Transformers for LLM serving

## Capabilities

### Model Serving Framework

Core classes and interfaces for implementing custom model servers with lifecycle management, health checking, and protocol support.

```python { .api }
class Model:
    def __init__(self, name: str): ...
    def load(self): ...
    async def predict(self, payload): ...
    async def preprocess(self, payload): ...
    async def postprocess(self, payload): ...

class ModelServer:
    def __init__(self, http_port: int = 8080, grpc_port: int = 8081): ...
    def start(self, models: List[Model]): ...
    def register_model(self, model: Model): ...
```

[Model Serving](./model-serving.md)

### Inference Clients

High-level async clients for making inference requests to KServe models with retry logic, SSL support, and protocol conversion.

```python { .api }
class InferenceRESTClient:
    def __init__(self, url: str, config: RESTConfig = None): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
    async def explain(self, request: InferRequest) -> InferResponse: ...
    async def is_model_ready(self, model_name: str) -> bool: ...

class InferenceGRPCClient:
    def __init__(self, url: str): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
```

[Inference Clients](./inference-clients.md)

### Protocol and Data Types

Standardized data structures for inference requests and responses with support for multiple protocols and data formats.

```python { .api }
class InferRequest:
    def __init__(self, model_name: str, inputs: List[InferInput]): ...
    def as_dataframe(self) -> pandas.DataFrame: ...
    def to_rest(self) -> dict: ...

class InferResponse:
    def __init__(self, model_name: str, outputs: List[InferOutput]): ...
    @classmethod
    def from_rest(cls, response: dict) -> 'InferResponse': ...

class InferInput:
    def __init__(self, name: str, shape: List[int], datatype: str): ...
    def set_data_from_numpy(self, input_tensor: numpy.ndarray): ...
    def as_numpy(self) -> numpy.ndarray: ...
```

[Protocol and Data Types](./protocol.md)

### Kubernetes API Client

Python client for managing KServe resources in Kubernetes clusters including InferenceServices, TrainedModels, and InferenceGraphs.

```python { .api }
class KServeClient:
    def __init__(self, config_file: str = None): ...
    def create(self, obj, namespace: str = "default"): ...
    def get(self, name: str, namespace: str = "default"): ...
    def delete(self, name: str, namespace: str = "default"): ...
    def set_credentials(self, storage_type: str, **kwargs): ...
```

[Kubernetes Client](./kubernetes-client.md)

### Resource Models

Comprehensive set of Kubernetes Custom Resource Definitions for defining inference services, serving runtimes, and model configurations.

```python { .api }
class V1beta1InferenceService:
    def __init__(self, metadata: dict, spec: V1beta1InferenceServiceSpec): ...

class V1beta1PredictorSpec:
    def __init__(self, sklearn: V1beta1SKLearnSpec = None,
                 pytorch: V1beta1TorchServeSpec = None): ...

class V1alpha1ServingRuntime:
    def __init__(self, metadata: dict, spec: V1alpha1ServingRuntimeSpec): ...
```

[Resource Models](./resource-models.md)

### Framework Servers

Pre-built model servers for popular ML frameworks that extend the core KServe functionality with framework-specific optimizations.

```python { .api }
# Scikit-learn
from sklearnserver import SKLearnModel

# XGBoost
from xgbserver import XGBoostModel

# HuggingFace
from huggingfaceserver import HuggingFaceModel
```

[Framework Servers](./framework-servers.md)

### Storage Integration

Unified storage interface supporting multiple cloud providers and local storage for model artifact management.

```python { .api }
from kserve.storage import Storage

def download_model(uri: str, dest: str):
    Storage.download(uri, dest)
```

[Storage](./storage.md)