Tessl Tile for github/kserve/kserve@0.15.0

# KServe

KServe is a Kubernetes-native model serving platform that deploys ML models to production through Custom Resource Definitions (CRDs). It provides a unified interface for serving predictive and generative models, with features such as GPU autoscaling, scale-to-zero, canary rollouts, and multi-framework support.

## Package Information

- **Package Name**: kserve
- **Package Type**: Python SDK
- **Language**: Python
- **Installation**: `pip install kserve`
- **Requirements**: Python >=3.9,<3.13, Kubernetes cluster
- **Optional Features**: `pip install kserve[storage]` for cloud storage, `pip install kserve[ray]` for Ray integration

## Core Imports

```python
import kserve
```

Common imports for model serving:

```python
from kserve import Model, ModelServer
```

For client operations:

```python
from kserve import InferenceRESTClient, InferenceGRPCClient, KServeClient
```

For protocol types:

```python
from kserve import InferRequest, InferResponse, InferInput, InferOutput
```

## Basic Usage

### Creating a Custom Model Server

```python
from typing import Dict

from kserve import Model, ModelServer


class MyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.ready = False

    def load(self):
        # Load your model here, for example:
        # self.model = load_model("path/to/model")
        self.ready = True
        return self.ready

    async def predict(self, payload: Dict, headers: Dict[str, str] = None) -> Dict:
        # Implement prediction logic, for example:
        # result = self.model.predict(payload)
        return {"predictions": "example_result"}


if __name__ == "__main__":
    model = MyModel("my-model")
    model.load()  # Load before serving so the model reports ready
    ModelServer().start([model])
```
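
Once the server is running, a model like the one above can be called over HTTP. The following is a minimal sketch, assuming the server listens on `http://localhost:8080` and accepts KServe's V1 REST protocol, where the predict route is `/v1/models/<name>:predict` and the body carries an `instances` list. The helper name `build_v1_predict_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_v1_predict_request(base_url: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build (but do not send) a V1-protocol predict request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed host, port, and model name; adjust to your deployment.
request = build_v1_predict_request("http://localhost:8080", "my-model", [[1.0, 2.0, 3.0]])
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it, which requires a running server.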

### Making Inference Requests

```python
import asyncio

import numpy as np
from kserve import InferenceRESTClient, InferRequest, InferInput


async def main():
    client = InferenceRESTClient("http://localhost:8080")

    # Create input data (random values stand in for a real 1x784 input)
    data_array = np.random.rand(1, 784).astype(np.float32)
    input_data = InferInput(name="data", shape=[1, 784], datatype="FP32")
    input_data.set_data_from_numpy(data_array)

    # Create inference request
    request = InferRequest(model_name="my-model", inputs=[input_data])

    # Make prediction
    response = await client.infer(request)
    predictions = response.outputs[0].as_numpy()
    print(predictions)


asyncio.run(main())
```

### Deploying Models with Kubernetes

```python
from kserve import KServeClient, V1beta1InferenceService, V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec, V1beta1SKLearnSpec

# Create Kubernetes client
client = KServeClient()

# Define inference service
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata={"name": "sklearn-iris", "namespace": "default"},
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    )
)

# Deploy the service
client.create(isvc, namespace="default")
```
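
The Python object above describes the same resource as a plain InferenceService manifest. As a rough illustration, the sketch below builds that manifest as a dictionary and prints it as JSON. The camelCase field names (`apiVersion`, `storageUri`) follow the Kubernetes resource format, and the helper name `sklearn_isvc_manifest` is hypothetical.

```python
import json


def sklearn_isvc_manifest(name: str, namespace: str, storage_uri: str) -> dict:
    """Return an InferenceService manifest with a scikit-learn predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"predictor": {"sklearn": {"storageUri": storage_uri}}},
    }


manifest = sklearn_isvc_manifest(
    "sklearn-iris",
    "default",
    "gs://kfserving-examples/models/sklearn/1.0/model",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.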

## Architecture

KServe consists of several components that work together to serve models:

### Core Components

- **Model**: Abstract base class for implementing custom serving logic with lifecycle management
- **ModelServer**: Production-ready HTTP/gRPC server with multi-processing, health checks, and metrics
- **ModelRepository**: Registry for managing multiple models with loading/unloading capabilities
- **Clients**: High-level REST and gRPC clients for inference requests
- **Protocol**: Standardized data types for inference requests and responses
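
To make the idea of a model registry with loading and unloading concrete, here is a small stand-alone sketch. It is not KServe's `ModelRepository`: the class `ToyRepository` and all of its method names are hypothetical and only illustrate the concept of tracking which named models are currently loaded.

```python
from typing import Callable, Dict


class ToyRepository:
    """Hypothetical in-memory registry; not KServe's ModelRepository."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], object]] = {}
        self._loaded: Dict[str, object] = {}

    def register(self, name: str, loader: Callable[[], object]) -> None:
        # Remember how to load the model without loading it yet.
        self._loaders[name] = loader

    def load(self, name: str) -> bool:
        # Returns False for unknown names instead of raising.
        loader = self._loaders.get(name)
        if loader is None:
            return False
        self._loaded[name] = loader()
        return True

    def unload(self, name: str) -> None:
        self._loaded.pop(name, None)

    def is_ready(self, name: str) -> bool:
        return name in self._loaded


repo = ToyRepository()
repo.register("my-model", lambda: {"weights": [0.1, 0.2]})
repo.load("my-model")
print(repo.is_ready("my-model"))
repo.unload("my-model")
print(repo.is_ready("my-model"))
```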

### Kubernetes Integration

- **InferenceService**: Main Kubernetes custom resource for deploying models
- **ServingRuntime**: Defines the model serving container and runtime configuration
- **TrainedModel**: Manages model artifacts and versioning
- **InferenceGraph**: Orchestrates multi-model inference pipelines

### Framework Support

KServe provides built-in support for popular ML frameworks through specialized servers:
- TensorFlow Serving, PyTorch TorchServe, Scikit-learn, XGBoost, LightGBM
- ONNX Runtime, NVIDIA Triton, PMML, PaddlePaddle
- HuggingFace Transformers for LLM serving

## Capabilities

### Model Serving Framework

Core classes and interfaces for implementing custom model servers with lifecycle management, health checking, and protocol support.

```python { .api }
class Model:
    def __init__(self, name: str): ...
    def load(self): ...
    async def predict(self, payload): ...
    async def preprocess(self, payload): ...
    async def postprocess(self, payload): ...

class ModelServer:
    def __init__(self, http_port: int = 8080, grpc_port: int = 8081): ...
    def start(self, models: List[Model]): ...
    def register_model(self, model: Model): ...
```

[Model Serving](./model-serving.md)
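
The method names `preprocess`, `predict`, and `postprocess` suggest a three-stage request flow. The sketch below illustrates that flow with a plain Python class and no KServe dependency. `EchoModel` and `handle` are hypothetical names, and the chaining order is an assumption about how a server would drive these hooks, not KServe's actual implementation.

```python
import asyncio


class EchoModel:
    """Stand-in with the same hook names as the Model stub."""

    def __init__(self, name: str):
        self.name = name
        self.ready = False

    def load(self):
        self.ready = True

    async def preprocess(self, payload):
        # Convert raw request values into model features.
        return [float(x) for x in payload["instances"]]

    async def predict(self, features):
        # Toy "model": double every feature.
        return [value * 2 for value in features]

    async def postprocess(self, outputs):
        # Wrap raw outputs in a response body.
        return {"predictions": outputs}


async def handle(model, payload):
    """Run one request through preprocess -> predict -> postprocess."""
    if not model.ready:
        raise RuntimeError(f"model {model.name} is not loaded")
    features = await model.preprocess(payload)
    outputs = await model.predict(features)
    return await model.postprocess(outputs)


model = EchoModel("echo")
model.load()
result = asyncio.run(handle(model, {"instances": [1, 2, 3]}))
print(result)
```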

### Inference Clients

High-level async clients for making inference requests to KServe models with retry logic, SSL support, and protocol conversion.

```python { .api }
class InferenceRESTClient:
    def __init__(self, url: str, config: RESTConfig = None): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
    async def explain(self, request: InferRequest) -> InferResponse: ...
    async def is_model_ready(self, model_name: str) -> bool: ...

class InferenceGRPCClient:
    def __init__(self, url: str): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
```

[Inference Clients](./inference-clients.md)
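
Retry logic for an async client usually means re-issuing a failed call after a growing delay. The following sketch shows one common form, exponential backoff, using only `asyncio`. It is an illustration of the general technique, not the retry behavior of the KServe clients: `call_with_retries`, `make_flaky_call`, and their parameters are all hypothetical.

```python
import asyncio


async def call_with_retries(func, retries: int = 3, base_delay: float = 0.01):
    """Await func(); on ConnectionError, retry with exponential backoff."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await func()
        except ConnectionError as exc:
            last_error = exc
            if attempt < retries:
                # Delay doubles on every failed attempt.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error


def make_flaky_call(failures: int):
    """Return a coroutine function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def flaky_call():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("transient failure")
        return {"model_name": "my-model", "attempts": state["calls"]}

    return flaky_call


result = asyncio.run(call_with_retries(make_flaky_call(failures=2)))
print(result)
```

In a real client, `func` would wrap the inference call, and the set of retryable exceptions would depend on the HTTP or gRPC library in use.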

### Protocol and Data Types

Standardized data structures for inference requests and responses with support for multiple protocols and data formats.

```python { .api }
class InferRequest:
    def __init__(self, model_name: str, inputs: List[InferInput]): ...
    def as_dataframe(self) -> pandas.DataFrame: ...
    def to_rest(self) -> dict: ...

class InferResponse:
    def __init__(self, model_name: str, outputs: List[InferOutput]): ...
    @classmethod
    def from_rest(cls, response: dict) -> 'InferResponse': ...

class InferInput:
    def __init__(self, name: str, shape: List[int], datatype: str): ...
    def set_data_from_numpy(self, input_tensor: numpy.ndarray): ...
    def as_numpy(self) -> numpy.ndarray: ...
```

[Protocol and Data Types](./protocol.md)
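
For orientation, the REST body of a V2 (Open Inference Protocol) request is a JSON object with an `inputs` list, where each input has a `name`, `shape`, `datatype`, and `data`. The sketch below builds such a body from a nested list without any KServe or NumPy dependency. The helper names are hypothetical, and it is an assumption that `to_rest()` produces a dictionary of roughly this shape.

```python
import json
from typing import Any, List


def infer_shape(data: Any) -> List[int]:
    """Derive the tensor shape of a regular nested list."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return shape


def flatten(data: Any) -> list:
    """Flatten a nested list into row-major order."""
    if not isinstance(data, list):
        return [data]
    flat = []
    for item in data:
        flat.extend(flatten(item))
    return flat


def build_v2_request(name: str, data: list, datatype: str) -> dict:
    """Build a V2-style request body with a single input tensor."""
    return {
        "inputs": [
            {
                "name": name,
                "shape": infer_shape(data),
                "datatype": datatype,
                "data": flatten(data),
            }
        ]
    }


payload = build_v2_request("data", [[1.0, 2.0], [3.0, 4.0]], "FP32")
print(json.dumps(payload))
```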

### Kubernetes API Client

Python client for managing KServe resources in Kubernetes clusters including InferenceServices, TrainedModels, and InferenceGraphs.

```python { .api }
class KServeClient:
    def __init__(self, config_file: str = None): ...
    def create(self, obj, namespace: str = "default"): ...
    def get(self, name: str, namespace: str = "default"): ...
    def delete(self, name: str, namespace: str = "default"): ...
    def set_credentials(self, storage_type: str, **kwargs): ...
```

[Kubernetes Client](./kubernetes-client.md)

### Resource Models

Python classes that model KServe's Kubernetes Custom Resource Definitions, used to define inference services, serving runtimes, and model configurations.

```python { .api }
class V1beta1InferenceService:
    def __init__(self, metadata: dict, spec: V1beta1InferenceServiceSpec): ...

class V1beta1PredictorSpec:
    def __init__(self, sklearn: V1beta1SKLearnSpec = None,
                 pytorch: V1beta1TorchServeSpec = None): ...

class V1alpha1ServingRuntime:
    def __init__(self, metadata: dict, spec: V1alpha1ServingRuntimeSpec): ...
```

[Resource Models](./resource-models.md)

### Framework Servers

Pre-built model servers for popular ML frameworks that extend the core KServe functionality with framework-specific optimizations.

```python { .api }
# Scikit-learn
from sklearnserver import SKLearnModel

# XGBoost
from xgbserver import XGBoostModel

# HuggingFace
from huggingfaceserver import HuggingFaceModel
```

[Framework Servers](./framework-servers.md)

### Storage Integration

Unified storage interface supporting multiple cloud providers and local storage for model artifact management.

```python { .api }
from kserve.storage import Storage

def download_model(uri: str, dest: str):
    Storage.download(uri, dest)
```

[Storage](./storage.md)
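
A unified storage interface typically selects a backend from the URI scheme. The sketch below shows that dispatch step with the standard library only. The `BACKENDS` table and `classify_uri` are hypothetical and cover just a few schemes for illustration; they do not describe which schemes `Storage.download` supports or how it is implemented.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table, for illustration only.
BACKENDS = {
    "gs": "Google Cloud Storage",
    "s3": "Amazon S3",
    "file": "local filesystem",
    "": "local filesystem",  # bare paths such as /mnt/models
}


def classify_uri(uri: str) -> str:
    """Return the backend label for a model URI, based on its scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return BACKENDS[scheme]


print(classify_uri("gs://kfserving-examples/models/sklearn/1.0/model"))
print(classify_uri("/mnt/models"))
```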
