# KServe

KServe is a Kubernetes-native platform for serving machine learning models in production through Custom Resource Definitions (CRDs). It provides a unified interface for serving predictive and generative models, with features such as GPU autoscaling, scale-to-zero, canary rollouts, and multi-framework support.

## Package Information

- **Package Name**: kserve
- **Package Type**: Python SDK
- **Language**: Python
- **Installation**: `pip install kserve`
- **Requirements**: Python >=3.9,<3.13, Kubernetes cluster
- **Optional Features**: `pip install kserve[storage]` for cloud storage, `pip install kserve[ray]` for Ray integration
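
As a quick pre-install check, the Python requirement listed above can be tested against the running interpreter. This is a minimal sketch using only the standard library; `python_supported`, `MIN_VERSION`, and `MAX_VERSION_EXCLUSIVE` are hypothetical names introduced here and are not part of the kserve package.

```python
import sys

# Supported range taken from the requirement listed above: >=3.9,<3.13.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 13)


def python_supported(version=None) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple(version or sys.version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


# Report whether the current interpreter is in range.
print(python_supported())
```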

## Core Imports

```python
import kserve
```

Common imports for model serving:

```python
from kserve import Model, ModelServer
```

For client operations:

```python
from kserve import InferenceRESTClient, InferenceGRPCClient, KServeClient
```

For protocol types:

```python
from kserve import InferRequest, InferResponse, InferInput, InferOutput
```

## Basic Usage

### Creating a Custom Model Server

```python
from kserve import Model, ModelServer


class MyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.ready = False

    def load(self):
        # Load your model here
        # self.model = load_model("path/to/model")
        self.ready = True
        return self.ready

    async def predict(self, payload, headers=None):
        # Implement prediction logic; headers holds the request headers, if any
        # result = self.model.predict(payload)
        return {"predictions": "example_result"}


if __name__ == "__main__":
    model = MyModel("my-model")
    # Load before starting so the server reports the model as ready
    model.load()
    ModelServer().start([model])
```

### Making Inference Requests

```python
import asyncio

import numpy as np
from kserve import InferenceRESTClient, InferRequest, InferInput


async def main():
    client = InferenceRESTClient("http://localhost:8080")

    # Placeholder input: one flattened 28x28 sample; replace with real data
    data_array = np.zeros((1, 784), dtype=np.float32)

    # Create input data
    input_data = InferInput(name="data", shape=[1, 784], datatype="FP32")
    input_data.set_data_from_numpy(data_array)

    # Create inference request
    request = InferRequest(model_name="my-model", inputs=[input_data])

    # Make prediction
    response = await client.infer(request)
    predictions = response.outputs[0].as_numpy()
    print(predictions)


asyncio.run(main())
```

### Deploying Models with Kubernetes

```python
from kserve import KServeClient, V1beta1InferenceService, V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec, V1beta1SKLearnSpec

# Create Kubernetes client
client = KServeClient()

# Define inference service
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata={"name": "sklearn-iris", "namespace": "default"},
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    )
)

# Deploy the service
client.create(isvc, namespace="default")
```

## Architecture

KServe is built from the following components, grouped by role:

### Core Components

- **Model**: Abstract base class for implementing custom serving logic with lifecycle management
- **ModelServer**: Production-ready HTTP/gRPC server with multi-processing, health checks, and metrics
- **ModelRepository**: Registry for managing multiple models with loading/unloading capabilities
- **Clients**: High-level REST and gRPC clients for inference requests
- **Protocol**: Standardized data types for inference requests and responses
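
To make the `Model` lifecycle concrete, the request path can be pictured as `preprocess`, then `predict`, then `postprocess`, gated on `load()` having marked the model ready. The following is a minimal pure-Python sketch of that flow. `SketchModel` and its `handle` method are hypothetical stand-ins written for illustration, not KServe classes; the real base class also deals with protocol conversion, headers, and metrics.

```python
import asyncio


class SketchModel:
    """Hypothetical stand-in for a KServe-style model (not the real base class)."""

    def __init__(self, name: str):
        self.name = name
        self.ready = False

    def load(self) -> bool:
        # A real model would read weights from disk or object storage here.
        self.ready = True
        return self.ready

    async def preprocess(self, payload: dict) -> list:
        # Pull the raw instances out of the request body.
        return payload["instances"]

    async def predict(self, instances: list) -> list:
        # Toy "model": sum the features of each instance.
        return [sum(row) for row in instances]

    async def postprocess(self, outputs: list) -> dict:
        # Wrap the raw outputs in a response body.
        return {"predictions": outputs}

    async def handle(self, payload: dict) -> dict:
        # Run the three stages in order, refusing requests until load() has run.
        if not self.ready:
            raise RuntimeError(f"model {self.name} is not ready")
        instances = await self.preprocess(payload)
        outputs = await self.predict(instances)
        return await self.postprocess(outputs)


model = SketchModel("my-model")
model.load()
result = asyncio.run(model.handle({"instances": [[1, 2], [3, 4]]}))
print(result)
```

Keeping the three stages as separate coroutines means a subclass can override one of them, for example only `preprocess`, without touching the others.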

### Kubernetes Integration

- **InferenceService**: Main Kubernetes custom resource for deploying models
- **ServingRuntime**: Defines the model serving container and runtime configuration
- **TrainedModel**: Manages model artifacts and versioning
- **InferenceGraph**: Orchestrates multi-model inference pipelines
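
For orientation, an `InferenceService` is an ordinary Kubernetes manifest. The sketch below builds one as a plain dictionary, reusing the `sklearn-iris` values from the deployment example in this document. `build_isvc_manifest` is a hypothetical helper, not an SDK function; it only shows the shape of the resource.

```python
import json


def build_isvc_manifest(name: str, namespace: str, storage_uri: str) -> dict:
    """Return a minimal InferenceService manifest with a scikit-learn predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # Manifest keys are camelCase, unlike the snake_case SDK arguments.
                "sklearn": {"storageUri": storage_uri}
            }
        },
    }


manifest = build_isvc_manifest(
    "sklearn-iris",
    "default",
    "gs://kfserving-examples/models/sklearn/1.0/model",
)
print(json.dumps(manifest, indent=2))
```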

### Framework Support

KServe provides built-in support for popular ML frameworks through specialized servers:

- TensorFlow Serving, PyTorch TorchServe, Scikit-learn, XGBoost, LightGBM
- ONNX Runtime, NVIDIA Triton, PMML, PaddlePaddle
- HuggingFace Transformers for LLM serving

## Capabilities

### Model Serving Framework

Core classes and interfaces for implementing custom model servers with lifecycle management, health checking, and protocol support.

```python { .api }
class Model:
    def __init__(self, name: str): ...
    def load(self): ...
    async def predict(self, payload): ...
    async def preprocess(self, payload): ...
    async def postprocess(self, payload): ...

class ModelServer:
    def __init__(self, http_port: int = 8080, grpc_port: int = 8081): ...
    def start(self, models: List[Model]): ...
    def register_model(self, model: Model): ...
```

[Model Serving](./model-serving.md)

### Inference Clients

High-level async clients for making inference requests to KServe models with retry logic, SSL support, and protocol conversion.

```python { .api }
class InferenceRESTClient:
    def __init__(self, url: str, config: RESTConfig = None): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
    async def explain(self, request: InferRequest) -> InferResponse: ...
    async def is_model_ready(self, model_name: str) -> bool: ...

class InferenceGRPCClient:
    def __init__(self, url: str): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
```

[Inference Clients](./inference-clients.md)
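
For readers who want to see what travels over the wire, a REST inference call in the V2 format (the Open Inference Protocol) is an HTTP POST of a JSON body to `/v2/models/<model_name>/infer`. The sketch below builds that path and body with the standard library only. `build_v2_request` is a hypothetical helper, not part of the client classes above, and it assumes a two-dimensional FP32 input.

```python
import json


def build_v2_request(model_name: str, input_name: str, rows: list):
    """Build (path, body) for a V2 REST inference call from a 2-D list of numbers."""
    shape = [len(rows), len(rows[0])]
    # Tensor data is sent flattened in row-major order.
    flat = [float(value) for row in rows for value in row]
    body = {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": "FP32", "data": flat}
        ]
    }
    return f"/v2/models/{model_name}/infer", body


path, body = build_v2_request("my-model", "data", [[0.1, 0.2], [0.3, 0.4]])
print(path)
print(json.dumps(body))
```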

### Protocol and Data Types

Standardized data structures for inference requests and responses with support for multiple protocols and data formats.

```python { .api }
class InferRequest:
    def __init__(self, model_name: str, inputs: List[InferInput]): ...
    def as_dataframe(self) -> pandas.DataFrame: ...
    def to_rest(self) -> dict: ...

class InferResponse:
    def __init__(self, model_name: str, outputs: List[InferOutput]): ...
    @classmethod
    def from_rest(cls, response: dict) -> 'InferResponse': ...

class InferInput:
    def __init__(self, name: str, shape: List[int], datatype: str): ...
    def set_data_from_numpy(self, input_tensor: numpy.ndarray): ...
    def as_numpy(self) -> numpy.ndarray: ...
```

[Protocol and Data Types](./protocol.md)
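
Each tensor in these types is described by a `shape` plus its data. When the data arrives as a flat list, the shape is what restores the structure. The sketch below rebuilds a nested list from a V2-style response dictionary without NumPy. `reshape` and the sample `response` are illustrative assumptions, not SDK code; in practice `as_numpy()` covers this.

```python
def reshape(flat: list, shape: list):
    """Rebuild a row-major nested list from flat data and a shape."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(flat) != expected:
        raise ValueError(f"shape {shape} needs {expected} values, got {len(flat)}")
    if len(shape) <= 1:
        return list(flat)
    if shape[0] == 0:
        return []
    # Each slice along the first axis holds `step` consecutive values.
    step = expected // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


# Hypothetical response body in the V2 REST layout.
response = {
    "model_name": "my-model",
    "outputs": [
        {"name": "output-0", "shape": [2, 3], "datatype": "FP32",
         "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
    ],
}

output = response["outputs"][0]
predictions = reshape(output["data"], output["shape"])
print(predictions)
```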

### Kubernetes API Client

Python client for managing KServe resources in Kubernetes clusters including InferenceServices, TrainedModels, and InferenceGraphs.

```python { .api }
class KServeClient:
    def __init__(self, config_file: str = None): ...
    def create(self, obj, namespace: str = "default"): ...
    def get(self, name: str, namespace: str = "default"): ...
    def delete(self, name: str, namespace: str = "default"): ...
    def set_credentials(self, storage_type: str, **kwargs): ...
```

[Kubernetes Client](./kubernetes-client.md)
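
After `create`, a common next step is to check whether the service has become ready. The sketch below inspects a resource dictionary for a `Ready` condition whose status is `"True"`, which is how Kubernetes-style status conditions are usually reported. It assumes `get` returns the resource as a plain dictionary; `is_ready` is a hypothetical helper and the sample objects are made up for illustration.

```python
def is_ready(isvc: dict) -> bool:
    """Return True if the resource reports a Ready condition with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )


# Hypothetical resources as they might look shortly after and well after creation.
pending = {"status": {"conditions": [{"type": "Ready", "status": "False"}]}}
ready = {
    "status": {
        "conditions": [
            {"type": "PredictorReady", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    }
}

print(is_ready(pending), is_ready(ready))
```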

### Resource Models

Python model classes for the KServe Custom Resource Definitions, used to define inference services, serving runtimes, and model configurations.

```python { .api }
class V1beta1InferenceService:
    def __init__(self, metadata: dict, spec: V1beta1InferenceServiceSpec): ...

class V1beta1PredictorSpec:
    def __init__(self, sklearn: V1beta1SKLearnSpec = None,
                 pytorch: V1beta1TorchServeSpec = None): ...

class V1alpha1ServingRuntime:
    def __init__(self, metadata: dict, spec: V1alpha1ServingRuntimeSpec): ...
```

[Resource Models](./resource-models.md)

### Framework Servers

Pre-built model servers for popular ML frameworks that extend the core KServe functionality with framework-specific optimizations.

```python { .api }
# Scikit-learn
from sklearnserver import SKLearnModel

# XGBoost
from xgbserver import XGBoostModel

# HuggingFace
from huggingfaceserver import HuggingFaceModel
```

[Framework Servers](./framework-servers.md)

### Storage Integration

Unified storage interface supporting multiple cloud providers and local storage for model artifact management.

```python { .api }
from kserve.storage import Storage

def download_model(uri: str, dest: str):
    Storage.download(uri, dest)
```

[Storage](./storage.md)
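
Storage URIs such as the `gs://` example used earlier select a backend by their scheme. The sketch below shows that idea with `urllib.parse` from the standard library. The `BACKENDS` table and `classify` are illustrative assumptions and not the list of providers KServe actually supports.

```python
from urllib.parse import urlparse

# Illustrative scheme-to-backend table (an assumption, not KServe's own mapping).
BACKENDS = {
    "gs": "Google Cloud Storage",
    "s3": "Amazon S3",
    "file": "local filesystem",
    "": "local filesystem",  # bare paths such as /mnt/models
}


def classify(uri: str):
    """Split a storage URI into (backend, bucket, path)."""
    parsed = urlparse(uri)
    if parsed.scheme not in BACKENDS:
        raise ValueError(f"unsupported storage scheme: {parsed.scheme!r}")
    if BACKENDS[parsed.scheme] == "local filesystem":
        # Local paths have no bucket; keep the path as given.
        return BACKENDS[parsed.scheme], "", parsed.path
    return BACKENDS[parsed.scheme], parsed.netloc, parsed.path.lstrip("/")


print(classify("gs://kfserving-examples/models/sklearn/1.0/model"))
print(classify("/mnt/models"))
```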