tessl/github-kserve

Kubernetes Custom Resource Definition for serving predictive and generative machine learning models with high abstraction interfaces and features like GPU autoscaling, scale to zero, and canary rollouts.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pkg:github/kserve/kserve@0.15.x

To install, run

npx @tessl/cli install tessl/github-kserve@0.15.0

# KServe

KServe is a Kubernetes-native platform for serving machine learning models in production. Models are deployed through Custom Resource Definitions, and a single interface covers both predictive and generative models, with GPU autoscaling, scale-to-zero, canary rollouts, and multi-framework support.

## Package Information

- **Package Name**: kserve
- **Package Type**: Python SDK
- **Language**: Python
- **Installation**: `pip install kserve`
- **Requirements**: Python >=3.9,<3.13, Kubernetes cluster
- **Optional Features**: `pip install kserve[storage]` for cloud storage, `pip install kserve[ray]` for Ray integration
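
The Python requirement above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `python_supported` is made up for this example and is not part of the kserve package.

```python
import sys

# Supported range taken from the requirement above: >=3.9,<3.13
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 13)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the supported range.

    `version` is any tuple starting with (major, minor); it defaults to
    the running interpreter's version.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) < MAX_VERSION_EXCLUSIVE


status = "supported" if python_supported() else "not supported"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```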

## Core Imports

```python
import kserve
```

Common imports for model serving:

```python
from kserve import Model, ModelServer
```

For client operations:

```python
from kserve import InferenceRESTClient, InferenceGRPCClient, KServeClient
```

For protocol types:

```python
from kserve import InferRequest, InferResponse, InferInput, InferOutput
```

## Basic Usage

### Creating a Custom Model Server

```python
from kserve import Model, ModelServer


class MyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.ready = False

    def load(self):
        # Load your model here
        # self.model = load_model("path/to/model")
        self.ready = True
        return self.ready

    async def predict(self, payload, headers=None):
        # Implement prediction logic; `headers` is optional request metadata
        # result = self.model.predict(payload)
        return {"predictions": "example_result"}


if __name__ == "__main__":
    model = MyModel("my-model")
    # Load before starting so the model is marked ready when the server comes up
    model.load()
    ModelServer().start([model])
```
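
Once the server above is running, it can be exercised over HTTP. The sketch below builds (but does not send) a request against the KServe V1 prediction endpoint, `/v1/models/<name>:predict`, using only the standard library. The host, port, helper name `build_predict_request`, and example instances are assumptions made for illustration.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances):
    """Build a POST request for the V1 protocol predict endpoint."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local server address and a made-up feature vector.
request = build_predict_request(
    "http://localhost:8080", "my-model", [[1.0, 2.0, 3.0]]
)
print(request.full_url)

# To send it against a running server, uncomment:
# print(urllib.request.urlopen(request).read().decode())
```

With the example model above, a running server would be expected to answer with its `{"predictions": ...}` dictionary serialized as JSON.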

### Making Inference Requests

```python
import asyncio

import numpy as np
from kserve import InferenceRESTClient, InferRequest, InferInput


async def main():
    client = InferenceRESTClient("http://localhost:8080")

    # Create input data (random placeholder matching the declared shape and datatype)
    data_array = np.random.rand(1, 784).astype(np.float32)
    input_data = InferInput(name="data", shape=[1, 784], datatype="FP32")
    input_data.set_data_from_numpy(data_array)

    # Create inference request
    request = InferRequest(model_name="my-model", inputs=[input_data])

    # Make prediction
    response = await client.infer(request)
    predictions = response.outputs[0].as_numpy()
    print(predictions)


asyncio.run(main())
```
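
On the wire, a request like the one above corresponds to an Open Inference Protocol (V2) JSON body posted to `/v2/models/<name>/infer`. The sketch below builds that body by hand with the standard library, which can help when debugging with `curl`. The helper names (`flatten`, `infer_shape`, `build_v2_body`) are hypothetical, and only the JSON (non-binary) encoding is shown.

```python
import json


def flatten(values):
    """Flatten arbitrarily nested lists into one row-major list."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def infer_shape(values):
    """Infer the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return shape


def build_v2_body(name, values, datatype="FP32"):
    """Build a V2 inference request body holding a single input tensor."""
    return {
        "inputs": [
            {
                "name": name,
                "shape": infer_shape(values),
                "datatype": datatype,
                "data": flatten(values),
            }
        ]
    }


# One sample of 784 zeros, matching the shape used in the example above.
body = build_v2_body("data", [[0.0] * 784])
print(json.dumps(body)[:80])
```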

### Deploying Models with Kubernetes

```python
from kserve import KServeClient, V1beta1InferenceService, V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec, V1beta1SKLearnSpec

# Create Kubernetes client
client = KServeClient()

# Define inference service
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata={"name": "sklearn-iris", "namespace": "default"},
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    )
)

# Deploy the service
client.create(isvc, namespace="default")
```
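
The Python objects above correspond to a plain `InferenceService` manifest. As a rough sketch, the same resource can be written as a dictionary and dumped to JSON, which `kubectl apply -f` also accepts. Field names follow the camelCase form used in Kubernetes manifests (`storageUri` rather than `storage_uri`); the helper name `sklearn_isvc_manifest` is made up for this example.

```python
import json


def sklearn_isvc_manifest(name, namespace, storage_uri):
    """Return an InferenceService manifest with a scikit-learn predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "sklearn": {"storageUri": storage_uri},
            }
        },
    }


manifest = sklearn_isvc_manifest(
    "sklearn-iris",
    "default",
    "gs://kfserving-examples/models/sklearn/1.0/model",
)
print(json.dumps(manifest, indent=2))
```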

## Architecture

KServe consists of several key components that work together to provide a complete ML serving solution:

### Core Components

- **Model**: Abstract base class for implementing custom serving logic with lifecycle management
- **ModelServer**: Production-ready HTTP/gRPC server with multi-processing, health checks, and metrics
- **ModelRepository**: Registry for managing multiple models with loading/unloading capabilities
- **Clients**: High-level REST and gRPC clients for inference requests
- **Protocol**: Standardized data types for inference requests and responses
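
To make the registry idea concrete, here is a toy stand-in for a model repository: a dictionary keyed by model name with register, lookup, and unload operations. It illustrates the concept only and is not KServe's `ModelRepository` implementation; the names `ToyModel` and `ToyRepository` and their methods are invented for this sketch.

```python
class ToyModel:
    """Minimal model with a name and a readiness flag."""

    def __init__(self, name):
        self.name = name
        self.ready = False

    def load(self):
        # A real model would read weights from disk here.
        self.ready = True
        return self.ready


class ToyRepository:
    """Registry that tracks models by name."""

    def __init__(self):
        self._models = {}

    def update(self, model):
        # Register the model, replacing any existing one with the same name.
        self._models[model.name] = model

    def get(self, name):
        return self._models.get(name)

    def is_ready(self, name):
        model = self.get(name)
        return model is not None and model.ready

    def unload(self, name):
        if name not in self._models:
            raise KeyError(f"model {name!r} is not registered")
        del self._models[name]


repo = ToyRepository()
model = ToyModel("my-model")
model.load()
repo.update(model)
print(repo.is_ready("my-model"))  # registered and loaded
repo.unload("my-model")
print(repo.is_ready("my-model"))  # no longer registered
```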

### Kubernetes Integration

- **InferenceService**: Main Kubernetes custom resource for deploying models
- **ServingRuntime**: Defines the model serving container and runtime configuration
- **TrainedModel**: Manages model artifacts and versioning
- **InferenceGraph**: Orchestrates multi-model inference pipelines

### Framework Support

KServe provides built-in support for popular ML frameworks through specialized servers:

- TensorFlow Serving, PyTorch TorchServe, Scikit-learn, XGBoost, LightGBM
- ONNX Runtime, NVIDIA Triton, PMML, PaddlePaddle
- HuggingFace Transformers for LLM serving

## Capabilities

### Model Serving Framework

Core classes and interfaces for implementing custom model servers with lifecycle management, health checking, and protocol support.

```python { .api }
class Model:
    def __init__(self, name: str): ...
    def load(self): ...
    async def predict(self, payload): ...
    async def preprocess(self, payload): ...
    async def postprocess(self, payload): ...

class ModelServer:
    def __init__(self, http_port: int = 8080, grpc_port: int = 8081): ...
    def start(self, models: List[Model]): ...
    def register_model(self, model: Model): ...
```

[Model Serving](./model-serving.md)
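
The three async hooks form a pipeline: `preprocess` transforms the raw payload, `predict` runs the model, and `postprocess` shapes the response. The sketch below mimics that flow with a plain class so it runs without KServe installed. The class `EchoPipeline`, its `handle` method, and the doubling "model" are all invented for illustration; with KServe, the server invokes the hooks for you.

```python
import asyncio


class EchoPipeline:
    async def preprocess(self, payload):
        # Pull the raw instances out of the request body and normalize types.
        return [float(x) for x in payload["instances"]]

    async def predict(self, payload):
        # Stand-in "model": double every input value.
        return [2 * x for x in payload]

    async def postprocess(self, payload):
        # Wrap the raw output in a response dictionary.
        return {"predictions": payload}

    async def handle(self, payload):
        # Chain the hooks: preprocess, then predict, then postprocess.
        payload = await self.preprocess(payload)
        payload = await self.predict(payload)
        return await self.postprocess(payload)


result = asyncio.run(EchoPipeline().handle({"instances": [1, 2, 3]}))
print(result)
```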

### Inference Clients

High-level async clients for making inference requests to KServe models with retry logic, SSL support, and protocol conversion.

```python { .api }
class InferenceRESTClient:
    def __init__(self, url: str, config: RESTConfig = None): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
    async def explain(self, request: InferRequest) -> InferResponse: ...
    async def is_model_ready(self, model_name: str) -> bool: ...

class InferenceGRPCClient:
    def __init__(self, url: str): ...
    async def infer(self, request: InferRequest) -> InferResponse: ...
```

[Inference Clients](./inference-clients.md)
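
The retry behavior mentioned above can be pictured as a wrapper that re-issues a failed call after a growing delay. The following is a generic sketch of exponential backoff with `asyncio`, not the KServe clients' actual implementation; `retry_async`, its parameters, and the flaky stand-in call are all assumptions.

```python
import asyncio


async def retry_async(call, retries=3, base_delay=0.1):
    """Await call(); on ConnectionError, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * (2 ** attempt))


attempts = {"count": 0}


async def flaky_infer():
    """Stand-in for a network call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"outputs": []}


response = asyncio.run(retry_async(flaky_infer, retries=3, base_delay=0.01))
print(response, attempts["count"])
```

Retrying only on `ConnectionError` is a deliberate choice in this sketch: errors such as a malformed request would fail the same way on every attempt, so retrying them only adds latency.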

### Protocol and Data Types

Standardized data structures for inference requests and responses with support for multiple protocols and data formats.

```python { .api }
class InferRequest:
    def __init__(self, model_name: str, inputs: List[InferInput]): ...
    def as_dataframe(self) -> pandas.DataFrame: ...
    def to_rest(self) -> dict: ...

class InferResponse:
    def __init__(self, model_name: str, outputs: List[InferOutput]): ...
    @classmethod
    def from_rest(cls, response: dict) -> 'InferResponse': ...

class InferInput:
    def __init__(self, name: str, shape: List[int], datatype: str): ...
    def set_data_from_numpy(self, input_tensor: numpy.ndarray): ...
    def as_numpy(self) -> numpy.ndarray: ...
```

[Protocol and Data Types](./protocol.md)
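
In the JSON encoding of the Open Inference Protocol, tensor data can travel as a flat row-major list alongside its `shape`. The sketch below shows how such an output can be folded back into nested lists without NumPy. The function name `reshape` and the sample response are made up for this example.

```python
def reshape(flat, shape):
    """Fold a flat row-major list into nested lists matching `shape`."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(flat):
        raise ValueError(
            f"shape {shape} needs {expected} values, got {len(flat)}"
        )
    if len(shape) <= 1:
        return list(flat)
    rows, rest = shape[0], shape[1:]
    # Each row of the outer dimension owns an equal slice of the flat data.
    step = len(flat) // rows if rows else 0
    return [reshape(flat[i * step:(i + 1) * step], rest) for i in range(rows)]


# Hand-written sample response, not real server output.
sample_response = {
    "model_name": "my-model",
    "outputs": [
        {
            "name": "output-0",
            "shape": [2, 3],
            "datatype": "INT64",
            "data": [1, 2, 3, 4, 5, 6],
        }
    ],
}

output = sample_response["outputs"][0]
tensor = reshape(output["data"], output["shape"])
print(tensor)
```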

### Kubernetes API Client

Python client for managing KServe resources in Kubernetes clusters including InferenceServices, TrainedModels, and InferenceGraphs.

```python { .api }
class KServeClient:
    def __init__(self, config_file: str = None): ...
    def create(self, obj, namespace: str = "default"): ...
    def get(self, name: str, namespace: str = "default"): ...
    def delete(self, name: str, namespace: str = "default"): ...
    def set_credentials(self, storage_type: str, **kwargs): ...
```

[Kubernetes Client](./kubernetes-client.md)
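
After `create`, a common next step is to wait until the service reports ready. An `InferenceService` in the cluster carries a `status.conditions` list, and the service is usable once the condition of type `Ready` has status `"True"`. The helper below is a sketch that inspects such a resource, assuming it is available as a dictionary; `isvc_ready` is a hypothetical name and the sample objects are hand-written, not real cluster output.

```python
def isvc_ready(isvc):
    """Return True if the resource has a Ready condition with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )


# Hand-written examples of a service that is still starting and one that is up.
pending = {"metadata": {"name": "sklearn-iris"}}
running = {
    "metadata": {"name": "sklearn-iris"},
    "status": {
        "conditions": [
            {"type": "PredictorReady", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    },
}

print(isvc_ready(pending), isvc_ready(running))
```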

### Resource Models

Python classes that mirror the KServe Custom Resource Definitions, used to define inference services, serving runtimes, and model configurations.

```python { .api }
class V1beta1InferenceService:
    def __init__(self, metadata: dict, spec: V1beta1InferenceServiceSpec): ...

class V1beta1PredictorSpec:
    def __init__(self, sklearn: V1beta1SKLearnSpec = None,
                 pytorch: V1beta1TorchServeSpec = None): ...

class V1alpha1ServingRuntime:
    def __init__(self, metadata: dict, spec: V1alpha1ServingRuntimeSpec): ...
```

[Resource Models](./resource-models.md)
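
Canary rollouts and scale-to-zero are configured on the predictor. As a sketch, assuming the manifest field names `canaryTrafficPercent` and `minReplicas` on `spec.predictor`, the helper below sets both on a manifest dictionary without mutating the input. `with_rollout` is an invented name, and the base manifest is the scikit-learn example used earlier in this document.

```python
import copy


def with_rollout(manifest, canary_percent=None, min_replicas=None):
    """Return a copy of the manifest with rollout settings on the predictor."""
    updated = copy.deepcopy(manifest)
    predictor = updated.setdefault("spec", {}).setdefault("predictor", {})
    if canary_percent is not None:
        if not 0 <= canary_percent <= 100:
            raise ValueError("canary_percent must be between 0 and 100")
        # Share of traffic routed to the newest revision.
        predictor["canaryTrafficPercent"] = canary_percent
    if min_replicas is not None:
        # A minimum of 0 allows the predictor to scale to zero when idle.
        predictor["minReplicas"] = min_replicas
    return updated


base = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris", "namespace": "default"},
    "spec": {
        "predictor": {
            "sklearn": {
                "storageUri": "gs://kfserving-examples/models/sklearn/1.0/model"
            }
        }
    },
}

canary = with_rollout(base, canary_percent=10, min_replicas=0)
print(canary["spec"]["predictor"])
```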

### Framework Servers

Pre-built model servers for popular ML frameworks that extend the core KServe functionality with framework-specific optimizations.

```python { .api }
# Scikit-learn
from sklearnserver import SKLearnModel

# XGBoost
from xgbserver import XGBoostModel

# HuggingFace
from huggingfaceserver import HuggingFaceModel
```

[Framework Servers](./framework-servers.md)

### Storage Integration

Unified storage interface supporting multiple cloud providers and local storage for model artifact management.

```python { .api }
from kserve.storage import Storage

def download_model(uri: str, dest: str):
    Storage.download(uri, dest)
```

[Storage](./storage.md)
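
A unified storage interface has to decide, per URI, which backend to use. The sketch below shows one way to do that dispatch by URI scheme with `urllib.parse`. The scheme-to-backend table and the name `backend_for` are illustrative assumptions, not the list of backends KServe supports; consult the storage documentation for that.

```python
from urllib.parse import urlparse

# Illustrative mapping only; a real storage layer would map to downloader code.
BACKENDS = {
    "gs": "Google Cloud Storage",
    "s3": "Amazon S3",
    "file": "local filesystem",
    "": "local filesystem",  # bare paths such as /mnt/models
}


def backend_for(uri):
    """Pick a backend label from the URI scheme, or raise for unknown schemes."""
    scheme = urlparse(uri).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unsupported storage scheme: {scheme!r}") from None


print(backend_for("gs://kfserving-examples/models/sklearn/1.0/model"))
print(backend_for("/mnt/models"))
```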