```bash
tessl install tessl/pypi-kserve@0.16.1
```

KServe is a comprehensive Python SDK that provides standardized interfaces for building and deploying machine learning model serving infrastructure on Kubernetes. It includes a Control Plane Client for managing InferenceService resources and a Serving Runtime SDK with FastAPI-based servers supporting the V1, V2 (Open Inference Protocol), and OpenAI protocols.
Installation:
```bash
pip install kserve            # Base package
pip install kserve[storage]   # With S3, GCS, Azure support
pip install kserve[llm]      # With OpenAI protocol support
```

Core Imports:
```python
from kserve import Model, ModelServer                        # Model serving
from kserve import InferenceRESTClient, InferenceGRPCClient  # Clients
from kserve import KServeClient                              # Kubernetes control plane
from kserve import InferRequest, InferResponse               # Protocol types
```

Basic Model Server:
```python
from kserve import Model, ModelServer

class MyModel(Model):
    def load(self):
        self.model = load_my_model()
        self.ready = True

    def predict(self, payload, headers=None):
        return {"predictions": self.model.predict(payload["instances"])}

if __name__ == "__main__":
    model = MyModel("my-model")
    model.load()
    ModelServer().start([model])
```

| Component | Description | Reference |
|---|---|---|
| Model | Base class for custom models | custom-models.md |
| ModelServer | FastAPI-based server | model-server.md |
| InferenceClients | REST/gRPC clients | inference-clients.md |
| KServeClient | Kubernetes control plane | kserve-client.md |
| Protocol Types | InferRequest/InferResponse | protocol-types.md |
| ModelRepository | Dynamic model management | model-repository.md |
| Configuration | Client/server config | configuration.md |
| Errors | Exception handling | errors.md |
| Logging/Metrics | Observability | logging-metrics.md |
| Constants/Utils | Helper functions | constants-utils.md |
| Kubernetes Models | Resource definitions | kubernetes-models.md |
| OpenAI Protocol | LLM serving | openai-protocol.md |
```python
class Model:
    def load(self) -> None: ...                         # Load model artifacts
    def preprocess(self, body, headers=None): ...       # Transform input
    def predict(self, payload, headers=None): ...       # Run inference
    def postprocess(self, response, headers=None): ...  # Transform output
    def explain(self, payload, headers=None): ...       # Generate explanations
```
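To illustrate how these hooks compose, here is a minimal sketch of a custom model that implements the full lifecycle; the joblib loader, artifact path, and model name are illustrative assumptions rather than SDK requirements:

```python
import joblib
from kserve import Model, ModelServer

class IrisClassifier(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None

    def load(self):
        # Placeholder path; any in-memory estimator works here.
        self.model = joblib.load("/mnt/models/model.joblib")
        self.ready = True

    def preprocess(self, body, headers=None):
        # Reduce the V1-style request body to the raw feature rows.
        return body["instances"]

    def predict(self, payload, headers=None):
        # payload is whatever preprocess returned.
        return {"predictions": self.model.predict(payload).tolist()}

    def postprocess(self, response, headers=None):
        # Attach metadata before the response is serialized.
        response["model_name"] = self.name
        return response

if __name__ == "__main__":
    model = IrisClassifier("iris-classifier")
    model.load()
    ModelServer().start([model])
```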
```python
# REST Client
client = InferenceRESTClient(url="http://localhost:8080")
response = await client.infer(base_url=url, model_name="my-model", data={...})

# gRPC Client
client = InferenceGRPCClient(url="localhost:8081")
response = await client.infer(model_name="my-model", inputs=[...])
```
```python
client = KServeClient()
client.create(inferenceservice)                   # Create resource
client.get(name, namespace)                       # Get resource
client.patch(name, inferenceservice, namespace)   # Update resource
client.delete(name, namespace)                    # Delete resource
client.wait_isvc_ready(name, namespace, timeout)  # Wait for ready
```
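As a concrete sketch, the following creates a scikit-learn InferenceService and waits for it to become ready; the service name, namespace, and storage URI are placeholders:

```python
from kubernetes import client as k8s
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

# Build the InferenceService resource (placeholder name, namespace, and storage URI).
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=k8s.V1ObjectMeta(name="sklearn-iris", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(storage_uri="gs://my-bucket/models/sklearn/iris")
        )
    ),
)

kserve_client = KServeClient()
kserve_client.create(isvc)
kserve_client.wait_isvc_ready("sklearn-iris", namespace="default")
```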
```python
from kserve import InferInput, InferRequest, InferResponse

# Input tensor
input = InferInput(name="input-0", shape=[1, 4], datatype="FP32", data=[[...]])
input.set_data_from_numpy(array)  # From NumPy
array = input.as_numpy()          # To NumPy

# Request/Response
request = InferRequest(model_name="model", infer_inputs=[input])
response = InferResponse(model_name="model", infer_outputs=[output])
```
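For example, a small sketch that builds a request from a concrete NumPy array; the tensor name, feature values, and model name are arbitrary:

```python
import numpy as np
from kserve import InferInput, InferRequest

batch = np.array([[6.8, 2.8, 4.8, 1.4]], dtype=np.float32)

# Declare the tensor, then populate it directly from the NumPy array.
infer_input = InferInput(name="input-0", shape=list(batch.shape), datatype="FP32")
infer_input.set_data_from_numpy(batch)

request = InferRequest(model_name="my-model", infer_inputs=[infer_input])
```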
```python
# Start server with options
ModelServer(
    http_port=8080,
    grpc_port=8081,
    workers=4,
    enable_grpc=True,
    enable_docs_url=True
).start([model])
```

| Protocol | Endpoints | Use Case |
|---|---|---|
| REST v1 | /v1/models/{name}:predict, /v1/models/{name}:explain | Legacy compatibility |
| REST v2 | /v2/models/{name}/infer, /v2/health/* | Standard inference |
| gRPC v2 | ModelInfer, ServerMetadata | High performance |
| OpenAI | /v1/chat/completions, /v1/embeddings | LLM serving |
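To make the endpoint shapes concrete, here is a sketch using the requests library against a locally running server; the host, port, model name, and feature values are placeholders:

```python
import requests

BASE = "http://localhost:8080"

# V1 protocol: "instances"-style payload
v1 = requests.post(
    f"{BASE}/v1/models/my-model:predict",
    json={"instances": [[6.8, 2.8, 4.8, 1.4]]},
)

# V2 protocol (Open Inference Protocol): named tensors
v2 = requests.post(
    f"{BASE}/v2/models/my-model/infer",
    json={
        "inputs": [
            {
                "name": "input-0",
                "shape": [1, 4],
                "datatype": "FP32",
                "data": [6.8, 2.8, 4.8, 1.4],
            }
        ]
    },
)

print(v1.json())
print(v2.json())
```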
Storage URI schemes supported with pip install kserve[storage]:
- Google Cloud Storage (gs://)
- Amazon S3 (s3://)
- HTTP/HTTPS (https://)
- Local filesystem (file://)
- Persistent Volume Claim (pvc://)
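The same URIs can be resolved manually via the storage helper; a minimal sketch, assuming the storage extra is installed (the bucket path and output directory are placeholders):

```python
from kserve.storage import Storage

# Downloads the artifacts behind the URI to a local directory and returns the local path.
local_dir = Storage.download("gs://my-bucket/models/sklearn/iris", out_dir="/tmp/model")
```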
| Framework | Predictor Spec | Example |
|---|---|---|
| Scikit-learn | V1beta1SKLearnSpec | sklearn={"storageUri": "gs://..."} |
| XGBoost | V1beta1XGBoostSpec | xgboost={"storageUri": "s3://..."} |
| TensorFlow | V1beta1TFServingSpec | tensorflow={"storageUri": "..."} |
| PyTorch | V1beta1TorchServeSpec | pytorch={"storageUri": "..."} |
| ONNX | V1beta1ONNXRuntimeSpec | onnx={"storageUri": "..."} |
| Triton | V1beta1TritonSpec | triton={"storageUri": "..."} |
| Hugging Face | V1beta1HuggingFaceRuntimeSpec | huggingface={"storageUri": "..."} |
| LightGBM | V1beta1LightGBMSpec | lightgbm={"storageUri": "..."} |
| PMML | V1beta1PMMLSpec | pmml={"storageUri": "..."} |
| KServe Type | NumPy Type | Description |
|---|---|---|
| BOOL | np.bool_ | Boolean |
| UINT8/16/32/64 | np.uint8/16/32/64 | Unsigned integers |
| INT8/16/32/64 | np.int8/16/32/64 | Signed integers |
| FP16/32/64 | np.float16/32/64 | Floating point |
| BYTES | np.object_ | Variable-length bytes |
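As a quick check of the mapping, a sketch constructing tensors for two of the types; the tensor names and values are arbitrary:

```python
import numpy as np
from kserve import InferInput

# FP32 <-> np.float32
dense = np.zeros((2, 3), dtype=np.float32)
dense_input = InferInput(name="dense", shape=list(dense.shape), datatype="FP32")
dense_input.set_data_from_numpy(dense)

# BYTES <-> np.object_ (variable-length strings/bytes)
text = np.array([b"hello", b"world"], dtype=np.object_)
text_input = InferInput(name="text", shape=list(text.shape), datatype="BYTES")
text_input.set_data_from_numpy(text)
```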
```python
from kserve.errors import (
    InferenceError,       # Inference execution failure (500)
    InvalidInput,         # Invalid input data (400)
    ModelNotFound,        # Model doesn't exist (404)
    ModelNotReady,        # Model not initialized (503)
    UnsupportedProtocol,  # Unknown protocol
    ServerNotReady        # Server not ready (503)
)
```
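Raising these from the model hooks maps directly to HTTP status codes; a minimal sketch, where the required field name is illustrative:

```python
from kserve import Model
from kserve.errors import InvalidInput

class ValidatingModel(Model):
    def preprocess(self, body, headers=None):
        # InvalidInput is returned to the caller as HTTP 400.
        if "instances" not in body:
            raise InvalidInput("request body must contain an 'instances' field")
        return body["instances"]

    def predict(self, payload, headers=None):
        return {"predictions": payload}
```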
Health Endpoints:

- GET /v2/health/live - Server liveness
- GET /v2/health/ready - Server readiness
- GET /v2/models/{name}/ready - Model readiness
Metrics Endpoint:

- GET /metrics - Prometheus metrics
Key Metrics:

- request_preprocess_seconds - Preprocessing latency
- request_predict_seconds - Prediction latency
- request_postprocess_seconds - Postprocessing latency
- request_explain_seconds - Explanation latency

```bash
python model.py \
    --http_port 8080 \
    --grpc_port 8081 \
    --workers 4 \
    --max_threads 8 \
    --enable_grpc true \
    --enable_docs_url true \
    --log_config_file /path/to/config.yaml
```