
tessl/pypi-kserve

KServe is a comprehensive Python SDK that provides standardized interfaces for building and deploying machine learning model serving infrastructure on Kubernetes.


KServe Python SDK

KServe is a comprehensive Python SDK for building and deploying ML model serving infrastructure on Kubernetes. It provides a Control Plane Client for managing InferenceService resources and a Serving Runtime SDK with FastAPI-based servers that support the V1, V2 (Open Inference Protocol), and OpenAI protocols over REST and gRPC.

Quick Start

Installation:

pip install kserve              # Base package
pip install kserve[storage]     # With S3, GCS, Azure support
pip install kserve[llm]         # With OpenAI protocol support

Core Imports:

from kserve import Model, ModelServer                    # Model serving
from kserve import InferenceRESTClient, InferenceGRPCClient  # Clients
from kserve import KServeClient                          # Kubernetes control
from kserve import InferRequest, InferResponse           # Protocol types

Basic Model Server:

from kserve import Model, ModelServer

class MyModel(Model):
    def load(self):
        self.model = load_my_model()   # placeholder: load your own model artifacts here
        self.ready = True
    
    def predict(self, payload, headers=None):
        return {"predictions": self.model.predict(payload["instances"])}

if __name__ == "__main__":
    model = MyModel("my-model")
    model.load()
    ModelServer().start([model])
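
With the server running locally, you can exercise it over the V1 REST protocol. A minimal sketch using the requests library (the feature values are placeholders):

import requests

# POST to the V1 predict endpoint exposed by ModelServer on port 8080
resp = requests.post(
    "http://localhost:8080/v1/models/my-model:predict",
    json={"instances": [[1.0, 2.0, 3.0, 4.0]]},
)
print(resp.json())   # e.g. {"predictions": [...]}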

Complete Quick Start Guide

Architecture Overview

Serving Runtime SDK

  • Model Class - Custom inference logic with lifecycle hooks
  • ModelServer - FastAPI server with health endpoints
  • Protocols - REST v1/v2, gRPC v2, OpenAI-compatible
  • Model Repository - Dynamic model loading/unloading
  • Storage - GCS, S3, Azure Blob, PVC, HTTP/HTTPS
  • Observability - Prometheus metrics, structured logging

Control Plane Client

  • KServeClient - Kubernetes API operations
  • Resource Management - InferenceServices, TrainedModels, InferenceGraphs
  • Credentials - Storage configuration for GCS, S3, Azure
  • Status Tracking - Resource readiness monitoring

Core Components

Component         | Description                   | Reference
Model             | Base class for custom models  | custom-models.md
ModelServer       | FastAPI-based server          | model-server.md
Inference Clients | REST/gRPC clients             | inference-clients.md
KServeClient      | Kubernetes control plane      | kserve-client.md
Protocol Types    | InferRequest/InferResponse    | protocol-types.md
ModelRepository   | Dynamic model management      | model-repository.md
Configuration     | Client/server config          | configuration.md
Errors            | Exception handling            | errors.md
Logging/Metrics   | Observability                 | logging-metrics.md
Constants/Utils   | Helper functions              | constants-utils.md
Kubernetes Models | Resource definitions          | kubernetes-models.md
OpenAI Protocol   | LLM serving                   | openai-protocol.md

Quick Reference

Model Lifecycle Methods

class Model:
    def load(self) -> None: ...                          # Load model artifacts
    def preprocess(self, body, headers=None): ...        # Transform input
    def predict(self, payload, headers=None): ...        # Run inference
    def postprocess(self, response, headers=None): ...   # Transform output
    def explain(self, payload, headers=None): ...        # Generate explanations
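
The server invokes these hooks as preprocess → predict → postprocess for each request. A minimal sketch of a model that uses all three (the in-memory "model" below is a stand-in for real inference logic):

import numpy as np
from kserve import Model

class PipelineModel(Model):
    def load(self):
        # Stand-in for loading real model artifacts
        self.model = lambda batch: batch.sum(axis=1)
        self.ready = True

    def preprocess(self, body, headers=None):
        # Convert a V1-style payload into a NumPy batch
        return np.asarray(body["instances"], dtype=np.float32)

    def predict(self, payload, headers=None):
        return self.model(payload)

    def postprocess(self, response, headers=None):
        # Wrap raw predictions back into a V1-style response
        return {"predictions": response.tolist()}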

Inference Clients

# REST Client
client = InferenceRESTClient(url="http://localhost:8080")
response = await client.infer(base_url=url, model_name="my-model", data={...})

# gRPC Client
client = InferenceGRPCClient(url="localhost:8081")
response = await client.infer(model_name="my-model", inputs=[...])

Kubernetes Operations

client = KServeClient()
client.create(inferenceservice)                          # Create resource
client.get(name, namespace)                              # Get resource
client.patch(name, inferenceservice, namespace)          # Update resource
client.delete(name, namespace)                           # Delete resource
client.wait_isvc_ready(name, namespace, timeout)         # Wait for ready

Protocol Data Types

# Input tensor
infer_input = InferInput(name="input-0", shape=[1, 4], datatype="FP32", data=[[...]])
infer_input.set_data_from_numpy(array)                   # From NumPy
array = infer_input.as_numpy()                           # To NumPy

# Request/Response
request = InferRequest(model_name="model", infer_inputs=[infer_input])
response = InferResponse(model_name="model", infer_outputs=[output])
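
A short round-trip sketch tying these types together with NumPy (the tensor name, shape, and model name are illustrative):

import numpy as np
from kserve import InferInput, InferRequest

# Build an FP32 tensor from a NumPy batch
batch = np.random.rand(2, 4).astype(np.float32)
infer_input = InferInput(name="input-0", shape=list(batch.shape), datatype="FP32")
infer_input.set_data_from_numpy(batch)

# Wrap it in a V2 inference request and read the data back out
request = InferRequest(model_name="my-model", infer_inputs=[infer_input])
assert infer_input.as_numpy().shape == (2, 4)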

Server Configuration

# Start server with options
ModelServer(
    http_port=8080,
    grpc_port=8081,
    workers=4,
    enable_grpc=True,
    enable_docs_url=True
).start([model])

Protocol Support

Protocol | Endpoints                                 | Use Case
REST v1  | /v1/models/:predict, /v1/models/:explain  | Legacy compatibility
REST v2  | /v2/models/:infer, /v2/health/*           | Standard inference
gRPC v2  | ModelInfer, ServerMetadata                | High performance
OpenAI   | /v1/chat/completions, /v1/embeddings      | LLM serving

Storage Backends

Supported with pip install kserve[storage]:

  • Google Cloud Storage (gs://)
  • S3 Compatible (s3://)
  • Azure Blob Storage (https://)
  • Local filesystem (file://)
  • Persistent Volume Claims (pvc://)
  • HTTP/HTTPS URLs
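
Model artifacts can be pulled from any of these backends at load time. A hedged sketch, assuming the Storage helper shipped with the storage extra (the bucket path and target directory are placeholders):

from kserve.storage import Storage   # requires pip install kserve[storage]

# Download artifacts from a supported URI into a local directory
local_dir = Storage.download("gs://my-bucket/models/sklearn/model", out_dir="/mnt/models")
print(local_dir)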

Framework Support

Framework    | Predictor Spec                | Example
Scikit-learn | V1beta1SKLearnSpec            | sklearn={"storageUri": "gs://..."}
XGBoost      | V1beta1XGBoostSpec            | xgboost={"storageUri": "s3://..."}
TensorFlow   | V1beta1TFServingSpec          | tensorflow={"storageUri": "..."}
PyTorch      | V1beta1TorchServeSpec         | pytorch={"storageUri": "..."}
ONNX         | V1beta1ONNXRuntimeSpec        | onnx={"storageUri": "..."}
Triton       | V1beta1TritonSpec             | triton={"storageUri": "..."}
Hugging Face | V1beta1HuggingFaceRuntimeSpec | huggingface={"storageUri": "..."}
LightGBM     | V1beta1LightGBMSpec           | lightgbm={"storageUri": "..."}
PMML         | V1beta1PMMLSpec               | pmml={"storageUri": "..."}
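
These predictor specs plug into an InferenceService definition. A sketch of deploying a scikit-learn model with KServeClient (the resource name, namespace, and storage URI are placeholders, and the literal apiVersion/kind strings are assumptions rather than values taken from kserve.constants):

from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(storage_uri="gs://my-bucket/models/sklearn/model")
        )
    ),
)

kserve_client = KServeClient()
kserve_client.create(isvc)                                           # submit the resource
kserve_client.wait_isvc_ready("sklearn-iris", namespace="default")   # block until Ready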

Data Types

KServe Type    | NumPy Type         | Description
BOOL           | np.bool_           | Boolean
UINT8/16/32/64 | np.uint8/16/32/64  | Unsigned integers
INT8/16/32/64  | np.int8/16/32/64   | Signed integers
FP16/32/64     | np.float16/32/64   | Floating point
BYTES          | np.object_         | Variable-length bytes

Error Handling

from kserve.errors import (
    InferenceError,      # Inference execution failure (500)
    InvalidInput,        # Invalid input data (400)
    ModelNotFound,       # Model doesn't exist (404)
    ModelNotReady,       # Model not initialized (503)
    UnsupportedProtocol, # Unknown protocol
    ServerNotReady       # Server not ready (503)
)
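
Raising these exceptions from a custom model maps directly to the listed HTTP status codes. A small sketch validating input in preprocess:

from kserve import Model
from kserve.errors import InvalidInput

class ValidatedModel(Model):
    def preprocess(self, body, headers=None):
        # Returned to the caller as a 400 before predict() ever runs
        if "instances" not in body:
            raise InvalidInput("request must contain an 'instances' field")
        return body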

Documentation

  • Guides
  • Examples
  • Reference Documentation
  • API Reference
  • Configuration & Operations
  • Kubernetes Resources
  • Advanced Topics

Health & Monitoring

Health Endpoints:

  • GET /v2/health/live - Server liveness
  • GET /v2/health/ready - Server readiness
  • GET /v2/models/{name}/ready - Model readiness

Metrics Endpoint:

  • GET /metrics - Prometheus metrics

Key Metrics:

  • request_preprocess_seconds - Preprocessing latency
  • request_predict_seconds - Prediction latency
  • request_postprocess_seconds - Postprocessing latency
  • request_explain_seconds - Explanation latency
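
A quick way to wire these endpoints into external checks; a sketch probing a locally running server (the host and model name are placeholders):

import requests

base = "http://localhost:8080"
server_live = requests.get(f"{base}/v2/health/live").status_code == 200
server_ready = requests.get(f"{base}/v2/health/ready").status_code == 200
model_ready = requests.get(f"{base}/v2/models/my-model/ready").status_code == 200
metrics_text = requests.get(f"{base}/metrics").text   # Prometheus exposition format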

Command-Line Options

python model.py \
  --http_port 8080 \
  --grpc_port 8081 \
  --workers 4 \
  --max_threads 8 \
  --enable_grpc true \
  --enable_docs_url true \
  --log_config_file /path/to/config.yaml

Additional Resources

  • KServe Documentation
  • GitHub Repository
  • Examples Repository
  • Community Slack

Version Information

  • Package: kserve
  • Version: 0.16.0
  • Python: 3.8+
  • Kubernetes: 1.20+