
tessl/pypi-kserve

KServe is a comprehensive Python SDK that provides standardized interfaces for building and deploying machine learning model serving infrastructure on Kubernetes.


KServe Python SDK

KServe is a comprehensive Python SDK for building and deploying ML model serving infrastructure on Kubernetes. It provides a Control Plane Client for managing InferenceService resources and a Serving Runtime SDK with FastAPI-based servers that support the V1, V2 (Open Inference Protocol), and OpenAI protocols over REST and gRPC.

Quick Start

Installation:

pip install kserve              # Base package
pip install kserve[storage]     # With S3, GCS, Azure support
pip install kserve[llm]         # With OpenAI protocol support

Core Imports:

from kserve import Model, ModelServer                    # Model serving
from kserve import InferenceRESTClient, InferenceGRPCClient  # Clients
from kserve import KServeClient                          # Kubernetes control
from kserve import InferRequest, InferResponse           # Protocol types

Basic Model Server:

from kserve import Model, ModelServer

class MyModel(Model):
    def load(self):
        self.model = load_my_model()   # placeholder: load your own model artifacts here
        self.ready = True
    
    def predict(self, payload, headers=None):
        return {"predictions": self.model.predict(payload["instances"])}

if __name__ == "__main__":
    model = MyModel("my-model")
    model.load()
    ModelServer().start([model])
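
With the server running locally, you can exercise it over the V1 REST protocol. A minimal sketch using the requests library (the feature values are placeholders):

import requests

# POST to the V1 predict endpoint exposed by ModelServer on port 8080
resp = requests.post(
    "http://localhost:8080/v1/models/my-model:predict",
    json={"instances": [[1.0, 2.0, 3.0, 4.0]]},
)
print(resp.json())   # e.g. {"predictions": [...]}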

Complete Quick Start Guide

Architecture Overview

Serving Runtime SDK

  • Model Class - Custom inference logic with lifecycle hooks
  • ModelServer - FastAPI server with health endpoints
  • Protocols - REST v1/v2, gRPC v2, OpenAI-compatible
  • Model Repository - Dynamic model loading/unloading
  • Storage - GCS, S3, Azure Blob, PVC, HTTP/HTTPS
  • Observability - Prometheus metrics, structured logging

Control Plane Client

  • KServeClient - Kubernetes API operations
  • Resource Management - InferenceServices, TrainedModels, InferenceGraphs
  • Credentials - Storage configuration for GCS, S3, Azure
  • Status Tracking - Resource readiness monitoring

Core Components

Component         | Description                   | Reference
Model             | Base class for custom models  | custom-models.md
ModelServer       | FastAPI-based server          | model-server.md
Inference Clients | REST/gRPC clients             | inference-clients.md
KServeClient      | Kubernetes control plane      | kserve-client.md
Protocol Types    | InferRequest/InferResponse    | protocol-types.md
ModelRepository   | Dynamic model management      | model-repository.md
Configuration     | Client/server config          | configuration.md
Errors            | Exception handling            | errors.md
Logging/Metrics   | Observability                 | logging-metrics.md
Constants/Utils   | Helper functions              | constants-utils.md
Kubernetes Models | Resource definitions          | kubernetes-models.md
OpenAI Protocol   | LLM serving                   | openai-protocol.md

Quick Reference

Model Lifecycle Methods

class Model:
    def load(self) -> None: ...                          # Load model artifacts
    def preprocess(self, body, headers=None): ...        # Transform input
    def predict(self, payload, headers=None): ...        # Run inference
    def postprocess(self, response, headers=None): ...   # Transform output
    def explain(self, payload, headers=None): ...        # Generate explanations
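
The server invokes these hooks as preprocess → predict → postprocess for each request. A minimal sketch of a model that uses all three (the in-memory "model" below is a stand-in for real inference logic):

import numpy as np
from kserve import Model

class PipelineModel(Model):
    def load(self):
        # Stand-in for loading real model artifacts
        self.model = lambda batch: batch.sum(axis=1)
        self.ready = True

    def preprocess(self, body, headers=None):
        # Convert a V1-style payload into a NumPy batch
        return np.asarray(body["instances"], dtype=np.float32)

    def predict(self, payload, headers=None):
        return self.model(payload)

    def postprocess(self, response, headers=None):
        # Wrap raw predictions back into a V1-style response
        return {"predictions": response.tolist()}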

Inference Clients

# REST Client
client = InferenceRESTClient(url="http://localhost:8080")
response = await client.infer(base_url=url, model_name="my-model", data={...})

# gRPC Client
client = InferenceGRPCClient(url="localhost:8081")
response = await client.infer(model_name="my-model", inputs=[...])

Kubernetes Operations

client = KServeClient()
client.create(inferenceservice)                          # Create resource
client.get(name, namespace)                              # Get resource
client.patch(name, inferenceservice, namespace)          # Update resource
client.delete(name, namespace)                           # Delete resource
client.wait_isvc_ready(name, namespace, timeout)         # Wait for ready

Protocol Data Types

# Input tensor
infer_input = InferInput(name="input-0", shape=[1, 4], datatype="FP32", data=[[...]])
infer_input.set_data_from_numpy(array)                   # From NumPy
array = infer_input.as_numpy()                           # To NumPy

# Request/Response
request = InferRequest(model_name="model", infer_inputs=[infer_input])
response = InferResponse(model_name="model", infer_outputs=[output])
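
A short round-trip sketch tying these types together with NumPy (the tensor name, shape, and model name are illustrative):

import numpy as np
from kserve import InferInput, InferRequest

# Build an FP32 tensor from a NumPy batch
batch = np.random.rand(2, 4).astype(np.float32)
infer_input = InferInput(name="input-0", shape=list(batch.shape), datatype="FP32")
infer_input.set_data_from_numpy(batch)

# Wrap it in a V2 inference request and read the data back out
request = InferRequest(model_name="my-model", infer_inputs=[infer_input])
assert infer_input.as_numpy().shape == (2, 4)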

Server Configuration

# Start server with options
ModelServer(
    http_port=8080,
    grpc_port=8081,
    workers=4,
    enable_grpc=True,
    enable_docs_url=True
).start([model])

Protocol Support

Protocol | Endpoints                                 | Use Case
REST v1  | /v1/models/:predict, /v1/models/:explain  | Legacy compatibility
REST v2  | /v2/models/:infer, /v2/health/*           | Standard inference
gRPC v2  | ModelInfer, ServerMetadata                | High performance
OpenAI   | /v1/chat/completions, /v1/embeddings      | LLM serving

Storage Backends

Supported with pip install kserve[storage]:

  • Google Cloud Storage (gs://)
  • S3 Compatible (s3://)
  • Azure Blob Storage (https://)
  • Local filesystem (file://)
  • Persistent Volume Claims (pvc://)
  • HTTP/HTTPS URLs
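
Model artifacts can be pulled from any of these backends at load time. A hedged sketch, assuming the Storage helper shipped with the storage extra (the bucket path and target directory are placeholders):

from kserve.storage import Storage   # requires pip install kserve[storage]

# Download artifacts from a supported URI into a local directory
local_dir = Storage.download("gs://my-bucket/models/sklearn/model", out_dir="/mnt/models")
print(local_dir)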

Framework Support

Framework    | Predictor Spec                | Example
Scikit-learn | V1beta1SKLearnSpec            | sklearn={"storageUri": "gs://..."}
XGBoost      | V1beta1XGBoostSpec            | xgboost={"storageUri": "s3://..."}
TensorFlow   | V1beta1TFServingSpec          | tensorflow={"storageUri": "..."}
PyTorch      | V1beta1TorchServeSpec         | pytorch={"storageUri": "..."}
ONNX         | V1beta1ONNXRuntimeSpec        | onnx={"storageUri": "..."}
Triton       | V1beta1TritonSpec             | triton={"storageUri": "..."}
Hugging Face | V1beta1HuggingFaceRuntimeSpec | huggingface={"storageUri": "..."}
LightGBM     | V1beta1LightGBMSpec           | lightgbm={"storageUri": "..."}
PMML         | V1beta1PMMLSpec               | pmml={"storageUri": "..."}
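
These predictor specs plug into an InferenceService definition. A sketch of deploying a scikit-learn model with KServeClient (the resource name, namespace, and storage URI are placeholders, and the literal apiVersion/kind strings are assumptions rather than values taken from kserve.constants):

from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(storage_uri="gs://my-bucket/models/sklearn/model")
        )
    ),
)

kserve_client = KServeClient()
kserve_client.create(isvc)                                           # submit the resource
kserve_client.wait_isvc_ready("sklearn-iris", namespace="default")   # block until Ready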

Data Types

KServe Type    | NumPy Type         | Description
BOOL           | np.bool_           | Boolean
UINT8/16/32/64 | np.uint8/16/32/64  | Unsigned integers
INT8/16/32/64  | np.int8/16/32/64   | Signed integers
FP16/32/64     | np.float16/32/64   | Floating point
BYTES          | np.object_         | Variable-length bytes

Error Handling

from kserve.errors import (
    InferenceError,      # Inference execution failure (500)
    InvalidInput,        # Invalid input data (400)
    ModelNotFound,       # Model doesn't exist (404)
    ModelNotReady,       # Model not initialized (503)
    UnsupportedProtocol, # Unknown protocol
    ServerNotReady       # Server not ready (503)
)
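
Raising these exceptions from a custom model maps directly to the listed HTTP status codes. A small sketch validating input in preprocess:

from kserve import Model
from kserve.errors import InvalidInput

class ValidatedModel(Model):
    def preprocess(self, body, headers=None):
        # Returned to the caller as a 400 before predict() ever runs
        if "instances" not in body:
            raise InvalidInput("request must contain an 'instances' field")
        return body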

Documentation

  • Guides
  • Examples
  • Reference Documentation
  • API Reference
  • Configuration & Operations
  • Kubernetes Resources
  • Advanced Topics

Health & Monitoring

Health Endpoints:

  • GET /v2/health/live - Server liveness
  • GET /v2/health/ready - Server readiness
  • GET /v2/models/{name}/ready - Model readiness

Metrics Endpoint:

  • GET /metrics - Prometheus metrics

Key Metrics:

  • request_preprocess_seconds - Preprocessing latency
  • request_predict_seconds - Prediction latency
  • request_postprocess_seconds - Postprocessing latency
  • request_explain_seconds - Explanation latency
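
A quick way to wire these endpoints into external checks; a sketch probing a locally running server (the host and model name are placeholders):

import requests

base = "http://localhost:8080"
server_live = requests.get(f"{base}/v2/health/live").status_code == 200
server_ready = requests.get(f"{base}/v2/health/ready").status_code == 200
model_ready = requests.get(f"{base}/v2/models/my-model/ready").status_code == 200
metrics_text = requests.get(f"{base}/metrics").text   # Prometheus exposition format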

Command-Line Options

python model.py \
  --http_port 8080 \
  --grpc_port 8081 \
  --workers 4 \
  --max_threads 8 \
  --enable_grpc true \
  --enable_docs_url true \
  --log_config_file /path/to/config.yaml

Additional Resources

  • KServe Documentation
  • GitHub Repository
  • Examples Repository
  • Community Slack

Version Information

  • Package: kserve
  • Version: 0.16.0
  • Python: 3.8+
  • Kubernetes: 1.20+