KServe is a comprehensive Python SDK that provides standardized interfaces for building and deploying machine learning model serving infrastructure on Kubernetes.

Troubleshooting Guide

Common issues and solutions for KServe model servers.

Model Not Ready

Symptom

  • Health checks fail
  • Requests return 503 Service Unavailable
  • Model shows as not ready

Diagnosis

from kserve import Model, logger
import joblib

class DebuggableModel(Model):
    def __init__(self, name: str, model_path: str = "/mnt/models/model.pkl"):
        super().__init__(name)
        self.model_path = model_path
        self.model = None

    def load(self):
        """Load with detailed logging"""
        try:
            logger.info(f"Starting model load for {self.name}")
            logger.info(f"Model path: {self.model_path}")
            
            # Check file exists
            import os
            if not os.path.exists(self.model_path):
                logger.error(f"Model file not found: {self.model_path}")
                raise FileNotFoundError(self.model_path)
            
            # Check file size
            size_mb = os.path.getsize(self.model_path) / (1024 * 1024)
            logger.info(f"Model file size: {size_mb:.2f}MB")
            
            # Load model
            logger.info("Loading model...")
            self.model = joblib.load(self.model_path)
            logger.info("Model loaded successfully")
            
            # Test prediction
            logger.info("Testing model...")
            test_result = self.model.predict([[1, 2, 3, 4]])
            logger.info(f"Test prediction successful: {test_result}")
            
            self.ready = True
            logger.info(f"Model {self.name} is ready")
            
        except Exception as e:
            logger.error(f"Failed to load model {self.name}: {e}", exc_info=True)
            self.ready = False
            raise

Solutions

  1. Check logs:
# Kubernetes
kubectl logs <pod-name> -n <namespace>

# Docker
docker logs <container-id>
  2. Verify model file:
# Check file exists
ls -lh /mnt/models/model.pkl

# Check permissions
ls -la /mnt/models/

# Check disk space
df -h /mnt/models/
  3. Test model loading:
import joblib

try:
    model = joblib.load("/mnt/models/model.pkl")
    print("Model loaded successfully")
    print(f"Model type: {type(model)}")
    
    # Test prediction
    result = model.predict([[1, 2, 3, 4]])
    print(f"Test prediction: {result}")
except Exception as e:
    print(f"Failed to load: {e}")

Memory Issues

Symptom

  • Pod OOMKilled
  • Server crashes during inference
  • Slow performance

Diagnosis

from kserve import Model, logger
import joblib
import psutil

class MemoryMonitoredModel(Model):
    def load(self):
        """Load with memory monitoring"""
        # Check memory before load
        mem_before = psutil.virtual_memory()
        logger.info(f"Memory before load: {mem_before.percent}% used")
        
        self.model = joblib.load("/mnt/models/model.pkl")
        
        # Check memory after load
        mem_after = psutil.virtual_memory()
        logger.info(f"Memory after load: {mem_after.percent}% used")
        logger.info(f"Model memory: {mem_after.used - mem_before.used} bytes")
        
        self.ready = True
    
    def predict(self, payload, headers=None):
        """Predict with memory monitoring"""
        mem_before = psutil.virtual_memory()
        
        predictions = self.model.predict(payload["instances"])
        
        mem_after = psutil.virtual_memory()
        logger.debug(f"Prediction memory delta: {mem_after.used - mem_before.used} bytes")
        
        return {"predictions": predictions.tolist()}

Solutions

  1. Increase memory limits:
resources:
  limits:
    memory: "8Gi"  # Increase from 4Gi
  requests:
    memory: "4Gi"
  2. Enable garbage collection:
import gc
from kserve import Model

class GCModel(Model):
    def predict(self, payload, headers=None):
        try:
            result = self.model.predict(payload["instances"])
            return {"predictions": result.tolist()}
        finally:
            gc.collect()
  3. Use model quantization:
import torch

# Quantize model to reduce memory
model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8
)
  4. Implement lazy loading:
from kserve import Model
import joblib

class LazyModel(Model):
    def load(self):
        # Don't load the model yet; report ready so health checks pass
        self.model = None
        self.ready = True

    def predict(self, payload, headers=None):
        # Load on first use
        if self.model is None:
            self.model = joblib.load("/mnt/models/model.pkl")

        return {"predictions": self.model.predict(payload["instances"]).tolist()}

Connection Timeouts

Symptom

  • Requests timeout
  • "Connection timeout" errors
  • Slow response times

Diagnosis

from kserve import InferenceRESTClient, RESTConfig
import time

async def diagnose_timeout():
    """Diagnose connection timeout issues"""
    client = InferenceRESTClient()
    
    # Test with increasing timeouts
    for timeout in [5, 10, 30, 60]:
        try:
            start = time.time()
            response = await client.infer(
                base_url="http://localhost:8080",
                model_name="my-model",
                data={"instances": [[1, 2, 3, 4]]},
                timeout=timeout
            )
            elapsed = time.time() - start
            print(f"Success with {timeout}s timeout (took {elapsed:.2f}s)")
            break
        except Exception as e:
            print(f"Failed with {timeout}s timeout: {e}")
    
    await client.close()

Solutions

  1. Increase client timeout:
config = RESTConfig(
    protocol="v2",
    timeout=120,  # Increase to 2 minutes
    retries=1
)
client = InferenceRESTClient(config=config)
  2. Optimize model inference (a client-side batching sketch follows this list):
# Use smaller batch sizes
# Enable GPU acceleration
# Optimize preprocessing
  3. Check network connectivity:
# Test connection
curl -v http://localhost:8080/v2/health/live

# Check DNS resolution
nslookup model-service.default.svc.cluster.local

# Test latency
ping model-service.default.svc.cluster.local
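
For step 2, splitting one large request into several smaller ones keeps each call well under the timeout. A minimal sketch; send_batch is a hypothetical coroutine wrapping whatever client call you actually use:

async def infer_in_batches(instances, send_batch, batch_size=32):
    """Send a large payload as a series of small requests."""
    predictions = []
    for start in range(0, len(instances), batch_size):
        chunk = instances[start:start + batch_size]
        result = await send_batch({"instances": chunk})  # send_batch is illustrative
        predictions.extend(result["predictions"])
    return predictions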

gRPC Errors

Symptom

  • gRPC UNAVAILABLE errors
  • Connection refused
  • Channel closed errors

Solutions

from kserve import InferenceGRPCClient
import grpc
import asyncio

async def resilient_grpc_client():
    """Create resilient gRPC client"""
    # Configure channel options
    channel_args = [
        ('grpc.keepalive_time_ms', 30000),
        ('grpc.keepalive_timeout_ms', 10000),
        ('grpc.keepalive_permit_without_calls', True),
        ('grpc.http2.max_pings_without_data', 0),
        ('grpc.max_connection_idle_ms', 60000),
        ('grpc.max_connection_age_ms', 300000)
    ]
    
    client = InferenceGRPCClient(
        url="localhost:8081",
        channel_args=channel_args,
        timeout=60,
        retries=3
    )
    
    return client
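
If calls still fail intermittently with UNAVAILABLE, retrying with a short backoff often rides out transient channel resets. A generic sketch, assuming the failure surfaces as grpc.aio.AioRpcError; call_inference stands in for whatever client call you make:

import asyncio
import grpc

async def retry_on_unavailable(call_inference, attempts=3, backoff_s=1.0):
    """Retry an async gRPC call while the channel reports UNAVAILABLE."""
    for attempt in range(1, attempts + 1):
        try:
            return await call_inference()
        except grpc.aio.AioRpcError as e:
            if e.code() != grpc.StatusCode.UNAVAILABLE or attempt == attempts:
                raise
            await asyncio.sleep(backoff_s * attempt)  # linear backoff between attempts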

Storage Access Issues

Symptom

  • "Permission denied" errors
  • "File not found" errors
  • Slow model loading

Solutions

  1. Verify credentials:
from kserve import KServeClient
import os

# Check S3 credentials
client = KServeClient()
client.set_credentials(
    storage_type="S3",
    namespace="default",
    service_account="kserve-sa",
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
)
  2. Test storage access (a Python equivalent follows this list):
# Test S3 access
aws s3 ls s3://my-bucket/models/

# Test GCS access
gsutil ls gs://my-bucket/models/

# Test Azure access
az storage blob list --account-name myaccount --container-name models
  3. Check service account permissions:
# Kubernetes
kubectl get serviceaccount kserve-sa -n default -o yaml

# Check role bindings
kubectl get rolebindings -n default | grep kserve
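
A Python equivalent of the S3 CLI check in step 2, assuming boto3 is installed and credentials come from the environment; the bucket and prefix are placeholders:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")  # picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment

try:
    resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="models/", MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])
except ClientError as e:
    print(f"S3 access failed: {e}")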

Protocol Mismatch

Symptom

  • "Unsupported protocol" errors
  • Malformed request errors
  • 400 Bad Request

Solutions

from kserve import InferenceRESTClient, RESTConfig

# Ensure protocol matches server
# For v1 protocol
v1_config = RESTConfig(protocol="v1")
v1_client = InferenceRESTClient(config=v1_config)

# For v2 protocol
v2_config = RESTConfig(protocol="v2")
v2_client = InferenceRESTClient(config=v2_config)

# Check which protocol the server speaks (run inside an async function)
response = await v2_client.get_server_metadata(base_url="http://localhost:8080")
print(f"Server metadata: {response}")

Performance Issues

Symptom

  • Slow inference times
  • High latency
  • Low throughput

Diagnosis

import cProfile
import pstats
from kserve import Model

class ProfiledModel(Model):
    def predict(self, payload, headers=None):
        """Profile prediction performance"""
        profiler = cProfile.Profile()
        profiler.enable()
        
        result = self.model.predict(payload["instances"])
        
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(20)  # Top 20 functions
        
        return {"predictions": result.tolist()}

Solutions

  1. Enable caching:
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_predict(input_tuple):
    # input_tuple must be hashable (e.g. a tuple of features); model is the already-loaded estimator
    return model.predict([list(input_tuple)])
  2. Use batching:
# Increase batch size
ModelServer(max_batch_size=64).start([model])
  3. Enable GPU:
import torch

model = model.to('cuda')
  4. Optimize workers:
python model.py --workers 4 --max_threads 8

Kubernetes Issues

Pod Stuck in Pending

Check events:

kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

Common causes:

  • Insufficient resources
  • Image pull errors
  • Volume mount issues
  • Node selector mismatch

InferenceService Not Ready

Check status:

kubectl get inferenceservice sklearn-iris -n default -o yaml
kubectl describe inferenceservice sklearn-iris -n default

Check underlying resources:

kubectl get pods -n default -l serving.kserve.io/inferenceservice=sklearn-iris
kubectl get services -n default -l serving.kserve.io/inferenceservice=sklearn-iris
kubectl get virtualservices -n default

Debugging Tools

Enable Debug Logging

import os
os.environ["KSERVE_LOGLEVEL"] = "DEBUG"

from kserve import logger
logger.setLevel("DEBUG")

Enable FastAPI Docs

python model.py --enable_docs_url true
# Access at http://localhost:8080/docs

Check Metrics

# Get Prometheus metrics
curl http://localhost:8080/metrics

# Check specific metric
curl http://localhost:8080/metrics | grep request_predict_seconds
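
The same check from Python, assuming the metrics endpoint is reachable at the address used above:

import requests

metrics = requests.get("http://localhost:8080/metrics").text
for line in metrics.splitlines():
    if "request_predict_seconds" in line and not line.startswith("#"):
        print(line)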

Common Error Messages

"Model not found"

  • Cause: Model not registered or name mismatch
  • Solution: Check model name in request matches registered name

"Invalid input"

  • Cause: Input format doesn't match expected schema
  • Solution: Validate input shape, datatype, and structure

"Inference failed"

  • Cause: Model execution error
  • Solution: Check model compatibility, input ranges, dependencies

"Circuit breaker is open"

  • Cause: Too many consecutive failures
  • Solution: Wait for timeout, fix underlying issue, restart service

Next Steps

  • Design Patterns - Implement robust patterns
  • Error Recovery - Handle failures gracefully
  • Production Deployment - Deploy correctly