Deploying Models

Deploy trained models for real-time predictions

This guide covers the different ways to deploy trained models in KLearn for serving predictions.

Deployment Options

KLearn supports two deployment methods:

Method            Use Case               Features
KLearn Serving    Development, testing   Simple, lightweight, fast iteration
KServe            Production             Autoscaling, canary, GPU support

KLearn Serving

KLearn Serving is a lightweight FastAPI-based serving solution perfect for:

  • Local development and testing
  • Simple use cases
  • Quick deployment iterations

Deploy via Dashboard

  1. Navigate to Models
  2. Find your trained model
  3. Click Deploy
  4. Select KLearn as deployment type
  5. Set replicas (1-3 for development)
  6. Click Deploy

Deploy via API

curl -X POST http://localhost:8000/api/v1/models/{model_name}/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_type": "klearn",
    "replicas": 2
  }'
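
The same call can be scripted. Below is a minimal Python sketch (using the requests library) that triggers the deployment and then polls the deployments endpoint described later in this guide; the model name and the model_name/status fields in the poll loop are assumptions about the response shape, not documented fields.

# Minimal Python sketch of the deploy call above, followed by polling the
# deployments endpoint (see "Check Deployment Status" below).
# The model name and the "model_name"/"status" fields are assumptions.
import time
import requests

BASE = "http://localhost:8000/api/v1"
MODEL = "churn-model"  # replace with your trained model's name

# Trigger the deployment (same payload as the curl example above)
resp = requests.post(
    f"{BASE}/models/{MODEL}/deploy",
    json={"deployment_type": "klearn", "replicas": 2},
    timeout=30,
)
resp.raise_for_status()

# Poll until the deployment looks ready (field names are assumed)
for _ in range(30):  # wait up to ~5 minutes
    deployments = requests.get(f"{BASE}/deployments", timeout=10).json()
    mine = [d for d in deployments if d.get("model_name") == MODEL]
    if mine and mine[0].get("status") == "ready":
        print("Deployment is ready")
        break
    time.sleep(10)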

Architecture

Client → Gateway API → HTTPRoute → KLearn Serving Pod → MinIO (model)

The serving pod (a minimal sketch follows this list):

  1. Loads model from MinIO on startup
  2. Exposes /predict endpoint
  3. Handles JSON input/output
  4. Returns predictions and probabilities
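
For illustration, here is a minimal sketch of what such a serving pod can look like, assuming FastAPI with boto3 for MinIO access; the bucket name, object key, and environment variables are placeholders, not KLearn's actual implementation.

# Illustrative sketch of a KLearn-style serving pod: a FastAPI app that loads
# a pickled model from MinIO (via its S3 API) at startup and serves /predict.
# Bucket, object key, and env var names are assumptions for illustration only.
import os
import pickle

import boto3
import pandas as pd
from fastapi import FastAPI

app = FastAPI()
model = None

@app.on_event("startup")
def load_model():
    global model
    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ.get("MINIO_ENDPOINT", "http://minio:9000"),
        aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
        aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
    )
    obj = s3.get_object(Bucket="models", Key=os.environ["MODEL_KEY"])
    model = pickle.loads(obj["Body"].read())

@app.post("/predict")
def predict(payload: dict):
    frame = pd.DataFrame(payload["instances"])
    return {
        "predictions": model.predict(frame).tolist(),
        "probabilities": model.predict_proba(frame).tolist(),
    }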

KServe

KServe is a Kubernetes-native model serving platform that provides:

  • Autoscaling: Scale to zero, scale based on load
  • Canary deployments: Gradual rollout of new versions
  • Multi-model serving: Serve multiple models efficiently
  • GPU support: For deep learning models
  • Request batching: Improved throughput

Prerequisites

KServe must be installed on your cluster. KLearn's Helm chart can install it:

# values.yaml
kserve:
  enabled: true

Deploy via Dashboard

  1. Navigate to Models
  2. Click Deploy on your model
  3. Select KServe as deployment type
  4. Configure options:
    • Replicas: min/max for autoscaling
    • Runtime: flaml (for FLAML models)
  5. Click Deploy

Deploy via API

curl -X POST http://localhost:8000/api/v1/models/{model_name}/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_type": "kserve",
    "replicas": 2,
    "runtime": "flaml",
    "autoscaling": {
      "min_replicas": 1,
      "max_replicas": 10,
      "target_utilization": 70
    }
  }'

Custom Runtime

KLearn provides a custom FLAML runtime for KServe:

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: flaml-runtime
spec:
  containers:
  - name: kserve-container
    image: localhost:5001/klearn/flaml-runtime:latest
    env:
    - name: PROTOCOL
      value: v2

Making Predictions

Request Format

curl -X POST http://{endpoint}/predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {"feature1": 1.0, "feature2": "value"},
      {"feature1": 2.0, "feature2": "other"}
    ]
  }'

Response Format

{
  "predictions": [0, 1],
  "probabilities": [
    [0.85, 0.15],
    [0.20, 0.80]
  ],
  "model_name": "churn-model",
  "model_version": "v1"
}
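
A simple Python client for these request and response formats might look like the following; the endpoint URL is a placeholder for your deployment's address.

# Send two instances to /predict and pair each input with its prediction.
import requests

ENDPOINT = "http://localhost:8080"  # replace with your deployment's endpoint

payload = {
    "instances": [
        {"feature1": 1.0, "feature2": "value"},
        {"feature1": 2.0, "feature2": "other"},
    ]
}

resp = requests.post(f"{ENDPOINT}/predict", json=payload, timeout=30)
resp.raise_for_status()
result = resp.json()

for instance, pred, probs in zip(
    payload["instances"], result["predictions"], result["probabilities"]
):
    print(instance, "->", pred, "class probabilities:", probs)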

Batch Predictions

For large batches, use the batch endpoint:

curl -X POST http://{endpoint}/predict/batch \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      // Up to 1000 instances
    ]
  }'
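
For datasets larger than a single request allows, a client can split the work into chunks of up to 1000 instances. A sketch, again with the endpoint URL as a placeholder:

# Score a large list of instances by calling /predict/batch in chunks of
# up to 1000 instances (the limit noted above).
import requests

ENDPOINT = "http://localhost:8080"  # replace with your deployment's endpoint
BATCH_SIZE = 1000

def predict_in_batches(instances):
    predictions = []
    for start in range(0, len(instances), BATCH_SIZE):
        chunk = instances[start:start + BATCH_SIZE]
        resp = requests.post(
            f"{ENDPOINT}/predict/batch",
            json={"instances": chunk},
            timeout=120,
        )
        resp.raise_for_status()
        predictions.extend(resp.json()["predictions"])
    return predictions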

Monitoring Deployments

Check Deployment Status

# Via API
curl http://localhost:8000/api/v1/deployments

# Via kubectl
kubectl get deployment -n klearn -l klearn.dev/model-name={model}
kubectl get inferenceservice -n klearn

View Logs

# KLearn serving logs
kubectl logs -n klearn -l app={model}-serving

# KServe logs
kubectl logs -n klearn -l serving.kserve.io/inferenceservice={model-name}

Metrics

KLearn exposes Prometheus metrics (a query example follows this list):

  • klearn_predictions_total: Total predictions made
  • klearn_prediction_latency_seconds: Prediction latency histogram
  • klearn_model_load_time_seconds: Model loading time
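
These metrics can be read from Prometheus, for example via its HTTP query API. A sketch, assuming Prometheus is reachable at the URL below and scrapes the serving pods:

# Query Prometheus for request rate and p95 prediction latency.
# The Prometheus URL is an assumption about your monitoring setup.
import requests

PROMETHEUS = "http://localhost:9090"  # adjust to your Prometheus service

def query(expr):
    resp = requests.get(
        f"{PROMETHEUS}/api/v1/query", params={"query": expr}, timeout=10
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Predictions per second over the last 5 minutes
print(query("rate(klearn_predictions_total[5m])"))

# 95th percentile prediction latency over the last 5 minutes
print(query(
    "histogram_quantile(0.95, rate(klearn_prediction_latency_seconds_bucket[5m]))"
))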

Scaling

Manual Scaling

# Via API
curl -X PATCH http://localhost:8000/api/v1/deployments/{name}/scale \
  -H "Content-Type: application/json" \
  -d '{"replicas": 5}'

# Via kubectl
kubectl scale deployment {name}-serving -n klearn --replicas=5

Autoscaling (KServe only)

Configure in the deployment:

{
  "autoscaling": {
    "min_replicas": 1,
    "max_replicas": 10,
    "target_utilization": 70,
    "scale_down_delay": "5m"
  }
}

Updating Deployments

Rolling Update

When you deploy a new model version:

  1. New pods are created with the new model
  2. Traffic gradually shifts to new pods
  3. Old pods are terminated

# Redeploy with new model
curl -X POST http://localhost:8000/api/v1/models/{new-model}/deploy \
  -H "Content-Type: application/json" \
  -d '{"deployment_type": "klearn", "replicas": 2}'

Canary Deployment (KServe)

Gradually shift traffic:

{
  "canary": {
    "traffic_percent": 10,
    "model_name": "new-model-version"
  }
}
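
A staged rollout can be scripted by raising traffic_percent in steps and pausing to watch metrics between steps. The sketch below assumes the canary block is accepted by the same deploy endpoint shown earlier; your setup may require additional deploy fields.

# Staged canary rollout sketch: increase traffic_percent step by step.
# Endpoint and payload shape beyond the documented canary block are assumptions.
import time
import requests

BASE = "http://localhost:8000/api/v1"
MODEL = "churn-model"         # currently deployed model
CANARY = "new-model-version"  # candidate model to roll out

for percent in (10, 25, 50, 100):
    resp = requests.post(
        f"{BASE}/models/{MODEL}/deploy",
        json={"canary": {"traffic_percent": percent, "model_name": CANARY}},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Canary receiving {percent}% of traffic")
    time.sleep(600)  # observe error rates and latency before the next step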

Undeploying

Via Dashboard

  1. Navigate to Deployments
  2. Find your deployment
  3. Click Delete and confirm in the dialog

Via API

curl -X DELETE http://localhost:8000/api/v1/deployments/{name}

Troubleshooting

Pod not starting

kubectl describe pod -n klearn -l app={model}-serving

Common issues:

  • ImagePullBackOff: Image not in registry
  • CrashLoopBackOff: Check logs for errors
  • Pending: Not enough resources

Model not loading

Check serving logs:

kubectl logs -n klearn -l app={model}-serving

Common issues:

  • MinIO connection: Check credentials
  • Model file missing: Verify path in MinIO
  • Incompatible model: Check model format

High latency

  • Increase replicas
  • Check resource limits
  • Enable request batching
  • Consider GPU for large models
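
Before scaling up, a quick client-side timing check can confirm whether the latency originates at the model. A minimal sketch with a placeholder endpoint and sample payload:

# Time a series of /predict calls to get a rough latency baseline.
import time
import statistics
import requests

ENDPOINT = "http://localhost:8080"  # replace with your deployment's endpoint
sample = {"instances": [{"feature1": 1.0, "feature2": "value"}]}

timings = []
for _ in range(20):
    start = time.perf_counter()
    requests.post(f"{ENDPOINT}/predict", json=sample, timeout=30).raise_for_status()
    timings.append(time.perf_counter() - start)

print(f"median {statistics.median(timings) * 1000:.1f} ms, "
      f"max {max(timings) * 1000:.1f} ms")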