Architecture

Understanding KLearn's architecture and components

KLearn is built on Kubernetes-native principles with a modular, extensible design. This page explains the core architecture and how components interact.

High-Level Overview

┌─────────────────────────────────────────────────────────────────┐
│                         User Layer                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Dashboard  │  │  REST API   │  │   LLM Chat (Ollama)     │  │
│  │  (Next.js)  │  │  (FastAPI)  │  │   (LangChain)           │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────┐
│                      Backend Services                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  PostgreSQL │  │    Redis    │  │        MLflow           │  │
│  │  (Metadata) │  │   (Cache)   │  │  (Experiment Tracking)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────┐
│                    Kubernetes Operator                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ KLearnJob   │  │ KLearnModel │  │  KServe Integration     │  │
│  │ Controller  │  │ Controller  │  │  (InferenceService)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────┐
│                      Training & Serving                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │    FLAML    │  │   KServe    │  │     Gateway API         │  │
│  │  (AutoML)   │  │  (Serving)  │  │    (Routing)            │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Frontend Dashboard

The frontend is a Next.js 14 application providing:

  • Dataset Management: Upload, preview, and manage training data
  • Experiment Tracking: Create, monitor, and analyze training jobs
  • Model Registry: View trained models and their metrics
  • Deployment Management: Deploy and manage model endpoints
  • User Management: Role-based access control (RBAC)

2. Backend API

The backend is a FastAPI application that:

  • Provides a REST API for all operations
  • Manages the database models (PostgreSQL)
  • Interacts with the Kubernetes API for CRD operations
  • Handles authentication and authorization
  • Proxies requests to MLflow and MinIO

3. Kubernetes Operator

Built with Kubebuilder, the operator runs the following controllers:

KLearnJob Controller

Handles the lifecycle of training jobs:

  1. Creates training pods that run the FLAML container
  2. Monitors training progress
  3. Handles job completion and failure
  4. Creates a KLearnModel on success
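
The lifecycle above can be read as a small state machine. The sketch below is written in Python for illustration only; the actual operator is built with Kubebuilder in Go, and its reconcile logic is not shown on this page. The `JobState` type, the `reconcile` function, and every phase name other than `Succeeded` (which appears in the KLearnJob status example) are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Phases after which the job is never touched again (names partly assumed).
TERMINAL_PHASES = {"Succeeded", "Failed"}


@dataclass(frozen=True)
class JobState:
    """Hypothetical, simplified view of a KLearnJob's status."""
    phase: str = "Pending"
    pod_created: bool = False
    model_created: bool = False


def reconcile(state: JobState, pod_phase: Optional[str]) -> JobState:
    """Advance the job by one step given the observed training pod phase.

    pod_phase is a Kubernetes pod phase ("Pending", "Running",
    "Succeeded", "Failed") or None when no pod exists yet.
    """
    if state.phase in TERMINAL_PHASES:
        return state  # nothing left to do
    if not state.pod_created:
        # Step 1: create the training pod.
        return replace(state, pod_created=True)
    if pod_phase == "Running":
        # Step 2: training is in progress.
        return replace(state, phase="Running")
    if pod_phase == "Failed":
        # Step 3: record the failure.
        return replace(state, phase="Failed")
    if pod_phase == "Succeeded":
        # Steps 3 and 4: record completion and create the KLearnModel.
        return replace(state, phase="Succeeded", model_created=True)
    return state  # pod still pending; wait for the next event
```

Each call handles a single observation, which mirrors how a Kubernetes controller is invoked repeatedly until the resource reaches a terminal phase.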

KLearnModel Controller

Manages trained model resources:

  1. Tracks model metadata and metrics
  2. Integrates with MLflow for artifact storage
  3. Enables deployment through KServe

4. FLAML Trainer

The training container uses Microsoft FLAML for:

  • Automatic model selection: Tests multiple algorithms
  • Hyperparameter optimization: Finds optimal configurations
  • Feature engineering: Handles preprocessing automatically
  • Time-budgeted training: Respects time constraints
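
As a rough illustration of how these options might reach FLAML, the sketch below maps the fields of a KLearnJob spec onto keyword arguments for `flaml.AutoML.fit`. The function names and the fallback values are assumptions; the trainer's real code is not shown on this page.

```python
from typing import Any, Dict, Tuple


def flaml_fit_settings(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a KLearnJob spec into keyword arguments for AutoML.fit."""
    cfg = spec.get("flamlConfig", {})
    return {
        "task": spec["taskType"],                 # e.g. "classification"
        "metric": cfg.get("metric", "auto"),      # "auto" lets FLAML choose
        "time_budget": cfg.get("timeBudget", 60), # seconds; 60 is an assumed fallback
    }


def train(spec: Dict[str, Any], X_train: Any, y_train: Any) -> Tuple[str, dict, Any]:
    """Run a time-budgeted AutoML search and return the best result."""
    from flaml import AutoML  # third-party dependency, imported lazily

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **flaml_fit_settings(spec))
    # Name of the winning learner, its hyperparameters, and the fitted model.
    return automl.best_estimator, automl.best_config, automl.model
```

Keeping the translation in its own function makes it possible to check the mapping without installing FLAML or running a search.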

5. Model Serving

KLearn supports two deployment methods:

KServe InferenceService

  • Standard Kubernetes-native serving
  • Autoscaling based on load
  • Canary deployments
  • Multi-model serving
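
For orientation, here is a hedged sketch of an InferenceService manifest that uses the autoscaling and canary fields of the KServe `v1beta1` API. The helper name, the replica bounds, and the choice of the `sklearn` model format are assumptions; whether a FLAML-produced pickle loads in that runtime depends on the estimator and is not confirmed by this page.

```python
from typing import Any, Dict, Optional


def inference_service(
    name: str,
    storage_uri: str,
    canary_percent: Optional[int] = None,
    min_replicas: int = 1,
    max_replicas: int = 3,
) -> Dict[str, Any]:
    """Build a KServe InferenceService manifest as a plain dict."""
    predictor: Dict[str, Any] = {
        # Bounds within which KServe autoscales the predictor.
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "model": {
            "modelFormat": {"name": "sklearn"},  # assumed model format
            "storageUri": storage_uri,
        },
    }
    if canary_percent is not None:
        # Share of traffic sent to the newest revision during a canary rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }
```

Omitting `canary_percent` leaves the field out entirely, so the service rolls out without a traffic split.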

KLearn Serving

  • Lightweight FastAPI-based serving
  • Direct MinIO integration
  • Gateway API routing
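
Gateway API routing is expressed through HTTPRoute resources. The sketch below builds one as a plain dict; the `/models/<name>` path layout, the `-route` name suffix, the port, and the helper name are hypothetical and only illustrate the shape of the resource.

```python
from typing import Any, Dict


def http_route(model_name: str, gateway_name: str, service_name: str,
               port: int = 8080) -> Dict[str, Any]:
    """Build an HTTPRoute that forwards one path prefix to a model Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"{model_name}-route"},
        "spec": {
            # The Gateway this route attaches to.
            "parentRefs": [{"name": gateway_name}],
            "rules": [
                {
                    # Match requests whose path starts with /models/<model_name>.
                    "matches": [
                        {"path": {"type": "PathPrefix",
                                  "value": f"/models/{model_name}"}}
                    ],
                    # Forward matching requests to the serving Service.
                    "backendRefs": [{"name": service_name, "port": port}],
                }
            ],
        },
    }
```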

Custom Resources

KLearnJob

Represents an AutoML training job:

apiVersion: klearn.klearn.dev/v1alpha1
kind: KLearnJob
metadata:
  name: my-training-job
spec:
  dataSource:
    type: minio
    uri: s3://klearn/datasets/data.csv
  taskType: classification
  targetColumn: target
  flamlConfig:
    timeBudget: 3600
    metric: accuracy
status:
  phase: Succeeded
  bestModel:
    estimator: RandomForestClassifier
    score: 0.95
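
A backend could assemble this resource programmatically and submit it through the Kubernetes API. The following is a minimal sketch, not KLearn's actual backend code: the helper names, the `klearnjobs` plural, and the `default` namespace are assumptions, while `create_namespaced_custom_object` is the standard call in the official Kubernetes Python client. Only `spec` is sent, because `status` is written by the operator.

```python
from typing import Any, Dict

GROUP = "klearn.klearn.dev"
VERSION = "v1alpha1"
PLURAL = "klearnjobs"  # assumed plural name of the CRD


def klearn_job(name: str, dataset_uri: str, task_type: str, target_column: str,
               time_budget: int = 3600, metric: str = "accuracy") -> Dict[str, Any]:
    """Build a KLearnJob manifest matching the example above."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "KLearnJob",
        "metadata": {"name": name},
        "spec": {
            "dataSource": {"type": "minio", "uri": dataset_uri},
            "taskType": task_type,
            "targetColumn": target_column,
            "flamlConfig": {"timeBudget": time_budget, "metric": metric},
        },
    }


def submit(manifest: Dict[str, Any], namespace: str = "default") -> Dict[str, Any]:
    """Create the custom resource in the cluster (requires the kubernetes package)."""
    from kubernetes import client, config  # third-party, imported lazily

    config.load_incluster_config()  # assumes the caller runs inside the cluster
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group=GROUP, version=VERSION, namespace=namespace,
        plural=PLURAL, body=manifest,
    )
```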

KLearnModel

Represents a trained model:

apiVersion: klearn.klearn.dev/v1alpha1
kind: KLearnModel
metadata:
  name: my-model
spec:
  sourceJob: my-training-job
  modelUri: s3://klearn/models/my-training-job/model.pkl
  stage: production
status:
  phase: Registered
  metrics:
    accuracy: 0.95
    f1_score: 0.94
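
The two examples suggest how a KLearnModel could be derived from a finished KLearnJob. The sketch below assumes that the `s3://klearn/models/<job-name>/model.pkl` layout seen in the example generalizes and that the model is named after the job with a `-model` suffix; neither convention is confirmed by this page, and the helper name is hypothetical.

```python
from typing import Any, Dict


def model_from_job(job: Dict[str, Any], stage: str) -> Dict[str, Any]:
    """Build a KLearnModel manifest that points back at its source job."""
    job_name = job["metadata"]["name"]
    return {
        "apiVersion": job["apiVersion"],  # same API group and version as the job
        "kind": "KLearnModel",
        "metadata": {"name": f"{job_name}-model"},  # assumed naming convention
        "spec": {
            "sourceJob": job_name,
            # Assumed artifact layout, copied from the example above.
            "modelUri": f"s3://klearn/models/{job_name}/model.pkl",
            "stage": stage,
        },
    }
```

Metrics are left out of the manifest because they belong to `status`, which the controller fills in after registration.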

Data Flow

Training Flow

  1. User uploads dataset → Stored in MinIO
  2. User creates experiment → Backend creates KLearnJob
  3. Operator detects KLearnJob → Creates training pod
  4. FLAML trainer runs → Logs to MLflow, saves model to MinIO
  5. Training completes → Operator creates KLearnModel
  6. User deploys model → Backend creates InferenceService

Inference Flow

  1. Client sends request → Gateway API
  2. Gateway routes to model → Based on HTTPRoute
  3. Serving container loads model → From MinIO
  4. Model makes prediction → Returns response
  5. Response sent to client → Through Gateway
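
Step 3 can be sketched as a small cache that downloads a model once and reuses it for later requests. This is an illustration under stated assumptions: `ModelCache` and `parse_s3_uri` are hypothetical names, and the `fetch` callable stands in for whatever MinIO client the serving container actually uses.

```python
import pickle
from typing import Any, Callable, Dict, Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


class ModelCache:
    """Load each model from object storage once and keep it in memory."""

    def __init__(self, fetch: Callable[[str, str], bytes]) -> None:
        self._fetch = fetch  # fetch(bucket, key) -> raw bytes of the object
        self._models: Dict[str, Any] = {}

    def get(self, uri: str) -> Any:
        if uri not in self._models:
            bucket, key = parse_s3_uri(uri)
            # Unpickling runs arbitrary code, so only load trusted artifacts.
            self._models[uri] = pickle.loads(self._fetch(bucket, key))
        return self._models[uri]
```

Injecting `fetch` keeps the cache independent of any particular storage client and makes it easy to exercise with in-memory data.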

Technology Stack

Layer         Technology
Frontend      Next.js 14, React, Tailwind CSS, shadcn/ui
Backend       FastAPI, SQLAlchemy, Pydantic
Database      PostgreSQL
Cache         Redis
Storage       MinIO (S3-compatible)
ML Tracking   MLflow
AutoML        FLAML
Operator      Kubebuilder (Go)
Serving       KServe, FastAPI
Routing       Kubernetes Gateway API
LLM           Ollama, LangChain

Scalability

KLearn is designed to scale:

  • Horizontal scaling: All components support replicas
  • Stateless design: Backend and frontend are stateless
  • Kubernetes-native: Leverages K8s scheduling and autoscaling
  • Distributed training: Multi-node training is on the roadmap and not yet available

Security

  • RBAC: Role-based access control for users
  • Network policies: Isolate components in Kubernetes
  • Secrets management: Kubernetes secrets for credentials
  • TLS: HTTPS for all external traffic