Architecture

KLearn is built on Kubernetes-native principles with a modular, extensible design. This page explains the core architecture and how components interact.

High-Level Overview

┌─────────────────────────────────────────────────────────────────┐
│                         User Layer                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Dashboard  │  │  REST API   │  │   LLM Chat (Ollama)     │  │
│  │  (Next.js)  │  │  (FastAPI)  │  │   (LangChain)           │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────┐
│                      Backend Services                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  PostgreSQL │  │    Redis    │  │        MLflow           │  │
│  │  (Metadata) │  │   (Cache)   │  │  (Experiment Tracking)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────┐
│                    Kubernetes Operator                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ KLearnJob   │  │ KLearnModel │  │  KServe Integration     │  │
│  │ Controller  │  │ Controller  │  │  (InferenceService)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────┐
│                      Training & Serving                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │    FLAML    │  │   KServe    │  │     Gateway API         │  │
│  │  (AutoML)   │  │  (Serving)  │  │    (Routing)            │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Frontend Dashboard

The frontend is a Next.js 14 application providing:

Dataset Management: Upload, preview, and manage training data
Experiment Tracking: Create, monitor, and analyze training jobs
Model Registry: View trained models and their metrics
Deployment Management: Deploy and manage model endpoints
User Management: Role-based access control (RBAC)

2. Backend API

The backend is a FastAPI application that:

Provides a REST API for all operations
Manages database models (PostgreSQL)
Interacts with the Kubernetes API to create and manage KLearn custom resources
Handles authentication and authorization
Proxies requests to MLflow and MinIO

3. Kubernetes Operator

Built with Kubebuilder, the operator runs the following controllers:

KLearnJob Controller

Handles the lifecycle of training jobs:

Creates training pods that run the FLAML container
Monitors training progress
Handles job completion and failure
Creates a KLearnModel on success
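
The lifecycle above can be pictured as a small state machine. The Python sketch below is illustrative only: the operator itself is written in Go with Kubebuilder, and the `next_phase` and `should_create_model` helpers, as well as every job phase other than `Succeeded` (the only one shown in the KLearnJob status example), are assumptions.

```python
from typing import Optional

# Pod phases as reported by Kubernetes in PodStatus.phase.
POD_PHASES = {"Pending", "Running", "Succeeded", "Failed", "Unknown"}
# Hypothetical terminal phases of a KLearnJob.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def next_phase(job_phase: str, pod_phase: Optional[str]) -> str:
    """Derive the next KLearnJob phase from the observed training pod phase."""
    if job_phase in TERMINAL_PHASES:
        return job_phase  # finished jobs never change phase again
    if pod_phase is None:
        return "Pending"  # the training pod has not been created yet
    if pod_phase not in POD_PHASES:
        raise ValueError(f"unexpected pod phase: {pod_phase!r}")
    if pod_phase == "Unknown":
        return job_phase  # keep the last known phase until the pod reports again
    return pod_phase  # Pending, Running, Succeeded and Failed map one-to-one


def should_create_model(old_phase: str, new_phase: str) -> bool:
    """A KLearnModel is created once, on the transition into Succeeded."""
    return old_phase != "Succeeded" and new_phase == "Succeeded"
```

Keeping the transition check separate from the phase mapping means repeated reconciles of an already succeeded job do not create a second KLearnModel.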

KLearnModel Controller

Manages trained model resources:

Tracks model metadata and metrics
Integrates with MLflow for artifact storage
Enables deployment through KServe

4. FLAML Trainer

The training container uses Microsoft FLAML for:

Automatic model selection: Evaluates multiple candidate algorithms
Hyperparameter optimization: Searches for well-performing configurations
Feature engineering: Handles preprocessing automatically
Time-budgeted training: Stops the search when the configured time budget is exhausted
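
As a rough illustration of how a trainer might hand these settings to FLAML, the sketch below maps the `flamlConfig` fields from the KLearnJob example onto keyword arguments of `flaml.AutoML.fit`. The helper names `build_fit_kwargs` and `train` are hypothetical, and treating `timeBudget` as seconds is an assumption based on FLAML's own `time_budget` parameter; none of this is taken from the KLearn trainer source.

```python
from typing import Any, Dict


def build_fit_kwargs(task_type: str, target_column: str,
                     flaml_config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate KLearnJob spec fields into AutoML.fit keyword arguments."""
    kwargs: Dict[str, Any] = {"task": task_type, "label": target_column}
    if "timeBudget" in flaml_config:
        # Assumed to be seconds, matching FLAML's time_budget.
        kwargs["time_budget"] = int(flaml_config["timeBudget"])
    if "metric" in flaml_config:
        kwargs["metric"] = flaml_config["metric"]
    return kwargs


def train(dataframe, task_type: str, target_column: str,
          flaml_config: Dict[str, Any]) -> Dict[str, Any]:
    """Run a FLAML search on a pandas DataFrame and summarize the best result."""
    from flaml import AutoML  # third-party dependency, imported lazily

    automl = AutoML()
    automl.fit(dataframe=dataframe,
               **build_fit_kwargs(task_type, target_column, flaml_config))
    return {
        "estimator": automl.best_estimator,  # FLAML's short name for the learner
        "loss": automl.best_loss,
        "config": automl.best_config,
    }
```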

5. Model Serving

KLearn supports two deployment methods:

KServe InferenceService

Standard Kubernetes-native serving
Autoscaling based on load
Canary deployments
Multi-model serving
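
For orientation, the sketch below builds a minimal KServe `InferenceService` manifest as a Python dictionary. The field names follow the KServe `serving.kserve.io/v1beta1` API, but the `build_inference_service` helper, the `sklearn` model format, and the replica defaults are assumptions; the manifest KLearn actually generates may differ.

```python
from typing import Any, Dict


def build_inference_service(name: str, storage_uri: str,
                            model_format: str = "sklearn",
                            min_replicas: int = 1,
                            max_replicas: int = 3) -> Dict[str, Any]:
    """Return a minimal InferenceService manifest for a stored model."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # Bounds within which KServe autoscales the predictor.
                "minReplicas": min_replicas,
                "maxReplicas": max_replicas,
                "model": {
                    "modelFormat": {"name": model_format},  # assumed format
                    "storageUri": storage_uri,
                },
            }
        },
    }
```

For example, `build_inference_service("my-model", "s3://klearn/models/my-training-job/model.pkl")` yields a manifest that points the predictor at the model URI used in the KLearnModel example.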

KLearn Serving

Lightweight FastAPI-based serving
Direct MinIO integration
Gateway API routing

Custom Resources

KLearnJob

Represents an AutoML training job:

apiVersion: klearn.klearn.dev/v1alpha1
kind: KLearnJob
metadata:
  name: my-training-job
spec:
  dataSource:
    type: minio
    uri: s3://klearn/datasets/data.csv
  taskType: classification
  targetColumn: target
  flamlConfig:
    timeBudget: 3600
    metric: accuracy
status:
  phase: Succeeded
  bestModel:
    estimator: RandomForestClassifier
    score: 0.95
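
A client such as the backend could build and submit this resource programmatically. The sketch below is a minimal example using the official Kubernetes Python client; the helper names, the `default` namespace, and the plural `klearnjobs` (the usual Kubebuilder default for this kind) are assumptions rather than details confirmed by this page. The `status` block is omitted because it is written by the operator, not by the client.

```python
from typing import Any, Dict


def build_klearn_job(name: str, dataset_uri: str, task_type: str,
                     target_column: str, time_budget: int,
                     metric: str) -> Dict[str, Any]:
    """Return a KLearnJob manifest mirroring the YAML example above."""
    return {
        "apiVersion": "klearn.klearn.dev/v1alpha1",
        "kind": "KLearnJob",
        "metadata": {"name": name},
        "spec": {
            "dataSource": {"type": "minio", "uri": dataset_uri},
            "taskType": task_type,
            "targetColumn": target_column,
            "flamlConfig": {"timeBudget": time_budget, "metric": metric},
        },
    }


def submit_klearn_job(manifest: Dict[str, Any], namespace: str = "default"):
    """Create the KLearnJob in the cluster (requires the kubernetes package)."""
    from kubernetes import client, config  # third-party, imported lazily

    config.load_incluster_config()  # assumes the caller runs inside the cluster
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group="klearn.klearn.dev",
        version="v1alpha1",
        namespace=namespace,
        plural="klearnjobs",  # assumed plural name of the CRD
        body=manifest,
    )
```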

KLearnModel

Represents a trained model:

apiVersion: klearn.klearn.dev/v1alpha1
kind: KLearnModel
metadata:
  name: my-model
spec:
  sourceJob: my-training-job
  modelUri: s3://klearn/models/my-training-job/model.pkl
  stage: production
status:
  phase: Registered
  metrics:
    accuracy: 0.95
    f1_score: 0.94
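
To show how the two resources relate, the sketch below derives a KLearnModel manifest and its metrics from a succeeded KLearnJob. It is a Python illustration under stated assumptions: the real logic lives in the Go operator, the helper names are hypothetical, and the `modelUri` layout is inferred from the single example above.

```python
from typing import Any, Dict


def build_klearn_model(job: Dict[str, Any], model_name: str,
                       stage: str) -> Dict[str, Any]:
    """Build a KLearnModel manifest that points back to its source job."""
    if job.get("status", {}).get("phase") != "Succeeded":
        raise ValueError("a KLearnModel is only created for a succeeded job")
    job_name = job["metadata"]["name"]
    return {
        "apiVersion": "klearn.klearn.dev/v1alpha1",
        "kind": "KLearnModel",
        "metadata": {"name": model_name},
        "spec": {
            "sourceJob": job_name,
            # Assumed layout, inferred from the example modelUri.
            "modelUri": f"s3://klearn/models/{job_name}/model.pkl",
            "stage": stage,
        },
    }


def metrics_from_job(job: Dict[str, Any]) -> Dict[str, float]:
    """Expose the job's best score under the metric it was optimized for."""
    metric = job["spec"]["flamlConfig"]["metric"]
    return {metric: job["status"]["bestModel"]["score"]}
```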

Data Flow

Training Flow

1. User uploads dataset → Stored in MinIO
2. User creates experiment → Backend creates KLearnJob
3. Operator detects KLearnJob → Creates training pod
4. FLAML trainer runs → Logs to MLflow, saves model to MinIO
5. Training completes → Operator creates KLearnModel
6. User deploys model → Backend creates InferenceService

Inference Flow

1. Client sends request → Gateway API
2. Gateway routes to model → Based on HTTPRoute
3. Serving container loads model → From MinIO
4. Model makes prediction → Returns response
5. Response sent to client → Through Gateway
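
From the client's side, the first step of this flow is an HTTP request to the gateway. The sketch below builds such a request with the Python standard library, assuming the model is served through KServe's V1 inference protocol (`POST /v1/models/<name>:predict` with an `instances` payload). The gateway URL and the `build_predict_request` helper are hypothetical, and the lightweight KLearn Serving option may expose a different path.

```python
import json
import urllib.request
from typing import Any, List


def build_predict_request(base_url: str, model_name: str,
                          instances: List[Any]) -> urllib.request.Request:
    """Build (but do not send) a KServe V1 predict request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical gateway address; replace with the host your HTTPRoute exposes.
request = build_predict_request("https://models.example.com", "my-model",
                                [[5.1, 3.5, 1.4, 0.2]])
# Sending it would look like:
#     with urllib.request.urlopen(request) as response:
#         predictions = json.load(response)["predictions"]
```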

Technology Stack

Layer	Technology
Frontend	Next.js 14, React, Tailwind CSS, shadcn/ui
Backend	FastAPI, SQLAlchemy, Pydantic
Database	PostgreSQL
Cache	Redis
Storage	MinIO (S3-compatible)
ML Tracking	MLflow
AutoML	FLAML
Operator	Kubebuilder (Go)
Serving	KServe, FastAPI
Routing	Kubernetes Gateway API
LLM	Ollama, LangChain

Scalability

KLearn is designed to scale:

Horizontal scaling: All components support replicas
Stateless design: Backend and frontend are stateless
Kubernetes-native: Leverages K8s scheduling and autoscaling
Distributed training: Multi-node training is planned (on the roadmap)

Security

RBAC: Role-based access control for users
Network policies: Isolate components in Kubernetes
Secrets management: Kubernetes secrets for credentials
TLS: HTTPS for all external traffic