Architecture
Understanding KLearn's architecture and components
KLearn is built on Kubernetes-native principles with a modular, extensible design. This page explains the core architecture and how components interact.
High-Level Overview
┌─────────────────────────────────────────────────────────────────┐
│                           User Layer                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Dashboard  │  │  REST API   │  │    LLM Chat (Ollama)    │  │
│  │  (Next.js)  │  │  (FastAPI)  │  │       (LangChain)       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                        Backend Services                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ PostgreSQL  │  │    Redis    │  │         MLflow          │  │
│  │ (Metadata)  │  │   (Cache)   │  │  (Experiment Tracking)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Kubernetes Operator                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  KLearnJob  │  │ KLearnModel │  │   KServe Integration    │  │
│  │ Controller  │  │ Controller  │  │   (InferenceService)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Training & Serving                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │    FLAML    │  │   KServe    │  │       Gateway API       │  │
│  │  (AutoML)   │  │  (Serving)  │  │        (Routing)        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
Core Components
1. Frontend Dashboard
The frontend is a Next.js 14 application providing:
- Dataset Management: Upload, preview, and manage training data
- Experiment Tracking: Create, monitor, and analyze training jobs
- Model Registry: View trained models and their metrics
- Deployment Management: Deploy and manage model endpoints
- User Management: Role-based access control (RBAC)
2. Backend API
The backend is a FastAPI application that:
- Provides a REST API for all operations
- Manages database models (PostgreSQL)
- Calls the Kubernetes API to create and manage KLearn custom resources
- Handles authentication and authorization
- Proxies requests to MLflow and MinIO
3. Kubernetes Operator
Built with Kubebuilder, the operator manages:
KLearnJob Controller
Handles the lifecycle of training jobs:
- Creates training pods that run the FLAML container
- Monitors training progress
- Handles job completion and failure
- Creates a KLearnModel on success
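The lifecycle above can be read as a small state machine. The Python sketch below is illustrative only: the actual operator is written in Go with Kubebuilder, and the `JobState` type, the `next_step` helper, the action names, and every job phase other than `Succeeded` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobState:
    """Hypothetical, simplified view of a KLearnJob during reconciliation."""
    phase: str                       # assumed names: Pending, Running, Succeeded, Failed
    pod_phase: Optional[str] = None  # Kubernetes pod phase, or None if no pod exists yet
    model_created: bool = False      # whether a KLearnModel already exists for this job


def next_step(state: JobState) -> str:
    """Return the action one reconcile pass would take for the observed state."""
    if state.phase in ("Succeeded", "Failed"):
        # Terminal phases: only ensure a succeeded job has its model.
        if state.phase == "Succeeded" and not state.model_created:
            return "create-model"
        return "done"
    if state.pod_phase is None:
        return "create-training-pod"
    if state.pod_phase == "Succeeded":
        return "mark-succeeded"
    if state.pod_phase == "Failed":
        return "mark-failed"
    return "wait"  # pod is still Pending or Running: keep monitoring


print(next_step(JobState(phase="Pending")))
```

Because each pass derives its action only from the observed state, repeating a pass is harmless, which is the usual design goal for a Kubernetes controller.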
KLearnModel Controller
Manages trained model resources:
- Tracks model metadata and metrics
- Integrates with MLflow for artifact storage
- Enables deployment through KServe
4. FLAML Trainer
The training container uses Microsoft FLAML for:
- Automatic model selection: Tests multiple algorithms
- Hyperparameter optimization: Finds optimal configurations
- Feature engineering: Handles preprocessing automatically
- Time-budgeted training: Respects time constraints
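As a rough illustration of time-budgeted AutoML, the sketch below maps the `flamlConfig` fields used on this page onto keyword arguments of FLAML's `AutoML.fit`. The helper names `build_fit_kwargs` and `train` are hypothetical, and KLearn's real trainer may wire these options differently.

```python
from typing import Any, Dict


def build_fit_kwargs(task_type: str, flaml_config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate KLearnJob-style settings into AutoML.fit keyword arguments."""
    return {
        "task": task_type,                             # e.g. "classification"
        "time_budget": flaml_config["timeBudget"],     # search budget in seconds
        "metric": flaml_config.get("metric", "auto"),  # "auto" lets FLAML choose
    }


def train(dataframe, target_column: str, fit_kwargs: Dict[str, Any]):
    """Run FLAML on a pandas DataFrame; requires the `flaml` package."""
    from flaml import AutoML  # imported lazily so the helper above has no dependencies

    automl = AutoML()
    automl.fit(dataframe=dataframe, label=target_column, **fit_kwargs)
    # best_estimator is the name of the winning learner, best_loss its validation loss
    return automl.best_estimator, automl.best_loss


kwargs = build_fit_kwargs("classification", {"timeBudget": 3600, "metric": "accuracy"})
print(kwargs)
```

Only `build_fit_kwargs` runs here; `train` is shown to indicate where the budget and metric would take effect.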
5. Model Serving
KLearn supports two deployment methods:
KServe InferenceService
- Standard Kubernetes-native serving
- Autoscaling based on load
- Canary deployments
- Multi-model serving
KLearn Serving
- Lightweight FastAPI-based serving
- Direct MinIO integration
- Gateway API routing
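For the KServe path, a deployment ultimately comes down to an `InferenceService` object. The sketch below builds a minimal one as a Python dictionary. The `sklearn` model format, the replica bounds, and the helper name `inference_service` are assumptions for illustration, not values taken from KLearn.

```python
from typing import Any, Dict


def inference_service(name: str, storage_uri: str,
                      min_replicas: int = 1, max_replicas: int = 3) -> Dict[str, Any]:
    """Build a minimal KServe v1beta1 InferenceService manifest."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # Replica bounds used by KServe autoscaling (assumed values)
                "minReplicas": min_replicas,
                "maxReplicas": max_replicas,
                "model": {
                    "modelFormat": {"name": "sklearn"},  # assumed model format
                    "storageUri": storage_uri,
                },
            }
        },
    }


svc = inference_service("my-model", "s3://klearn/models/my-training-job/model.pkl")
print(svc["kind"], svc["spec"]["predictor"]["model"]["storageUri"])
```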
Custom Resources
KLearnJob
Represents an AutoML training job:
apiVersion: klearn.klearn.dev/v1alpha1
kind: KLearnJob
metadata:
  name: my-training-job
spec:
  dataSource:
    type: minio
    uri: s3://klearn/datasets/data.csv
  taskType: classification
  targetColumn: target
  flamlConfig:
    timeBudget: 3600
    metric: accuracy
status:
  phase: Succeeded
  bestModel:
    estimator: RandomForestClassifier
    score: 0.95
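To show how a backend might produce such an object, the following sketch assembles the `spec` portion of a KLearnJob from request parameters. `build_klearn_job` is a hypothetical helper, and the default values simply mirror the example on this page. Submitting the result would typically go through the Kubernetes Python client, as noted in the closing comment; the plural `klearnjobs` used there is an assumption.

```python
import json
from typing import Any, Dict

GROUP, VERSION = "klearn.klearn.dev", "v1alpha1"


def build_klearn_job(name: str, dataset_uri: str, target_column: str,
                     task_type: str = "classification",
                     time_budget: int = 3600,
                     metric: str = "accuracy") -> Dict[str, Any]:
    """Assemble a KLearnJob manifest; status is left for the operator to fill in."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "KLearnJob",
        "metadata": {"name": name},
        "spec": {
            "dataSource": {"type": "minio", "uri": dataset_uri},
            "taskType": task_type,
            "targetColumn": target_column,
            "flamlConfig": {"timeBudget": time_budget, "metric": metric},
        },
    }


job = build_klearn_job("my-training-job", "s3://klearn/datasets/data.csv", "target")
print(json.dumps(job, indent=2))

# With a cluster available, the dict could be submitted with something like:
#   kubernetes.client.CustomObjectsApi().create_namespaced_custom_object(
#       GROUP, VERSION, namespace, "klearnjobs", job)  # plural name is assumed
```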
KLearnModel
Represents a trained model:
apiVersion: klearn.klearn.dev/v1alpha1
kind: KLearnModel
metadata:
  name: my-model
spec:
  sourceJob: my-training-job
  modelUri: s3://klearn/models/my-training-job/model.pkl
  stage: production
status:
  phase: Registered
  metrics:
    accuracy: 0.95
    f1_score: 0.94
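The link between the two resources can be sketched as a function that derives a KLearnModel from a succeeded job. This is a guess at the shape of that step, not the operator's actual logic: the `s3://klearn/models/<job>/model.pkl` layout is inferred from the example above, and the `-model` naming scheme and the `model_from_job` helper are hypothetical.

```python
from typing import Any, Dict, Optional


def model_from_job(job: Dict[str, Any], stage: str) -> Optional[Dict[str, Any]]:
    """Return a KLearnModel manifest for a succeeded job, otherwise None."""
    if job.get("status", {}).get("phase") != "Succeeded":
        return None  # nothing to register until training has succeeded
    job_name = job["metadata"]["name"]
    return {
        "apiVersion": job["apiVersion"],
        "kind": "KLearnModel",
        "metadata": {"name": f"{job_name}-model"},  # naming scheme is assumed
        "spec": {
            "sourceJob": job_name,
            # Path layout inferred from the example manifest on this page
            "modelUri": f"s3://klearn/models/{job_name}/model.pkl",
            "stage": stage,
        },
    }


finished_job = {
    "apiVersion": "klearn.klearn.dev/v1alpha1",
    "kind": "KLearnJob",
    "metadata": {"name": "my-training-job"},
    "status": {"phase": "Succeeded"},
}
model = model_from_job(finished_job, stage="production")
print(model["spec"]["modelUri"])
```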
Data Flow
Training Flow
- User uploads dataset → Stored in MinIO
- User creates experiment → Backend creates KLearnJob
- Operator detects KLearnJob → Creates training pod
- FLAML trainer runs → Logs to MLflow, saves model to MinIO
- Training completes → Operator creates KLearnModel
- User deploys model → Backend creates InferenceService
Inference Flow
- Client sends request → Gateway API
- Gateway routes to model → Based on HTTPRoute
- Serving container loads model → From MinIO
- Model makes prediction → Returns response
- Response sent to client → Through Gateway
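The routing step relies on a Gateway API `HTTPRoute`. A minimal sketch of such a route, built as a Python dictionary, might look like the following. The gateway name, path prefix, service name, and port are all hypothetical, as is the `http_route` helper.

```python
from typing import Any, Dict


def http_route(model_name: str, gateway: str, service: str, port: int = 80) -> Dict[str, Any]:
    """Build an HTTPRoute that sends /models/<model_name> traffic to one Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"{model_name}-route"},
        "spec": {
            # The Gateway this route attaches to
            "parentRefs": [{"name": gateway}],
            "rules": [
                {
                    "matches": [
                        {"path": {"type": "PathPrefix", "value": f"/models/{model_name}"}}
                    ],
                    # The serving container's Service receives matching requests
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


route = http_route("my-model", gateway="klearn-gateway", service="my-model-serving")
print(route["spec"]["rules"][0]["matches"][0]["path"]["value"])
```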
Technology Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React, Tailwind CSS, shadcn/ui |
| Backend | FastAPI, SQLAlchemy, Pydantic |
| Database | PostgreSQL |
| Cache | Redis |
| Storage | MinIO (S3-compatible) |
| ML Tracking | MLflow |
| AutoML | FLAML |
| Operator | Kubebuilder (Go) |
| Serving | KServe, FastAPI |
| Routing | Kubernetes Gateway API |
| LLM | Ollama, LangChain |
Scalability
KLearn is designed to scale:
- Horizontal scaling: All components support replicas
- Stateless design: Backend and frontend are stateless
- Kubernetes-native: Leverages K8s scheduling and autoscaling
- Distributed training: Multi-node training is planned (roadmap) but not yet available
Security
- RBAC: Role-based access control for users
- Network policies: Isolate components in Kubernetes
- Secrets management: Kubernetes secrets for credentials
- TLS: HTTPS for all external traffic