First Experiment

Train your first model with KLearn step by step

This guide walks you through training your first machine learning model with KLearn, from uploading data to getting predictions.

Prerequisites

Before starting, ensure you have:

  • KLearn installed and running (Installation Guide)
  • Access to the KLearn dashboard at http://localhost:3000
  • A CSV dataset ready (we'll use the Iris dataset as an example)

Step 1: Prepare Your Data

For this tutorial, we'll use the classic Iris dataset. You can download the full dataset (150 rows) or create a small sample like this one:

sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica

Save this as iris.csv.
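Before uploading, it can help to confirm that the file parses cleanly. The following is a minimal sketch using only the Python standard library; the helper name check_csv is hypothetical and not part of KLearn. It checks that the target column exists, that every row has as many fields as the header, and that the feature columns are numeric.

```python
import csv
import io

SAMPLE_CSV = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""


def check_csv(text, target="species"):
    """Return (row_count, column_count); raise ValueError on malformed input."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if target not in header:
        raise ValueError(f"target column {target!r} not found in {header}")
    rows = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        # Every non-target column in this dataset should parse as a number;
        # float() raises ValueError for anything else, including empty cells.
        for name, value in zip(header, row):
            if name != target:
                float(value)
        rows += 1
    return rows, len(header)


print(check_csv(SAMPLE_CSV))  # (6, 5)
```

To write the sample to disk, `pathlib.Path("iris.csv").write_text(SAMPLE_CSV)` is enough.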

Step 2: Upload Dataset

Using the Dashboard

  1. Navigate to Datasets in the sidebar
  2. Click Upload Dataset
  3. Drag and drop your iris.csv file
  4. Enter a name: iris-classification
  5. Click Upload

Using the API

curl -X POST http://localhost:8000/api/v1/datasets \
  -F "file=@iris.csv" \
  -F "name=iris-classification"

After upload, you'll see:

  • Row count: 150 for the full Iris dataset (6 if you uploaded the sample above)
  • Column count: 5
  • Column types detected automatically
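The same upload can be sketched in Python with only the standard library. The URL and the form field names file and name come from the curl command above; the helper names encode_multipart and build_upload_request are hypothetical. The sketch builds the request without sending it, because the shape of the server's response is not documented here.

```python
import urllib.request
import uuid

UPLOAD_URL = "http://localhost:8000/api/v1/datasets"  # from the curl example


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: text/csv\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # The closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(csv_bytes, name, filename="iris.csv"):
    """Build (but do not send) the POST request for a dataset upload."""
    body, content_type = encode_multipart({"name": name}, "file", filename, csv_bytes)
    return urllib.request.Request(
        UPLOAD_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )


req = build_upload_request(b"sepal_length,species\n5.1,setosa\n", "iris-classification")
print(req.get_method(), req.full_url, len(req.data), "bytes")
```

Passing the request to `urllib.request.urlopen(req)` sends it. In practice you would read the bytes from iris.csv instead of using the inline example.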

Step 3: Create an Experiment

Using the Dashboard

  1. Navigate to Experiments in the sidebar
  2. Click New Experiment
  3. Fill in the form:
    • Name: iris-species-classifier
    • Dataset: Select iris-classification
    • Task Type: Classification
    • Target Column: species
    • Time Budget: 120 seconds (for quick testing)
  4. Click Start Training

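Using the API

Set dataset_id to the ID of the dataset you uploaded in Step 2; the value 1 below is only an example.
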
curl -X POST http://localhost:8000/api/v1/experiments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "iris-species-classifier",
    "dataset_id": 1,
    "task_type": "classification",
    "target_column": "species",
    "time_budget": 120
  }'
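If you script experiment creation, the request body can be built and validated before it is sent. This is a minimal sketch that mirrors the curl call above; the function name build_experiment_request is hypothetical, and the only validation shown (a positive time budget) is an assumption about what the API accepts.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # from the curl examples


def build_experiment_request(dataset_id, name="iris-species-classifier",
                             target_column="species", time_budget=120):
    """Build (but do not send) the POST request that starts an experiment."""
    if time_budget <= 0:
        raise ValueError("time_budget must be a positive number of seconds")
    payload = {
        "name": name,
        "dataset_id": dataset_id,
        "task_type": "classification",
        "target_column": target_column,
        "time_budget": time_budget,
    }
    return urllib.request.Request(
        f"{API_BASE}/experiments",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_experiment_request(dataset_id=1)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

`urllib.request.urlopen(req)` submits the request once the payload looks right.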

Step 4: Monitor Training

Once training starts, you can monitor progress:

Real-time Dashboard

The experiment detail page shows:

  • Progress bar: Time elapsed vs. budget
  • Current trial: Which model is being tested
  • Best score: Current best accuracy
  • Live logs: Real-time training output

Using kubectl

# Watch the KLearnJob
kubectl get klearnjob -n klearn -w

# Check pod logs (replace xxxxx with the suffix of your job's name)
kubectl logs -n klearn -l job-name=iris-species-classifier-xxxxx -f
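For scripted runs, you may prefer to wait for the experiment in code rather than watch it. The sketch below is a generic polling loop and makes several assumptions: this guide does not show a status endpoint, so the status lookup is passed in as a callable (fetch_status), and the terminal state names "completed" and "failed" are hypothetical placeholders to be replaced with whatever your installation reports.

```python
import time


def wait_for_experiment(fetch_status, timeout=300.0, interval=5.0,
                        terminal=("completed", "failed"),
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or time runs out.

    fetch_status is any zero-argument callable returning a status string.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.lower() in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"experiment still {status!r} after {timeout} s")
        sleep(interval)


# Demo with a fake status source so the loop runs without a cluster.
fake = iter(["Pending", "running", "completed"])
print(wait_for_experiment(lambda: next(fake), sleep=lambda _s: None))  # completed
```

Injecting sleep and clock keeps the loop easy to test; in real use, leave them at their defaults and choose a timeout comfortably above the experiment's time budget.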

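What FLAML does during training

FLAML is the AutoML library that KLearn uses for the model search. Within the time budget, it performs these steps: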

  1. Data preprocessing: Handles missing values, encodes categories
  2. Model selection: Tests RandomForest, XGBoost, LightGBM, etc.
  3. Hyperparameter tuning: Optimizes each model's parameters
  4. Cross-validation: Estimates how well each model generalizes to unseen data
  5. Best model selection: Picks the highest-scoring model
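The effect of the time budget on steps 2 through 5 can be illustrated with a toy loop. This is only a conceptual sketch, not FLAML's search strategy, which is considerably more sophisticated. The function name budgeted_search is hypothetical, and the scores are placeholder numbers, not measurements.

```python
import time


def budgeted_search(candidates, evaluate, time_budget, clock=time.monotonic):
    """Try candidates in order until the budget runs out.

    Returns (best_candidate, best_score, trials_run). At least one
    candidate is always evaluated, even with a zero budget.
    """
    start = clock()
    best, best_score, trials = None, float("-inf"), 0
    for candidate in candidates:
        if trials > 0 and clock() - start >= time_budget:
            break
        score = evaluate(candidate)
        trials += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, trials


# Placeholder scores standing in for cross-validated accuracy.
fake_scores = {"RandomForest": 0.95, "XGBoost": 0.96, "LightGBM": 0.97}
print(budgeted_search(fake_scores, fake_scores.get, time_budget=120))
# ('LightGBM', 0.97, 3)
```

A larger budget lets the loop reach more candidates, which is why increasing the time budget is the first thing to try when results are poor.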

Step 5: Review Results

When training completes (usually within 1-2 minutes for Iris, given the 120-second budget), the experiment page shows the results in three tabs:

Metrics Tab

  • Best Score: ~0.97 accuracy
  • Best Model: Usually RandomForest or LightGBM
  • Training Time: Actual time taken
  • Total Trials: Number of models tested

Configuration Tab

  • Task type and target column
  • Time budget used
  • K8s job name for debugging

Logs Tab

  • Complete training logs
  • FLAML output with trial details
  • Any warnings or errors

Step 6: Deploy the Model

Using the Dashboard

  1. From the experiment page, click Deploy Model. Alternatively, navigate to Models, find your model, and click its Deploy button.
  2. Choose a deployment type:
    • KLearn: Lightweight, good for testing
    • KServe: Production-grade with autoscaling
  3. Set the number of replicas (default: 2)
  4. Click Deploy

Using the API

# Get the model name
curl http://localhost:8000/api/v1/models

# Deploy it
curl -X POST http://localhost:8000/api/v1/models/iris-species-classifier-model/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_type": "klearn",
    "replicas": 1
  }'
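A scripted version of the deploy call might look like the sketch below. The URL pattern and the "klearn" value come from the curl example above. The value "kserve" is an assumption based on the dashboard option, and the function name build_deploy_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # from the curl examples
# "klearn" appears in the curl example; "kserve" is assumed from the dashboard.
DEPLOYMENT_TYPES = {"klearn", "kserve"}


def build_deploy_request(model_name, deployment_type="klearn", replicas=1):
    """Build (but do not send) the POST request that deploys a model."""
    if deployment_type not in DEPLOYMENT_TYPES:
        raise ValueError(f"unknown deployment type: {deployment_type!r}")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Percent-encode the model name so it is safe inside the URL path.
    quoted = urllib.parse.quote(model_name, safe="")
    return urllib.request.Request(
        f"{API_BASE}/models/{quoted}/deploy",
        data=json.dumps(
            {"deployment_type": deployment_type, "replicas": replicas}
        ).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_deploy_request("iris-species-classifier-model")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Checking the arguments locally gives a clearer error than a rejected request would.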

Step 7: Make Predictions

Once deployed, your model is ready for predictions:

Find the endpoint

# Check deployment status
curl http://localhost:8000/api/v1/deployments

# Get the endpoint URL
kubectl get httproute -n klearn

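Send a prediction request

The example below assumes the model endpoint is reachable at http://localhost:8080; substitute the endpoint you found above if it differs.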

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2
      }
    ]
  }'

Example response

{
  "predictions": ["setosa"],
  "probabilities": [[0.98, 0.01, 0.01]]
}
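Client code usually needs to build the instances payload from raw rows and then read the result back. The sketch below assumes the request and response shapes shown above. The helper names are hypothetical, and because the order of classes inside each probabilities row is not documented here, the code reports only the highest probability rather than mapping values to class names.

```python
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def build_predict_payload(rows):
    """Turn a list of feature dicts into the {"instances": [...]} body."""
    instances = []
    for row in rows:
        missing = [f for f in FEATURES if f not in row]
        if missing:
            raise ValueError(f"missing features: {missing}")
        instances.append({f: float(row[f]) for f in FEATURES})
    return {"instances": instances}


def summarize(response):
    """Pair each predicted label with its highest class probability."""
    predictions = response["predictions"]
    probabilities = response.get("probabilities") or [None] * len(predictions)
    return [
        (label, max(probs) if probs is not None else None)
        for label, probs in zip(predictions, probabilities)
    ]


payload = build_predict_payload(
    [{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}]
)
example_response = {"predictions": ["setosa"], "probabilities": [[0.98, 0.01, 0.01]]}
print(summarize(example_response))  # [('setosa', 0.98)]
```

Serialize the payload with `json.dumps(payload)` and send it with a Content-Type of application/json, as in the curl example.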

Troubleshooting

Training stuck at "Pending"

Check whether the trainer image can be pulled. The Events section at the end of the output lists image pull and scheduling errors:

kubectl describe pod -n klearn -l klearn.dev/job-name=iris-species-classifier

Low accuracy

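  • Increase the time budget so that more models and hyperparameters can be tried
  • Check for data quality issues such as missing values, mislabeled rows, or too few rows per class
  • Ensure the target column is correct (species in this tutorial)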
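A quick way to check the last two points is to count empty cells and rows per class before re-uploading. This is a minimal standard-library sketch; the function name quality_report is hypothetical, and the inline CSV is a deliberately broken example, not real data.

```python
import csv
import io
from collections import Counter


def quality_report(text, target="species"):
    """Count empty cells per column and rows per class in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if target not in (reader.fieldnames or []):
        raise ValueError(f"target column {target!r} not found")
    missing, classes = Counter(), Counter()
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # extra fields beyond the header
            if value is None or not value.strip():
                missing[column] += 1
        label = (row[target] or "").strip()
        if label:
            classes[label] += 1
    return {"missing": dict(missing), "classes": dict(classes)}


broken = "sepal_length,species\n5.1,setosa\n,setosa\n7.0,versicolor\n6.3,\n"
print(quality_report(broken))
# {'missing': {'sepal_length': 1, 'species': 1}, 'classes': {'setosa': 2, 'versicolor': 1}}
```

A class with very few rows, or a target column with empty cells, is a likely cause of a low score.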

Deployment not starting

Inspect the deployment's status and events, then check the serving pod logs:

kubectl describe deployment -n klearn iris-species-classifier-serving
kubectl logs -n klearn -l app=iris-species-classifier-serving

Next Steps

Congratulations! 🎉 You've trained and deployed your first model with KLearn.