
Overview

The AI Data Science Service implements a comprehensive MLOps architecture that ensures reproducibility, traceability, and production-readiness for machine learning models. This architecture bridges the gap between data science experimentation and production deployment.
MLOps Philosophy: “The difference between a notebook and a product is the engineering.” This architecture demonstrates how to structure data science projects following industry best practices, breaking down the barrier between exploratory analysis and production software.

Core Components

Experiment Tracking

MLflow for tracking metrics, parameters, and artifacts

Model Versioning

Systematic versioning of models and configurations

CI/CD Integration

Automated pipelines for testing and deployment

Reproducibility

Deterministic environments and data versioning

Experiment Tracking

MLflow Integration

The service uses MLflow to track all aspects of model training, enabling complete experiment reproducibility and comparison.
training/training.py
import os

import mlflow
import mlflow.pytorch
import yaml

def train(args):
    config_path = args.config
    config_name = os.path.splitext(os.path.basename(config_path))[0]

    # Load hyperparameters from the versioned YAML config
    with open(config_path, "r") as f:
        config = yaml.safe_load(f)

    # Configure MLflow experiment
    mlflow.set_experiment("Credit Score Training")
    
    with mlflow.start_run(run_name=config_name):
        # Log all hyperparameters
        mlflow.log_params(config)
        mlflow.log_param("config_file", config_name)
        
        # Training loop
        for epoch in range(config["epochs"]):
            # ... training code ...
            
            # Log metrics per epoch
            mlflow.log_metric("train_loss", epoch_loss, step=epoch)
            mlflow.log_metric("train_accuracy", epoch_acc, step=epoch)
        
        # Log evaluation metrics (computed on the held-out test set)
        mlflow.log_metric("test_accuracy", acc)
        mlflow.log_metric("test_roc_auc", roc_auc)
        mlflow.log_metric("test_precision", precision)
        mlflow.log_metric("test_recall", recall)
        mlflow.log_metric("test_f1_score", f1)
        
        # Log visualization artifacts. plt.gcf() returns the *current*
        # figure, so each plot is drawn immediately before its log call.
        mlflow.log_figure(plt.gcf(), "confusion_matrix.png")
        # ... plot ROC curve ...
        mlflow.log_figure(plt.gcf(), "roc_curve.png")
        # ... plot precision-recall curve ...
        mlflow.log_figure(plt.gcf(), "precision_recall_curve.png")
        
        # Log model artifacts
        mlflow.log_artifact(model_save_path)

Tracked Metrics

The architecture tracks metrics, parameters, and artifacts at different stages:

Per-epoch training metrics:
  • train_loss: Binary cross-entropy loss per epoch
  • train_accuracy: Training accuracy per epoch

Logged parameters:
  • batch_size: Number of samples per batch
  • learning_rate: Optimizer learning rate

Test-set evaluation metrics:
  • test_accuracy: Overall model accuracy on the test set
  • test_roc_auc: Area under the ROC curve
  • test_precision: Positive predictive value
  • test_recall: Sensitivity/true positive rate
  • test_f1_score: Harmonic mean of precision and recall

Visualization artifacts:
  • Confusion Matrix: Classification performance heatmap
  • ROC Curve: True vs. false positive rate visualization
  • Precision-Recall Curve: Trade-off visualization
  • Classification Report: Detailed per-class metrics
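As a sanity check on how the evaluation metrics relate, all of them can be derived from the four confusion-matrix counts. A minimal sketch with hypothetical counts (not taken from the project):

```python
# Hypothetical confusion-matrix counts, purely for illustration
tp, fp, fn, tn = 70, 10, 15, 105

precision = tp / (tp + fp)                          # positive predictive value
recall = tp / (tp + fn)                             # sensitivity / TPR
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```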

Accessing MLflow UI

Start the MLflow tracking server to visualize experiments:
# Start MLflow UI
uv run mlflow ui

# Access dashboard at http://127.0.0.1:5000
The MLflow UI provides real-time visualization of training metrics, model comparisons, and artifact browsing. All experiments are stored in the mlruns/ directory.

Model Versioning

Configuration-Based Versioning

Models are versioned through YAML configuration files, enabling systematic experimentation:
config/models-configs/model_config_001.yaml
hidden_layers:
  - 128
  - 64
  - 32
activation_functions:
  - relu
  - relu
  - relu
dropout_rate: 0.3
learning_rate: 0.0005
epochs: 150
batch_size: 64
Versioning Strategy:
  • model_config_000.yaml - Baseline configuration
  • model_config_001.yaml - Production configuration
  • model_config_002.yaml - Experimental variants

Model Artifacts

Each training run produces versioned artifacts:
# Model weights naming convention
weights_name = config_name.replace("model_config", "model_weights")
model_save_path = os.path.join(save_dir, f"{weights_name}.pth")

# Example: model_config_001.yaml → model_weights_001.pth
torch.save(model.state_dict(), model_save_path)
mlflow.log_artifact(model_save_path)
Artifact Structure:
model/
├── model_weights_000.pth  # Baseline model
├── model_weights_001.pth  # Production model
└── model_weights_002.pth  # Experimental variants

mlruns/
└── <experiment_id>/
    └── <run_id>/
        ├── artifacts/
        │   ├── model_weights_001.pth
        │   ├── confusion_matrix.png
        │   ├── roc_curve.png
        │   └── classification_report.txt
        ├── metrics/
        └── params/

CI/CD Integration

Training Pipeline

The architecture supports automated training pipelines:
# Command-line interface for automated execution
uv run training/training.py --config config/models-configs/model_config_001.yaml
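The argument-parsing entry point behind this command is not shown in full above; a minimal argparse sketch of what it likely looks like (the `--config` flag matches the command, everything else is an assumption):

```python
import argparse

def parse_args(argv=None):
    # Single required flag: path to the versioned model config YAML
    parser = argparse.ArgumentParser(description="Train the credit score model")
    parser.add_argument("--config", required=True,
                        help="Path to a model_config_*.yaml file")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"Training with config: {args.config}")
```

Because the config path is the only input, a CI job can launch any experiment variant just by swapping the file name.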

Docker-Based Deployment

Production deployment uses containerized environments:
docker-compose.yml
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/app/model/model_weights_001.pth
      - CONFIG_PATH=/app/config/models-configs/model_config_001.yaml
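Inside the container, the API can resolve these variables at startup. A sketch using `os.environ` with fallbacks (the default paths mirror the compose file; the fallback behavior itself is an assumption, not shown in the repository):

```python
import os

# Resolve artifact paths from the environment, falling back to the
# paths used in docker-compose.yml for local development.
MODEL_PATH = os.environ.get(
    "MODEL_PATH", "model/model_weights_001.pth")
CONFIG_PATH = os.environ.get(
    "CONFIG_PATH", "config/models-configs/model_config_001.yaml")

print(MODEL_PATH, CONFIG_PATH)
```

Keeping the paths in the environment means a new model version can be deployed by changing the compose file alone, with no code edits.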

Continuous Integration

  • Automated testing on code changes
  • Linting and type checking
  • Training smoke tests
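A minimal CI workflow implementing these checks might look like the following (GitHub Actions assumed; the tool choices — ruff, mypy, pytest — and paths are illustrative, not taken from the repository):

```yaml
name: ci
on: [push, pull_request]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync                 # install locked dependencies
      - run: uv run ruff check .     # linting
      - run: uv run mypy .           # type checking
      - run: uv run pytest tests/    # unit and training smoke tests
```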

Continuous Deployment

  • Containerized deployments
  • Blue-green deployment strategy
  • Automated rollback capabilities

Reproducibility

Environment Management

The service ensures deterministic environments:
pyproject.toml
[project]
name = "credit-score-ai"
requires-python = ">=3.10"
dependencies = [
    "torch>=2.0.0",
    "mlflow>=2.8.0",
    "fastapi>=0.100.0",
    "scikit-learn>=1.3.0",
]
Reproducibility Features:
  • UV Lock File: uv.lock ensures exact dependency versions
  • Python Version: .python-version pins Python runtime
  • Random Seeds: random_state=42 for consistent data splits
  • Docker Images: Immutable runtime environments
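Seeding every source of randomness is what makes two runs comparable. A minimal standard-library sketch of the idea (the real project passes random_state=42 to scikit-learn's splitter; numpy and torch generators would be seeded the same way):

```python
import random

SEED = 42

def sample_batch_order(n_batches: int, seed: int = SEED) -> list[int]:
    # A dedicated seeded Random instance gives the same
    # shuffle on every run, independent of global state.
    rng = random.Random(seed)
    order = list(range(n_batches))
    rng.shuffle(order)
    return order

# Two independent calls with the same seed produce identical orders
assert sample_batch_order(8) == sample_batch_order(8)
```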

Data Versioning

Integration with DVC (Data Version Control) for dataset versioning:
# Training references specific dataset versions
dataset_path = os.path.join(
    "datasets",
    "credit_score_dataset",
    "german_credit_risk_v1.0.0_training_23012026.csv"
)
Dataset versions are tracked via DVC, with .dvc files stored in Git and actual data in remote storage (S3, DagsHub, Azure Blob).
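The typical DVC flow for registering a new dataset version looks roughly like this (a DVC remote is assumed to be configured already):

```shell
# Track the dataset with DVC (writes a small .dvc pointer file)
dvc add datasets/credit_score_dataset/german_credit_risk_v1.0.0_training_23012026.csv

# Commit the pointer to Git; the data itself goes to the DVC remote
git add datasets/credit_score_dataset/*.dvc .gitignore
git commit -m "Track dataset v1.0.0 with DVC"
dvc push
```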

Inference Architecture

Singleton Predictor Pattern

The inference system uses a singleton pattern for efficient model loading:
inference/inference.py
import joblib
import torch
import yaml

class Predictor:
    """Singleton class for model inference."""
    _instance = None
    _initialized = False
    
    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
    
    def __init__(self, model_path: str, config_path: str, preprocessor_path: str):
        if Predictor._initialized:
            return
        
        # Load model configuration
        with open(config_path, "r") as f:
            model_config = yaml.safe_load(f)
        
        # Load the fitted preprocessing pipeline
        self.preprocessor = joblib.load(preprocessor_path)
        
        # Initialize model with config and load trained weights
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = CreditScoreModel(model_config)
        self.model.load_state_dict(torch.load(model_path, map_location=device))
        self.model.eval()
        
        Predictor._initialized = True

# Global singleton instance
predictor = Predictor(
    model_path=DEFAULT_MODEL_PATH,
    config_path=DEFAULT_CONFIG_PATH,
    preprocessor_path=DEFAULT_PREPROCESSOR_PATH
)
Benefits:
  • Model loaded once at startup
  • Reduced inference latency
  • Memory efficient for high-throughput scenarios
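The core of the pattern is easy to verify in isolation. A stripped-down sketch (no model loading) showing that repeated construction returns the same object and that initialization runs only once:

```python
class Singleton:
    """Minimal version of the Predictor's instantiation logic."""
    _instance = None
    _initialized = False

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, path: str = "model.pth"):
        if Singleton._initialized:
            return  # heavy loading happens only once
        self.path = path
        Singleton._initialized = True

a = Singleton("model_weights_001.pth")
b = Singleton("model_weights_002.pth")
assert a is b                              # same object
assert a.path == "model_weights_001.pth"   # second path is ignored
```

Note the second constructor argument is silently ignored; in a service this is fine because the paths come from fixed defaults, but it is worth knowing when testing.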

Best Practices

Experiment Organization

  • Use descriptive experiment names
  • Tag runs with metadata (dataset version, git commit)
  • Archive failed experiments for learning

Model Selection

  • Define success metrics upfront
  • Compare models systematically via MLflow
  • Document model selection rationale

Artifact Management

  • Log all training artifacts
  • Version control configurations
  • Maintain artifact lineage

Monitoring

  • Track inference latency
  • Monitor prediction distributions
  • Alert on model degradation
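The latency item on this checklist can be prototyped with the standard library alone. A sketch of a rolling latency tracker (class and method names are illustrative, not part of the project):

```python
from collections import deque

class LatencyTracker:
    """Keep a rolling window of inference latencies in milliseconds."""

    def __init__(self, window: int = 100):
        self.samples: deque[float] = deque(maxlen=window)

    def observe(self, seconds: float) -> None:
        self.samples.append(seconds * 1000.0)

    def p95(self) -> float:
        # 95th-percentile latency over the current window
        ordered = sorted(self.samples)
        idx = max(0, int(0.95 * len(ordered)) - 1)
        return ordered[idx]

tracker = LatencyTracker(window=5)
for latency in (0.010, 0.012, 0.011, 0.050, 0.013):
    tracker.observe(latency)
print(f"p95 latency: {tracker.p95():.1f} ms")
```

A percentile is more robust than a mean here: the single 50 ms outlier above barely moves the p95, but it would dominate an average.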

Next Steps

Project Structure

Explore the modular project organization

Data Versioning

Learn about DVC and data management
