
Installation Guide

Comprehensive instructions for setting up the malware classification platform on your system.

System Requirements

Minimum Requirements

Operating System

  • Linux (Ubuntu 20.04+)
  • macOS (11.0+)
  • Windows 10/11

Python

  • Python 3.10 or higher
  • pip or uv package manager

Memory

  • 8 GB RAM (minimum)
  • 16 GB RAM (recommended)

Storage

  • 5 GB for dependencies
  • Additional space for datasets and models

GPU (Optional)

Best for: Maximum training speed
  • NVIDIA GPU with CUDA support (compute capability 3.5+)
  • 6 GB+ VRAM (8 GB+ recommended)
  • CUDA Toolkit 11.8 or 12.1
  • cuDNN 8.x
Recommended GPUs:
  • RTX 3060 (12 GB)
  • RTX 3080 (10 GB)
  • RTX 4090 (24 GB)
  • Tesla T4, V100, A100 (cloud)

Installation Methods

Method 1: Using uv

uv is a fast Python package installer and resolver.

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone Repository

git clone <repository-url>
cd <repository-name>

3. Install Dependencies

# Installs all dependencies from pyproject.toml
uv sync

# With optional dev dependencies (Jupyter, notebooks)
uv sync --extra dev

# With deployment dependencies (Docker, Redis)
uv sync --extra deployment

4. Activate Environment

# On Unix/macOS
source .venv/bin/activate

# On Windows
.venv\Scripts\activate

Method 2: Using pip

1. Check Python Version

python --version  # Should be 3.10+
If you have multiple Python versions, you may need to use python3.10 or python3.11 explicitly.

2. Clone Repository

git clone <repository-url>
cd <repository-name>

3. Create Virtual Environment

python -m venv venv

# Activate on Unix/macOS
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate

4. Install Package

# Install in editable mode
pip install -e .

# Or install with extras
pip install -e ".[dev]"
pip install -e ".[deployment]"

GPU Setup

NVIDIA CUDA Setup

1. Install NVIDIA Drivers

# Check current driver
nvidia-smi

# Install latest driver
sudo ubuntu-drivers autoinstall
sudo reboot

2. Install CUDA Toolkit

PyTorch wheels bundle their own CUDA runtime libraries, so a system-wide toolkit is only needed for development work (for example, compiling custom CUDA extensions):
# Ubuntu 22.04 example
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1
See the official CUDA Installation Guide for other platforms.

3. Verify CUDA Installation

nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.1   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA RTX 3080    Off  | 00000000:01:00.0  On |                  N/A |
| 30%   45C    P8    25W / 320W |    512MiB / 10240MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

4. Test PyTorch CUDA

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
Expected:
CUDA available: True
CUDA version: 12.1
Device name: NVIDIA GeForce RTX 3080

Apple Silicon MPS Setup

1. Verify macOS Version

sw_vers
Ensure macOS 12.3+ (MPS support also requires PyTorch 1.12+).

2. Install Dependencies

# Use uv or pip as described above
uv sync
PyTorch will automatically detect and use the MPS backend on Apple Silicon.

3. Test MPS Support

import torch
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")

# Create tensor on MPS device
device = torch.device("mps")
x = torch.randn(3, 3, device=device)
print(x)

Dependency Overview

The project uses pyproject.toml for dependency management:

Core Dependencies

[project]
dependencies = [
    # PyTorch ecosystem
    "torch>=2.1.0",
    "torchvision>=0.16.0",
    "torch-geometric>=2.4.0",
    
    # Traditional ML
    "xgboost>=2.0.0",
    "lightgbm>=4.1.0",
    "scikit-learn>=1.4.0",
    
    # Transformers
    "transformers>=4.35.0",
    
    # Data processing
    "polars>=0.19.0",
    "pyarrow>=14.0.0",
    "pandas>=2.1.0",
    "numpy>=1.24.0",
]
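One way to confirm that the installed packages satisfy these minimum versions is importlib.metadata from the standard library. This is a sketch: the naive version parser below ignores pre-release suffixes, and MINIMUMS is a subset of the dependency list that you can extend.

```python
from importlib import metadata

# Minimum versions from pyproject.toml (a subset; extend as needed)
MINIMUMS = {
    "torch": "2.1.0",
    "scikit-learn": "1.4.0",
    "numpy": "1.24.0",
}

def parse(version: str) -> tuple[int, ...]:
    # Naive numeric parse; good enough for simple "X.Y.Z" strings
    return tuple(int(p) for p in version.split(".")[:3] if p.isdigit())

for pkg, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        print(f"missing: {pkg}")
        continue
    status = "ok" if parse(installed) >= parse(minimum) else "too old"
    print(f"{pkg} {installed} (need >= {minimum}): {status}")
```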

Dataset Setup

1. Create Dataset Directory

cd app
mkdir -p dataset/malware

2. Organize Your Data

Place malware images in class-based folders:
dataset/malware/
├── Adialer.C/
│   ├── 00ebc5bf03e5a8e52ad171b2ad9d7e51.png
│   ├── 010c3b03d2901a4edd76f52e06f34c20.png
│   └── ...
├── Agent.FYI/
│   └── ...
├── Allaple.A/
│   └── ...
└── ...
The app automatically scans this directory and detects classes. Folder names become class labels.

3. Verify Dataset

# Check dataset structure
ls -lh dataset/malware/

# Count images per class
for dir in dataset/malware/*/; do
    echo "$(basename "$dir"): $(ls "$dir" | wc -l) images"
done
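The same per-class count can be done portably in Python with pathlib. The `count_images_per_class` helper below is a sketch, assuming the dataset/malware layout shown above:

```python
from pathlib import Path

def count_images_per_class(root: Path) -> dict[str, int]:
    """Count files in each class subfolder of the dataset root."""
    if not root.is_dir():
        return {}
    return {
        class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }

# Run from the app/ directory so the relative path resolves
for name, n in count_images_per_class(Path("dataset/malware")).items():
    print(f"{name}: {n} images")
```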

Supported Image Formats

  • PNG (recommended for malware visualization)
  • JPEG/JPG
  • BMP
  • TIFF
Ensure all images in a class folder are valid. Corrupted images will cause dataset loading errors.
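A quick way to catch corrupted files ahead of time is Pillow's verify(), which parses image headers without decoding pixel data. The `find_corrupted` helper below is a sketch; it assumes Pillow is available (it ships as a torchvision dependency):

```python
from pathlib import Path

from PIL import Image, UnidentifiedImageError  # Pillow

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}

def find_corrupted(root: Path) -> list[Path]:
    """Return image files under root that Pillow cannot parse."""
    bad = []
    for p in sorted(root.rglob("*")):
        if p.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            with Image.open(p) as img:
                img.verify()  # Checks headers without full decode
        except (UnidentifiedImageError, OSError, SyntaxError):
            bad.append(p)
    return bad

# e.g. find_corrupted(Path("dataset/malware"))
```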

Verify Installation

1. Run Import Test

# test_imports.py
import torch
import torchvision
import streamlit
import plotly
import sklearn

print(f"✓ PyTorch {torch.__version__}")
print(f"✓ torchvision {torchvision.__version__}")
print(f"✓ Streamlit {streamlit.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
print(f"✓ MPS available: {torch.backends.mps.is_available()}")
Run:
python test_imports.py

2. Check GPU Detection

# test_gpu.py
from app.components.utils import get_compute_device, get_gpu_memory

device_info = get_compute_device()
print(f"Device Type: {device_info['type']}")
print(f"Device Name: {device_info['name']}")
print(f"Available: {device_info['available']}")
print(f"Memory: {get_gpu_memory()}")

3. Launch Dashboard

cd app
streamlit run main.py
Visit http://localhost:8501 and verify:
  • Dashboard loads without errors
  • Device info shows in header
  • Navigation pages are accessible

Configuration Files

Streamlit Configuration

The app includes a pre-configured .streamlit/config.toml:
[theme]
base = "dark"
primaryColor = "#98c127"
backgroundColor = "#0e1117"
secondaryBackgroundColor = "#262730"

[server]
headless = true
port = 8501
Customize as needed:
cd app/.streamlit
nano config.toml

Storage Directories

The app automatically creates these directories:
# From constants.py
STORAGE_ROOT = Path("storage")
SESSIONS_DIR = STORAGE_ROOT / "sessions"      # Saved sessions
MODELS_DIR = STORAGE_ROOT / "models"          # Trained models
RESULTS_DIR = STORAGE_ROOT / "results"        # Training results
CHECKPOINTS_DIR = STORAGE_ROOT / "checkpoints" # Model checkpoints
Locations (relative to app/):
app/storage/
├── sessions/       # Session state JSON files
├── models/         # Final trained models (.pth)
├── results/        # Metrics, charts, logs
└── checkpoints/    # Best model checkpoints
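If you ever need to pre-create this layout yourself (for example, in a container build step), a pathlib sketch mirroring the constants above would look like this; `ensure_storage_dirs` is a hypothetical helper, not part of the app:

```python
from pathlib import Path

def ensure_storage_dirs(root: Path) -> list[Path]:
    """Create the storage subdirectories if they don't already exist."""
    dirs = [root / sub for sub in ("sessions", "models", "results", "checkpoints")]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs

# e.g. ensure_storage_dirs(Path("app/storage"))
```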

Troubleshooting Installation

CUDA not detected by PyTorch

Cause: PyTorch not installed correctly.
Solution:
# Reinstall PyTorch
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
CUDA out of memory errors

Cause: GPU doesn't have enough VRAM.
Solutions:
  1. Reduce batch size in training config
  2. Use smaller model (EfficientNetB0 instead of ResNet101)
  3. Reduce image size (192x192 instead of 224x224)
  4. Close other GPU applications
Check memory usage:
nvidia-smi
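You can also inspect free VRAM and release PyTorch's cached allocations from Python without restarting (assumes a CUDA build of PyTorch; torch.cuda.mem_get_info needs a CUDA device):

```python
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"Free VRAM:  {free / 1024**3:.2f} GB")
    print(f"Total VRAM: {total / 1024**3:.2f} GB")
    torch.cuda.empty_cache()  # Return cached blocks to the driver
else:
    print("No CUDA device detected")
```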
Dashboard not loading

Causes & Solutions:
  1. Port already in use:
    streamlit run main.py --server.port 8502
    
  2. Browser cache:
    • Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
    • Clear browser cache
  3. Module import errors:
    streamlit run main.py --logger.level=debug
    
    Check terminal for error messages
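For the port conflict case, a quick stdlib probe tells you whether 8501 is already taken before you launch; `port_in_use` is a sketch helper:

```python
import socket

def port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

print(f"Port 8501 in use: {port_in_use(8501)}")
```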
Import or path errors on launch

Cause: Running from the wrong directory.
Solution:
# Must run from app/ directory
cd app
streamlit run main.py

# NOT from repo root:
# cd repo_root
# streamlit run app/main.py  # ❌ Wrong!
uv: command not found

Cause: uv not in PATH.
Solutions:
  1. Restart terminal after installation
  2. Add to PATH manually:
    # Add to ~/.bashrc or ~/.zshrc
    export PATH="$HOME/.cargo/bin:$PATH"
    source ~/.bashrc
    
  3. Use pip instead:
    pip install uv
    
MPS not available

Causes:
  • macOS < 12.3
  • PyTorch < 1.12
  • Intel Mac (MPS is Apple Silicon only)
Check:
import torch
print(torch.backends.mps.is_built())  # Should be True
print(torch.backends.mps.is_available())  # Should be True
Solution:
# Update PyTorch
pip install --upgrade torch torchvision

Optional: Jupyter Setup

For development and experimentation:
# Install dev extras
uv sync --extra dev

# Start Jupyter
jupyter notebook
Create a notebook to test components:
# test_notebook.ipynb
import sys
sys.path.append('../app')

from models.pytorch.cnn_builder import CustomCNNBuilder
from training.engine import TrainingEngine
import torch

# Test model building
config = {...}
builder = CustomCNNBuilder(config)
model = builder.build()
print(model)

Next Steps

Quick Start

Train your first model in 5 minutes

Architecture Guide

Understand the codebase structure

Training Guide

Best practices for model training

API Reference

Detailed API documentation

Getting Help

GitHub Issues

Report bugs or request features on the repository

Discussions

Ask questions and share experiences
Installation complete! You’re ready to start building malware classifiers.
