
Installation Guide

Comprehensive instructions for setting up the malware classification platform on your system.

System Requirements

Minimum Requirements

Operating System

  • Linux (Ubuntu 20.04+)
  • macOS (11.0+)
  • Windows 10/11

Python

  • Python 3.10 or higher
  • pip or uv package manager

Memory

  • 8 GB RAM (minimum)
  • 16 GB RAM (recommended)

Storage

  • 5 GB for dependencies
  • Additional space for datasets and models

GPU (Optional)

Best for: Maximum training speed
  • NVIDIA GPU with CUDA support (compute capability 3.5+)
  • 6 GB+ VRAM (8 GB+ recommended)
  • CUDA Toolkit 11.8 or 12.1
  • cuDNN 8.x
Recommended GPUs:
  • RTX 3060 (12 GB)
  • RTX 3080 (10 GB)
  • RTX 4090 (24 GB)
  • Tesla T4, V100, A100 (cloud)

Installation Methods

Method 1: Using uv

uv is a fast Python package installer and resolver.

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone Repository

git clone <repository-url>
cd <repository-name>

3. Install Dependencies

# Installs all dependencies from pyproject.toml
uv sync

# With optional dev dependencies (Jupyter, notebooks)
uv sync --extra dev

# With deployment dependencies (Docker, Redis)
uv sync --extra deployment

4. Activate Environment

# On Unix/macOS
source .venv/bin/activate

# On Windows
.venv\Scripts\activate

Method 2: Using pip

1. Check Python Version

python --version  # Should be 3.10+
If you have multiple Python versions, you may need to use python3.10 or python3.11 explicitly.

2. Clone Repository

git clone <repository-url>
cd <repository-name>

3. Create Virtual Environment

python -m venv venv

# Activate on Unix/macOS
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate

4. Install Package

# Install in editable mode
pip install -e .

# Or install with extras
pip install -e ".[dev]"
pip install -e ".[deployment]"

GPU Setup

NVIDIA CUDA Setup

1. Install NVIDIA Drivers

# Check current driver
nvidia-smi

# Install latest driver
sudo ubuntu-drivers autoinstall
sudo reboot

2. Install CUDA Toolkit

PyTorch wheels bundle their own CUDA runtime libraries, so a system-wide toolkit is only needed for development work (for example, compiling custom CUDA extensions):
# Ubuntu 22.04 example
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1
See the official CUDA Installation Guide for other platforms.

3. Verify CUDA Installation

nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.1   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA RTX 3080    Off  | 00000000:01:00.0  On |                  N/A |
| 30%   45C    P8    25W / 320W |    512MiB / 10240MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

4. Test PyTorch CUDA

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
Expected:
CUDA available: True
CUDA version: 12.1
Device name: NVIDIA GeForce RTX 3080

Apple Silicon MPS Setup

1. Verify macOS Version

sw_vers
Ensure macOS 12.3+ (MPS support also requires PyTorch 1.12+).

2. Install Dependencies

# Use uv or pip as described above
uv sync
PyTorch will automatically detect and use the MPS backend on Apple Silicon.

3. Test MPS Support

import torch
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")

# Create tensor on MPS device
device = torch.device("mps")
x = torch.randn(3, 3, device=device)
print(x)

Dependency Overview

The project uses pyproject.toml for dependency management:

Core Dependencies

[project]
dependencies = [
    # PyTorch ecosystem
    "torch>=2.1.0",
    "torchvision>=0.16.0",
    "torch-geometric>=2.4.0",
    
    # Traditional ML
    "xgboost>=2.0.0",
    "lightgbm>=4.1.0",
    "scikit-learn>=1.4.0",
    
    # Transformers
    "transformers>=4.35.0",
    
    # Data processing
    "polars>=0.19.0",
    "pyarrow>=14.0.0",
    "pandas>=2.1.0",
    "numpy>=1.24.0",
]
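One way to confirm that the installed packages satisfy these minimum versions is importlib.metadata from the standard library. This is a sketch: the naive version parser below ignores pre-release suffixes, and MINIMUMS is a subset of the dependency list that you can extend.

```python
from importlib import metadata

# Minimum versions from pyproject.toml (a subset; extend as needed)
MINIMUMS = {
    "torch": "2.1.0",
    "scikit-learn": "1.4.0",
    "numpy": "1.24.0",
}

def parse(version: str) -> tuple[int, ...]:
    # Naive numeric parse; good enough for simple "X.Y.Z" strings
    return tuple(int(p) for p in version.split(".")[:3] if p.isdigit())

for pkg, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        print(f"missing: {pkg}")
        continue
    status = "ok" if parse(installed) >= parse(minimum) else "too old"
    print(f"{pkg} {installed} (need >= {minimum}): {status}")
```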

Dataset Setup

1. Create Dataset Directory

cd app
mkdir -p dataset/malware

2. Organize Your Data

Place malware images in class-based folders:
dataset/malware/
├── Adialer.C/
│   ├── 00ebc5bf03e5a8e52ad171b2ad9d7e51.png
│   ├── 010c3b03d2901a4edd76f52e06f34c20.png
│   └── ...
├── Agent.FYI/
│   └── ...
├── Allaple.A/
│   └── ...
└── ...
The app automatically scans this directory and detects classes. Folder names become class labels.

3. Verify Dataset

# Check dataset structure
ls -lh dataset/malware/

# Count images per class
for dir in dataset/malware/*/; do
    echo "$(basename "$dir"): $(ls "$dir" | wc -l) images"
done
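The same per-class count can be done portably in Python with pathlib. The `count_images_per_class` helper below is a sketch, assuming the dataset/malware layout shown above:

```python
from pathlib import Path

def count_images_per_class(root: Path) -> dict[str, int]:
    """Count files in each class subfolder of the dataset root."""
    if not root.is_dir():
        return {}
    return {
        class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }

# Run from the app/ directory so the relative path resolves
for name, n in count_images_per_class(Path("dataset/malware")).items():
    print(f"{name}: {n} images")
```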

Supported Image Formats

  • PNG (recommended for malware visualization)
  • JPEG/JPG
  • BMP
  • TIFF
Ensure all images in a class folder are valid. Corrupted images will cause dataset loading errors.
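A quick way to catch corrupted files ahead of time is Pillow's verify(), which parses image headers without decoding pixel data. The `find_corrupted` helper below is a sketch; it assumes Pillow is available (it ships as a torchvision dependency):

```python
from pathlib import Path

from PIL import Image, UnidentifiedImageError  # Pillow

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}

def find_corrupted(root: Path) -> list[Path]:
    """Return image files under root that Pillow cannot parse."""
    bad = []
    for p in sorted(root.rglob("*")):
        if p.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            with Image.open(p) as img:
                img.verify()  # Checks headers without full decode
        except (UnidentifiedImageError, OSError, SyntaxError):
            bad.append(p)
    return bad

# e.g. find_corrupted(Path("dataset/malware"))
```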

Verify Installation

1. Run Import Test

# test_imports.py
import torch
import torchvision
import streamlit
import plotly
import sklearn

print(f"✓ PyTorch {torch.__version__}")
print(f"✓ torchvision {torchvision.__version__}")
print(f"✓ Streamlit {streamlit.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
print(f"✓ MPS available: {torch.backends.mps.is_available()}")
Run:
python test_imports.py

2. Check GPU Detection

# test_gpu.py
from app.components.utils import get_compute_device, get_gpu_memory

device_info = get_compute_device()
print(f"Device Type: {device_info['type']}")
print(f"Device Name: {device_info['name']}")
print(f"Available: {device_info['available']}")
print(f"Memory: {get_gpu_memory()}")

3. Launch Dashboard

cd app
streamlit run main.py
Visit http://localhost:8501 and verify:
  • Dashboard loads without errors
  • Device info shows in header
  • Navigation pages are accessible

Configuration Files

Streamlit Configuration

The app includes a pre-configured .streamlit/config.toml:
[theme]
base = "dark"
primaryColor = "#98c127"
backgroundColor = "#0e1117"
secondaryBackgroundColor = "#262730"

[server]
headless = true
port = 8501
Customize as needed:
cd app/.streamlit
nano config.toml

Storage Directories

The app automatically creates these directories:
# From constants.py
STORAGE_ROOT = Path("storage")
SESSIONS_DIR = STORAGE_ROOT / "sessions"      # Saved sessions
MODELS_DIR = STORAGE_ROOT / "models"          # Trained models
RESULTS_DIR = STORAGE_ROOT / "results"        # Training results
CHECKPOINTS_DIR = STORAGE_ROOT / "checkpoints" # Model checkpoints
Locations (relative to app/):
app/storage/
├── sessions/       # Session state JSON files
├── models/         # Final trained models (.pth)
├── results/        # Metrics, charts, logs
└── checkpoints/    # Best model checkpoints
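If you ever need to pre-create this layout yourself (for example, in a container build step), a pathlib sketch mirroring the constants above would look like this; `ensure_storage_dirs` is a hypothetical helper, not part of the app:

```python
from pathlib import Path

def ensure_storage_dirs(root: Path) -> list[Path]:
    """Create the storage subdirectories if they don't already exist."""
    dirs = [root / sub for sub in ("sessions", "models", "results", "checkpoints")]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs

# e.g. ensure_storage_dirs(Path("app/storage"))
```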

Troubleshooting Installation

CUDA not detected by PyTorch

Cause: PyTorch not installed correctly.
Solution:
# Reinstall PyTorch
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
CUDA out of memory errors

Cause: GPU doesn't have enough VRAM.
Solutions:
  1. Reduce batch size in training config
  2. Use smaller model (EfficientNetB0 instead of ResNet101)
  3. Reduce image size (192x192 instead of 224x224)
  4. Close other GPU applications
Check memory usage:
nvidia-smi
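You can also inspect free VRAM and release PyTorch's cached allocations from Python without restarting (assumes a CUDA build of PyTorch; torch.cuda.mem_get_info needs a CUDA device):

```python
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"Free VRAM:  {free / 1024**3:.2f} GB")
    print(f"Total VRAM: {total / 1024**3:.2f} GB")
    torch.cuda.empty_cache()  # Return cached blocks to the driver
else:
    print("No CUDA device detected")
```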
Dashboard not loading

Causes & Solutions:
  1. Port already in use:
    streamlit run main.py --server.port 8502
    
  2. Browser cache:
    • Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
    • Clear browser cache
  3. Module import errors:
    streamlit run main.py --logger.level=debug
    
    Check terminal for error messages
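For the port conflict case, a quick stdlib probe tells you whether 8501 is already taken before you launch; `port_in_use` is a sketch helper:

```python
import socket

def port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

print(f"Port 8501 in use: {port_in_use(8501)}")
```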
Import or path errors on launch

Cause: Running from the wrong directory.
Solution:
# Must run from app/ directory
cd app
streamlit run main.py

# NOT from repo root:
# cd repo_root
# streamlit run app/main.py  # ❌ Wrong!
uv: command not found

Cause: uv not in PATH.
Solutions:
  1. Restart terminal after installation
  2. Add to PATH manually:
    # Add to ~/.bashrc or ~/.zshrc
    export PATH="$HOME/.cargo/bin:$PATH"
    source ~/.bashrc
    
  3. Use pip instead:
    pip install uv
    
MPS not available

Causes:
  • macOS < 12.3
  • PyTorch < 1.12
  • Intel Mac (MPS is Apple Silicon only)
Check:
import torch
print(torch.backends.mps.is_built())  # Should be True
print(torch.backends.mps.is_available())  # Should be True
Solution:
# Update PyTorch
pip install --upgrade torch torchvision

Optional: Jupyter Setup

For development and experimentation:
# Install dev extras
uv sync --extra dev

# Start Jupyter
jupyter notebook
Create a notebook to test components:
# test_notebook.ipynb
import sys
sys.path.append('../app')

from models.pytorch.cnn_builder import CustomCNNBuilder
from training.engine import TrainingEngine
import torch

# Test model building
config = {...}
builder = CustomCNNBuilder(config)
model = builder.build()
print(model)

Next Steps

Quick Start

Train your first model in 5 minutes

Architecture Guide

Understand the codebase structure

Training Guide

Best practices for model training

API Reference

Detailed API documentation

Getting Help

GitHub Issues

Report bugs or request features on the repository

Discussions

Ask questions and share experiences
Installation complete! You’re ready to start building malware classifiers.
