Installation Guide
Comprehensive instructions for setting up the malware classification platform on your system.
System Requirements
Minimum Requirements
Operating System
Linux (Ubuntu 20.04+)
macOS (11.0+)
Windows 10/11
Python
Python 3.10 or higher
pip or uv package manager
Memory
8 GB RAM (minimum)
16 GB RAM (recommended)
Storage
5 GB for dependencies
Additional space for datasets and models
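The storage requirement can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name `has_free_space` is hypothetical and the 5 GB default simply mirrors the figure above.

```python
import shutil


def has_free_space(path: str = ".", required_gb: float = 5.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    status = "OK" if has_free_space() else "insufficient"
    print(f"Free disk space for dependencies: {status}")
```

Datasets and models need additional room, so a larger `required_gb` value is probably sensible in practice.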
Recommended Hardware
NVIDIA GPU
Best for: Maximum training speed
NVIDIA GPU with CUDA support (compute capability 3.5+)
6 GB+ VRAM (8 GB+ recommended)
CUDA Toolkit 11.8 or 12.1
cuDNN 8.x
Recommended GPUs:
RTX 3060 (12 GB)
RTX 3080 (10 GB)
RTX 4090 (24 GB)
Tesla T4, V100, A100 (cloud)
Apple Silicon
Best for: MacBook Pro/Air users
Apple M1, M2, or M3 chip
16 GB+ unified memory recommended
macOS 12.3+ (Monterey or later)
PyTorch with MPS backend support
Approximate performance:
M1 Pro/Max: Similar to GTX 1660 Ti
M2 Pro/Max: Similar to RTX 3060
M3 Max: Similar to RTX 4070
CPU Only
Best for: Testing and small datasets
Multi-core CPU (6+ cores recommended)
16 GB+ RAM
SSD storage
Training will be 10-50x slower than on a GPU. Suitable for experimentation but not for production training.
Installation Methods
Method 1: Using uv (Recommended)
uv is a fast Python package installer and resolver.
Install uv
macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
Alternative (pip)
pip install uv
Clone Repository
git clone <repository-url>
cd <repository-name>
Install Dependencies
# Installs all dependencies from pyproject.toml
uv sync
# With optional dev dependencies (Jupyter, notebooks)
uv sync --extra dev
# With deployment dependencies (Docker, Redis)
uv sync --extra deployment
Activate Environment
# On Unix/macOS
source .venv/bin/activate
# On Windows
.venv\Scripts\activate
Method 2: Using pip
Check Python Version
python --version # Should be 3.10+
If you have multiple Python versions, you may need to use python3.10 or python3.11 explicitly.
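When several interpreters are installed, the same check can be run from inside the interpreter you plan to use. This is a small sketch; `meets_minimum` and `MIN_PYTHON` are illustrative names, not part of the project.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Compare the (major, minor) pair against the required minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {found}: {'OK' if meets_minimum() else 'too old'}")
```

Saving this under a name of your choice and running it with, for example, `python3.11` confirms which interpreter that command actually resolves to.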
Clone Repository
git clone <repository-url>
cd <repository-name>
Create Virtual Environment
python -m venv venv
# Activate on Unix/macOS
source venv/bin/activate
# Activate on Windows
venv\Scripts\activate
Install Package
# Install in editable mode
pip install -e .
# Or install with extras
pip install -e ".[dev]"
pip install -e ".[deployment]"
GPU Setup
NVIDIA CUDA Setup
Install NVIDIA Drivers
# Check current driver
nvidia-smi
# Install latest driver
sudo ubuntu-drivers autoinstall
sudo reboot
Install CUDA Toolkit
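PyTorch wheels bundle the CUDA runtime libraries they need, so the full toolkit is only required for development work (for example, compiling custom CUDA extensions):
# Ubuntu 22.04 example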
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1
See CUDA Installation Guide
Verify CUDA Installation
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA RTX 3080     Off  | 00000000:01:00.0  On |                  N/A |
| 30%   45C    P8    25W / 320W |    512MiB / 10240MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
Test PyTorch CUDA
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
Expected:
CUDA available: True
CUDA version: 12.1
Device name: NVIDIA GeForce RTX 3080
Apple Silicon MPS Setup
Verify macOS Version
Ensure you are running macOS 12.3 or later. MPS support also requires PyTorch 1.12+.
Install Dependencies
# Use uv or pip as described above
uv sync
PyTorch detects the MPS backend on Apple Silicon automatically; no separate GPU toolkit is required.
Test MPS Support
import torch
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")
# Create tensor on MPS device
device = torch.device("mps")
x = torch.randn(3, 3, device=device)
print(x)
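A common way to combine the CUDA and MPS checks is to prefer CUDA, then MPS, then fall back to the CPU. The sketch below illustrates that ordering; `pick_device` is a hypothetical helper and is not the project's own device-detection code, which this guide does not show.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; defaulting to cpu")
    else:
        name = pick_device(
            torch.cuda.is_available(),
            torch.backends.mps.is_available(),
        )
        print(f"Selected device: {torch.device(name)}")
```

Keeping the decision in a pure function makes the fallback order easy to test on machines without a GPU.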
Dependency Overview
The project uses pyproject.toml for dependency management:
Core Dependencies
ML & Deep Learning
[project]
dependencies = [
    # PyTorch ecosystem
    "torch>=2.1.0",
    "torchvision>=0.16.0",
    "torch-geometric>=2.4.0",
    # Traditional ML
    "xgboost>=2.0.0",
    "lightgbm>=4.1.0",
    "scikit-learn>=1.4.0",
    # Transformers
    "transformers>=4.35.0",
    # Data processing
    "polars>=0.19.0",
    "pyarrow>=14.0.0",
    "pandas>=2.1.0",
    "numpy>=1.24.0",
]
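After installation, the installed versions can be compared against these minimums. The following is a rough sketch using `importlib.metadata` from the standard library; the `MINIMUMS` table copies only three of the entries above, and the version comparison is deliberately naive (the `packaging` library handles pre-releases and other edge cases more robustly).

```python
import re
from importlib import metadata

# Minimum versions copied from a few of the entries listed above.
MINIMUMS = {"torch": "2.1.0", "scikit-learn": "1.4.0", "numpy": "1.24.0"}


def version_tuple(version: str) -> tuple:
    """Turn '2.1.0+cu121' into (2, 1, 0), ignoring any non-numeric suffix."""
    release = re.match(r"\d+(\.\d+)*", version)
    return tuple(int(part) for part in release.group().split(".")) if release else ()


def check(package: str, minimum: str) -> str:
    """Return 'missing', 'too old', or 'ok' for an installed distribution."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if version_tuple(installed) >= version_tuple(minimum) else "too old"


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        print(f"{name}>={minimum}: {check(name, minimum)}")
```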
Dataset Setup
Create Dataset Directory
cd app
mkdir -p dataset/malware
Organize Your Data
Place malware images in class-based folders:
dataset/malware/
├── Adialer.C/
│   ├── 00ebc5bf03e5a8e52ad171b2ad9d7e51.png
│   ├── 010c3b03d2901a4edd76f52e06f34c20.png
│   └── ...
├── Agent.FYI/
│   └── ...
├── Allaple.A/
│   └── ...
└── ...
The app automatically scans this directory and detects classes. Folder names become class labels.
Verify Dataset
# Check dataset structure
ls -lh dataset/malware/
# Count images per class
for dir in dataset/malware/*/; do
    echo "$(basename "$dir"): $(ls "$dir" | wc -l) images"
done
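On Windows, or when only image files should be counted, the same per-class count can be done in Python. This is a sketch under assumptions: `count_images` is a hypothetical helper, and the extension set is inferred from the image formats this guide lists as supported.

```python
from pathlib import Path

# Extensions matching the image formats this guide lists as supported.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def count_images(root: str) -> dict:
    """Map each class folder under `root` to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


if __name__ == "__main__":
    root = Path("dataset/malware")
    if root.is_dir():
        for label, n in count_images(str(root)).items():
            print(f"{label}: {n} images")
```

Unlike the `ls | wc -l` loop, this ignores stray non-image files, so the two counts can differ.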
Supported image formats:
PNG (recommended for malware visualization)
JPEG/JPG
BMP
TIFF
Ensure all images in a class folder are valid. Corrupted images will cause dataset loading errors.
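A quick pre-flight check can catch some bad files before training starts. The sketch below compares each file's leading bytes with the well-known signature for its extension; `find_suspect_images` is a hypothetical helper, and a header check is only a partial test, since a truncated file with a valid header still passes. Fully decoding each image with an imaging library would be more thorough.

```python
from pathlib import Path

# Leading bytes ("magic numbers") for the formats listed above.
SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".bmp": (b"BM",),
    ".tif": (b"II*\x00", b"MM\x00*"),
    ".tiff": (b"II*\x00", b"MM\x00*"),
}


def find_suspect_images(root: str) -> list:
    """Return image paths whose first bytes do not match their extension."""
    suspects = []
    for path in sorted(Path(root).rglob("*")):
        expected = SIGNATURES.get(path.suffix.lower())
        if expected is None or not path.is_file():
            continue  # skip directories and files with other extensions
        with path.open("rb") as handle:
            header = handle.read(8)
        if not header.startswith(expected):
            suspects.append(path)
    return suspects
```

Calling `find_suspect_images("dataset/malware")` returns an empty list when every header looks plausible.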
Verify Installation
Run Import Test
# test_imports.py
import torch
import torchvision
import streamlit
import plotly
import sklearn
print(f"✓ PyTorch {torch.__version__}")
print(f"✓ torchvision {torchvision.__version__}")
print(f"✓ Streamlit {streamlit.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
print(f"✓ MPS available: {torch.backends.mps.is_available()}")
Run:
python test_imports.py
Check GPU Detection
# test_gpu.py
from app.components.utils import get_compute_device, get_gpu_memory
device_info = get_compute_device()
print(f"Device Type: {device_info['type']}")
print(f"Device Name: {device_info['name']}")
print(f"Available: {device_info['available']}")
print(f"Memory: {get_gpu_memory()}")
Launch Dashboard
cd app
streamlit run main.py
Visit http://localhost:8501 and verify:
Dashboard loads without errors
Device info shows in header
Navigation pages are accessible
Configuration Files
Streamlit Configuration
The app includes a pre-configured .streamlit/config.toml:
[theme]
base = "dark"
primaryColor = "#98c127"
backgroundColor = "#0e1117"
secondaryBackgroundColor = "#262730"
[server]
headless = true
port = 8501
Customize as needed:
cd app/.streamlit
nano config.toml
Storage Directories
The app automatically creates these directories:
# From constants.py
STORAGE_ROOT = Path("storage")
SESSIONS_DIR = STORAGE_ROOT / "sessions"        # Saved sessions
MODELS_DIR = STORAGE_ROOT / "models"            # Trained models
RESULTS_DIR = STORAGE_ROOT / "results"          # Training results
CHECKPOINTS_DIR = STORAGE_ROOT / "checkpoints"  # Model checkpoints
Locations (relative to app/):
app/storage/
├── sessions/ # Session state JSON files
├── models/ # Final trained models (.pth)
├── results/ # Metrics, charts, logs
└── checkpoints/ # Best model checkpoints
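The guide does not show how the app creates these directories. One plausible approach, sketched here with the hypothetical helper `ensure_storage`, is an idempotent `mkdir` over the four subdirectory names from the layout above.

```python
from pathlib import Path

SUBDIRS = ("sessions", "models", "results", "checkpoints")


def ensure_storage(root: Path) -> list:
    """Create the storage tree under `root` if missing and return the paths."""
    paths = [root / name for name in SUBDIRS]
    for path in paths:
        # parents=True also creates `root`; exist_ok=True makes reruns harmless
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Running `ensure_storage(Path("storage"))` from `app/` would reproduce the tree shown above, which can be useful when preparing a deployment volume ahead of the first launch.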
Troubleshooting Installation
ImportError: No module named 'torch'
Cause: PyTorch not installed correctly
Solution:
# Reinstall PyTorch
pip uninstall torch torchvision
# CUDA 12.1 build; on macOS or CPU-only machines, omit --index-url
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
CUDA out of memory
Cause: GPU doesn’t have enough VRAM
Solutions:
Reduce batch size in training config
Use smaller model (EfficientNetB0 instead of ResNet101)
Reduce image size (192x192 instead of 224x224)
Close other GPU applications
Check memory usage:
nvidia-smi
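To get a feel for how much the image-size reduction helps: activation memory in convolutional layers scales roughly with the number of input pixels times the batch size, while weights and optimizer state are unaffected. The arithmetic below is a back-of-the-envelope sketch, not a measurement; `relative_input_size` is an illustrative helper.

```python
def relative_input_size(new_side: int, old_side: int = 224) -> float:
    """Ratio of pixels per image after resizing a square input."""
    return (new_side * new_side) / (old_side * old_side)


ratio = relative_input_size(192)
print(f"192x192 has {ratio:.1%} of the pixels of 224x224")
```

So dropping from 224x224 to 192x192 removes roughly a quarter of the per-image activation footprint, a similar order of saving to cutting the batch size by a quarter.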
Streamlit shows blank page
Causes & Solutions:
Port already in use:
streamlit run main.py --server.port 8502
Browser cache:
Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
Clear browser cache
Module import errors:
streamlit run main.py --logger.level=debug
Check terminal for error messages
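To tell a port conflict apart from the other causes, you can test whether something is already listening on the port. This is a minimal sketch using the standard `socket` module; `port_in_use` is a hypothetical helper and it only probes the local loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8501, 8502):
        state = "in use" if port_in_use(port) else "free"
        print(f"Port {port}: {state}")
```

If 8501 reports "in use" before you start Streamlit, another process holds the port and the `--server.port` option above is the likely fix.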
Module not found errors in Streamlit
Cause: Running from wrong directory
Solution:
# Must run from app/ directory
cd app
streamlit run main.py
# NOT from repo root:
# cd repo_root
# streamlit run app/main.py # ❌ Wrong!
uv: command not found
Cause: uv not in PATH
Solutions:
Restart terminal after installation
Add to PATH manually:
# Add to ~/.bashrc or ~/.zshrc
# (recent uv installers use ~/.local/bin; older ones used ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
source ~/.bashrc
Use pip instead, as described in Method 2: Using pip.
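Before editing shell configuration, it can help to confirm whether the current environment can see `uv` at all. A small sketch with the standard library's `shutil.which`; `find_tool` is an illustrative wrapper.

```python
import shutil


def find_tool(name: str = "uv"):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    print(f"uv found at {location}" if location else "uv is not on PATH")
```

If this prints a path but your shell still cannot find `uv`, the shell session probably predates the installation and needs a restart.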
MPS backend not available (Mac)
Causes:
macOS < 12.3
PyTorch < 1.12
Intel Mac (MPS is Apple Silicon only)
Check:
import torch
print(torch.backends.mps.is_built())  # Should be True
print(torch.backends.mps.is_available())  # Should be True
Solution:
# Update PyTorch
pip install --upgrade torch torchvision
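The operating-system side of these causes can be checked without PyTorch. The sketch below encodes the two OS-level conditions listed above (Apple Silicon and macOS 12.3+); `mps_prerequisites_met` is a hypothetical helper, and note that a Python interpreter running under Rosetta reports an Intel architecture even on Apple Silicon hardware.

```python
import platform


def mps_prerequisites_met(system: str, machine: str, macos_version: str) -> bool:
    """Check the OS-level conditions listed above: macOS 12.3+ on Apple Silicon."""
    if system != "Darwin" or machine != "arm64":
        return False
    parts = [int(p) for p in macos_version.split(".")[:2] if p.isdigit()]
    major, minor = (parts + [0, 0])[:2]  # pad so "13" is read as 13.0
    return (major, minor) >= (12, 3)


if __name__ == "__main__":
    ok = mps_prerequisites_met(
        platform.system(), platform.machine(), platform.mac_ver()[0]
    )
    print(f"MPS prerequisites met: {ok}")
```

If this reports `True` but `torch.backends.mps.is_available()` is still `False`, the PyTorch build itself is the more likely culprit.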
Optional: Jupyter Setup
For development and experimentation:
# Install dev extras
uv sync --extra dev
# Start Jupyter
jupyter notebook
Create a notebook to test components:
# test_notebook.ipynb
import sys
sys.path.append('../app')
from models.pytorch.cnn_builder import CustomCNNBuilder
from training.engine import TrainingEngine
import torch
# Test model building
config = { ... }
builder = CustomCNNBuilder(config)
model = builder.build()
print(model)
Next Steps
Quick Start: Train your first model in 5 minutes
Architecture Guide: Understand the codebase structure
Training Guide: Best practices for model training
API Reference: Detailed API documentation
Getting Help
GitHub Issues: Report bugs or request features on the repository
Discussions: Ask questions and share experiences
Installation complete! You’re ready to start building malware classifiers.