Right-size LLM models to your hardware
Terminal tool that detects your system specs, scores hundreds of models across quality, speed, fit, and context, and tells you which ones will actually run well on your machine.
Quick Start
Get llmfit running on your system in under a minute
Run the interactive TUI
Use j/k to navigate, / to search, and q to quit.
Example TUI output
- Score: Composite score (0-100) across quality, speed, fit, and context
- TPS: Estimated tokens per second
- Quant: Best quantization for your hardware (Q8_0 to Q2_K)
- Mode: Run mode (GPU, CPU+GPU, CPU, MoE)
- Mem: Memory usage percentage
- Context: Maximum context length
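As a rough illustration of how a composite score like the one above can be built, the sketch below takes a weighted sum of the four sub-scores. The weights and normalization here are hypothetical placeholders, not llmfit's actual values:

```python
# Illustrative composite score: weighted sum of four 0-100 sub-scores.
# The weights below are hypothetical, not llmfit's real use-case weights.
def composite_score(quality, speed, fit, context,
                    weights=(0.4, 0.25, 0.25, 0.1)):
    subs = (quality, speed, fit, context)
    assert all(0 <= s <= 100 for s in subs)
    assert abs(sum(weights) - 1.0) < 1e-9
    return round(sum(w * s for w, s in zip(weights, subs)), 1)

# A model that fits well but decodes slowly still lands in the high 70s.
print(composite_score(quality=85, speed=60, fit=95, context=70))
```

In practice the weights would shift per use case (e.g. a coding profile weighting context higher than a chat profile), which is what use-case-specific weighting means here.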
Filter and download models
- Press f to cycle fit filters (All, Runnable, Perfect, Good, Marginal)
- Press / to search by name, provider, or use case
- Press d on a selected model to download it via Ollama or llama.cpp
Explore by Topic
Deep dive into llmfit’s features and capabilities
Core Concepts
TUI Mode
CLI Mode
REST API
Provider Integration
Platform Support
Key Features
Multi-GPU Detection
Detects NVIDIA, AMD ROCm, Intel Arc, Apple Silicon, and Ascend NPUs with aggregated VRAM reporting
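Aggregated VRAM reporting means the memory budget is the sum across all detected devices, not just the largest one. A minimal sketch, using a hypothetical device list rather than real detection output:

```python
# Aggregate usable VRAM across detected GPUs of mixed vendors.
# The device list is hypothetical sample data, not real detection output.
gpus = [
    {"vendor": "NVIDIA", "name": "RTX 4090", "vram_gb": 24},
    {"vendor": "Intel",  "name": "Arc A770", "vram_gb": 16},
]

total_vram_gb = sum(g["vram_gb"] for g in gpus)
print(total_vram_gb)  # → 40
```

A 40 GB aggregate budget admits quantizations that neither card could host alone, which is why multi-GPU detection changes the recommendations.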
Dynamic Quantization
Automatically selects the best quantization (Q8_0 to Q2_K) that fits your available memory
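The selection logic can be sketched as walking a quality-ordered list of quantizations and picking the first whose estimated footprint fits. The bits-per-weight figures and the 20% overhead factor below are rough ballpark assumptions, not llmfit's internal numbers:

```python
# Pick the highest-quality quantization whose estimated size fits in memory.
# Bits-per-weight values are approximate GGUF figures; the 1.2x overhead
# for KV cache and activations is a rough assumption.
QUANTS = [            # (name, approx bits per weight), best quality first
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 2.6),
]

def best_quant(params_b, mem_gb, overhead=1.2):
    """Return the first quant whose weights (plus overhead) fit in mem_gb."""
    for name, bpw in QUANTS:
        weight_gb = params_b * bpw / 8  # billions of params * bytes per weight
        if weight_gb * overhead <= mem_gb:
            return name
    return None  # does not fit even at Q2_K

print(best_quant(params_b=7, mem_gb=8))   # → Q6_K
```

A 7B model on an 8 GB card lands on Q6_K under these assumptions: Q8_0 needs ~8.9 GB with overhead, while Q6_K needs ~6.9 GB.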
Multi-Dimensional Scoring
Ranks models across quality, speed, memory fit, and context window with use-case-specific weights
MoE Architecture Support
Optimizes Mixture-of-Experts models like Mixtral and DeepSeek with expert offloading
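Because an MoE model activates only a few experts per token, a scheduler can keep the shared layers plus as many experts as fit in VRAM and offload the rest to system RAM. A minimal sketch of that memory split, with illustrative sizes loosely shaped like a 4-bit Mixtral (8 experts), not measured values:

```python
# Rough MoE memory split: keep shared weights plus as many experts as fit
# in VRAM, offload the remaining experts to system RAM. Sizes illustrative.
def moe_split(shared_gb, expert_gb, n_experts, vram_gb):
    """Return (experts_in_vram, vram_used_gb, ram_needed_gb)."""
    budget = vram_gb - shared_gb              # VRAM left after shared layers
    fit = max(0, min(n_experts, int(budget // expert_gb)))
    vram_used = shared_gb + fit * expert_gb
    ram_needed = (n_experts - fit) * expert_gb
    return fit, vram_used, ram_needed

# Hypothetical shape: ~2 GB shared, 8 experts of ~3 GB each, 16 GB VRAM.
print(moe_split(shared_gb=2.0, expert_gb=3.0, n_experts=8, vram_gb=16.0))
```

Under these numbers, 4 of 8 experts stay resident (14 GB VRAM used) and 12 GB spills to RAM; since only 2 experts fire per token in a Mixtral-style model, many tokens never touch the offloaded weights at all, which is what makes CPU+GPU mode viable for MoE.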
Command Reference
Explore all available CLI commands and REST API endpoints
llmfit
system
fit
search
recommend
plan
serve
REST API
Core Library
Ready to find the perfect model for your hardware?
Install llmfit and start discovering which LLMs will run well on your system
