Right-size LLM models to your hardware

Terminal tool that detects your system specs, scores hundreds of models across quality, speed, fit, and context, and tells you which ones will actually run well on your machine.

$ llmfit
System: 62GB RAM | 14 cores | NVIDIA RTX 4090 (24GB VRAM)
Qwen/Qwen2.5-Coder-7B      Score: 94.5
Mistral-7B-Instruct        Score: 92.1
Llama-3.1-8B-Instruct      Score: 91.8

Quick Start

Get llmfit running on your system in under a minute

1. Install llmfit

Install using your preferred package manager:
brew install llmfit
2. Run the interactive TUI

Launch llmfit to see which models fit your hardware:
llmfit
The TUI displays your system specs (CPU, RAM, GPU name, VRAM, backend) at the top and ranks all models by fit score. Use arrow keys or j/k to navigate, / to search, and q to quit. Models are listed in a scrollable table sorted by composite score, with columns for:
  • Score: Composite score (0-100) across quality, speed, fit, and context
  • TPS: Estimated tokens per second
  • Quant: Best quantization for your hardware (Q8_0 to Q2_K)
  • Mode: Run mode (GPU, CPU+GPU, CPU, MoE)
  • Mem: Memory usage percentage
  • Context: Maximum context length
3. Filter and download models

Use keyboard shortcuts to filter results:
  • Press f to cycle fit filters (All, Runnable, Perfect, Good, Marginal)
  • Press / to search by name, provider, or use case
  • Press d on a selected model to download it via Ollama or llama.cpp
# Or use CLI mode for scripting
llmfit fit --perfect -n 5
llmfit recommend --json --use-case coding --limit 3
Models marked with a green ✓ in the Inst column are already installed via Ollama, llama.cpp, or MLX.
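The fit filters above bucket models by how much of your available memory they would use. A minimal sketch of that idea, with hypothetical thresholds (llmfit's real cutoffs may differ):

```python
# Hypothetical fit-level bucketing from estimated memory usage.
# The thresholds are illustrative, not llmfit's actual values.
def fit_level(mem_pct: float) -> str:
    """Bucket a model's estimated memory usage (% of available) into a fit level."""
    if mem_pct > 100:
        return "Won't fit"
    if mem_pct <= 60:
        return "Perfect"
    if mem_pct <= 80:
        return "Good"
    return "Marginal"

for pct in (45, 72, 95, 120):
    print(f"{pct}% -> {fit_level(pct)}")
```

A model that overflows memory is excluded from the Runnable filter; everything else falls into one of the three fit bands.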

Explore by Topic

Deep dive into llmfit’s features and capabilities

Core Concepts

Understand hardware detection, scoring algorithms, and fit analysis

TUI Mode

Master the interactive terminal UI with keybindings and themes

CLI Mode

Use llmfit in scripts with subcommands and JSON output

REST API

Run llmfit as a node-level API for cluster scheduling

Provider Integration

Connect with Ollama, llama.cpp, and MLX for model downloads

Platform Support

GPU detection for NVIDIA, AMD, Intel Arc, and Apple Silicon

Key Features

Multi-GPU Detection

Detects NVIDIA, AMD ROCm, Intel Arc, Apple Silicon, and Ascend NPUs with aggregated VRAM reporting

Dynamic Quantization

Automatically selects the best quantization (Q8_0 to Q2_K) that fits your available memory
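The selection can be pictured as walking quant levels from highest to lowest precision until the weights fit. A minimal sketch, assuming rough bits-per-weight figures for common GGUF quants and a fixed overhead allowance (both illustrative, not llmfit's internals):

```python
# Approximate bits-per-weight for common GGUF quant levels (illustrative).
QUANT_BITS = {
    "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8,
    "Q3_K_M": 3.9, "Q2_K": 3.4,
}

def pick_quant(params_b: float, budget_gb: float, overhead_gb: float = 1.5):
    """Return the highest-precision quant whose weights plus overhead fit the budget."""
    for name, bits in QUANT_BITS.items():  # ordered highest precision first
        weights_gb = params_b * bits / 8   # params in billions -> weight size in GB
        if weights_gb + overhead_gb <= budget_gb:
            return name
    return None  # nothing fits, even at Q2_K

print(pick_quant(7, 24))   # 7B model on a 24 GB GPU
print(pick_quant(70, 24))  # 70B model exceeds 24 GB at every quant level
```

Real selection also has to account for KV-cache size at the chosen context length, which grows the overhead term.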

Multi-Dimensional Scoring

Ranks models across quality, speed, memory fit, and context window with use-case-specific weights

MoE Architecture Support

Optimizes Mixture-of-Experts models like Mixtral and DeepSeek with expert offloading
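To see why expert offloading helps, compare VRAM with all experts resident versus only the active ones on the GPU. The parameter counts and quant density below are rough, illustrative numbers, not llmfit's model data:

```python
def moe_vram_gb(shared_b: float, expert_b: float, n_experts: int,
                active: int, bits: float = 4.8):
    """Rough VRAM (GB) for a quantized MoE: all experts resident on GPU vs.
    inactive experts offloaded to system RAM (shared weights + active experts stay)."""
    gb_per_b = bits / 8  # GB per billion params at the given bits-per-weight
    resident = (shared_b + expert_b * n_experts) * gb_per_b
    offloaded = (shared_b + expert_b * active) * gb_per_b
    return resident, offloaded

# Mixtral-like shape: 8 experts, 2 active per token (illustrative sizes)
full, lean = moe_vram_gb(shared_b=1.5, expert_b=5.5, n_experts=8, active=2)
print(f"all on GPU: {full:.1f} GB, inactive experts offloaded: {lean:.1f} GB")
```

Because only a couple of experts fire per token, most of an MoE's weights can sit in system RAM, trading some latency for a much smaller VRAM footprint.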

Command Reference

Explore all available CLI commands and REST API endpoints

  • llmfit: Launch interactive TUI
  • system: Show hardware specs
  • fit: Filter by fit level
  • search: Search model database
  • recommend: Get top recommendations
  • plan: Estimate hardware needs
  • serve: Start REST API server
  • REST API: HTTP endpoints
  • Core Library: Rust API reference

Ready to find the perfect model for your hardware?

Install llmfit and start discovering which LLMs will run well on your system