Right-size LLM models to your hardware

Terminal tool that detects your system specs, scores hundreds of models across quality, speed, fit, and context, and tells you which ones will actually run well on your machine.

$ llmfit
System: 62GB RAM | 14 cores | NVIDIA RTX 4090 (24GB VRAM)
Qwen/Qwen2.5-Coder-7B      Score: 94.5
Mistral-7B-Instruct        Score: 92.1
Llama-3.1-8B-Instruct      Score: 91.8

Quick Start

Get llmfit running on your system in under a minute

1. Install llmfit

Install using your preferred package manager:
brew install llmfit
2. Run the interactive TUI

Launch llmfit to see which models fit your hardware:
llmfit
The TUI displays your system specs (CPU, RAM, GPU name, VRAM, backend) at the top and ranks all models by fit score. Use arrow keys or j/k to navigate, / to search, and q to quit. Models are listed in a scrollable table sorted by composite score, with columns for:
  • Score: Composite score (0-100) across quality, speed, fit, and context
  • TPS: Estimated tokens per second
  • Quant: Best quantization for your hardware (Q8_0 to Q2_K)
  • Mode: Run mode (GPU, CPU+GPU, CPU, MoE)
  • Mem: Memory usage percentage
  • Context: Maximum context length
3. Filter and download models

Use keyboard shortcuts to filter results:
  • Press f to cycle fit filters (All, Runnable, Perfect, Good, Marginal)
  • Press / to search by name, provider, or use case
  • Press d on a selected model to download it via Ollama or llama.cpp
# Or use CLI mode for scripting
llmfit fit --perfect -n 5
llmfit recommend --json --use-case coding --limit 3
Models marked with a green ✓ in the Inst column are already installed via Ollama, llama.cpp, or MLX.
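The fit filters above bucket models by how much of your available memory they would use. A minimal sketch of that idea, with hypothetical thresholds (llmfit's real cutoffs may differ):

```python
# Hypothetical fit-level bucketing from estimated memory usage.
# The thresholds are illustrative, not llmfit's actual values.
def fit_level(mem_pct: float) -> str:
    """Bucket a model's estimated memory usage (% of available) into a fit level."""
    if mem_pct > 100:
        return "Won't fit"
    if mem_pct <= 60:
        return "Perfect"
    if mem_pct <= 80:
        return "Good"
    return "Marginal"

for pct in (45, 72, 95, 120):
    print(f"{pct}% -> {fit_level(pct)}")
```

A model that overflows memory is excluded from the Runnable filter; everything else falls into one of the three fit bands.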

Explore by Topic

Deep dive into llmfit’s features and capabilities

Core Concepts

Understand hardware detection, scoring algorithms, and fit analysis

TUI Mode

Master the interactive terminal UI with keybindings and themes

CLI Mode

Use llmfit in scripts with subcommands and JSON output

REST API

Run llmfit as a node-level API for cluster scheduling

Provider Integration

Connect with Ollama, llama.cpp, and MLX for model downloads

Platform Support

GPU detection for NVIDIA, AMD, Intel Arc, and Apple Silicon

Key Features

Multi-GPU Detection

Detects NVIDIA, AMD ROCm, Intel Arc, Apple Silicon, and Ascend NPUs with aggregated VRAM reporting

Dynamic Quantization

Automatically selects the best quantization (Q8_0 to Q2_K) that fits your available memory
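The selection can be pictured as walking quant levels from highest to lowest precision until the weights fit. A minimal sketch, assuming rough bits-per-weight figures for common GGUF quants and a fixed overhead allowance (both illustrative, not llmfit's internals):

```python
# Approximate bits-per-weight for common GGUF quant levels (illustrative).
QUANT_BITS = {
    "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8,
    "Q3_K_M": 3.9, "Q2_K": 3.4,
}

def pick_quant(params_b: float, budget_gb: float, overhead_gb: float = 1.5):
    """Return the highest-precision quant whose weights plus overhead fit the budget."""
    for name, bits in QUANT_BITS.items():  # ordered highest precision first
        weights_gb = params_b * bits / 8   # params in billions -> weight size in GB
        if weights_gb + overhead_gb <= budget_gb:
            return name
    return None  # nothing fits, even at Q2_K

print(pick_quant(7, 24))   # 7B model on a 24 GB GPU
print(pick_quant(70, 24))  # 70B model exceeds 24 GB at every quant level
```

Real selection also has to account for KV-cache size at the chosen context length, which grows the overhead term.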

Multi-Dimensional Scoring

Ranks models across quality, speed, memory fit, and context window with use-case-specific weights

MoE Architecture Support

Optimizes Mixture-of-Experts models like Mixtral and DeepSeek with expert offloading
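To see why expert offloading helps, compare VRAM with all experts resident versus only the active ones on the GPU. The parameter counts and quant density below are rough, illustrative numbers, not llmfit's model data:

```python
def moe_vram_gb(shared_b: float, expert_b: float, n_experts: int,
                active: int, bits: float = 4.8):
    """Rough VRAM (GB) for a quantized MoE: all experts resident on GPU vs.
    inactive experts offloaded to system RAM (shared weights + active experts stay)."""
    gb_per_b = bits / 8  # GB per billion params at the given bits-per-weight
    resident = (shared_b + expert_b * n_experts) * gb_per_b
    offloaded = (shared_b + expert_b * active) * gb_per_b
    return resident, offloaded

# Mixtral-like shape: 8 experts, 2 active per token (illustrative sizes)
full, lean = moe_vram_gb(shared_b=1.5, expert_b=5.5, n_experts=8, active=2)
print(f"all on GPU: {full:.1f} GB, inactive experts offloaded: {lean:.1f} GB")
```

Because only a couple of experts fire per token, most of an MoE's weights can sit in system RAM, trading some latency for a much smaller VRAM footprint.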

Command Reference

Explore all available CLI commands and REST API endpoints

  • llmfit: Launch interactive TUI
  • system: Show hardware specs
  • fit: Filter by fit level
  • search: Search model database
  • recommend: Get top recommendations
  • plan: Estimate hardware needs
  • serve: Start REST API server
  • REST API: HTTP endpoints
  • Core Library: Rust API reference

Ready to find the perfect model for your hardware?

Install llmfit and start discovering which LLMs will run well on your system