Ollama enables SlasshyWispr to run large language models locally for completely offline AI assistance.

What is Ollama?

Ollama is a lightweight, extensible framework for running large language models on your local machine. It provides:
  • Local AI Inference: Run models like Llama, Mistral, and Gemma without internet
  • Simple API: Compatible with OpenAI API format
  • Model Management: Easy model pulling, updating, and version control
  • Cross-Platform: Works on macOS, Linux, and Windows
  • GPU Acceleration: Automatic CUDA and Metal support
SlasshyWispr communicates with Ollama via HTTP API calls, keeping your conversations completely private and offline.
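A minimal sketch of the kind of request involved (the helper names here are illustrative, not SlasshyWispr's actual code): Ollama's local `/api/chat` endpoint accepts a model name and a list of messages, and with `stream: false` returns a single JSON reply.

```typescript
// Sketch of a non-streaming chat request against Ollama's local HTTP API.
// `buildChatRequest` and `chat` are illustrative helpers, not SlasshyWispr code.
const OLLAMA_BASE_URL = "http://127.0.0.1:11434";

function buildChatRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: `${baseUrl}/api/chat`,
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: false, // ask for one JSON object instead of a token stream
    }),
  };
}

// Requires a running Ollama service with the model already pulled.
async function chat(model: string, prompt: string): Promise<string> {
  const req = buildChatRequest(OLLAMA_BASE_URL, model, prompt);
  const res = await fetch(req.url, { method: "POST", body: req.body });
  const json = await res.json();
  return json.message.content; // the assistant's reply text
}
```

Because the request never leaves the local machine, nothing in the prompt or reply touches the network unless you point the base URL at a remote host.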

Installation and Setup

1

Install Ollama

Download and install Ollama from ollama.ai.
macOS / Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download the installer from the Ollama website
2

Start Ollama Service

Ollama runs as a background service after installation.
macOS / Linux:
ollama serve
Windows: Ollama starts automatically after installation
The Ollama service must be running for SlasshyWispr to communicate with it.
3

Verify Installation

Check that Ollama is running:
ollama --version
Test the API endpoint:
curl http://127.0.0.1:11434/api/tags

Ollama Base URL Configuration

SlasshyWispr connects to Ollama via its HTTP API endpoint.

Default Configuration

The default Ollama base URL is:
http://127.0.0.1:11434
This is the standard local endpoint where Ollama serves its API.

Custom Base URL

You can configure a custom base URL if:
  • Ollama is running on a different port
  • Ollama is running on a remote machine
  • You’re using a reverse proxy
1

Open Settings

Navigate to Settings > Offline in SlasshyWispr
2

Update Base URL

Find the Ollama Base URL field and enter your custom URL.
Examples:
  • Different port: http://127.0.0.1:8080
  • Remote server: http://192.168.1.100:11434
  • HTTPS endpoint: https://ollama.example.com
3

Test Connection

SlasshyWispr will automatically verify the connection to your Ollama instance
Using a remote Ollama instance over the internet may expose your conversations to network monitoring. Use HTTPS and secure networking when connecting to remote instances.
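The verification step can be reproduced by hand: a GET to `/api/tags` succeeds only when Ollama is reachable at the configured base URL. A sketch (helper names are illustrative):

```typescript
// Build the tags endpoint from a configured base URL, tolerating a trailing
// slash, then probe it. Illustrative helpers, not SlasshyWispr's actual code.
function tagsEndpoint(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/tags`;
}

async function canReachOllama(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(tagsEndpoint(baseUrl));
    return res.ok; // HTTP 200 means the API answered
  } catch {
    return false; // connection refused, DNS failure, timeout, ...
  }
}
```

Normalizing the trailing slash matters in practice: `http://127.0.0.1:11434/` and `http://127.0.0.1:11434` should resolve to the same endpoint.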

Model Pulling

Before using a model with SlasshyWispr, you need to pull it from the Ollama library.

Using Ollama CLI

1

Browse Available Models

Visit ollama.ai/library to see available models
2

Pull a Model

Use the Ollama CLI to download a model:
ollama pull llama3.2
Other popular models:
ollama pull mistral
ollama pull gemma2
ollama pull qwen2.5
3

Wait for Download

Models range from about 1 GB to 40 GB+ depending on parameter count. Download time varies with your internet speed.
4

Verify Model

List downloaded models:
ollama list
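The same information `ollama list` prints is available over HTTP: GET `/api/tags` returns a `models` array. A sketch of extracting the names (the parsing helper is illustrative):

```typescript
// Shape of the relevant part of GET /api/tags.
interface TagsResponse {
  models: { name: string; size: number }[];
}

// Illustrative helper: pull out just the model names.
function modelNames(tags: TagsResponse): string[] {
  return tags.models.map((m) => m.name);
}

// Fetching the list (requires a running Ollama service):
async function listModels(baseUrl: string): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  return modelNames(await res.json());
}
```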

Using SlasshyWispr UI

1

Open Offline Settings

Navigate to Settings > Offline tab
2

Enter Model Name

In the Local Ollama Model field, type the model name (e.g., llama3.2)
3

Pull Model

Click Pull Model to download directly from SlasshyWispr.
Progress will be shown with:
  • Download status
  • Model being pulled
  • Success/failure indication
Models pulled via either method are stored in Ollama’s model directory and accessible to both Ollama CLI and SlasshyWispr.
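Under the hood, a pull is a POST to Ollama's `/api/pull` endpoint, which streams newline-delimited JSON status lines; lines that carry `total` and `completed` byte counts let a UI compute a percentage. A sketch of the progress math (the helper is illustrative, and the exact line fields are an assumption based on Ollama's streaming API):

```typescript
// One status line from Ollama's streaming POST /api/pull response.
interface PullStatus {
  status: string;      // e.g. "pulling manifest", "downloading ...", "success"
  total?: number;      // total bytes for the current layer, when known
  completed?: number;  // bytes downloaded so far
}

// Illustrative helper: turn a status line into a progress percentage,
// or null for status-only lines that carry no byte counts.
function progressPercent(line: PullStatus): number | null {
  if (line.total && line.completed !== undefined) {
    return Math.round((line.completed / line.total) * 100);
  }
  return null;
}
```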

Ollama Status Checking

SlasshyWispr continuously monitors your Ollama installation status.

Status Response Fields

The Ollama status check returns:
interface OllamaStatusResponse {
  installed: boolean;      // Whether Ollama is installed on the system
  running: boolean;        // Whether Ollama service is currently running
  version: string;         // Ollama version (e.g., "0.1.27")
  details: string;         // Additional status information or errors
}
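One way such a check could be assembled (a sketch, not SlasshyWispr's actual implementation, under the simplifying assumption that a reachable API implies an installed and running Ollama; detecting installed-but-stopped would need a separate check for the `ollama` binary): GET `/api/version` answers both `running` and `version` in a single round trip.

```typescript
interface OllamaStatusResponse {
  installed: boolean;
  running: boolean;
  version: string;
  details: string;
}

// Illustrative helper: map a version string (or a failure) to a status object.
function statusFromVersion(version: string | null): OllamaStatusResponse {
  if (version === null) {
    return { installed: false, running: false, version: "", details: "Ollama not reachable" };
  }
  return { installed: true, running: true, version, details: "OK" };
}

async function checkOllamaStatus(baseUrl: string): Promise<OllamaStatusResponse> {
  try {
    const res = await fetch(`${baseUrl}/api/version`); // -> { "version": "0.1.27" }
    const json = await res.json();
    return statusFromVersion(json.version);
  } catch {
    return statusFromVersion(null);
  }
}
```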

Status Indicators

In SlasshyWispr Settings > Offline:
  • Green dot: Ollama installed and running
  • Yellow dot: Ollama installed but not running
  • Red dot: Ollama not detected or error
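The three dots map directly onto the `installed` and `running` booleans from the status response; a minimal sketch:

```typescript
type StatusDot = "green" | "yellow" | "red";

// Illustrative mapping from the two status booleans to an indicator color.
function statusDot(installed: boolean, running: boolean): StatusDot {
  if (installed && running) return "green"; // ready to use
  if (installed) return "yellow";           // installed, but service stopped
  return "red";                             // not detected
}
```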

Common Status Issues

1

Ollama Not Installed

Status: installed: false
Solution: Install Ollama following the steps above
2

Ollama Not Running

Status: installed: true, running: false
Solution: Start the Ollama service:
ollama serve
3

Connection Refused

Status: Connection error in details
Solution: Check that the Ollama base URL is correct and that your firewall allows local connections

Compatible Models

SlasshyWispr works with any Ollama-compatible model. Here are recommended models by use case:

Conversational AI

Llama 3.2

Fast, efficient, excellent for general conversation
ollama pull llama3.2

Mistral

Balanced performance and quality
ollama pull mistral

Gemma 2

Google’s efficient language model
ollama pull gemma2

Qwen 2.5

Strong multilingual capabilities
ollama pull qwen2.5

Model Sizes

Most models come in multiple sizes (parameter counts):
| Size    | RAM Required | Performance            |
|---------|--------------|------------------------|
| 1B-3B   | 4-8 GB       | Fast, basic tasks      |
| 7B-8B   | 8-16 GB      | Balanced, good quality |
| 13B-14B | 16-32 GB     | High quality           |
| 30B+    | 32+ GB       | Best quality, slower   |
Pull specific sizes by appending the parameter count:
ollama pull llama3.2:1b
ollama pull llama3.2:3b
ollama pull mistral:7b
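The RAM guidance in the table can be encoded as a small helper when deciding which size to pull. A sketch (the thresholds follow the table above, not any official Ollama sizing rule, and the function is illustrative):

```typescript
// Illustrative helper: suggest a parameter-count class for the available RAM,
// using the thresholds from the sizes table.
function suggestedSize(ramGb: number): string {
  if (ramGb >= 32) return "30B+";
  if (ramGb >= 16) return "13B-14B";
  if (ramGb >= 8) return "7B-8B";
  return "1B-3B";
}
```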

Specialized Models

  • Code Generation: codellama, deepseek-coder
  • Chat Optimized: Models with :chat or :instruct tags
  • Uncensored: Models with :uncensored tag for unrestricted responses

Testing Your Setup

1

Verify Ollama Status

In SlasshyWispr Settings > Offline, confirm Ollama shows as installed and running
2

Select Local Mode

In Settings > Models, set AI Runtime Mode to Local
3

Choose Model

Select your pulled model from the Local Ollama Model dropdown
4

Test Dictation

Use your push-to-talk hotkey and ask a question.
Example: “What is the capital of France?”
5

Verify Response

You should receive a response generated entirely locally without internet

Performance Optimization

GPU Acceleration

Ollama automatically uses GPU when available:
  • NVIDIA GPUs: Requires NVIDIA's CUDA drivers
  • Apple Silicon: Uses Metal acceleration
  • AMD GPUs: ROCm support (Linux)
Check GPU usage during inference:
ollama ps

Context Window

Adjust the context size for longer conversations. Inside an ollama run session, use the /set command:
ollama run llama3.2
/set parameter num_ctx 4096
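The context size can also be set per request through the API's `options` field (`num_ctx` is the Ollama option name; the helper is illustrative):

```typescript
// Illustrative helper: attach a context-window size to a request body
// via Ollama's per-request options.
function withContext(body: Record<string, unknown>, numCtx: number): Record<string, unknown> {
  return { ...body, options: { num_ctx: numCtx } };
}
```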

Concurrent Requests

Ollama can handle multiple requests in parallel. Set the environment variable before starting the Ollama service:
export OLLAMA_NUM_PARALLEL=2

Troubleshooting

Model Pull Fails

Issue: Cannot download a model
Solutions:
  • Check internet connection
  • Verify sufficient disk space (models are large)
  • Try pulling via Ollama CLI directly
  • Check Ollama logs for errors

Slow Inference

Issue: AI responses take too long
Solutions:
  • Use a smaller model (3B instead of 13B)
  • Enable GPU acceleration if available
  • Close other resource-intensive applications
  • Check hardware requirements

Connection Errors

Issue: SlasshyWispr cannot connect to Ollama
Solutions:
  • Verify Ollama is running: ollama serve
  • Check base URL matches Ollama’s listening address
  • Test manually: curl http://127.0.0.1:11434/api/tags
  • Check firewall settings for localhost connections

Out of Memory

Issue: Model fails to load or crashes
Solutions:
  • Use a smaller model variant
  • Close other applications to free RAM
  • Increase system swap space
  • See hardware requirements for RAM recommendations
Running models larger than your available RAM will cause severe performance degradation or crashes. Always choose models appropriate for your hardware.
