This integration connects Ollama’s local models to LangChain.

Installation

First, install Ollama from ollama.com and pull a model:
ollama pull llama3.2
Then install the LangChain integration:
pip install -U langchain-ollama

Usage

from langchain_ollama import ChatOllama

model = ChatOllama(
    model="llama3.2",
    temperature=0.8,
    num_predict=256,
)

messages = [
    ("system", "You are a helpful assistant."),
    ("human", "What is the capital of France?"),
]

response = model.invoke(messages)
print(response.content)

Streaming

for chunk in model.stream(messages):
    print(chunk.content, end="")

API Reference

ChatOllama

model (str, required)
  Name of the Ollama model to use (e.g., llama3.2, mistral, phi3).

temperature (float, default: 0.8)
  Sampling temperature between 0.0 and 1.0. Higher values make output more random.

num_predict (int | None, default: None)
  Maximum number of tokens to generate.

reasoning (bool | None, default: None)
  Controls reasoning/thinking mode for supported models:
    • True: enables reasoning mode; the reasoning trace is captured in additional_kwargs["reasoning_content"]
    • False: disables reasoning mode
    • None: uses the model's default behavior

base_url (str, default: http://localhost:11434)
  Base URL where the Ollama server is running.

top_k (int | None, default: None)
  Limits sampling to the k most likely tokens. Higher values give more diverse output; lower values are more conservative.

top_p (float | None, default: None)
  Nucleus sampling threshold; works together with top_k. Higher values give more diverse output.

num_ctx (int | None, default: None)
  Size of the context window used to generate the next token.

repeat_penalty (float | None, default: None)
  How strongly to penalize repetitions. Higher values make repetition less likely.

validate_model_on_init (bool, default: False)
  Whether to validate that the model exists when initializing.

stop (list[str] | None, default: None)
  Stop sequences that end generation.

Supported Models

Ollama supports hundreds of models. Popular options include:
  • Llama 3.2: Fast, efficient model from Meta
  • Mistral: High-quality open model
  • Phi-3: Microsoft’s small language model
  • Gemma: Google’s open model
  • DeepSeek: Reasoning-capable models
  • Qwen: Alibaba’s multilingual models
Visit ollama.com/library for the full model catalog.

Features

  • Run models locally without API keys
  • Full privacy: no data is sent to external servers
  • Tool calling (select models)
  • Vision capabilities (multimodal models)
  • Streaming
  • Async support
  • Custom model parameters
  • Reasoning mode for supported models
Ollama runs models locally on your machine. Performance depends on your hardware. GPU acceleration is recommended for larger models.
