Harvey Model Selection Guide

Last updated: 2026-05-12 — model inventory from /ollama probe (agents/model_cache.db). M1 Mac is the primary machine; Raspberry Pi 500+ runs a subset.


Capability legend

Tools — model sends tool_call responses that Harvey’s tool executor can dispatch (required for /run, /git, file-write operations).

Tagged — model respects Harvey’s ```path fenced-block syntax so autoExecute can write files without a /apply prompt. -1 = not yet probed.
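
To make the Tagged capability concrete, here is a minimal Python sketch of how a reply containing path-tagged fenced blocks could be split into (path, content) pairs. It assumes the file path sits directly after the opening fence on the same line; Harvey's actual parser and the exact syntax may differ, and extract_tagged_blocks is a hypothetical name.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence

# Assumed shape: an opening fence immediately followed by a file path,
# then the file body, then a closing fence on its own line.
TAGGED_BLOCK = re.compile(
    rf"^{FENCE}(?P<path>[^\s`]+)[ \t]*\n(?P<body>.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

def extract_tagged_blocks(reply: str) -> list[tuple[str, str]]:
    """Return (path, content) pairs for every path-tagged block in a reply."""
    return [(m.group("path"), m.group("body")) for m in TAGGED_BLOCK.finditer(reply)]

reply = f"Here is the file:\n{FENCE}notes/todo.txt\nbuy milk\n{FENCE}\nDone."
blocks = extract_tagged_blocks(reply)
print(blocks)
```

This sketch treats any token after the fence as a path, so a plain language tag would also match; a real implementation would need a rule to tell the two apart.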


Installed model inventory

Embedding-only models (not for chat)

Model                             Size     Notes
nomic-embed-text:latest           137 MB   Harvey default for RAG; 2K context
mxbai-embed-large                 334 MB   Strong English-only embedding
bge-m3:latest                     567 MB   Best installed embed; 8K context, multilingual
locusai/all-minilm-l6-v2:latest            Lightweight sentence embedder

Do not select these for Harvey chat sessions — they produce unusable responses.


Chat / coding models by capability tier

Tier 1 — Very small (< 2 B params)

Model         Params   Context   Tools   Tagged
smollm:360m   362 M    2K                -1
sailor2:1b    988 M    32K               -1
smollm:1.7b   1.7 B    2K                -1

Good for: Trivial string transformations, testing Harvey plumbing, one-sentence RAG lookups where the model just reads injected context and echoes it back.

Avoid for: Any reasoning, multi-step logic, code generation, anything requiring the model to hold more than one idea at once.

Note: sailor2:1b has a surprisingly large 32K context for its size. Use it on memory-constrained machines that need a longer window than the 2K smollm models provide.


Tier 2 — Small (2–4 B params)

Model                 Params   Context   Tools   Tagged
granite3-moe:3b       3.4 B    4K        ✓       ✓
smallthinker:latest   3.4 B    32K               -1
stable-code:3b        3 B      16K               -1
cogito:3b             3.6 B    131K              -1
phi4-mini:latest      3.8 B    131K              -1

Good for: Single-function code generation with a clear spec, adding docstrings, answering scoped questions about a file already in context, writing short unit tests.

Avoid for: Multi-file analysis, architectural decisions, anything requiring context > 4K (for granite3-moe) or > ~16K (for others at this tier).

Recommended defaults at this tier:

- Agent tasks needing tool-calling and tagged blocks: granite3-moe:3b (only 4K context, so short sessions only)
- Reasoning tasks: smallthinker:latest (chain-of-thought trained, 32K)
- General fast chat with large context: cogito:3b or phi4-mini (both 131K)
- Code-only, no tool support needed: stable-code:3b


Tier 3 — Medium (7–9 B params)

Model                Params   Context   Tools   Tagged
apertus-tools:8b     8.1 B    65K       ✓       ✓
gemma2:latest        9.2 B    8K                -1
gemma4:latest        8 B      131K
llama3.1:latest      8 B      131K              -1
ministral-3:latest   8.9 B    262K              -1

Good for: Writing complete functions or small files, explaining existing code in depth, writing tests that require understanding of the code under test, debugging with a stack trace in context, most day-to-day Harvey coding tasks.

Avoid for: Anything needing more than ~8K context when using gemma2.

Recommended defaults at this tier:

- Agent/tool tasks with autoExecute: apertus-tools:8b, the only 8B model confirmed tools=✓ and tagged=✓. Best pick for iterative coding loops.
- General assistant: llama3.1:latest, with reliable, well-tested instruction following.
- Session handoffs / long docs: ministral-3:latest. Its 262K context fits entire session histories, and Mistral models follow structured formatting reliably.


Tier 4 — Large (23–24 B params)

Model                  Params   Context   Tools   Tagged
mistral-small:latest   23.6 B   32K               -1
mistral-small3.2       24 B     131K              -1
devstral-small-2:24b   24 B     393K

Good for: Multi-file refactoring, architectural reasoning, writing new features end-to-end, security review, writing documentation, anything that benefits from a large context window and strong reasoning.

Notes:

- devstral-small-2:24b is Mistral’s coding-specialised model with the largest context window installed (393K tokens). First choice for complex software tasks on the M1 Mac.
- mistral-small3.2 (131K) is the best general-purpose large model when you don’t need coding specialisation.
- mistral-small:latest is the older version with only 32K context; prefer mistral-small3.2.
- All Tier 4 models require ~16–20 GB free RAM. They will not run on the Raspberry Pi 500+.
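
The tier scheme used in this guide can be sketched as a small function that maps a parameter count to a tier number. The cut-off values are assumptions chosen to close the gaps between the tier headings, and tier_for_params is a hypothetical helper, not part of Harvey.

```python
def tier_for_params(billions: float) -> int:
    """Map a parameter count (in billions) to one of this guide's four tiers."""
    # Upper bounds are assumptions that close the gaps between the tier
    # headings (for example, nothing is installed between 4 B and 7 B).
    if billions < 2.0:
        return 1
    if billions < 5.0:
        return 2
    if billions < 10.0:
        return 3
    return 4

# Parameter counts copied from the inventory tables in this guide.
installed = {
    "smollm:1.7b": 1.7,
    "phi4-mini:latest": 3.8,
    "gemma2:latest": 9.2,
    "devstral-small-2:24b": 24.0,
}
tiers = {name: tier_for_params(size) for name, size in installed.items()}
print(tiers)
```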


Hardware constraints

Machine             Max practical model                    Notes
M1 Mac              devstral-small-2:24b (15 GB)           All models available
Raspberry Pi 500+   ministral-3:latest (8.9B) or smaller   24B models will not run

When working on the Pi, treat ministral-3:latest as the stand-in for Tier 4 work and apertus-tools:8b as the everyday coding model.
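
As an illustration of the hardware ceiling, the following Python sketch filters a few installed models by the largest size each machine can practically run. The machine keys "mac" and "pi" and the runnable_on helper are hypothetical; the parameter counts come from the inventory tables in this guide.

```python
# Parameter counts (billions) copied from the inventory tables in this guide.
PARAMS_B = {
    "cogito:3b": 3.6,
    "apertus-tools:8b": 8.1,
    "ministral-3:latest": 8.9,
    "mistral-small3.2": 24.0,
    "devstral-small-2:24b": 24.0,
}

# Largest practical model per machine. The Pi value follows the table above;
# the Mac has no ceiling among the installed models.
MAX_PARAMS_B = {"mac": float("inf"), "pi": 8.9}

def runnable_on(machine: str) -> list[str]:
    """List models at or under the machine's ceiling, smallest first."""
    limit = MAX_PARAMS_B[machine]
    fits = [name for name, size in PARAMS_B.items() if size <= limit]
    return sorted(fits, key=PARAMS_B.__getitem__)

print(runnable_on("pi"))
```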


Task rubric

Task                                     Min tier  Recommended (Mac)                 Recommended (Pi)
Add a docstring                          2         phi4-mini                         phi4-mini
Quick Q&A on a /read file                2         cogito:3b                         cogito:3b
Write a unit test (known signature)      2–3       apertus-tools:8b                  apertus-tools:8b
Fix a bug with error + context           2–3       apertus-tools:8b                  apertus-tools:8b
Write a new function                     3         apertus-tools:8b                  apertus-tools:8b
Write a complete new file                3         apertus-tools:8b                  llama3.1
Debug across multiple files              3–4       ministral-3                       ministral-3
Multi-file feature (new code)            4         devstral-small-2:24b              ministral-3
Architectural review                     4         devstral-small-2:24b              mistral-small3.2*
Write documentation                      3–4       ministral-3                       ministral-3
Security review                          4         devstral-small-2:24b              ministral-3
Reasoning / multi-step planning          2–4       smallthinker or mistral-small3.2  smallthinker
Agent loop (tool-calling + file writes)  2–3       apertus-tools:8b                  apertus-tools:8b
Session handoff (Fountain writing)       3         ministral-3                       ministral-3
RAG-augmented Q&A                        1–2       cogito:3b                         cogito:3b
Embedding for RAG                        n/a       bge-m3:latest                     nomic-embed-text

*mistral-small3.2 is a 24B model. The hardware constraints above list 24B models as not runnable on the Pi, so treat this as a best-effort pick and fall back to ministral-3:latest if it will not load.
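
The rubric lends itself to a simple lookup. The sketch below encodes a few of its rows as (Mac pick, Pi pick) pairs; the recommend function and the lower-cased task keys are hypothetical and only illustrate how the table might be consulted programmatically.

```python
# A few rows of the task rubric as (Mac pick, Pi pick) pairs.
RUBRIC = {
    "add a docstring": ("phi4-mini", "phi4-mini"),
    "write a complete new file": ("apertus-tools:8b", "llama3.1"),
    "multi-file feature": ("devstral-small-2:24b", "ministral-3"),
    "security review": ("devstral-small-2:24b", "ministral-3"),
}

def recommend(task: str, machine: str = "mac") -> str:
    """Return the rubric's pick for a task on the given machine."""
    mac_pick, pi_pick = RUBRIC[task.strip().lower()]
    return pi_pick if machine == "pi" else mac_pick

print(recommend("Security review", "pi"))
```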


Suggested model aliases

Add these to agents/harvey.yaml under the model_aliases key, either with /ollama alias NAME MODEL or by editing the file directly.

model_aliases:
  # Coding and agent work
  agent:      abb-decide/apertus-tools:8b-instruct-2509-q4_k_m
  agent-fast: granite3-moe:3b
  coder:      devstral-small-2:24b

  # General assistant
  chat:       llama3.1:latest
  chat-big:   mistral-small3.2
  fast:       cogito:3b

  # Long-context / documents
  docs:       ministral-3:latest
  long:       devstral-small-2:24b

  # Reasoning
  think:      smallthinker:latest

  # Code only (no tools needed)
  code-light: stable-code:3b

  # Embedding (for /rag setup)
  embed:      bge-m3:latest
  embed-fast: nomic-embed-text:latest
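
For readers wondering how an alias might be resolved, here is a minimal Python sketch. It assumes that a name which is not an alias is passed through unchanged as a model name; that behaviour, and the resolve_model helper, are assumptions rather than Harvey's documented logic.

```python
# A subset of the aliases above, written as a dict to avoid a YAML dependency.
MODEL_ALIASES = {
    "agent": "abb-decide/apertus-tools:8b-instruct-2509-q4_k_m",
    "coder": "devstral-small-2:24b",
    "docs": "ministral-3:latest",
    "fast": "cogito:3b",
}

def resolve_model(name: str, aliases: dict[str, str] = MODEL_ALIASES) -> str:
    """Return the model an alias points to, or the name itself if it is not an alias."""
    return aliases.get(name, name)

print(resolve_model("coder"))
```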

Planning a Harvey work session

Before starting, ask:

  1. What is the task? Pick a tier and model from the task rubric above.
  2. Which machine? devstral-small-2:24b needs ~16 GB free — Mac only.
  3. Do I need tool-calling? Agent mode requires tools=✓. Use apertus-tools:8b or granite3-moe:3b (short context) for reliable tool dispatch.
  4. How much context? If /read-dir loads a full package, pick a 131K+ model. Tier 2 models below 32K will truncate silently.
  5. Is RAG useful? If the task involves an API the model doesn’t know well, run /rag on before asking.
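
The machine, tool-calling and context questions above can be sketched as a filter over candidate models. In this Python sketch the Model class and pick_model function are hypothetical, and the tools flag is set only where this guide explicitly confirms tool support; the other values are copied from the inventory tables.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Model:
    name: str
    context_k: int        # context window in thousands of tokens
    tools: bool           # True only where this guide confirms tools=✓
    mac_only: bool = False

# Ordered smallest first, so the first match is the lightest model that fits.
CANDIDATES = [
    Model("granite3-moe:3b", 4, tools=True),
    Model("cogito:3b", 131, tools=False),
    Model("apertus-tools:8b", 65, tools=True),
    Model("ministral-3:latest", 262, tools=False),
    Model("devstral-small-2:24b", 393, tools=False, mac_only=True),
]

def pick_model(need_tools: bool, min_context_k: int, on_pi: bool) -> Optional[str]:
    """Return the first candidate that satisfies the checklist, or None."""
    for m in CANDIDATES:
        if on_pi and m.mac_only:
            continue  # question 2: which machine?
        if need_tools and not m.tools:
            continue  # question 3: tool-calling needed?
        if m.context_k < min_context_k:
            continue  # question 4: how much context?
        return m.name
    return None

print(pick_model(need_tools=True, min_context_k=32, on_pi=True))
```

Returning None instead of a fallback makes it obvious when no installed model meets the requirements, which is the case for a 300K-token job on the Pi.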

Typical session setup

# 1. Start Harvey
harvey

# 2. Select model mid-session (or use an alias):
/ollama use agent              # → apertus-tools:8b (agent tasks)
/ollama use coder              # → devstral-small-2:24b (complex coding)
/ollama use docs               # → ministral-3:latest (long documents)
/ollama use fast               # → cogito:3b (quick Q&A)

# 3. Enable RAG if needed
/rag on

# 4. Load context
/read harvey/commands.go
/read-dir harvey/ --depth 1

# 5. Optionally load a skill bundle
/skill-set load fountain

What Harvey can realistically do with small models

Small models (Tier 2) work well when:

- The task is scoped to a single function and you provide the signature
- You have injected the relevant file via /read first
- You ask for one thing at a time (no “also do X and Y and Z”)
- The answer fits in a few hundred tokens

Small models fail when:

- Asked to reason about code they haven’t seen
- Context is filled with unrelated content
- Asked to make design decisions without constraints
- Asked to generate large amounts of code in one shot

Best pattern for small models in Harvey:

  1. /read the specific file or function
  2. Ask a focused question or give a tightly scoped task
  3. Review the output before using /apply
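
Before loading a file into a small model, it can help to estimate whether it fits the window at all. The Python sketch below uses a rough four-characters-per-token rule of thumb, which is an assumption and not a measurement for any model listed here; the function names and the 512-token reserve are likewise illustrative.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a measured value for any model

def estimated_tokens(text: str) -> int:
    """Estimate the token count of text from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(text: str, context_tokens: int, reserve: int = 512) -> bool:
    """True if text plus a reserve for the question and reply fits the window."""
    return estimated_tokens(text) + reserve <= context_tokens

source = "x" * 20_000                      # stands in for a 20,000-character file
print(fits_in_context(source, 4_000))      # a 4K window such as granite3-moe:3b
print(fits_in_context(source, 32_000))     # a 32K window such as smallthinker:latest
```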


Model trust and supply chain

Models pulled via ollama pull are contributed by third parties and downloaded from the Ollama registry. As with any software supply chain, prefer models from well-known organisations or projects you can independently verify. Model weights can encode biases or unexpected behaviours; test new models in a safe-mode session before granting broader permissions.

Keeping this guide current

Run /ollama probe after installing new models to update agents/model_cache.db. Re-examine the capability columns (tools, tagged blocks) and update the inventory table above when results change. The probed_at column tracks when each entry was last verified.
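
One way to spot entries that need re-probing is to query the cache by probed_at. The sketch below runs against an in-memory SQLite database with a hypothetical schema: only the probed_at column is named in this guide, so the table name, the name column, the ISO timestamp format and the sample rows are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

def stale_models(conn: sqlite3.Connection, now: datetime, max_age_days: int = 30) -> list[str]:
    """Return model names whose probed_at is older than max_age_days."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    rows = conn.execute(
        "SELECT name FROM models WHERE probed_at < ? ORDER BY name", (cutoff,)
    )
    return [name for (name,) in rows]

# In-memory stand-in for agents/model_cache.db; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, probed_at TEXT)")
conn.executemany(
    "INSERT INTO models VALUES (?, ?)",
    [("cogito:3b", "2026-01-10T09:00:00"), ("apertus-tools:8b", "2026-05-10T09:00:00")],
)
print(stale_models(conn, now=datetime(2026, 5, 12)))
```

ISO 8601 timestamps compare correctly as plain strings, which is why the query can use a simple less-than test.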