Version 1.0 — Complete guide to multi-model routing in Harvey
Harvey’s routing system allows you to dispatch prompts to remote LLM endpoints — other Ollama instances on your network, Llamafile servers, or cloud providers (Anthropic, DeepSeek, Gemini, Mistral, OpenAI). This lets you distribute work across a local cluster, escalate complex tasks to cloud models, and compare responses from several models within a single conversation.

Prefix any prompt with @name to send it to a registered remote endpoint:
RSDOIEL
@claude refactor this module
HARVEY
Forwarding to CLAUDE.
CLAUDE
Here is the refactored code...
The response is streamed back into your local conversation history, so Harvey retains full context of the exchange.
# In Harvey REPL:
harvey> /route add pi2 ollama://192.168.1.12:11434 llama3.1:8b
# Or from shell:
harvey --route add pi2 ollama://192.168.1.12:11434 llama3.1:8b

harvey> /route on
harvey> @pi2 explain this algorithm
HARVEY
Forwarding to PI2.
PI2
The algorithm implements a depth-first search with...
Harvey supports 10 endpoint types across local and cloud providers:
| Type | Scheme | Backend | Default Port | Notes |
|---|---|---|---|---|
| Ollama | ollama://host:port | Ollama | 11434 | Official Ollama server |
| Ollama | http://host:port | Ollama | 11434 | HTTP (insecure) |
| Ollama | https://host:port | Ollama | 11434 | HTTPS (secure) |
| Llamafile | llamafile://host:port | Llamafile | 8080 | OpenAI-compatible |
| llama.cpp | llamacpp://host:port | llama.cpp | 8080 | OpenAI-compatible |
| Type | Scheme | Backend | API Key Variable | Notes |
|---|---|---|---|---|
| Anthropic | anthropic:// | Claude | ANTHROPIC_API_KEY | All Claude models |
| DeepSeek | deepseek:// | DeepSeek | DEEPSEEK_API_KEY | All DeepSeek models |
| Gemini | gemini:// | Gemini | GEMINI_API_KEY | Primary |
| Gemini | gemini:// | Gemini | GOOGLE_API_KEY | Fallback |
| Mistral | mistral:// | Mistral | MISTRAL_API_KEY | All Mistral models |
| OpenAI | openai:// | OpenAI | OPENAI_API_KEY | All OpenAI models |
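Since the URL scheme determines the backend type, endpoint registration can dispatch on the scheme alone. A minimal sketch of that mapping in Go follows; `kindForScheme` is an illustrative helper name, not Harvey's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// kindForScheme maps a route URL's scheme to a backend kind, mirroring
// the tables above. The plain http:// and https:// schemes are treated
// as Ollama endpoints, and cloud schemes double as their own kind.
func kindForScheme(rawURL string) string {
	scheme, _, found := strings.Cut(rawURL, "://")
	if !found {
		return "" // not a routable URL
	}
	switch scheme {
	case "ollama", "http", "https":
		return "ollama"
	case "llamafile":
		return "llamafile"
	case "llamacpp":
		return "llamacpp"
	case "anthropic", "deepseek", "gemini", "mistral", "openai":
		return scheme
	default:
		return "" // unknown scheme
	}
}

func main() {
	fmt.Println(kindForScheme("ollama://192.168.1.12:11434")) // ollama
	fmt.Println(kindForScheme("https://my-ollama.example.com")) // ollama
	fmt.Println(kindForScheme("anthropic://"))                // anthropic
}
```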
All registered endpoints and routing state are persisted in JSON format inside the workspace. There is no global route configuration.
Location:
<workspace>/agents/routes.json
Example:
{
"enabled": true,
"endpoints": [
{
"name": "pi2",
"url": "ollama://192.168.1.12:11434",
"model": "llama3.1:8b",
"kind": "ollama"
},
{
"name": "pi3",
"url": "ollama://192.168.1.13:11434",
"model": "mistral:latest",
"kind": "ollama"
},
{
"name": "claude",
"url": "anthropic://",
"model": "claude-3-haiku",
"kind": "anthropic"
},
{
"name": "gemini",
"url": "gemini://",
"model": "gemini-1.5-flash",
"kind": "gemini"
}
]
}

Additional Harvey configuration (knowledge base path, sessions directory, RAG stores, etc.) lives in agents/harvey.yaml inside the workspace.
/route add NAME URL [MODEL]

Register a new remote endpoint.

Arguments:
- NAME: Unique identifier (used in @mentions)
- URL: Full URL including scheme (see table above)
- MODEL: Optional default model for this endpoint
Examples:
/route add pi2 ollama://192.168.1.12:11434 llama3.1:8b
/route add pi2 ollama://192.168.1.12:11434
/route add claude anthropic:// claude-3-haiku
/route add myollama http://localhost:11434
/route add secure https://my-ollama.example.com:443
Notes:
- NAME must be unique across all endpoints
- If MODEL is omitted, you must specify it each time you use the endpoint
- The URL scheme determines the backend type
/route rm NAME

Remove a registered endpoint.
Examples:
/route rm pi2
/route rm claude
/route list

List all registered endpoints with reachability status.
Output:
Endpoints registered:
✓ @pi2 ollama://192.168.1.12:11434 llama3.1:8b
✓ @pi3 ollama://192.168.1.13:11434 mistral:latest
✗ @pi4 ollama://192.168.1.14:11434 (unreachable)
✓ @claude anthropic:// claude-3-haiku
? @gemini gemini:// (API key not set)
Status indicators:
- ✓ (green): Endpoint is reachable
- ✗ (yellow): Endpoint is unreachable
- ? (yellow): API key not configured (cloud providers)
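The reachability check behind /route list can be approximated as a short, timed HTTP probe against Ollama's /api/tags route (the same check the troubleshooting section performs with curl). The `reachable` helper and the scheme rewrite below are assumptions for illustration, not Harvey's actual implementation:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// reachable probes an Ollama endpoint by fetching /api/tags with a
// short timeout. The ollama:// scheme is rewritten to http:// for the
// probe, since the Ollama API is served over plain HTTP by default.
func reachable(url string) bool {
	probe := strings.Replace(url, "ollama://", "http://", 1) + "/api/tags"
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(probe)
	if err != nil {
		return false // connection refused, timeout, DNS failure, etc.
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	fmt.Println(reachable("ollama://192.168.1.12:11434"))
}
```

A real status listing would run this probe per endpoint and print ✓ or ✗ accordingly; cloud providers would instead check that the relevant API key variable is set.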
/route on

Enable routing. State is persisted to agents/routes.json.

/route off

Disable routing. @mentions will be rejected. State is persisted to agents/routes.json.

/route status

Show routing state and endpoint count.
Output:
Routing: on
Endpoints: 5 registered
Active endpoint: none (routing uses @mention)
# Register cluster nodes
harvey> /route add pi2 ollama://192.168.1.12:11434 llama3.1:8b
harvey> /route add pi3 ollama://192.168.1.13:11434 mistral:latest
harvey> /route add pi4 ollama://192.168.1.14:11434 phi3:latest
# Enable routing
harvey> /route on
# Distribute work
harvey> @pi2 write unit tests for the parser package
harvey> @pi3 review the API design document
harvey> @pi4 generate documentation
# Use local model for most work
harvey> Explain this Go code
# Switch to cloud for complex tasks
harvey> @claude This is a complex architecture decision, give me options
# Same prompt to multiple models
harvey> @llama3 How would you refactor this?
harvey> @mistral How would you refactor this?
harvey> @claude How would you refactor this?
# Compare responses in the same conversation
When you dispatch a prompt to a remote endpoint via @mention, Harvey sends the system prompt plus the last 10 messages of the local conversation, with the @mention prefix stripped from the final prompt.
Why 10 messages? This gives the remote model enough context without sending the entire conversation (which may exceed the remote model’s context window or waste tokens).
Example:
Local history:
[system] HARVEY.md content
[user] Read the parser code
[assistant] Here is the parser code...
[user] The ParseExpr function has a bug
[assistant] Yes, it doesn't handle empty input...
[user] @claude How should we fix this?
Sent to CLAUDE:
[system] HARVEY.md content
[user] Read the parser code
[assistant] Here is the parser code...
[user] The ParseExpr function has a bug
[assistant] Yes, it doesn't handle empty input...
[user] How should we fix this?
The @claude prefix is removed from the sent prompt.
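The windowing and prefix-stripping described above can be sketched as follows. `Message` and `forwardWindow` are illustrative names under the stated assumptions (system prompt kept, last 10 messages, @mention removed), not Harvey's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a minimal stand-in for one chat history entry.
type Message struct {
	Role    string
	Content string
}

// forwardWindow builds the payload sent to a remote endpoint: the
// system prompt plus the most recent n messages, with the "@name"
// prefix stripped from the final user prompt.
func forwardWindow(history []Message, n int, mention string) []Message {
	var system []Message
	rest := history
	if len(history) > 0 && history[0].Role == "system" {
		system = history[:1]
		rest = history[1:]
	}
	if len(rest) > n {
		rest = rest[len(rest)-n:] // keep only the last n messages
	}
	out := append(append([]Message{}, system...), rest...)
	if len(out) == 0 {
		return out
	}
	last := &out[len(out)-1]
	last.Content = strings.TrimSpace(strings.TrimPrefix(last.Content, "@"+mention))
	return out
}

func main() {
	h := []Message{
		{Role: "system", Content: "HARVEY.md content"},
		{Role: "user", Content: "The ParseExpr function has a bug"},
		{Role: "assistant", Content: "Yes, it doesn't handle empty input..."},
		{Role: "user", Content: "@claude How should we fix this?"},
	}
	for _, m := range forwardWindow(h, 10, "claude") {
		fmt.Printf("[%s] %s\n", m.Role, m.Content)
	}
}
```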
| Good | Bad | Reason |
|---|---|---|
| pi2, pi3, pi4 | node2, server2 | Consistent with role |
| claude, mistral | model1, api1 | Descriptive of model |
| gpu-box, big-iron | machine, computer | Descriptive of hardware |
| dev, prod, test | d, p, t | Clear purpose |
Recommendations:
- Use lowercase names (Harvey normalizes to lowercase internally)
- Use hyphens for multi-word names: gpu-box, code-review
- Avoid spaces and special characters
- Keep names short (they are used frequently in @mentions)
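The recommendations above can be captured as a small normalization/validation helper. The exact rules are an assumption for illustration (lowercase, hyphens allowed, no spaces or special characters); Harvey's internal validation may differ:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validName accepts lowercase alphanumerics with interior hyphens,
// matching the naming recommendations above. Illustrative only.
var validName = regexp.MustCompile(`^[a-z0-9][a-z0-9-]*$`)

// normalizeName lowercases and trims a proposed route name, and
// reports whether the result is acceptable.
func normalizeName(name string) (string, bool) {
	n := strings.ToLower(strings.TrimSpace(name))
	return n, validName.MatchString(n)
}

func main() {
	fmt.Println(normalizeName("GPU-Box")) // gpu-box true
	fmt.Println(normalizeName("my node")) // "my node" false (contains a space)
}
```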
By hardware:
/route add pi2 ollama://192.168.1.12:11434
/route add pi3 ollama://192.168.1.13:11434
/route add pi4 ollama://192.168.1.14:11434
By capability:
/route add coder ollama://192.168.1.100:11434 codellama:13b
/route add reasoner ollama://192.168.1.100:11434 llama3:70b
/route add embedder ollama://192.168.1.100:11434 nomic-embed-text
By project:
/route add harvey ollama://localhost:11434 llama3.1:8b
/route add mable ollama://localhost:11434 mistral:latest
For coding tasks:
- Local: codellama:13b, llama3:latest
- Cloud: @claude (Claude 3 Sonnet/Haiku)

For reasoning/complex tasks:
- Local: llama3:70b (if you have the VRAM)
- Cloud: @claude (Claude 3 Opus)

For embedding:
- Local: nomic-embed-text, mxbai-embed-large

For small/quick tasks:
- Local: llama3:8b, phi3:latest
- Cloud: @mistral (Mistral Small)
Cause: The endpoint hasn’t been registered.
Fix:
/route add name URL [MODEL]
Cause: Routing has been turned off.
Fix:
/route on
Causes:
- Ollama server not running
- Wrong IP/port
- Firewall blocking connection
- Ollama not installed on remote machine
Fix:
# Check if Ollama is running on the remote
ssh pi2 ollama --version
# Start Ollama on the remote
ssh pi2 ollama serve
# Verify connectivity
curl http://192.168.1.12:11434/api/tags

Cause: The required environment variable isn’t set.
Fix:
# For current session
export ANTHROPIC_API_KEY=sk-...
harvey
# Permanently (add to ~/.bashrc or ~/.zshrc)
echo 'export ANTHROPIC_API_KEY=sk-...' >> ~/.bashrc
source ~/.bashrc

Cause: The model hasn’t been pulled on the remote Ollama instance.
Fix:
# On the remote machine
ssh pi2 ollama pull llama3.1:8b

Or pull it directly through Harvey:
harvey> /ollama pull llama3.1:8b
(Note: This pulls on the local machine, not the remote)
Routes can be managed programmatically via the
RouteRegistry type in Go code.
Example (from Harvey source):
// Add a route programmatically
rr := agent.Routes
rr.Add(&RouteEndpoint{
Name: "pi5",
URL: "ollama://192.168.1.15:11434",
Model: "llama3.1:8b",
Kind: KindOllama,
})
// Save to disk
SaveRouteConfig(rr)

Never commit API keys to version control.
Recommended approach:
# Set in shell profile (not in repository)
echo 'export ANTHROPIC_API_KEY=...' >> ~/.bashrc
# Or use a secrets manager
# Or fetch from a secrets manager, e.g. HashiCorp Vault
# (field name depends on how the secret was written):
export ANTHROPIC_API_KEY="$(vault read -field=api_key secret/harvey/anthropic)"

Environment variable precedence:
- Harvey checks environment variables at startup
- Changing them requires restarting Harvey
- Each cloud provider uses its own variable name
For local network routes:
- Use HTTPS if exposed to untrusted networks
- Consider firewall rules to restrict access
- Use SSH tunneling for remote access

For cloud providers:
- API keys have the same permissions as your account
- Monitor usage to detect anomalies
- Rotate keys periodically
Title: Multi-Model Development Session
Credit: Recorded by Harvey
Author: RSDOIEL
Date: 2026-05-04 14:00:00
Draft date: 2026-05-04
FADE IN:
INT. HARVEY AND RSDOIEL TALKING 2026-05-04 14:00:00
Harvey and RSDOIEL are in chat mode. Model: llama3:latest. Workspace: /home/user/project.
RSDOIEL
/route list
HARVEY
Endpoints registered:
✓ @pi2 ollama://192.168.1.12:11434 llama3.1:8b
✓ @pi3 ollama://192.168.1.13:11434 mistral:latest
✓ @claude anthropic:// claude-3-haiku
RSDOIEL
@pi2 write a unit test for the Parse function
HARVEY
Forwarding to PI2.
PI2
Here is a comprehensive test for Parse:
```go parser_test.go
func TestParse(t *testing.T) {
// ... test code
}
```

RSDOIEL
/apply

INT. AGENT MODE 2026-05-04 14:05:00

HARVEY
Harvey proposes to write 1 file(s) to the workspace.

HARVEY
Write parser_test.go?

RSDOIEL
yes

[[write: parser_test.go — ok]]

INT. HARVEY AND RSDOIEL TALKING 2026-05-04 14:06:00

Harvey and RSDOIEL are in chat mode. Model: llama3:latest. Workspace: /home/user/project.

RSDOIEL
@claude review this test and suggest improvements

HARVEY
Forwarding to CLAUDE.

CLAUDE
The test is well-structured. Consider adding:
1. Edge case for empty input
2. Property-based testing
3. Benchmark

THE END.
See also:
- CONFIGURATION.md — All configuration files
- fountain-analysis skill — Analyze session files
- ARCHITECTURE.md — Technical deep dive on routing implementation