Version 1.0 — Complete guide to model capability caching in Harvey
Harvey’s Model Cache is a SQLite-backed database that stores capability metadata for Ollama models. This caching system significantly speeds up Harvey’s startup time by avoiding the need to re-probe every model on each launch.
| Problem | Without Cache | With Cache |
|---|---|---|
| Slow startup with many models | Probes every model on each startup (5-10s per model) | Loads cached results instantly |
| Redundant network calls | Repeated /api/show requests to Ollama | Single probe per model, cached indefinitely |
| Inconsistent capability detection | Must re-check every time | Results persist until explicitly updated |
The model cache works automatically — no configuration required:
```shell
# First run: probes all installed models and caches results
harvey

# Subsequent runs: loads from cache, much faster
harvey

# Force re-probe a specific model
harvey> /ollama probe llama3.2:latest
```

```
┌─────────────────────────────────────────────────────────────────────┐
│                       MODEL CACHE ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐     ┌─────────────────┐     ┌─────────────┐    │
│  │     Ollama      │     │   Model Cache   │     │   Harvey    │    │
│  │     Server      │────▶│   (SQLite DB)   │────▶│   Startup   │    │
│  │                 │     │                 │     │             │    │
│  │   /api/show     │     │ model_cache.db  │     │ Load cache  │    │
│  │   /api/embed    │     │                 │     │             │    │
│  └─────────────────┘     └─────────────────┘     └─────────────┘    │
│          │                       │                      │           │
│          │ Probe (fast/thorough) │ Update cache         │           │
│          ▼                       ▼                      ▼           │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                      ModelCapability                        │    │
│  │  - name, family, parameter_size, quantization               │    │
│  │  - size_bytes, context_length                               │    │
│  │  - supports_tools, supports_embed (CapabilityStatus enum)   │    │
│  │  - probe_level ("none", "fast", "thorough")                 │    │
│  │  - probed_at (timestamp)                                    │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
The cache lives at `agents/model_cache.db` inside the workspace. Entries are written by `FastProbeModel()` or `ThoroughProbeModel()`, which query Ollama's `/api/show` endpoint (and, for thorough probes, the `/api/embed` endpoint). The schema:

```sql
CREATE TABLE IF NOT EXISTS model_capabilities (
    name           TEXT PRIMARY KEY,
    family         TEXT NOT NULL DEFAULT '',
    parameter_size TEXT NOT NULL DEFAULT '',
    quantization   TEXT NOT NULL DEFAULT '',
    size_bytes     INTEGER NOT NULL DEFAULT 0,
    context_length INTEGER NOT NULL DEFAULT 0,
    supports_tools INTEGER NOT NULL DEFAULT -1,
    supports_embed INTEGER NOT NULL DEFAULT -1,
    probe_level    TEXT NOT NULL DEFAULT 'none',
    probed_at      DATETIME DEFAULT CURRENT_TIMESTAMP
);

PRAGMA foreign_keys = ON;
PRAGMA journal_mode = WAL;
```

| Column | Type | Description |
|---|---|---|
| `name` | TEXT (PK) | Full model identifier, e.g., `llama3.2:latest`, `nomic-embed-text` |
| `family` | TEXT | Model family, e.g., `llama`, `phi`, `mistral`, `nomic` |
| `parameter_size` | TEXT | Human-readable parameter count, e.g., `8.0B`, `70B` |
| `quantization` | TEXT | Quantization level, e.g., `Q4_K_M`, `Q8_0` |
| `size_bytes` | INTEGER | Size on disk in bytes |
| `context_length` | INTEGER | Context window in tokens; 0 = unknown |
| `supports_tools` | INTEGER | `CapabilityStatus` enum: -1 = unknown, 0 = no, 1 = yes |
| `supports_embed` | INTEGER | `CapabilityStatus` enum: -1 = unknown, 0 = no, 1 = yes |
| `probe_level` | TEXT | `"none"`, `"fast"`, or `"thorough"` |
| `probed_at` | DATETIME | When the last probe ran |
**Index:** `name` (the primary key) — fast lookup by model name.

### CapabilityStatus

An enum representing whether a model capability is confirmed, denied, or unknown.
```go
type CapabilityStatus int

const (
    CapUnknown CapabilityStatus = -1 // Not yet probed
    CapNo      CapabilityStatus = 0  // Confirmed absent
    CapYes     CapabilityStatus = 1  // Confirmed present
)
```

String representation:
- `CapUnknown` → `?`
- `CapNo` → `—`
- `CapYes` → `✓`
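The mapping above is naturally expressed as a `String()` method on the enum. The following is a self-contained sketch of that idea (it redeclares the enum so it compiles on its own), not necessarily Harvey's exact code:

```go
package main

import "fmt"

// CapabilityStatus mirrors the enum described above.
type CapabilityStatus int

const (
	CapUnknown CapabilityStatus = -1 // Not yet probed
	CapNo      CapabilityStatus = 0  // Confirmed absent
	CapYes     CapabilityStatus = 1  // Confirmed present
)

// String renders the status using the symbols shown in /ollama list output.
func (c CapabilityStatus) String() string {
	switch c {
	case CapYes:
		return "✓"
	case CapNo:
		return "—"
	default:
		return "?"
	}
}

func main() {
	fmt.Println(CapYes, CapNo, CapUnknown) // ✓ — ?
}
```

Because `fmt` picks up the `Stringer` interface automatically, the symbols appear anywhere a status is printed with `%s` or `%v`.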
### ModelCapability

Holds all cached metadata for a single model.
```go
type ModelCapability struct {
    Name          string           // Full model identifier
    Family        string           // Model family
    ParameterSize string           // Human-readable size
    Quantization  string           // Quantization level
    SizeBytes     int64            // Bytes on disk
    ContextLength int              // Context window in tokens
    SupportsTools CapabilityStatus // Tool/function calling support
    SupportsEmbed CapabilityStatus // Embedding support
    ProbeLevel    string           // "none", "fast", or "thorough"
    ProbedAt      time.Time        // When last probed
}
```

### ModelCache

The main handle for the model cache database.
```go
type ModelCache struct {
    db   *sql.DB
    path string
}

// OpenModelCache opens (or creates) the model cache database.
// customPath overrides the default location (agents/model_cache.db).
func OpenModelCache(ws *Workspace, customPath string) (*ModelCache, error)

// Path returns the absolute path of the cache file.
func (mc *ModelCache) Path() string

// Close releases the database connection.
func (mc *ModelCache) Close() error
```

### Get(name string) (*ModelCapability, error)

Returns the cached capability for the named model, or nil if not found.
```go
cap, err := mc.Get("llama3.2:latest")
if err != nil {
    log.Fatal(err)
}
if cap == nil {
    // Model not in cache
} else if cap.SupportsTools == CapYes {
    fmt.Println("Model supports tools")
}
```

**Parameters:**
- `name` — Full model identifier (e.g., `llama3.2:latest`)

**Returns:**
- `*ModelCapability` — cached entry; nil if not found
- `error` — non-nil on database error (not on a missing row)
### Set(cap *ModelCapability) error

Upserts a `ModelCapability` into the cache. Existing entries are completely replaced.
```go
cap := &ModelCapability{
    Name:          "llama3.2:latest",
    Family:        "llama",
    SupportsTools: CapYes,
    SupportsEmbed: CapNo,
    ProbeLevel:    "fast",
    ProbedAt:      time.Now(),
}
err := mc.Set(cap)
```

**Parameters:**
- `cap` — Capability record to store

**Returns:**
- `error` — non-nil on database write failure
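With `name` as the primary key, the replace-on-conflict behavior maps naturally onto SQLite's upsert syntax. A plausible statement for this schema (a sketch; the actual SQL in `model_cache.go` may differ):

```sql
INSERT INTO model_capabilities
    (name, family, parameter_size, quantization, size_bytes,
     context_length, supports_tools, supports_embed, probe_level, probed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(name) DO UPDATE SET
    family         = excluded.family,
    parameter_size = excluded.parameter_size,
    quantization   = excluded.quantization,
    size_bytes     = excluded.size_bytes,
    context_length = excluded.context_length,
    supports_tools = excluded.supports_tools,
    supports_embed = excluded.supports_embed,
    probe_level    = excluded.probe_level,
    probed_at      = excluded.probed_at;
```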
### Delete(name string) error

Removes the cache entry for the named model.

```go
err := mc.Delete("old-model:tag")
```

**Parameters:**
- `name` — Full model name

**Returns:**
- `error` — non-nil on database write failure
### All() ([]ModelCapability, error)

Returns all cached model capabilities, ordered by name.

```go
allCaps, err := mc.All()
if err != nil {
    log.Fatal(err)
}
for _, cap := range allCaps {
    fmt.Printf("%s (%s): tools=%s embed=%s\n",
        cap.Name, cap.Family, cap.SupportsTools, cap.SupportsEmbed)
}
```

**Returns:**
- `[]ModelCapability` — all cached entries; empty slice if cache is empty
- `error` — non-nil on database read failure
Harvey uses two levels of capability probing, implemented in `ollama.go`.

### Fast Probe (FastProbeModel)

Uses heuristics to determine capabilities from the model's `/api/show` response:
**Tool support detection:**
1. Checks the `Capabilities` array from `/api/show` (authoritative on Ollama ≥ 0.3)
2. Falls back to checking for known tool-call template markers:
   - `{% if tools %}` (Llama 3, Granite - Jinja2)
   - `[TOOL_CALLS]`, `[AVAILABLE_TOOLS]` (Mistral, Ministral)
   - `<tool_call>`, `✿FUNCTION✿` (Qwen 2.x variants)
   - `<function_calls>` (Gemma 4 and others)

**Embedding support detection:**
- Checks whether the model name contains known embedding-model keywords: `embed`, `e5-`, `bge-`, `gte-`, `minilm`, `nomic`, `mxbai`, `jina`

**Advantages:**
- Single API call (`/api/show`)
- Fast execution
- No embedding test request required

**Limitations:**
- Embedding detection is heuristic-based
- May produce false positives for models with embedding keywords in their names
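Both heuristics boil down to plain substring checks. The sketch below follows the marker and keyword lists above, but the function names are illustrative, not Harvey's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// toolTemplateMarkers are substrings whose presence in a model's chat
// template suggests tool-call support (per the list above).
var toolTemplateMarkers = []string{
	"{% if tools %}", "[TOOL_CALLS]", "[AVAILABLE_TOOLS]",
	"<tool_call>", "✿FUNCTION✿", "<function_calls>",
}

// embedKeywords are name fragments that usually indicate an embedding model.
var embedKeywords = []string{
	"embed", "e5-", "bge-", "gte-", "minilm", "nomic", "mxbai", "jina",
}

// looksLikeToolModel scans a chat template for known tool-call markers.
func looksLikeToolModel(template string) bool {
	for _, m := range toolTemplateMarkers {
		if strings.Contains(template, m) {
			return true
		}
	}
	return false
}

// looksLikeEmbedModel scans a model name for embedding-model keywords.
func looksLikeEmbedModel(name string) bool {
	lower := strings.ToLower(name)
	for _, k := range embedKeywords {
		if strings.Contains(lower, k) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksLikeEmbedModel("nomic-embed-text:latest")) // true
	fmt.Println(looksLikeEmbedModel("llama3.2:latest"))         // false
	fmt.Println(looksLikeToolModel("{% if tools %}..."))        // true
}
```

The name-based check illustrates the false-positive limitation directly: any model whose name merely contains `embed` would be flagged, whether or not it can actually embed.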
### Thorough Probe (ThoroughProbeModel)

First runs `FastProbeModel`, then makes a live `/api/embed` request to confirm embedding support:

1. Sends a test request to `/api/embed`
2. On success, sets `SupportsEmbed = CapYes`
3. On failure, sets `SupportsEmbed = CapNo`

**Advantages:**
- Definitive embedding support confirmation
- Accurate results

**Limitations:**
- Requires the embedding model to be loaded in Ollama
- Slower (additional API call)
- Still uses heuristics for tool support
| Probe Level | Tool Detection | Embed Detection | Speed | API Calls |
|---|---|---|---|---|
| none | Not probed | Not probed | Instant | 0 |
| fast | Heuristic + Capabilities | Keyword-based | Fast | 1 |
| thorough | Heuristic + Capabilities | Live test | Slow | 2 |
Harvey automatically probes models when needed:

```go
// In harvey initialization: probe any model not already known
if _, ok := knownModels[name]; !ok {
    cap, err := FastProbeModel(ctx, ollamaURL, name)
    if err == nil {
        cache.Set(cap)
    }
}
```

```shell
# Probe a specific model
harvey> /ollama probe llama3.2:latest

# Probe all installed models
harvey> /ollama probe --all

# List all models with their capabilities
harvey> /ollama list

# The output shows:
# - Model name and family
# - Parameter size and quantization
# - Context length
# - Tool support (✓, —, ?)
# - Embedding support (✓, —, ?)
```

```go
// Open the cache
mc, err := OpenModelCache(ws, "")
if err != nil {
    log.Fatal(err)
}
defer mc.Close()

// Get a specific model's capabilities
cap, err := mc.Get("llama3.2:latest")
if err == nil && cap != nil {
    fmt.Printf("Tools: %s, Embed: %s\n", cap.SupportsTools, cap.SupportsEmbed)
}

// Iterate all cached models
all, _ := mc.All()
for _, c := range all {
    if c.SupportsEmbed == CapYes {
        fmt.Println(c.Name, "supports embeddings")
    }
}
```

**Default location:** `agents/model_cache.db` in the workspace root

**Custom path:**

```go
mc, err := OpenModelCache(ws, "custom/path/model_cache.db")
```

**YAML configuration:**

```yaml
# In harvey.yaml
model_cache:
  path: custom/path/model_cache.db
```

The database is configured with:
- Journal mode: WAL (Write-Ahead Logging)
- Foreign keys: ON
- Max connections: 1 (prevents locking issues)
The file `agents/model_cache.db` contains valuable probe metadata; deleting it forces a full re-probe of every model. Always refer to models by their full identifier, including the tag:

- Valid: `llama3.2:latest`, `nomic-embed-text:latest`, `granite-code:3b`
- Invalid: `llama3.2` (missing tag), `nomic-embed-text` (missing tag)

| Issue | Cause | Solution |
|---|---|---|
| Model not found in cache | Never probed or deleted | Run /ollama probe MODEL |
| Outdated cache entries | Model updated in Ollama | Re-probe the model or delete and re-probe |
| Database locked | Multiple connections | Harvey uses MaxOpenConns(1) to prevent this |
| “None” probe level | Model never probed | Run a probe to populate |
| Incorrect tool support | Heuristic detection failed | Use thorough probe or manually verify |
| Incorrect embed support | Keyword detection failed | Use thorough probe for definitive answer |
To force a fresh probe of a model:

```go
// Delete the old entry
mc.Delete("model:tag")

// Run a new probe
cap, err := FastProbeModel(ctx, ollamaURL, "model:tag")
if err == nil {
    mc.Set(cap)
}
```

If the database file is corrupted:
```shell
# Remove the corrupted file
rm agents/model_cache.db

# Harvey will create a new one on next startup
# All capabilities will be re-probed
```

```shell
# Check the database directly
sqlite3 agents/model_cache.db "SELECT name, probe_level, supports_tools, supports_embed FROM model_capabilities"

# Count entries
sqlite3 agents/model_cache.db "SELECT COUNT(*) FROM model_capabilities"
```

| Probe Type | Time | Network Calls |
|---|---|---|
| Fast | 50-200ms | 1 (/api/show) |
| Thorough | 1-5s | 2 (/api/show + /api/embed) |
Harvey v0.2 and later include automatic model caching.

The cache schema is automatically migrated on open:
- Missing columns are added
- Indexes are created
- Existing data is preserved

No manual migration is needed.
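Additive migration of this kind can be done by diffing the live column set (e.g., from `PRAGMA table_info`) against the desired schema and emitting `ALTER TABLE ... ADD COLUMN` statements for whatever is missing. A self-contained sketch of that diffing step (the column definitions follow the schema above; `missingColumnDDL` is illustrative, not Harvey's actual migrator):

```go
package main

import "fmt"

// desiredColumns lists each schema column and its definition, per the
// CREATE TABLE shown earlier in this document.
var desiredColumns = []struct{ name, def string }{
	{"name", "TEXT PRIMARY KEY"},
	{"family", "TEXT NOT NULL DEFAULT ''"},
	{"parameter_size", "TEXT NOT NULL DEFAULT ''"},
	{"quantization", "TEXT NOT NULL DEFAULT ''"},
	{"size_bytes", "INTEGER NOT NULL DEFAULT 0"},
	{"context_length", "INTEGER NOT NULL DEFAULT 0"},
	{"supports_tools", "INTEGER NOT NULL DEFAULT -1"},
	{"supports_embed", "INTEGER NOT NULL DEFAULT -1"},
	{"probe_level", "TEXT NOT NULL DEFAULT 'none'"},
	{"probed_at", "DATETIME DEFAULT CURRENT_TIMESTAMP"},
}

// missingColumnDDL returns ALTER TABLE statements for desired columns that
// are absent from existing. Defaults on each column mean old rows stay valid.
func missingColumnDDL(existing map[string]bool) []string {
	var ddl []string
	for _, col := range desiredColumns {
		if !existing[col.name] {
			ddl = append(ddl, fmt.Sprintf(
				"ALTER TABLE model_capabilities ADD COLUMN %s %s", col.name, col.def))
		}
	}
	return ddl
}

func main() {
	// An old cache file that predates the probe_level and probed_at columns:
	existing := map[string]bool{
		"name": true, "family": true, "parameter_size": true,
		"quantization": true, "size_bytes": true, "context_length": true,
		"supports_tools": true, "supports_embed": true,
	}
	for _, stmt := range missingColumnDDL(existing) {
		fmt.Println(stmt)
	}
}
```

Because every non-key column carries a `DEFAULT`, adding columns this way preserves existing rows, which is what makes the migration safe to run unconditionally on open.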
| File | Description |
|---|---|
| `harvey/model_cache.go` | Core cache implementation |
| `harvey/ollama.go` | Probing functions (`FastProbeModel`, `ThoroughProbeModel`) |
| `agents/model_cache.db` | Default cache database location |
| `harvey.yaml` | Configuration for cache path |
Documentation generated from model_cache.go and ollama.go source code. Version 1.0.