Harvey — Architecture & UX Decision Log

This file records significant architectural and UX decisions, their rationale, and known trade-offs. New decisions are added at the top. Each entry names the decision, the context that prompted it, the chosen approach, the rejected alternatives, and the consequences.

2026-06-25 — Source registry lives in `knowledge.db`; not a separate database

Context. The scholarly provenance design (see scholarly-provenance-design.md) requires a sources authority table and an observation_sources join table. Two placement options were considered: a new provenance.db alongside knowledge.db, or new tables inside the existing knowledge.db.

Decision. Add sources and observation_sources directly to knowledge.db. The sources table needs to join against observations, concepts, and kb_fts, all of which live in knowledge.db. SQLite cross-database joins via ATTACH DATABASE cannot use foreign keys and require every query to name the attached database alias, making all query code more fragile. A single database with multiple tables is the correct SQLite idiom.

Rejected alternatives.

Separate provenance.db — eliminates foreign keys between observations and sources; requires ATTACH in every query that spans the two files; adds a new runtime file that users must back up and move with their workspace.
In-memory provenance (no persistence) — RAG provenance that disappears on session end has no scholarly value. The whole point is a durable, auditable record.

Consequences.

knowledge.go gains DDL for sources and observation_sources in its Open path.
The data migration from observations.source_doi runs once on first open after upgrade; source_doi is retained as a read-only backward-compat column.
No new runtime files are introduced; agents/knowledge.db remains the single knowledge-base file.
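
As a rough illustration of what the DDL and one-time migration could look like, the sketch below assembles the statements as Go string constants. Only the table names and observations.source_doi come from this log; every other column name (id, doi, title, observation_id, source_id) and the helper schemaStatements are assumptions made for the example, and no SQLite driver is invoked.

```go
package main

import "fmt"

// schemaStatements returns hypothetical DDL plus a one-time migration for
// the source registry. Column names other than observations.source_doi
// are assumptions, not Harvey's actual schema.
func schemaStatements() []string {
	return []string{
		// Authority table for source metadata.
		`CREATE TABLE IF NOT EXISTS sources (
			id    INTEGER PRIMARY KEY,
			doi   TEXT UNIQUE,
			title TEXT
		)`,
		// Join table; foreign keys work because both tables share one file.
		`CREATE TABLE IF NOT EXISTS observation_sources (
			observation_id INTEGER NOT NULL REFERENCES observations(id),
			source_id      INTEGER NOT NULL REFERENCES sources(id),
			PRIMARY KEY (observation_id, source_id)
		)`,
		// Migration step 1: one sources row per distinct legacy DOI.
		`INSERT OR IGNORE INTO sources (doi)
			SELECT DISTINCT source_doi FROM observations
			WHERE source_doi IS NOT NULL AND source_doi <> ''`,
		// Migration step 2: link each observation to its migrated source.
		`INSERT OR IGNORE INTO observation_sources (observation_id, source_id)
			SELECT o.id, s.id FROM observations o
			JOIN sources s ON s.doi = o.source_doi`,
	}
}

func main() {
	for i, stmt := range schemaStatements() {
		fmt.Printf("-- statement %d\n%s;\n", i+1, stmt)
	}
}
```

SQLite enforces foreign keys only when PRAGMA foreign_keys = ON is set on the connection, so an Open path along these lines would need to set it before relying on the constraints.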

2026-06-25 — Scholarly provenance: inference-time only; training-time attribution deferred

Context. Two Scholarly Kitchen articles (2026-06-17 and 2026-06-25) and the Cambridge Scholarly AI Workshop identified that AI systems interact with scholarly content at two points: training time (content absorbed into model weights) and inference time (content retrieved and injected via RAG at query time). The workshop explicitly classified training-time attribution as technically intractable at current model scales and recommended focusing practical interventions on inference-time retrieval.

Harvey’s architecture makes inference-time provenance fully tractable: the RAG pipeline (ragAugment, RagStore.Query) has complete observability of what was retrieved and from where. Training-time attribution for Ollama or Llamafile models is not accessible to Harvey and would require coordination with model providers.

Decision. The scholarly provenance work (v0.0.15) focuses entirely on inference-time provenance:

1. A minimum provenance payload on RAG chunks (source, DOI, title, version, rights, content hash, retraction flag).
2. A source registry in knowledge.db as the authority for source metadata, linked to observations via observation_sources.
3. Per-source [[rag-source: ...]] Fountain notes so session files serve as citable records of what evidence informed each response.
4. HARVEY.md system-prompt guidance to retrieve before generating and to attribute content at the point of use, not post-hoc.

Training-time attribution is explicitly deferred and recorded as out of scope, not a gap in the design.
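
The minimum provenance payload could be modelled roughly as follows. This is a sketch under stated assumptions: the type ChunkProvenance, its field names, the helper newProvenance, and the use of SHA-256 for the content hash are all illustrative; the log specifies only which pieces of information the payload carries.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChunkProvenance mirrors the payload items named in the decision.
// Field names are assumptions for illustration.
type ChunkProvenance struct {
	Source      string
	DOI         string
	Title       string
	Version     string
	Rights      string
	ContentHash string // hex digest of the chunk text
	Retracted   bool
}

// newProvenance fills ContentHash from the chunk text. SHA-256 is an
// assumed choice; the log only says "content hash".
func newProvenance(source, doi, title, version, rights, text string) ChunkProvenance {
	sum := sha256.Sum256([]byte(text))
	return ChunkProvenance{
		Source:      source,
		DOI:         doi,
		Title:       title,
		Version:     version,
		Rights:      rights,
		ContentHash: hex.EncodeToString(sum[:]),
	}
}

func main() {
	p := newProvenance("example.pdf", "10.0000/example", "Example Paper",
		"v1", "CC-BY-4.0", "chunk text goes here")
	fmt.Printf("%+v\n", p)
}
```

Hashing the chunk text rather than the whole source file lets a later audit confirm that the exact retrieved passage is unchanged, at the cost of one hash per chunk.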

Rejected alternatives.

Attempt training-data disclosure via model metadata — Ollama’s /api/show endpoint returns a Modelfile and template but not a training corpus manifest. No standard interface exists. Not tractable.
Restrict Harvey to models with published data cards — would exclude most locally-available models and undermine the local-first principle.

Consequences.

See scholarly-provenance-design.md for the full architecture and scholarly-provenance-plan.md for the phased implementation.
Provenance metadata added to chunks schema (S1), source registry added to knowledge.db (S2), Fountain notes enhanced (S3), /kb commands extended (S4).

2026-06-24 — INT./EXT. scene prefix redefined as local/remote computation

Context. The original Fountain format spec (v1.0–1.1) defined INT. as “Harvey is involved as orchestrator” and EXT. as “direct model-human conversation without Harvey.” This made EXT. scenes effectively hypothetical — the recorder never wrote one, because Harvey is always involved. Remote Ollama route dispatches (e.g. @pi2) and cloud API calls were both recorded as INT. despite running on remote machines. The distinction was meaningless in practice.

Decision. Redefine the prefix semantically as location of computation: INT. = runs on the local machine where Harvey is running; EXT. = runs on a remote system. This maps naturally to the theatrical meaning (interior/exterior), gives EXT. scenes real-world frequency, and encodes practically important information (network latency, data exposure, cost). Remote Ollama routes and cloud API routes are now EXT. HARVEY still appears in EXT. scene dialogue as the forwarding character when Harvey initiated the route dispatch; HARVEY is absent only in truly direct conversations (no Harvey involvement). The RecordExteriorTurn recorder method writes EXT. scenes; RecordTurnWithStats continues to write INT. scenes.
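
The locality rule can be sketched as a small classification function. The dispatchKind type and its constants are hypothetical names for this example; only the mapping itself (local computation is INT., registered route dispatch is EXT.) comes from the decision.

```go
package main

import "fmt"

// dispatchKind is a hypothetical classification of how a turn is served.
type dispatchKind int

const (
	localChat        dispatchKind = iota // chat turn on the active local model
	localModelSwitch                     // @mention resolved by a local model switch
	routeDispatch                        // registered route: remote Ollama or cloud API
)

// scenePrefix returns the Fountain prefix for a turn. Only registered
// route dispatches run on a remote system, so only they are EXT.
func scenePrefix(k dispatchKind) string {
	if k == routeDispatch {
		return "EXT."
	}
	return "INT."
}

func main() {
	for _, k := range []dispatchKind{localChat, localModelSwitch, routeDispatch} {
		fmt.Println(k, scenePrefix(k))
	}
}
```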

Rejected alternatives.

Keep the old semantic, just document it better — the old definition made EXT. permanently dead code and gave parsers no useful locality signal. The new semantic costs nothing to implement and adds real diagnostic value.
Use a Remote: true field in the scene description instead of the prefix — keeps the prefix consistent but buries locality in metadata. The theatrical prefix is the primary structural signal in Fountain; using it for locality is more idiomatic.
Make every forwarded turn EXT. regardless of locality — local model-switch via @mention (where attemptModelSwitch succeeds) is still local computation. Only registered route dispatches (DispatchToEndpoint) are genuinely remote.

Consequences.

FOUNTAIN_FORMAT.md updated to v1.2 with corrected INT./EXT. definitions, updated scenarios 2 and 3, and updated best practices.
recorder.go gains RecordExteriorTurn(endpoint, userInput, reply string).
terminal.go route dispatch path calls RecordExteriorTurn instead of RecordTurn.
Existing .spmd session files recorded before v0.0.15 have INT. for route dispatches — this is a known inaccuracy, not a migration target.

2026-06-24 — Fountain sessions become full audit trails (v0.0.15)

Context. Corin Wagen’s article “Tool Use and AI Scientists” argues that tool calls are the primary mechanism for AI interpretability — the decision trace of what an agent chose to do and why. Harvey’s Fountain session files record dialogue, file writes, and shell commands, but tool calls appear only as unstructured prose (“Harvey calls read_file: {args}”), tool results are not recorded, RAG context retrieval leaves no trace, and memory injection at session start is invisible. See audit-trail-design.md.

Decision. Extend the Fountain format to v1.2 with four new audit elements. A Harvey session file is a sequence of many scenes — one per discrete interaction (chat turn, shell command, file write, skill activation). The placement of new elements respects this: notes go inside existing scenes; only one new scene type is added.

[[tool: name(args) — status]] notes replace prose action blocks for tool calls. They appear inside the existing INT. HARVEY AND … TALKING scene for the turn where the tool loop ran, between HARVEY’s forwarding line and the model’s reply. Multiple tool-call rounds within one turn produce multiple flat notes in the same scene — no new scene is opened per round.
[[CHARACTER.tool: name(args) — status]] variant attributes tool calls to a forwarded model in @mention turns. Same placement as above; only the prefix changes.
[[rag: N chunks from STORE, top score S.SS]] notes record RAG retrieval inside the existing INT. HARVEY AND … TALKING scene for the turn where RAG fired, before the user dialogue line. Turns where RAG did not fire have no [[rag:]] note.
INT. CONTEXT RECALL TIMESTAMP is the only new scene type. It appears once at session start — before the first chat turn — when UnifiedMemory.Recall injects memories. It contains [[recall: ID (SOURCE) — score S.SS]] notes, one per recalled item.
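
Two of the formatting rules above lend themselves to a short sketch: rendering the [[rag:]] note, and reducing a tool result to the status-only form (ok, or error plus the first line). The function names ragNote and toolStatus are assumptions, as is the convention that an empty error text means success.

```go
package main

import (
	"fmt"
	"strings"
)

// ragNote renders the [[rag: N chunks from STORE, top score S.SS]] note.
func ragNote(chunks int, store string, topScore float64) string {
	return fmt.Sprintf("[[rag: %d chunks from %s, top score %.2f]]",
		chunks, store, topScore)
}

// toolStatus reduces a tool outcome to the status-only form. An empty
// errText is treated as success (an assumption for this sketch);
// otherwise only the first line of the error text is kept.
func toolStatus(errText string) string {
	if errText == "" {
		return "ok"
	}
	first, _, _ := strings.Cut(errText, "\n")
	return "error: " + first
}

func main() {
	fmt.Println(ragNote(3, "papers", 0.8712))
	fmt.Println(toolStatus(""))
	fmt.Println(toolStatus("open notes.txt: no such file\nstack trace follows"))
}
```

Keeping only the first error line bounds the size of each note, which matches the stated goal of not bloating session files with full tool output.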

Rejected alternatives.

Bridge audit.jsonl and Fountain — routing AuditBuffer events to the recorder would couple two unrelated systems (security audit vs. session narrative) and require the audit buffer to hold a recorder reference. Rejected: keep them separate.
Full tool result content in Fountain — maximally auditable but bloats session files and degrades memory miner quality for large read_file or search outputs. Status-only (ok / error: first line) achieves the diagnostic goal.
RAG note in the scene description block — the scene description is written at scene open; RAG fires later in runChatTurn. A separate note just before user dialogue is temporally accurate.
INT. TOOL LOOP scene per tool-call round — a multi-round tool loop (model calls tool, gets result, calls another tool, gets result, produces final answer) could open a new scene for each round. Rejected: a “turn” from the user’s perspective is one request-response cycle; splitting it across multiple scenes makes the session harder to read and harder for the memory miner to extract question-answer pairs. Flat notes inside the single turn scene preserve both.
Per-message character attribution via Message.Model — accurate multi-round character attribution requires tagging each Message, which ripples through serialisation, compaction, and replay. Deferred: single character per turn covers the real-world case.

Consequences.

recorder.go: ToolCallRecord gains Result and Character fields; RAGAugmentInfo struct added; RecordTurnWithStats gains ragInfo *RAGAugmentInfo parameter; RecordContextRecall method added.
terminal.go: ragAugment returns (string, *RAGAugmentInfo); toolCallsFromHistory gains charName string parameter; runChatTurn gains charName string parameter.
harvey.go: injectMemoryContext calls a.Recorder.RecordContextRecall when results are non-empty.
tool_executor.go: ToolExecutor gains CharacterName string field.
FOUNTAIN_FORMAT.md updated to v1.2 with new syntax and scene type.
All existing callers of RecordTurnWithStats pass nil for the new ragInfo parameter; all callers of toolCallsFromHistory pass "" for charName except the @mention local-switch path.

2026-06-20 — Command vocabulary standardised across all resource-management commands

Context. Harvey’s command families share a common resource-management pattern but use inconsistent verbs: /llamafile drop, /rag drop, /route rm, and /model alias delete all mean the same thing; /skill info and /skill-set info duplicate /memory profile show’s pattern under a different name; /session has no list or show; /route has no use. Users must learn each command family independently rather than applying a single vocabulary pattern. See llamafile-primary-design.md and TODO.md.

Decision. Standardise on eight core verbs for all resource-management commands: list, add (register external resource), new (create internal item), use (activate), show (display content/details), edit (open in $EDITOR), remove (delete/unregister), rename. Backend service commands additionally support start, stop, and status (health/connection — distinct from show). The add vs new distinction is preserved: add registers something that already exists externally (a file path, a URL); new creates something Harvey owns (a database, a skill, a plan). Existing non-standard verbs (drop, rm, info, create, set) are kept as backward-compatible aliases; the canonical verb is the one documented and tab-completed.
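
One way to keep legacy verbs working is a lookup from alias to canonical verb applied before dispatch. The sketch below is illustrative only: the names verbAliases and canonicalVerb are assumptions, and the individual mappings are inferred from the examples in this entry. The targets for set and continue are not stated here, so they are left out.

```go
package main

import "fmt"

// verbAliases maps legacy verbs to canonical verbs. The pairings are
// inferred from the examples in this entry, not taken from Harvey's code.
var verbAliases = map[string]string{
	"drop":   "remove",
	"rm":     "remove",
	"delete": "remove",
	"info":   "show",
	"create": "new",
}

// canonicalVerb returns the canonical form of a subcommand verb, or the
// verb itself when it is already canonical or unknown.
func canonicalVerb(v string) string {
	if c, ok := verbAliases[v]; ok {
		return c
	}
	return v
}

func main() {
	for _, v := range []string{"drop", "rm", "info", "create", "list"} {
		fmt.Printf("%s -> %s\n", v, canonicalVerb(v))
	}
}
```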

Rejected alternatives.

Rename only the worst offenders — partial fixes leave the vocabulary inconsistent enough that users still cannot predict subcommands. The value comes from universal coverage.
Single delete verb everywhere — delete implies permanent destruction; remove better conveys “unregister from Harvey’s knowledge” (the underlying file or database is not deleted).
Collapse add and new into a single verb — the distinction maps to a real semantic difference users already understand. add = “I have a thing, register it”; new = “create a thing for me”.

Consequences.

/rag remove, /route remove, /session list, /session show, /session use, /llamafile show, /rag show, /route use, /skill show, /skill-set new, /skill-set show, /model alias add are all new subcommand aliases or additions.
Existing verbs (drop, rm, info, create, set, continue) remain as aliases; no existing scripts or muscle-memory broken.
user_manual.md and getting-started.md gain a “Command vocabulary” section explaining the eight verbs once, making every command family self-documenting.
Tab completion ArgCompletion maps for each command are updated to list canonical verbs first.

2026-06-20 — Llamafile becomes the primary model backend; Ollama is secondary

Context. Harvey has supported both Llamafile and Ollama since v0.0.11, but startup logic, documentation, and default prompts all treat Ollama as the assumed backend. New users who want a fully local, no-server-required setup must discover Llamafile through man pages rather than finding it naturally in the startup flow. See llamafile-primary-design.md.

Decision. Reverse the priority: at startup Harvey probes for an active Llamafile first, registered Llamafiles second, Ollama third. The model picker (shown when no session is being continued) lists Llamafile models above Ollama models. getting-started.md and INSTALL.md lead with the Llamafile path; Ollama is documented as an advanced alternative. Ollama support is fully retained — no existing config or commands change.
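
The new startup order amounts to trying a fixed list of probes and taking the first hit. A minimal sketch, assuming hypothetical probe functions supplied by the caller (the real checks live in terminal.go and are not shown in this log):

```go
package main

import "fmt"

// probe pairs a backend label with a check that reports availability.
type probe struct {
	name  string
	found func() bool
}

// startupProbes lists the checks in the priority order from the decision:
// active Llamafile, then registered Llamafiles, then Ollama.
func startupProbes(active, registered, ollama func() bool) []probe {
	return []probe{
		{"active llamafile", active},
		{"registered llamafile", registered},
		{"ollama", ollama},
	}
}

// selectBackend returns the first backend whose probe succeeds, or ""
// when none is available.
func selectBackend(probes []probe) string {
	for _, p := range probes {
		if p.found() {
			return p.name
		}
	}
	return ""
}

func main() {
	yes := func() bool { return true }
	no := func() bool { return false }
	fmt.Println(selectBackend(startupProbes(no, yes, yes)))
}
```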

Rejected alternatives.

Keep Ollama as primary, improve Llamafile docs only — documentation-only change leaves the startup UX inconsistent with the stated priority. New users still encounter Ollama first.
Detect “better” backend heuristically (GPU present → Llamafile, else Ollama) — GPU detection is platform-specific and error-prone. User intent (registered a Llamafile → prefer Llamafile) is a cleaner signal.
Single preferred_backend setting in harvey.yaml — adds config surface without improving the default experience for users who have not read the config docs.

Consequences.

Startup probing order changes in terminal.go backend selection block.
getting-started.md and harvey-getting-started.7.md are rewritten.
The model picker presents Llamafile entries before Ollama entries.
No breaking changes to harvey.yaml schema, API, or slash commands.

2026-06-20 — At-mention (`@model`) switches the active model while preserving history

Context. Switching models mid-session requires /llamafile use NAME or /ollama use NAME, which breaks conversational flow. Users who want a different model for the next question should be able to express that inline. The theatrical framing — a model switch is a new character entering the scene — also clarifies how downstream systems (memory miner, replay, plan executor) should handle boundaries. See llamafile-primary-design.md.

Decision. If the REPL input begins with @name where name matches a registered Llamafile or Ollama model, Harvey switches to that model and forwards the remainder as the prompt. Conversation history is preserved unchanged. If @name is not recognised, the whole input is forwarded to the current model without warning (false positives on natural @ mentions are rare enough that silent pass-through is less surprising than an error). Mid-session switches are recorded in the session file as [[model switch: NAME (BACKEND) at TIMESTAMP]] Fountain notes rather than starting a new session file — continuing in the same file preserves pre-switch context for memory mining and replay. The memory miner, session replay, and plan executor each gain logic to track model attribution across switch boundaries.
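
The input rule can be sketched as a pure function. The name parseMention and the set-of-known-names parameter are assumptions for the example; the behaviour follows the decision: a recognised @name switches and forwards the remainder, anything else passes through untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMention splits "@name rest" when name is a registered model.
// For unknown names, or input without a leading @, the whole input is
// returned as the prompt and switched is false (silent pass-through).
func parseMention(input string, known map[string]bool) (model, prompt string, switched bool) {
	if !strings.HasPrefix(input, "@") {
		return "", input, false
	}
	// A space separates the mention from the prompt; no colon is needed.
	name, rest, _ := strings.Cut(input[1:], " ")
	if !known[name] {
		return "", input, false
	}
	return name, strings.TrimSpace(rest), true
}

func main() {
	known := map[string]bool{"qwen": true}
	fmt.Println(parseMention("@qwen explain goroutines", known))
	fmt.Println(parseMention("@alice said the build is broken", known))
}
```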

Rejected alternatives.

Error if @name is unknown — would break natural-language inputs that begin with a person or file mention.
Require separator syntax @name: rest — adds friction; a space is sufficient and consistent with how @route mentions already work in routing.
Start a new session file on switch — orphans the pre-switch context; the [[model switch: ...]] note preserves the boundary without splitting the file.

Consequences.

terminal.go REPL input handler gains an @ prefix check before the /command check.
attemptModelSwitch(a, name, out) looks up Llamafiles first, then Ollama models.
Recorder.RecordModelSwitch(model, backend) writes a Fountain note at the switch point.
NewRecorder gains a Backend: title-page field.
Memory miner splits sessions at switch notes and attributes turns to the generating model.
Session replay parses switch notes and performs mid-replay model switches.
Plan executor supports [model: name] step annotations and restores the default model after each annotated step.

2026-06-20 — Unified `/model` command as a backend-agnostic delegating facade

Context. Users who switch between Llamafile and Ollama must remember which backend is active to choose the right command. As more backends are added (remote routes, encoderfiles), per-backend command proliferation increases cognitive load for users who just want to switch models. See llamafile-primary-design.md.

Decision. Add /model [list|use NAME|show NAME|status] as a backend-agnostic facade. /model use NAME resolves the name by checking Llamafile models first, then Ollama models, then named routes, and delegates to the appropriate backend command. /model list merges all backends into one sorted table. The backend-specific commands (/llamafile, /ollama) are unchanged and remain the authoritative interfaces for backend-specific operations (/llamafile start, /ollama pull, etc.).
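
A minimal sketch of the facade's two lookups follows. The registry and entry types are hypothetical stand-ins, and sorting the merged list by name is an assumption; the log says only that the table is sorted.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one row of the merged model table.
type entry struct{ Name, Backend string }

// registry is a hypothetical stand-in for the names each backend knows.
type registry struct{ Llamafiles, Ollama, Routes []string }

// ordered returns all entries in resolution order:
// Llamafile first, then Ollama, then named routes.
func (r registry) ordered() []entry {
	var out []entry
	for _, n := range r.Llamafiles {
		out = append(out, entry{n, "llamafile"})
	}
	for _, n := range r.Ollama {
		out = append(out, entry{n, "ollama"})
	}
	for _, n := range r.Routes {
		out = append(out, entry{n, "route"})
	}
	return out
}

// resolve reports which backend should handle "/model use NAME".
func (r registry) resolve(name string) (string, bool) {
	for _, e := range r.ordered() {
		if e.Name == name {
			return e.Backend, true
		}
	}
	return "", false
}

// list merges all backends into one table sorted by name (assumed key).
func (r registry) list() []entry {
	out := r.ordered()
	sort.SliceStable(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	r := registry{
		Llamafiles: []string{"qwen"},
		Ollama:     []string{"qwen", "granite"},
		Routes:     []string{"pi2"},
	}
	fmt.Println(r.resolve("qwen"))
	fmt.Println(r.list())
}
```

Because the facade only decides which backend owns a name, the actual switch can stay in the existing backend handlers, as the decision requires.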

Rejected alternatives.

Deprecate /llamafile and /ollama in favour of /model — too disruptive; power users and scripts depend on the specific subcommands.
/model with no subcommand shows an interactive picker — inconsistent with Harvey’s pattern: pickers appear when a required argument is omitted from a subcommand, not when the command itself is invoked without arguments.
Top-level /use NAME — shorter but conflicts with the established convention that use appears only as a subcommand.

Consequences.

commands.go gains a "model" registration; cmdModel dispatcher added.
/model use NAME resolves across backends; no new switching code — delegates to existing handlers.
helptext.go gains ModelHelpText and ModelAliasHelpText (the latter covering both /model alias subcommands and @mention switching; source for regenerating the currently sourceless harvey-model-alias.7.md).

2026-06-20 — Context utilization reads `n_ctx` from `/v1/models`; config override available

Context. A [ctx: N%] indicator requires knowing both the current token count (available from ChatStats.PromptTokens) and the model’s maximum context window. Context window size is model-specific and not always available at runtime. See llamafile-primary-design.md.

Decision. Priority order for context length: (1) context_length field on LlamafileEntry in harvey.yaml — explicit user override; (2) data[0].meta.n_ctx from the /v1/models API response — tested on Qwen3.5-2B, Qwen3.5-4B, and Apertus-8B, consistently present across all three model families; (3) OllamaContextLength on Config, already populated by ShowModel; (4) unknown — suppress the indicator entirely. The n_ctx value is the runtime context window (what llamafile loaded), not n_ctx_train (training context). When the probe succeeds and no user config is present, the result is stored in memory only — not written back to harvey.yaml — to avoid config churn on every startup.
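
The priority chain and the indicator can be sketched as below. The function names parseNCtx, contextLength, and ctxIndicator are assumptions for this example, as is rounding the percentage down with integer division; the JSON path data[0].meta.n_ctx and the precedence order come from the decision.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelsResponse captures only the part of /v1/models needed here.
type modelsResponse struct {
	Data []struct {
		Meta struct {
			NCtx int `json:"n_ctx"`
		} `json:"meta"`
	} `json:"data"`
}

// parseNCtx extracts data[0].meta.n_ctx, returning 0 when absent.
func parseNCtx(body []byte) int {
	var r modelsResponse
	if err := json.Unmarshal(body, &r); err != nil || len(r.Data) == 0 {
		return 0
	}
	return r.Data[0].Meta.NCtx
}

// contextLength applies the priority order: user override, probed
// n_ctx, Ollama value. Zero means unknown.
func contextLength(configured, probed, ollama int) int {
	for _, v := range []int{configured, probed, ollama} {
		if v > 0 {
			return v
		}
	}
	return 0
}

// ctxIndicator formats the status-line suffix, or "" when the window
// size is unknown so the indicator is suppressed entirely.
func ctxIndicator(promptTokens, window int) string {
	if window <= 0 {
		return ""
	}
	return fmt.Sprintf("[ctx: %d%%]", promptTokens*100/window)
}

func main() {
	probed := parseNCtx([]byte(`{"data":[{"meta":{"n_ctx":8192}}]}`))
	window := contextLength(0, probed, 4096)
	fmt.Println(ctxIndicator(2048, window))
}
```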

Rejected alternatives.

Hardcode context lengths per known model family — goes stale as model versions change; does not cover user-downloaded custom models.
Always show token count without percentage — [tokens: 4.2k] is informative but gives no sense of urgency; percentage is more actionable for deciding when to /clear.
Use n_ctx_train as the window size — this is the training context, which can be 4× larger than the runtime window. Using it would make the utilization % appear artificially low and mislead users.

Consequences.

LlamafileEntry gains ContextLength int `yaml:"context_length,omitempty"`.
llamafile_service.go gains ProbeLlamafileContextLength(url string) int parsing data[0].meta.n_ctx.
terminal.go appends [ctx: N%] to the post-turn status line when context length is known and non-zero.
CONFIGURATION.md documents the new context_length field on LlamafileEntry.

2026-06-19 — Tab completion: two-layer design with shared SelectFrom helper

Context. Harvey’s buildCompleter() only completes top-level command names, @route references, Ollama model names, and file paths. Users must remember subcommand names and must know exact RAG store and model names to pass to the use and drop subcommands. Several commands already show numbered pickers when no name is given, but each reimplements the pattern differently. See tab-completion-design.md.

Decision. Extend completion in two layers: (1) second-token subcommand names using a new Subcommands []string field on Command; (2) third-token argument values using a new ArgCompletion map[string]func(*Agent) []string field that maps each subcommand to a candidate-list function. Additionally, introduce a shared SelectFrom / SelectItem / SelectFromStrings API in a new ui.go file. Commands whose first positional argument comes from a finite, enumerable list (/rag use, /memory show, /llamafile use, etc.) display the picker when no argument is given. ui.go lives in the harvey package; promotion to termlib is deferred until a clean generalisation is proven.
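
The two completion layers might fit together roughly as follows. The Subcommands and ArgCompletion fields are taken from the decision; the Agent stub, the complete function, and the ragCommand registration are hypothetical and exist only to make the sketch self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// Agent is a hypothetical stub; the real type carries far more state.
type Agent struct{ RagStores []string }

// Command shows only the two new fields from the decision.
type Command struct {
	Subcommands   []string
	ArgCompletion map[string]func(*Agent) []string
}

// complete returns candidates for the last token. tokens holds the words
// typed after the command name; the final one is the partial word.
func complete(cmd Command, a *Agent, tokens []string) []string {
	var candidates []string
	switch len(tokens) {
	case 1: // layer 1: subcommand names
		candidates = cmd.Subcommands
	case 2: // layer 2: argument values for the chosen subcommand
		if fn, ok := cmd.ArgCompletion[tokens[0]]; ok {
			candidates = fn(a)
		}
	default:
		return nil
	}
	prefix := tokens[len(tokens)-1]
	var out []string
	for _, c := range candidates {
		if strings.HasPrefix(c, prefix) {
			out = append(out, c) // registration order is preserved
		}
	}
	return out
}

// ragCommand is an illustrative registration, not Harvey's actual one.
func ragCommand() Command {
	return Command{
		Subcommands: []string{"list", "use", "remove"},
		ArgCompletion: map[string]func(*Agent) []string{
			"use": func(a *Agent) []string { return a.RagStores },
		},
	}
}

func main() {
	a := &Agent{RagStores: []string{"papers", "notes", "patents"}}
	fmt.Println(complete(ragCommand(), a, []string{"u"}))
	fmt.Println(complete(ragCommand(), a, []string{"use", "pa"}))
}
```

Candidate functions only produce names; prefix filtering stays in one place, which is the separation the decision argues for.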

Rejected alternatives.

Parse Usage strings — brittle; the Usage field is for display, not machine consumption. A Subcommands field is explicit and refactoring-safe.
Single CompletionFunc per Command — more flexible but requires each command to handle prefix filtering, sorting, and the active-marker display pattern itself. The ArgCompletion map[string]func approach keeps candidate production separate from completion mechanics.
Fuzzy matching — adds complexity without a proven need. Prefix matching is sufficient for short subcommand names; fuzzy can be added later without changing the API.
Move SelectFrom to termlib immediately — premature. We don’t know the right generalisation until it has been used in several places. Standard design → plan → decision process applies if/when that move happens.

Consequences.

Command struct gains Subcommands []string and ArgCompletion map[string]func(*Agent) []string. The doc comment is updated. No existing registration is broken (new fields are optional).
buildCompleter() gains two new blocks before the existing file-path switch. Existing file-path and model-name completion is unchanged.
ui.go is a new file; ui_test.go covers all exported symbols.
Existing picker implementations in llamafile.go and commands.go are refactored to call SelectFrom in Phase E. Behaviour is identical; code shrinks.
Harvey YAML and configuration are not changed.

2026-06-18 — MinIO replaced with aws-sdk-go-v2 S3 client

Context. remote_s3.go uses github.com/minio/minio-go/v7 as the S3 protocol client. MinIO’s Go client has moved to a closed-source license, making it unsuitable for Harvey’s AGPL-3.0 codebase. The affected surface is small: Stat, Get, and List operations on S3-compatible stores (AWS S3, MinIO server, Cloudflare R2). See s3-replacement-design.md.

Decision. Replace the MinIO client with github.com/aws/aws-sdk-go-v2 (Apache-2.0 licensed). The AWS SDK v2 supports all S3-compatible endpoints via the BaseEndpoint override option. The call sites in remote_s3.go map cleanly: StatObject → HeadObject, GetObject → GetObject, ListObjects → ListObjectsV2. Credentials continue to come from environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) and the SDK’s default credential chain.

Rejected alternatives.

Minimal net/http + AWS Signature V4 from scratch — eliminates the dependency but requires maintaining the signing and error-parsing logic. The AWS SDK already does this correctly for all S3 variants; hand-rolling it for three methods is low-leverage.
rclone/rclone as a library — comprehensive but extremely heavy (~100+ package imports). Overkill for three read-only S3 operations.
Continue using MinIO client — violates Harvey’s open-source license requirements.

Consequences.

go.mod removes github.com/minio/minio-go/v7, adds three aws-sdk-go-v2 modules (config, service/s3, credentials).
remote_s3.go is rewritten; public interface (RemoteReader implementation) is unchanged.
Existing S3 URIs and harvey.yaml config fields are unaffected.
AWS credential chain (env vars, ~/.aws/credentials, IAM roles) works automatically.

2026-06-18 — Spinner gains dynamic status message channel

Context. Harvey’s spinner currently shows rotating Edward Lear quotes and a timer while waiting for the LLM. Users have no way to tell whether Harvey is embedding a query, calling a tool, waiting for Ollama, or doing something else. Claude Code and similar tools display live status messages that update as work progresses. See spinner-ux-design.md.

Decision. Add a StatusCh chan string field to the Spinner struct and an UpdateStatus(msg string) method. The spinner’s message line shows the most-recent status update instead of the next Lear quote whenever a message is pending; Lear quotes resume when no status is pending. The caller sends non-blocking updates via UpdateStatus; the spinner goroutine reads them on the fast tick. This preserves the existing Lear personality while surfacing actionable progress at key moments: tool call start/end, RAG embedding, context injection, model switching. Tab completion is out of scope for this work item; it is a separate, larger effort.
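
A minimal sketch of the non-blocking update path, assuming a buffer of one and a drop-oldest policy when the buffer is full. StatusCh, UpdateStatus, and lastStatus are named in this entry; NewSpinner, messageLine, and the convention that an empty string clears the status are assumptions for the example.

```go
package main

import "fmt"

// Spinner shows only the fields relevant to status updates.
type Spinner struct {
	StatusCh   chan string
	lastStatus string
}

// NewSpinner uses a buffer of one so a sender never waits on the reader.
func NewSpinner() *Spinner {
	return &Spinner{StatusCh: make(chan string, 1)}
}

// UpdateStatus never blocks. If a stale message is still buffered it is
// discarded so the newest message wins.
func (s *Spinner) UpdateStatus(msg string) {
	select {
	case s.StatusCh <- msg:
	default:
		select {
		case <-s.StatusCh: // drop the stale message
		default:
		}
		select {
		case s.StatusCh <- msg:
		default: // another sender got in first; give up rather than block
		}
	}
}

// messageLine is what the spinner goroutine might call on each fast
// tick. An empty status (assumed convention) clears the line so the
// Lear quote is shown again.
func (s *Spinner) messageLine(learQuote string) string {
	select {
	case msg := <-s.StatusCh:
		s.lastStatus = msg
	default:
	}
	if s.lastStatus != "" {
		return s.lastStatus
	}
	return learQuote
}

func main() {
	s := NewSpinner()
	quote := "The Owl and the Pussy-cat went to sea"
	fmt.Println(s.messageLine(quote))
	s.UpdateStatus("embedding query")
	s.UpdateStatus("calling read_file")
	fmt.Println(s.messageLine(quote))
	s.UpdateStatus("")
	fmt.Println(s.messageLine(quote))
}
```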

Rejected alternatives.

Replace Lear messages entirely with status strings — loses the personality that distinguishes Harvey from generic CLI tools. The mixed approach preserves Lear for idle periods.
Print status on a separate line below the spinner — requires the spinner to know its vertical position relative to other output, which it does not; scrolling behavior would be unpredictable.
Atomic string (sync/atomic or sync.Mutex) — functionally equivalent but a channel fits Harvey’s existing goroutine patterns and avoids a lock.

Consequences.

spinner.go adds StatusCh chan string, UpdateStatus(string), and lastStatus string to the Spinner type.
terminal.go calls UpdateStatus at: RAG embedding start, tool call start, tool call complete, context injection.
The message line now shows status text (dim green) when present; falls back to a Lear quote (colored) when idle.
No change to the timer or frame tick behavior.

2026-06-18 — Assay evaluation output moves to workspace-level directory

Context. bin/assay writes evaluation results to testout/ inside the harvey/ source repository. This directory is gitignored, but the JSON and Markdown artifacts look like test output to language models that read the source tree, causing models to misinterpret stale evaluation results as current test failures. See assay-llamafile-design.md.

Decision. Change the default output directory for bin/assay from testout/ to $WORKSPACE/assay-results/<timestamp>/ where $WORKSPACE is resolved the same way Harvey resolves its workspace (walk up from cwd to the directory containing agents/harvey.yaml). If no workspace is found, fall back to an assay-results/ directory in the current working directory. The --output flag overrides the default as before.
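
The walk-up heuristic and the fallback can be sketched as follows. The function names findWorkspace, defaultOutputDir, and fileExists are assumptions, as is the timestamp format used in main; the existence check is passed in as a parameter so the logic can be exercised without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findWorkspace walks up from start until it finds a directory that
// contains agents/harvey.yaml, or reaches the filesystem root.
func findWorkspace(start string, exists func(string) bool) (string, bool) {
	dir := filepath.Clean(start)
	for {
		if exists(filepath.Join(dir, "agents", "harvey.yaml")) {
			return dir, true
		}
		parent := filepath.Dir(dir)
		if parent == dir { // reached the root without a match
			return "", false
		}
		dir = parent
	}
}

// defaultOutputDir applies the decision: a timestamped directory under
// the workspace, or assay-results/ in cwd when no workspace is found.
func defaultOutputDir(cwd, timestamp string, exists func(string) bool) string {
	if ws, ok := findWorkspace(cwd, exists); ok {
		return filepath.Join(ws, "assay-results", timestamp)
	}
	return filepath.Join(cwd, "assay-results")
}

// fileExists is the real-filesystem check used outside of tests.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine cwd:", err)
		return
	}
	// The timestamp format here is an assumption.
	fmt.Println(defaultOutputDir(cwd, "20260618T120000", fileExists))
}
```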

Rejected alternatives.

Keep output in testout/ but add a note file — models still read and misinterpret the directory.
Always require --output flag — breaks existing workflows that rely on the default.
Use $XDG_DATA_HOME/harvey/assay-results/ — correct in principle but separates results from the workspace they were generated against, making correlation harder.

Consequences.

cmd/assay/main.go gains workspace discovery logic (same heuristic as Harvey’s NewWorkspace).
Default report and results paths change; documented in --help output.
testout/ in the harvey repo is no longer populated by bin/assay in normal use.

2026-06-18 — Assay adds Llamafile backend via `--llamafile` flag

Context. bin/assay currently only supports Ollama as a model backend, but Harvey supports both Ollama and Llamafile. Users evaluating a Llamafile model must run it manually and point assay at it with a custom URL, which is error-prone and undocumented. See assay-llamafile-design.md.

Decision. Add a --llamafile PATH flag to bin/assay. When provided, assay starts the llamafile process on an ephemeral port (same startLlamafile logic as in llamafile_service.go), runs the evaluation suite against that endpoint, then terminates the process on exit. The --model flag is still respected (it sets the model name in the report) but --ollama is ignored when --llamafile is given. Embeddings continue to use the Ollama embedder unless --rag-db is also given and the store’s recorded embedding model differs, in which case the operation fails fast with a clear error.

Rejected alternatives.

Separate assay-llamafile binary — duplicates 95% of the evaluation harness; not maintainable.
Auto-discover a running Llamafile process — fragile; depends on port conventions that are not enforced.
Require user to start Llamafile and pass URL — current workaround; acceptable as an escape hatch but the --llamafile flag makes the common case ergonomic.

Consequences.

cmd/assay/main.go imports llamafile_service.go functions already in the package; no new files needed.
Llamafile process is always terminated on assay exit, even if evaluation panics (deferred cleanup).
The report header records the llamafile path and version alongside the model name.

2026-06-18 — Web developer template added to built-in profile set

Context. None of the five templates shipped in v1 (backend-developer, frontend-developer, dataset-developer, data-scientist, technical-writer) covers the full polyglot web development stack used in this workspace: Go backends, uv-managed Python scripts, SQL (SQLite3 and Postgres), Deno+TypeScript frontends, and vanilla JavaScript/CSS/HTML5. A backend developer using Deno or a frontend developer writing Go API clients currently reaches for an incomplete template. See web-developer-template-design.md.

Decision. Add a web-developer.spmd template to templates/profiles/. It covers: Go (net/http, database/sql), uv+Python (scripting, data processing), SQL (SQLite3 dialect and Postgres), Deno+TypeScript (runtime, standard library, no bundler by default), JavaScript (ES modules, no framework by default), CSS (custom properties, no utility framework by default), HTML5 (semantic markup). The template’s NOTE: recommends qwen2.5-coder:7b or granite3.3:2b and suggests ingesting both Go source and the deno.json/package.json for context.

Rejected alternatives.

Extend the existing backend-developer template — the existing template is already a good fit for pure Go/Python/SQL work; adding Deno and CSS would make it too broad and undermine the template picker’s value as a role-specific starting point.
Split into go-web and deno-web templates — two templates for what is effectively one stack in this workspace is unnecessarily granular.

Consequences.

templates/profiles/web-developer.spmd is added to the embedded binary.
The onboarding template picker shows a seventh option.
No code changes required; ListTemplates() discovers it automatically.

2026-06-18 — `/memory profile` subcommand set expanded and naming standardized

Context. The current /memory profile command has three subcommands — show, update, use — but their semantics do not match Harvey’s established command vocabulary. show lists active profiles (like list does elsewhere) rather than showing the content of the active profile. use creates a new profile from a template (like new does elsewhere) rather than selecting an existing saved profile. update opens the current profile in $EDITOR. There is no way to rename a workspace. See memory-profile-ux-design.md.

Decision. Standardize the subcommand set:

Subcommand	New behaviour	Was
`list`	List all profiles (active + archived)	(`show` partial)
`show`	Print the content of the current active profile	(missing)
`edit`	Open the active profile in `$EDITOR` (rename of `update`)	`update`
`use [NAME]`	Switch to a named template or picker	unchanged
`rename NAME`	Rename the workspace display name in the active profile	(missing)

update is kept as a deprecated alias for edit with a one-line deprecation notice, to avoid breaking existing workflows. The /profile top-level alias continues to delegate to all subcommands. The help text for /memory is updated to list all five subcommands.
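The alias handling described above could be sketched as follows. This is a minimal illustration, not Harvey's actual dispatch code: the helper name normalizeProfileSubcommand and the wording of the deprecation notice are assumptions made for the example.

```go
package main

import "fmt"

// normalizeProfileSubcommand maps a typed subcommand to its canonical name.
// A non-empty notice is returned when a deprecated alias was used.
// Hypothetical helper: the name and the notice text are assumptions.
func normalizeProfileSubcommand(sub string) (canonical, notice string, ok bool) {
	switch sub {
	case "list", "show", "edit", "use", "rename":
		return sub, "", true
	case "update":
		// Deprecated alias: still works, but points the user at the new name.
		return "edit", "note: 'update' is deprecated; use 'edit' instead", true
	default:
		return "", "", false
	}
}

func main() {
	for _, sub := range []string{"show", "update", "delete"} {
		canonical, notice, ok := normalizeProfileSubcommand(sub)
		if !ok {
			fmt.Printf("%s: unknown subcommand\n", sub)
			continue
		}
		if notice != "" {
			fmt.Println(notice)
		}
		fmt.Printf("%s -> %s\n", sub, canonical)
	}
}
```

Resolving the alias in one place keeps the rest of the handler working with the five canonical names only.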

Rejected alternatives.

Rename use to new to match the new/list/use pattern elsewhere — /profile use is already shipped and documented, and it matches /ollama use and /rag use. Breaking the alias would confuse users more than the current inconsistency does.
Keep show with list semantics — defeats discoverability; users type /memory profile show expecting to see what their profile says, not a list of IDs.

Consequences.

commands.go: cmdMemoryProfile gains list, rename, and show (content-display) cases. show (old list behavior) becomes list. update remains as alias for edit.
helptext.go: memory and profile help text updated.
harvey-memory.7.md: man page updated to document all five subcommands.

2026-06-18 — PDF capability disclosed in HARVEY.md system prompt

Context. Harvey’s read_file built-in tool description states that PDF files are extracted automatically via poppler. But when tools are disabled — or when a small model uses prose tool calls and does not consistently read all tool descriptions — the model has no knowledge of this capability and asks the user to manually convert PDFs to text. HARVEY.md is always injected as the system prompt, making it the correct place to disclose capabilities that should be known regardless of tool-call mode. See quick-fixes-design.md.

Decision. Add a File reading capabilities section to HARVEY.md that enumerates what Harvey can read without conversion: plain text, Markdown, Go/TypeScript/Python source, and PDF (extracted via poppler automatically). This mirrors the pattern of the existing “Tagged code blocks” section — documenting Harvey’s automatic behaviors so the model can confidently use them rather than guessing.

Rejected alternatives.

Only fix the read_file tool description — already done; the problem is the model doesn’t see tool descriptions when tools are disabled.
Inject a capability summary at each turn — wasteful in context tokens; a one-time system prompt disclosure is sufficient.
Print a reminder when the user asks about a PDF — reactive; the bug is the model prompting the user to convert, not the user asking Harvey.

Consequences.

HARVEY.md gains a short “File reading” section (4-6 bullet points).
No code changes required; HARVEY.md is loaded by LoadHarveyMD() at startup.
Models that previously asked users to convert PDFs will instead use read_file directly.

2026-06-18 — Llamafile model discovery includes Windows .exe extensions

Context. scanLlamafileModels() in llamafile.go uses strings.HasSuffix(e.Name(), ".llamafile") to identify llamafile binaries. On Windows, llamafile binaries end in .exe (plain) or .llamafile.exe (when distributed with the double extension). Users on Windows who place binaries in ~/Models see an empty picker even with valid models present. The same bug affects llamafileModelName, which only strips the .llamafile suffix and leaves .exe on Windows paths. See quick-fixes-design.md.

Decision. Extend scanLlamafileModels to match three patterns: .llamafile, .llamafile.exe, and (on Windows only) any .exe file in the models directory. llamafileModelName is updated to strip suffixes in the correct order: strip .exe first (if present), then .llamafile (if present). The llamafileDefaultModelsDir() platform function already returns the correct OS-appropriate path; no change needed there.
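A minimal sketch of the matching and suffix-stripping rules in this decision, under stated assumptions: matchesLlamafile and stripLlamafileSuffixes are simplified stand-ins for the logic inside scanLlamafileModels and llamafileModelName, they work on bare file names, and the OS is passed as a parameter so the Windows-only branch can be exercised on any platform.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesLlamafile reports whether a file name should appear in the picker.
// goos is a parameter (instead of runtime.GOOS) purely to make the
// Windows-only rule testable everywhere; that choice is an assumption.
func matchesLlamafile(name, goos string) bool {
	if strings.HasSuffix(name, ".llamafile") || strings.HasSuffix(name, ".llamafile.exe") {
		return true
	}
	// Plain .exe files are only accepted on Windows.
	return goos == "windows" && strings.HasSuffix(name, ".exe")
}

// stripLlamafileSuffixes removes .exe first, then .llamafile, so that
// "model.llamafile.exe" and "model.exe" both become "model".
func stripLlamafileSuffixes(name string) string {
	name = strings.TrimSuffix(name, ".exe")
	return strings.TrimSuffix(name, ".llamafile")
}

func main() {
	names := []string{"llava.llamafile", "llava.llamafile.exe", "llava.exe", "notes.txt"}
	for _, name := range names {
		fmt.Printf("%-20s linux=%-5v windows=%-5v model=%s\n",
			name,
			matchesLlamafile(name, "linux"),
			matchesLlamafile(name, "windows"),
			stripLlamafileSuffixes(name))
	}
}
```

The stripping order matters: removing .llamafile first would leave "model.llamafile.exe" unchanged, because the name does not end in .llamafile until .exe is gone.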

Rejected alternatives.

Require users to rename binaries on Windows — poor UX; llamafile project ships .exe files and users should not need to rename them.
Add a config field for custom extensions — over-engineering a simple extension check.
Match all .exe files unconditionally — would pick up non-llamafile executables on every platform. Tighter filters, such as matching plain .exe files only when the scan finds no .llamafile or .llamafile.exe files, or checking .exe files for the llamafile magic bytes, are deferred to a future improvement.

Consequences.

llamafile.go: scanLlamafileModels matches .llamafile, .llamafile.exe, and .exe (Windows-only guard); llamafileModelName strips suffixes in the correct order.
Windows users with binaries in ~/Models now see them in the picker.
No change to Linux/macOS behavior.

2026-06-18 — `--resume` flag auto-selects the most recent session

Context. Harvey’s --continue PATH flag resumes from a specific session file. When the user simply wants to pick up where they left off (the most common case), they must find and type the session path, or navigate the interactive picker. Both are unnecessary friction when the intent is always “resume my last session.” See quick-fixes-design.md.

Decision. Add a --resume flag (no argument) that resolves to the most recently modified .spmd file in agents/sessions/ and sets cfg.ContinuePath to that path before Run. If no sessions exist, Harvey prints a one-line notice and starts fresh. The implementation delegates entirely to the existing ContinueFromFountain path — no new session-loading logic is needed.

Rejected alternatives.

Make --continue with no argument mean “most recent” — changes the semantics of an existing flag; would break scripts that pass --continue expecting a required argument.
Add --resume as an alias for opening the interactive picker — the picker is useful for choosing among multiple sessions; --resume should be zero-friction and not prompt.

Consequences.

cmd/harvey/main.go gains a --resume case that calls a new mostRecentSession(sessDir string) string helper.
harvey.go or sessions_files.go gains mostRecentSession (walks agents/sessions/, returns path of newest .spmd by ModTime).
No change to --continue semantics.
If called with --record, the resumed session is not re-recorded (existing guard in terminal.go:333-338 already handles this).

2026-06-09 — Programming language support uses a central LanguageRegistry with pluggable handlers

Context. Harvey’s RAG system already supports ingesting 17 programming language file extensions (commands.go:4975-4979), but the looksLikePath function (commands.go:3463-3467) was missing extensions for C, C++, Pascal, Oberon, Lisp, and Basic. Additionally, all languages used generic paragraph-based chunking which breaks code structures (functions, procedures) across chunk boundaries, reducing RAG retrieval quality for programming queries. Users working with source code need language-aware features: code-aware chunking, documentation extraction, syntax highlighting, and auto-formatting.

Decision. Create a comprehensive language support system with the following architecture:

Central LanguageRegistry (language_registry.go) — Maps language identifiers to handlers (detectors, chunkers, extractors, formatters, highlighters). Each language has a LanguageInfo struct with metadata (name, extensions, comment markers, block delimiters, capabilities).
Pluggable Interfaces — Define Go interfaces for each capability:
- LanguageDetector — Identifies language from file path and/or content
- CodeChunker — Splits source into meaningful units (functions, classes, procedures)
- DocExtractor — Extracts comments, docstrings, and symbol documentation
- CodeFormatter — Formats source code according to language conventions
- SyntaxHighlighter — Adds ANSI color to code blocks for terminal display
Code-Aware Chunking — Language-specific chunkers that respect code structure:
- C/C++: Split at function boundaries, preserve preprocessor directives and structs
- Pascal: Split at PROCEDURE/FUNCTION boundaries, preserve TYPE/RECORD definitions
- Oberon: Split at MODULE/PROCEDURE boundaries
- Lisp: Split at top-level forms (balanced parentheses), keep DEFUN/DEFMACRO together
- Basic: Split at SUB/FUNCTION boundaries
Progressive Enhancement — All features are opt-in. Basic file I/O works for all languages. If a language-specific handler fails, fall back to generic behavior.
Immediate Fix — Add missing extensions (.c, .cpp, .h, .hpp, .pas, .Mod, .obn, .lisp, .bas) to looksLikePath function for tagged code block detection.
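The registry-plus-fallback shape described above can be illustrated with a reduced sketch. Only the names LanguageRegistry, LanguageInfo, and CodeChunker come from this entry; the method signatures, the extension-keyed map, and the toy Basic chunker are assumptions chosen to keep the example short.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// CodeChunker splits source into meaningful units. The signature is assumed.
type CodeChunker interface {
	Chunk(src string) []string
}

// LanguageInfo holds a reduced subset of the per-language metadata.
type LanguageInfo struct {
	Name       string
	Extensions []string
	Chunker    CodeChunker // nil means no language-specific chunker
}

// LanguageRegistry maps file extensions to language handlers.
type LanguageRegistry struct {
	byExt map[string]*LanguageInfo
}

func NewLanguageRegistry() *LanguageRegistry {
	return &LanguageRegistry{byExt: make(map[string]*LanguageInfo)}
}

func (r *LanguageRegistry) Register(info *LanguageInfo) {
	for _, ext := range info.Extensions {
		r.byExt[ext] = info
	}
}

// genericChunk is the paragraph-based fallback: split on blank lines.
func genericChunk(src string) []string {
	var chunks []string
	for _, p := range strings.Split(src, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			chunks = append(chunks, p)
		}
	}
	return chunks
}

// ChunkFile prefers the language-specific chunker and falls back to
// generic chunking when none is registered or it yields nothing.
func (r *LanguageRegistry) ChunkFile(path, src string) []string {
	if info, ok := r.byExt[filepath.Ext(path)]; ok && info.Chunker != nil {
		if chunks := info.Chunker.Chunk(src); len(chunks) > 0 {
			return chunks
		}
	}
	return genericChunk(src)
}

// basicChunker starts a new chunk at each line beginning with SUB or FUNCTION.
type basicChunker struct{}

func (basicChunker) Chunk(src string) []string {
	var chunks, cur []string
	flush := func() {
		if text := strings.TrimSpace(strings.Join(cur, "\n")); text != "" {
			chunks = append(chunks, text)
		}
		cur = nil
	}
	for _, line := range strings.Split(src, "\n") {
		upper := strings.ToUpper(strings.TrimSpace(line))
		if strings.HasPrefix(upper, "SUB ") || strings.HasPrefix(upper, "FUNCTION ") {
			flush()
		}
		cur = append(cur, line)
	}
	flush()
	return chunks
}

func main() {
	reg := NewLanguageRegistry()
	reg.Register(&LanguageInfo{Name: "Basic", Extensions: []string{".bas"}, Chunker: basicChunker{}})

	src := "SUB A\n\nPRINT 1\nEND SUB\nFUNCTION B\nB = 2\nEND FUNCTION\n"
	fmt.Printf("code-aware: %q\n", reg.ChunkFile("demo.bas", src))
	fmt.Printf("generic:    %q\n", reg.ChunkFile("demo.txt", src))
}
```

In the demo, the blank line inside SUB A splits the procedure under generic chunking but not under the Basic chunker, which is the retrieval-quality problem this decision targets.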

Rejected alternatives.

Use Tree-sitter for all parsing — Tree-sitter provides excellent AST-based parsing but adds ~5MB per language grammar, significant build complexity, and external dependencies. Rejected in favor of simpler regex-based and state-machine approaches for initial implementation, with Tree-sitter as a future enhancement.
Single monolithic chunker — One chunker handling all languages with conditional logic. Rejected for being hard to maintain, test, and extend. The interface-based approach allows independent development and testing of each language’s chunker.
Cloud-based language services — Use external APIs for formatting, analysis, etc. Rejected for violating Harvey’s local-first philosophy and introducing privacy/security concerns (sending user code to external services).
Mandatory formatting — Always format code on write without user control. Rejected for being too opinionated and potentially breaking user workflows. Auto-formatting must be opt-in and configurable.

Consequences.

File Changes: New files language_registry.go, code_chunkers.go, doc_extractors.go, syntax_highlighters.go, code_formatters.go with corresponding test files. Modified commands.go, config.go, builtin_tools.go, terminal.go.
Backward Compatibility: Existing RAG stores continue to work. Generic chunking remains as fallback. No breaking changes to SQLite schema or session format.
Performance: Language registry initialization at startup adds < 10ms. Chunking with language-specific handlers adds ~10-20% overhead vs. generic chunking. Formatters only invoked when auto-format is enabled.
Extensibility: New languages can be added by implementing the interfaces and registering them, without modifying core code.
Improved RAG Quality: Code-aware chunking preserves function/procedure boundaries, improving retrieval quality for code-related queries by an estimated 20%+ over generic chunking.
Better UX: Syntax highlighting in terminal output and auto-formatting on file write improve the user experience when working with source code.

2026-06-09 — Code block path detection (`looksLikePath`) extended to support all RAG-ingestible languages

Context. The looksLikePath function in commands.go (lines 3463-3467) determines whether a string looks like a file path rather than a language identifier. This is used by fencePathToken when parsing tagged code blocks (e.g., ```c:program.c). The function had a hardcoded list of known extensions that was missing: .c, .cpp, .h, .hpp, .pas, .Mod, .obn, .lisp, .bas. This meant that tagged code blocks for these languages were not recognized as file paths, preventing the auto-write feature from working.

Decision. Extend the knownExts slice in looksLikePath to include all extensions supported by RAG ingestion (from ragIngestableExts in commands.go:4975-4979). Additionally, add a comment noting that these are programming languages supported by RAG ingestion for future maintainability.
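As a rough illustration of the check, the sketch below shows an extension-based looksLikePath and a tag splitter. The body of looksLikePath, the abbreviated knownExts list (the real list covers all RAG-ingestible extensions and is not reproduced here), and the helper pathFromFenceTag are all assumptions; the actual fencePathToken logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// knownExts is an abbreviated, illustrative list. The nine extensions added
// by this decision are included; .go, .py, and .ts are assumed examples.
var knownExts = []string{
	".go", ".py", ".ts",
	".c", ".cpp", ".h", ".hpp", ".pas", ".Mod", ".obn", ".lisp", ".bas",
}

// looksLikePath reports whether token resembles a file path rather than a
// bare language identifier such as "go" or "pascal".
func looksLikePath(token string) bool {
	if strings.ContainsAny(token, `/\`) {
		return true
	}
	for _, ext := range knownExts {
		if strings.HasSuffix(token, ext) {
			return true
		}
	}
	return false
}

// pathFromFenceTag extracts the path from a tag like "c:src/main.c".
// Hypothetical helper standing in for the real tag parser.
func pathFromFenceTag(tag string) (string, bool) {
	_, rest, found := strings.Cut(tag, ":")
	if !found || !looksLikePath(rest) {
		return "", false
	}
	return rest, true
}

func main() {
	for _, tag := range []string{"c:src/main.c", "pascal:module.pas", "oberon:Files.Mod", "go", "text:notes"} {
		path, ok := pathFromFenceTag(tag)
		fmt.Printf("%-20s path=%q ok=%v\n", tag, path, ok)
	}
}
```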

Rejected alternatives.

Refactor to use the language registry — While this would be more maintainable long-term, it would introduce a circular dependency (the registry isn’t initialized when looksLikePath is first used during startup). Deferred to a future cleanup.
Create a separate list — Maintain a separate, parallel list of extensions. Rejected for creating a maintenance burden and potential for divergence.
Make it dynamic — Load extensions from configuration. Rejected as over-engineering for a static list that rarely changes.

Consequences.

Tagged code blocks for all RAG-supported languages now work correctly, e.g., `c:src/main.c` or `pascal:module.pas`.
The hardcoded list remains a maintenance point but now includes all 17 supported languages.
Future additions to RAG ingestion must remember to also update looksLikePath. This is documented in the code comments.

2026-06-08 — `/loop` chat iterations use a shared `runChatTurn` helper that skips skill auto-trigger and `autoExecuteReply`

Context. The REPL’s plain-chat path does more than call the model: it checks whether the input matches a skill trigger pattern (auto-dispatching to a different flow entirely), and after the reply, offers to write fenced code blocks to disk via an interactive Y/n prompt (autoExecuteReply). Both make sense for a human typing one message at a time; both are problematic when the same prompt is sent N times unattended — a skill could fire on iteration 3 but not iteration 1, and a Y/n prompt would block forever waiting on stdin that nothing will type.

Decision. Factor the REPL’s inline chat block (terminal.go, roughly lines 635-820) into a shared method, (a *Agent) runChatTurn(ctx, input, out) (reply string, stats ChatStats, err error). It keeps everything that defines “how Harvey answers a prompt” — RAG augmentation, the tool-loop-or-plain-chat branch, token/context warnings, stats, Fountain recording — and excludes skill auto-trigger matching and autoExecuteReply, both of which belong to “how the REPL reacts to a typed line.” /loop calls this helper directly for its chat-mode iterations; the REPL becomes a thin wrapper around the same helper plus its own skill-trigger/autoExecuteReply handling.

Rejected alternatives.

Reuse the REPL’s inline chat block as-is — a looped prompt could silently jump to a different skill mid-run, or stall on iteration 1 waiting for a keypress that never comes.
Duplicate the chat block inside cmdLoop — roughly 150 lines of copy-paste that would drift from the REPL’s version on the next change to the chat path.

Consequences.

terminal.go’s plain-chat branch is refactored but behaviourally unchanged for normal typed input — verified with go test -race after extraction.
/loop behaves predictably: the same prompt produces the same kind of exchange every time, with no surprise skill redirects or stalled prompts.
If a.Config.ToolsEnabled, looped prompts can still cause the model to write files or run commands via the normal tool loop — /loop does not suppress this, since doing so would make looped chat behave differently from normal chat (see loop-design.md, “Safety Considerations”).

2026-06-08 — `/loop` caps iterations at 100 and defaults to 10

Context. /loop is the first Harvey command that can run LLM calls — and, with tools enabled, write files or execute shell commands — repeatedly and unattended. Harvey’s existing security posture (safe mode, permission system, audit log) is built around bounding and surfacing risky actions rather than trusting the user to always type the right thing.

Decision. /loop takes an optional --count N (following the --depth N convention already established by /read-dir), defaulting to 10 and capped at 100. There is no “run forever” option.
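A sketch of how the default and the cap might be applied while parsing. The helper parseLoopCount is hypothetical, and so is the choice to clamp values above 100 rather than reject them; the entry only says the count is capped.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

const (
	defaultLoopCount = 10
	maxLoopCount     = 100
)

// parseLoopCount pulls an optional "--count N" pair out of args and returns
// the effective count plus the remaining arguments. Values above the cap
// are clamped (an assumption; rejecting them would also satisfy the rule).
func parseLoopCount(args []string) (int, []string, error) {
	count := defaultLoopCount
	rest := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		if args[i] != "--count" {
			rest = append(rest, args[i])
			continue
		}
		if i+1 >= len(args) {
			return 0, nil, errors.New("--count requires a value")
		}
		n, err := strconv.Atoi(args[i+1])
		if err != nil || n < 1 {
			return 0, nil, fmt.Errorf("invalid --count value %q", args[i+1])
		}
		count = n
		i++ // skip the value that was just consumed
	}
	if count > maxLoopCount {
		count = maxLoopCount
	}
	return count, rest, nil
}

func main() {
	for _, args := range [][]string{
		{"5m", "check", "the", "build"},
		{"5m", "--count", "3", "check"},
		{"1s", "--count", "5000", "tell", "me", "a", "joke"},
	} {
		count, rest, err := parseLoopCount(args)
		fmt.Println(count, rest, err)
	}
}
```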

Rejected alternatives.

Unbounded by default — the one command that could turn a typo (/loop 1s tell me a joke) into thousands of unattended LLM calls before the user notices.
Confirmation prompt before starting — adds a keypress without adding much safety; the printed plan summary (Looping every 5m, up to 10 times: ...) gives the same “last chance to Ctrl+C” moment without an extra interaction step, consistent with how /pipeline announces its plan before running.

Consequences.

A fully unattended /loop run is bounded to at most 100 iterations — e.g. roughly 8 hours at a 5-minute interval — which still covers realistic “check on this periodically” use cases.
Users who need more must re-invoke /loop, a deliberate speed bump rather than an oversight.

2026-06-08 — `/loop` requires an explicit interval; no self-pacing mode

Context. Claude Code’s /loop can omit the interval and let the agent self-pace via a wake-scheduling primitive. Harvey has no equivalent — it is a synchronous CLI process with no persistent scheduler or “wake me up later” mechanism.

Decision. INTERVAL is a required first argument to /loop, parsed with the existing parseDurationString helper (config.go:650, already used for run_timeout/ollama_timeout in harvey.yaml). There is no self-pacing mode.
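The required-interval rule could look something like the sketch below. It uses the standard library's time.ParseDuration as a stand-in for Harvey's parseDurationString, whose accepted syntax may differ; parseLoopCommand and the usage text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// parseLoopCommand requires INTERVAL as the first argument and treats the
// remainder as the prompt. time.ParseDuration stands in for the real
// duration parser (assumption).
func parseLoopCommand(args []string) (time.Duration, string, error) {
	if len(args) == 0 {
		return 0, "", errors.New("usage: /loop INTERVAL PROMPT")
	}
	interval, err := time.ParseDuration(args[0])
	if err != nil || interval <= 0 {
		return 0, "", fmt.Errorf("invalid INTERVAL %q: expected a duration such as 30s or 5m", args[0])
	}
	prompt := strings.Join(args[1:], " ")
	if prompt == "" {
		return 0, "", errors.New("usage: /loop INTERVAL PROMPT")
	}
	return interval, prompt, nil
}

func main() {
	for _, args := range [][]string{
		{"5m", "check", "the", "build"},
		{"check", "the", "build"}, // interval omitted: rejected, no self-pacing
	} {
		interval, prompt, err := parseLoopCommand(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("Looping every %s: %s\n", interval, prompt)
	}
}
```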

Rejected alternatives.

Have Harvey “guess” an interval once and run at that fixed cadence — just a worse version of asking the user, with an extra layer of unpredictability.
Keep the process resident and let it wake itself — a fundamentally different program shape than Harvey’s synchronous REPL; far outside the scope of adding one command.

Consequences.

/loop’s usage string and help text always show INTERVAL as required.
Users coming from Claude Code’s /loop will notice the difference; the help text explains why (no async scheduler in Harvey).

2026-06-08 — `/loop` runs as a blocking foreground command, not a background goroutine

Context. Harvey’s REPL (terminal.go:Run) is a single-threaded loop that blocks on each turn, mutating a.History, a.Recorder, and the shared output writer with no locking — because nothing has ever run concurrently with it. Adding a command that repeats a prompt on an interval raises the question of whether it should run in the background while the user keeps typing, or take over the REPL until it finishes.

Decision. /loop runs in the foreground inside its own command handler, reusing the SIGINT-cancellation pattern already used three times in terminal.go (chat, ! commands, @mention dispatch): one cancellable context for the whole run, a goroutine watching os.Signal, and a wasCancelled check. Any Ctrl+C — mid-iteration or during the inter-iteration sleep — stops the whole loop and returns to the prompt.

Rejected alternatives.

Background goroutine — would require introducing locking around a.History, a.Recorder, and out, none of which exist today. The concurrency-safety surface this opens is large relative to the value of letting the user type while the loop runs.
“Ctrl+C cancels the iteration; a second Ctrl+C stops the loop” — a second control surface nothing else in Harvey has; rejected for consistency with the existing single-Ctrl+C-aborts convention.

Consequences.

/loop blocks the REPL for its duration — communicated up front via a printed plan summary before the first iteration runs.
/loop status / /loop stop subcommands aren’t meaningful (the REPL can’t read them while blocked) and are not implemented.
No new synchronization primitives are introduced anywhere in Harvey.

2026-06-05 — Profile templates and help guides ship embedded in the binary

Context. Harvey installs by copying a single executable to $HOME/bin. Users on three OS / two CPU architectures should not need to install a separate asset package. Templates and help guides must therefore travel with the binary.

Decision. Use Go’s //go:embed directive (standard library since Go 1.16) to compile a templates/ directory tree into the binary at build time. A single EmbeddedTemplates embed.FS variable in templates.go gives the rest of Harvey read access to template and help guide content at runtime. Workspace-local templates in agents/templates/profiles/ are checked at runtime and merged with the built-in list, allowing organisations to add shared templates without patching Harvey.

Rejected alternatives.

Separate asset directory alongside the binary — breaks the single-file install model.
Download templates from the internet on first run — requires network access, adds failure modes, complicates offline use on a Raspberry Pi.
Templates in harvey.yaml — templates are multi-line prose documents; embedding them in YAML is unreadable and fragile to edit.

Consequences.

templates/ directory added to the Harvey source tree; must be maintained alongside code.
Binary size increases modestly (six .spmd files and three Markdown guides are small).
templates.go is the single registration point for all embedded assets.

2026-06-05 — Initial developer/writer template set; library templates deferred

Context. Harvey needs a useful starting set of profile templates but the full range of library staff roles requires domain expertise and UX review that is not yet available.

Decision. Ship five developer/writer templates for v1:

Template	Role
`backend-developer`	Go, Python, TypeScript+Deno, SQL for application work
`frontend-developer`	HTML, CSS, TypeScript/JavaScript, Deno bundling
`dataset-developer`	Front end plus SQL, dataset CLI, datasetd web service
`data-scientist`	Data analysis, SQL for exploration, Python data tooling
`technical-writer`	Documentation, man pages, tutorials, Markdown and Fountain

Library role templates (subject specialist, systems/digital, instruction/data literacy, support staff) are deferred until library staff and a UX colleague can define the categories and content correctly. Placeholder files are named in the plan but not authored.

Rejected alternatives.

Ship library templates based on external assumptions — risks producing templates that do not match how library staff actually work, which would undermine trust in the feature.

Consequences.

Library users who try Harvey before the library templates ship will use blank.spmd or one of the developer templates as a starting point. Acceptable short-term.

2026-06-05 — `/profile use` verb; `/profile` top-level alias

Context. The profile switching command needed a name consistent with Harvey’s existing command vocabulary. Two candidates were considered: switch and use.

Decision. Use use as the subcommand verb because it matches the established pattern in Harvey: /ollama use, /rag use, and /kb use all select the active item from a list. Register /profile as a top-level alias delegating to /memory profile, following the same one-line handler pattern as /recall → /memory recall.

Rejected alternatives.

/profile switch — switch does not appear elsewhere in Harvey’s command vocabulary. use is already the selection verb.
/switch-profile or /change-profile — hyphenated commands are not the Harvey convention.

Consequences.

commands.go gains a "profile" entry in the top-level command table (identical in structure to "recall").
cmdMemoryProfile gains a "use" dispatch case.
/memory profile use, /profile use, and /profile (showing subcommand help) all work.

2026-06-05 — Profile switching writes a Fountain handoff document

Context. When a user switches profiles mid-session with /profile use, the in-progress conversation context would be lost after ClearHistory(). The user may need to resume the previous context in a future session.

Decision. Before clearing history, /profile use writes a .spmd summary file to agents/hand-off/<timestamp>.spmd. The handoff captures the last N assistant messages as bullet points and lists file paths and open questions from recent turns. No LLM call is required — the handoff is structural, not summarised. Because it is a .spmd file, the memory miner can extract facts from it in a later session, migrating context from the old role into the new session’s experience memories over time.

The previous workspace_profile document is archived (status set to archived) rather than deleted, preserving a history of the roles this workspace has been used for.

Rejected alternatives.

No handoff — context is lost on profile switch; acceptable only if profiles are rarely switched.
LLM-generated summary — higher quality but requires a blocking model call during the switch, adding latency and a failure mode.
Write handoff to the session file — session files record conversation turns, not profile transitions; mixing them would complicate the memory miner.

Consequences.

agents/hand-off/ directory is created at workspace init alongside agents/sessions/.
writeHandoff() function added to harvey.go.
Memory miner learns to process files from agents/hand-off/ as well as agents/sessions/.

2026-06-05 — Help guides for Ollama and PDF tools embedded in binary

Context. New users frequently fail to install Ollama or PDF extraction tools before running Harvey. The error messages Harvey currently produces do not explain what is missing or how to fix it. Users on three operating systems need platform-specific install instructions.

Decision. Embed short Markdown help guides (templates/help/ollama.md, templates/help/pdf-tools.md) in the binary using the same //go:embed infrastructure as profile templates. Surface them via /help ollama and /help pdf-tools. Print a one-line pointer to the relevant guide when a detection failure occurs at startup (Ollama unreachable) or during a command (PDF extraction fails). Guides are deliberately short: what it is, how to install on each platform, one troubleshooting line.

Rejected alternatives.

Link to external documentation only — requires network access to get help; unhelpful in offline or restricted environments.
Inline error messages only — install instructions for three platforms embedded in Go string literals are unmaintainable; Markdown guides are editable without touching code.

Consequences.

templates/help/ directory contains three Markdown files maintained alongside the code.
helptext.go gains OllamaHelpText and PDFToolsHelpText helpers.
terminal.go and pdf_extract.go each gain one conditional pointer line.

2026-06-02 — Persistent command history across sessions

Context. Harvey’s termlib.LineEditor supports Up/Down arrow history navigation within a session, but the history is in-memory only and lost on exit. Users must retype slash commands, ! shell commands, and prompts from prior sessions, which breaks flow — especially for repeated workflows like /rag ingest, /memory mine, or iterating on a prompt.

Decision. Persist the input history to agents/harvey_history inside the workspace (one entry per line, plain text). On startup Harvey loads this file and seeds the LineEditor before entering the REPL. On clean exit the in-memory history is written back, capped at 1000 entries (most recent kept). Consecutive duplicate suppression is already handled by AppendHistory; no further deduplication is applied at write time.

The implementation requires two changes:

termlib (lineeditor.go) — add two methods to LineEditor:
- SetHistory(lines []string) — replaces the in-memory history slice wholesale (used at startup).
- History() []string — returns a copy of the current history slice (used at exit to write back).
Harvey (terminal.go) — add loadCmdHistory(ws, le) called after le is created (line ~225), and saveCmdHistory(ws, le) called in the REPL exit path. Both functions resolve the path as ws.AbsPath("agents/harvey_history"). saveCmdHistory truncates to the last 1000 entries before writing.

The history file path is not configurable in this iteration; agents/ is Harvey’s conventional home for all runtime state (harvey.yaml, sessions/, memories/, rag/, knowledge.db).

Rejected alternatives.

Global ~/.harvey_history — shares history across workspaces, which leaks commands and paths between projects. Harvey’s workspace-boundary model makes per-workspace the correct scope.
Storing history in agents/harvey.yaml — would pollute the config file with ephemeral runtime data and complicate config schema evolution.
Parsing .spmd session files for history — session recordings are conversation transcripts, not command logs; extraction would be fragile and slow.

Consequences.

termlib/lineeditor.go gains SetHistory and History methods.
harvey/terminal.go gains loadCmdHistory and saveCmdHistory helper functions wired into the REPL startup and exit.
No changes to harvey.yaml schema, Config, or any other subsystem.
Concurrent Harvey sessions in the same workspace will silently overwrite each other’s history on exit (last-writer-wins), consistent with bash’s behaviour without HISTFILE locking.

2026-06-02 — UX nudge system for memory discoverability

Context. Users who understand the three storage silos (RAG / Memory / Knowledge Base) can get significantly better results, but the ingestion decision (“where does this go?”) breaks flow. No built-in mechanism surfaced actionable hints about pending mining, empty RAG stores, or RAG being disabled.

Decision. Implement a five-part nudge system:

Session-start digest — a sessionMemoryDigest() function called after the ready line that prints dim hints only when a condition is actionable:
- Unmined sessions pending → suggest /memory mine
- Active RAG store is empty → suggest /rag ingest
- RAG off but chunks exist → suggest /rag on
No output is printed when everything looks healthy.
Enhanced /status — extend cmdStatus with a Memory/RAG summary block (active memories, unmined sessions, active store, chunk count, RAG on/off). Keeps the one-stop status view complete.
New /hint command — on-demand improvement suggestions that aggregate all three silos and explain the decision rule. Verbose version of the session digest with context about why each suggestion matters.
/help learn topic — a unified “How Harvey learns” help page with a three-column table (what to ingest → which command → where it goes) and the single decision rule:
- Have a text file or document? → /rag ingest
- Something useful happened in a session? → /memory mine
- Making an observation about an experiment? → /kb observe
/recall alias — routes to /memory recall to make the unified retrieval interface the obvious entry point.
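The "print only when actionable" rule of the session-start digest can be shown with a small sketch. The memoryState struct, the function sessionDigestHints, and the hint wording are assumptions; the real sessionMemoryDigest() presumably gathers this state from the workspace itself.

```go
package main

import "fmt"

// Hint texts are illustrative; only the suggested commands come from the design.
const (
	hintMine   = "unmined sessions pending: try /memory mine"
	hintIngest = "active RAG store is empty: try /rag ingest"
	hintRagOn  = "RAG is off but chunks exist: try /rag on"
)

// memoryState is a hypothetical snapshot of the conditions the digest checks.
type memoryState struct {
	UnminedSessions   int
	ActiveStoreChunks int
	RagOn             bool
}

// sessionDigestHints returns one hint per actionable condition and nothing
// at all when everything looks healthy.
func sessionDigestHints(s memoryState) []string {
	var hints []string
	if s.UnminedSessions > 0 {
		hints = append(hints, hintMine)
	}
	if s.ActiveStoreChunks == 0 {
		hints = append(hints, hintIngest)
	}
	if !s.RagOn && s.ActiveStoreChunks > 0 {
		hints = append(hints, hintRagOn)
	}
	return hints
}

func main() {
	states := []memoryState{
		{UnminedSessions: 0, ActiveStoreChunks: 120, RagOn: true}, // healthy: silent
		{UnminedSessions: 2, ActiveStoreChunks: 0, RagOn: false},
		{UnminedSessions: 0, ActiveStoreChunks: 40, RagOn: false},
	}
	for _, s := range states {
		fmt.Printf("%+v\n", s)
		for _, h := range sessionDigestHints(s) {
			fmt.Println("  hint:", h)
		}
	}
}
```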

Rejected alternatives.

Single storage silo — would reduce configuration but lose retrieval precision for small models. Topic-scoped RAG stores (e.g., deno_typescript, go) give better recall than one large mixed store.
Always-on verbose status — printing all memory info on every startup is too noisy. Only surface hints when actionable.
Merging /rag on + /memory recall into a single toggle — the per-prompt RAG injection (ragAugment) and session-start injection (UnifiedMemory.Recall) are different channels. A single toggle would require auditing whether UnifiedMemory already includes RAG chunks. Deferred to a future audit.

Consequences.

terminal.go gains a sessionMemoryDigest() call after the ready line.
commands.go gains cmdHint, enhanced cmdStatus, and a /recall registration.
helptext.go gains LearnHelpText.
cmdHelp dispatches "learn" and "memory-overview" to LearnHelpText.
The help topic list is updated to include learn.

2026-06-02 — model_map in RAG stores (deferred simplification)

Context. Each RAG store entry in harvey.yaml has a model_map field that maps generation models to embedding models. In practice every store uses nomic-embed-text for all generation models, making the map redundant.

Decision. Deferred. Do not remove model_map now. The code is already correct and operational. Remove it when there is a concrete reason to simplify the config schema (e.g., adding a new embedder type that makes the override meaningful).

Consequences. model_map remains in the config and ragAugment continues to honour it. No user-visible change.

2026-06-02 — Dual RAG injection audit (deferred)

Context. Harvey has two RAG injection paths that run independently:

1. Per-prompt via ragAugment() in terminal.go (when a.RagOn)
2. Session-start via UnifiedMemory.Recall(), which also queries the RAG store

A user with both memory.enabled and rag.enabled may receive RAG content twice per turn — once in the system prompt injection and once prepended to each prompt. This wastes context tokens and may confuse small models.

Decision. Deferred. Audit and fix when a user observes noticeably degraded context efficiency. The fix would be to either: (a) skip RAG chunks in UnifiedMemory.Recall() when a.RagOn is true, or (b) make ragAugment a no-op when UnifiedMemory already injected from the same store.

Consequences. Known overlap. No immediate action required.

2026-05-31 — prose tool call correction injection

Context. Small models emit tool calls as JSON fenced blocks rather than structured API responses. The original tryExecuteProseToolCalls returned bool and could not distinguish “dispatched successfully” from “dispatched but every call errored”. When models hallucinated tool names the warning was suppressed because len(results) > 0 was always true.

Decision. Change tryExecuteProseToolCalls to return (dispatched bool, unknownNames []string). Track a succeeded counter internally; set dispatched = true only when ≥1 call succeeded. When unknownNames is non-empty, inject a correction message into history after a.AddMessage("assistant", ...) so history ordering is: user → assistant → correction-user. This gives the model a chance to retry with the correct tool names.

Consequences. The noToolCalls guard also gates autoExecuteReply to prevent directory-tree code blocks from being offered as files to write after successful tool-call turns.
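The return-value change can be illustrated with a small sketch. It is written under assumptions: ToolCall, ToolFunc, the registry map, the echoPath demo tool, and the wording of the correction message are all hypothetical. Only the (dispatched, unknownNames) shape and the succeeded counter come from the decision above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ToolCall and ToolFunc are hypothetical stand-ins for Harvey's real types.
type ToolCall struct {
	Name string
	Args map[string]any
}

type ToolFunc func(args map[string]any) (string, error)

var errMissingPath = errors.New("missing path argument")

// echoPath is a demo tool used only for illustration.
func echoPath(args map[string]any) (string, error) {
	p, ok := args["path"].(string)
	if !ok {
		return "", errMissingPath
	}
	return p, nil
}

// dispatchProseToolCalls runs each parsed call. dispatched is true only when
// at least one call succeeded; unknownNames lists hallucinated tool names.
func dispatchProseToolCalls(calls []ToolCall, registry map[string]ToolFunc) (dispatched bool, unknownNames []string) {
	succeeded := 0
	for _, c := range calls {
		fn, ok := registry[c.Name]
		if !ok {
			unknownNames = append(unknownNames, c.Name)
			continue
		}
		if _, err := fn(c.Args); err == nil {
			succeeded++
		}
	}
	return succeeded > 0, unknownNames
}

// correctionMessage builds the text injected as a user message after the
// assistant turn, giving the model a chance to retry with valid names.
func correctionMessage(unknownNames, available []string) string {
	return fmt.Sprintf("Unknown tool name(s): %s. Available tools: %s. Retry using only the available names.",
		strings.Join(unknownNames, ", "), strings.Join(available, ", "))
}
```

A caller would add the assistant reply to history first and then, if unknownNames is non-empty, add correctionMessage(...) with the user role, which yields the user → assistant → correction-user ordering described above.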

2026-05-31 — histLenBeforeChat pattern for noToolCalls guard

Context. Harvey needs to know whether a chat turn resulted in structured tool calls (via RunToolLoop) so it can skip autoExecuteReply when tool calls already handled file writing. The check len(a.History) == histLenBeforeChat correctly detects the absence of tool calls only when it is evaluated before a.AddMessage appends the assistant reply; evaluated afterwards, the history has always grown and the check is always false.

Decision. Capture histLenBeforeChat := len(a.History) before the Chat/RunToolLoop call. Compute noToolCalls := len(a.History) == histLenBeforeChat before a.AddMessage. This invariant must be preserved: any refactor that moves a.AddMessage before the noToolCalls check will silently break the guard.

Consequences. Documented as a key invariant in CLAUDE.md.
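The ordering constraint is easiest to see in code. The sketch below is a minimal model, not Harvey's implementation: Message, Agent, and runTurn are hypothetical, and the chat callback stands in for the Chat/RunToolLoop call, which is assumed to append tool messages to a.History when tool calls occur.

```go
package main

// Message and Agent are simplified stand-ins for Harvey's real types.
type Message struct {
	Role    string
	Content string
}

type Agent struct {
	History []Message
}

// AddMessage appends one message to the conversation history.
func (a *Agent) AddMessage(role, content string) {
	a.History = append(a.History, Message{Role: role, Content: content})
}

// runTurn shows the required ordering. chat stands in for Chat/RunToolLoop
// and may append tool messages to a.History as a side effect.
func (a *Agent) runTurn(chat func(a *Agent) string) (reply string, noToolCalls bool) {
	histLenBeforeChat := len(a.History) // 1. capture before the chat call
	reply = chat(a)

	// 2. compare BEFORE AddMessage: an unchanged length means no tool calls ran.
	noToolCalls = len(a.History) == histLenBeforeChat

	// 3. only now record the assistant reply; moving this line above step 2
	// would make noToolCalls false on every turn.
	a.AddMessage("assistant", reply)
	return reply, noToolCalls
}
```

Returning noToolCalls from the same function that calls AddMessage keeps the three steps together, which makes an accidental reordering easier to spot in review.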

2026-05-28 — Three-silo memory architecture

Context. Harvey needs to accumulate knowledge across sessions without polluting the LLM context window on every turn. Three distinct content types require different ingestion and retrieval strategies: (1) external documents, (2) session experience, (3) research observations.

Decision. Three independent silos unified at retrieval time by UnifiedMemory.Recall():

Silo	Ingestion	Retrieval
RAG store	`/rag ingest` (explicit)	Per-prompt via `ragAugment()`
Memory store	`/memory mine` or auto-mine on exit	Session-start via `UnifiedMemory`
Knowledge base	`/kb observe` (explicit)	On-demand via `UnifiedMemory`

Consequences. Each silo has its own command namespace (/rag, /memory, /kb). The unified retrieval via /memory recall is the recommended entry point. All three silos share a token budget enforced at injection time.
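A shared budget across silos might be enforced along the lines of the sketch below. It is illustrative only: the Recaller function type, the four-characters-per-token estimate, and the policy of skipping chunks that do not fit are assumptions, not a description of how UnifiedMemory.Recall() is actually written.

```go
package main

// Recaller is a hypothetical per-silo retrieval function.
type Recaller func(query string) []string

// estimateTokens is a rough heuristic (about four bytes per token, rounded up).
// A real implementation would use the model's tokenizer.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// recallWithinBudget queries each silo in order and keeps chunks until the
// shared token budget is exhausted. Chunks that would exceed the budget are
// skipped, so a later, smaller chunk may still be included.
func recallWithinBudget(query string, silos []Recaller, budget int) []string {
	var out []string
	used := 0
	for _, recall := range silos {
		for _, chunk := range recall(query) {
			cost := estimateTokens(chunk)
			if used+cost > budget {
				continue
			}
			out = append(out, chunk)
			used += cost
		}
	}
	return out
}
```

Because the budget is consumed in silo order, the order of the silos slice acts as a priority list; whichever silo comes first gets first claim on the context window.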
