Status (2026-05-02): Implemented with named-store registry. See ARCHITECTURE.md for the current design. The planning decisions below were adopted; the main evolution beyond this document is multi-store support (RagStoreEntry registry, /rag new NAME, /rag switch NAME, /rag drop NAME) so different knowledge domains (golang, writing, research, etc.) can coexist as separate SQLite files while only the active one is held open in memory.


🧠 Harvey RAG Integration Plan (Hybrid Embedding Model Approach)

🎯 Goal

Add Retrieval-Augmented Generation (RAG) support to Harvey while keeping the knowledge index decoupled from any particular generation model.


🧩 Core Concept

✅ Separation of concerns

Knowledge Base (raw data)
        ↓
Embedding Model (e.g. nomic-embed-text)
        ↓
RAG Index (SQLite per embedding model)
        ↓
Generation Model (granite4, llama3, etc.)

✅ Key Design Decisions

1. Use embedding model–scoped RAG databases

Instead of per-generation-model:

❌ granite4.db
❌ llama3.db

Use:

✅ rag_nomic_v1.db
✅ rag_mxbai_v1.db

2. Map generation model → embedding model

Example:

type ModelConfig struct {
    GenerationModel string
    EmbeddingModel  string
    RagDBPath       string
}

var ModelRegistry = map[string]ModelConfig{
    "granite4": {
        GenerationModel: "granite4",
        EmbeddingModel:  "nomic-embed-text",
        RagDBPath:       "rag_nomic_v1.db",
    },
    "llama3": {
        GenerationModel: "llama3",
        EmbeddingModel:  "nomic-embed-text",
        RagDBPath:       "rag_nomic_v1.db",
    },
}

3. Explicit ingestion step

harvey ingest --embedding-model nomic-embed-text

4. Enforce embedding consistency

Strict runtime check:

if embedder.Name() != r.embeddingModel {
    return errors.New("embedding model mismatch")
}

Prevents querying an index with vectors produced by a different embedding model; vectors from different models live in incompatible spaces, so mixing them would silently corrupt retrieval.


βš™οΈ Go Module Design (package harvey)

πŸ“ File Structure

harvey/
  rag_support.go
  rag_support_test.go

📦 rag_support.go (Design Overview)

✅ Responsibilities

  • Open and initialize the embedding-model-scoped SQLite store
  • Ingest text chunks together with their embeddings
  • Answer similarity queries over the stored chunks

✅ Interfaces

type Embedder interface {
    Embed(text string) ([]float64, error)
    Name() string
}

Allows:

  • Plugging in a real Ollama-backed embedder or a mock in tests
  • Enforcing the name-based consistency check at ingest and query time

✅ Core Types

type RagStore struct {
    db             *sql.DB
    embeddingModel string
}

type Chunk struct {
    ID      int64
    Content string
}

✅ SQLite Schema

CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL
);

Future extensions:

source_id TEXT,
chunk_index INTEGER,
tags TEXT

✅ Initialization

func NewRagStore(dbPath, embeddingModel string) (*RagStore, error)

Uses:

import _ "github.com/glebarez/go-sqlite"

Driver:

sql.Open("sqlite", dbPath)
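
A sketch of the constructor under these choices (minimal error handling; the schema string mirrors the table above):

```go
import (
	"database/sql"

	_ "github.com/glebarez/go-sqlite" // pure-Go SQLite driver, no CGO
)

const schemaSQL = `
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL
);`

// NewRagStore opens (or creates) the SQLite file at dbPath, applies the
// schema, and tags the store with the embedding model that owns it.
func NewRagStore(dbPath, embeddingModel string) (*RagStore, error) {
	db, err := sql.Open("sqlite", dbPath)
	if err != nil {
		return nil, err
	}
	if _, err := db.Exec(schemaSQL); err != nil {
		db.Close()
		return nil, err
	}
	return &RagStore{db: db, embeddingModel: embeddingModel}, nil
}
```

Note that sql.Open does not touch the file until the first query, so the Exec doubles as a connectivity check.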

✅ Ingest Flow

func (r *RagStore) Ingest(texts []string, embedder Embedder) error

Steps:

  1. Validate embedding model
  2. Generate embeddings
  3. Serialize vectors
  4. Store in SQLite (transaction)
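
Those four steps might translate into a sketch like this (serialize is the helper defined under Serialization below; error handling kept simple):

```go
import "errors"

// Ingest validates the embedder, embeds every text, and writes all
// (content, embedding) rows inside a single transaction so a failed
// batch leaves the store untouched.
func (r *RagStore) Ingest(texts []string, embedder Embedder) error {
	if embedder.Name() != r.embeddingModel {
		return errors.New("embedding model mismatch")
	}
	tx, err := r.db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds
	stmt, err := tx.Prepare(`INSERT INTO chunks (content, embedding) VALUES (?, ?)`)
	if err != nil {
		return err
	}
	defer stmt.Close()
	for _, text := range texts {
		vec, err := embedder.Embed(text)
		if err != nil {
			return err
		}
		if _, err := stmt.Exec(text, serialize(vec)); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```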

✅ Query Flow

func (r *RagStore) Query(query string, embedder Embedder, topK int) ([]Chunk, error)

Steps:

  1. Embed query
  2. Load all stored embeddings
  3. Compute cosine similarity
  4. Sort results
  5. Return top-K chunks
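
A linear-scan sketch of those five steps, adequate while the corpus stays small:

```go
import (
	"errors"
	"sort"
)

// Query embeds the query text, scores every stored chunk by cosine
// similarity, and returns the topK best matches.
func (r *RagStore) Query(query string, embedder Embedder, topK int) ([]Chunk, error) {
	if embedder.Name() != r.embeddingModel {
		return nil, errors.New("embedding model mismatch")
	}
	qvec, err := embedder.Embed(query)
	if err != nil {
		return nil, err
	}
	rows, err := r.db.Query(`SELECT id, content, embedding FROM chunks`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	type scored struct {
		chunk Chunk
		score float64
	}
	var results []scored
	for rows.Next() {
		var c Chunk
		var blob []byte
		if err := rows.Scan(&c.ID, &c.Content, &blob); err != nil {
			return nil, err
		}
		results = append(results, scored{c, cosineSimilarity(qvec, deserialize(blob))})
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	sort.Slice(results, func(i, j int) bool { return results[i].score > results[j].score })
	if topK > len(results) {
		topK = len(results)
	}
	out := make([]Chunk, 0, topK)
	for _, s := range results[:topK] {
		out = append(out, s.chunk)
	}
	return out, nil
}
```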

✅ Cosine Similarity

func cosineSimilarity(a, b []float64) float64

✅ Serialization

Binary format:

[int32 length][float64...]

Functions:

serialize([]float64) []byte
deserialize([]byte) []float64
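
A little-endian encoding matching that layout could look like this (deserialize trusts its input; a production version would validate lengths):

```go
import (
	"encoding/binary"
	"math"
)

// serialize encodes a vector as [int32 length][float64 values...],
// little-endian throughout.
func serialize(v []float64) []byte {
	buf := make([]byte, 4+8*len(v))
	binary.LittleEndian.PutUint32(buf[0:4], uint32(len(v)))
	for i, f := range v {
		binary.LittleEndian.PutUint64(buf[4+8*i:], math.Float64bits(f))
	}
	return buf
}

// deserialize inverts serialize.
func deserialize(b []byte) []float64 {
	n := int(binary.LittleEndian.Uint32(b[0:4]))
	v := make([]float64, n)
	for i := range v {
		v[i] = math.Float64frombits(binary.LittleEndian.Uint64(b[4+8*i:]))
	}
	return v
}
```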

🧪 rag_support_test.go

✅ Uses mock embedder

type mockEmbedder struct {
    name string
}
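
Fleshed out so it satisfies the Embedder interface: a deterministic, hash-like embedding keeps tests offline and repeatable (the 8-dimensional vector size is arbitrary):

```go
// mockEmbedder is a deterministic, offline Embedder for tests: it folds
// the characters of the input into a fixed-size vector, so identical
// texts always embed identically.
type mockEmbedder struct {
	name string
}

func (m mockEmbedder) Name() string { return m.name }

func (m mockEmbedder) Embed(text string) ([]float64, error) {
	v := make([]float64, 8)
	for i, r := range text {
		v[i%8] += float64(r)
	}
	return v, nil
}
```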

Ensures tests run offline, with no Ollama dependency and fully deterministic vectors.


✅ Test Coverage

1. Ingest + query round-trip
    • Ingested chunks come back from Query, ranked by cosine similarity

2. Embedding mismatch protection
    • Ingest and Query with a differently named embedder fail fast with an error

🔄 Runtime Flow in Harvey

✅ Query execution

User selects model (granite4)
    ↓
Lookup ModelConfig
    ↓
Get embedding model (nomic-embed-text)
    ↓
Load corresponding RAG DB
    ↓
Embed query
    ↓
Retrieve top-K chunks
    ↓
Inject into prompt
    ↓
Call Ollama (granite4)

⚠️ Known Trade-offs

✅ Pros

  • One index per embedding model, shared by every generation model that uses it; switching generation models needs no re-ingestion
  • Plain SQLite via a pure-Go driver: no server, no CGO

⚠️ Cons

  • Query loads every stored embedding and scans linearly, O(n) per query
  • No vector index yet, so large corpora will eventually get slow

🚀 Future Enhancements

🔧 Performance

  • Vector indexing once the corpus grows past the point where a linear scan hurts

🧠 Retrieval quality

  • Proper document chunking (see Open Questions)

⚙️ Storage

  • Add the source_id / chunk_index / tags columns sketched above


❓ Open Questions

These will influence your next steps:

  1. How large is your knowledge base?
    • <10k chunks → current design is perfect
    • 100k → may need indexing soon

  2. Will users switch models frequently?
    • If yes → shared embedding index is important
  3. Do you already have document chunking?
    • If not, this is the next critical feature
  4. Should ingestion be automatic or manual?
    • CLI-driven vs background processing
  5. Do you want offline-only operation?
    • Affects embedding strategy + caching

✅ TL;DR

RAG indexes are scoped to the embedding model, not the generation model. Generation models map onto a shared embedding index through ModelRegistry, ingestion is an explicit step, and a strict name check keeps vectors from different embedders from ever mixing.
