Status (2026-05-02): Implemented with named-store registry. See ARCHITECTURE.md for the current design. The planning decisions below were adopted; the main evolution beyond this document is multi-store support (RagStoreEntry registry, /rag new NAME, /rag switch NAME, /rag drop NAME) so different knowledge domains (golang, writing, research, etc.) can coexist as separate SQLite files while only the active one is held open in memory.


🧠 Harvey RAG Integration Plan (Hybrid Embedding Model Approach)

🎯 Goal

Add Retrieval-Augmented Generation (RAG) support to Harvey while keeping the knowledge index decoupled from any particular generation model.


🧩 Core Concept

✅ Separation of concerns

Knowledge Base (raw data)
        ↓
Embedding Model (e.g. nomic-embed-text)
        ↓
RAG Index (SQLite per embedding model)
        ↓
Generation Model (granite4, llama3, etc.)

✅ Key Design Decisions

1. Use embedding model–scoped RAG databases

Instead of per-generation-model:

❌ granite4.db
❌ llama3.db

Use:

✅ rag_nomic_v1.db
✅ rag_mxbai_v1.db

2. Map generation model → embedding model

Example:

type ModelConfig struct {
    GenerationModel string
    EmbeddingModel  string
    RagDBPath       string
}

var ModelRegistry = map[string]ModelConfig{
    "granite4": {
        GenerationModel: "granite4",
        EmbeddingModel:  "nomic-embed-text",
        RagDBPath:       "rag_nomic_v1.db",
    },
    "llama3": {
        GenerationModel: "llama3",
        EmbeddingModel:  "nomic-embed-text",
        RagDBPath:       "rag_nomic_v1.db",
    },
}

3. Explicit ingestion step

harvey ingest --embedding-model nomic-embed-text

4. Enforce embedding consistency

Strict runtime check:

if embedder.Name() != r.embeddingModel {
    return errors.New("embedding model mismatch")
}

Prevents querying an index with vectors produced by a different embedding model; vectors from different models live in incompatible spaces, so mixing them would silently corrupt retrieval.


βš™οΈ Go Module Design (package harvey)

πŸ“ File Structure

harvey/
  rag_support.go
  rag_support_test.go

📦 rag_support.go (Design Overview)

✅ Responsibilities

  • Open and initialize the embedding-model-scoped SQLite store
  • Ingest text chunks together with their embeddings
  • Answer similarity queries over the stored chunks

✅ Interfaces

type Embedder interface {
    Embed(text string) ([]float64, error)
    Name() string
}

Allows:

  • Plugging in a real Ollama-backed embedder or a mock in tests
  • Enforcing the name-based consistency check at ingest and query time

✅ Core Types

type RagStore struct {
    db             *sql.DB
    embeddingModel string
}

type Chunk struct {
    ID      int64
    Content string
}

✅ SQLite Schema

CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL
);

Future extensions:

source_id TEXT,
chunk_index INTEGER,
tags TEXT

✅ Initialization

func NewRagStore(dbPath, embeddingModel string) (*RagStore, error)

Uses:

import _ "github.com/glebarez/go-sqlite"

Driver:

sql.Open("sqlite", dbPath)
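
A sketch of the constructor under these choices (minimal error handling; the schema string mirrors the table above):

```go
import (
	"database/sql"

	_ "github.com/glebarez/go-sqlite" // pure-Go SQLite driver, no CGO
)

const schemaSQL = `
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL
);`

// NewRagStore opens (or creates) the SQLite file at dbPath, applies the
// schema, and tags the store with the embedding model that owns it.
func NewRagStore(dbPath, embeddingModel string) (*RagStore, error) {
	db, err := sql.Open("sqlite", dbPath)
	if err != nil {
		return nil, err
	}
	if _, err := db.Exec(schemaSQL); err != nil {
		db.Close()
		return nil, err
	}
	return &RagStore{db: db, embeddingModel: embeddingModel}, nil
}
```

Note that sql.Open does not touch the file until the first query, so the Exec doubles as a connectivity check.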

✅ Ingest Flow

func (r *RagStore) Ingest(texts []string, embedder Embedder) error

Steps:

  1. Validate embedding model
  2. Generate embeddings
  3. Serialize vectors
  4. Store in SQLite (transaction)
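
Those four steps might translate into a sketch like this (serialize is the helper defined under Serialization below; error handling kept simple):

```go
import "errors"

// Ingest validates the embedder, embeds every text, and writes all
// (content, embedding) rows inside a single transaction so a failed
// batch leaves the store untouched.
func (r *RagStore) Ingest(texts []string, embedder Embedder) error {
	if embedder.Name() != r.embeddingModel {
		return errors.New("embedding model mismatch")
	}
	tx, err := r.db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds
	stmt, err := tx.Prepare(`INSERT INTO chunks (content, embedding) VALUES (?, ?)`)
	if err != nil {
		return err
	}
	defer stmt.Close()
	for _, text := range texts {
		vec, err := embedder.Embed(text)
		if err != nil {
			return err
		}
		if _, err := stmt.Exec(text, serialize(vec)); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```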

✅ Query Flow

func (r *RagStore) Query(query string, embedder Embedder, topK int) ([]Chunk, error)

Steps:

  1. Embed query
  2. Load all stored embeddings
  3. Compute cosine similarity
  4. Sort results
  5. Return top-K chunks
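
A linear-scan sketch of those five steps, adequate while the corpus stays small:

```go
import (
	"errors"
	"sort"
)

// Query embeds the query text, scores every stored chunk by cosine
// similarity, and returns the topK best matches.
func (r *RagStore) Query(query string, embedder Embedder, topK int) ([]Chunk, error) {
	if embedder.Name() != r.embeddingModel {
		return nil, errors.New("embedding model mismatch")
	}
	qvec, err := embedder.Embed(query)
	if err != nil {
		return nil, err
	}
	rows, err := r.db.Query(`SELECT id, content, embedding FROM chunks`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	type scored struct {
		chunk Chunk
		score float64
	}
	var results []scored
	for rows.Next() {
		var c Chunk
		var blob []byte
		if err := rows.Scan(&c.ID, &c.Content, &blob); err != nil {
			return nil, err
		}
		results = append(results, scored{c, cosineSimilarity(qvec, deserialize(blob))})
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	sort.Slice(results, func(i, j int) bool { return results[i].score > results[j].score })
	if topK > len(results) {
		topK = len(results)
	}
	out := make([]Chunk, 0, topK)
	for _, s := range results[:topK] {
		out = append(out, s.chunk)
	}
	return out, nil
}
```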

✅ Cosine Similarity

func cosineSimilarity(a, b []float64) float64

✅ Serialization

Binary format:

[int32 length][float64...]

Functions:

serialize([]float64) []byte
deserialize([]byte) []float64
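
A little-endian encoding matching that layout could look like this (deserialize trusts its input; a production version would validate lengths):

```go
import (
	"encoding/binary"
	"math"
)

// serialize encodes a vector as [int32 length][float64 values...],
// little-endian throughout.
func serialize(v []float64) []byte {
	buf := make([]byte, 4+8*len(v))
	binary.LittleEndian.PutUint32(buf[0:4], uint32(len(v)))
	for i, f := range v {
		binary.LittleEndian.PutUint64(buf[4+8*i:], math.Float64bits(f))
	}
	return buf
}

// deserialize inverts serialize.
func deserialize(b []byte) []float64 {
	n := int(binary.LittleEndian.Uint32(b[0:4]))
	v := make([]float64, n)
	for i := range v {
		v[i] = math.Float64frombits(binary.LittleEndian.Uint64(b[4+8*i:]))
	}
	return v
}
```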

🧪 rag_support_test.go

✅ Uses mock embedder

type mockEmbedder struct {
    name string
}
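
Fleshed out so it satisfies the Embedder interface: a deterministic, hash-like embedding keeps tests offline and repeatable (the 8-dimensional vector size is arbitrary):

```go
// mockEmbedder is a deterministic, offline Embedder for tests: it folds
// the characters of the input into a fixed-size vector, so identical
// texts always embed identically.
type mockEmbedder struct {
	name string
}

func (m mockEmbedder) Name() string { return m.name }

func (m mockEmbedder) Embed(text string) ([]float64, error) {
	v := make([]float64, 8)
	for i, r := range text {
		v[i%8] += float64(r)
	}
	return v, nil
}
```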

Ensures tests run offline, with no Ollama dependency and fully deterministic vectors.


✅ Test Coverage

1. Ingest + query round-trip
    • Ingested chunks come back from Query, ranked by cosine similarity

2. Embedding mismatch protection
    • Ingest and Query with a differently named embedder fail fast with an error

🔄 Runtime Flow in Harvey

✅ Query execution

User selects model (granite4)
    ↓
Lookup ModelConfig
    ↓
Get embedding model (nomic-embed-text)
    ↓
Load corresponding RAG DB
    ↓
Embed query
    ↓
Retrieve top-K chunks
    ↓
Inject into prompt
    ↓
Call Ollama (granite4)

⚠️ Known Trade-offs

✅ Pros

  • One index per embedding model, shared by every generation model that uses it; switching generation models needs no re-ingestion
  • Plain SQLite via a pure-Go driver: no server, no CGO

⚠️ Cons

  • Query loads every stored embedding and scans linearly, O(n) per query
  • No vector index yet, so large corpora will eventually get slow

🚀 Future Enhancements

🔧 Performance

  • Vector indexing once the corpus grows past the point where a linear scan hurts

🧠 Retrieval quality

  • Proper document chunking (see Open Questions)

⚙️ Storage

  • Add the source_id / chunk_index / tags columns sketched above


❓ Open Questions

These will influence your next steps:

  1. How large is your knowledge base?
    • <10k chunks → current design is perfect
    • 100k → may need indexing soon

  2. Will users switch models frequently?
    • If yes → shared embedding index is important
  3. Do you already have document chunking?
    • If not, this is the next critical feature
  4. Should ingestion be automatic or manual?
    • CLI-driven vs background processing
  5. Do you want offline-only operation?
    • Affects embedding strategy + caching

✅ TL;DR

RAG indexes are scoped to the embedding model, not the generation model. Generation models map onto a shared embedding index through ModelRegistry, ingestion is an explicit step, and a strict name check keeps vectors from different embedders from ever mixing.
