This document outlines the phased implementation plan for comprehensive programming language support in Harvey, based on the design specified in programming_language_support_design.md.
Status: Active
Created: 2026-06-09
Related Documents:
- programming_language_support_design.md
- DECISIONS.md
- Using_RAGs_with_Harvey.md
Date: 2026-06-09
File: commands.go
Function: `looksLikePath` (lines 3463-3472)
Change: Added missing programming language extensions to the `knownExts` slice:
- `.c`, `.cpp`, `.h`, `.hpp` (C/C++)
- `.pas` (Pascal)
- `.Mod`, `.obn` (Oberon)
- `.lisp` (Lisp)
- `.bas` (Basic)

Impact: Tagged code blocks like `c:program.c` or `pascal:module.pas` are now correctly recognized as file paths, enabling auto-write functionality for these languages.
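
The extension check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `commands.go`: the `knownExts` contents shown here are a subset, `splitTag` is a hypothetical helper, and case-sensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// knownExts is an illustrative subset, not the actual slice from commands.go.
var knownExts = []string{
	".go", ".py",
	".c", ".cpp", ".h", ".hpp", // C/C++
	".pas",         // Pascal
	".Mod", ".obn", // Oberon
	".lisp",        // Lisp
	".bas",         // Basic
}

// splitTag is a hypothetical helper that separates "lang:path" into its parts.
// A tag without a colon is treated as a bare language name with an empty path.
func splitTag(tag string) (lang, path string) {
	lang, path, _ = strings.Cut(tag, ":")
	return lang, path
}

// looksLikePath reports whether s ends in one of the known extensions.
// Matching is case-sensitive in this sketch, so ".Mod" must be capitalized.
func looksLikePath(s string) bool {
	ext := filepath.Ext(s)
	for _, known := range knownExts {
		if ext == known {
			return true
		}
	}
	return false
}

func main() {
	for _, tag := range []string{"c:program.c", "pascal:module.pas", "bash"} {
		lang, path := splitTag(tag)
		fmt.Printf("%-20s lang=%q path=%q isPath=%v\n", tag, lang, path, looksLikePath(path))
	}
}
```

With a check like this, a tag such as `bash` (no path part) is not mistaken for a file, while `c:program.c` yields a path that can be written out.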
The implementation is divided into 6 phases with clear milestones, deliverables, and success criteria. Each phase builds on the previous one and includes testing and documentation.
Duration: 1-2 weeks
Priority: High
Dependencies: None (foundational)
Update `looksLikePath` to use the registry.
File: language_registry.go (new)
File: commands.go (modified)
File: language_registry_test.go (new)
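
As a rough illustration of the Phase 1 idea (a registry populated from `init()` that `looksLikePath` consults), the sketch below may help. The `Language` and `Registry` types, their fields, and the method names are assumptions made for this example and are not taken from `language_registry.go`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Language holds metadata for one language (hypothetical fields).
type Language struct {
	Name       string
	Extensions []string
}

// Registry indexes languages by name and by file extension.
type Registry struct {
	byName map[string]*Language
	byExt  map[string]*Language
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		byName: make(map[string]*Language),
		byExt:  make(map[string]*Language),
	}
}

// Register adds a language and indexes each of its extensions.
// A later registration of the same extension overwrites the earlier one.
func (r *Registry) Register(l *Language) {
	r.byName[l.Name] = l
	for _, ext := range l.Extensions {
		r.byExt[ext] = l
	}
}

// ByExtension looks up the language for the extension of path.
func (r *Registry) ByExtension(path string) (*Language, bool) {
	l, ok := r.byExt[filepath.Ext(path)]
	return l, ok
}

// defaultRegistry is filled in init(), so callers need no explicit setup.
var defaultRegistry = NewRegistry()

func init() {
	defaultRegistry.Register(&Language{Name: "c", Extensions: []string{".c", ".h"}})
	defaultRegistry.Register(&Language{Name: "cpp", Extensions: []string{".cpp", ".hpp"}})
	defaultRegistry.Register(&Language{Name: "pascal", Extensions: []string{".pas"}})
	defaultRegistry.Register(&Language{Name: "oberon", Extensions: []string{".Mod", ".obn"}})
	defaultRegistry.Register(&Language{Name: "lisp", Extensions: []string{".lisp"}})
	defaultRegistry.Register(&Language{Name: "basic", Extensions: []string{".bas"}})
}

// looksLikePath delegates to the registry instead of a hard-coded slice.
func looksLikePath(s string) bool {
	_, ok := defaultRegistry.ByExtension(s)
	return ok
}

func main() {
	for _, p := range []string{"unit.pas", "module.Mod", "README"} {
		if l, ok := defaultRegistry.ByExtension(p); ok {
			fmt.Printf("%s -> %s\n", p, l.Name)
		} else {
			fmt.Printf("%s -> unknown (looksLikePath=%v)\n", p, looksLikePath(p))
		}
	}
}
```

Registering from `init()` keeps the wiring inside the registry file, which fits a plan where later phases attach more per-language components to the same entries.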
- `language_registry.go`: core registry, all types and interfaces
- `language_registry_test.go`: 35 tests, 0 failures
- `commands.go`: `looksLikePath` uses registry
- `harvey.go`: no change needed (registry uses `init()`)

Duration: 1 week
Priority: High
Dependencies: Phase 1 complete
File: language_detector.go (new)
File: language_detector_test.go (new)
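
One way detection by extension and then by content could fit together is sketched below. The function names echo this phase's deliverables, but the signatures, keyword lists, interpreter names, and the order of checks are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// extToLang is an illustrative subset of extension mappings.
var extToLang = map[string]string{
	".c": "c", ".pas": "pascal", ".Mod": "oberon", ".lisp": "lisp", ".bas": "basic", ".py": "python",
}

// keywordSets is ordered so that ties resolve deterministically (first wins).
var keywordSets = []struct {
	lang     string
	keywords []string
}{
	{"c", []string{"#include", "int main("}},
	{"pascal", []string{"program ", "begin", "end."}},
	{"oberon", []string{"MODULE ", "PROCEDURE "}},
	{"lisp", []string{"(defun ", "(defmacro "}},
}

// isTextContent treats content with a NUL byte in its first 512 bytes as binary.
func isTextContent(content []byte) bool {
	if len(content) > 512 {
		content = content[:512]
	}
	return bytes.IndexByte(content, 0) == -1
}

// detectShebang maps a "#!" interpreter line to a language name, or "".
func detectShebang(content []byte) string {
	if !bytes.HasPrefix(content, []byte("#!")) {
		return ""
	}
	line, _, _ := bytes.Cut(content, []byte("\n"))
	fields := strings.Fields(string(line[2:]))
	if len(fields) == 0 {
		return ""
	}
	interp := filepath.Base(fields[0])
	if interp == "env" && len(fields) > 1 {
		interp = fields[1] // "#!/usr/bin/env python3" names the interpreter second
	}
	switch {
	case strings.HasPrefix(interp, "python"):
		return "python"
	case interp == "sbcl" || interp == "clisp":
		return "lisp"
	}
	return ""
}

// detectKeywords returns the language whose keywords occur most often, or "".
func detectKeywords(content []byte) string {
	text := string(content)
	best, bestScore := "", 0
	for _, set := range keywordSets {
		score := 0
		for _, kw := range set.keywords {
			if strings.Contains(text, kw) {
				score++
			}
		}
		if score > bestScore {
			best, bestScore = set.lang, score
		}
	}
	return best
}

// detectLanguage tries the extension first, then shebang, then keywords.
func detectLanguage(path string, content []byte) string {
	if lang, ok := extToLang[filepath.Ext(path)]; ok {
		return lang
	}
	if !isTextContent(content) {
		return ""
	}
	if lang := detectShebang(content); lang != "" {
		return lang
	}
	return detectKeywords(content)
}

func main() {
	fmt.Println(detectLanguage("unit.pas", nil))
	fmt.Println(detectLanguage("script", []byte("#!/usr/bin/env python3\nprint(1)\n")))
	fmt.Println(detectLanguage("notes", []byte("(defun add (a b) (+ a b))")))
}
```

Checking the extension first keeps the common case cheap; content inspection only runs for files whose names carry no usable hint.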
- `language_detector.go`: `ExtensionDetector`, `ContentDetector`, `CombinedDetector`, `detectShebang`, `detectKeywords`, `isTextContent`
- `language_detector_test.go`: 45 tests, 0 failures
- `language_registry.go`: `DetectLanguage` method; all 21 languages wired with `CombinedDetector`

Duration: 2-3 weeks
Priority: High
Dependencies: Phases 1-2 complete
File: code_chunkers.go (new)
File: commands.go (modified)
File: rag_support.go (modified)
File: code_chunkers_test.go (new)
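
To make "code-aware chunking" concrete, here is a small sketch for Lisp that cuts at top-level forms and falls back to line-based chunks when no form is found. It is an assumption-laden illustration: the `Chunk` type and function names are hypothetical, and parentheses inside strings or comments are deliberately not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical unit of text handed to the RAG ingester.
type Chunk struct {
	Text      string
	StartLine int // 1-based line where the chunk begins
}

// chunkLispForms splits src into top-level forms by balancing parentheses.
// Parentheses inside strings and comments are NOT handled in this sketch.
func chunkLispForms(src string) []Chunk {
	var chunks []Chunk
	depth, start, line, startLine := 0, -1, 1, 0
	for i, r := range src {
		switch r {
		case '\n':
			line++
		case '(':
			if depth == 0 {
				start, startLine = i, line
			}
			depth++
		case ')':
			if depth > 0 {
				depth--
				if depth == 0 {
					chunks = append(chunks, Chunk{Text: src[start : i+1], StartLine: startLine})
					start = -1
				}
			}
		}
	}
	return chunks
}

// chunkGeneric is the language-agnostic fallback: fixed-size groups of lines.
func chunkGeneric(src string, maxLines int) []Chunk {
	var chunks []Chunk
	lines := strings.Split(src, "\n")
	for i := 0; i < len(lines); i += maxLines {
		end := i + maxLines
		if end > len(lines) {
			end = len(lines)
		}
		chunks = append(chunks, Chunk{Text: strings.Join(lines[i:end], "\n"), StartLine: i + 1})
	}
	return chunks
}

// chunkSource picks a code-aware chunker when one applies and yields chunks,
// and otherwise falls back to generic chunking.
func chunkSource(lang, src string) []Chunk {
	if lang == "lisp" {
		if chunks := chunkLispForms(src); len(chunks) > 0 {
			return chunks
		}
	}
	return chunkGeneric(src, 40)
}

func main() {
	src := "(defun a () 1)\n\n(defun b ()\n  (a))\n"
	for _, c := range chunkSource("lisp", src) {
		fmt.Printf("line %d: %q\n", c.StartLine, c.Text)
	}
}
```

Keeping the generic chunker as the fallback means a chunker bug or an unsupported language degrades retrieval quality rather than breaking ingestion.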
- `code_chunkers.go`: `CChunker`, `PascalChunker`, `OberonChunker`, `LispChunker`, `BasicChunker`; helpers `findLineCol`, `makeChunk`; `SetChunker`/`initChunkers`
- `code_chunkers_test.go`: 42 tests, 0 failures
- `commands.go`: `ragIngestFile` uses code-aware chunking + binary skip
- `rag_support.go`: `IngestEnriched` + lazy schema migration
- `language_registry.go`: `init()` calls `initChunkers`

Duration: 1-2 weeks
Priority: Medium
Dependencies: Phases 1-3 complete
File: doc_extractors.go (new)
File: commands.go (modified)
File: doc_extractors_test.go (new)
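
A documentation extractor for C could, in its simplest form, attach the `//` comment lines directly above a definition to that definition's name. The sketch below assumes exactly that convention; `extractCDocs` and its regular expression are hypothetical and much cruder than a real extractor would need to be (no `/* ... */` comments, no multi-line signatures).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// funcName matches the first identifier that is followed by "(".
var funcName = regexp.MustCompile(`([A-Za-z_][A-Za-z0-9_]*)\s*\(`)

// extractCDocs maps symbol names to the "//" comment block directly above them.
// A blank line between the comment and the code breaks the association.
func extractCDocs(src string) map[string]string {
	docs := make(map[string]string)
	var pending []string
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimSpace(line)
		switch {
		case strings.HasPrefix(trimmed, "//"):
			// Accumulate consecutive comment lines.
			pending = append(pending, strings.TrimSpace(strings.TrimPrefix(trimmed, "//")))
		case trimmed == "":
			pending = nil
		default:
			// First code line after a comment block: attach the docs if it names a symbol.
			if m := funcName.FindStringSubmatch(trimmed); m != nil && len(pending) > 0 {
				docs[m[1]] = strings.Join(pending, " ")
			}
			pending = nil
		}
	}
	return docs
}

func main() {
	src := `// add returns the sum
// of a and b.
int add(int a, int b) {
  return a + b;
}

int undocumented(void) { return 0; }
`
	for name, doc := range extractCDocs(src) {
		fmt.Printf("%s: %s\n", name, doc)
	}
}
```

A symbol-to-docs map of this shape is one plausible input for populating a per-chunk documentation field during ingestion.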
- `doc_extractors.go`: `CDocExtractor`, `PascalDocExtractor`, `OberonDocExtractor`, `LispDocExtractor`, `BasicDocExtractor`; `docsToSymbolMap`; `lispStringContent`; `SetExtractor`/`initExtractors`
- `doc_extractors_test.go`: 34 tests, 0 failures
- `commands.go`: `ragIngestFile` uses doc extractors to populate `Docs` field
- `language_registry.go`: `init()` calls `initExtractors`
- `code_chunkers.go`: fixed `flushCurrent` nil-access bug; fixed `extractPascalSymbol`/`extractOberonSymbol` leading-whitespace; fixed `classifyC` pointer return type

Duration: 1 week
Priority: Medium
Dependencies: Phases 1-2 complete
File: syntax_highlighters.go (new)
Files: terminal.go (modified), syntax_highlighters.go (new)
File: config.go (modified)
File: syntax_highlighters_test.go (new)
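
Terminal highlighting can be approximated by wrapping keywords in ANSI escape sequences, gated by a configuration flag. The following is a minimal sketch under that assumption; the color choice, the keyword list, and the `highlight` signature are illustrative and say nothing about how `TerminalHighlighter` is actually built. Keywords inside strings and comments are also colored here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const (
	ansiKeyword = "\x1b[1;34m" // bold blue
	ansiReset   = "\x1b[0m"
)

// newKeywordPattern builds a whole-word matcher for the given keywords.
func newKeywordPattern(keywords []string) *regexp.Regexp {
	quoted := make([]string, len(keywords))
	for i, k := range keywords {
		quoted[i] = regexp.QuoteMeta(k)
	}
	return regexp.MustCompile(`\b(` + strings.Join(quoted, "|") + `)\b`)
}

// cKeywords is a small illustrative subset of C keywords.
var cKeywords = newKeywordPattern([]string{"int", "return", "if", "else", "while", "struct"})

// highlight wraps each keyword match in ANSI codes when enabled is true.
// The enabled flag stands in for a config setting such as SyntaxHighlight.
func highlight(code string, kw *regexp.Regexp, enabled bool) string {
	if !enabled {
		return code
	}
	return kw.ReplaceAllString(code, ansiKeyword+"${1}"+ansiReset)
}

func main() {
	fmt.Println(highlight("int x = 0; return x;", cKeywords, true))
	fmt.Println(highlight("int x = 0; return x;", cKeywords, false))
}
```

Passing the flag through the call keeps the highlighter itself free of configuration lookups, so the same function serves both the enabled and disabled cases.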
- `syntax_highlighters.go`: `TerminalHighlighter`, 13 language specs, `highlightCodeBlocks`, `initHighlighters`
- `syntax_highlighters_test.go`: 30 tests; all pass
- `terminal.go`: `highlightCodeBlocks` applied before display
- `config.go`: `SyntaxHighlight` bool with YAML load/save
- `language_registry.go`: `init()` calls `initHighlighters`

Duration: 1-2 weeks
Priority: Medium
Dependencies: Phases 1-2 complete
File: code_formatters.go (new)
File: builtin_tools.go (modified)
Files: config.go (modified), commands.go (modified)
File: code_formatters_test.go (new)
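
The split between pipe-mode external formatters and a built-in formatter might look something like the sketch below. The type and function names echo this phase's deliverables, but the interface, the behavior of `normaliseText`, and the "return the input unchanged on failure" policy are assumptions for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// Formatter is a hypothetical common interface for all formatters.
type Formatter interface {
	Format(src []byte) ([]byte, error)
}

// PipeExternalFormatter runs a command that reads source on stdin
// and writes the formatted result to stdout.
type PipeExternalFormatter struct {
	Command string
	Args    []string
}

// Format pipes src through the external command.
func (p PipeExternalFormatter) Format(src []byte) ([]byte, error) {
	cmd := exec.Command(p.Command, p.Args...)
	cmd.Stdin = bytes.NewReader(src)
	var out, errBuf bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s: %w: %s", p.Command, err, errBuf.String())
	}
	return out.Bytes(), nil
}

// BuiltinFormatter formats in-process, with no subprocess.
type BuiltinFormatter struct{}

// Format applies normaliseText to src.
func (BuiltinFormatter) Format(src []byte) ([]byte, error) {
	return []byte(normaliseText(string(src))), nil
}

// normaliseText converts CRLF to LF, strips trailing spaces and tabs from
// each line, and ends the text with exactly one newline (assumed behavior).
func normaliseText(s string) string {
	lines := strings.Split(strings.ReplaceAll(s, "\r\n", "\n"), "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.TrimRight(strings.Join(lines, "\n"), "\n") + "\n"
}

// applyAutoFormat returns the formatted source, or the original source
// unchanged when the formatter fails (for example, when it is not installed).
func applyAutoFormat(f Formatter, src []byte) []byte {
	out, err := f.Format(src)
	if err != nil {
		return src
	}
	return out
}

func main() {
	src := []byte("begin  \r\n  x := 1;\t\nend.\n\n\n")
	fmt.Printf("%q\n", applyAutoFormat(BuiltinFormatter{}, src))
	// A missing external tool leaves the source untouched.
	missing := PipeExternalFormatter{Command: "harvey-no-such-formatter"}
	fmt.Printf("%q\n", applyAutoFormat(missing, []byte("x")))
}
```

Because external formatters are optional dependencies, treating a failed run as "leave the file as written" keeps `write_file` usable on machines where the tool is absent.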
- `code_formatters.go`: `PipeExternalFormatter`, `FileExternalFormatter`, `BuiltinFormatter`, `normaliseText`, `SetFormatter`, `initFormatters`
- `code_formatters_test.go`: 33 tests; all pass
- `builtin_tools.go`: `applyAutoFormat` wired into `write_file`
- `config.go`: `AutoFormat` bool with YAML load/save
- `commands.go`: `/format FILE [FILE...]` command
- `language_registry.go`: `init()` calls `initFormatters`

Directory: testdata/language_support/
File: language_integration_test.go (new)
File: language_benchmark_test.go (new)
File: Using_RAGs_with_Harvey.md (modified)
File: RAG_Language_Support.md (new)
File: helptext.go (modified)
File: ARCHITECTURE.md (modified)
Includes: Phases 1-2
Deliverables:
- Language registry with all 17 languages
- Language detection by extension and content
- Updated `looksLikePath` using registry
- Comprehensive unit tests

Success Criteria:
- All languages detected correctly
- Registry functional and tested
- No regressions in existing functionality

Includes: Phase 3
Deliverables:
- Code-aware chunkers for all programming languages
- Integration with RAG ingestion
- Improved retrieval quality for code

Success Criteria:
- Code structures preserved in chunks
- Retrieval quality improved (measurable)
- All existing RAG functionality preserved

Includes: Phases 4-6
Deliverables:
- Documentation extraction
- Syntax highlighting
- Auto-formatting
- Full integration

Success Criteria:
- Documentation extracted and associated with code
- Code blocks colorized in terminal
- Auto-formatting works when enabled
- All features configurable

Includes: Cross-cutting tasks
Deliverables:
- Comprehensive test suite
- Updated documentation
- Performance benchmarks
- Bug fixes and polish

Success Criteria:
- All tests passing
- Documentation complete
- Performance acceptable
- Ready for release

| Phase | Duration | Person-Days |
|---|---|---|
| Phase 1 | 1-2 weeks | 10-20 |
| Phase 2 | 1 week | 5-10 |
| Phase 3 | 2-3 weeks | 15-30 |
| Phase 4 | 1-2 weeks | 10-20 |
| Phase 5 | 1 week | 5-10 |
| Phase 6 | 1-2 weeks | 10-20 |
| Cross-cutting | 2 weeks | 20-30 |
| Total | 10-14 weeks | 75-140 |

| Dependency | Purpose | License | Notes |
|---|---|---|---|
| clang-format | C/C++ formatting | Apache 2.0 | Optional |
| black | Python formatting | MIT | Optional |
| prettier | JS/TS formatting | MIT | Optional |
| rustfmt | Rust formatting | Apache 2.0/MIT | Optional |
| sly | Lisp formatting | MIT | Optional |
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Chunker bugs break code across chunks | Medium | High | Extensive testing, fallback to generic chunking |
| Performance regression | Low | Medium | Benchmark before/after, optimize if needed |
| Embedding model limitations | Medium | Medium | Test with multiple models, document limitations |
| Memory usage increase | Medium | Medium | Profile memory, optimize data structures |
| Backward compatibility issues | Low | High | Maintain generic chunking as fallback, migration guide |
| External formatter dependencies | Low | Medium | Use built-in fallbacks, document requirements |
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Scope creep | Medium | Medium | Strict phase definitions, defer nice-to-haves |
| Resource availability | Medium | High | Prioritize critical features, defer optional |
| Testing complexity | Medium | Medium | Automate testing, create good test data |
| Integration issues | Medium | Medium | Early integration testing, continuous integration |
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Bugs in production | Medium | High | Comprehensive testing, code reviews |
| Poor user experience | Medium | Medium | User testing, iterate on feedback |
| Incomplete documentation | Medium | Medium | Documentation as part of each task |
| Performance issues | Low | Medium | Performance testing, profiling |

| File | Phase | Size (est.) | Purpose |
|---|---|---|---|
| `language_registry.go` | 1 | ~500 lines | Language registry and metadata |
| `language_registry_test.go` | 1 | ~300 lines | Registry tests |
| `language_detector.go` | 2 | ~400 lines | Language detection |
| `language_detector_test.go` | 2 | ~250 lines | Detection tests |
| `code_chunkers.go` | 3 | ~800 lines | Code-aware chunkers |
| `code_chunkers_test.go` | 3 | ~500 lines | Chunker tests |
| `doc_extractors.go` | 4 | ~600 lines | Documentation extractors |
| `doc_extractors_test.go` | 4 | ~400 lines | Extractor tests |
| `syntax_highlighters.go` | 5 | ~600 lines | Syntax highlighting |
| `syntax_highlighters_test.go` | 5 | ~400 lines | Highlighter tests |
| `code_formatters.go` | 6 | ~500 lines | Code formatters |
| `code_formatters_test.go` | 6 | ~300 lines | Formatter tests |
| `language_integration_test.go` | T | ~400 lines | Integration tests |
| `language_benchmark_test.go` | T | ~200 lines | Benchmark tests |
| `RAG_Language_Support.md` | D | ~500 lines | User documentation |
| Total | | ~7,000 lines | |

| File | Phase | Changes | Impact |
|---|---|---|---|
| `commands.go` | 1, 3, 5 | Add registry usage, update chunking, add formatting | Core |
| `config.go` | 5, 6 | Add language settings, formatter config | Core |
| `builtin_tools.go` | 6 | Add auto-formatting to write_file | Core |
| `terminal.go` | 5 | Add syntax highlighting | UI |
| `codeblock.go` | 3 | Extend for language metadata | Core |
| `harvey.go` | 1 | Initialize registry | Core |
| `Using_RAGs_with_Harvey.md` | D | Update with new features | Docs |
| `ARCHITECTURE.md` | D | Update with new components | Docs |
| `helptext.go` | D | Update help text | UI |

Test data layout:

    testdata/language_support/
    ├── c/
    │   ├── functions.c
    │   ├── structures.c
    │   ├── preprocessor.c
    │   └── complex.c
    ├── cpp/
    │   ├── classes.cpp
    │   ├── templates.cpp
    │   └── inheritance.cpp
    ├── pascal/
    │   ├── procedures.pas
    │   ├── types.pas
    │   └── units.pas
    ├── oberon/
    │   ├── module.Mod
    │   └── procedures.Mod
    ├── lisp/
    │   ├── functions.lisp
    │   ├── macros.lisp
    │   └── classes.lisp
    ├── basic/
    │   ├── subroutines.bas
    │   └── functions.bas
    └── expected/
        ├── c_chunks.json
        ├── pascal_chunks.json
        └── ...

Example configuration:

    # Language support configuration
    language:
      # Enable auto-formatting on file write
      auto_format: true
      # Enable syntax highlighting in terminal
      syntax_highlight: true
      # Per-language settings
      languages:
        c:
          enabled: true
          formatter: "clang-format"
          formatter_args: ["-"]    # stdin mode; clang-format - reads from stdin
          formatter_mode: pipe     # pipe (default) or file
          chunking: "function"
        cpp:
          enabled: true
          formatter: "clang-format"
          formatter_args: ["-style=google", "-"]
          formatter_mode: pipe
        pascal:
          enabled: true
          formatter: "builtin"     # built-in Go formatter, no subprocess
          formatter_mode: pipe     # built-in always uses pipe mode
        oberon:
          enabled: true
          formatter: "builtin"
          formatter_mode: pipe
          # Example of a hypothetical file-mode-only formatter:
          # formatter: "oberon-format"
          # formatter_mode: file   # requires safe_mode: false in harvey.yaml
        lisp:
          enabled: true
          formatter: "builtin"     # or "sly" if installed
        basic:
          enabled: true
          formatter: "builtin"
        # Existing languages
        go:
          enabled: true
          formatter: "gofmt"
        python:
          enabled: true
          formatter: "black"
        javascript:
          enabled: true
          formatter: "prettier"
          formatter_args: ["--tab-width=2", "--single-quote"]

Example commands:

    # Enable/disable auto-formatting
    harvey> /config set language.auto_format true
    harvey> /config set language.auto_format false
    # Set formatter for a language
    harvey> /config set language.c.formatter clang-format
    harvey> /config set language.c.formatter_args "-style=llvm"
    # Enable/disable syntax highlighting
    harvey> /config set language.syntax_highlight true
    # Manually format a file
    harvey> /format path/to/file.c
    # Show supported languages
    harvey> /languages list
    # Show language info
    harvey> /languages info c
    # Test highlighting
    harvey> /highlight c path/to/file.c

This plan is a living document. It will be updated as implementation progresses and as new requirements or constraints emerge.