I replaced brittle mega-prompts with modular context loading and a persistent memory layer—RAG for long-running agent sessions.
Production-grade context patterns—what enterprises call RAG and context engineering—implemented locally with full audit trail and semantic search.
Token-budgeted memory · modular context · hybrid retrieval
- 500+
- Typed memory records (budget-capped)
- 30
- Modular context modules
- Hybrid
- Keyword + embedding retrieval
The drift problem
When an agent runs for hours, the model loses thread. Static system prompts grow until they break the budget. The fix is not a longer prompt—it is architecture: store what matters, retrieve what is relevant, load only the rules the task needs.
Memory bus, not mega-prompt
I persist decisions, facts, blockers, and preferences as typed records under a token cap. Semantic search and keyword retrieval pull the right atoms into each turn. Snapshots give me an audit trail when something goes wrong.
- Session bootstrap loads canonical context within budget
- Ranked records under token cap per turn
- Keyword + embedding hybrid retrieval
- Archive snapshots for audit and replay
Selective context loading
Thirty modular Markdown context modules replace brittle static system prompts. The IDE, orchestration layer, and memory service share the same anchor—selective loading by task, not dump-everything-into-context.
- Agent role, comms, and workflow modules loaded on demand
- Persona overlays for operator vs research modes
- Deploy review gate · verification-first handoff
- Searchable artifact corpus for grounded retrieval
Proof in production tools
DevFlow IDE sessions and the portfolio chatbot on this site both use the same patterns—bootstrap, retrieve, cite. I can walk through the architecture in conversation; the case studies show the outcomes.