I replaced brittle mega-prompts with modular context loading and a persistent memory layer—RAG for long-running agent sessions.

Production-grade context patterns—what enterprises call RAG and context engineering—implemented locally with full audit trail and semantic search.

Token-budgeted memory · modular context · hybrid retrieval

500+: Typed memory records (budget-capped)
30: Modular context modules
Hybrid: Keyword + embedding retrieval

The drift problem

When an agent runs for hours, the model loses thread. Static system prompts grow until they break the budget. The fix is not a longer prompt—it is architecture: store what matters, retrieve what is relevant, load only the rules the task needs.

Memory bus, not mega-prompt

I persist decisions, facts, blockers, and preferences as typed records under a token cap. Semantic search and keyword retrieval pull the right atoms into each turn. Snapshots give me an audit trail when something goes wrong.

Session bootstrap loads canonical context within budget
Ranked records under token cap per turn
Keyword + embedding hybrid retrieval
Archive snapshots for audit and replay

Selective context loading

Thirty modular Markdown context modules replace brittle static system prompts. The IDE, orchestration layer, and memory service share the same anchor—selective loading by task, not dump-everything-into-context.

Agent role, comms, and workflow modules loaded on demand
Persona overlays for operator vs research modes
Deploy review gate · verification-first handoff
Searchable artifact corpus for grounded retrieval

Proof in production tools

DevFlow IDE sessions and the portfolio chatbot on this site both use the same patterns—bootstrap, retrieve, cite. I can walk through the architecture in conversation; the case studies show the outcomes.

DevFlow agent IDE Clarifi Voice evals All work