Building production AI agents.
What works, what doesn't, and why. Comparisons, benchmarks, and implementation details.
Context Engineering
Agents degrade after 30 minutes. Memory architecture, JIT loading, compaction, and what actually keeps them sharp.
Agent Architectures
Coordination patterns compared on the same task. Supervisor, pipeline, debate — which ones hold up and which ones don't.
Production Stack
Observability with real cost breakdowns. No-code vs hand-coded. Deployment infra across platforms. The boring stuff that matters.
Recent Posts
All posts →Agents Forget. Every Common Fix Trades One Problem for Another.
Four context management strategies on the same task. The one with perfect recall blew the token budget. The cheapest one forgot everything.
Agent Memory Depends on a Prompt Nobody Tests
Two summarizer prompts. Same architecture. One recalled 29% of early facts, the other 86%.
Agents Read. They Don't Compute.
The agent fetched the file tree. All 62 Python files were listed. It said 17.
System Prompts Don't Guarantee Tool Use
Same agent, same 'you MUST' instruction, five different tools. Only three got called.