The agentic AI market hit $7.6B in 2025. Gartner says 40%+ of enterprise apps will embed role-specific AI agents by end of 2026. LangChain’s State of Agent Engineering report (1,300+ respondents) shows 57% of agents are already in production.

The tools are maturing fast. LangGraph hit v1.0. AWS shipped AgentCore. Anthropic released the Claude Agent SDK with hooks, plugins, and agent teams. Google’s ADK supports A2A natively. Microsoft’s Agent Framework is approaching GA. CopilotKit defined the AG-UI protocol, and LangGraph, CrewAI, PydanticAI, and AgentCore have already adopted it.

And the harness-engineering insight changes everything: LangChain showed that the same model jumped from Top 30 to Top 5 on Terminal-Bench 2.0 through harness changes alone. No fine-tuning. The harness is the product.
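
To make “harness” concrete: everything around a fixed model call is fair game. The skeleton below is a hypothetical sketch, not LangChain’s setup; `call_model`, the tool registry, and the reply format (a dict with optional `tool`/`args` keys) are stand-ins for illustration.

```python
# Hypothetical harness skeleton: the model is fixed, the scaffold is not.
def call_model(messages: list[dict]) -> dict:
    """Stand-in for any unmodified model endpoint. The harness never touches it."""
    raise NotImplementedError("wire in your provider here")

# Tool registry, system prompt, truncation policy, step budget: all harness.
TOOLS = {"read_file": lambda path: open(path).read()}
SYSTEM = "You are a terminal agent. Think first, then act with one tool per step."

def run(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)  # assumed shape: {"content", "tool"?, "args"?}
        messages.append({"role": "assistant", "content": reply.get("content", "")})
        if reply.get("tool") not in TOOLS:
            return reply.get("content", "")  # no tool call means final answer
        result = TOOLS[reply["tool"]](**reply.get("args", {}))
        messages.append({"role": "tool", "content": str(result)[:4000]})  # truncation is harness policy
    return "step budget exhausted"  # the loop budget is harness policy too
```

Every line of that file can move a benchmark score without retraining anything. That’s the claim in practice.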

The Plan

I’m building across the full stack: frameworks, platforms, protocols, and agent categories. 38 agents total, spanning single agents, RAG, multi-agent coordination, browser automation, voice agents, security red-teaming, and deep long-running agents, plus two production case studies.

Every agent gets benchmarked. Every comparison includes numbers. Every post leads with findings, not process.

What’s Different

When I compare LangSmith vs Langfuse vs Arize Phoenix, I run the same agent on all three and publish the numbers. When I build multi-agent patterns, I benchmark them head-to-head on the same task. When I test no-code vs hand-coded, I build the same use case on both and compare.
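
In code, the method looks like this. A minimal sketch that assumes nothing about any vendor’s API: `run_agent` is a stand-in for the real agent, the backend names are plain config strings, and the only thing that changes between runs is the tracing integration.

```python
# Sketch of the comparison method: fixed task suite, fixed agent,
# swappable tracing backend, identical metrics for every run.
import statistics
import time

TASKS = ["summarize repo", "answer a sourced question", "triage a bug"]

def run_agent(task: str, backend: str) -> None:
    """Stand-in: invoke the same agent with `backend` tracing enabled."""
    time.sleep(0.01)  # replace with the real invocation

def bench(backend: str, repeats: int = 5) -> dict:
    latencies = []
    for _ in range(repeats):
        for task in TASKS:
            start = time.perf_counter()
            run_agent(task, backend)
            latencies.append(time.perf_counter() - start)
    return {"backend": backend,
            "p50_s": round(statistics.median(latencies), 3),
            "max_s": round(max(latencies), 3)}

for backend in ("langsmith", "langfuse", "phoenix"):
    print(bench(backend))
```

Same loop for multi-agent patterns and no-code vs hand-coded: hold the task constant, vary exactly one thing, publish the table.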

The code lives in public repos organized by pattern, not by week. The blog is organized by category — Technical, Comparison, Production, Case Study, Open Source — not by timeline.

Week 1

Research assistant with ReAct + Tavily + Pydantic output + think tool. Code analyzer with extended thinking. First LangGraph Academy course. First benchmarks.
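
A minimal sketch of that first agent, assuming `langgraph` and `langchain-tavily` are installed and the relevant API keys are set. The model id and the `ResearchBrief` fields are illustrative assumptions, not final choices.

```python
# Week 1 research assistant: ReAct loop + Tavily search + think tool,
# with a Pydantic-validated final answer.
from langchain_core.tools import tool
from langchain_tavily import TavilySearch
from langgraph.prebuilt import create_react_agent
from pydantic import BaseModel

class ResearchBrief(BaseModel):
    question: str
    findings: list[str]   # key claims, each backed by a source
    sources: list[str]    # URLs the findings came from

@tool
def think(thought: str) -> str:
    """Scratchpad for reasoning between searches (the 'think tool' pattern)."""
    return "noted"

agent = create_react_agent(
    model="anthropic:claude-sonnet-4-5",       # assumed model id; swap freely
    tools=[TavilySearch(max_results=5), think],
    response_format=ResearchBrief,             # structured final output
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Who has adopted AG-UI so far?"}]}
)
print(result["structured_response"])
```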

Let’s go.