Agentic Software Engineering
AI Summary
- High-Level Summary: Agentic Software Engineering (ASE) represents a fundamental paradigm shift in how software is conceived, developed, and maintained. It leverages AI agents capable of autonomous action, tool use, and iterative problem-solving to amplify human software engineering capabilities. ASE encompasses the study of patterns, tools, frameworks, and best practices for effectively collaborating with AI coding agents. The discipline is evolving rapidly, with new patterns, tools, and research emerging weekly.
- Key Concepts: The field is characterized by long-running agent sessions with tool execution capabilities, prompt engineering for agents, multi-agent orchestration, human-in-the-loop oversight, context engineering, and the development of agent-specific software engineering practices distinct from traditional human-centric approaches. The core shift is from "code is expensive" to "code is cheap now": human effort moves to architecture, quality, and integration rather than implementation details.
Mental Models
The Spectrum of AI-Assisted Development
- AI Prototyping - Fast, exploratory, prompt-driven. Great for learning and proving concepts. (CTCO)
- Directed AI Assistance - Specify constraints, reference patterns, define success upfront. The tool is a lever; you're in control. (CTCO)
- Agent Orchestration - Split work across multiple agents, run in parallel, integrate results. Like managing a small team. (CTCO)
- Agentic Engineering - Build persistent workflows with context, guardrails, and quality gates. Stay accountable for security and delivery. (Simon Willison)
Core Principles
- Code is Cheap Now - The fundamental mental shift: writing code has become nearly free. This changes how we estimate, plan, and execute. Focus on architecture, quality, and integration, the decisions that require human judgment. (Simon Willison)
- SE for Humans vs SE for Agents - Agentic SE introduces a fundamental duality: traditional human-centric development and new agent-centric workflows. Each requires different tools, processes, and artifacts. (arXiv)
- From Vibe Coding to Agentic Engineering - "Vibe coding" (using AI without attention to the code, often by non-programmers) contrasts with "agentic engineering" (professional software engineers using AI to amplify their expertise). The latter involves rigorous methodology, quality gates, and accountability. (Simon Willison)
The Agentic Engineer Role
- Architect - High-level system design and decision-making
- Quality Controller - Code review, standards enforcement, catching what agents miss
- Context Manager - Maintaining documentation and context that makes agents effective
- Strategist - Deciding what to build and how to approach it
- Key Insight: You need to be able to do what the agent does, and recognize when it's gone wrong. Spot subtle bugs, security gaps, and maintainability issues. (CTCO)
How-To Guidance
Test-Driven Development for Agents
- Red/Green TDD - Write tests first, confirm they fail, then implement. Test-first development helps agents write more succinct, reliable code with minimal prompting. (Simon Willison)
- Force Broken Tests - Require agents to write failing tests before implementation to ensure they understand requirements. (Atomic Object)
- Real Data Testing - Use production-like data to gain clarity on edge cases. (Atomic Object)
- Human-in-the-Loop - Pause the AI and run tests yourself during development cycles. (Atomic Object)
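As a minimal sketch of the red/green loop described above: the tests are written first and confirmed to fail, and only then is the implementation added. The `slugify` function and its expected behavior are hypothetical, chosen purely for illustration.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated slug."""
    # Green step: added only after the tests below were seen to fail.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyTests(unittest.TestCase):
    # Red step: these tests are written before slugify() exists.
    def test_spaces_become_hyphens(self):
        self.assertEqual(
            slugify("Agentic Software Engineering"),
            "agentic-software-engineering",
        )

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Code is cheap, now!"), "code-is-cheap-now")

    def test_empty_input(self):
        self.assertEqual(slugify(""), "")


if __name__ == "__main__":
    unittest.main()
```

Running the file before `slugify` is defined produces the failing (red) run; an agent asked to make the suite pass then has a precise, checkable target.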
Context Engineering
- CLAUDE.md Files - Project-specific instructions, conventions, and rules that persist across sessions. (Anthropic)
- Instruction Directories - Maintain persistent context files that give agents project-specific rules, patterns, and conventions. The more relevant context you give, the better the outputs. (CTCO)
- Just-in-Time Loading - Load relevant context only when needed to avoid overwhelming the context window. (Morph)
- .claudeignore - Exclude irrelevant files from agent context to improve focus and reduce noise. (Morph)
- Subagent Isolation - Delegate subtasks to isolated subagents to prevent context pollution. (Morph)
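The ignore-file and just-in-time ideas above can be sketched together, assuming a simple keyword-overlap relevance score and a character budget standing in for a token budget. The patterns, function names, and scoring rule are all hypothetical; real agents use their own mechanisms.

```python
from fnmatch import fnmatch

# Hypothetical ignore patterns, in the spirit of an ignore file.
IGNORE_PATTERNS = ["node_modules/*", "*.lock", "dist/*"]


def is_ignored(path: str) -> bool:
    """True if the path matches any ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)


def select_context(files: dict[str, str], task: str, budget_chars: int) -> list[str]:
    """Pick the files most relevant to the task that fit within a size budget."""
    keywords = set(task.lower().split())
    scored = []
    for path, text in files.items():
        if is_ignored(path):
            continue
        # Crude relevance score: how many task keywords appear in the file.
        score = sum(1 for word in keywords if word in text.lower())
        if score > 0:
            scored.append((score, path))
    selected, used = [], 0
    # Highest score first; ties broken alphabetically for determinism.
    for _score, path in sorted(scored, key=lambda item: (-item[0], item[1])):
        size = len(files[path])
        if used + size <= budget_chars:
            selected.append(path)
            used += size
    return selected
```

The point of the sketch is the shape of the decision (filter, rank, stop at the budget), not the scoring heuristic, which a real system would replace with search or embeddings.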
Problem Decomposition
- Break Work Into Chunks - When orchestrating agents, break work into independent chunks. Each agent needs clear context about its piece and how it connects to others. (CTCO)
- Batch Related Work - Group similar tasks together to maximize context efficiency. (Aditya Bawankule)
- Parallel Worktrees - Use Git worktrees for parallel work without conflicts. (Aditya Bawankule)
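One way to picture the worktree approach above: each independent chunk gets its own branch and directory, so agents never edit the same checkout. The sketch below only builds the `git worktree add` commands and does not run them; the `agent/` branch prefix and `../worktrees` directory are hypothetical naming choices.

```python
import shlex


def worktree_commands(tasks: list[str], base_dir: str = "../worktrees") -> list[str]:
    """Return one `git worktree add` command per task (nothing is executed)."""
    commands = []
    for task in tasks:
        branch = f"agent/{task}"
        path = f"{base_dir}/{task}"
        # `git worktree add -b <branch> <path>` creates a new branch
        # checked out in its own working directory.
        commands.append(shlex.join(["git", "worktree", "add", "-b", branch, path]))
    return commands


if __name__ == "__main__":
    for command in worktree_commands(["auth-refactor", "billing-tests"]):
        print(command)
```

Each agent session can then be started inside its own directory, and the resulting branches are merged or reviewed like any other feature branch.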
Human Oversight
- Confirm Before Acting - Set explicit confirmation requirements for destructive or irreversible actions. (Simon Willison)
- Guardrails - Implement safety checks and boundaries for agent actions.
- Observability - Log, trace, and monitor agent sessions for debugging and quality control.
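A confirmation gate of the kind described above might look like the following sketch. The marker list, function names, and return strings are hypothetical; a real deployment would define "destructive" per project and would usually rely on the agent tool's own permission system.

```python
from typing import Callable

# Hypothetical markers of destructive commands; a real list is project-specific.
DESTRUCTIVE_MARKERS = ("rm -rf", "git push --force", "drop table")


def needs_confirmation(command: str) -> bool:
    """True if the command contains any destructive marker."""
    lowered = command.lower()
    return any(marker in lowered for marker in DESTRUCTIVE_MARKERS)


def run_guarded(
    command: str,
    execute: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run a command, asking a human first when it looks destructive."""
    if needs_confirmation(command) and not confirm(command):
        return "blocked: human declined"
    return execute(command)
```

Passing `execute` and `confirm` as callables keeps the gate testable: in production `confirm` could prompt in a terminal, while in tests it is a stub.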
Tools & Frameworks
Coding Agents
| Tool | Key Features | Best For |
|---|---|---|
| Claude Code | Terminal-based, long-running sessions, tool execution, subagents, agent teams | Deep refactoring, complex debugging |
| OpenAI Codex | Multi-agent parallel execution, Codex Agent Loop | Large-scale automation, enterprise |
| GitHub Copilot | IDE integration, chat, agent tasks | Interactive development |
| Cursor | AI-native IDE, AI Tab, Chat, Ctrl+K | Integrated AI development |
- Model Selection - OpenAI's Codex series is optimized for code execution. Anthropic's Claude excels at reasoning. Google's Gemini 3.1 offers strong performance at half Claude's price. (Simon Willison)
- SWE-Bench Leaders - Claude Opus 4.6 (Thinking) leads with 79.20% on SWE-bench, followed by Gemini 3 Flash (76.20%) and GPT 5.2 (75.40%). (Vals AI)
Agent Frameworks
- LangChain - Framework for building agent applications, available for Python and JavaScript. (langchain.com)
- AutoGen (Microsoft) - Multi-agent conversation framework. (microsoft.com/autogen)
- OpenClaw - Open-source autonomous coding agent. (GitHub)
- CrewAI - Multi-agent orchestration for complex workflows. (crewai.com)
- Spring AI - Enterprise Spring-based agent patterns. (Spring)
Model Context Protocol (MCP)
- What is MCP - Open standard introduced by Anthropic that defines a unified way for AI agents to connect to external tools, data sources, and services. Like "USB-C for AI integrations." (Anthropic)
- MCP Servers - Pre-built integrations for databases, APIs, and tools. (GitHub)
- MCP Registry - Growing ecosystem of MCP-compatible tools. (modelcontextprotocol.io)
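To make the "unified way to connect" concrete: MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with a `tools/call` request. The sketch below only builds such a message; the tool name `query_database` and its arguments are hypothetical, and a real client would normally use an MCP SDK rather than hand-built JSON.

```python
import json


def tool_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool and arguments, for illustration only.
    print(tool_call_request(1, "query_database", {"sql": "SELECT 1"}))
```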
Local Models
- Ollama - CLI tool for running LLMs locally. (ollama.com)
- LM Studio - Desktop app for local LLM experimentation. (lmstudio.ai)
- Jan - Privacy-focused local AI. (jan.ai)
- Benefits - Data privacy, no API costs, offline capability, total control. (AI Lexicon)
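As a rough sketch of what "running locally" means in practice, the code below prepares a request to Ollama's local HTTP API, assuming the default address `http://localhost:11434` and the `/api/generate` endpoint. The model name `llama3` is an assumption; substitute whichever model has been pulled locally.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "llama3",  # assumed model name; use one you have pulled
    host: str = "http://localhost:11434",  # assumed default Ollama address
) -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request; requires a running local Ollama server."""
    with urllib.request.urlopen(build_generate_request(prompt)) as response:
        return json.loads(response.read())["response"]
```

Because the request never leaves the machine, the privacy and offline benefits listed above follow directly; the trade-off is that throughput depends on local hardware.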
Security & Safety
OWASP Top 10 for Agentic AI (2026)
- Sensitive Data Disclosure - Agents may expose sensitive data through outputs or tool calls
- Tool Poisoning - Compromised tools inject malicious behavior
- Memory Pollution - Agent context manipulation through injected memories
- Prompt Injection - External inputs override agent instructions
- Unbounded Execution - Agents can execute unlimited actions without oversight
- Dependency Confusion - Agent dependencies can be hijacked
- Multi-Agent Malware - Agents can spread malicious behavior
- Code Injection - Agent-generated code contains exploits
- Identity Confusion - Agents impersonate multiple identities
- Denial of Wallet - Uncontrolled agent resource consumption
Security Best Practices
- Least Privilege - Grant agents minimum necessary permissions
- Input Validation - Sanitize all inputs to agents
- Audit Logging - Complete traceability of agent actions
- Secret Management - Never expose credentials to agents unnecessarily
- Human Approval - Require human confirmation for sensitive operations
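Least privilege and audit logging from the list above can be combined in a small wrapper around tool calls. This is a sketch under stated assumptions: `ToolGate`, its fields, and the tool names are hypothetical and not part of any agent framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolGate:
    """Allow only listed tools and record every attempt (hypothetical helper)."""

    allowed_tools: frozenset[str]
    audit_log: list[dict] = field(default_factory=list)

    def call(self, tool: str, handler: Callable[..., Any], **kwargs: Any) -> Any:
        permitted = tool in self.allowed_tools
        # Log before acting so denied attempts are traceable too.
        self.audit_log.append({"tool": tool, "args": kwargs, "permitted": permitted})
        if not permitted:
            raise PermissionError(f"tool not allowed: {tool}")
        return handler(**kwargs)
```

An agent given `ToolGate(frozenset({"read_file"}))` can read files but raises `PermissionError` on anything else, and both outcomes appear in `audit_log`.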
Observability & Monitoring
Key Metrics
- Latency - Response time per step and overall task completion
- Cost - Token usage and API costs per task
- Success Rate - Task completion and quality metrics
- Token Usage - Context window utilization and efficiency
Observability Tools
- LangSmith - LangChain's observability platform
- Datadog - AI observability and monitoring
- OpenTelemetry - Open standard for tracing agents
- AgentOps - Agent-specific monitoring
Production Considerations
- Trace Tool Calls - Every LLM call, tool execution, and decision needs logging
- Checkpoint State - Save agent state for recovery and debugging
- Cost Alerts - Set thresholds to prevent runaway spending
- Quality Evaluation - Automated assessment of agent outputs
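The cost metric and cost-alert items above reduce to simple arithmetic over token counts. The sketch below assumes hypothetical prices (USD per million tokens) and a hypothetical `CostTracker` class; actual prices vary by model and provider.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    """Accumulate token cost and flag when a budget is exceeded."""

    budget_usd: float
    # Hypothetical prices in USD per million tokens.
    input_price: float = 3.0
    output_price: float = 15.0
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one LLM call; return True while spending is within budget."""
        cost = (
            input_tokens * self.input_price + output_tokens * self.output_price
        ) / 1_000_000
        self.spent_usd += cost
        return self.spent_usd <= self.budget_usd
```

A session loop can call `record` after every model response and pause or escalate to a human as soon as it returns `False`.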
Key Research & Papers
Foundational Papers
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap - Establishes ASE as a research area, identifies key pillars and future directions.
- Toward Agentic Software Engineering Beyond Code - Explores vision, values, and vocabulary for ASE.
- Toward an Agentic Infused Software Ecosystem - Argues for rethinking the software ecosystem around AI agents.
- LLM-Based Agentic Systems for Software Engineering - Challenges and opportunities in LLM-based multi-agent SE systems.
- Trustworthy AI Software Engineers - What it means for AI agents to be considered software engineers.
- daVinci-Dev: Agent-native Mid-training for Software Engineering - Training models specifically for agentic software engineering.
Evaluation Benchmarks
- SWE-bench - Software engineering benchmark built from real GitHub issues in open-source repositories. Claude Opus 4.6 leads at 79.20%.
- SWE-bench Pro - More challenging version with 1,865 real repository tasks.
- SWE-rebench - Automated pipeline for decontaminated agent evaluation.
Key Trends (2026)
From Single Agents to Coordinated Teams
- Complex tasks now span multiple specialized agents working in parallel
- Each agent handles a specific subtask with dedicated context
- Integration and orchestration become critical skills
Long-Running Agents
- Agents can now work autonomously for hours, handling multi-file refactors
- Persistence and state management become essential
- Checkpointing and recovery mechanisms mature
Human Oversight Evolution
- From "human in the loop" to "human on the loop" - oversight rather than constant intervention
- Intelligent collaboration - humans focus on decisions, agents handle implementation
- Escalation protocols for ambiguous or high-stakes decisions
Security-First Architecture
- Agent-generated code introduces new attack surfaces
- Guardrails, sandboxing, and permission systems become standard
- Non-human identities (NHIs) emerge as a security category
Cost Management
- Token usage optimization becomes critical
- Prompt caching can cut the cost of repeated input tokens by up to roughly 90% in long sessions
- Budget limits and spending alerts for production agents
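A worked example helps show where caching savings come from: in a long session the same prefix (instructions, project context) is resent on every turn. The sketch assumes hypothetical multipliers, with cached reads billed at 0.1x and the initial cache write at 1.25x the normal input price; check the provider's pricing before relying on these numbers.

```python
def input_cost(
    turns: int,
    prefix_tokens: int,
    price_per_mtok: float,
    cached: bool,
    read_mult: float = 0.1,  # assumed discount for cached reads
    write_mult: float = 1.25,  # assumed surcharge for the first cache write
) -> float:
    """Cost in USD of resending the same prefix on every turn of a session."""
    per_turn = prefix_tokens * price_per_mtok / 1_000_000
    if not cached:
        return turns * per_turn
    # The first turn writes the cache; later turns read it at a discount.
    return per_turn * write_mult + (turns - 1) * per_turn * read_mult


if __name__ == "__main__":
    uncached = input_cost(50, 100_000, 3.0, cached=False)
    with_cache = input_cost(50, 100_000, 3.0, cached=True)
    print(f"savings: {1 - with_cache / uncached:.1%}")
```

Under these assumptions a 50-turn session with a 100,000-token prefix drops from 15.00 USD to about 1.85 USD of prefix input cost, a saving of roughly 88%; the saving approaches 90% as sessions get longer.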
Practical Next Steps
For Individuals
- Start with TDD - Agent-friendly tests = better agent output
- Build your instruction directory - Project conventions, patterns, and rules
- Orchestrate, don't micromanage - Give agents goals, not step-by-step instructions
- Invest in observability - Agent sessions need logging, tracing, and rollback strategies
- Keep learning - This space evolves weekly; follow Simon Willison, Anthropic engineering, and arXiv SE research
For Teams
- Establish Agent Guidelines - Document approved patterns and restrictions
- Implement Security Gates - Scan agent outputs before production
- Monitor Costs - Set budgets and track agent spending
- Create Agent Librarian Role - Maintain context and patterns for the team
- Iterate on Processes - Learn from each agent interaction
Bibliography & References
Books in This Vault
- The Agentic AI Engineer's Handbook - Distills essential principles and actionable methodologies for designing, developing, and deploying robust agentic AI systems.
- Agentic Artificial Intelligence - Argues agentic AI is the most significant tech revolution since the GUI, with early adopters gaining compounding intelligence advantages.
- Building Agentic AI Systems - Provides a roadmap for developing AI agents that operate independently, make decisions, and adapt to dynamic environments.
- AI Engineering: Building Applications with Foundation Models - Comprehensive guide focusing on practical application of pre-trained models to build real-world AI products.
- Vibe Coding - The definitive manifesto for building production-grade software with GenAI, shifting developer focus from syntax to intent.
- AI Agents in Action - Provides a proven framework for developing practical agents that handle real-world business and personal tasks.
- Generative AI with LangChain - Hands-on guide to building LLM applications and multi-agent orchestration using Python and LangGraph.
- The Art of Prompt Engineering with ChatGPT - Accessible practical introduction to prompt engineering for ChatGPT, moving beyond simple queries.
- Prompt Engineering for LLMs - Technical frameworks for structuring effective AI inputs to get the best results from LLMs.
Videos in This Vault
- AI Talks: Gen AI 2026 - Covers the shift from basic code completion to agentic engineering, targeting 30-70% efficiency gains.
- The 5 Levels of AI Coding - Explains the five levels of AI coding capability, from autocomplete to fully autonomous agents.
- Beyond the IDE - Argues that merging becomes the bottleneck when coding is automated, requiring new coordination patterns.
- Context Engineering for Agents - Categorizes context into instructions, memories, few-shot examples, tools, knowledge, and environmental feedback.
- 12-Factor Agents - Draws parallels between reliable agent design and the original 12-factor app methodology.
- No Vibes Allowed - Emphasizes the RPI (Research, Plan, Implement) workflow to enforce deliberate System 2 thinking in agents.
- AI Engineering with Chip Huyen - Argues AI engineering focuses on product development leveraging existing capabilities through APIs rather than building models.
Research Papers
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap
- Toward Agentic Software Engineering Beyond Code
- Toward an Agentic Infused Software Ecosystem
- LLM-Based Agentic Systems for Software Engineering
- Trustworthy AI Software Engineers
- daVinci-Dev: Agent-native Mid-training for SE
- Agyn: Multi-Agent System for Team-Based SE
- Multi-Agent Coordinated Rename Refactoring
- Configuring Agentic AI Coding Tools
- TDFlow: Agentic Workflows for TDD
Articles & Blogs
- Simon Willison's Agentic Engineering Patterns
- Writing about Agentic Engineering Patterns
- From Vibe Coding to Agentic Engineering - CTCO
- The Complete Guide to Agentic Coding 2026 - TeamDay
- Claude Code Context Engineering
- Context Engineering: Complete Guide - Morph
- Effective Context Engineering - Anthropic
- How Codex is Built - Pragmatic Engineer
- Claude Code Edges OpenAI's Codex
- OWASP Top 10 for Agentic Applications
- Agentic AI Security Explained - IBM
- Agentic Workflows Guide - Redis
- AI Observability 2026 - ZeonEdge
- The State of Coding Agents - Feb 2026
- Vibe Coding vs Agentic Engineering - Versatik
- Agentic Engineering Guide - Cosmo Edge
- Building Production AI Agents 2026
Tools & Frameworks
- Claude Code
- OpenAI Codex
- GitHub Copilot
- Cursor
- LangChain
- AutoGen
- OpenClaw
- CrewAI
- Ollama
- LM Studio
- Model Context Protocol
Benchmarks
- Last Updated: 2026-03-02
- Research in Progress - This topic is actively being developed