Agentic Software Engineering
AI Summary
High-Level Summary: Agentic Software Engineering (ASE) represents a fundamental paradigm shift in how software is conceived, developed, and maintained. It leverages AI agents capable of autonomous action, tool use, and iterative problem-solving to amplify human software engineering capabilities. ASE encompasses the patterns, tools, frameworks, and best practices for collaborating effectively with AI coding agents. The discipline is evolving rapidly, with new patterns, tools, and research emerging weekly.
Key Concepts: The field is characterized by long-running agent sessions with tool execution capabilities, prompt engineering for agents, multi-agent orchestration, human-in-the-loop oversight, context engineering, and the development of agent-specific software engineering practices distinct from traditional human-centric approaches. The core shift is from "code is expensive" to "code is cheap now" - human effort moves to architecture, quality, and integration rather than implementation details.
Mental Models
The Spectrum of AI-Assisted Development
- AI Prototyping - Fast, exploratory, prompt-driven. Great for learning and proving concepts. CTCO
- Directed AI Assistance - Specify constraints, reference patterns, define success upfront. The tool is a lever; you're in control. CTCO
- Agent Orchestration - Split work across multiple agents, run them in parallel, integrate the results. Like managing a small team. CTCO
- Agentic Engineering - Build persistent workflows with context, guardrails, and quality gates. Stay accountable for security and delivery. Simon Willison
Core Principles
- Code is Cheap Now - The fundamental mental shift: writing code has become nearly free. This changes everything about how we estimate, plan, and execute. Focus on architecture, quality, and integration - the decisions that require human judgment. Simon Willison
- SE for Humans vs. SE for Agents - Agentic SE introduces a fundamental duality: traditional human-centric development and new agent-centric workflows. Each requires different tools, processes, and artifacts. arXiv
- From Vibe Coding to Agentic Engineering - "Vibe coding" (using AI without attention to the code, often by non-programmers) contrasts with "agentic engineering" (professional software engineers using AI to amplify their expertise). The latter involves rigorous methodology, quality gates, and accountability. Simon Willison
The Agentic Engineer Role
- Architect - High-level system design and decision-making
- Quality Controller - Code review, standards enforcement, catching what agents miss
- Context Manager - Maintaining the documentation and context that make agents effective
- Strategist - Deciding what to build and how to approach it
- Key Insight: You need to be able to do what the agent does, and to recognize when it has gone wrong: spot subtle bugs, security gaps, and maintainability issues. CTCO
How-To Guidance
Test-Driven Development for Agents
- Red/Green TDD - Write tests first, confirm they fail, then implement. Test-first development helps agents write more succinct, reliable code with minimal prompting. Simon Willison
- Force Broken Tests - Require agents to write failing tests before implementation to confirm they understand the requirements. Atomic Object
- Real Data Testing - Use production-like data to surface edge cases. Atomic Object
- Human-in-the-Loop - Pause the AI and run the tests yourself during development cycles. Atomic Object
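The red/green cycle above can be sketched in miniature. This is an illustrative example, not from any of the cited sources: `slugify` and its test are hypothetical names, and in practice the test would be written and run (and seen to fail) before the implementation exists.

```python
import re

def slugify(title: str) -> str:
    """Lowercase the title and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_slugify():
    # Red phase: this assertion is authored first, so the agent must make
    # it pass rather than invent its own notion of correct behavior.
    assert slugify("  Agentic SE: 2026! ") == "agentic-se-2026"

test_slugify()  # green once the implementation satisfies the spec
```

Writing the assertion first pins down the contract; the agent's job shrinks to satisfying an explicit, executable specification.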
Context Engineering
- CLAUDE.md Files - Project-specific instructions, conventions, and rules that persist across sessions. Anthropic
- Instruction Directories - Maintain persistent context files that give agents project-specific rules, patterns, and conventions. The more context you give, the better the outputs. CTCO
- Just-in-Time Loading - Load relevant context only when needed to avoid overwhelming the context window. Morph
- .claudeignore - Exclude irrelevant files from agent context to improve focus and reduce noise. Morph
- Subagent Isolation - Delegate subtasks to isolated subagents to prevent context pollution. Morph
Problem Decomposition
- Break Work Into Chunks - When orchestrating agents, break work into independent chunks. Each agent needs clear context about its piece and how it connects to the others. CTCO
- Batch Related Work - Group similar tasks together to maximize context efficiency. Aditya Bawankule
- Parallel Worktrees - Use Git worktrees to run parallel work without conflicts. Aditya Bawankule
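The fan-out/fan-in shape of chunked orchestration can be sketched as follows. `run_agent` is a placeholder, not a real agent API: a real implementation would launch an agent session (e.g. in its own Git worktree) with the chunk's context and collect its patch or summary.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(chunk: dict) -> dict:
    # Placeholder for a real coding-agent invocation that would work
    # on chunk["context"] in isolation (e.g. a dedicated worktree).
    return {"task": chunk["task"], "status": "done"}

def orchestrate(chunks: list[dict], max_workers: int = 4) -> list[dict]:
    """Fan independent chunks out to parallel agents, then gather the
    results for the integration step the human remains responsible for."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_agent, chunks))

results = orchestrate([
    {"task": "refactor auth", "context": "auth module only"},
    {"task": "add tests",     "context": "tests/ directory"},
])
```

Because each chunk is independent, results arrive in input order via `pool.map`, which keeps the integration step deterministic.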
Human Oversight
- Confirm Before Acting - Set explicit confirmation requirements for destructive or irreversible actions. Simon Willison
- Guardrails - Implement safety checks and boundaries for agent actions.
- Observability - Log, trace, and monitor agent sessions for debugging and quality control.
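A confirm-before-acting guardrail can be sketched as a thin wrapper around tool dispatch. The tool names, the `DESTRUCTIVE` set, and `guarded_call` are all illustrative assumptions, not any specific agent framework's API.

```python
# Tools considered destructive or irreversible (illustrative list).
DESTRUCTIVE = {"delete_file", "drop_table", "force_push"}

def guarded_call(tool: str, args: dict, approve) -> str:
    """Dispatch a tool call, routing destructive tools through a human
    approver callback; everything else runs unattended."""
    if tool in DESTRUCTIVE and not approve(tool, args):
        return f"BLOCKED: {tool} requires human approval"
    return f"EXECUTED: {tool}({args})"

# Usage: an auto-deny policy for unattended sessions.
deny_all = lambda tool, args: False
print(guarded_call("read_file", {"path": "a.txt"}, deny_all))   # runs
print(guarded_call("drop_table", {"name": "users"}, deny_all))  # blocked
```

Separating the policy (the approver callback) from the mechanism lets the same wrapper serve interactive sessions (prompt the human) and CI runs (deny by default).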
Tools & Frameworks
Coding Agents
| Tool | Key Features | Best For |
|---|---|---|
| Claude Code | Terminal-based, long-running sessions, tool execution, subagents, agent teams | Deep refactoring, complex debugging |
| OpenAI Codex | Multi-agent parallel execution, Codex Agent Loop | Large-scale automation, enterprise |
| GitHub Copilot | IDE integration, chat, agent tasks | Interactive development |
| Cursor | AI-native IDE, AI Tab, Chat, Ctrl+K | Integrated AI development |
- Model Selection - OpenAI's Codex series is optimized for code execution. Anthropic's Claude excels at reasoning. Google's Gemini 3.1 offers strong performance at half Claude's price. Simon Willison
- SWE-Bench Leaders - Claude Opus 4.6 (Thinking) leads with 79.20% on SWE-bench, followed by Gemini 3 Flash (76.20%) and GPT 5.2 (75.40%). Vals AI
Agent Frameworks
- LangChain - Python-based framework for building agent applications. langchain.com
- AutoGen (Microsoft) - Multi-agent conversation framework. microsoft.com/autogen
- OpenClaw - Open-source autonomous coding agent. GitHub
- CrewAI - Multi-agent orchestration for complex workflows. crewai.com
- Spring AI - Enterprise Spring-based agent patterns. Spring
Model Context Protocol (MCP)
- What is MCP - An open standard by Anthropic that defines a unified way for AI agents to connect to external tools, data sources, and services. Like "USB-C for AI integrations." Anthropic
- MCP Servers - Pre-built integrations for databases, APIs, and tools. GitHub
- MCP Registry - Growing ecosystem of MCP-compatible tools. modelcontextprotocol.io
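Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages; a tool invocation uses the `tools/call` method. The sketch below shows that message shape as I understand it from the spec; the tool name `query_database` and its arguments are invented for illustration.

```python
import json

# JSON-RPC 2.0 request an MCP client would send to invoke a server tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",            # hypothetical tool on an MCP server
        "arguments": {"sql": "SELECT 1"},    # tool-specific arguments
    },
}

wire = json.dumps(request)  # serialized form sent over the transport
```

The "USB-C" analogy holds at exactly this layer: any client that can speak this message shape can drive any conforming server's tools.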
Local Models
- Ollama - CLI tool for running LLMs locally. ollama.com
- LM Studio - Desktop app for local LLM experimentation. lmstudio.ai
- Jan - Privacy-focused local AI. jan.ai
- Benefits - Data privacy, no API costs, offline capability, total control. AI Lexicon
Security & Safety
OWASP Top 10 for Agentic AI (2026)
- Sensitive Data Disclosure - Agents may expose sensitive data through outputs or tool calls
- Tool Poisoning - Compromised tools inject malicious behavior
- Memory Pollution - Agent context manipulated through injected memories
- Prompt Injection - External inputs override agent instructions
- Unbounded Execution - Agents execute unlimited actions without oversight
- Dependency Confusion - Agent dependencies can be hijacked
- Multi-Agent Malware - Agents can spread malicious behavior
- Code Injection - Agent-generated code contains exploits
- Identity Confusion - Agents impersonate multiple identities
- Denial of Wallet - Uncontrolled agent resource consumption
Security Best Practices
- Least Privilege - Grant agents the minimum necessary permissions
- Input Validation - Sanitize all inputs to agents
- Audit Logging - Complete traceability of agent actions
- Secret Management - Never expose credentials to agents unnecessarily
- Human Approval - Require human confirmation for sensitive operations
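Least privilege for agents reduces, in the simplest case, to a default-deny allow-list per agent. The agent names, tool names, and `authorize` helper below are illustrative, not a real framework's permission model.

```python
# Explicit per-agent tool allow-lists (illustrative identities and tools).
AGENT_PERMISSIONS = {
    "doc-agent":  {"read_file", "search"},
    "code-agent": {"read_file", "write_file", "run_tests"},
}

def authorize(agent: str, tool: str) -> bool:
    """Default-deny: unknown agents and unlisted tools are both refused."""
    return tool in AGENT_PERMISSIONS.get(agent, set())

assert authorize("doc-agent", "read_file")
assert not authorize("doc-agent", "write_file")   # outside its allow-list
assert not authorize("rogue-agent", "read_file")  # unknown identity
```

The key property is that absence means denial: an agent gains a capability only when someone deliberately adds it to the list, which also gives audit logging a clean decision point to record.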
Observability & Monitoring
Key Metrics
- Latency - Response time per step and overall task completion
- Cost - Token usage and API costs per task
- Success Rate - Task completion and quality metrics
- Token Usage - Context window utilization and efficiency
Observability Tools
- LangSmith - LangChain's observability platform
- Datadog - AI observability and monitoring
- OpenTelemetry - Open standard for tracing agents
- AgentOps - Agent-specific monitoring
Production Considerations
- Trace Tool Calls - Every LLM call, tool execution, and decision needs logging
- Checkpoint State - Save agent state for recovery and debugging
- Cost Alerts - Set thresholds to prevent runaway spending
- Quality Evaluation - Automated assessment of agent outputs
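The tracing and cost-alert ideas above can be sketched in one wrapper. The price, the budget, and the `traced` helper are all assumptions for illustration; a production system would emit these records to a tool like OpenTelemetry or LangSmith rather than an in-memory list.

```python
import time

TRACE: list[dict] = []          # in-memory stand-in for a trace backend
COST_ALERT_USD = 5.00           # illustrative session budget
PRICE_PER_1K_TOKENS = 0.01      # illustrative blended token rate

def traced(name, fn, *args, tokens: int = 0):
    """Run fn, recording latency and estimated cost; warn past the budget."""
    start = time.perf_counter()
    result = fn(*args)
    TRACE.append({
        "call": name,
        "latency_s": round(time.perf_counter() - start, 4),
        "tokens": tokens,
        "cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS,
    })
    if sum(t["cost_usd"] for t in TRACE) > COST_ALERT_USD:
        print("COST ALERT: session budget exceeded")
    return result

traced("summarize", lambda text: text[:10], "long document...", tokens=1200)
```

Every call thus yields one structured record covering three of the four key metrics (latency, cost, token usage); success rate comes from evaluating the results themselves.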
Key Research & Papers
Foundational Papers
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap - Establishes ASE as a research area and identifies key pillars and future directions.
- Toward Agentic Software Engineering Beyond Code - Explores vision, values, and vocabulary for ASE.
- Toward an Agentic Infused Software Ecosystem - Argues for rethinking the software ecosystem around AI agents.
- LLM-Based Agentic Systems for Software Engineering - Challenges and opportunities in LLM-based multi-agent SE systems.
- Trustworthy AI Software Engineers - What it means for AI agents to be considered software engineers.
- daVinci-Dev: Agent-native Mid-training for Software Engineering - Training models specifically for agentic software engineering.
Evaluation Benchmarks
- SWE-bench - Software engineering benchmark with production tasks. Claude Opus 4.6 leads at 79.20%.
- SWE-bench Pro - More challenging version with 1,865 real repository tasks.
- SWE-rebench - Automated pipeline for decontaminated agent evaluation.
Key Trends (2026)
From Single Agents to Coordinated Teams
- Complex tasks now span multiple specialized agents working in parallel
- Each agent handles a specific subtask with dedicated context
- Integration and orchestration become critical skills
Long-Running Agents
- Agents can now work autonomously for hours, handling multi-file refactors
- Persistence and state management become essential
- Checkpointing and recovery mechanisms mature
Human Oversight Evolution
- From "human in the loop" to "human on the loop" - oversight rather than constant intervention
- Intelligent collaboration - humans focus on decisions, agents handle implementation
- Escalation protocols for ambiguous or high-stakes decisions
Security-First Architecture
- Agent-generated code introduces new attack surfaces
- Guardrails, sandboxing, and permission systems become standard
- Non-human identities (NHIs) emerge as a security category
Cost Management
- Token usage optimization becomes critical
- Prompt caching can reduce costs by up to 90% for long sessions
- Budget limits and spending alerts for production agents
Practical Next Steps
For Individuals
- Start with TDD - Agent-friendly tests mean better agent output
- Build your instruction directory - Project conventions, patterns, and rules
- Orchestrate, don't micromanage - Give agents goals, not step-by-step instructions
- Invest in observability - Agent sessions need logging, tracing, and rollback strategies
- Keep learning - This space evolves weekly; follow Simon Willison, Anthropic engineering, and arXiv SE research
For Teams
- Establish Agent Guidelines - Document approved patterns and restrictions
- Implement Security Gates - Scan agent outputs before production
- Monitor Costs - Set budgets and track agent spending
- Create an Agent Librarian Role - Maintain context and patterns for the team
- Iterate on Processes - Learn from each agent interaction
Bibliography & References
Research Papers
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap
- Toward Agentic Software Engineering Beyond Code
- Toward an Agentic Infused Software Ecosystem
- LLM-Based Agentic Systems for Software Engineering
- Trustworthy AI Software Engineers
- daVinci-Dev: Agent-native Mid-training for SE
- Agyn: Multi-Agent System for Team-Based SE
- Multi-Agent Coordinated Rename Refactoring
- Configuring Agentic AI Coding Tools
- TDFlow: Agentic Workflows for TDD
Articles & Blogs
- Simon Willison's Agentic Engineering Patterns
- Writing about Agentic Engineering Patterns
- From Vibe Coding to Agentic Engineering - CTCO
- The Complete Guide to Agentic Coding 2026 - TeamDay
- Claude Code Context Engineering
- Context Engineering: Complete Guide - Morph
- Effective Context Engineering - Anthropic
- How Codex is Built - Pragmatic Engineer
- Claude Code Edges OpenAI's Codex
- OWASP Top 10 for Agentic Applications
- Agentic AI Security Explained - IBM
- Agentic Workflows Guide - Redis
- AI Observability 2026 - ZeonEdge
- The State of Coding Agents - Feb 2026
- Vibe Coding vs Agentic Engineering - Versatik
- Agentic Engineering Guide - Cosmo Edge
- Building Production AI Agents 2026
Tools & Frameworks
- Claude Code
- OpenAI Codex
- GitHub Copilot
- Cursor
- LangChain
- AutoGen
- OpenClaw
- CrewAI
- Ollama
- LM Studio
- Model Context Protocol
Benchmarks
- SWE-bench
- SWE-bench Pro
- SWE-rebench
- Last Updated: 2026-03-01
- Research in Progress - This topic is actively being developed