
πŸ€–πŸ§­ Agentic Software Engineering

πŸ€– AI Summary

  • πŸ“ High-Level Summary: Agentic Software Engineering (ASE) represents a fundamental paradigm shift in how software is conceived, developed, and maintained. It leverages AI agents capable of autonomous action, tool use, and iterative problem-solving to amplify human software engineering capabilities. ASE encompasses the study of patterns, tools, frameworks, and best practices for effectively collaborating with AI coding agents. The discipline is evolving rapidly, with new patterns, tools, and research emerging weekly. πŸ§ πŸ’»πŸš€

  • πŸ”‘ Key Concepts: The field is characterized by long-running agent sessions with tool execution capabilities, prompt engineering for agents, multi-agent orchestration, human-in-the-loop oversight, context engineering, and the development of agent-specific software engineering practices distinct from traditional human-centric approaches. The core shift is from β€œcode is expensive” to β€œcode is cheap now” - focusing human effort on architecture, quality, and integration rather than implementation details. βš‘πŸ“‰


🧠 Mental Models

🌟 The Spectrum of AI-Assisted Development

  • 🌱 AI Prototyping - Fast, exploratory, prompt-driven. Great for learning and proving concepts. CTCO
  • 🎯 Directed AI Assistance - Specify constraints, reference patterns, define success upfront. Tool is a lever; you’re in control. CTCO
  • πŸ‘₯ Agent Orchestration - Split work across multiple agents, run in parallel, integrate results. Like managing a small team. CTCO
  • πŸ—οΈ Agentic Engineering - Build persistent workflows with context, guardrails, and quality gates. Stay accountable for security and delivery. Simon Willison

πŸ’‘ Core Principles

  • 🏰 Code is Cheap Now - The fundamental mental shift: writing code has become nearly free. This changes everything about how we estimate, plan, and execute. Focus on architecture, quality, and integration - the decisions that require human judgment. Simon Willison

  • βš–οΈ SE for Humans vs SE for Agents - Agentic SE introduces a fundamental duality: traditional human-centric development and new agent-centric workflows. Each requires different tools, processes, and artifacts. arXiv

  • πŸ”„ From Vibe Coding to Agentic Engineering - β€œVibe coding” (using AI without attention to code, often by non-programmers) contrasts with β€œagentic engineering” (professional software engineers using AI to amplify their expertise). The latter involves rigorous methodology, quality gates, and accountability. Simon Willison

🎯 The Agentic Engineer Role

  • πŸ§‘β€πŸ’» Architect - High-level system design and decision-making

  • πŸ” Quality Controller - Code review, standards enforcement, catching what agents miss

  • πŸ“š Context Manager - Maintaining documentation and context that makes agents effective

  • 🎯 Strategist - Deciding what to build and how to approach it

  • πŸ”‘ Key Insight: You need to be able to do what the agent does, and recognize when it’s gone wrong. Spot subtle bugs, security gaps, and maintainability issues. CTCO


πŸ”§ How-To Guidance

πŸ§ͺ Test-Driven Development for Agents

  • πŸ“ Red/Green TDD - Write tests first, confirm they fail, then implement. Test-first development helps agents write more succinct, reliable code with minimal prompting. Simon Willison

  • πŸ”΄ Force Broken Tests - Require agents to write failing tests before implementation to ensure they understand requirements. Atomic Object

  • ▢️ Real Data Testing - Use production-like data to gain clarity on edge cases. Atomic Object

  • ⏸️ Human-in-the-Loop - Pause the AI and run tests yourself during development cycles. Atomic Object

πŸ“ Context Engineering

  • πŸ—‚οΈ CLAUDE.md Files - Project-specific instructions, conventions, and rules that persist across sessions. Anthropic

  • πŸ“‹ Instruction Directories - Maintain persistent context files that give agents project-specific rules, patterns, and conventions. The more context you give, the better the outputs. CTCO

  • πŸ“₯ Just-in-Time Loading - Load relevant context only when needed to avoid overwhelming the context window. Morph

  • 🚫 .claudeignore - Exclude irrelevant files from agent context to improve focus and reduce noise. Morph

  • πŸ”€ Subagent Isolation - Delegate subtasks to isolated subagents to prevent context pollution. Morph

🧩 Problem Decomposition

  • πŸ—οΈ Break Work Into Chunks - When orchestrating agents, break work into independent chunks. Each agent needs clear context about its piece and how it connects to others. CTCO

  • πŸ“¦ Batch Related Work - Group similar tasks together to maximize context efficiency. Aditya Bawankule

  • πŸ”„ Parallel Worktrees - Use Git worktrees for parallel work without conflicts. Aditya Bawankule

πŸ™‹ Human Oversight

  • πŸ‘οΈ Confirm Before Acting - Set explicit confirmation requirements for destructive or irreversible actions. Simon Willison

  • πŸ›‘οΈ Guardrails - Implement safety checks and boundaries for agent actions.

  • πŸ“Š Observability - Log, trace, and monitor agent sessions for debugging and quality control.


πŸ› οΈ Tools & Frameworks

πŸ€– Coding Agents

πŸ† ToolπŸ”‘ Key FeaturesπŸ“Š Best For
Claude CodeTerminal-based, long-running sessions, tool execution, subagents, agent teamsDeep refactoring, complex debugging
OpenAI CodexMulti-agent parallel execution, Codex Agent LoopLarge-scale automation, enterprise
GitHub CopilotIDE integration, chat, agent tasksInteractive development
CursorAI-native IDE, AI Tab, Chat, Ctrl+KIntegrated AI development
  • πŸ“ˆ Model Selection - OpenAI’s Codex series optimized for code execution. Anthropic’s Claude excels at reasoning. Google’s Gemini 3.1 offers strong performance at half Claude’s price. Simon Willison

  • πŸ“Š SWE-Bench Leaders - Claude Opus 4.6 (Thinking) leads with 79.20% on SWE-bench, followed by Gemini 3 Flash (76.20%) and GPT 5.2 (75.40%). Vals AI

πŸ—οΈ Agent Frameworks

  • πŸ”— LangChain - Python-based framework for building agent applications. langchain.com

  • 🀝 AutoGen (Microsoft) - Multi-agent conversation framework. microsoft.com/autogen

  • 🦞 OpenClaw - Open-source autonomous coding agent. GitHub

  • βš™οΈ CrewAI - Multi-agent orchestration for complex workflows. crewai.com

  • 🌐 Spring AI - Enterprise Spring-based agent patterns. Spring

πŸ”Œ Model Context Protocol (MCP)

  • πŸ”Œ What is MCP - Open standard by Anthropic that defines a unified way for AI agents to connect to external tools, data sources, and services. Like β€œUSB-C for AI integrations.” Anthropic

  • πŸ› οΈ MCP Servers - Pre-built integrations for databases, APIs, and tools. GitHub

  • πŸ“¦ MCP Registry - Growing ecosystem of MCP-compatible tools. modelcontextprotocol.io

πŸ’Ύ Local Models

  • πŸ–₯️ Ollama - CLI tool for running LLMs locally. ollama.com

  • πŸ“± LM Studio - Desktop app for local LLM experimentation. lmstudio.ai

  • 🏠 Jan - Privacy-focused local AI. jan.ai

  • πŸ”’ Benefits - Data privacy, no API costs, offline capability, total control. AI Lexicon


πŸ”’ Security & Safety

🚨 OWASP Top 10 for Agentic AI (2026)

  1. πŸ—οΈ Sensitive Data Disclosure - Agents may expose sensitive data through outputs or tool calls
  2. πŸ”„ Tool Poisoning - Compromised tools inject malicious behavior
  3. 🧠 Memory Pollution - Agent context manipulation through injected memories
  4. 🎭 Prompt Injection - External inputs override agent instructions
  5. πŸ”“ Unbounded Execution - Agents can execute unlimited actions without oversight
  6. πŸ“¦ Dependency Confusion - Agent dependencies can be hijacked
  7. 🦠 Multi-Agent Malware - Agents can spread malicious behavior
  8. πŸ’‰ Code Injection - Agent-generated code contains exploits
  9. πŸ‘€ Identity Confusion - Agents impersonate multiple identities
  10. ⚑ Denial of Wallet - Uncontrolled agent resource consumption

OWASP

πŸ›‘οΈ Security Best Practices

  • πŸ” Least Privilege - Grant agents minimum necessary permissions
  • βœ… Input Validation - Sanitize all inputs to agents
  • πŸ“ Audit Logging - Complete traceability of agent actions
  • πŸ”’ Secret Management - Never expose credentials to agents unnecessarily
  • πŸ‘οΈ Human Approval - Require human confirmation for sensitive operations

πŸ“Š Observability & Monitoring

πŸ“ˆ Key Metrics

  • ⏱️ Latency - Response time per step and overall task completion
  • πŸ’° Cost - Token usage and API costs per task
  • βœ… Success Rate - Task completion and quality metrics
  • πŸ”„ Token Usage - Context window utilization and efficiency

πŸ› οΈ Observability Tools

  • πŸ“Š LangSmith - LangChain’s observability platform
  • πŸ“ˆ Datadog - AI observability and monitoring
  • πŸ” OpenTelemetry - Open standard for tracing agents
  • πŸ“‰ AgentOps - Agent-specific monitoring

🎯 Production Considerations

  • πŸ“ Trace Tool Calls - Every LLM call, tool execution, and decision needs logging
  • πŸ’Ύ Checkpoint State - Save agent state for recovery and debugging
  • πŸ“‹ Cost Alerts - Set thresholds to prevent runaway spending
  • πŸ”” Quality Evaluation - Automated assessment of agent outputs

πŸ“š Key Research & Papers

πŸ”¬ Foundational Papers

πŸ“Š Evaluation Benchmarks

  • πŸ† SWE-bench - Software engineering benchmark with production tasks. Claude Opus 4.6 leads at 79.20%.

  • πŸ“ˆ SWE-bench Pro - More challenging version with 1,865 real repository tasks.

  • πŸ”¬ SWE-rebench - Automated pipeline for decontaminated agent evaluation.


πŸ”„ From Single Agents to Coordinated Teams

  • Complex tasks now span multiple specialized agents working in parallel
  • Each agent handles a specific subtask with dedicated context
  • Integration and orchestration become critical skills

⏱️ Long-Running Agents

  • Agents can now work autonomously for hours, handling multi-file refactors
  • Persistence and state management become essential
  • Checkpointing and recovery mechanisms mature

πŸ‘οΈ Human Oversight Evolution

  • From β€œhuman in the loop” to β€œhuman on the loop” - oversight rather than constant intervention
  • Intelligent collaboration - humans focus on decisions, agents handle implementation
  • Escalation protocols for ambiguous or high-stakes decisions

πŸ”’ Security-First Architecture

  • Agent-generated code introduces new attack surfaces
  • Guardrails, sandboxing, and permission systems become standard
  • Non-human identities (NHIs) emerge as a security category

πŸ’° Cost Management

  • Token usage optimization becomes critical
  • Prompt caching reduces costs 90% for long sessions
  • Budget limits and spending alerts for production agents

🎯 Practical Next Steps

πŸ§ͺ For Individuals

  1. πŸ“š Start with TDD - Agent-friendly tests = better agent output
  2. πŸ“ Build your instruction directory - Project conventions, patterns, and rules
  3. 🎯 Orchestrate, don’t micromanage - Give agents goals, not step-by-step instructions
  4. πŸ“Š Invest in observability - Agent sessions need logging, tracing, and rollback strategies
  5. πŸ“š Keep learning - This space evolves weekly; follow Simon Willison, Anthropic engineering, and arXiv SE research

🏒 For Teams

  1. πŸ“‹ Establish Agent Guidelines - Document approved patterns and restrictions
  2. πŸ”’ Implement Security Gates - Scan agent outputs before production
  3. πŸ“Š Monitor Costs - Set budgets and track agent spending
  4. πŸ‘₯ Create Agent Librarian Role - Maintain context and patterns for the team
  5. πŸ”„ Iterate on Processes - Learn from each agent interaction

πŸ“– Bibliography & References

πŸ”¬ Research Papers

πŸ“° Articles & Blogs

πŸ› οΈ Tools & Frameworks

πŸ“Š Benchmarks


  • πŸ—“οΈ Last Updated: 2026-03-01
  • πŸ”„ Research in Progress - This topic is actively being developed