
🤖🧭 Agentic Software Engineering

🤖 AI Summary

  • 📝 High-Level Summary: Agentic Software Engineering (ASE) is a fundamental paradigm shift in how software is conceived, developed, and maintained. It leverages AI agents capable of autonomous action, tool use, and iterative problem-solving to amplify human software engineering capability. ASE encompasses the patterns, tools, frameworks, and best practices for collaborating effectively with AI coding agents. The discipline is evolving rapidly, with new patterns, tools, and research emerging weekly. 🧠💻🚀

  • 🔑 Key Concepts: The field is characterized by long-running agent sessions with tool execution capabilities, prompt engineering for agents, multi-agent orchestration, human-in-the-loop oversight, context engineering, and agent-specific software engineering practices distinct from traditional human-centric approaches. The core shift is from “code is expensive” to “code is cheap now” - human effort moves to architecture, quality, and integration rather than implementation details. ⚡📉


🧠 Mental Models

🌟 The Spectrum of AI-Assisted Development

  • 🌱 AI Prototyping - Fast, exploratory, prompt-driven. Great for learning and proving concepts. CTCO
  • 🎯 Directed AI Assistance - Specify constraints, reference patterns, define success upfront. The tool is a lever; you’re in control. CTCO
  • 👥 Agent Orchestration - Split work across multiple agents, run them in parallel, integrate the results. Like managing a small team. CTCO
  • 🏗️ Agentic Engineering - Build persistent workflows with context, guardrails, and quality gates. Stay accountable for security and delivery. Simon Willison

💡 Core Principles

  • 🍰 Code is Cheap Now - The fundamental mental shift: writing code has become nearly free. This changes everything about how we estimate, plan, and execute. Focus on architecture, quality, and integration - the decisions that require human judgment. Simon Willison

  • ⚖️ SE for Humans vs SE for Agents - Agentic SE introduces a fundamental duality: traditional human-centric development and new agent-centric workflows. Each requires different tools, processes, and artifacts. arXiv

  • 🔄 From Vibe Coding to Agentic Engineering - “Vibe coding” (using AI without attention to the code, often by non-programmers) contrasts with “agentic engineering” (professional software engineers using AI to amplify their expertise). The latter involves rigorous methodology, quality gates, and accountability. Simon Willison

🎯 The Agentic Engineer Role

  • 🧑‍💻 Architect - High-level system design and decision-making

  • 🔍 Quality Controller - Code review, standards enforcement, catching what agents miss

  • 📚 Context Manager - Maintaining the documentation and context that make agents effective

  • 🎯 Strategist - Deciding what to build and how to approach it

  • 🔑 Key Insight: You need to be able to do what the agent does and recognize when it has gone wrong - spotting subtle bugs, security gaps, and maintainability issues. CTCO


🔧 How-To Guidance

🧪 Test-Driven Development for Agents

  • 📝 Red/Green TDD - Write tests first, confirm they fail, then implement. Test-first development helps agents write more succinct, reliable code with minimal prompting. Simon Willison

  • 🔴 Force Broken Tests - Require agents to write failing tests before implementation to ensure they understand requirements. Atomic Object

  • ▶️ Real Data Testing - Use production-like data to gain clarity on edge cases. Atomic Object

  • ⏸️ Human-in-the-Loop - Pause the AI and run tests yourself during development cycles. Atomic Object

๐Ÿ“ Context Engineering

  • ๐Ÿ—‚๏ธ CLAUDE.md Files - Project-specific instructions, conventions, and rules that persist across sessions. Anthropic

  • ๐Ÿ“‹ Instruction Directories - Maintain persistent context files that give agents project-specific rules, patterns, and conventions. The more context you give, the better the outputs. CTCO

  • ๐Ÿ“ฅ Just-in-Time Loading - Load relevant context only when needed to avoid overwhelming the context window. Morph

  • ๐Ÿšซ .claudeignore - Exclude irrelevant files from agent context to improve focus and reduce noise. Morph

  • ๐Ÿ”€ Subagent Isolation - Delegate subtasks to isolated subagents to prevent context pollution. Morph
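A minimal sketch of what such a context file might contain - every convention, path, and rule below is a made-up example, not a prescribed format:

```markdown
# CLAUDE.md (example)

## Conventions
- Python 3.12, type hints required, format with ruff
- Tests live in tests/, run with `pytest -q`

## Rules
- Never edit generated files under src/gen/
- Ask before adding new dependencies
- Prefer small, reviewable commits
```

Short, declarative rules like these persist across sessions and save re-explaining the project on every prompt.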

🧩 Problem Decomposition

  • 🏗️ Break Work Into Chunks - When orchestrating agents, break work into independent chunks. Each agent needs clear context about its piece and how it connects to others. CTCO

  • 📦 Batch Related Work - Group similar tasks together to maximize context efficiency. Aditya Bawankule

  • 🔄 Parallel Worktrees - Use Git worktrees for parallel work without conflicts. Aditya Bawankule
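One way to make “independent chunks” concrete is to compute which chunks can run in parallel from a dependency map. This sketch uses Python’s stdlib `graphlib`; the task names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

def parallel_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group tasks into waves: every task in a wave has all of its
    dependencies satisfied by earlier waves, so each wave's tasks can
    be handed to agents concurrently (e.g. one worktree per task)."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = set(ts.get_ready())  # tasks with no unmet dependencies
        waves.append(ready)
        ts.done(*ready)
    return waves

# Hypothetical feature broken into chunks: schema first, then the two
# independent pieces that build on it, then the integration step.
deps = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "integration": {"api", "ui"},
}
# parallel_waves(deps) -> [{'schema'}, {'api', 'ui'}, {'integration'}]
```

The middle wave is where parallel agents (and Git worktrees) pay off: `api` and `ui` touch disjoint code and can proceed simultaneously.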

🙋 Human Oversight

  • 👁️ Confirm Before Acting - Set explicit confirmation requirements for destructive or irreversible actions. Simon Willison

  • 🛡️ Guardrails - Implement safety checks and boundaries for agent actions.

  • 📊 Observability - Log, trace, and monitor agent sessions for debugging and quality control.
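A minimal confirm-before-acting wrapper, as a sketch: the tool names, the destructive-action list, and the return shape are all illustrative assumptions, not from any specific framework.

```python
# Tools the agent must never run without explicit human approval.
DESTRUCTIVE = {"delete_file", "drop_table", "force_push"}

def run_tool(name: str, action, confirm=input):
    """Run an agent tool call, but gate anything on the destructive
    list behind a human yes/no prompt. `confirm` is injectable so the
    gate can be tested (or routed to a UI) without touching stdin."""
    if name in DESTRUCTIVE:
        answer = confirm(f"Agent wants to run {name!r} - allow? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "blocked", "tool": name}
    return {"status": "ok", "tool": name, "result": action()}
```

Defaulting to “blocked” on anything other than an explicit “y” is the safety-relevant design choice here.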


๐Ÿ› ๏ธ Tools & Frameworks

๐Ÿค– Coding Agents

๐Ÿ† Tool๐Ÿ”‘ Key Features๐Ÿ“Š Best For
Claude CodeTerminal-based, long-running sessions, tool execution, subagents, agent teamsDeep refactoring, complex debugging
OpenAI CodexMulti-agent parallel execution, Codex Agent LoopLarge-scale automation, enterprise
GitHub CopilotIDE integration, chat, agent tasksInteractive development
CursorAI-native IDE, AI Tab, Chat, Ctrl+KIntegrated AI development
  • ๐Ÿ“ˆ Model Selection - OpenAIโ€™s Codex series optimized for code execution. Anthropicโ€™s Claude excels at reasoning. Googleโ€™s Gemini 3.1 offers strong performance at half Claudeโ€™s price. Simon Willison

  • ๐Ÿ“Š SWE-Bench Leaders - Claude Opus 4.6 (Thinking) leads with 79.20% on SWE-bench, followed by Gemini 3 Flash (76.20%) and GPT 5.2 (75.40%). Vals AI

๐Ÿ—๏ธ Agent Frameworks

  • ๐Ÿ”— LangChain - Python-based framework for building agent applications. langchain.com

  • ๐Ÿค AutoGen (Microsoft) - Multi-agent conversation framework. microsoft.com/autogen

  • ๐Ÿฆž OpenClaw - Open-source autonomous coding agent. GitHub

  • โš™๏ธ CrewAI - Multi-agent orchestration for complex workflows. crewai.com

  • ๐ŸŒ Spring AI - Enterprise Spring-based agent patterns. Spring

๐Ÿ”Œ Model Context Protocol (MCP)

  • ๐Ÿ”Œ What is MCP - Open standard by Anthropic that defines a unified way for AI agents to connect to external tools, data sources, and services. Like โ€œUSB-C for AI integrations.โ€ Anthropic

  • ๐Ÿ› ๏ธ MCP Servers - Pre-built integrations for databases, APIs, and tools. GitHub

  • ๐Ÿ“ฆ MCP Registry - Growing ecosystem of MCP-compatible tools. modelcontextprotocol.io
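On the wire, MCP speaks JSON-RPC 2.0; tool invocations use the `tools/call` method. The sketch below builds one such request with `json` only - the tool name and its arguments are hypothetical, and real clients use an SDK rather than hand-built messages.

```python
import json

def tools_call_request(req_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical call against a database MCP server:
msg = tools_call_request(1, "query_database", {"sql": "SELECT 1"})
```

The uniform request shape is the point of the “USB-C” analogy: every server exposes its tools through the same `tools/list` / `tools/call` surface.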

💾 Local Models

  • 🖥️ Ollama - CLI tool for running LLMs locally. ollama.com

  • 📱 LM Studio - Desktop app for local LLM experimentation. lmstudio.ai

  • 🏠 Jan - Privacy-focused local AI. jan.ai

  • 🔒 Benefits - Data privacy, no API costs, offline capability, total control. AI Lexicon
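Ollama also exposes a local REST API (default port 11434) alongside its CLI. A stdlib-only sketch of a non-streaming generate call - assumes `ollama serve` is running and a model (here `llama3`, substitute whatever you have pulled) is available:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to Ollama's local /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama3", "Explain git worktrees in one line.")

if __name__ == "__main__":
    # Only meaningful with a local Ollama server running.
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(json.loads(resp.read())["response"])
    except OSError as exc:
        print("Ollama not reachable:", exc)
```

Nothing leaves the machine, which is exactly the privacy benefit listed above.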


🔒 Security & Safety

🚨 OWASP Top 10 for Agentic AI (2026)

  1. 🗝️ Sensitive Data Disclosure - Agents may expose sensitive data through outputs or tool calls
  2. 🔄 Tool Poisoning - Compromised tools inject malicious behavior
  3. 🧠 Memory Pollution - Agent context manipulation through injected memories
  4. 🎭 Prompt Injection - External inputs override agent instructions
  5. 🔓 Unbounded Execution - Agents can execute unlimited actions without oversight
  6. 📦 Dependency Confusion - Agent dependencies can be hijacked
  7. 🦠 Multi-Agent Malware - Agents can spread malicious behavior
  8. 💉 Code Injection - Agent-generated code contains exploits
  9. 👤 Identity Confusion - Agents impersonate multiple identities
  10. ⚡ Denial of Wallet - Uncontrolled agent resource consumption

OWASP

๐Ÿ›ก๏ธ Security Best Practices

  • ๐Ÿ” Least Privilege - Grant agents minimum necessary permissions
  • โœ… Input Validation - Sanitize all inputs to agents
  • ๐Ÿ“ Audit Logging - Complete traceability of agent actions
  • ๐Ÿ”’ Secret Management - Never expose credentials to agents unnecessarily
  • ๐Ÿ‘๏ธ Human Approval - Require human confirmation for sensitive operations

📊 Observability & Monitoring

📈 Key Metrics

  • ⏱️ Latency - Response time per step and overall task completion
  • 💰 Cost - Token usage and API costs per task
  • ✅ Success Rate - Task completion and quality metrics
  • 🔄 Token Usage - Context window utilization and efficiency
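The cost metric is just arithmetic over token counts. A sketch - the per-million-token prices are placeholder values, not any provider’s actual rates:

```python
# Illustrative prices in USD per million tokens; substitute your
# provider's current rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent task from its token usage."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000

# e.g. a task that read 120k tokens and wrote 8k tokens:
cost = task_cost(120_000, 8_000)  # 0.36 + 0.12 = 0.48 dollars
```

Logging this per task is what makes the cost alerts mentioned below actionable.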

๐Ÿ› ๏ธ Observability Tools

  • ๐Ÿ“Š LangSmith - LangChainโ€™s observability platform
  • ๐Ÿ“ˆ Datadog - AI observability and monitoring
  • ๐Ÿ” OpenTelemetry - Open standard for tracing agents
  • ๐Ÿ“‰ AgentOps - Agent-specific monitoring

๐ŸŽฏ Production Considerations

  • ๐Ÿ“ Trace Tool Calls - Every LLM call, tool execution, and decision needs logging
  • ๐Ÿ’พ Checkpoint State - Save agent state for recovery and debugging
  • ๐Ÿ“‹ Cost Alerts - Set thresholds to prevent runaway spending
  • ๐Ÿ”” Quality Evaluation - Automated assessment of agent outputs
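Checkpointing can be as simple as an atomic JSON write. A sketch with stdlib only - the state fields are illustrative; anything JSON-serializable works:

```python
import json
from pathlib import Path

def save_checkpoint(path: Path, state: dict) -> None:
    """Persist agent state atomically: write to a temp file, then
    rename, so a crash mid-write never leaves a truncated checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)

def load_checkpoint(path: Path) -> dict:
    """Restore the last saved state to resume an interrupted session."""
    return json.loads(path.read_text())

# Hypothetical mid-session state for a long-running agent:
state = {"step": 7, "files_touched": ["src/app.py"], "tokens_used": 52_341}
```

The write-then-rename pattern matters for long-running agents: recovery is only useful if the checkpoint itself can’t be corrupted by the failure you’re recovering from.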

📚 Key Research & Papers

🔬 Foundational Papers

📊 Evaluation Benchmarks

  • 🏆 SWE-bench - Software engineering benchmark with production tasks. Claude Opus 4.6 leads at 79.20%.

  • 📈 SWE-bench Pro - More challenging version with 1,865 real repository tasks.

  • 🔬 SWE-rebench - Automated pipeline for decontaminated agent evaluation.


🔄 From Single Agents to Coordinated Teams

  • Complex tasks now span multiple specialized agents working in parallel
  • Each agent handles a specific subtask with dedicated context
  • Integration and orchestration become critical skills

⏱️ Long-Running Agents

  • Agents can now work autonomously for hours, handling multi-file refactors
  • Persistence and state management become essential
  • Checkpointing and recovery mechanisms mature

👁️ Human Oversight Evolution

  • From “human in the loop” to “human on the loop” - oversight rather than constant intervention
  • Intelligent collaboration - humans focus on decisions, agents handle implementation
  • Escalation protocols for ambiguous or high-stakes decisions

🔒 Security-First Architecture

  • Agent-generated code introduces new attack surfaces
  • Guardrails, sandboxing, and permission systems become standard
  • Non-human identities (NHIs) emerge as a security category

💰 Cost Management

  • Token usage optimization becomes critical
  • Prompt caching can reduce costs by up to 90% for long sessions
  • Budget limits and spending alerts for production agents
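As a sketch of how caching is requested in practice: in an Anthropic-style Messages API payload, a `cache_control` marker on the large, stable prefix (e.g. the system instructions) lets repeated agent turns reuse it. The model id and texts below are illustrative; check your provider’s docs for the exact field shape and savings.

```python
def cached_payload(system_text: str, user_text: str) -> dict:
    """Build a Messages-API-style request that marks the system prompt
    as cacheable, so long agent sessions don't re-pay for it each turn."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": system_text,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }],
        "messages": [{"role": "user", "content": user_text}],
    }
```

Only the stable prefix is marked; the per-turn user content stays uncached, which is what makes the savings compound over a long session.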

🎯 Practical Next Steps

🧪 For Individuals

  1. 📚 Start with TDD - Agent-friendly tests = better agent output
  2. 📝 Build your instruction directory - Project conventions, patterns, and rules
  3. 🎯 Orchestrate, don’t micromanage - Give agents goals, not step-by-step instructions
  4. 📊 Invest in observability - Agent sessions need logging, tracing, and rollback strategies
  5. 📚 Keep learning - This space evolves weekly; follow Simon Willison, Anthropic engineering, and arXiv SE research

🏢 For Teams

  1. 📋 Establish Agent Guidelines - Document approved patterns and restrictions
  2. 🔒 Implement Security Gates - Scan agent outputs before production
  3. 📊 Monitor Costs - Set budgets and track agent spending
  4. 👥 Create Agent Librarian Role - Maintain context and patterns for the team
  5. 🔄 Iterate on Processes - Learn from each agent interaction

📖 Bibliography & References

📚 Books in This Vault

📺 Videos in This Vault

🔬 Research Papers

📰 Articles & Blogs

🛠️ Tools & Frameworks

📊 Benchmarks


  • ๐Ÿ—“๏ธ Last Updated: 2026-03-02
  • ๐Ÿ”„ Research in Progress - This topic is actively being developed