# DISCOVER
Software development has two scoreboards and a process between them. The first measures deployment: how fast code ships, how often it breaks, how quickly you recover. The second measures the organization: how fast signals reach the right person, how autonomous the response is. Comprehension — understanding the system well enough to decide and act correctly — sits between them.
The progression follows a logic that Simon Wardley mapped in his work on technology evolution: every capability that becomes commodity accelerates everything that depends on it, exposing the next constraint beneath.
## The Compound Pattern
Wardley's evolution framework tracks how capabilities migrate from novel and poorly understood to standardized and invisible: genesis, custom-built, product, commodity. At each transition, the newly commoditized capability accelerates everything that depends on it, and that acceleration makes the next bottleneck visible — the one that was always there but hidden behind the louder problem in front of it.
Cloud computing commoditized infrastructure, which made CI/CD possible at scale, which commoditized deployment, which made continuous delivery standard. Machine learning inference commoditized, which enabled code generation. Code generation is commoditizing now. I traced this specific chain in The AI Capability Map earlier this year. Each capability follows the same migration path through Wardley's stages, and each transition enables the next.
## DORA: The Shipping Layer
The first layer was deployment. DORA captured it with four metrics that gave the industry a shared vocabulary for engineering performance, anchored by Google's annual State of DevOps reports:
| Metric | Measures |
|---|---|
| Deployment Frequency | How often code ships to production |
| Lead Time for Changes | Time from commit to running in production |
| Change Failure Rate | Percentage of deployments causing failures |
| Mean Time to Restore | How fast the system recovers from failures |
Every DORA metric presupposes that someone upstream decided correctly what to build. DORA tracks the journey from commit to production with real precision, but it says nothing about whether the decision was right, how the organization sensed the signal that triggered it, or how efficiently humans and machines coordinated to act on it. When deployment speed stops being the constraint, DORA keeps scoring a solved problem.
## MOVE: The Organizational Layer
When shipping became table stakes, the constraint shifted to the organization itself: how fast do signals reach the right person, and how much of the response runs without human intervention?
I developed MOVE to capture this layer — four metrics that measure the speed and reliability of an organization's operating model rather than its deployment pipeline:
| Metric | Measures |
|---|---|
| Momentum | Signal-to-first-action time — how long work waits before anyone touches it |
| Orchestration | Percentage of workflows designed to run autonomously through AI |
| Velocity | Action-to-outcome time — how long work takes once started |
| Exception | Percentage of orchestrated workflows requiring unplanned human intervention |
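As with DORA, the four MOVE metrics reduce to simple arithmetic over a log of completed workflows. The `Workflow` record shape below is a hypothetical sketch, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Workflow:
    signal_at: datetime        # when the triggering signal arrived
    first_action_at: datetime  # when anyone (human or agent) first acted on it
    outcome_at: datetime       # when the work reached its outcome
    orchestrated: bool         # designed to run autonomously through AI?
    intervened: bool           # did it need unplanned human intervention?

def move_metrics(flows: list[Workflow]) -> dict:
    """Compute the four MOVE metrics over a set of completed workflows."""
    orchestrated = [w for w in flows if w.orchestrated]
    return {
        # Momentum: signal-to-first-action time, median seconds
        "momentum": median((w.first_action_at - w.signal_at).total_seconds()
                           for w in flows),
        # Orchestration: fraction of workflows designed to run autonomously
        "orchestration": len(orchestrated) / len(flows),
        # Velocity: action-to-outcome time, median seconds
        "velocity": median((w.outcome_at - w.first_action_at).total_seconds()
                           for w in flows),
        # Exception: fraction of orchestrated workflows needing unplanned intervention
        "exception": (sum(w.intervened for w in orchestrated) / len(orchestrated)
                      if orchestrated else 0.0),
    }
```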
An organization where code deploys in minutes but signals take days to reach the right person has a MOVE problem, and no amount of DORA optimization will fix it because the constraint isn't the pipeline.
Two deeper shifts drive the urgency. Value in technology organizations is migrating from the people doing the work to the infrastructure enabling it: workforce sizes hold roughly steady while economic output concentrates in the tooling and platform layer. And every role's contribution is now measured against what an AI system can achieve as an alternative. Together these pressures make organizational speed existential rather than merely operational.
## The Comprehension Bottleneck
DORA scores deployment. MOVE scores organizational responsiveness and autonomy. Comprehension — the work of understanding the system well enough to decide correctly — sits between them.
Tudor Girba and Simon Wardley named this the comprehension bottleneck in Rewilding Software Engineering and identified two variables that score it.
Time to Question (TTQ) is how fast you formulate a specific, actionable question about the system. Most questions in organizations are too broad to drive action. "Why is retention dropping?" contains too many possible answers to be useful. The distance between that question and a precise, testable one — "which onboarding step has the highest drop-off in the first 48 hours?" — is Time to Question made visible.
Time to Answer (TTA) is how fast you extract a verified answer once you have a good question. Tracing a single variable through a data pipeline might take person-weeks of manual inspection. Asking "what services depend on this table?" might require days of reading code across repositories because the dependency graph lives on a whiteboard drawn from memory.
When Time to Answer drops — when getting a verified answer takes minutes rather than weeks — more questions become affordable. More questions reveal more of the system's landscape. More landscape yields sharper hypotheses, and sharper hypotheses target higher-value answers. The flywheel is self-reinforcing: cheap answers accelerate the system knowledge that produces good questions, and good questions target answers that reveal territory you didn't know existed, what Girba and Wardley call the "adjacent unexplored."
Expensive answers produce conservative questions. When tracing a dependency takes two weeks of engineer time, teams only ask questions they're confident will pay off. This filters out the exploratory questions — the ones that would reveal the system's actual structure. An organization can spend years working around a problem that two people with the right contextual tools would resolve in a month, because no one could afford the question that would have surfaced it.
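The economics above can be sketched as a toy model (all names and numbers hypothetical): a team asks a question only when its expected payoff beats the cost of answering it, so dropping the answer cost expands the set of affordable questions to include the exploratory ones.

```python
def affordable_questions(candidates, answer_cost_hours):
    """Return the questions worth asking under a given answer cost.

    Each candidate is (name, probability_of_payoff, payoff_hours_saved).
    A question is affordable when expected value exceeds the answer cost.
    """
    return [name for name, p, payoff in candidates
            if p * payoff > answer_cost_hours]

# Hypothetical backlog: one conservative bet, two exploratory long shots.
backlog = [
    ("known regression in checkout", 0.9, 200),
    ("why does retention dip on Mondays?", 0.2, 300),
    ("unused index on the orders table?", 0.1, 150),
]
```

At an answer cost of two engineer-weeks (~80 hours) only the conservative question clears the bar; at agent-assisted cost (~10 hours) all three do. The exploratory questions were always in the backlog; the cost structure was what filtered them out.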
Organizations can deploy fast (DORA) and respond fast (MOVE) but discover slowly. Slow discovery is the bottleneck between organizational response and deployment — the gap between what MOVE measures and what DORA measures.
## DISCOVER
What happens when the comprehension bottleneck itself starts to commoditize?
When AI agents can explore a codebase in seconds, map dependencies, trace execution paths, and answer structural questions that would take an engineer days, Time to Answer drops by orders of magnitude. Time to Question follows, because cheaper answers enable sharper questions. The comprehension flywheel accelerates, and the constraint shifts from understanding the system to designing the environments that let agents understand and act on it autonomously.
The work here looks different. It involves designing the constraints, feedback loops, and verification systems that allow autonomous agents to do reliable work. OpenAI's internal experiment illustrates the shift: a million lines of code, zero lines manually written. Their biggest lesson was that early progress was slow because the environment was underspecified. Engineering time went entirely into the harness: repository structure, architectural constraints encoded as linters, agent-accessible observability, knowledge bases organized for machine discovery. The fix was always the same question — what capability is missing, and how do we make it legible and enforceable for the agent?
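To illustrate what "architectural constraints encoded as linters" can mean in practice, here is a minimal import-boundary check. The layer names and the rule itself are hypothetical examples, not OpenAI's actual harness:

```python
import ast

# Hypothetical rule: modules in the "core" layer must not import from "api".
FORBIDDEN = {"core": {"api"}}  # {importing layer: layers it may not touch}

def boundary_violations(source: str, layer: str) -> list[str]:
    """Return modules imported by `source` that `layer` is forbidden to import.

    Walks the AST so the check is enforceable in CI, where an agent (or human)
    gets immediate, machine-legible feedback when a constraint is broken.
    """
    banned = FORBIDDEN.get(layer, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        # compare on the top-level package: "api.routes" belongs to layer "api"
        violations += [n for n in names if n.split(".")[0] in banned]
    return violations
```

The point of encoding the rule this way is that it becomes legible and enforceable for an agent: the constraint lives in the harness, not in a reviewer's head.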
Code generation commoditized, and now the process of building software is being industrialized. Human contribution keeps migrating upstream: from writing code, to reviewing code, to specifying what code should do, to designing the system that converts specifications into verified deployments.
DORA and MOVE are scoreboards — DORA for the deployment pipeline, MOVE for organizational responsiveness and autonomy. DISCOVER is the process that connects what MOVE measures at intake to what DORA measures at output. It operationalizes the comprehension bottleneck: the process by which a vague signal becomes a specific question (TTQ), and a specific question becomes a verified answer running in production (TTA).
Detect through Scope is where TTQ collapses. Contract through Verify is where TTA collapses.
| Phase | Step | What Happens |
|---|---|---|
| TTQ | Detect | A signal arrives: error log, feature request, usage anomaly, customer feedback |
| | Investigate | Explore the relevant system, map dependencies, understand current state |
| | Scope | Refine vague intent into a specific, testable question |
| TTA | Contract | Write a structured spec with acceptance criteria and scope boundaries |
| | Operate | Multi-agent competitive implementation against the spec |
| | Verify | Layered verification: guardrails, tests, AI review, browser checks, adversarial testing |
| Ship | Execute | Ship with feature flags, canary deployment, and automatic rollback |
| Learn | Reflect | Monitor production, extract learnings, feed signals back into the next cycle |
The table above uses software delivery as the primary example, but the pattern is not limited to code. Any organizational function with a comprehension bottleneck between signal and action follows the same structure. A cost anomaly arrives (Detect), gets investigated against infrastructure usage data (Investigate), gets refined from "spending seems high" into "which three services drove the 40% increase in compute costs last quarter?" (Scope), gets contracted into a report spec (Contract), gets generated and cross-checked by agents (Operate, Verify), gets distributed to stakeholders (Execute), and feeds back into what the organization watches next (Reflect). The same TTQ/TTA collapse applies: when the time to formulate the right financial question and extract a verified answer drops from weeks to hours, the organization's capacity to act on cost signals changes categorically, not incrementally. Research, compliance, documentation, operations — anywhere the bottleneck is comprehension rather than execution, DISCOVER describes the process and TTQ/TTA score it.
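One minimal way to instrument the pipeline above (the record shape is a hypothetical sketch): timestamp each step as a cycle passes through it, then derive TTQ as Detect through Scope and TTA as Contract through Verify.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The eight DISCOVER steps, grouped by phase.
STEPS = ["detect", "investigate", "scope",   # TTQ: signal -> precise question
         "contract", "operate", "verify",    # TTA: spec -> verified answer
         "execute", "reflect"]               # ship and learn

@dataclass
class Cycle:
    """Timestamps for one pass through the DISCOVER pipeline."""
    stamps: dict[str, datetime] = field(default_factory=dict)

    def mark(self, step: str, when: datetime) -> None:
        assert step in STEPS, f"unknown step: {step}"
        self.stamps[step] = when

    def ttq(self) -> float:
        """Time to Question: Detect through Scope, in seconds."""
        return (self.stamps["scope"] - self.stamps["detect"]).total_seconds()

    def tta(self) -> float:
        """Time to Answer: Contract through Verify, in seconds."""
        return (self.stamps["verify"] - self.stamps["contract"]).total_seconds()
```

Aggregated across cycles, these two numbers are what the flywheel argument predicts should fall by orders of magnitude as agents take over Investigate and Operate.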
## Three Loops
Chris Argyris and Donald Schön described single- and double-loop organizational learning, and later writers extended the scheme to a third loop; Gregory Bateson mapped the same territory from the ecology of mind. Each level is a different logical type from the one below.
| Loop | Question | What It Means |
|---|---|---|
| Single | Am I doing this right? | Optimize deployment. Ship faster, fail less, recover quicker. DORA scores this. |
| Double | Am I doing the right thing? | Optimize the organization. Sense the right signals, route work correctly, increase autonomy. MOVE scores this. |
| Triple | How do I decide what's right? | Design and improve the process that connects organizational response to deployment. DISCOVER is that process. TTQ and TTA score it. |
Each loop is the meta-work of its corresponding framework. Single-loop is the meta-work of DORA: optimizing deployment speed, hitting a plateau, and responding by optimizing harder. The organizational structure goes unquestioned. Double-loop is the meta-work of MOVE: questioning not just how fast you ship, but whether the whole signal-routing system is designed for the work you actually do. The change required is structural, not procedural.
The triple loop is the meta-work of DISCOVER: designing the process that converts signals into verified deployments, and asking whether that process is getting better at getting better. This means questioning whether DISCOVER's own categories are right — whether "signal → deployment" is always the correct pipeline shape, whether the phases capture the actual work or just the work you expected, whether your measurement apparatus (TTQ, TTA, the metrics stacked beneath them) is producing useful knowledge or just comfortable numbers. It means studying where the process fails not at the level of individual cycles but at the level of design: which signals never enter the pipeline because Detect can't see them, which questions never get asked because Scope is calibrated to the wrong domain, which answers are technically verified but operationally meaningless.
When the time to formulate a precise question and extract a verified answer drops from weeks to seconds, the entire decision cycle accelerates, and the work of designing that decision cycle — questioning its assumptions, reshaping its categories, deciding what it should measure — becomes the primary engineering challenge.
## The Loss Function Paradox
Each layer of the compound pattern pushes human work one step further from the material. Engineers wrote code, then reviewed AI-generated code, then wrote specifications that AI implements, then designed the systems that convert specifications into verified deployments. The trajectory is consistent: with each step, fewer people make higher-leverage decisions further upstream.
Follow this trajectory far enough and you arrive at what sounds like a simple endpoint: humans define what the system should optimize for, and autonomous systems handle everything downstream. In machine learning the term for this is the loss function — the objective that the system minimizes or maximizes. In organizational terms it means a small group setting direction while agents handle execution, verification, and even the design of new execution pipelines. A board of directors where there used to be an engineering team.
This sounds clean. The problem is that setting good objectives — choosing the right loss function — requires exactly the kind of deep, sustained contact with the system that each layer of automation progressively removes.
Consider a concrete case. An engineering leader who once reviewed every pull request understood the codebase intimately: its architectural grain, its load-bearing abstractions, where the technical debt lived and what it cost. That understanding informed which features to prioritize, which bets to make, which compromises to accept. Automate code review and the leader gains leverage but loses contact with the code's texture. Automate implementation and the distance grows. Automate specification and the leader is now setting objectives for a system they understand through dashboards and summaries rather than through the direct experience of building.
A musician analogy sharpens this. A musician who automates production, distribution, and composition assistance eventually faces the question "what music should exist?" and finds that answering it well requires the kind of relationship with the instrument, the audience, and the craft that automation was designed to relieve. The knowledge needed to set direction comes from working close to the material, and each layer of abstraction moves you further from it.
But this framing assumes the only path to judgment runs through the material. A bricklayer doesn't need to know how to fire clay. A Huf Haus eliminates traditional construction skills entirely and replaces them with something different: the ability to specify constraints, design for machine assembly, think in systems rather than in materials. The skills change; whether they degrade is a separate question. The conductor who has never played the oboe still shapes how the oboe sounds in the orchestra. The judgment is real, but it operates on relationships between parts rather than on the parts themselves.
Maybe the paradox isn't that abstraction destroys the knowledge you need to set direction, but that it produces a different kind of knowledge — powerful for coordination and design, blind to the texture that hands-on work reveals. The question becomes whether that blindness matters, and for which decisions. An engineering leader who understands the system through dashboards and agent outputs rather than through pull requests may develop judgment about system dynamics, failure modes, and architectural patterns that the hands-on engineer never sees. Different vantage point, different blindness, different insight.
The skill you need most at the top of the stack — judgment about what's worth doing — isn't destroyed by abstraction, but it is transformed by it. Systems designers develop their own form of judgment, operating on relationships between parts rather than on the parts themselves. The Huf Haus architect's judgment is real — it just operates on different material. Beneath "set the loss function" lies "develop the capacity to know what matters," and that capacity is shaped by the level at which you engage: close to the material, or at the level of systems.
Every operational era develops its own measurement language. Manufacturing brought throughput and defect rates. DevOps brought DORA. The AI-native organization needs MOVE. The triple loop is the ongoing work of designing and improving DISCOVER — the process that connects organizational response to deployment. The judgment needed to do that work well comes from the same place the Huf Haus architect's judgment comes from: sustained, serious engagement with the material, at the level of systems rather than at the level of bricks. The material changes. The requirement for engagement with it does not.
Concretely: the engineer who once developed judgment by reading code now develops it by designing the constraints that agents work within, by studying where verification fails and why, by tracing the patterns in TTQ and TTA that reveal which parts of the system are well-understood and which remain opaque. The knowledge is different — it's knowledge about how comprehension happens rather than knowledge about how code works — but it requires the same sustained contact, the same willingness to sit with the system long enough to see what others miss. The Huf Haus architect doesn't touch bricks, but they spend years learning which joints fail under load, which materials expand in heat, which assembly sequences produce reliable structures. That knowledge comes from engagement, not from abstraction. The triple loop demands the same discipline applied to a different material.
## Sources
- Simon Wardley, Wardley Maps (2016-ongoing): learnwardleymapping.com
- Tudor Girba and Simon Wardley, Rewilding Software Engineering (feenk, 2025-2026): medium.com/feenk
- Chris Argyris and Donald Schön, Organizational Learning: A Theory of Action Perspective (Addison-Wesley, 1978)
- Gregory Bateson, Steps to an Ecology of Mind (Chandler Publishing, 1972)
- OpenAI, "Harness Engineering" (2026): openai.com
- Google DORA Team, Accelerate State of DevOps Report (annual)
- Ashvin Parameswaran, "MOVE: Metrics for the AI-Native Organization" (2026): ashvin.au/blog/move-metrics
- Ashvin Parameswaran, "The Decision Loop" (2026): ashvin.au/blog/decision-loop
- Ashvin Parameswaran, "The AI Capability Map" (2026): ashvin.au/blog/the-ai-capability-map
- Ashvin Parameswaran, "The Phantom Limb Economy" (2026): ashvin.au/blog/phantom-limb-economy
- Ashvin Parameswaran, "The Replacement Rate" (2026): ashvin.au/blog/replacement-rate