The AI Capability Map: An Expanded Inventory
You don't get to opt out of commodity AI. That's what "commodity" means: not "cheap" or "boring" but "compulsory." Ivan Illich saw this pattern with electricity, automobiles, schools. The moment something becomes a utility, non-participation becomes deviance. Prasad Prabhakaran's recent Wardley map of enterprise AI capabilities plots where different technologies sit on the evolution axis. The map is useful. But its most important insight is implicit: everything in the Commodity column is no longer a choice.
What follows is an expanded inventory: the original categories, what's missing from each, and the harder question of what the categories themselves fail to capture. The act of mapping shapes what gets mapped. The categories we use determine the investments we make. And some capabilities don't fit the Genesis-to-Commodity axis at all.
Commodity: The Compulsory Floor
Currently listed:
- Chat interfaces
- Embeddings
- OCR
- Speech-to-text
- Summarisation
- Translation
- Basic LLM usage
- Standard ML techniques
- Deployment pipelines
What's missing:
The list underweights how much has commoditised in the past eighteen months. Add:
- Code completion and generation. GitHub Copilot was magic in 2022. In 2026, every IDE has it. The baseline expectation shifted; developers who don't use AI assistance are now the exception requiring explanation.
- Semantic search. Not keyword matching but genuine meaning-based retrieval. This moved from research to API in three years. Anyone still building custom semantic search infrastructure is solving yesterday's problem.
- Text-to-image generation. DALL-E felt like science fiction. Now it's a feature in Canva. The capability is table stakes; the differentiation moved elsewhere.
- Classification and categorisation. Sentiment analysis, intent classification, topic modelling. These were ML projects requiring data scientists. Now they're prompt templates (see the sketch after this list).
- Document parsing and extraction. Beyond OCR to structured extraction from unstructured documents. Invoices, contracts, forms. The hard parts got absorbed into platform features.
- Basic question answering. FAQ bots, support deflection, simple lookup queries. The "chatbot" that seemed sophisticated five years ago is now commodity infrastructure.
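To make the "prompt templates" claim concrete, here is a minimal sketch of sentiment classification as a single prompt call. It assumes the official `openai` Python client and an illustrative model name; the label set and prompt wording are placeholders, not a recommended design.

```python
# A minimal sketch: sentiment classification as a prompt template rather than a trained model.
# Assumes the `openai` Python client (pip install openai) and an OPENAI_API_KEY in the environment.
# The model name and label set are illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI()

LABELS = ["positive", "negative", "neutral"]

def classify_sentiment(text: str) -> str:
    """Classify sentiment with a prompt template; returns one of LABELS."""
    prompt = (
        "Classify the sentiment of the following text as exactly one of: "
        f"{', '.join(LABELS)}.\n\nText: {text}\n\nAnswer with the label only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "neutral"  # fall back on unexpected output

if __name__ == "__main__":
    print(classify_sentiment("The onboarding flow was painless and the support team was quick."))
```

The point is not the specific prompt but the collapse in effort: what once required labelled data and a trained classifier is now a dozen lines against a commodity API.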
The compulsion question:
When Prabhakaran writes that these capabilities are "utilities now," he's describing more than market maturity. Ivan Illich observed that commoditisation often marks the moment a tool becomes compulsory. Electricity didn't remain optional. The automobile didn't remain a choice. The school didn't remain one path among many.
Commodity AI is following the same pattern. The question isn't whether your organisation will use chat interfaces and embeddings. You will. The question is whether you'll understand what you're using well enough to know its limits. Commodity status means you can no longer function without it. It also means most users stop understanding how it works. The capability becomes reliable precisely because variance is eliminated, including human variance. The operator becomes a passenger.
This isn't inherently bad. You don't need to understand TCP/IP to use the internet productively. But it does mean that competitive advantage cannot live here. Any organisation still celebrating commodity AI as "transformation" is fighting a war that ended.
Product: The Compression Zone
Currently listed:
- RAG platforms
- Enterprise copilots
- Vector databases
- Agent frameworks (CrewAI, Bedrock AgentCore, GCP ADK)
- Surface observability tooling (LangSmith, Langfuse)
- AI factory playbooks
What's missing:
The Product layer is where procurement gets involved, where feature competition matters, where you can actually buy things. It's also where compression is most violent. Add:
- Fine-tuning-as-a-service. What required ML teams and GPU clusters is now an API call with a credit card. OpenAI, Anthropic, and a dozen startups offer managed fine-tuning. The capability democratised as the expertise requirements collapsed.
- Synthetic data generation. Training data creation moved from research technique to product category. Platforms generate domain-specific synthetic datasets for model training and evaluation. What was custom is becoming purchase order.
- AI safety and guardrails tooling. Guardrails AI, NeMo Guardrails, and their competitors. Content filtering, output validation, prompt injection defence. These were custom implementations eighteen months ago. Now they're vendor features.
- Retrieval orchestration. Beyond basic RAG to multi-source retrieval, reranking, query decomposition, hybrid search. The patterns stabilised. The products emerged. (See the sketch after this list.)
- Prompt management platforms. Version control, A/B testing, and deployment pipelines specifically for prompts. LangSmith, Promptfoo, and others. Prompt engineering became software engineering; the tooling followed.
- AI gateway and proxy services. Routing, fallbacks, rate limiting, cost tracking across multiple LLM providers. Portkey, LiteLLM, and similar. Infrastructure that sits between your application and the models.
- Model evaluation platforms. Beyond surface observability to systematic capability testing. Braintrust, Patronus, and others. Evaluation moved from ad-hoc scripts to product category.
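One of the stabilised patterns behind retrieval orchestration is hybrid search with rank fusion. The sketch below fuses a keyword ranking and a vector ranking using reciprocal rank fusion; `keyword_search` and `vector_search` are hypothetical placeholders standing in for whatever backends you actually run.

```python
# A minimal sketch of hybrid retrieval via reciprocal rank fusion (RRF).
# `keyword_search` and `vector_search` are hypothetical placeholders for your
# actual backends (e.g. a BM25 index and a vector database); each returns ranked doc IDs.
from collections import defaultdict
from typing import Callable

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; k dampens the influence of any single ranker."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_retrieve(
    query: str,
    keyword_search: Callable[[str], list[str]],
    vector_search: Callable[[str], list[str]],
    top_n: int = 10,
) -> list[str]:
    """Run both retrievers and fuse their rankings."""
    fused = reciprocal_rank_fusion([keyword_search(query), vector_search(query)])
    return fused[:top_n]
```

This is roughly the logic the orchestration products now package, along with query decomposition and reranking, which is why building it yourself is increasingly an operational cost rather than an advantage.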
Compression dynamics:
Everything in the Product layer is racing toward Commodity. RAG platforms will be table stakes within the year. Vector databases are being eaten from below by Postgres extensions. Agent frameworks are converging on similar patterns; differentiation is shrinking.
This compression creates a brutal dynamic: the moment something becomes buyable, it stops being differentiating. The moment your competitors can procure the same RAG platform, your RAG implementation is no longer a competitive advantage. It's an operational cost. The value migrates upward, toward whatever remains scarce.
The strategic error is treating Product-layer investments as moats. They're not. They're infrastructure. Necessary infrastructure, but infrastructure nonetheless. The question isn't whether to buy these capabilities. The question is how quickly you can absorb them and move your differentiation to where competition hasn't yet commoditised.
Custom: The Complexity Cliff
Currently listed:
- Multi-agent orchestration
- Memory architecture
- Evaluation harnesses
- Governance automation
- Human-agent operating models
- Domain knowledge graphs
- Context graphs grounded in business meaning
What's missing:
This is where Prabhakaran's analysis is sharpest: the Custom layer is where serious advantage is created and where most organisations underestimate the complexity. The work doesn't feel like plugging tools together. It feels like designing a new kind of system. Add:
- Intent architecture. Understanding what users actually want versus what they say. This goes beyond intent classification (commodity) to building systems that model user goals, detect goal shifts, and navigate ambiguity. The difference between a helpful assistant and an annoying chatbot.
- Trust calibration systems. Knowing when the AI should be confident and when it should defer. Not just uncertainty quantification but calibrated uncertainty that maps to real-world stakes. Systems that know when to say "I don't know" and mean it. (A sketch follows this list.)
- Domain-specific reasoning patterns. Not just domain knowledge (what facts exist) but domain reasoning (how to think within a domain). How a lawyer reasons about precedent differs from how an underwriter reasons about risk differs from how a clinician reasons about diagnosis. This isn't fine-tuning; it's architecture.
- Feedback loop design. How systems improve from production usage without degrading. RLHF got attention, but the harder problem is continuous improvement from implicit feedback in enterprise contexts where you can't simply ask users to rate every response.
- Agentic workflow architecture. The actual design of multi-step agent systems, distinct from the frameworks that implement them. How you decompose tasks, handle failures, maintain context across steps, coordinate multiple agents with different capabilities. The framework is Product; the architecture is Custom.
- Hybrid human-AI decision systems. Not human-in-the-loop as a checkbox but genuine collaboration design. When does the AI draft and human edit? When does the human draft and AI enhance? When do they work in parallel? The operating model, not the technology.
- Explainability pipelines for regulated contexts. Not generic interpretability but explanations that satisfy specific regulatory requirements. What the EU AI Act requires differs from what insurance regulators require differs from what healthcare compliance requires. The explanations must be true, useful, and legally adequate.
- Continuous learning architectures. Systems that improve from deployment without retraining from scratch. How do you incorporate new information, correct errors, adapt to distribution shift? This isn't model updates; it's architectural support for ongoing learning.
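As a deliberately small illustration of trust calibration, the sketch below maps confidence and stakes to one of three actions: answer, hedge, or defer to a human. The thresholds and stake levels are invented for illustration; a real system needs calibrated probabilities behind that confidence number, which is the hard part.

```python
# A minimal sketch of stake-aware deferral for trust calibration.
# Thresholds and stake levels are illustrative; in production the confidence
# value should come from a calibrated estimator, not a raw model score.
from enum import Enum

class Stakes(Enum):
    LOW = "low"        # e.g. internal FAQ lookup
    MEDIUM = "medium"  # e.g. customer-facing recommendation
    HIGH = "high"      # e.g. regulated or irreversible decision

# Minimum confidence required to answer autonomously at each stake level.
AUTONOMY_THRESHOLDS = {Stakes.LOW: 0.60, Stakes.MEDIUM: 0.80, Stakes.HIGH: 0.95}

def decide_action(confidence: float, stakes: Stakes) -> str:
    """Return 'answer', 'hedge', or 'defer' given calibrated confidence and stakes."""
    threshold = AUTONOMY_THRESHOLDS[stakes]
    if confidence >= threshold:
        return "answer"            # confident enough for this stake level
    if confidence >= threshold - 0.15:
        return "hedge"             # answer, but surface uncertainty and sources
    return "defer"                 # route to a human reviewer

if __name__ == "__main__":
    print(decide_action(0.70, Stakes.LOW))   # -> 'answer'
    print(decide_action(0.90, Stakes.HIGH))  # -> 'hedge': close to the bar, answer with caveats
    print(decide_action(0.70, Stakes.HIGH))  # -> 'defer'
```

The routing logic is trivial; the Custom-layer work is everything that makes the inputs trustworthy: calibration, stake classification, and the operating model for what "defer" means in practice.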
The complexity cliff:
Most organisations fall off this cliff. They successfully navigate Commodity (use the APIs) and Product (buy the platforms), then assume Custom works the same way. It doesn't. Custom-layer work requires fundamentally different skills, timelines, and expectations.
The talent constraint is severe. Prompt engineers are abundant. Systems architects who can design memory-coherent multi-agent workflows are rare. Domain experts who can model a knowledge graph capturing the actual semantics of insurance underwriting are rarer still. The constraint isn't compute or APIs; it's human expertise.
The time constraint is equally severe. Domain knowledge graphs require six to twelve months of deep modelling work with domain experts. You cannot sprint to a knowledge graph. Organisations expecting quarterly results from Custom-layer investments will be perpetually disappointed.
Prabhakaran's central insight lands here: the demo trap. Systems that work beautifully in controlled environments quietly degrade in production. Not crashes or errors, but something worse: a slow erosion of user confidence while models, prompts, and architecture remain unchanged. The gap between demo success and production reliability is bridged only by architectural maturity in the Custom layer.
Genesis: The Forecasting Horizon
Currently listed:
- Self-evolving agents
- Autonomous collectives
- Living knowledge systems
- Agent-native organisations
- Emergent reasoning architectures
What's missing:
Genesis is research territory. High upside and high uncertainty; important to explore but dangerous to sell as production-ready. The list captures the speculative nature but underweights specific research directions worth watching. Add:
- World models. Internal representations of how the environment works, enabling prediction and planning. Beyond pattern matching to causal understanding. The gap between current LLMs and genuine world models is large; closing it would change everything.
- Persistent agent identity. Agents that maintain coherent identity, goals, and memory across extended interactions and contexts. Not session state but genuine continuity, and the architectural challenges remain profound.
- Cross-modal reasoning. Not multimodal inputs (that's Product) but genuinely integrated reasoning across modalities. Understanding that a diagram and a paragraph describe the same concept. Reasoning that moves fluidly between visual, textual, and structured representations.
- Self-directed learning. Agents that identify their own knowledge gaps and seek to fill them. Not fine-tuning on curated data but autonomous exploration of what needs to be learned. The bootstrapping problem is hard.
- Causal reasoning. Beyond correlation to genuine causal inference. Understanding that intervening on X affects Y differently than merely observing their correlation. Current models struggle here. (A toy simulation after this list shows the gap.)
- Genuine novelty detection. Knowing when you've encountered something outside your training distribution. Not just low confidence but recognition that the situation is genuinely new. Prerequisite for reliable operation in open-ended environments.
- Multi-agent economies. Agents transacting value with each other, coordinating through market-like mechanisms, developing specialisation and trade. Game theory meets artificial intelligence. The alignment challenges compound.
- Artificial general intelligence. The elephant in the room. Systems that match or exceed human cognitive abilities across domains. Whether this is years away or decades away or structurally impossible remains contested. But it belongs on any honest Genesis list.
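The gap between observation and intervention can be shown with a toy confounded system. In the simulation below, a hidden confounder drives both X and Y, so P(Y | X=1) looks strong while P(Y | do(X=1)) shows that forcing X changes nothing. All probabilities are invented purely for illustration.

```python
# A toy illustration of why observing X=1 differs from intervening do(X=1).
# Hidden confounder Z drives both X and Y; X itself has no causal effect on Y.
import random

random.seed(0)
N = 100_000

def draw(do_x=None):
    """Return (x, y). Z confounds X and Y; Y does not depend on X at all."""
    z = random.random() < 0.5                      # hidden confounder
    if do_x is None:
        x = random.random() < (0.9 if z else 0.1)  # Z pushes X up
    else:
        x = do_x                                   # intervention: ignore Z entirely
    y = random.random() < (0.8 if z else 0.2)      # Z pushes Y up
    return x, y

# Observational: P(Y=1 | X=1) is inflated because seeing X=1 usually means Z=1.
obs = [y for x, y in (draw() for _ in range(N)) if x]
print(f"P(Y=1 | X=1)     ~= {sum(obs) / len(obs):.2f}")   # roughly 0.74

# Interventional: P(Y=1 | do(X=1)) reflects X's true (null) effect.
do = [y for _, y in (draw(do_x=True) for _ in range(N))]
print(f"P(Y=1 | do(X=1)) ~= {sum(do) / len(do):.2f}")     # roughly 0.50
```

A system trained only on the observational data would happily recommend "increase X to raise Y." Genuine causal reasoning is knowing, before the intervention, that this recommendation is empty.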
The forecasting problem:
Genesis defies forecasting. By definition, these capabilities have undefined characteristics. Breakthrough timelines are unpredictable. Confident predictions about when self-evolving agents will work are not expert opinions; they're speculation dressed as expertise.
The strategic posture for Genesis is option value: small exploration bets, no production promises, watching for discontinuities. The organisation that ignores Genesis will be blindsided when breakthroughs occur. The organisation that bets heavily on Genesis will waste resources on things that don't materialise for decades. Neither extreme is wise.
Augustine observed that the future doesn't exist yet; what we call expectation is present consciousness of what we anticipate. The Genesis layer exists in collective expectation, not in deployable capability. Treat it accordingly.
Beyond the Axis: What the Categories Can't See
The Wardley map is useful. It is also a particular lens that illuminates certain things while obscuring others. Some capabilities don't fit the Genesis-to-Commodity axis because they're not capabilities in the same sense. They're structural features of organisations, relationships, or contexts that determine how capabilities create value. Missing them is like having a detailed parts list but no understanding of the machine.
Organisational learning capacity.
The meta-capability underlying all others. Not what AI capabilities you have but how quickly you can absorb new capabilities when they emerge. Mary Midgley warned against "machine-worship": the tendency to trust technology without understanding it. An organisation that treats AI as commodity consumption actively degrades its capacity to understand, adapt, and judge. The moat isn't what you own but what you comprehend.
This doesn't appear on capability maps because it's not a capability you can acquire. It's a practice you develop. The organisation that has spent three years learning how its software should respond to edge cases in Australian workplace regulations has built something that cannot be purchased from an API. Not because the API couldn't generate the same tokens, but because the judgment about which tokens are valuable is embedded in institutional memory.
What does learning capacity look like in practice? It's visible in specific behaviours. The organisation with high learning capacity runs structured experiments rather than pilots that quietly disappear. It maintains documentation that captures not just what was built but why alternatives were rejected. It rotates people through AI projects so knowledge spreads rather than concentrating in a few specialists who become bottlenecks. It distinguishes between "this didn't work" and "we don't yet understand why this didn't work" and treats them as different situations requiring different responses.
The organisation with low learning capacity has a different signature. Each new AI initiative starts from scratch because previous learnings weren't captured. Vendor evaluations repeat the same mistakes because evaluation criteria were never formalised. The same edge cases surprise the team repeatedly because failure modes weren't documented. Technical debt accumulates because architectural decisions were made under pressure and never revisited.
The gap between these organisations widens over time. When GPT-4 arrived, organisations with learning capacity had frameworks for evaluating new models: benchmark suites, integration patterns, risk assessment templates. They absorbed the new capability in weeks. Organisations without learning capacity started from zero, repeating discovery processes they'd already done for GPT-3.5. When Claude improved, when Gemini emerged, the pattern repeated. The learning organisations pulled further ahead with each wave.
This is why "just use the APIs" is insufficient advice. Two organisations can use identical APIs and achieve radically different outcomes. The difference isn't in the capabilities they access but in the institutional capacity to understand, integrate, and improve. That capacity compounds. Its absence compounds too.
Cosmotechnical diversity.
Yuk Hui's insight: the evolution axis assumes all techniques develop along a single universal trajectory. But what appears "commoditised" in one context may operate completely differently in another, not from backwardness but because the technical-moral order demands different forms of stabilisation.
This sounds abstract until you examine specific cases. Consider how AI systems handle authority and evidence across different regulatory environments. A US-built enterprise AI might treat user autonomy as the primary value: provide information, let the user decide. A system built for European contexts might embed different assumptions about institutional responsibility and duty of care. A system designed for contexts with strong guild traditions might require human expert validation as an architectural feature, not a compliance checkbox.
These aren't superficial differences papered over with localisation. They're different answers to the question: what is this technology for? An AI assistant in an American legal context might be designed to empower individual lawyers to work faster. The same category of tool in a German context might be designed to support systematic quality control across a firm. Both are "legal AI." They're not the same thing.
The practical implication: the Silicon Valley default—move fast, scale globally, localise later—may be structurally unsuited to AI systems that embed assumptions about human-machine relations. An AI system designed around one set of assumptions about authority and evidence cannot be trivially "localised" into a context with different assumptions. The assumptions are architectural, not surface-level.
This creates both risk and opportunity. The risk: treating the dominant AI paradigm as universal and discovering too late that it conflicts with local technical-moral orders. The opportunity: building AI systems that embody different conceptions of authority, evidence, and action. These may find markets that the Silicon Valley default cannot serve well, not because those markets are backward but because they have different requirements that the dominant paradigm ignores.
The vernacular path.
Illich's counterpoint to commoditisation: vernacular capabilities that resist standardisation not by being immature but by design. Competitive advantage through competence creation rather than dependency creation. Users stay because they've grown more capable, not because they can't leave.
This inverts the "lock-in" strategy embedded in most enterprise AI thinking. The L3 Platform Intelligence goal of "lock-in" assumes dependency is the moat. The vernacular alternative suggests retention through genuine value creation. These are different business models requiring different architectures.
The distinction becomes concrete when you examine how AI systems handle expertise over time. A dependency-creating system makes the user less capable without the tool. The AI handles complexity so the user doesn't have to. Over time, the user's own judgment atrophies. They become passengers. Switching costs increase because the user has lost skills they once had.
A competence-creating system makes the user more capable, with or without the tool. The AI explains its reasoning. It teaches patterns. It highlights edge cases the user should learn to recognise. Over time, the user's judgment improves. They become better practitioners. Switching costs are lower in one sense—the user could function without the tool—but retention is higher because the tool is genuinely valuable.
The business models look different too. Dependency-creating systems optimise for usage metrics: daily active users, sessions per day, queries per session. The system is working when users use it constantly. Competence-creating systems optimise for outcome metrics: quality of decisions made, reduction in errors, speed of skill acquisition. The system is working when users make better decisions, whether or not those decisions involve the tool.
This isn't merely ethical preference. It's strategic positioning. In regulated industries—healthcare, finance, law—dependency-creating AI faces structural headwinds. Regulators are uncomfortable with systems that reduce human judgment. Professional bodies resist tools that de-skill their members. The vernacular path aligns with regulatory direction rather than fighting it.
Institutional trust and reputation.
A capability that doesn't evolve along the Wardley axis at all: the accumulated trust that determines whether anyone will use your AI capabilities in the first place. This is particularly acute in high-stakes domains where errors are expensive and visible.
Consider two organisations offering identical AI capabilities for medical diagnosis support. One has decades of reputation in healthcare software, regulatory relationships, clinical validation infrastructure, and a track record of responsible behaviour when things go wrong. The other is a startup with better technology and no history. The capabilities may be equivalent. The willingness of health systems to deploy them is not.
Trust doesn't commoditise. It isn't purchased. It accumulates slowly through consistent behaviour and can collapse rapidly through single failures. An organisation that has built trust over years has an asset that new entrants cannot replicate regardless of their technical capabilities.
This creates a specific strategic pattern: established players with trust can deploy AI capabilities that would be too risky for newcomers. They can move into higher-stakes applications because stakeholders will tolerate failures from trusted parties that they won't tolerate from unknown ones. The startup with better technology but no reputation is constrained to lower-stakes applications where trust matters less.
The inverse is also true. Organisations that deploy AI carelessly and suffer public failures may find their trust assets depleted. The technology investment remains, but the ability to deploy it in high-value contexts is impaired. Trust is an asset that appears on no balance sheet but constrains strategy more than most assets that do.
Integration depth.
Another dimension invisible to capability maps: how deeply AI is woven into existing workflows, systems, and data. Two organisations can have identical AI capabilities yet face completely different replacement costs based on integration depth.
Surface integration means the AI capability could be swapped with modest effort. The system calls an API; a different API could be called instead. Integration depth means the AI capability is embedded in data structures, workflow assumptions, training processes, and organisational habits. Replacement would require re-architecting systems, retraining people, and rebuilding processes.
This is distinct from capability maturity. A commodity capability can be deeply integrated. A custom capability can be shallowly integrated. The Wardley axis captures whether something is standardised; integration depth captures how embedded it is in your specific context.
Integration depth creates switching costs that exist independently of capability lock-in. Even if a better capability emerges and commoditises, migration costs may make switching irrational. The organisation isn't locked in because the capability is scarce. It's locked in because the integration is dense.
Strategically, this suggests two moves. For your own capabilities: pursue integration depth that creates switching costs for competitors trying to displace you. For competitors' capabilities: target surface integrations that can be displaced rather than deep integrations that are practically immovable regardless of capability superiority.
Temporal strategy.
Augustine again: the capability map presents itself as spatial (positions on an axis) but conceals the fundamental temporal question. What the strategist experiences looking at a Wardley map is threefold: memory of how previous technologies evolved, attention to emerging capabilities, and expectation that today's Custom becomes tomorrow's Commodity.
The enterprise's true strategic asset may not be position relative to capabilities but the quality of its memory and attention. Most enterprises suffer from scattered memory: fragments of past investments poorly integrated, lessons learned then forgotten. The enterprise with gathered memory can hold past, present, and future in creative tension. This is a practice rather than a capability.
What does gathered memory look like operationally? It's specific practices, not vague aspirations. It means maintaining decision logs that capture not just what was decided but what alternatives were considered and why they were rejected. When the context changes, you can revisit those decisions with the reasoning intact rather than starting from scratch.
It means conducting structured retrospectives that extract transferable lessons from projects, not just project-specific post-mortems. The lesson "our RAG implementation struggled with regulatory documents" is project-specific. The lesson "regulatory documents require citation-level retrieval, not passage-level, because users need to verify exact sources" is transferable.
It means building institutional knowledge bases that are actually maintained and actually consulted. Not documentation that exists to satisfy compliance requirements but living resources that inform decisions. The test is whether people check the knowledge base before making decisions, not whether the knowledge base exists.
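One lightweight way to operationalise the decision-log practice above is a structured decision record. The schema below is a sketch of one possible shape, not a standard; the fields simply mirror the prose: the decision, the context, the rejected alternatives with their reasons, and the signal that should trigger a revisit.

```python
# A minimal sketch of a decision record for gathered memory.
# The schema is illustrative; the point is capturing rejected alternatives
# and their reasoning so decisions can be revisited when context changes.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RejectedAlternative:
    option: str
    rejection_reason: str          # why it lost, in terms that survive staff turnover

@dataclass
class DecisionRecord:
    decided_on: date
    decision: str
    context: str                   # constraints and assumptions in force at the time
    alternatives: list[RejectedAlternative] = field(default_factory=list)
    revisit_if: str = ""           # the signal that should trigger re-evaluation

record = DecisionRecord(
    decided_on=date(2025, 11, 3),
    decision="Use the vendor RAG platform rather than building retrieval in-house",
    context="Two engineers available; regulatory documents dominate the corpus",
    alternatives=[
        RejectedAlternative(
            option="In-house hybrid retrieval",
            rejection_reason="No capacity to maintain reranking infrastructure this year",
        )
    ],
    revisit_if="Team grows past five engineers or vendor pricing changes materially",
)
```

The format matters less than the discipline: when the constraints lift, the organisation can see which rejected options are worth reopening rather than rediscovering them by accident.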
Scattered memory has its own signatures. The same mistakes recur because failure modes weren't documented. Technical choices are re-litigated because the reasoning behind previous choices was lost. New team members spend months rediscovering context that should have been explicit. Vendors make pitches that were already rejected, because the rejection reasoning wasn't preserved.
The temporal dimension matters because AI capabilities are evolving rapidly. What was impossible becomes possible; what was expensive becomes cheap; what was cutting-edge becomes commodity. The organisation with gathered memory can track these shifts systematically. It knows what it wanted to do but couldn't, and it notices when constraints lift. The organisation with scattered memory rediscovers the same opportunities repeatedly, often too late.
Using This Inventory
The expanded lists are meant to be reference material. Screenshot them. Return to them when making investment decisions. Use them to pressure-test vendor claims.
But the deeper value is in the questions the inventory raises:
For Commodity: Are we still celebrating capabilities in this layer as transformation? If so, we're fighting the last war. What understanding are we losing as we consume these capabilities without comprehension?
For Product: How quickly is compression happening? What we're buying today will be table stakes soon. Where should our differentiation live when that happens?
For Custom: Are we underestimating complexity? The demo trap is real. Architectural maturity in this layer is the difference between clever systems and trustworthy systems. Do we have the talent and timeline to build here seriously?
For Genesis: Are we maintaining option value without over-investing? Small exploration bets, not production promises.
Beyond the axis: What capabilities can't we map because they're not capabilities at all? Learning capacity, temporal strategy, cosmotechnical diversity. These may matter more than any item on the lists.
The organisations that will lead won't be the ones with the most tools. They'll be the ones with the clearest understanding of maturity and the courage to invest deliberately.
Sources
- Prasad Prabhakaran, "It Worked in Demos. It Drifted in Reality." (2026), Medium/AI Monks
- Simon Wardley, Wardley Mapping methodology and maturity frameworks
- Ivan Illich on commoditisation and conviviality
- Yuk Hui on cosmotechnics and technical diversity
- Mary Midgley on "machine-worship" and uncritical technological enthusiasm
- Augustine on interior temporality and expectation