Thin to Thick⚓︎
The hidden design challenge in every AI tool.
Everyone building AI tools is solving the same problem, and most don't realize it.
The visible problem is capability: can your tool write code, analyze data, automate workflows? The hidden problem is structure: how do users move from "I tried ChatGPT once" to "this tool understands how I work"? What primitives do you give them? How do those primitives grow?
Scroll through LinkedIn and you'll find practitioners mapping their journey:
- Casual use — random questions, sporadic interactions
- Power user — saved prompts, custom instructions
- Packager — building reusable workflows
- Chaos — too many tools that don't talk to each other
- Workspace need — craving a single place where everything lives
The stages feel true. They also encode assumptions worth excavating. What theory of progression does this taxonomy assume? What happens to users who don't follow the path?
The Proliferation of Primitives⚓︎
Every major AI platform has invented its own vocabulary for user-facing abstractions. OpenAI has custom instructions, GPTs, assistants. Anthropic's Claude Code has skills, commands, hooks, agents, MCP servers, plugins. Microsoft's Copilot has plugins and extensions. Each taxonomy reflects design decisions about what users need and when they need it.
The proliferation signals something unresolved. If everyone is inventing different primitives, either the design space is genuinely multidimensional (many valid approaches) or we haven't found the right abstractions yet (still searching). Probably both.
Look closer at any taxonomy and you find hidden premises. The LinkedIn stages assume progression is linear: you start casual, you end with a workspace. But some users stay casual forever and go deep rather than broad. Others jump straight to building, skip the power-user phase entirely, and create sophisticated tooling before they've internalized best practices. The stages describe a path, but users take many paths.
Claude Code's primitives assume users can distinguish between automatic behavior (skills that activate based on context), explicit invocation (commands you type), deterministic control (hooks that guarantee certain actions), parallel delegation (agents that work alongside you), external integration (MCP servers), and distribution (plugins that package everything for sharing). That's six categories. Each makes sense in isolation. Together they form a taxonomy that experienced users navigate fluently and new users find bewildering.
The question isn't which taxonomy is right. The question is what problem the taxonomy is trying to solve, and whether more categories actually solve it.
Excavating Claude Code⚓︎
A close reading of Claude Code's primitives reveals the design logic underneath. This isn't critique; it's archaeology. What did the designers assume about users?
Skills are the most interesting primitive because of how much they can contain. At base, a skill activates automatically based on context. When you say "write a blog post," the relevant skill loads without you asking. Skills are markdown files that can reference other files, accumulating resources and examples. The assumption: users want "always-on" expertise that doesn't require explicit invocation. The skill should know when it's needed.
From the start, skills were designed as containers for more than just instructions. A skill directory can contain not just the SKILL.md file but executable scripts: Python formatters, shell scripts, automation that fires when the skill activates. The skill becomes a container for behavior, not just knowledge. It can hold reference documents, example files, configuration, and code that executes. The boundary between "instructions for the model" and "automation that runs alongside the model" was blurred by design. A sufficiently developed skill resembles an application more than a prompt.
Commands are explicit. You type /deploy or /review and something happens. Commands live in their own directory (.claude/commands/), separate from skills. They're saved prompts, shortcuts for repeated tasks. The assumption: some behaviors should only happen when requested. The user stays in control of when.
Skills and commands are parallel primitives, not nested. Both can exist in a project; both can be personal or project-scoped. The relationship is compositional rather than hierarchical: an agent might invoke a skill and trigger a command, but neither contains the other. Some practitioners organize commands thematically alongside related skills, but this is convention, not architecture.
Hooks are deterministic. They fire on specific events: before a tool runs, when permissions are requested, when a session ends. Unlike skills and commands, hooks don't rely on the language model choosing to act. They guarantee behavior. The assumption: some things must happen reliably, not probabilistically.
Hooks represent a fundamentally different philosophy than skills and commands. Skills trust the model to decide when to activate; hooks don't trust the model at all. A hook that validates file changes before they're written provides a hard constraint that no amount of prompt engineering can override. The determinism is the point. Some behaviors must happen regardless of what the model decides.
This creates a design tension. Skills are flexible but unreliable; they fire when context seems right, which means they sometimes fire wrong. Hooks are reliable but inflexible; they fire on specific events, which means they can't adapt to nuance. The user who wants "usually do X, but be smart about when" has to choose between a skill that's sometimes wrong and a hook that can't be smart. There's no primitive that occupies the middle ground: probabilistically reliable, flexibly deterministic.
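To make the tension concrete, here is a minimal Python sketch (hypothetical names, not Claude Code's actual internals) of the two philosophies: a skill gated by the model's judgment of relevance, and a hook bound unconditionally to an event.

```python
from typing import Callable

class Skill:
    """Probabilistic: the model decides whether the skill is relevant."""
    def __init__(self, name: str, instructions: str):
        self.name = name
        self.instructions = instructions

    def maybe_activate(self, context: str,
                       relevance_judge: Callable[[str, str], float]) -> bool:
        # relevance_judge stands in for the model's own judgment;
        # it can misfire in both directions.
        return relevance_judge(self.instructions, context) > 0.5

class Hook:
    """Deterministic: bound to an event, fires every time, no judgment involved."""
    def __init__(self, event: str, action: Callable[[dict], None]):
        self.event = event
        self.action = action

    def fire(self, event: str, payload: dict) -> None:
        if event == self.event:
            self.action(payload)  # guaranteed to run; the model cannot skip it
```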
Agents work in parallel. They can invoke skills and commands, orchestrating complex tasks by delegating to specialized workers. The assumption: sophisticated work requires coordination across multiple capabilities.
Agents are where composition becomes visible. An agent can spawn sub-agents, invoke skills, trigger commands, and coordinate the results. The power is real: a well-designed agent system can handle tasks that would overwhelm a single model context. The complexity is also real: debugging an agent that spawned three sub-agents, each of which invoked different skills, requires understanding four layers of context that the user didn't directly create.
MCP servers handle external integration. They connect Claude to databases, APIs, third-party tools through a standard protocol. The assumption: AI tools need to reach outside themselves, and that reaching should follow a common interface.
MCP represents an interesting design choice: standardize the integration layer rather than the capability layer. Any external tool that implements the protocol becomes available to any Claude surface. The standard is the primitive; specific integrations are instances. This is the opposite of the skill approach, where each capability is a custom artifact. MCP trades flexibility for interoperability.
Plugins package everything else. A plugin bundles skills, commands, hooks, agents, and MCP configurations into something a team can share. The assumption: configurations worth building are worth distributing.
Plugins are the meta-primitive, the packaging layer. They solve the distribution problem that plagues every other primitive: how does something one person built become available to everyone? Without plugins, every team copies files, adapts configurations, maintains their own forks. With plugins, best practices propagate through installation rather than reimplementation.
But plugins also reveal the cost of taxonomic complexity. A plugin bundles skills, commands, hooks, agents, and MCP configurations together. The distribution problem is solved, but the comprehension problem remains. What does this plugin actually do? The answer requires understanding six primitive types and how they relate.
Here's what makes this architecture confusing: skills, commands, and agents are structurally identical. They're all markdown files containing instructions. The difference isn't format; it's purpose and activation mode. When should this fire? Who initiates it? Can it invoke other things?
The structural similarity creates constant confusion. Users ask: "Should this be a skill or a command?" The answer depends on activation preference, not on what the thing does. Users ask: "Is this an agent or a skill with sub-agents?" The answer depends on whether you want parallel execution, not on the task's nature. The primitives are distinguished by when and how they run, not by what they are. This is probably correct from an engineering perspective and confusing from a user perspective.
The surfaces complicate matters further. Claude Code runs as a CLI. Claude Desktop runs as an app. The web interface and API offer different capabilities again. A skill that works in one surface may not port to another. Hooks that fire in the CLI may not exist in Desktop. MCP servers available in one context may be absent in another. The primitives aren't truly portable across contexts; they're bound to their environment. A user who develops sophisticated workflows in Claude Code discovers that moving to Desktop means rebuilding, not porting.
One axis cuts through the confusion: active vs passive. Hooks are passive; they react to events. Commands are active; users invoke them explicitly. Skills can be either—some auto-activate based on context, others wait to be called by name. This distinction isn't prominent in the documentation, but it shapes everything about how users experience the system. The passive primitives require trust: trust that they'll fire when needed, trust that they won't fire when not needed. The active primitives require memory: remember to invoke them, remember what they're called, remember which one applies to this situation. Different cognitive loads for different activation modes.
Thin and Thick⚓︎
In 1973, anthropologist Clifford Geertz borrowed a distinction from philosopher Gilbert Ryle that clarifies something important about primitives.
The example: a boy's eyelid moves. Thin description records the physical event. Eyelid contracts. Thick description asks what it means. Is the movement an involuntary twitch? A conspiratorial wink at a friend? A parody of someone else's wink, mocking their habit of winking? The same physical action carries entirely different significance depending on the layers of context and intention you can read into it.
Thin description is portable but context-free. You can transfer "eyelid contracts" anywhere; it means the same thing (which is to say, almost nothing). Thick description is rich with meaning but requires knowing the situation, the players, the history. The thickness isn't decoration. It's what makes the description useful for understanding what actually happened.
Geertz was writing about how anthropologists should study culture. But the distinction applies directly to AI primitives.
A raw prompt is thin description. "Summarize this document" travels anywhere, works with any model, carries no accumulated context. It means the same thing every time, which limits how much it can mean.
A skill with resources, examples, learned patterns, and embedded context is thick description. It knows what "summarize" means for you, in your domain, given your preferences. The thickness makes it powerful. It also makes it harder to transfer, harder to share, harder to explain to someone who doesn't share the context.
Every AI tool designer takes a position on this spectrum, often without realizing it. The design question isn't where to position—it's when. When does thickness arrive?
The premature thickness trap: Some tools demand thick configuration upfront. Before you can use the system, you must specify your preferences, define your workflows, architect your structure. Adoption stalls because the cost of entry is too high. Users who don't yet know what they need can't articulate what they need.
The permanent thinness trap: Other tools stay thin forever. ChatGPT's custom instructions are just text. They don't learn, don't accumulate, don't grow more sophisticated through use. The tool never gets better at being your tool. Every session starts from the same baseline.
The interesting design space is between these failure modes. Primitives that start thin and thicken through use.
The Practice Effect⚓︎
David Brin's 1984 science fiction novel The Practice Effect imagines a world where thermodynamics runs backward for human-made objects. Use something, and it improves. A crude flint knife, handled daily, gradually becomes razor-sharp steel. Leave it alone, and it reverts to crudeness. Objects don't wear out through use—they wear in. They become more themselves, more fitted to their purpose, through the accumulated practice of being used.
It's a thought experiment about entropy reversal. But it's also a design principle hiding in plain sight.
What if AI primitives worked this way? Start thin: a text file, a simple prompt, an auto-detected pattern. Easy entry. Low commitment. The equivalent of Brin's rough flint. Then thicken through use. Context accretes. The system notices patterns in how you invoke the primitive, what works, what fails. The blade sharpens. If the primitive gets abandoned, it gracefully thins again, decays in relevance, doesn't become legacy cruft you're afraid to delete.
This reframes the design challenge. Instead of asking "what categories of primitive do we need?" ask "how does a primitive grow from thin to thick through practice?" Instead of demanding users architect upfront, let structure emerge from use.
Two mechanisms for thickening: active (the user explicitly configures, adds examples, specifies preferences) and passive (the system observes successful interactions and extracts patterns). Most tools offer only active thickening. The user must do the work of making things thick. Passive thickening—the system learning from use—is harder to build but closer to Brin's vision. The flint doesn't need you to explain how to become a knife. It learns from being used as one.
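A minimal sketch of the two paths, with illustrative names only: active thickening, where the user records a note directly, and passive thickening, where an extraction step (standing in for the model) mines observed interactions for recurring patterns.

```python
from typing import Callable

class ThinPrimitive:
    """Starts as almost nothing and accumulates context through use."""
    def __init__(self, name: str):
        self.name = name
        self.notes: list[str] = []   # empty at first: the rough flint

    def thicken_actively(self, note: str) -> None:
        # The user does the work: notice a pattern, articulate it, record it.
        self.notes.append(note)

    def thicken_passively(self, interactions: list[str],
                          extract: Callable[[list[str]], list[str]]) -> None:
        # The system does the work: extract() stands in for a model that
        # mines successful interactions for recurring patterns.
        for pattern in extract(interactions):
            if pattern not in self.notes:
                self.notes.append(pattern)
```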
Lines of Flight⚓︎
Every taxonomy assumes a path. Stage 1 → Stage 2 → Stage 3. Casual → Power User → Packager. The arrows imply progression is linear and the destination is singular.
Gilles Deleuze (writing with Félix Guattari) offers a counter-concept: lines of flight. Within any structured system, there are vectors of escape—paths that lead somewhere the taxonomy didn't predict. These aren't failures of the system. They're its creative potential.
The user who stays at Stage 1 forever but becomes extraordinarily deep within that simplicity. The "skill" that evolves through use into something its designers never anticipated, repurposed for a task no one imagined. The primitive that gets hijacked, bent, made to do something orthogonal to its intended function.
Deleuze also gave us difference and repetition: the insight that repetition is never identical. Every time you invoke a skill, it's similar to the last invocation but never the same. Different context, different inputs, different user state. Through that repetition-with-difference, new patterns emerge that weren't designed, only enabled.
This matters for primitive design because it warns against over-specifying the path. If your taxonomy assumes linear progression, you'll design for that assumption and miss the users doing something more interesting. The primitives that enable lines of flight are the ones loose enough to be repurposed, thin enough to be thickened in unexpected directions.
A primitive that only does what it was designed to do is a primitive that will eventually be abandoned when users' needs evolve. A primitive that can become something else—that's the one that survives.
The Composition Question⚓︎
When software engineers hear "composition," they reach for familiar patterns. Functions composing into pipelines. Objects delegating to other objects. Microservices calling microservices. Decades of abstraction design. Is primitive composition just this?
The difference matters.
Code-level composition is about implementation. How developers structure systems internally. The user never sees it; they experience the resulting behavior.
Primitive-level composition is about user-facing abstractions. How end users—who may not be developers—combine capabilities. When a Claude Code agent invokes a skill which triggers a hook, the user is composing. But they're composing concepts, not code. The abstractions must make sense to someone who's never thought about dependency injection or interface segregation.
This is where most AI tools fail the composition test. They offer primitives that engineers compose beautifully but that normal users experience as a wall of configuration. The composition is real, but it's hidden behind complexity that only developers navigate fluently.
What legible composition looks like:
First, clear activation modes. When does this primitive fire? The user should be able to answer without reading the documentation. If the answer requires understanding three other concepts, you've already lost most users.
Second, intuitive relationships. Skills and commands are peers, not parent-child. When users compose them, the hierarchy should follow how they think about capability and specificity, not how engineers think about code organization. When the mental model matches the implementation model, composition becomes learnable.
Third, visible chains. When primitives invoke other primitives, users can see what called what. The composition isn't hidden in logs they'll never check. If something fails three layers deep, they can trace the path.
Finally, self-explaining failures. When composition breaks, the error says "the skill tried to invoke X but couldn't because Y," not a stack trace that requires programming knowledge to parse. The user who encounters an error should understand it in terms of the concepts they used, not the implementation details they never learned.
The OOP parallel isn't wrong. But primitive composition is OOP for the user interface of AI itself. The user is the developer, and they're developing with concepts, not code.
Second-Mover Lessons⚓︎
If you're building a new AI-native system now, you have the advantage of watching the first movers. What does the archaeology reveal?
What to learn:
Composition enables power. Claude Code's primitives can invoke each other—agents calling skills calling commands. This composability is more powerful than isolated capabilities. Design for combination, not just creation.
Activation modes matter more than categories. The passive/active distinction—does this fire automatically or wait for invocation?—determines adoption patterns more than what you call the primitive. Users learn activation modes faster than they learn taxonomies.
Packaging solves distribution. Plugins as a meta-primitive for sharing configurations. Without a packaging story, every team reinvents what other teams already built. With it, best practices propagate.
Surfaces constrain primitives. A primitive that only works in one environment (CLI but not desktop, desktop but not API) creates fragmentation. Design for the least capable surface you must support, or accept that your primitives won't be portable.
What to avoid:
Taxonomic explosion. Every new category adds cognitive overhead. Six primitives may already be too many for most users. Before adding a seventh, ask whether an existing primitive could grow to cover the use case.
Structural similarity with semantic difference. If skills, commands, and agents are all markdown files, users will conflate them. Either differentiate by structure (commands are YAML, skills are markdown) or accept that users will remain confused about which is which.
Thickness-upfront requirements. Don't make users architect before they've experimented. The user who must specify everything before trying anything is a user who often doesn't try at all. Let structure emerge through use.
Ignoring the plateau. The progression from casual user to power user to packager isn't universal. Many users plateau—happily—at Stage 2. They use saved prompts, have custom instructions, and never want more. Design for them too. A tool that only rewards the power path alienates the majority.
Designing for Emergence⚓︎
Consider an alternative approach: one primitive that does the work of many.
A playbook that starts as almost nothing. Auto-detected from context, or a blank slate the user names. No upfront configuration required. The user begins interacting; the system begins learning. No ceremony, no architecture, no decisions that feel like commitments.
The playbook accumulates through use. Each successful interaction can contribute a bullet—an atomic piece of knowledge about what works. "When the user asks about X, they usually want Y." "This API returns dates in format Z." "Avoid suggesting W; the user rejected it three times."
The bullets score themselves through feedback. Helpful bullets get used more, ranked higher, surfaced more often. Harmful bullets decay. The playbook doesn't just grow; it learns what growth is valuable.
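A sketch of what a bullet and its feedback loop might look like as data structures. The field names and scoring constants are illustrative assumptions, not a spec.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Bullet:
    content: str                      # e.g. "This API returns dates in format Z."
    score: float = 0.0                # helpful bullets rise, harmful ones sink
    uses: int = 0
    created: datetime = field(default_factory=datetime.now)
    last_used: datetime | None = None

    def record_feedback(self, helpful: bool) -> None:
        self.uses += 1
        self.last_used = datetime.now()
        self.score += 1.0 if helpful else -2.0   # penalize harm more than reward help

@dataclass
class Playbook:
    name: str
    bullets: list[Bullet] = field(default_factory=list)

    def surface(self, k: int = 5) -> list[Bullet]:
        """Rank by score so the helpful bullets are the ones the model sees."""
        return sorted(self.bullets, key=lambda b: b.score, reverse=True)[:k]
```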
This sounds simple. The design questions hiding underneath are not.
Active and Passive Thickening⚓︎
How does thickness arrive? Two paths, and the choice between them shapes everything.
Active thickening puts the user in control. You add a bullet explicitly: "When I ask for a summary, I want three paragraphs maximum." You edit existing bullets: "Actually, make that five paragraphs for technical documents." You delete bullets that no longer serve: "I've changed my mind about the formatting." The playbook thickens because you made it thicken. The thickness reflects your intentions, articulated and recorded.
Active thickening has the virtue of transparency. You know what's in the playbook because you put it there. You can inspect, modify, explain. The thickness is yours in a way that feels like ownership. It also has the vice of demanding labor. You must notice patterns worth capturing, articulate them clearly, remember to record them. Most users don't. The premature thickness trap reappears in disguise: instead of configuring upfront, you're expected to configure continuously, and the users who won't do either end up with playbooks that never thicken at all.
Passive thickening puts the system in control. The model observes your interactions, notices patterns, extracts what seems to work. You didn't tell it that you prefer three-paragraph summaries; it noticed you asked for rewrites every time a summary ran longer. It adds the bullet without being asked. The playbook thickens through observation rather than instruction.
Passive thickening has the virtue of actually happening. Users don't need to remember to configure; configuration emerges from use. The Practice Effect made real: the knife sharpens itself through cutting. It also has the vice of opacity. What did the system learn? Why did it learn that? Users who inspect their playbooks find bullets they didn't author, patterns they didn't consciously create, knowledge that came from somewhere but not from intentional input. The thickness is real but feels uncanny, like finding notes in your own handwriting that you don't remember writing.
The design question isn't which mechanism to choose. Both have failure modes. The question is how they interact.
Consider a playbook that thickens passively but surfaces its learning actively. The system extracts a pattern; before adding it as a bullet, it asks: "I noticed you often want X. Should I remember this?" The user confirms or declines. Passive observation, active confirmation. The thickness emerges from behavior but requires consent before becoming permanent.
Or consider the inverse: active input with passive refinement. The user adds a bullet: "Prefer concise responses." The system observes that "concise" means something different for code review than for email drafts. It splits the bullet, refines the pattern, creates thickness the user initiated but didn't fully specify. Active input, passive elaboration.
These hybrids matter because pure active thickening doesn't happen and pure passive thickening feels invasive. The playbook that works is one where the labor is distributed: the system does what it can observe, the user does what requires intention, and the interface between them is clear enough that both know who's responsible for what.
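As one concrete shape for the first hybrid, a sketch in which nothing the system observes becomes permanent without consent. `ask_user` is a stand-in for whatever confirmation surface the tool actually provides.

```python
from typing import Callable

def propose_bullet(pattern: str,
                   ask_user: Callable[[str], bool],
                   playbook: list[str]) -> bool:
    """Passive observation, active confirmation."""
    prompt = f'I noticed a pattern: "{pattern}". Should I remember this?'
    if ask_user(prompt):
        playbook.append(pattern)      # consented knowledge becomes permanent
        return True
    return False                      # declined patterns are dropped, not stored

# Example: a confirmation stub that approves everything (a real tool would ask).
accepted = propose_bullet(
    "Summaries should be three paragraphs or fewer",
    ask_user=lambda prompt: True,
    playbook=[],
)
```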
The Curator Problem⚓︎
Not all thickness is valuable. A playbook that remembers everything becomes useless, buried under the weight of accumulated context that's no longer relevant, was never important, or contradicts itself because the user's preferences changed.
The curator problem: how does the playbook decide what to keep?
The simplest approach is feedback scoring. Bullets that get used and don't get overridden accumulate positive signal. Bullets that get used and immediately corrected accumulate negative signal. Over time, the helpful rises and the harmful sinks. Natural selection for knowledge.
But feedback is sparse. Most bullets, most of the time, simply exist without being tested. The pattern about date formats rarely surfaces because dates rarely come up. The preference about paragraph length activates occasionally, gets no explicit feedback, persists indefinitely. Absence of negative signal isn't presence of positive signal; it's absence of signal entirely.
A stricter curator rejects most potential bullets at the point of entry. Not "learn everything, let feedback sort it out" but "learn almost nothing, demand evidence before admission." The system notices a pattern three times before even proposing it. The user confirms before it becomes permanent. The permanence is provisional, requiring reconfirmation after disuse. Every bullet must justify its presence repeatedly, and bullets that can't are expelled.
This strictness has costs. Genuine insights that appear once and never recur get lost. Edge cases that matter deeply but rarely get forgotten. The curator optimizes for the common case and fails the uncommon one. But the alternative—a playbook that hoards everything—fails differently, becoming so thick that nothing can be found and the thickness becomes noise rather than signal.
The curator problem has no clean solution. It requires tuning: how strict, for which domains, with what decay rate, with which overrides allowed. The parameters aren't universal; they vary by user, by domain, by how the playbook is used. A playbook for writing needs different curation than a playbook for code review. The curator that works is one calibrated to its context, and calibration takes iteration.
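One way to make that tuning explicit is to treat the curator's strictness as configuration rather than code. The parameters and defaults below are placeholders to show the shape of the calibration surface, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class CuratorPolicy:
    min_observations: int = 3          # pattern must recur this often before proposal
    require_confirmation: bool = True  # active consent before a bullet becomes permanent
    reconfirm_after_days: int = 180    # provisional permanence: unused bullets re-earn their place
    decay_half_life_days: float = 90.0
    max_bullets: int = 200             # a hard cap against hoarding

# Different domains get different calibrations.
WRITING = CuratorPolicy(min_observations=2, decay_half_life_days=180.0)
CODE_REVIEW = CuratorPolicy(min_observations=5, reconfirm_after_days=60)
```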
Decay⚓︎
Thickness that persists beyond its usefulness becomes cruft. The bullet about how you wanted dates formatted in 2024 may not reflect how you want them in 2026. The pattern extracted from your first month of use may not apply after your role changed and your work shifted. Knowledge has a shelf life, and playbooks that ignore this become museums of obsolete preferences.
Decay mechanisms matter.
The simplest is time-based: bullets fade unless reconfirmed. A bullet unused for six months loses weight, surfacing less often, eventually becoming dormant. The playbook thins automatically in areas where it isn't being tested. The knowledge isn't deleted (it might be needed again), but it attenuates, stepping back rather than stepping forward, allowing newer patterns to dominate.
Time-based decay has the virtue of simplicity and the vice of indiscrimination. Some knowledge is timeless; some is seasonal; some is context-dependent in ways that time doesn't capture. The bullet about quarterly reporting formats should surface every three months, not decay between quarters. The bullet about the old API version should decay the moment the new version deploys. Time alone can't distinguish these cases.
Usage-based decay is smarter but more complex. Bullets that surface and succeed get reinforced. Bullets that surface and fail get penalized. Bullets that never surface attenuate slowly, but bullets that surface and produce neutral outcomes (no clear success, no clear failure) present ambiguous signal. Did the bullet help? Did it not matter? Is neutral good or bad? The scoring becomes intricate, and intricate scoring is hard to explain to users who want to understand why their playbook behaves the way it does.
Context-based decay is smarter still and harder to implement. The system knows that your role changed, knows that the project ended, knows that the API was deprecated. It can decay bullets that reference contexts no longer relevant. But this requires understanding context at a level that current systems struggle with. Is this bullet about "the Smith project" or about project management generally? Should deprecating the Smith project decay the bullet or leave it intact? Context parsing is hard; automated context decay is harder.
The honest answer is that decay mechanisms are unsolved. The systems that exist use crude proxies (time, usage frequency, explicit deletion) because sophisticated decay requires understanding that nobody has built yet. The playbook that works in 2025 is one that decays imperfectly rather than not at all: imperfect decay beats no decay, and a playbook that forgets some things badly is better than one that forgets nothing and drowns.
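A sketch of those crude proxies combined into a single relevance weight: a usage-driven score attenuated by time since last use. The half-life and the formula are assumptions for illustration, not a tested design.

```python
from datetime import datetime, timedelta

def bullet_weight(score: float, last_used: datetime,
                  now: datetime | None = None,
                  half_life_days: float = 90.0) -> float:
    """Unused bullets attenuate; they are not deleted, just surfaced less."""
    now = now or datetime.now()
    idle_days = max((now - last_used).days, 0)
    time_factor = 0.5 ** (idle_days / half_life_days)   # time-based decay
    return max(score, 0.0) * time_factor                # usage signal, attenuated

# Example: a once-useful bullet untouched for a year carries little weight.
w = bullet_weight(score=4.0, last_used=datetime.now() - timedelta(days=365))
```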
Cascades: The Distribution Problem⚓︎
Knowledge that works for one person might work for others. A playbook that understands how your company formats documents might benefit your whole team. A playbook that understands your industry's compliance requirements might benefit your whole organization. The value compounds when knowledge propagates.
But propagation is perilous.
Upward cascade: personal → team → organization. Your playbook develops a useful bullet. It works for you. Should it propagate to your team? To your department? To the entire company?
The naive approach is automatic propagation: if a bullet works for enough individuals, it becomes shared. But "works for you" doesn't mean "works for everyone." The preferences you've accumulated reflect your role, your style, your idiosyncrasies. Propagating them imposes your preferences on others who may have different roles, different styles, different idiosyncrasies. The bullet that improves your work might degrade theirs.
A mediated approach interposes review. Personal bullets that might have team value get proposed to team administrators, who evaluate and either adopt or decline. Team bullets that might have organizational value get proposed upward similarly. Each transition requires a human judgment that the pattern generalizes.
Mediation has costs. It creates bottlenecks: the administrator who must review every proposed propagation. It creates politics: whose bullets get adopted, whose get declined, what criteria govern the decision. It creates lag: valuable knowledge sits in individual playbooks while the propagation queue backs up. The friction that prevents bad propagation also prevents good propagation.
Downward cascade: organization → team → personal. The company establishes institutional knowledge that should inform everyone's playbook. Compliance requirements. Brand guidelines. Security protocols. This knowledge shouldn't emerge from individual use; it should be imposed from above.
Downward cascade seems simpler but has its own problems. Imposed bullets conflict with learned bullets. The organization says "always include the legal disclaimer"; your personal playbook learned you never want legal disclaimers because you're in engineering, not sales. Which wins? Can you override institutional bullets? If so, what's the point of imposing them? If not, the playbook stops being yours.
The design challenge is boundary management. Which knowledge is personal (never propagates, fully customizable), which is team (propagates within team, overridable with justification), which is institutional (propagates everywhere, not overridable)? The boundaries aren't obvious, change over time, and vary by domain.
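A sketch of one possible boundary policy, using the three scopes from the paragraph above and their override rules. This is one policy among many, not the policy.

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    PERSONAL = "personal"            # never propagates, fully customizable
    TEAM = "team"                    # propagates within the team, overridable with justification
    INSTITUTIONAL = "institutional"  # propagates everywhere, not overridable

@dataclass
class ScopedBullet:
    content: str
    scope: Scope

def can_override(bullet: ScopedBullet, justification: str | None = None) -> bool:
    if bullet.scope is Scope.PERSONAL:
        return True
    if bullet.scope is Scope.TEAM:
        return justification is not None
    return False   # institutional bullets win; the cost is the playbook feels less "yours"
```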
Lateral cascade: peer to peer. Your colleague's playbook has a useful bullet you lack. Can you borrow it? Should playbooks be sharable, searchable, mixable? If so, the knowledge graph becomes networked rather than hierarchical, and the propagation patterns become unpredictable.
Lateral sharing enables serendipity—discovering that a colleague solved a problem you're facing—but also enables pollution: importing bullets that seemed useful but don't fit your context, accumulating shared cruft that nobody owns and nobody maintains.
The cascade problem is genuinely hard. Most systems avoid it by keeping playbooks purely personal, sacrificing propagation for simplicity. The systems that enable propagation usually pick one direction (upward only, or downward only, or lateral only) and one mechanism (automatic, or mediated, or manual) and accept that other patterns are unsupported. A complete solution would handle all directions with appropriate mechanisms for each, and nobody has built that yet.
The Modes Problem⚓︎
Here's the objection that deserves serious engagement: even if we collapse the taxonomy to one primitive, the underlying behavioral patterns remain. Claude Code didn't invent six categories arbitrarily. Each maps to a genuine need:
- Passive detection (skills): The user wants something to happen automatically when context suggests it. "When I'm writing a blog post, apply my voice guidelines." No invocation required; the system recognizes relevance and acts.
- Active invocation (commands): The user wants to trigger something explicitly. "/deploy" or "/review" — a deliberate action at a chosen moment. The user stays in control of when.
- Deterministic guarantees (hooks): Some behaviors must happen regardless of what the model decides. Validate before writing. Log after executing. The model's judgment is explicitly removed from the loop.
- Parallel delegation (agents): Complex work that benefits from separate execution — spin up an environment, do substantial work, return results to the main context. The user doesn't want to watch; they want to receive.
These patterns aren't arbitrary. They correspond to different relationships between user intention and system behavior. Collapsing them into "playbook" doesn't make the distinctions disappear; it just refuses to name them. The user who wants passive detection still wants passive detection, whether you call it a "skill" or not.
So how does a playbook accommodate patterns that seem to require different primitives?
One answer: modes within the primitive. The playbook is one thing, but bullets within it can have different activation modes. Some bullets are "always listening" — they surface when context matches, without being called. Some bullets are "invocable" — they wait to be triggered by explicit user action. Some bullets are "constraints" — they fire deterministically on specific events, bypassing model judgment entirely. Some bullets can "spin off work" — they launch parallel execution and return results.
The primitive is singular; the modes are plural. You don't learn six categories; you learn one category with a mode selector. "This bullet is passive." "This bullet requires invocation." "This bullet is a constraint." The cognitive load shifts from "which primitive?" to "which mode?" — and modes are easier to understand because they're variations on a theme rather than entirely separate concepts.
A deeper answer: these patterns reflect something about language itself.
The philosopher J.L. Austin distinguished different kinds of speech acts by their "illocutionary force" — the type of action an utterance performs. Statements describe. Questions request information. Commands direct action. Promises commit the speaker. The same words can carry different force depending on how they're used: "The door is open" might be a description, a complaint, a request to close it, or an invitation to leave.
AI interaction has analogous modes:
- Declarative: "When summarizing, prefer three paragraphs." A statement of preference that becomes background context.
- Imperative: "/summarize this document." A direct command requesting immediate action.
- Constitutive: "Always run the linter before committing." A rule that constitutes how the system must behave, not subject to interpretation.
- Delegative: "Research competitors and report back." A handoff that launches separate work.
These aren't implementation details. They're different kinds of relationship between speaker and system, different ways that language can function in use. The linguist wouldn't collapse declarative and imperative sentences into one category just because both use words. The modes are meaningful.
What this suggests for playbook design: the primitive can be singular, but it must accommodate modal variety. A bullet isn't just content; it's content plus mode. The mode determines how the bullet participates in interaction:
| Mode | Activation | Model Judgment | Execution |
|---|---|---|---|
| Declarative | Passive (context-triggered) | Model decides relevance | Inline with conversation |
| Imperative | Active (user-invoked) | Model executes | Inline with conversation |
| Constitutive | Event-triggered | Bypassed (deterministic) | Inline or pre/post |
| Delegative | Active (user-invoked) | Model executes | Separate context, returns results |
The playbook doesn't need four primitive types. It needs one primitive type with a mode property that governs activation and execution. The thickness accumulates in the content; the mode shapes how content participates.
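Rendered as a data structure, the table above becomes a single bullet type with a mode property. The mode names follow the speech-act framing; the trigger field and everything else are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    DECLARATIVE = "declarative"    # passive, context-triggered, model judges relevance
    IMPERATIVE = "imperative"      # active, user-invoked, runs inline
    CONSTITUTIVE = "constitutive"  # event-triggered, deterministic, bypasses model judgment
    DELEGATIVE = "delegative"      # user-invoked, runs in a separate context, returns results

@dataclass
class ModalBullet:
    content: str
    mode: Mode
    trigger: str | None = None     # slash-command name or event name, depending on mode

playbook = [
    ModalBullet("When summarizing, prefer three paragraphs.", Mode.DECLARATIVE),
    ModalBullet("Summarize this document.", Mode.IMPERATIVE, trigger="/summarize"),
    ModalBullet("Always run the linter before committing.", Mode.CONSTITUTIVE, trigger="pre-commit"),
    ModalBullet("Research competitors and report back.", Mode.DELEGATIVE, trigger="/research"),
]
```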
What this preserves from Claude Code:
The genuine insight that different behavioral patterns exist. Passive, active, deterministic, parallel — these aren't arbitrary categories but real distinctions about when and how things happen. A playbook that ignores modes would force everything into one behavioral pattern, which isn't simplification — it's impoverishment.
What this changes from Claude Code:
The primitive proliferation. Instead of skills, commands, hooks, agents as separate file types with separate documentation and separate mental models, you have bullets with modes. The user learns one concept (bullet) with one variation axis (mode). The cognitive load is lower because the variation is structured rather than categorical.
What remains hard:
Mode selection. When a user adds a bullet, they must choose its mode — or the system must infer it. "Always validate JSON before saving" sounds constitutive (deterministic, must-happen). "When reviewing code, check for security issues" sounds declarative (passive, model-judged). "Run the full test suite" sounds imperative (active, explicit invocation). Can the system infer mode from phrasing? Probably partially. Will it get it wrong sometimes? Certainly. The mode selector adds complexity that "just a playbook" seemed to avoid.
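As a sense of how far phrasing alone gets you, here is a deliberately naive sketch of keyword-based mode inference. It handles the three examples above and will misclassify plenty else, which is exactly the point.

```python
def infer_mode(text: str) -> str:
    """Guess a bullet's mode from its phrasing alone; heuristics, not understanding."""
    t = text.lower()
    if t.startswith("always ") or t.startswith("never "):
        return "constitutive"        # sounds like a rule that must hold
    if t.startswith("when ") or t.startswith("if "):
        return "declarative"         # sounds like a context-triggered preference
    if " and report back" in t or t.startswith("research "):
        return "delegative"          # sounds like a handoff
    return "imperative"              # default: treat it as an explicit action

assert infer_mode("Always validate JSON before saving") == "constitutive"
assert infer_mode("When reviewing code, check for security issues") == "declarative"
assert infer_mode("Run the full test suite") == "imperative"
```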
The honest assessment: modes don't disappear just because you don't name them. The question is whether modal complexity lives in the taxonomy (multiple primitives) or in the primitive (one primitive, multiple modes). The playbook approach bets that the latter is more learnable. That bet might be right. But it's a bet, not an escape from the underlying complexity.
What This Approach Avoids⚓︎
The taxonomy problem—one primitive, not six. Users don't need to distinguish skills from commands from hooks. They have a playbook with bullets. The modes are properties of bullets, not separate categories to learn.
The premature thickness trap—starts thin, grows through use. Users can begin immediately and watch sophistication emerge.
The structural confusion—one concept, one mental model. The playbook is the playbook.
The fluency gap—the same abstraction serves casual users and power users. The difference is how thick their playbook has become, not which primitives they've mastered.
The activation mode confusion—the playbook is always active, always learning, always available. You don't invoke it; it participates. You don't configure its activation; it's on.
What This Approach Requires⚓︎
Trust in emergence. You won't know what playbooks become until users use them. The design enables; it doesn't prescribe. The first users will create playbook shapes the designers didn't anticipate, will thicken in directions that weren't planned, will discover uses that weren't imagined. This is feature, not bug, but it requires designers who can tolerate not knowing.
Mechanisms for decay. Thickness that outlives its usefulness must thin. Bullets that stop being helpful should fade, not persist forever because no one deleted them. The decay doesn't need to be perfect; it needs to exist.
Paths for scope transition. Personal knowledge that proves valuable should be easy to share. The friction of distribution should be proportional to the risk of sharing. Institutional knowledge that applies to everyone should be imposable without destroying personal customization.
Calibration infrastructure. The curator strictness, the decay rate, the propagation rules—these aren't universal constants. They need tuning per deployment, per domain, per organization. The playbook system that works is one that's tunable, not one that's hardcoded to parameters that worked for the designers.
Transparency mechanisms. Users will ask: "Why did the playbook say that?" The answer can't be "it learned, somehow." There needs to be an audit path, a way to trace behaviors back to bullets, bullets back to sources, sources back to interactions. Opacity kills trust, and trust is the only thing that makes passive thickening acceptable.
Second-Mover Advantages⚓︎
If you're building this now, after watching Claude Code and others develop their primitives, you can learn from their archaeology.
Don't multiply categories. The first movers invented six primitives and are still explaining the differences. Start with one. Let it prove insufficient before adding a second. The cognitive overhead of categories compounds faster than the capability of categories.
Design for the activation spectrum. The hard problem isn't active vs passive; it's the territory between them. Build the hybrid mechanisms early—passive observation with active confirmation, active input with passive refinement—rather than bolting them on later.
Make decay a first-class feature. The first movers built accumulation and hoped decay would happen through user maintenance. It didn't. Users don't maintain. Build decay into the system, make it visible, make it tunable. The playbook that forgets gracefully beats the one that remembers pathologically.
Solve distribution early. The first movers built personal primitives and then struggled to make them sharable. The packaging layer (plugins) came late, after users had already developed habits around manual copying. Build the cascade paths from the start, even if they're simple, even if they're one-directional. Retrofitting distribution is harder than building it in.
Instrument everything. You don't know what patterns matter until users show you. The first movers built in the dark, unable to see which primitives were used, which were confused, which were abandoned. Instrument the playbook—what's accessed, what's modified, what's proposed and declined, what's decayed and retrieved. The data tells you what to build next.
Accept incompleteness. No first version will handle all the cases. The cascade problem has no clean solution. The decay mechanisms are approximations. The curator will be too strict for some users and too loose for others. Ship the incomplete thing. Learn from its failures. Iterate. The playbook that works is the one that improves through its own use, applying the Practice Effect to itself.
The Feel for the Game⚓︎
Pierre Bourdieu spent decades studying how people learn to navigate social worlds without consciously thinking about the rules. His term for this was habitus—the embodied sense of "how things work here" that becomes second nature. A native speaker doesn't parse grammar before talking; the grammar is in their bones. They don't know the rules explicitly. They are the rules, enacted through practice.
Habitus isn't knowledge you have. It's knowledge you've become.
The LinkedIn stages assume "workspace" is the destination—Stage 5, where everything is organized, managed, under control. But Bourdieu's framework suggests something different. The goal isn't more structure. It's structure that becomes invisible. Fluency isn't knowing more rules. It's no longer needing to think about them.
Apply this to AI primitives. The best primitives disappear into practice. You stop thinking about whether this is a skill or a command or a hook. You stop wondering which category your workflow belongs to. You just do the thing, and the thing works, and the primitive that enabled it fades into the background of your practice.
This reframes the design question. Not "what's the right taxonomy of primitives?" but "what primitives become transparent through use?" Not "how do we organize user progression?" but "how do we make progression feel like growing fluency rather than learning bureaucracy?"
Brin's objects in The Practice Effect improve until they're maximally suited to their use. The crude knife becomes a perfect knife—not a general-purpose knife, but your knife, fitted to your hand, sharpened for your tasks. AI primitives should work the same way. They thicken toward fitness, not toward complexity.
The destination isn't a workspace full of well-organized primitives. It's a practice so fluent you forget the primitives exist.
Sources⚓︎
- Austin, J.L. How to Do Things with Words (Harvard University Press, 1962)
- Bourdieu, Pierre. Outline of a Theory of Practice (Cambridge University Press, 1977)
- Brin, David. The Practice Effect (Bantam Books, 1984)
- Deleuze, Gilles and Guattari, Félix. A Thousand Plateaus (University of Minnesota Press, 1987)
- Geertz, Clifford. "Thick Description: Toward an Interpretive Theory of Culture." The Interpretation of Cultures (Basic Books, 1973)
- Claude Code Documentation