The skill is the prototype

Something changed in how automations get built, and it happened quietly enough that a lot of teams haven't named it yet.

It used to go like this: a stakeholder describes a process, a PM writes it into a spec, an engineer translates the spec into code, and then we learn at the end that what the stakeholder needs is something entirely different. As a PM, I can't tell you how many times the old process broke down because of indecision, miscommunication, or just shifting business realities. On the dev side, you wouldn't necessarily even know, because keeping requirements stable was the PM's job.

Things have certainly changed. Now you connect the data with MCP, write the process down in plain language as a skill, and run it. When it does the wrong thing, you edit a paragraph and run it again, or better yet, ask the agent to reflect on what went wrong and make the edits itself. The person who owns the process iterates on it directly, either in the language they already think in or through their agent, which can take unstructured feedback and turn it into ordered steps. The messy middle of building an automation, the part that used to take three roles and two weeks, happens in a text file over three to five passes.

This works for a precise reason: AI is excellent when the inputs and outputs are known. If you can check the result - an eval, a test, a type signature, a human reading the output - it doesn't matter that the middle is fuzzy. You iterate until the checks pass. Skills are exactly that shape: bounded verification wrapped around unbounded language. That's why iterating in text is faster than iterating in code, and why (in addition to the shift in who builds software in orgs) skills have earned a place in the product process rather than just the ops toolbox.
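The shape described above, a fuzzy middle bounded by a check at the edge, can be sketched in a few lines of Python. Everything here is illustrative: `iterate_until_checked`, `fake_model`, and `is_valid_json_object` are hypothetical names, and `fake_model` is a stand-in for a real LLM call, not any vendor's API.

```python
import json
from typing import Callable


def is_valid_json_object(text: str) -> bool:
    """Deterministic check at the edge: output must parse as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): it only complies after feedback."""
    if "failed the check" in prompt:
        return '{"status": "ok"}'
    return "Sure, here you go!"


def iterate_until_checked(
    generate: Callable[[str], str],
    check: Callable[[str], bool],
    prompt: str,
    max_passes: int = 5,
) -> tuple[str, int]:
    """Re-run the fuzzy step until the check passes; return output and pass count."""
    for passes in range(1, max_passes + 1):
        output = generate(prompt)
        if check(output):
            return output, passes
        # Feed the failure back in, much like editing a paragraph of the skill.
        prompt = f"{prompt}\nPrevious output failed the check: {output!r}"
    raise RuntimeError(f"no output passed the check within {max_passes} passes")


result, passes = iterate_until_checked(
    fake_model, is_valid_json_object, "Summarize the ticket as a JSON object."
)
print(passes, result)
```

The check is the only part that has to be exact. The generator can be as loose as it likes, because nothing leaves the loop until the check agrees.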

But it would be a mistake to think that a working skill is where you stop.

A skill that works is not the finish line. It's the instructions: the spec, the PRD, or whatever you want to call it. One of the engineers behind Claude's skills feature put it plainly: procedural steps in a skill body get reinvented by the model on every run, with slight variations, at full token cost. That's fine while you're still iterating. But once the skill is delivering, it's time to codify it and use inference only where intelligence is truly needed.

A stabilized skill is a prototype, and prototypes graduate. Keep the LLM at the points where input is genuinely unbounded — parsing, classifying, judging, drafting — and compile everything else into deterministic code: control flow, retries, batching, the side-effecting calls. Deploy that on a durable execution engine. Then close the loop: expose the deployed workflow as a new MCP tool, so your agent stops re-running the fuzzy skill and starts calling the reliable tool — and the next skill you write can build on top of it. Capability compounds instead of getting re-derived by the model each time.
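As a rough sketch of what that split can look like, here is a small outreach-style pipeline in plain Python. All names (`Lead`, `run_outreach`, `stub_classify`, `flaky_send`) are hypothetical, and the classifier is a stub standing in for the one remaining LLM call. Batching and retries are hand-rolled only to keep the example self-contained; on a durable execution engine they would typically come from the engine itself.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Lead:
    email: str
    note: str


def batched(items: list[Lead], size: int) -> Iterator[list[Lead]]:
    """Deterministic batching: fixed-size slices, identical on every run."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_retries(fn: Callable[[str], None], arg: str,
                 attempts: int = 3, delay_s: float = 0.0) -> None:
    """Deterministic retry policy around a side-effecting call."""
    for attempt in range(1, attempts + 1):
        try:
            fn(arg)
            return
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay_s)


def run_outreach(leads: list[Lead], classify: Callable[[str], str],
                 send: Callable[[str], None], batch_size: int = 2) -> list[str]:
    """Control flow lives in code; only `classify` needs inference."""
    sent: list[str] = []
    for batch in batched(leads, batch_size):
        for lead in batch:
            # The single judgment call: reading intent from free text.
            if classify(lead.note) != "interested":
                continue
            with_retries(send, lead.email)
            sent.append(lead.email)
    return sent


def stub_classify(note: str) -> str:
    """Stand-in for an LLM classifier (assumption, not a real model call)."""
    return "interested" if "demo" in note.lower() else "not_interested"


attempts_seen: dict[str, int] = {}


def flaky_send(email: str) -> None:
    """Stand-in side effect that fails once per address, then succeeds."""
    attempts_seen[email] = attempts_seen.get(email, 0) + 1
    if attempts_seen[email] == 1:
        raise ConnectionError("transient failure")


leads = [
    Lead("a@example.com", "Please book a demo"),
    Lead("b@example.com", "Unsubscribe me"),
    Lead("c@example.com", "Demo next week?"),
]
sent = run_outreach(leads, stub_classify, flaky_send)
print(sent)
```

Swapping `stub_classify` for a real model call changes one argument. The loop, the batching, and the retry policy stay byte-for-byte the same on every run, which is the point of graduating them.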

The economics are no longer hypothetical. A recent benchmark study of compiling LLM workflows into deterministic code ("Compiled AI," Trooskens et al., 2026) measured 57× fewer tokens at a thousand transactions, 100% reproducibility where direct inference managed 95%, and cost breakeven after about seventeen runs. Seventeen. If a skill will run more than twenty times, you're overpaying by keeping it fuzzy.
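The breakeven logic itself is simple arithmetic: compiling pays off once the one-time compile cost is covered by the per-run saving, that is, at the first run count n where n >= compile cost / (agent cost per run - compiled cost per run). The sketch below uses placeholder figures chosen only to make the arithmetic visible; they are assumptions, not numbers from the study, and `breakeven_runs` is a hypothetical helper.

```python
import math


def breakeven_runs(compile_cost: float, agent_cost_per_run: float,
                   compiled_cost_per_run: float) -> int:
    """Smallest run count at which compiling costs no more than re-inferring."""
    saving_per_run = agent_cost_per_run - compiled_cost_per_run
    if saving_per_run <= 0:
        raise ValueError("compiled runs must be cheaper than agent runs")
    return math.ceil(compile_cost / saving_per_run)


# Placeholder figures in cents (assumptions, NOT taken from the study).
COMPILE_COST = 150          # one-time cost of compiling the workflow
AGENT_COST_PER_RUN = 8      # full inference on every run
COMPILED_COST_PER_RUN = 2   # residual inference at the judgment points

runs = breakeven_runs(COMPILE_COST, AGENT_COST_PER_RUN, COMPILED_COST_PER_RUN)
print(runs)
```

Plug in your own measured costs. The useful observation is that the breakeven point depends only on the ratio of the compile cost to the per-run saving, so a workflow that runs daily clears it quickly even when compilation is not cheap.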

Uber blew its entire annual AI budget in four months and their answer was a $1,500/month token cap per engineer. That's rationing, not engineering. When your agents burn the budget re-deriving the same procedures every run, the fix isn't capping the loop, it's compiling it. Rationing treats tokens as the scarce resource; graduation makes most of them unnecessary.

This is what I built rote to do. Point it at an Anthropic-style skill (the SKILL.md and its references) and it classifies every step by determinism, extracts the codifiable parts into plain Python, types the judgment calls as LLM-judge signatures, and emits a runnable workflow for Temporal, Cloudflare Workflows, or DBOS. On the bundled outreach example, 79% of the skill's steps compiled to deterministic code; the graduation run itself took thirteen minutes and about seventy cents. Then rote serve exposes every graduated pipeline as an MCP tool, callable from Claude like any other.

To be clear about the other side: exploratory and one-off work should stay an agent loop. Flexibility is the whole point there, and there's nothing proven to compile yet. Graduation is for the skill you've proven out and integrated into your organization.

Skills aren't the opposite of code. They're how code gets specified now. Write the process in text, iterate with checks at the edges, and when it stops changing, graduate it.

Want this working for your business?