Most teams running AI platforms and transformation programs have the same question: Is AI actually changing the business? Licenses are live, pilots have shipped, and the monthly review slides show activity. But the metrics that matter, like revenue, cost per finished task, and the value of AI output, have not moved. The problem is rarely the technology. It is that organizations have measured access to AI, not what people do with it.
Deloitte's State of AI in the Enterprise 2026 makes the gap visible. Worker access to AI tools rose from under 40% to roughly 60% in a single year. In the same report, only a quarter of companies had moved 40% or more of their pilots into production, and about a third reached enterprise-wide deployment. The access is there. The execution discipline and the value creation are not.
I’ve written this piece for operations leaders, functional managers, and senior leaders who already run AI tools in their teams and want to know whether any of it is advancing the business. Based on patterns I have observed across enterprise AI adoption, it introduces two scoring frameworks that close the gap between AI access and real adoption.
The first measures how deeply AI sits in real workflows, not how many people have a login. The second measures whether AI helps complete the same work at a lower cost than before. Both can be scored in about five minutes per function and reveal far more than a seat-count report ever will.
Getting tools into people's hands is the easy part. Most companies discovered this in the first twelve months. The harder part is changing what people do with their working day, and that requires more than a license.
The technology is rarely the binding constraint. The breakdown lives in the passage from experiment to production, and that passage is a problem of decisions, not code.
Score each function on two axes. Treat them separately, because they move at different speeds, and the traps are different for each.
Group the functions you score into two families:
BUILD: Product, Engineering, and Data Science.
RUN: Sales, Customer Success, Support, Finance, Operations, Legal, and HR.
These thresholds come from patterns I have observed across enterprise AI rollouts and startups, not from a single benchmark study. Customize and calibrate them to your organization if the numbers feel off.
Breadth asks not how many people have a license or a seat, or how many tokens they consume, but how many open the tool and do real work with it each week.
| Score | Stage | What it means | Quick test | Signal |
|---|---|---|---|---|
| 1 | Curious | AI use is limited to a few enthusiasts. | Fewer than 10% used AI for real work last week. | Most people have never tried it for their day-to-day work. |
| 2 | Spreading | Adoption is growing but remains uneven across teams. | 10–40% used AI for real work last week. | Usage depends on individual initiative, not team norms. |
| 3 | Common | AI is part of normal weekly work for most employees. | 40–80% used AI for real work last week. | People notice when the tool is unavailable. |
The simplest way to measure breadth is with a single weekly question: "Did you use an AI tool for real work this week?" You do not need product analytics or token data. A quick poll, or even a show of hands in a team meeting, is often enough to reveal whether AI use is becoming routine.
Breadth tells you how widely AI is used. Depth tells you how much of the work AI actually does.
Depth asks not whether people use AI, but how much of the actual workflow it owns.
| Score | Stage | What it means | Quick test | Signal |
|---|---|---|---|---|
| 1 | Side window | AI lives outside the workflow in a separate app or tab. | Is there a copy-paste step? | Users switch tools to access AI. |
| 2 | Inside the tool | AI is embedded in the software where work happens. | Do users stay in the same tool to use AI? | AI appears as a button, sidebar, or suggestion. |
| 3 | Inside the workflow | AI owns at least one workflow step, with human review before work moves forward. | Remove AI. Does the process still work, just slower? | The workflow has been redesigned around AI. |
The fastest way to score a function is to pick one workflow and walk it step by step. For each step, ask, "Who or what does this today?" If the answer is always a person, you are probably at Depth 1 or 2. If AI owns at least one step in the workflow, you are at Depth 3 or 4.
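The same step-by-step walk can be written down as a small checklist. This is a minimal sketch under stated assumptions: the `Step` record and the `ai_in_separate_app` flag are hypothetical names. Depth 1 versus 2 uses the copy-paste quick test from the table. Depth 3 versus 4 follows the rule in the FAQ: AI owns a step with a human check, versus AI runs the task end to end while people handle only exceptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    owner: str                  # who or what does this step today: "person" or "ai"
    human_review: bool = False  # a person checks the AI output before work moves on


def depth_score(steps: list[Step], ai_in_separate_app: bool) -> int:
    """Score one workflow on the Depth axis by walking its steps.

    Assumes people use AI somewhere in the workflow; a workflow with no AI
    at all is outside the table.
    """
    if not steps:
        raise ValueError("a workflow needs at least one step")
    for step in steps:
        if step.owner not in ("person", "ai"):
            raise ValueError(f"unknown owner {step.owner!r} for step {step.name!r}")

    ai_steps = [s for s in steps if s.owner == "ai"]
    if not ai_steps:
        # Every step is still done by a person, so AI only assists: Depth 1 or 2.
        # Quick test from the table: is there a copy-paste step into a separate app?
        return 1 if ai_in_separate_app else 2
    if len(ai_steps) == len(steps) and not any(s.human_review for s in steps):
        # AI runs the task end to end; people step in only for exceptions.
        return 4
    # AI owns at least one step, but people remain in the loop.
    return 3


if __name__ == "__main__":
    # Hypothetical invoice workflow: AI extracts, people match and approve.
    invoice_flow = [
        Step("extract line items", "ai", human_review=True),
        Step("match to purchase order", "person"),
        Step("approve payment", "person"),
    ]
    print(depth_score(invoice_flow, ai_in_separate_app=False))  # 3
```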
Once you've scored each function, plot the results on a simple Breadth versus Depth matrix. Based on my observations and conversations with C-level leaders, most enterprise organizations in 2026 sit around Breadth 2 and Depth 1.
The score itself is less important than the pattern it reveals. These are the combinations that appear most often.
The right sequence is to start with two or three priority workflows per function that have a clear ROI. Push those to Depth 3 or 4 before expanding AI elsewhere. Do not make seat count the headline metric; track minutes saved and quality per finished task instead. Each quarter, ask one question: which workflow moved up a level on the depth axis, and what did that improvement cost?
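The quarterly question reduces to comparing two snapshots of depth scores. This is a minimal sketch that assumes you keep a dictionary mapping workflow names to depth scores each quarter and track spend per workflow; the function name, workflows, and figures are all hypothetical.

```python
def moved_up(
    previous: dict[str, int],
    current: dict[str, int],
    spend: dict[str, float],
) -> list[tuple[str, int, int, float]]:
    """Workflows whose depth score rose since last quarter, with what the move cost.

    Workflows that are new this quarter (absent from `previous`) are skipped,
    because there is no earlier level to compare against.
    """
    report = []
    for workflow, depth_now in sorted(current.items()):
        depth_before = previous.get(workflow)
        if depth_before is not None and depth_now > depth_before:
            report.append((workflow, depth_before, depth_now, spend.get(workflow, 0.0)))
    return report


if __name__ == "__main__":
    # Hypothetical snapshots and spend for one function.
    q1 = {"invoice matching": 1, "ticket triage": 2, "contract review": 2}
    q2 = {"invoice matching": 3, "ticket triage": 2, "contract review": 3}
    spend = {"invoice matching": 18_000.0, "contract review": 7_500.0}
    for name, before, after, cost in moved_up(q1, q2, spend):
        print(f"{name}: Depth {before} -> {after}, cost {cost:,.0f}")
```

An empty report is itself the answer: nothing moved, whatever the seat count says.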
Here's an example of what a completed Breadth versus Depth assessment could look like.

Each point represents one business function, making it easier to see where AI is widely adopted, where it is deeply embedded, and where the next opportunity lies.
A lesson from the wrong way to do this: I built an agent to watch our website performance and rewrite copy on its own to lift conversion, and pushed it to Depth 4 before the guardrails were ready. It published live changes with hallucinated value propositions. I was shipping problems faster than I had ever shipped fixes.
I pulled it back, added a retrieval layer so it worked from what we actually knew rather than what the model was willing to invent, and rebuilt the review step before letting it near anything live again.
The mistake stayed fast and cheap for one reason: the changes were reversible, as most AI decisions are. Amazon's distinction between one-way and two-way doors is the right frame. Move quickly through the decisions you can walk back. Slow down only at the few you cannot.
The governing rule is one line: compare cost per finished outcome, not cost per token or per seat. Token cost is a single line item, but the comparison that matters is total cost per finished output, measured for the AI process and for the process it replaces.
Speed is the trap hidden inside this math. Push a team to use AI and it gets faster. Because speed is easy to measure and satisfying to report, it becomes a vanity metric. A team can finish in half the time and still miss the outcome it was paid to deliver. This is why outcome-based pricing is gaining ground: several AI-native players already price on results delivered, such as customer tickets solved or prevented, rather than on work performed. Technology and consulting firms will move this way too, because once everyone is faster, customers stop paying a premium for speed.
For each workflow, compare the old process with the AI process using these metrics:
| Metric | What to measure |
|---|---|
| Time per task | Minutes to finish one unit of work, measured for each process. |
| Loaded labor cost per task | The fully loaded staff cost of that time. |
| Tool or license cost per task | Software cost spread across the work it does. |
| Model or API spend per task | AI side only, and usually the smallest number on the page. |
| Human review time per task | Near zero in the old process. On the AI side, often the largest hidden cost and the one most teams forget. |
| Rework rate | The share of outputs that have to be redone. |
| Set-up and integration cost | One-off build cost divided over expected volume. |
| Total cost per finished output | The number that matters. Everything above resolves into this. |
| Quality | Pass rate at first review. |
| Speed | Lead time from start to finished output. |
Use these metrics to compare the AI workflow with the old process on a like-for-like basis.
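The table resolves into one formula per process. The sketch below is one way to combine the rows, with a loudly stated assumption about rework: a rejected output is redone in full and can be rejected again at the same rate, so expected attempts per finished output are 1 / (1 − rework rate). The class, field names, and every number in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessCost:
    """Per-task inputs for one process (old or AI), following the metrics table."""
    minutes_per_task: float          # time to finish one unit of work
    labor_rate_per_hour: float       # fully loaded staff cost
    tool_cost_per_task: float        # license cost spread across the work it does
    model_cost_per_task: float       # model or API spend; zero for the old process
    review_minutes_per_task: float   # human review time
    review_rate_per_hour: float      # loaded rate of whoever reviews
    rework_rate: float               # share of outputs redone, 0 <= r < 1
    setup_cost: float = 0.0          # one-off build and integration cost
    expected_volume: int = 1         # tasks the setup cost is spread over

    def __post_init__(self) -> None:
        if not 0 <= self.rework_rate < 1:
            raise ValueError("rework_rate must be in [0, 1)")
        if self.expected_volume <= 0:
            raise ValueError("expected_volume must be positive")

    def cost_per_finished_output(self) -> float:
        per_attempt = (
            self.minutes_per_task / 60 * self.labor_rate_per_hour
            + self.review_minutes_per_task / 60 * self.review_rate_per_hour
            + self.tool_cost_per_task
            + self.model_cost_per_task
        )
        # Assumption: redone outputs are redone in full and can fail again at
        # the same rate, so expected attempts = 1 / (1 - rework_rate).
        attempts = 1 / (1 - self.rework_rate)
        return per_attempt * attempts + self.setup_cost / self.expected_volume


if __name__ == "__main__":
    # Hypothetical figures for one workflow, measured each way.
    old = ProcessCost(minutes_per_task=45, labor_rate_per_hour=80,
                      tool_cost_per_task=0.5, model_cost_per_task=0.0,
                      review_minutes_per_task=0, review_rate_per_hour=120,
                      rework_rate=0.05)
    ai = ProcessCost(minutes_per_task=10, labor_rate_per_hour=80,
                     tool_cost_per_task=2.0, model_cost_per_task=0.4,
                     review_minutes_per_task=15, review_rate_per_hour=120,
                     rework_rate=0.20, setup_cost=20_000, expected_volume=5_000)
    print(f"old: {old.cost_per_finished_output():.2f}")  # old: 63.68
    print(f"ai:  {ai.cost_per_finished_output():.2f}")   # ai:  61.17
```

In these made-up numbers the model spend (0.40 per task) is the smallest input and review time (30.00 per attempt) the largest, and the AI process wins by only about 2.52 per output. Raising its rework rate from 20% to 25% would erase the gain entirely, which is why rework and review belong in the comparison.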
In my experience, these are the four numbers most companies miss:
One of the biggest drivers of workflow cost is using the wrong model for the task. Many teams assume the most capable model should handle every customer-facing interaction. That works when latency does not matter, such as a contract draft, regulatory filing, or one-off report. It breaks in live conversations, where latency and cost determine whether the product is usable.
In the AI agent marketplace I built, the fastest model handled customer conversations, while the most capable model reviewed responses behind the scenes and analyzed failures afterward.
A retrieval layer kept responses grounded in organizational knowledge, and backend safety checks reviewed every response before it reached the user. Fast models handled conversations. More capable models handled review and governance.
The question worth asking before any agentic deployment is simple: should your most capable model serve the customer, or protect them?
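The arrangement described above, with a fast model serving and a more capable model protecting, can be sketched as a small pipeline. Everything here is hypothetical: `retrieve`, `fast_model`, `safety_check`, and `capable_model` are plain callables standing in for whatever retrieval layer, model APIs, and checks you actually run, and the grounding check in the example is a deliberately crude string match. The shape is what matters. Retrieval grounds the draft, a backend check gates every reply before the user sees it, and blocked drafts queue up for the capable model to analyze off the live path.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # hypothetical: prompt in, text out


@dataclass
class ServingPipeline:
    retrieve: Callable[[str], list[str]]            # retrieval layer over organizational knowledge
    fast_model: Model                               # low-latency model that serves the conversation
    safety_check: Callable[[str, list[str]], bool]  # backend check run on every draft
    fallback: str = "Let me connect you with a colleague who can help."
    failures: list[tuple[str, str]] = field(default_factory=list)

    def answer(self, question: str) -> str:
        context = self.retrieve(question)
        prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
        draft = self.fast_model(prompt)
        # Nothing reaches the user unless the backend check passes.
        if self.safety_check(draft, context):
            return draft
        self.failures.append((question, draft))
        return self.fallback

    def analyze_failures(self, capable_model: Model) -> list[str]:
        # The slower, costlier model works afterward, off the live path.
        findings = [
            capable_model(f"Why was this reply blocked?\nQ: {q}\nA: {d}")
            for q, d in self.failures
        ]
        self.failures.clear()
        return findings


if __name__ == "__main__":
    kb = ["Refunds are available within 30 days."]
    pipeline = ServingPipeline(
        retrieve=lambda q: kb,
        # Stub for a fast model: one grounded answer, one invented one.
        fast_model=lambda p: kb[0] if "refund?" in p.lower() else "We offer lifetime free upgrades.",
        # Crude grounding check: the draft must appear verbatim in the retrieved context.
        safety_check=lambda draft, ctx: draft in " ".join(ctx),
    )
    print(pipeline.answer("Can I get a refund?"))
    print(pipeline.answer("Do you offer upgrades?"))
    print(pipeline.analyze_failures(lambda p: "ungrounded claim: " + p.splitlines()[-1]))
```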
Even well-designed AI workflows can fail if these mistakes go unnoticed.
Together, the two scores answer the real question: not access, but whether that access has changed how work gets done and what it costs. Take five workflows across three functions and score each against both frameworks. Budget a few hours per workflow.
Set a target depth level and a target cost per finished output for the coming quarter. Give every workflow a named owner. Review the numbers monthly. Skipping that monthly review is the most common reason AI programs never turn experimentation into measurable impact.
Got more questions? We got the answers.
A structured way to measure whether your organization is actually using AI, not just accessing it. Most companies track seat counts, tokens, and licenses. An adoption framework tracks two things instead: how deep AI sits inside real workflows, and whether it costs less per finished result than the old process. The two frameworks in this article score both, function by function, in about five minutes each.
Pilots are designed to avoid the problems that production creates. A pilot carries no weight from integration, security review, compliance, or ongoing maintenance. The moment it has to become a real system, it meets all of that at once. Timelines lengthen, ownership blurs, and most organizations do the easier thing: fund a new pilot rather than finish the old one. The breakdown is not in the technology. It is in the decisions required to cross from experiment to operation.
Access means a person has a license and can open the tool. Adoption means the tool has changed how the work actually gets done. Most organizations have the first and believe they have the second. The test is simple: take the tool away for a week and see who notices. If nobody does, you have access. If people cannot do their work at the same speed and quality, you have adoption.
Walk one workflow step by step and ask: who or what does this do today? If the answer is always a person, you are at Depth 1 or 2. If AI owns at least one step that used to belong to a person, and a human checks the result before it moves forward, you are at Depth 3. If AI runs the task end-to-end and humans only handle exceptions, you are at Depth 4. Score each function separately. They move at different speeds, and the traps are different for each.
Assisting means a person still does the work and uses AI to help, the way you might use a calculator. Owning means AI does the work, and a person reviews or approves the result. The line is not about intelligence or capability. It is about where the default action sits. If a person initiates every step, AI is assisting. If the workflow runs without a human triggering it, AI owns it. Most organizations are at the assist stage. The ones that have crossed to ownership built the review and escalation rules before giving the system the keys.
Add up every cost on both sides: time per task, loaded labor cost, tool license, model or API spend, human review time, rework rate, and setup cost divided over expected volume. The number that matters is total cost per finished output, not cost per token. Token cost is usually the smallest number on the page. Review time is usually the largest hidden one.
Because it does not appear on any vendor invoice. The model cost shows up as a line item; the thirty minutes a senior person spends checking, editing, and approving the AI output does not. It gets absorbed into someone's day and never gets counted. In most deployments, review cost exceeds token cost by a factor of ten to a hundred. Track it the same way you track any labor cost: time per output, multiplied by the loaded hourly rate of the person doing the review.
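Tracked monthly, that rule is a sum over a review log. This minimal sketch assumes you log each reviewed output as a pair of (minutes spent, reviewer's loaded hourly rate) and compare the total with the model invoice for the same period; the function names and figures are hypothetical.

```python
def review_cost(review_log: list[tuple[float, float]]) -> float:
    """Total review cost: minutes / 60 times the reviewer's loaded hourly rate, summed."""
    return sum(minutes / 60 * hourly_rate for minutes, hourly_rate in review_log)


def review_to_token_ratio(review_log: list[tuple[float, float]], token_spend: float) -> float:
    """How many times larger the hidden review cost is than the model invoice."""
    if token_spend <= 0:
        raise ValueError("token_spend must be positive")
    return review_cost(review_log) / token_spend


if __name__ == "__main__":
    # Hypothetical month: three reviewed outputs as (minutes, loaded hourly rate).
    log = [(30, 100.0), (20, 150.0), (10, 90.0)]
    print(f"review cost: {review_cost(log):.2f}")             # review cost: 115.00
    print(f"ratio: {review_to_token_ratio(log, 2.30):.0f}x")  # ratio: 50x
```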
Four, per workflow, per month. First, the depth level of the workflow, and whether it moved. Second, total cost per finished output, AI side versus the old process. Third, the pass rate at first review, which shows whether quality is holding. Fourth, human review time per task, which shows whether the hidden cost is growing. Seat counts, license utilization, and token spend are vendor metrics. These four are business metrics.
Three conditions, and all three must be true. The total cost per finished output is lower than that of the old process. Quality, measured as pass rate at first review, is equal to or better than before. A named person owns the workflow and is watching cost, quality, and drift monthly. If any one of those is missing, scaling will spread the problem, not the result. Nail it, then scale it.
Speed is easy to measure and satisfying to report, which is exactly why it becomes a vanity metric. A team that finishes in half the time and still misses the outcome it was paid for has not created value. It has created faster waste. Measure cost per finished output instead. Then measure quality at first review. Speed is a by-product of a workflow that works, not evidence that it works.
Most weeks, I feel behind. There is a new tool, a new technique, a new paper, and the gap between what exists and what I have actually used keeps widening. So I made one deliberate choice: one tool a month, taken deep into a real use case, rather than a shallow pass across ten. It is the depth-over-breadth framework turned inward, and it is the only method I have found that converts anxiety into competence. The leaders I trust most on this are not the ones with the longest tool list. They are the ones who can show you a workflow they rebuilt with their own hands and explain exactly why it works.
That understanding comes from direct use, not delegation. For any senior leader, "I do not know how to build that" is starting to sound a great deal like "I do not understand the business."
The board and the C-suite, as most organizations define them today, have a short future in their current form. Boards will govern AI, and before long, they will do it with AI, overseeing agents and people together. The C-suite will be judged less on title and more on speed, quality, and how well they architect the place where humans, AI, and regulation meet. Situational leadership still matters, but the situation has changed. The next CIO is a builder, or will be replaced by one.
Once you know where AI creates value, the next step is governance and infrastructure. Learn how AI gateways help manage models, control costs, and deploy AI securely at scale.
A global technology executive with 27 years of experience who has built, scaled, and exited AI, SaaS, and climate-tech businesses across 50+ countries and holds senior roles at Google and AWS. Solo founder of Konfide.ai, author of Earn Trust Fast: Build Trust, Lead Teams, Transform Business in the Age of AI (2026), and LinkedIn Top Voice.