What Other Industries Are Actually Seeing
Healthcare, manufacturing and finance, and what film can learn from their honest numbers.

In the first essay, I asked a simple question: if AI really makes us faster and cheaper, where is the proof in the schedule and the budget? Other industries have at least started to answer that question. Some of the answers are encouraging, some are disappointing, and together they are far more useful than the hype.
Film, for the most part, has not done this work. Before we talk about our own budgets, it is worth looking at how other sectors have tried to measure AI, where they have succeeded and where they have quietly admitted failure.
The Numbers Behind the Hype
In 2025, a team at MIT analyzed more than three hundred generative‑AI deployments inside companies. Their conclusion startled a lot of investors. Only about five per cent of pilots led to meaningful financial impact; roughly ninety‑five per cent failed to generate measurable savings or profit. The problem was not that the models could not write or summarize. The problem was that most pilots never made it out of the sandbox and into real, measured workflows.
Forrester’s 2026 outlook tells a similar story in a different language. Surveying enterprise AI decision‑makers, they found that only about fifteen per cent reported any improvement in EBITDA, and fewer than one in three could clearly tie AI projects to changes in the P&L. At the same time, those same companies are doubling their AI budgets and pushing spending towards 1.7 per cent of revenue. The money is very real. The returns, in most cases, are not.
That is the discouraging part of the story. The more interesting part is where AI does work, and how those organizations know.
Where AI Quietly Does Its Job
Consider a large contact‑center operation. In one of the best‑known studies, Erik Brynjolfsson and his colleagues followed 5,172 customer‑support agents at a Fortune 500 company who were given access to an AI assistant built on a GPT‑style model. On average, the agents handled about fourteen per cent more issues per hour. The least experienced agents improved the most, with productivity rising by more than thirty per cent, while the top performers gained very little. The tool captured the patterns and language of the best agents and effectively lent their judgement to the weakest ones.
In healthcare, we see something similar with a different set of stakes. One case study of a mid‑sized provider using AI to support diagnostics and care coordination reported a sixty per cent reduction in manual patient‑data processing time, a thirty per cent increase in diagnostic accuracy, and savings equivalent to five full‑time staff. In the less glamorous world of billing and coding, AI‑assisted revenue‑cycle tools have increased coder productivity by around forty per cent, cut claims‑review time by more than half, and generated an average return on investment of more than four hundred per cent, with roughly 2.4 million dollars in savings over eighteen months for a typical hospital group.
On factory floors, AI has been quietly changing maintenance routines. Instead of running machines until they fail, manufacturers are using predictive models to forecast breakdowns and schedule repairs in advance. One industrial report describes cost savings of up to forty per cent compared with purely reactive maintenance, alongside reductions in unplanned downtime of around fifty per cent. These are not flashy chatbots. They are algorithms listening to vibration data and temperature signals, nudging engineers to fix a motor before it takes an entire line offline.
These examples have two things in common. First, they live in very specific slices of a workflow: a type of customer call, a particular diagnostic pathway, a narrow maintenance task. Second, someone cared enough to measure them. Time on task, error rates, dollars saved. Without those numbers, all of these stories would collapse into marketing.
The Jagged Frontier
The most honest description of how this feels from the inside might be the “Jagged Technological Frontier” study that Harvard Business School ran with Boston Consulting Group. They asked hundreds of consultants to tackle realistic business problems, some well suited to a generative‑AI assistant and some just beyond its current capabilities. On tasks that sat comfortably inside the frontier, consultants using AI completed about 12.2 per cent more tasks, 25.1 per cent faster, and with roughly forty per cent higher quality. On tasks that lay just outside the frontier, the pattern reversed. The AI users were nineteen percentage points less likely to get the right answer.
In other words, when the tool understood the terrain, it made people faster and better. When it did not, it made them confidently wrong.
From a distance, this looks like a research detail. Up close, it is a warning. If you drop AI indiscriminately into a process, you will see pockets of real improvement alongside new types of error and overconfidence. Whether the overall result is positive depends entirely on where you choose to apply it and how carefully you watch the outcomes.
Other industries are learning this, sometimes painfully, by running structured experiments and publishing the results. Film is beginning to adopt the tools, but we are not yet copying the habits that make those tools safe and useful.
Pilots, Failures, and the Ninety‑Five Per Cent Problem
The gap between these success stories and the broader statistics is where things become interesting. If AI can deliver forty per cent productivity gains in one hospital department, why do we still see flat productivity at the national level? Part of the answer lies in the way organizations run experiments.
The MIT team that found a ninety‑five per cent pilot failure rate did not conclude that AI was useless. Their point was sharper. Most pilots never moved beyond small, isolated tests. They lacked clear success metrics, never touched the systems where value is actually created, or were quietly abandoned when the first integration friction appeared. In many cases, the people running the pilots were rewarded for launching them, not for proving they worked.
Forrester sees the same pattern from another angle. When only a small minority of decision‑makers can point to EBITDA improvement, and most cannot even connect AI projects to specific P&L line items, it is not because the tools failed to generate any effect at all. It is because very few organizations did the basic work of deciding in advance what “success” would look like and then checking whether it happened.
So we are left with a strange picture. On one side, tightly scoped use cases with impressive gains. On the other, a forest of pilots that never quite touch the core of the business. The technology is capable of real change. The way we introduce it often is not.
How Serious Industries Run Experiments
If you look at the AI projects that have actually earned their keep, they tend to share a few unglamorous traits.
First, someone took the time to establish a clean baseline. The hospital knows how long it used to take to code a claim or to triage a certain category of patient before the tool was introduced. The manufacturer knows how many hours of unplanned downtime they recorded on a particular line over the last year. The contact center knows how many tickets an average novice agent handled per hour before the assistant arrived. Without that “before” picture, the “after” is meaningless.
Second, the pilots were deliberately narrow. They did not try to “transform the organization” in one go. They tackled one type of call, one diagnostic pathway, one set of machines. Brynjolfsson’s customer‑service study did not track every corner of the business; it measured a clearly defined group of agents doing a clearly defined job. The Harvard and BCG experiment looked at a particular mix of consulting tasks, not at “consulting” in general.
Third, the people running the experiments agreed in advance on what would count as success. It might be a reduction in average handling time, a drop in error rates, a measurable improvement in patient outcomes, or a certain level of cost savings. It might also include qualitative measures: do people find the tool trustworthy, or does it quietly increase stress?
Finally, and this matters more than we like to admit, failures were tolerated and sometimes published. The MIT study exists because companies were willing to share data about pilots that did not work. Forrester’s surveys are honest about how few firms see real financial impact. In other words, these sectors have accepted that disappointment is part of the learning curve.
This is not a glamorous picture of innovation. It is closer to lab work: define your hypothesis, run a controlled experiment, record the result, and then adjust.
What Film Is Not Doing Yet
In film and television, we are at a much earlier stage. McKinsey’s recent work on AI in production suggests that early adopters are seeing five to ten per cent productivity gains in specific pre‑production tasks, such as script breakdowns, animatics and asset reuse. Those are real numbers, and they should not be dismissed. Yet they also highlight how small the measurement universe still is.
We have almost no public case studies that say, “On this series, AI‑assisted breakdowns reduced prep by seven per cent, cut overtime by this many hours and shrank the contingency by this amount.” There is no shared dataset showing that AI‑supported scheduling saved a consistent number of days across a slate of shows. The few numbers that do exist tend to come from vendors’ marketing decks rather than from line producers or financiers comparing two concrete productions.
From where I have been sitting for the last two decades — at Sagafilm, at Polarama and Polarama Greenland, on American features shooting in Iceland and on Nordic television dramas — I have watched us adopt new cameras, new post‑production workflows, new collaboration tools and now AI. What I have rarely seen is a disciplined before‑and‑after comparison attached to a particular budget or schedule. We add the tool, we hope it helps, and we move on.
The paradox is that film is an industry obsessed with numbers. We fight over every shooting day, every overtime hour, every hotel night and every unit move. Yet when it comes to tools that claim to change those numbers, we mostly rely on anecdote.
Borrowing the Right Habits
If there is one lesson to borrow from healthcare, manufacturing and even customer‑service operations, it is not that AI will automatically save us. It is that the only honest way to judge these tools is to run small, clear experiments against a baseline you trust and to be willing to be disappointed.
For film, that means starting with something far more modest than “AI will transform production.” It means choosing one piece of the process — a breakdown, a location search, a schedule draft — and asking four simple questions:
- How long does this usually take us now?
- What does it cost in direct labor and overtime?
- Where are the typical errors or points of friction?
- What would success look like if we tried an AI‑assisted version?
Only then does it make sense to introduce a tool and see what happens. You may find the results are underwhelming. You may also find that, like the contact‑center novices or the hospital coders, a particular group in your team quietly gains twenty or thirty per cent in effectiveness. Either outcome is valuable as long as it is measured.
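For readers who want to see the mechanics, the four questions above can be reduced to a small before‑and‑after comparison. The sketch below is illustrative only: the task, the numbers, the function names and the five‑per‑cent success bar are all invented for this example, not drawn from any study or production cited in this essay. The only point it makes is that "success" is defined before the pilot runs, not after.

```python
from dataclasses import dataclass

@dataclass
class TaskMetrics:
    """Baseline or pilot measurements for one narrow task."""
    hours_per_unit: float       # e.g. hours to break down one episode
    labor_cost_per_unit: float  # direct labor plus overtime, in dollars
    error_rate: float           # fraction of units needing rework

def percent_change(before: float, after: float) -> float:
    """Signed change relative to the baseline; negative means a reduction."""
    return (after - before) / before * 100

def budget_test(baseline: TaskMetrics, pilot: TaskMetrics,
                min_time_saving_pct: float = 5.0) -> dict:
    """Compare a pilot against its baseline using a success bar agreed in advance."""
    time_delta = percent_change(baseline.hours_per_unit, pilot.hours_per_unit)
    cost_delta = percent_change(baseline.labor_cost_per_unit, pilot.labor_cost_per_unit)
    error_delta = percent_change(baseline.error_rate, pilot.error_rate)
    return {
        "time_change_pct": round(time_delta, 1),
        "cost_change_pct": round(cost_delta, 1),
        "error_change_pct": round(error_delta, 1),
        # Passes only if time fell past the pre-agreed bar and errors did not rise.
        "passed": time_delta <= -min_time_saving_pct and error_delta <= 0,
    }

# Hypothetical script-breakdown numbers, invented for illustration.
before = TaskMetrics(hours_per_unit=12.0, labor_cost_per_unit=900.0, error_rate=0.10)
after = TaskMetrics(hours_per_unit=11.0, labor_cost_per_unit=850.0, error_rate=0.10)
print(budget_test(before, after))
```

The interesting design choice is the `passed` flag: by fixing the threshold as a default parameter rather than deciding after the fact, the experiment cannot quietly redefine success once the results are in, which is exactly the failure mode the MIT pilots exhibited.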
Towards the Budget Test
In the first essay, I suggested a simple rule of thumb: if a tool does not change your schedule or your budget in a way you can see and document, you should treat it as a toy rather than a transformation. The stories from other industries do not contradict that rule; they strengthen it. They show us that AI can deliver real gains, but mostly in tightly defined parts of the workflow, after someone has done the unglamorous work of measurement.
Film has reached the point where adopting AI is almost a reflex. What we have not yet done is adopt the habits that let other sectors separate signal from noise. In the next part of this series, I want to step back from the tools themselves and look at what our workflows actually look like before AI touches them. Because if we do not know where the time and money go now, we will never be able to say honestly whether any of these systems helped.
Only then will The Budget Test be more than a slogan. It will be something any production, from a streaming series to a two‑person documentary, can run for itself.