From Experiment to Enterprise: What It Takes to Scale AI
The question most organisations are asking about AI is the wrong one. "How do we use AI?" implies the tool is the starting point. The organisations making progress have reframed it: what problem are we solving, and can AI help us influence the variables to solve it?
That reframe was at the heart of a recent S&S cross-Community session on AI agents and the new digital workforce, and it cuts to why so many AI programmes stall between ambition and scale.
Where most organisations are stuck
S&S's Intelligent Enterprise research, drawn from interviews with more than 200 UK C-suite leaders, shows that 42.5% of organisations are stuck in a cycle of experimentation. Pilots are running. Proofs of concept are being presented to boards. But the infrastructure to scale what's being built is not there. Only 15.5% have reached the point where AI is capturing enterprise-wide value.
The session mapped the journey between those two points across three stages. Stage one is where most organisations sit: AI on a single platform, run as a separate programme, generating enthusiasm without generating scale. Stage two sees agents starting to appear but operating without the system integrations that would make them useful at enterprise level. They become, as Ella Ovenden, Senior AI and Data Consultant, described them in the session, "islands: capable in isolation, unable to connect to the data and legacy systems they need to do work." Stage three is the destination most are aiming for: a coordinated hybrid workforce where humans and agents operate together, with clear governance, defined accountability and identity management built in from the start.
The gap between where organisations are and where they want to be is well understood. Less well understood is why progress through it is so consistently slow.
What vendors are selling and what that means for you
Part of the answer is structural. The majority of what major vendors are actively selling right now is stage three capability, and the commercial logic behind that is worth understanding clearly. Platforms that have invested tens of billions in AI development are motivated to get inside client architectures early, before the true cost of running large language models at scale passes to the customer. Embedding at the context and data layer now is how they secure the long-term relationship.
Archie Cobb, S&S's AI and Data Lead, was direct about this in the session: "They're wanting to show you the art of the prize. They will start to wed themselves into your data systems." The implication is not that vendor tools lack value. It is that running proofs of concept on vendor platforms to explore what's possible risks creating dependency on infrastructure that has not been evaluated against the organisation's own data maturity, security requirements or integration landscape. By the time the cost of that dependency becomes visible, it is expensive to undo.
The same pattern plays out with model pricing. The cost of API calls to large language models does not currently reflect what it will be in two to three years' time, once infrastructure investment starts being recouped. Organisations building deep integrations now without considering model agnosticism may find themselves locked into commercial relationships that were not part of the original business case.
What Claude gives you and what the enterprise must provide
A useful frame that came out of the session distinguishes between what a model like Claude offers and what the enterprise itself must provide for that model to deliver value.
Claude (or any frontier model) performs well at the proof-of-concept (POC) stage. It can translate intent into a working prototype quickly, allow non-technical users to interact with systems in natural language, and compress the distance between idea and working demonstration from months to hours. For bounded, well-defined use cases it is a useful tool.
What it does not provide is the enterprise infrastructure required to make that work at scale: the data governance layer, the API-accessible core systems, the security architecture, the integration middleware, the identity and credentialling framework that determines what an agent can and cannot access, and the accountability structure that determines who is responsible when something goes wrong. Those are not model problems. They are organisational ones, and no model upgrade resolves them.
A senior leader in the session confirmed the pattern in practice. Across a career working with some of the largest financial services organisations in the UK, they had not seen one tackle its foundational data and infrastructure issues thoroughly enough that deploying a full agentic enterprise would carry a high degree of confidence. The foundations remain the bottleneck. Progress on the exciting end of the maturity curve consistently outpaces progress on the unglamorous end, and the gap between the two is where most of the risk accumulates.
The golden thread
What the organisations making considered progress share is a different starting orientation. Rather than beginning with AI capability and asking where it can be applied, they begin with a five-year strategic objective and work backwards through the technology stack to identify what needs to be true at each layer for that objective to be reachable.
Archie described this in the session as the golden thread. It is the connective logic that runs from the outcome an organisation cares about, through the operating model changes required to reach it, down into the technology foundations that enable those changes. Without it, AI investments accumulate without cohering. POCs prove the art of the possible without connecting to anything that matters commercially. Enthusiasm stays high and value stays low.
The golden thread is also what protects organisations from becoming servants to external pressure. The pace of model releases, the volume of vendor announcements and the expectation of colleagues who have read about AI capability in the weekend papers all create a pull towards reactive adoption. Organisations with a clear strategic thread can evaluate new developments against that thread. Organisations without one tend to chase each announcement in turn, accumulating spend and complexity without accumulating capability.
The sequence matters
The practical implication is a sequencing question rather than an ambition question. The ambition to build an integrated agentic enterprise is well placed. The sequence required to get there is not to buy the endpoint first and retrofit the foundations under pressure, but to invest deliberately in those foundations - data maturity, system integration, governance architecture - before adding the agentic layer on top.
One leader in the session described their team's experience of doing exactly this in health and safety processes: a narrow, deterministic agentic use case, with clear governance from the outset and humans in the loop at every point where the stakes required it. What it clarified, they said, was the boundaries of what the technology could reliably do. That knowledge is the actual asset. It is what makes it possible to build the next use case with more confidence, and the one after that with more still.
That is how organisations move from the 42.5% stuck in experimentation to the 15.5% capturing enterprise-wide value: by building those foundations with the destination clearly in view.
S&S works with organisations navigating the journey from AI experimentation to enterprise scale. If you want to explore where your organisation sits and what the highest-value moves are, get in touch with the team.