The model was never the problem — context was

For a while I had two AI tools open at the same time. One felt brilliant, the other felt useless, and I couldn’t explain why.

ChatGPT knew my writing style, my industry, the decisions I’d been wrestling with for months. It had context on me. Copilot, deployed across the organization, answered questions about the organization’s data like a new hire who hadn’t been introduced to anyone yet. The model powering both was roughly the same. The difference wasn’t intelligence. It was what the AI could see.

That was the puzzle. Once I understood it, everything else about Copilot clicked into place — including why so many enterprise deployments produce exactly the disappointment I felt before it clicked.

I’ll come back to ChatGPT once more at the end, because the principle is the same even though the mechanics are different. For now, I want to stay with the enterprise problem — because that’s where the money is being spent and where the confusion runs deepest.

Most deployments feel useless for the same reason mine did

The Copilot app — the work-grounded version, accessed through the Microsoft 365 Copilot surface in Teams or at m365copilot.com — is built around retrieval. Not generation. Retrieval. The value is that it can see across the organizational data estate: files I didn’t know existed, meeting summaries from calls I wasn’t on, decisions recorded somewhere in SharePoint months ago. The AI didn’t make me smarter about what I already knew. It gave me access to what I didn’t know was there.

That’s the pitch. It’s also a real capability — when the conditions are right.

A friend of mine runs digital platforms at a major Swiss telco. We spoke about Copilot — he’d brought it up, not me. His firm had deployed it; he wasn’t seeing the value, and he couldn’t explain why my experience was different. He’s not someone who misses things.

The answer, once we compared notes: his firm’s deployment had no usable data grounding layer. The retrieval AI had nothing it could reliably retrieve from. It was connected to an index it couldn’t meaningfully search, and it returned answers that felt hollow, generic, or simply wrong.

That’s not a model failure. It’s a sequencing failure.

Microsoft’s own deployment guidance is explicit about this: review SharePoint searchability, remediate oversharing, run content lifecycle assessments before broad rollout. The Gartner analysts who track this have put it plainly — deployment plans routinely assume a clean data environment, which is “rarely the case,” and risk-mitigation work has been “significantly greater” than organizations planned. Gartner’s 2025 Microsoft 365 Copilot survey found that among organizations that had finished pilots, only 5% were moving to larger deployment. Nearly half of those that had piloted it rated it as only “some value, shows promise.”

The pattern is consistent enough to name: organizations license Copilot, skip the data remediation, see nothing useful, and conclude the product doesn’t work. What they’ve actually discovered is that retrieval AI has the same prerequisite as any other search system — the underlying data has to be findable before the search can surface it. This isn’t a Copilot-specific insight. It’s the same mechanism that sinks automation projects: apply a capable layer on top of broken foundations, and you scale the broken foundations. You don’t fix them. The same mistake shows up one layer down, when the broken foundation is a process rather than a data estate.
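To make the mechanism concrete, here is a minimal Python sketch. It is a toy, not a description of how Microsoft's index works: the `Doc` class, the `indexed` flag, the `allowed_users` set, and the keyword matching are all hypothetical stand-ins chosen for illustration. The only point it makes is the one above: a relevant document that is not findable produces the same result as a document that does not exist.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    indexed: bool              # hypothetical: is the containing site searchable?
    allowed_users: frozenset   # hypothetical: who may see this document


def retrieve(query: str, user: str, docs: list) -> list:
    """Return titles a retrieval layer could surface for this user."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        if not doc.indexed:
            continue  # unsearchable content is invisible, however relevant
        if user not in doc.allowed_users:
            continue  # results are trimmed to what the user may see
        if any(term in doc.text.lower() for term in terms):
            hits.append(doc.title)
    return hits


docs = [
    Doc("Pricing decision memo", "pricing decision for the enterprise tier",
        indexed=False, allowed_users=frozenset({"ana"})),
    Doc("Kickoff notes", "project kickoff agenda",
        indexed=True, allowed_users=frozenset({"ana"})),
]

# Before remediation: the memo exists, but nothing can find it.
before = retrieve("pricing decision", "ana", docs)

# After remediation: same memo, same query, now findable.
remediated = [replace(docs[0], indexed=True), docs[1]]
after = retrieve("pricing decision", "ana", remediated)

print(before)
print(after)
```

Nothing about the query or the matching logic changes between the two calls. Only the state of the data does, which is why the remediation work has to come first.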

My experience, once the data layer was in place, was different. I found things I didn’t know to search for. That’s a real capability. It’s also narrower than the pitch implies, and it requires more infrastructure work than most organizations plan for.

“Copilot” is several different products — and which surface you’re on decides what it can see

A Microsoft vendor came in to pitch Copilot to a law firm I was advising. The pitch focused on Copilot in Word and PowerPoint — drafting, summarizing, reformatting documents — and positioned these as the primary enterprise value: the reason to buy.

I pushed back. My experience had been different enough that something didn’t add up. The value I’d seen wasn’t in Word. It was in finding things — PowerPoint decks in SharePoint libraries I’d never thought to search, context from across a project I hadn’t manually assembled. They didn’t have a clean answer. So I went home and checked.

The context limitation in Word is by design. It is documented in Microsoft’s own support pages. Copilot in Word helps you work with the document you have open. You can extend it — you can explicitly add up to 20 referenced files, emails, or meetings — but you have to know what you want to add. You’re pulling from a known set of sources, not searching an unknown organizational estate. PowerPoint is similar. Excel is the most explicit: Microsoft’s documentation states directly that Copilot in Excel works with the data in the current workbook and cannot access other files, emails, or enterprise data in that editing flow.

This is not a configuration issue. There is no admin switch that turns the Word side-pane into the same broad work-grounded retrieval experience as the Copilot app. These are product behaviors, documented as such.

The result is a gap the vendor wasn’t accounting for. The work-grounded Copilot app serves retrieval-first tasks: when you need to find what exists before you can decide what to make. Copilot in Word, PowerPoint, and Excel serves artifact-first tasks: when you already know what you’re making and the inputs are in hand. Both are legitimate. Neither is a substitute for the other. Conflating them is how organizations end up with Copilot licenses, a lot of disappointed users, and no clear explanation for why the product didn’t do what the demo implied.

Outlook is the exception worth flagging, because it gets lumped in with the Office apps when it doesn’t belong there. With a paid Microsoft 365 Copilot license, Copilot Chat in Outlook can reach across inbox, calendar, meetings, chats, and enterprise data — which makes it meaningfully closer to the work-grounded Copilot app than to Word or Excel. For organizations where the primary knowledge work runs through email and calendar, Outlook with a full license is one of the higher-value surfaces in the portfolio. Categorizing it as “just another Office app” misses most of what it can do.

The naming problem isn’t cosmetic

Everything above would be easier to navigate if the products had different names. They don’t. And in 2025, Microsoft made the situation significantly harder.

In January 2025, Microsoft renamed the old Microsoft 365 hub app — the Office launcher that millions of enterprise users already had installed on their machines — to the “Microsoft 365 Copilot app.” The rebrand was controversial enough that Microsoft had to publicly deny it had renamed Word, Excel, and PowerPoint.

The consequence: a user who opens the “Microsoft 365 Copilot app” on their Windows desktop may reasonably believe they are using work-grounded AI. Whether they are depends entirely on whether their organization has purchased the paid Microsoft 365 Copilot add-on. Without that license, the app is much closer to the consumer product. Same name. Different capability. No obvious signal telling the user which one they have.

The consumer product — available at copilot.microsoft.com, in the Edge sidebar, and the Windows taskbar without an enterprise license — is web-grounded. It has no access to organizational data. If you’ve used Copilot on a personal machine and found it behaved completely differently from what a colleague described, this is why. It’s a different product.

This isn’t analyst commentary about unfortunate branding decisions. In 2025, the US National Advertising Division examined Microsoft’s Copilot claims and issued a formal decision recommending that Microsoft modify certain productivity claims and clarify what “Copilot” means in each surface, because users would not naturally understand the functional differences from the branding alone. An advertising self-regulatory body telling a major technology company that its product naming is confusing enough to require correction is not a niche observation. The branding confusion has been institutionally documented.

Two questions that cut through most of it

If you’re responsible for a Copilot deployment — or about to sit in a pitch for one — two questions cut through most of the noise.

Is the data layer ready? If SharePoint sites aren’t searchable, if permissions haven’t been remediated, if the estate is full of stale and ownerless content — the work-grounded experience won’t deliver regardless of the license. The AI can only retrieve what the index can find. This is the work that has to happen before the rollout, not after the first round of disappointing results. The sequencing matters more than the licensing decision.

Which surface and which license does your organization actually have? The Microsoft 365 Copilot app with the paid add-on is not the same product as the Microsoft 365 Copilot app without it. The demo almost certainly shows the work-grounded version. What gets deployed is often the Office apps, or the unlicensed version of the app. If the demo doesn’t specify which surface it’s showing and which license it assumes, ask before you buy.
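The second question can be written down as a small lookup table. The sketch below only restates the claims made in this post; it is not a Microsoft API, and the surface keys and the `what_can_it_see` function are hypothetical names. Combinations the post doesn't describe are deliberately left unanswered rather than guessed.

```python
# (surface, has_paid_copilot_license) -> what the assistant can draw on,
# as described in this post. Treat it as a checklist, not a specification.
GROUNDING = {
    ("copilot_app", True): "organizational data (work-grounded)",
    ("copilot_app", False): "web only (close to the consumer product)",
    ("word", True): "open document plus up to 20 explicitly referenced items",
    ("powerpoint", True): "similar to Word: the open file plus added references",
    ("excel", True): "current workbook only",
    ("outlook", True): "inbox, calendar, meetings, chats, enterprise data",
    ("consumer", False): "web only",
}


def what_can_it_see(surface: str, paid_license: bool) -> str:
    """Look up the grounding scope for a surface and license combination."""
    key = (surface.strip().lower(), paid_license)
    return GROUNDING.get(key, "not covered here: ask the vendor")


for surface, paid in [("copilot_app", True), ("copilot_app", False), ("excel", True)]:
    print(f"{surface:12} paid={paid!s:5} -> {what_can_it_see(surface, paid)}")
```

The first two rows are the ones to put in front of a vendor: same app name, different answer, and the only variable is the license.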

That second question brings me back to ChatGPT — the comparison that opened this. The reason ChatGPT felt useful early on wasn’t that it was a better model. It was that the context was personal: my writing, my problems, my history with the tool. The context was small, but it was well-stocked. Enterprise Copilot is trying to do something different in kind — not personal context, but organizational knowledge, at scale. The mechanism is the same: grounding determines what the AI can see, and what it can see determines what it can do. The difference is that personal context is easy to build accidentally over time, while organizational grounding requires deliberate infrastructure work.

That’s the real gap most deployments fall into. Not a model problem. A context problem — the same one, at different scales.
