Mar 10, 2026 · 5 min read · Merelda Wu <merelda@melio.ai>

The AI Investment Most Companies Are Leaving on the Table

Models are becoming commodities and the hard part has moved. The industry is past model selection - 2026 demands systems engineering.

Two years ago, almost every AI conversation started in the same place: “Which model are you using?”

That question made sense when the capability gap between models was wide and volatile. Picking the wrong one could materially affect output quality. But today, the top-tier foundation models are converging. They all offer strong reasoning, large context windows, structured output support, and increasingly competitive pricing. The performance differences still exist, but they are narrower and often situational.

Models are becoming infrastructure. And when something becomes infrastructure, it stops being the primary source of competitive advantage.

The real leverage has moved up the stack.

What Actually Exists Inside a Working AI System

If you look at any AI system that is actually working inside a bank, insurer, telco, or professional services firm, you will find far more than a model call. There is a retrieval layer grounding outputs in internal data. There is orchestration logic managing prompts, routing, and tool use. There is an evaluation framework that measures whether outputs meet defined quality thresholds. There is monitoring to detect drift and silent failure. There are human approval steps where risk demands it.

The model is just one small (albeit important) component of the system.

And in most production environments, it is not the bottleneck.
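The layers above can be sketched as one request path. This is a minimal illustration with hypothetical stand-ins (`retrieve`, `generate`, `evaluate` are toy placeholders, not any real SDK): the model call is one line among several, and the system decides whether an output is released or routed to a human.

```python
def retrieve(query, index):
    """Retrieval layer: naive keyword overlap against an in-memory corpus."""
    words = set(query.lower().split())
    ranked = sorted(index, key=lambda doc: -len(words & set(doc.lower().split())))
    return ranked[:2]

def generate(query, context):
    """Stand-in for the model call: one small component of the system."""
    return f"Answer to '{query}' grounded in {len(context)} sources."

def evaluate(draft, context):
    """Evaluation framework: here, a trivial groundedness check."""
    return 1.0 if context else 0.0

def answer_query(query, index, threshold=0.5):
    context = retrieve(query, index)        # ground in internal data
    draft = generate(query, context)        # orchestrated model call
    score = evaluate(draft, context)        # measure against a threshold
    if score < threshold:
        return {"draft": draft, "status": "human_review"}  # approval step
    return {"draft": draft, "status": "released", "score": score}
```

Swap the `generate` stub for any provider and the surrounding layers stay exactly where the engineering effort lives.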

Care Deeply About Your System Design

Where teams struggle is usually not model capability; it is system design.

Data Quality: The First Reality Check

AI has an unforgiving way of exposing messy foundations. Duplicate documents, conflicting policy versions, missing metadata, outdated PDFs treated as source of truth - these issues do not disappear because you added retrieval. In fact, they become amplified. A model grounded in inconsistent data will produce confidently inconsistent answers. Retrieval scales bad data.

The teams pulling ahead are not just cleaning data - they are structuring it to reflect how their business actually works. Customer context, operational history, domain-specific relationships. Your data architecture teaches the model about your business.
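Structuring before indexing can be as simple as a pre-ingestion pass. A minimal sketch, assuming documents carry an `id`, `version`, and `text` field (illustrative names, not a real schema): drop exact duplicates and keep only the latest version of each document, so retrieval never scales stale or conflicting data.

```python
import hashlib

def dedupe_and_version(docs):
    """Keep the newest version per document id; drop exact text duplicates."""
    seen_hashes = set()
    latest = {}
    for d in docs:
        h = hashlib.sha256(d["text"].encode()).hexdigest()
        if h in seen_hashes:
            continue  # exact duplicate: never index it twice
        seen_hashes.add(h)
        current = latest.get(d["id"])
        if current is None or d["version"] > current["version"]:
            latest[d["id"]] = d  # newer version wins; outdated copy is retired
    return list(latest.values())
```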

Evaluation: Moving Beyond the Demo Mindset

Evaluation is where most teams are flying blind. Many test AI by trying a few prompts, tweaking wording, and deciding it “looks good”. That is the mindset of a shiny demo, not an evolving system. Production requires defined tasks, structured outputs, labelled datasets, scoring criteria, and explicit thresholds for release. Without that, every prompt change feels risky. Every model upgrade feels like a gamble. You cannot improve what you do not measure, and in AI systems, running on vibes leads to fragile outcomes.

An evaluation dataset built from your best-performing cases and known failure modes is a competitive asset. It encodes decades of accumulated domain expertise that no foundation model ships with. Every time you label an output as good or bad, you are capturing expert judgment that compounds over time - and that no competitor can lift from a model release.
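The mechanics of such a release gate are not complicated; the value is in the labelled dataset. A minimal sketch (field names `input` and `expected` are illustrative): score any candidate system against the labelled cases and block release below an explicit threshold.

```python
def run_eval(system, dataset, threshold=0.9):
    """Score a candidate system against a labelled dataset; gate the release."""
    passed = sum(1 for case in dataset if system(case["input"]) == case["expected"])
    accuracy = passed / len(dataset)
    return {"accuracy": accuracy, "release_ok": accuracy >= threshold}
```

With this in place, a prompt change or model upgrade stops being a gamble: rerun the harness and read the number.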

Workflow Integration: Where Value Is Realised

Workflow integration is where the real engineering begins. The work starts after a summary is generated or an email is drafted. Does the output trigger a case workflow, update a CRM record, generate a compliance artifact, or route to a human reviewer? Only when AI breaks out of the chat interface and embeds into an operational workflow does it start generating real value. That integration work is engineering-heavy and detail-oriented. It requires thinking about state management, error handling, permissions, and user experience. None of that improves simply because a newer model was released.
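That routing decision can be sketched as a simple dispatcher. The field names (`kind`, `confidence`, `payload`) and destinations are hypothetical, but the shape is the point: risk gates come first, then each output type lands in an operational system rather than a chat window.

```python
def route_output(output):
    """Dispatch a generated output into the operational workflow."""
    if output["confidence"] < 0.7:
        return ("human_review", output["payload"])  # risk gate before anything else
    if output["kind"] == "summary":
        return ("update_crm", output["payload"])    # attach to the case record
    if output["kind"] == "email":
        return ("send_queue", output["payload"])    # drafted email awaits send
    return ("compliance_log", output["payload"])    # default: keep an artifact
```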

Governance: The Final Pressure Test

Governance becomes the final pressure test. In regulated industries especially, you need traceability, audit trails, versioned prompts, clear data lineage, and defined human oversight. But governance also means clarity on access and accountability.

Who is allowed to see what data? Can a frontline agent view full customer history, or only the fields required for that task? Are sensitive attributes masked by default? Is model access scoped by role, geography, or regulatory boundary? These decisions must be encoded into the system through permissions, encryption, and role-based access controls, not handled informally.
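Encoding those decisions can start with a field-level policy table. A minimal sketch with hypothetical roles and fields: sensitive attributes are masked by default, and each role opts specific fields in, so a frontline agent never receives more than the task requires.

```python
ROLE_FIELDS = {  # hypothetical policy: fields each role may see in clear text
    "frontline_agent": {"name", "open_case"},
    "risk_officer": {"name", "open_case", "full_history", "id_number"},
}

def view_for(role, record):
    """Mask-by-default: anything not explicitly allowed for the role is hidden."""
    allowed = ROLE_FIELDS.get(role, set())
    return {field: value if field in allowed else "***"
            for field, value in record.items()}
```

The same table can double as the scope for model access, so the retrieval layer only ever sees what the requesting role is entitled to.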

Equally important is escalation. When AI-generated advice is incorrect, who gets notified? Does it route automatically to a reviewer, a domain expert, or a risk function? Is the output logged, flagged, and linked to the prompt and model version that produced it? Governance is not just about preventing failure - it is about designing clear response paths when failure happens.
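A sketch of that response path, with illustrative names throughout: every output is logged against the prompt and model version that produced it, and incorrect ones are flagged and routed, so reproduction and escalation are built in rather than improvised.

```python
from datetime import datetime, timezone

def log_and_escalate(output, prompt_version, model_version, is_correct, audit_log):
    """Append an audit entry linking output to prompt and model; route failures."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "output": output,
        "prompt_version": prompt_version,
        "model_version": model_version,
        "flagged": not is_correct,
    }
    audit_log.append(entry)  # reproducibility lives here
    return "notify_reviewer" if not is_correct else "ok"
```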

The question from risk or compliance is rarely “which model did you use?” It is “can you explain this output, reproduce it, show who saw it, and demonstrate how it was controlled?” That answer lives in architecture and process.

The Melio View: The Moat Is in the System

This is why our view at Melio is simple: competitive advantage comes from designing your AI system to amplify your organisational knowledge.

Foundation models will continue to improve. Prices will shift. New releases will outperform old ones. If your architecture is clean, you can swap models, A/B test providers, downshift to smaller models for cost efficiency, and evolve safely. If your architecture is brittle, every improvement upstream becomes a source of instability downstream.
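The clean-architecture part is often one small seam. A minimal sketch (the class and the lambda backends are illustrative, not any vendor's API): the rest of the codebase depends on a single interface, so swapping providers or downshifting to a smaller model changes one line, not the system.

```python
class ModelClient:
    """Thin seam between the system and any provider: downstream code
    depends on this interface, never on a vendor SDK."""
    def __init__(self, backend):
        self._backend = backend  # any callable: prompt -> completion text

    def complete(self, prompt):
        return self._backend(prompt)

# Swapping or downshifting is one line; nothing downstream changes.
small = ModelClient(lambda p: f"[small] {p}")
frontier = ModelClient(lambda p: f"[frontier] {p}")
```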

The organisations pulling ahead are not winning on model selection. They are winning because they have structured their data to reflect how their business actually works, codified their best judgment into how they evaluate AI outputs, and embedded AI deep enough into their workflows that it amplifies what is already working for them.

That knowledge compounds. A competitor can access the same model you use. They cannot access two years of your labelled data, your domain-tuned evaluation criteria, and the workflows your team has built around them.

Models are becoming commodities.

Your structured knowledge is not.

If you are building AI inside your organisation, stop obsessing over marginal model differences and start investing in architecture.

  • Invest in clean data foundations.
  • Invest in structured evaluation.
  • Invest in workflow integration.
  • Invest in access control and escalation paths.
  • Invest in monitoring and feedback loops.
  • Choose models deliberately, yes. But design systems intentionally.

Because in the long run, the model you picked will change.

The system you built is what will determine whether AI actually works, scales, and can be trusted.