The AI tax on legacy stacks: what every monolithic CMS will cost you in the agentic era

Gartner found a 1,445% surge in multi-agent AI inquiries. Legacy CMS architectures cannot support agentic systems. Here is what that incompatibility costs you.

Paul Utr 18 June 2026

Gartner reported a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025, and projects that 40% of enterprise applications will embed AI agents by the end of 2026, up from under 5% in 2025. The major enterprise software vendors, including Microsoft, Salesforce, Adobe, and ServiceNow, are all reorienting their platforms around agentic AI.

The organisations that capture this value will be the ones whose content, data, and infrastructure can be consumed by autonomous systems. Legacy content platforms designed for human authors in 2008 are structurally incapable of serving that requirement.

The monolithic CMS will not be replaced by AI. It will be made irrelevant by it. That distinction matters, because irrelevance accumulates quietly and the cost is harder to attribute to a single cause than a clean replacement would be.

What agentic AI actually needs from your content

An AI agent is not a user. It does not navigate menus, interpret visual layouts, or read implicit structure from a page designed for human consumption. An AI agent queries data through APIs, expects typed fields with consistent values, requires stable identifiers to retrieve specific content, and needs to know whether a content item is authoritative, current, and trusted.

A monolithic CMS, whether WordPress, Drupal 7, AEM on legacy infrastructure, or a custom PHP system built in 2012, was designed for exactly none of these requirements. It was designed for a human editor to click through a visual interface and produce HTML pages for a human reader. The content is stored in a way that makes it accessible to templates. It is not stored in a way that makes it accessible to agents.

MIT Sloan researchers studying agentic AI deployment found that 80% of the work in their AI agent project was consumed not by model engineering or prompt design, but by data engineering, stakeholder alignment, governance, and workflow integration. Converting data into standard, structured formats that agents could consume reliably was the actual constraint. The models worked; the content layer did not.

Salesforce’s 2026 Connectivity Benchmark Report, conducted with Deloitte Digital across 1,050 IT leaders, found that 96% of organisations experience barriers to using their data for AI use cases. 40% specifically identify outdated IT architecture and data silos as the top blocker, and 37% cite legacy infrastructure or system incompatibility as a primary barrier to agentic AI deployment. The monolithic CMS is the primary system where organisational content lives in most mid-market and enterprise organisations, and it is precisely the system that cannot support what agentic AI requires.

Incompatibility 1: no machine-readable API

A monolithic CMS renders content to HTML. The rendering engine, whether that is WordPress’s template system, Drupal’s node rendering, or a legacy page builder, produces output designed for browser consumption. An AI agent needs something different: a structured data API that returns typed data in a predictable format. Product ID, title, description, pricing, availability status, publication date, locale, each as a named, typed field, not as elements parsed from HTML.
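As a rough illustration, the fields listed above could be modelled as a typed record like the one below. This is a minimal sketch, not any platform's actual schema: `ProductRecord`, `Availability`, `parse_product`, and the payload keys are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum


class Availability(Enum):
    # Assumed set of allowed values; a real catalogue defines its own.
    IN_STOCK = "in_stock"
    OUT_OF_STOCK = "out_of_stock"
    DISCONTINUED = "discontinued"


@dataclass(frozen=True)
class ProductRecord:
    """One product as an agent would want to receive it (hypothetical schema)."""
    product_id: str            # stable identifier
    title: str
    description: str
    price: Decimal             # exact decimal, not a formatted string
    currency: str              # e.g. "GBP"
    availability: Availability
    published_on: date
    locale: str                # e.g. "en-GB"


def parse_product(payload: dict) -> ProductRecord:
    """Convert a raw payload into a typed record, failing loudly on bad data."""
    return ProductRecord(
        product_id=str(payload["product_id"]),
        title=payload["title"],
        description=payload["description"],
        price=Decimal(payload["price"]),
        currency=payload["currency"],
        availability=Availability(payload["availability"]),
        published_on=date.fromisoformat(payload["published_on"]),
        locale=payload["locale"],
    )


example = parse_product({
    "product_id": "prod-001",
    "title": "Desk lamp",
    "description": "Adjustable LED desk lamp.",
    "price": "19.99",
    "currency": "GBP",
    "availability": "in_stock",
    "published_on": "2026-01-15",
    "locale": "en-GB",
})
```

The point of the sketch is that an unknown availability value or a malformed date raises an error at the boundary, instead of reaching the agent as plausible-looking text.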

WordPress has a REST API. It is not headless by design, and the content model it exposes is built around the post and page paradigm, not around the domain entities an AI system needs to retrieve. A product description that lives inside a WordPress post body, surrounded by HTML formatting and editorial shortcodes, is not machine-readable in any practically useful sense. An agent that needs to retrieve all products in category X with pricing in locale Y and availability status Z cannot do so from a monolithic CMS without substantial engineering to normalise the data first.

That normalisation engineering is the first component of the AI tax, and it is not a one-time cost. It recurs every time the content changes, every time the CMS is updated, and every time a new AI use case requires a different field or relationship.
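For contrast, once content is already held as named, typed fields, the category, locale, and availability query described above reduces to a plain filter. The sketch below assumes an in-memory list of records; `Product`, `find_products`, and the sample catalogue are hypothetical and stand in for whatever a structured delivery API would return.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    product_id: str
    category: str
    locale: str
    availability: str
    price: Decimal


def find_products(products, *, category, locale, availability):
    """Return the products whose three named fields all match exactly."""
    return [
        p for p in products
        if p.category == category
        and p.locale == locale
        and p.availability == availability
    ]


# Illustrative data only.
catalogue = [
    Product("prod-001", "lighting", "en-GB", "in_stock", Decimal("19.99")),
    Product("prod-001", "lighting", "de-DE", "in_stock", Decimal("22.50")),
    Product("prod-002", "lighting", "en-GB", "out_of_stock", Decimal("34.00")),
    Product("prod-003", "furniture", "en-GB", "in_stock", Decimal("120.00")),
]

matches = find_products(
    catalogue, category="lighting", locale="en-GB", availability="in_stock"
)
```

None of this is possible while the same facts sit inside a post body as formatted HTML, which is why the normalisation work has to come first.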

Incompatibility 2: unstructured and inconsistent content

A monolithic CMS allows authors to create content in ways that are not structurally consistent across items of the same type. A product description might be 80 words in one record and 800 words in another. A date field might be stored as a formatted string in some records and a Unix timestamp in others. A category relationship might be represented as a tag in some records and a taxonomy term in others.

This inconsistency is not a failure of authoring discipline. It is a consequence of systems that prioritise editorial flexibility over structural consistency, and agents have no tolerance for it.
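The date example shows what the bridging code tends to look like. The following is a minimal sketch that coerces either a Unix timestamp or a formatted string into one timezone-aware value. The list of string formats is an assumption made for illustration, and the sketch also assumes day-first ordering for slash-separated dates and UTC for strings that carry no timezone.

```python
from datetime import datetime, timezone

# Hypothetical formats; a real pipeline needs the list found in its own data.
KNOWN_STRING_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalise_date(raw) -> datetime:
    """Coerce a legacy date value into a timezone-aware UTC datetime."""
    # Unix timestamps may arrive as numbers or as digit-only strings.
    if isinstance(raw, (int, float)):
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    text = str(raw).strip()
    if text.isdigit():
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Otherwise try each known string format in turn.
    for fmt in KNOWN_STRING_FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    # Refuse to guess: an unparseable date is rejected, not passed through.
    raise ValueError(f"Unrecognised date value: {raw!r}")
```

Every new format that appears in the legacy data means another entry in that list, which is one small way the cost recurs.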

An AI system that retrieves product descriptions to generate a comparison, a recommendation, or a personalised pitch needs consistent field lengths, consistent data types, and consistent relationship structures to produce reliable output. Fed inconsistent content from a legacy CMS, the system produces inconsistent output. The 96% of organisations experiencing data barriers for AI use cases are not encountering them because their data is missing. They encounter them because their data exists in forms that AI systems cannot consume reliably.

The engineering cost of cleaning, normalising, and continuously maintaining structured content derived from an unstructured legacy CMS is the second component of the AI tax. It scales with content volume and the number of AI use cases, making it a compounding cost rather than a fixed one.

Incompatibility 3: no governance layer

Agentic AI systems take actions based on the content they retrieve. A support agent applying a refund policy needs the version it retrieves to be current, scoped to the right market, and approved for publication. Without governance signals in the API, the agent treats every version it finds as authoritative: a three-year-old policy, a locale variant for the wrong region, a draft an editor forgot to unpublish.

A monolithic CMS typically provides no governance layer for machine consumers. To act reliably on retrieved content, an agent needs stable identifiers that point to the authoritative current version of a document, a versioning API that confirms the content is up to date, and a queryable approval status that confirms the content has been through the required review before it is consumed in an automated workflow. These are the signals that separate content an agent can trust from content that is outdated, misdirected, or still in draft.
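To make those signals concrete, here is a minimal sketch of how an agent-side check might use them, assuming the content API returned every version of an item along with its status and locale. `ContentVersion`, `select_trusted`, and the status values are hypothetical; they are not taken from any particular CMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentVersion:
    content_id: str      # stable identifier shared by every version
    version: int
    status: str          # assumed values: "draft", "approved", "archived"
    locale: str
    is_published: bool


def select_trusted(versions, content_id, locale):
    """Return the newest approved, published version for a locale, or None."""
    candidates = [
        v for v in versions
        if v.content_id == content_id
        and v.locale == locale
        and v.status == "approved"
        and v.is_published
    ]
    return max(candidates, key=lambda v: v.version, default=None)


# Illustrative data: an old policy, the current one, a draft, another market.
refund_policy = [
    ContentVersion("refund-policy", 1, "approved", "en-GB", True),
    ContentVersion("refund-policy", 2, "approved", "en-GB", True),
    ContentVersion("refund-policy", 3, "draft", "en-GB", False),
    ContentVersion("refund-policy", 2, "approved", "en-US", True),
]

current = select_trusted(refund_policy, "refund-policy", "en-GB")
missing = select_trusted(refund_policy, "refund-policy", "fr-FR")
```

Returning `None` when nothing qualifies is a deliberate choice in this sketch: the agent gets an explicit "no trusted content" result and can escalate instead of acting on a draft or the wrong market's policy.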

When an agent acts on outdated or unauthorised content, the model has usually done exactly what it was supposed to do. The failure sits upstream, in the content layer. Monolithic CMS platforms built for human readers expose no signal that an agent can use to assess whether the content it retrieved is current, approved, and contextually appropriate. The agent acts in good faith on content it has no way to evaluate.

A significant fraction of AI failures in production are content governance failures, not model errors: the agent retrieved real content that was outdated or out of context, and acted on it accurately.

What the AI tax looks like in practice

The AI tax is the accumulated engineering cost of bridging the gap between what the legacy CMS provides and what agentic AI requires.

Data normalisation and ETL

Extracting content from a monolithic CMS into a structured format that agents can consume requires building and maintaining an ETL (extract, transform, load) pipeline. This pipeline normalises field types, resolves relationship inconsistencies, and produces a structured data layer on top of the unstructured legacy content. It must be maintained continuously as content changes.
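A stripped-down version of the transform step might look like the sketch below. It assumes legacy rows arrive as dictionaries with an HTML `body` and a category stored under either `tags` or `taxonomy_term`; those key names, and the functions `strip_html`, `transform`, and `run_pipeline`, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Drop tags and collapse whitespace (crude: shortcodes are left as text)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def transform(row: dict) -> dict:
    """Map one legacy row onto a single, consistent shape."""
    # The category may live under either key, depending on how it was authored.
    category = row.get("taxonomy_term") or (row.get("tags") or [None])[0]
    return {
        "id": str(row["id"]),
        "title": row["title"].strip(),
        "description": strip_html(row["body"]),
        "category": category.lower() if category else None,
    }


def run_pipeline(extracted_rows):
    """Transform every row; collect failures instead of silently dropping them."""
    loaded, rejected = [], []
    for row in extracted_rows:
        try:
            loaded.append(transform(row))
        except (KeyError, AttributeError) as exc:
            rejected.append((row.get("id"), repr(exc)))
    return loaded, rejected


legacy_rows = [
    {"id": 101, "title": " Desk lamp ",
     "body": "<p>Adjustable <b>LED</b> lamp.</p>", "tags": ["Lighting"]},
    {"id": 102, "title": "Floor lamp",
     "body": "<div>Tall lamp</div>", "taxonomy_term": "lighting"},
    {"id": 103, "body": "<p>No title</p>"},
]
loaded, rejected = run_pipeline(legacy_rows)
```

Each branch in `transform` encodes a fact about how the legacy system happens to store content today, so a CMS update or a new editorial habit can invalidate it without warning.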

API development

If the legacy CMS does not expose a reliable, machine-readable API, one must be built. This is a development project, not a configuration task, and the resulting API layer sits between the CMS and every AI system that needs to consume content from it.

Governance instrumentation

Adding versioning, authorisation status, content provenance, and freshness indicators to a legacy system requires either significant custom development or ongoing manual processes that audit and annotate content before it is made available to AI systems.
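As one small example of such instrumentation, a freshness indicator can be derived from a last-reviewed date, provided the legacy system records one. The sketch below assumes it does; the `last_reviewed` key, the `annotate_freshness` function, and the 90-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def annotate_freshness(item: dict, max_age: timedelta, now: datetime) -> dict:
    """Return a copy of the item with its age in days and an `is_fresh` flag."""
    age = now - item["last_reviewed"]
    return {**item, "age_days": age.days, "is_fresh": age <= max_age}


# Illustrative data only.
policy = {
    "id": "refund-policy",
    "last_reviewed": datetime(2026, 1, 1, tzinfo=timezone.utc),
}

annotated = annotate_freshness(
    policy,
    max_age=timedelta(days=90),
    now=datetime(2026, 1, 31, tzinfo=timezone.utc),
)
```

Passing `now` in explicitly keeps the check reproducible in audits and tests. Where no review date exists at all, the manual annotation process described above is the only source for it.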

Ongoing maintenance

Every CMS update, content migration, or new editorial workflow that changes how content is structured in the legacy system requires corresponding changes to the normalisation pipeline, the API layer, and the governance instrumentation. This overhead grows with the number of AI use cases and does not plateau.

Gartner projects that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs and unclear business value. For organisations running content on legacy CMSes, the infrastructure gap is often the project’s primary constraint, and its cost is paid either explicitly (as planned investment in the normalisation and API layer) or implicitly (as failed AI pilots that produced unreliable output and were never put into production).

What an AI-ready content layer actually looks like

An AI-ready content layer is a set of properties that any content platform either has or does not have.

Structured fields. Product descriptions, policy documents, and support articles are each stored as typed fields with defined value constraints, not as free-form editorial content that requires parsing to extract meaning.
Stable identifiers. Every content item has a stable, unique identifier that an agent can use to retrieve it specifically, version it, and audit when it was last updated.
A consistent delivery API. The platform exposes a REST or GraphQL API that returns typed content in a predictable format. Every agent that queries for products gets the same structure. Every agent that queries for policies gets the same structure.
Governance signals. The API exposes publication status, approval status, locale scope, and version history for every content item, so an agent can query not just for the content but for its trustworthiness signals.
Defined content ownership. Every content type has a defined owner accountable for its accuracy. The platform makes that ownership visible.

Modern headless CMSes, including Payload, Contentful, and Sanity, have these properties by design. They were built for API-first delivery because their designers anticipated that content consumers would not always be browsers. The AI readiness was a consequence of that architectural decision.

The organisations paying the AI tax on their legacy CMSes made the right decisions for the content requirements of their time. API-first delivery, structured data models, and machine-readable governance were not priorities when those platforms were built. The cost accumulates because each new AI use case hits the same structural limits and requires the same remediation work before it can be served.

FAQ

What is a monolithic CMS and why is it incompatible with agentic AI?

A monolithic CMS is a content management system where the authoring interface, the content storage, and the rendering engine are tightly coupled in a single application. WordPress, Drupal, and legacy custom CMS platforms are typical examples. The incompatibility with agentic AI is structural: monolithic CMSes were designed to produce HTML pages for human readers, not structured data responses for machine consumers. AI agents need typed fields, stable identifiers, consistent relationships, and governance signals, none of which monolithic CMSes expose reliably through machine-readable APIs.

What is the AI tax on legacy stacks?

The AI tax is the accumulated engineering cost of bridging the gap between what a legacy content platform provides and what agentic AI systems require. It includes data normalisation and ETL pipeline development, custom API development, governance instrumentation, and ongoing maintenance of all three. Unlike a one-time migration cost, the AI tax is recurring: it grows with the number of AI use cases and compounds with each change to the legacy CMS that requires corresponding changes to the bridging layer.

What is agentic AI and why does it require structured content?

Agentic AI refers to AI systems that pursue goals autonomously: planning, calling tools and APIs, coordinating with other agents, and taking actions rather than only generating text responses. Unlike a human user who can interpret visual layouts and implicit editorial structure, an agent retrieves content programmatically through APIs and requires consistent, typed data with stable identifiers and governance signals. The 1,445% surge in multi-agent system inquiries reported by Gartner reflects a shift from AI assistants that help humans to AI agents that act on their behalf, a shift that makes content structure a functional requirement.

How does a legacy CMS create hallucinations in AI systems?

When an AI agent retrieves content from a legacy CMS and that content is outdated, inconsistently structured, or lacks governance signals indicating whether it is authoritative, the agent acts on what it retrieved, accurately. From the agent's perspective, it followed instructions correctly. From the user's perspective, the output was wrong. These failures are often attributed to model hallucination, but they are frequently content governance failures: the model retrieved real content that was wrong and acted on it faithfully. Legacy CMSes that lack versioning APIs, approval status exposure, and locale-specific content isolation are the primary source of these failures.

What does an AI-ready content platform look like?

An AI-ready content platform stores content in structured fields with defined value types, exposes content through a consistent machine-readable API, provides stable identifiers and version history for every content item, exposes governance signals including approval status and publication scope, and has defined content ownership at the type level. Modern headless CMSes, including Payload, Contentful, and Sanity, have these properties by design. Legacy monolithic CMSes do not, and building them retrospectively is the AI tax.

How quickly is the cost of not migrating compounding?

Gartner projects that 40% of enterprise applications will embed AI agents by end of 2026. Each AI use case that cannot be served without bridging legacy content infrastructure adds to the tax. The overhead grows because each new use case requires extending the normalisation pipeline, the API layer, and the governance instrumentation, and the maintenance cost of all three scales with the number of use cases. Organisations that delay content platform modernisation are accumulating a cost that grows with each month the agentic AI adoption curve steepens.

Author

Paul Utr

Co-founder & Co-CEO

Paul has been launching online platforms since his teens, picking up UX and product design by building them. He led the Mailgun redesign at Netguru and was Principal Designer at Ramp Network through its seed-to-Series-B run. At WAYF he leads design and organisational alignment, and watches how language carries through every product we ship.
