How your CMS implementation affects AI search visibility
The technical decisions that determine whether AI platforms cite you or your competitors.
Andy Blyth, Technical Architect, 16 April 2026

When a prospective client asks ChatGPT or Perplexity to recommend an implementation partner, the answer depends on what those systems can extract from your website. The technical decisions buried in your CMS architecture (how pages render, how content is structured, which crawlers have access) now directly determine whether you appear in those recommendations or your competitors do.
This matters commercially, not just technically. Vercel’s data shows AI crawler traffic to websites grew tenfold between late 2023 and mid-2024. On some sites, AI crawlers now generate more requests than Googlebot. As more B2B buyers start their research in AI platforms rather than Google, the agencies whose content AI can read and cite will capture an increasing share of inbound enquiries. The agencies whose content it can’t will lose ground without understanding why.
Research from Princeton quantifies the advantage. Pages that cite authoritative sources are 132% more likely to be retrieved by AI systems. Pages that include specific statistics are 65.5% more likely. These are large, compounding advantages, and they depend on how your CMS is built, not just what your content team writes.
This article translates the technical requirements into decisions marketing leaders can act on. For the full implementation detail with code examples across Optimizely, Contentful and headless architectures, Andy has published a companion technical guide at technicaldogsbody.com.
The rendering problem most marketing teams don’t know they have
Before any conversation about content structure, schema markup or AI optimisation, one question determines whether anything else matters: can AI crawlers actually read your website?
Most AI crawlers, including those operated by OpenAI, Anthropic and Perplexity, do not run JavaScript. They request a URL, read the HTML that comes back, and move on. If your website relies on client-side JavaScript to load its content (common in React single-page applications and some headless CMS setups), those crawlers receive an empty shell. Your content, your case studies, your partnership credentials: none of it exists as far as the AI is concerned.
Google does execute JavaScript, so this gap may never have surfaced in your SEO reporting. But Google is the exception. Every other major AI platform reads only the server-rendered response. If your development team built the front end as a client-side application, you have been invisible to the fastest-growing category of search traffic without anyone noticing.
The fix is server-side rendering: ensuring the HTML your server sends contains the full page content before any JavaScript runs. Frameworks like Next.js (with SSR or static generation), Astro and server-rendered Razor views all support this. A React SPA that fetches content after the page loads does not.
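A quick way to test this is to check whether a page's raw HTML, before any JavaScript runs, contains the content you expect. The sketch below shows the logic of that check as a self-contained function; the sample pages and marker phrase are illustrative, not taken from any real site:

```typescript
// Minimal check: does the raw HTML (what a non-JS crawler sees) contain
// the page's actual content? Strips <script>/<style> blocks first so
// content that only exists inside JavaScript doesn't count.
function isVisibleToAICrawlers(rawHtml: string, markerPhrase: string): boolean {
  const withoutScripts = rawHtml
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "");
  return withoutScripts.toLowerCase().includes(markerPhrase.toLowerCase());
}

// A client-side SPA shell: the content only exists after JS executes.
const spaShell = `<html><body><div id="root"></div>
  <script>window.__DATA__ = "A typical implementation takes 12 weeks";</script>
</body></html>`;

// A server-rendered page: the content is in the HTML itself.
const ssrPage = `<html><body>
  <h1>How long does a typical Optimizely implementation take?</h1>
  <p>A typical implementation takes 12 weeks.</p>
</body></html>`;

console.log(isVisibleToAICrawlers(spaShell, "12 weeks")); // false
console.log(isVisibleToAICrawlers(ssrPage, "12 weeks"));  // true
```

The same test works manually: view the page source (not the browser inspector) and search for a sentence you know appears on the page.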
Your content reaches AI two ways. Only one of them is quick.
Understanding how AI retrieval works changes what you prioritise. There is a meaningful difference between the two ways AI systems answer questions, and each requires a different type of investment.
Training data: the long game
For everyday queries where current information isn’t essential, AI systems answer from patterns absorbed during model training. Influencing this takes months: your content has to be published, crawled and incorporated before it shapes responses. Consistent, authoritative content over time is what builds this presence. There are no shortcuts, and the payoff is typically nine to twelve months away.
Live retrieval: where CMS architecture pays off now
For time-sensitive or specific queries, AI systems run live web searches and synthesise from multiple retrieved sources. ChatGPT triggered this mode roughly a third of the time as of late 2025, running multiple search queries per prompt and cross-referencing results for accuracy. This is where your content model, schema markup and page structure make an immediate difference. If your content is well-structured and extractable, it gets cited. If it’s locked behind client-side rendering or buried in unstructured prose, the AI passes over it in favour of a competitor whose site is easier to read.
Three content patterns AI actually cites
Research from Growth Memo found that AI systems draw disproportionately from the opening portion of a page: the top third accounts for nearly half of all citations. A vague introductory paragraph that says nothing specific wastes the most valuable space on the page.
Three patterns consistently improve whether AI cites your content.
Visible summaries near the top of the page
Not a meta description. A visible block of content (a key takeaways section or a confident opening summary) that delivers your core message in specific, factual terms. If your CMS doesn’t have a dedicated summary field for editors to populate, this content either gets skipped or buried below the fold. The content model needs to make it easy to do the right thing.
Genuine FAQ sections
Structured question-and-answer pairs are easier for AI to extract and cite than the same information woven through narrative prose. They’re effective on partnership pages, service pages and sector landing pages, anywhere buyers have specific questions. The questions have to be real, though. Generic padding (“What is digital transformation?”) adds nothing. Questions drawn from actual RFP conversations, sales calls and review platforms are what earn citations.
Headings phrased as questions, answered immediately
When a section heading reads “How long does a typical Optimizely implementation take?” and the first sentence gives a direct answer with a specific timeframe, AI can lift that cleanly and attribute it. It’s the same pattern users follow when they ask an AI assistant a question. Pages structured this way give the AI exactly what it needs.
Schema that drifts is schema that loses
Structured data tells AI systems what your content means, not just what it says. A JSON-LD block that declares your organisation’s name, your partnership tier, your team’s credentials and your FAQ content in machine-readable format gives AI a reliable extraction layer on top of the page content.
The problem is that most CMS implementations treat this as a one-off task. A developer adds schema to a template during the initial build, and it never changes again even as the content underneath evolves. FAQ answers get updated but the FAQPage schema still reflects the original version. A new team member earns an MVP certification but the Person schema doesn’t know. The schema drifts out of sync with reality, and AI systems either get stale information or skip the page in favour of a competitor whose structured data matches their content.
The solution is schema generated automatically from your content model. When an editor updates an FAQ, the FAQPage schema updates. When an article is republished, the dateModified field reflects the real date. All of it output in the server-rendered HTML, because schema injected by JavaScript after page load is invisible to most AI crawlers.
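To make that concrete, the sketch below generates FAQPage JSON-LD directly from CMS content at render time, so the schema can never drift from what editors publish. The field names (`question`, `answer`) are illustrative assumptions, not a specific CMS's API:

```typescript
// Sketch: FAQPage JSON-LD built from the content model on every render.
// Because the schema is derived from the same data editors update, it
// stays in sync with the visible FAQ content automatically.
interface FaqItem {
  question: string;
  answer: string;
}

function buildFaqSchema(items: FaqItem[]): string {
  const schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
  // Emitted into the server-rendered HTML, so it is present before any
  // JavaScript runs and visible to non-JS crawlers.
  return `<script type="application/ld+json">${JSON.stringify(schema)}</script>`;
}

const faqs: FaqItem[] = [
  {
    question: "How long does a typical Optimizely implementation take?",
    answer: "Most builds run 12 to 16 weeks depending on integrations.",
  },
];
console.log(buildFaqSchema(faqs));
```

The same pattern applies to Organization, Person and Article schema: generate from live data, never hard-code into templates.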
Stale pages get skipped. Here’s what “fresh” actually means.
AI systems score content on recency before they read it. SE Ranking’s research found that recently updated pages average significantly more AI citations than stale ones, and academic studies have shown recency bias can shift a page’s effective ranking by up to 95 positions (arxiv.org).
For marketing leaders, this means two things. Substantive content refreshes on a 90-day cycle are a competitive requirement, not a nice-to-have. And the technical implementation has to support it: accurate last-modified dates in your sitemap, and dateModified properties in your Article schema that reflect genuine updates rather than cosmetic timestamp changes. AI systems are increasingly able to tell the difference.
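On the technical side, that means sitemap entries whose lastmod values come from the content item's real modification date, not the build or publish date. A minimal sketch, assuming the content model tracks a substantive-edit timestamp (the `lastSubstantiveEdit` field name is hypothetical):

```typescript
// Sketch: a sitemap <url> entry whose <lastmod> reflects the genuine
// last content update, in the W3C YYYY-MM-DD format the sitemap
// protocol expects.
interface PageRecord {
  url: string;
  lastSubstantiveEdit: Date;
}

function sitemapEntry(page: PageRecord): string {
  const lastmod = page.lastSubstantiveEdit.toISOString().slice(0, 10);
  return `<url><loc>${page.url}</loc><lastmod>${lastmod}</lastmod></url>`;
}

const page: PageRecord = {
  url: "https://example.com/services/optimizely",
  lastSubstantiveEdit: new Date("2026-03-02T09:30:00Z"),
};
console.log(sitemapEntry(page));
// → <url><loc>https://example.com/services/optimizely</loc><lastmod>2026-03-02</lastmod></url>
```

The key design choice is that cosmetic edits (fixing a typo, tweaking a CTA) should not bump this date; only substantive refreshes should.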
Which AI crawlers should you let in?
Most organisations have a robots.txt file that was written before the current generation of AI crawlers existed. It may be inadvertently blocking the systems your prospective clients use to research agencies.
The critical distinction that most teams miss: there are training crawlers (which absorb your content for future model training) and search crawlers (which retrieve your content in real time when a user asks a question). Blocking a training crawler is a strategic IP decision. Blocking a search crawler means your content disappears from that platform’s results immediately. These are very different consequences, and they need different decisions.
Here’s a quick reference for the bots that matter:
Bot name | What it does | What happens if you block it
GPTBot | Feeds OpenAI’s training data | Your content won’t shape future ChatGPT responses. Live search still works. |
OAI-SearchBot | Powers ChatGPT’s live search | ChatGPT cannot retrieve your pages when users ask questions in real time. |
ClaudeBot | Feeds Anthropic’s training data | Your content won’t shape future Claude responses. |
Claude-SearchBot | Powers Claude’s live search | Claude cannot retrieve your pages during live queries. |
PerplexityBot | Powers all Perplexity retrieval | Your content disappears from Perplexity entirely. |
Google-Extended | Feeds Google’s AI training | Your content won’t train Google’s AI models. Standard search unaffected. |
Googlebot | Powers Google Search + AI Overviews | Your site disappears from Google entirely. Never block this. |
This isn’t a decision for whoever last edited the robots.txt file. It’s a commercial and legal decision that should involve marketing, legal and leadership. MSQ DX recommends documenting a clear crawler access policy and reviewing it quarterly as the AI search market evolves.
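As an illustration of how that policy can translate into configuration, a robots.txt along these lines allows the live-search crawlers while opting out of training crawlers. This is one possible policy sketch, not a recommendation; the right answer depends on your own IP position:

```text
# Allow live-search crawlers: blocking these removes you from
# that platform's results immediately.
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Opt out of model training: a strategic IP decision, review quarterly.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Never block Googlebot.
User-agent: Googlebot
Allow: /
```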
Optimizely’s GEO tooling: what it gives marketing leaders
Optimizely shipped a suite of GEO tools directly into the CMS in early 2026, and they’re worth understanding as a marketing leader, not just as a developer feature.
GEO Analytics shows you how AI agents are crawling and interpreting your content. For the first time, you can see which pages AI systems are actually reading, how they’re parsing your content, and where they’re struggling to extract useful information. This is the visibility layer that has been missing from traditional analytics: GA4 tells you who visited; GEO Analytics tells you which AI systems visited and what they took away.
GEO Schema Agent audits your structured data and recommends improvements. GEO Metadata Agent does the same for your metadata, working at scale across hundreds of pages rather than one at a time. GEO Recommendations Agent flags the pages that need the most urgent attention based on their current AI readability score.
What does “good” look like in these tools? Pages where AI crawlers can extract clean, structured content that accurately represents your expertise. Pages where your schema markup matches your visible content. Pages where the GEO Recommendations Agent has nothing urgent to flag. When your highest-value pages reach that state, you’re in a strong position for AI citations.
One important caveat: these tools are a feedback loop, not a foundation. They can audit and recommend, but they can’t retroactively add FAQ fields, summary blocks or Person schema to a content model that was never designed for them. The CMS architecture work described in this article is what gives the GEO tooling something to work with. MSQ DX has been building Optimizely implementations for 18 years, and this is the most significant shift in what “good implementation” means since responsive design.
How do you know it’s working?
Once these changes are in place, marketing leaders need to track three things to demonstrate ROI to the business.
AI referral traffic. Configure a custom channel group in GA4 that captures visits from chatgpt.com, perplexity.ai, claude.ai, gemini.google.com and copilot.microsoft.com. This traffic is currently small (typically around 1% of total visits) but it converts at significantly higher rates than traditional organic search, because users arriving from AI recommendations already have high purchase intent.
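The grouping logic behind that custom channel is straightforward: classify a session by its referrer hostname. A minimal sketch of the same rule, using the platform hostnames listed above:

```typescript
// Sketch: classify a referrer URL as AI-platform traffic, mirroring
// the hostname rules a GA4 custom channel group would apply.
const AI_REFERRER_HOSTS = [
  "chatgpt.com",
  "perplexity.ai",
  "claude.ai",
  "gemini.google.com",
  "copilot.microsoft.com",
];

function isAiReferral(referrerUrl: string): boolean {
  try {
    const host = new URL(referrerUrl).hostname;
    // Match the host itself or any subdomain of it.
    return AI_REFERRER_HOSTS.some((h) => host === h || host.endsWith("." + h));
  } catch {
    return false; // Not a parseable URL, e.g. direct traffic.
  }
}

console.log(isAiReferral("https://chatgpt.com/c/abc123")); // true
console.log(isAiReferral("https://www.google.com/search")); // false
```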
AI citation share of voice. Run your priority prompts (“best Optimizely partners UK,” “Contentful implementation agency,” “CMS migration specialists”) across ChatGPT, Perplexity, Claude and Google AI Overviews monthly. Track where you appear, where competitors appear, and how that shifts over time. Tools like Semrush’s AI Visibility Toolkit, SE Ranking or Peec AI automate this, but even a manual monthly check gives you directional data.
Content extractability score. Optimizely’s GEO Analytics provides this directly. For other platforms, audit whether your top 20 pages by traffic have visible summaries, FAQ sections, correct schema markup, and accurate dateModified values. The percentage of pages meeting all four criteria is your extractability score.
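That four-criteria calculation can be sketched as a simple audit function; the field names are illustrative and would come from whatever checklist or spreadsheet the audit uses:

```typescript
// Sketch: the extractability score described above, computed as the
// percentage of audited pages meeting all four criteria.
interface PageAudit {
  url: string;
  hasVisibleSummary: boolean;
  hasFaqSection: boolean;
  hasCorrectSchema: boolean;
  hasAccurateDateModified: boolean;
}

function extractabilityScore(audits: PageAudit[]): number {
  if (audits.length === 0) return 0;
  const passing = audits.filter(
    (p) =>
      p.hasVisibleSummary &&
      p.hasFaqSection &&
      p.hasCorrectSchema &&
      p.hasAccurateDateModified
  ).length;
  return Math.round((passing / audits.length) * 100);
}

const audits: PageAudit[] = [
  { url: "/services/optimizely", hasVisibleSummary: true, hasFaqSection: true, hasCorrectSchema: true, hasAccurateDateModified: true },
  { url: "/about", hasVisibleSummary: true, hasFaqSection: false, hasCorrectSchema: true, hasAccurateDateModified: false },
];
console.log(extractabilityScore(audits)); // → 50
```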
Where to start
Not every site needs the same first step. This framework helps you identify the right starting point based on where your implementation stands today.
If your site... | Start here | Then move to |
Uses a JavaScript framework (React SPA, Vue SPA, Angular) without server-side rendering | Fix rendering first. Nothing else matters until AI crawlers can read your HTML. | Robots.txt audit, then schema generation from your content model. |
Is server-rendered but has no structured FAQ fields or visible summaries | Add FAQ and summary fields to your top 10 pages by traffic. Populate them. | Wire up automatic schema generation. Add dateModified to Article schema. |
Has structured content and schema, but hasn’t been updated in 6+ months | Substantive content refresh on your highest-value pages. Update sitemaps. | Set up a 90-day refresh cycle. Review AI crawler access in robots.txt. |
Has done the above but isn’t appearing in AI responses | Check robots.txt isn’t blocking search bots. Audit schema in raw page source. | Invest in third-party presence: partner directories, G2, Clutch, industry coverage. |
Five questions to take to your next development meeting
These are the specific questions marketing leaders should be asking their technical teams. Each one maps to a concrete, answerable check that your developers can confirm in a single sprint.
Does our website render its content server-side, or does the page content load via JavaScript after the initial HTML response?
If the answer is client-side rendering, this is the single highest-priority fix. Ask them to view the page source (not the browser inspector) and check whether the body content is present in the raw HTML. If it isn’t, AI crawlers see a blank page.
Which AI crawlers does our robots.txt currently allow or block, and was that a deliberate decision?
Most robots.txt files predate AI crawlers. Your developers can check in under five minutes. If nobody made an active decision about GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot, the current configuration is accidental.
Is our schema markup generated automatically from the CMS content model, or was it hard-coded into templates during the initial build?
If it was hard-coded, it almost certainly doesn’t reflect your current content. Ask whether the FAQPage schema updates when editors change FAQ answers, and whether dateModified in Article schema updates when articles are republished.
Do our page templates include dedicated fields for visible page summaries and FAQ sections that editors can populate without developer involvement?
If the content model doesn’t have these fields, editors can’t add structured content even if they want to. This is a content model change, not a content strategy change, and it needs development time.
Do our sitemap last-modified dates reflect when content was genuinely updated, or when the page was first created?
Inaccurate dates mean AI systems treat your recently refreshed content as stale. Your developers can check the sitemap XML and compare dates against known recent edits in under ten minutes.
Where this leaves you
Most of these changes are additive. You’re not rebuilding your website; you’re adding the structures AI systems need to find, understand and cite your content accurately. The earlier these are factored into your CMS implementation, the less retrofitting you’ll need later — and the sooner your content starts appearing in the AI-generated recommendations your competitors are already chasing.
MSQ DX helps organisations assess where their CMS implementation stands for AI search visibility and build the technical foundations that earn AI citations. If you’re planning a new build or need to retrofit an existing one, get in touch.
