Search is changing quickly. Traditional search engines like Google and Bing still crawl websites the same way they have for years, but AI assistants and autonomous agents now do their own crawling, indexing, and content extraction.
Tools such as ChatGPT, Claude, and Perplexity AI increasingly rely on structured signals that help them understand whether they can access your content and how they should use it.
That’s where robots.txt and llms.txt come in.
For marketing teams, these files are becoming essential for ensuring your website is visible, accessible, and correctly interpreted by AI systems.
PerfLeaf now checks both files during a crawl to help identify issues that could impact AI visibility.
Marketing teams already optimise for search engines. But the next wave of discovery is happening inside AI interfaces.
Instead of searching Google and clicking links, users increasingly put their questions directly to AI tools.
AI systems then retrieve, summarise, and reference web content to answer those queries.
If those systems cannot access or interpret your site, they may ignore your content entirely.
robots.txt is a long-standing web standard that tells crawlers which parts of your website they are allowed to access.
It lives at:
https://example.com/robots.txt
Search engines read this file before crawling a site.
Example:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
This tells all crawlers to stay out of /admin/ and /private/, and allows everything else.
Marketing teams often use robots.txt to:
However, well-behaved AI crawlers such as OpenAI's GPTBot also respect robots.txt.
If your site blocks unknown user agents, you might unintentionally block AI systems as well.
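One way to catch this early is to test specific URLs against your rules before a crawler does. The following is a minimal sketch using Python's standard urllib.robotparser with the example rules shown earlier. The GPTBot user-agent token, the example.com URLs, and the can_crawl helper are illustrative assumptions, and Python's parser may resolve overlapping Allow and Disallow rules differently from a particular crawler, so treat the result as a first check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Example rules from this article; swap in your own robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "GPTBot" is used here as an assumed AI crawler token.
    for path in ("/blog/launch-post", "/admin/settings"):
        url = f"https://example.com{path}"
        print(path, can_crawl(ROBOTS_TXT, "GPTBot", url))
```

Running the same check against the pages you most want AI systems to find makes an accidental block visible straight away.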
llms.txt is an emerging convention designed specifically for AI systems.
It provides instructions for large language models about:
The file typically lives at:
https://example.com/llms.txt
Example structure:
# LLM usage guidelines
Allow: /blog/
Allow: /guides/
Disallow: /account/
Disallow: /checkout/
Policy: Content may be quoted and summarised with attribution.
Although llms.txt is not yet a formal standard and support varies between AI platforms, publishing one gives those platforms a clear signal of your AI content policy.
For marketing teams, this provides a way to:
AI assistants are becoming a new traffic source.
Users increasingly ask tools like ChatGPT for:
If your content cannot be crawled or interpreted, it may never appear in those responses.
Many AI systems now cite sources.
When your content is crawlable and structured properly:
Blocking crawlers accidentally removes this opportunity.
Not all content should be available to AI systems.
Marketing teams may want to protect:
Both robots.txt and llms.txt provide ways to control access.
PerfLeaf scans for both files and highlights problems that could affect AI crawlability.
Typical issues include:
Without this file:
Example:
User-agent: *
Disallow: /
This blocks all crawlers, including AI agents.
Many teams accidentally carry this rule over from a staging environment.
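A leftover blanket block is easy to test for. The sketch below, again using Python's standard urllib.robotparser, lists which crawlers are locked out of the site root. The agent tokens in AI_AGENTS and the blocked_agents helper are assumptions chosen for illustration, not a complete or authoritative list of AI crawlers.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for a few AI crawlers; extend as needed.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_agents(robots_txt: str, agents=AI_AGENTS) -> list[str]:
    """List the agents that are not allowed to fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]


if __name__ == "__main__":
    # The staging rule from the example above blocks every agent.
    staging_leftover = "User-agent: *\nDisallow: /\n"
    print(blocked_agents(staging_leftover))
```

An empty list means none of the listed agents is blocked at the root. Any name that appears is worth investigating before launch.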
If the file is missing:
Sometimes robots.txt and llms.txt send different signals.
For example:
This creates ambiguity for AI crawlers.
PerfLeaf flags these conflicts.
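To illustrate what such a conflict looks like in practice, here is a rough sketch that compares the two files. It is not PerfLeaf's implementation. It assumes llms.txt uses the Allow and Disallow directive layout shown in this article, which is not standardized, and the parse_llms_rules and find_conflicts helpers are hypothetical names.

```python
from urllib.robotparser import RobotFileParser


def parse_llms_rules(llms_txt: str) -> dict[str, list[str]]:
    """Collect Allow/Disallow paths from the directive layout used in this article."""
    rules = {"allow": [], "disallow": []}
    for line in llms_txt.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without a "key: value" shape.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in rules:  # other keys, such as Policy, are ignored here
            rules[key].append(value.strip())
    return rules


def find_conflicts(robots_txt: str, llms_txt: str, user_agent: str = "*") -> list[str]:
    """Return paths that llms.txt allows but robots.txt blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parse_llms_rules(llms_txt)["allow"]
    return [path for path in allowed if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /blog/\n"
    llms = "# AI usage policy\nAllow: /blog/\nAllow: /guides/\nDisallow: /account/\n"
    print(find_conflicts(robots, llms))
```

The sketch only reports paths that llms.txt opens up while robots.txt closes them, since that is the case where an AI crawler cannot reach content you meant to share.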
Marketing teams should aim for the following.
Ensure blog posts, landing pages, and guides are crawlable.
Example:
User-agent: *
Allow: /
Disallow: /account/
Disallow: /admin/
Provide clear instructions for AI usage.
Example:
# AI usage policy
Allow: /blog/
Allow: /resources/
Disallow: /checkout/
Disallow: /dashboard/
Policy: Content may be quoted and summarised with attribution.
Ensure both files allow the same public areas of the site.
As your site grows, review crawl rules to avoid accidentally blocking valuable content.
PerfLeaf automatically checks for:
This helps marketing teams ensure their website is ready for both traditional search engines and AI discovery platforms.
AI systems are rapidly becoming a primary interface to the web.
Sites that are accessible to AI crawlers and clear about how their content may be used will have a significant advantage.
By monitoring robots.txt and llms.txt, PerfLeaf helps ensure your content remains visible, discoverable, and usable in the AI-driven web.