In this edition of Office Hours, we tackle one of the most important questions agencies are asking in an age of AI-powered discovery: How do I make sure that large language models (LLMs) like Claude, ChatGPT, and other AI content agents can actually access and “crawl” my website when it’s behind Cloudflare or hosted through hosting.com’s MWP platform powered by Rocket.net?
This question came up during a recent hosting.com Agency Success Office Hours session with Nathan Ingram, where a participant was blocked from testing URL redirects with Claude because it couldn’t crawl the site. Nathan walked through troubleshooting steps and shared practical insights. In this blog, we build on that with deeper context, clearer workflows, and additional background on how Cloudflare and AI crawlers actually behave.
Why AI crawler access matters
Traditionally, the concept of crawling was tied to search engines like Google and Bing. These engines send bots to your site that read the HTML content and metadata (like titles and descriptions) and index it so users can discover it in search results.
But today, AI systems also rely on this crawling process - not just to index pages for search, but to ingest content to generate answers, summaries, or training data. Tools like Claude, ChatGPT, Perplexity, and other AI assistants increasingly answer questions by pulling from web content, either in real time or from previously indexed material. If these systems can’t access a page, they can’t reference it or quote it in their responses, which means your content vanishes from an increasingly prominent set of discovery channels.
This creates three major practical shifts for agencies and content owners:
AI Search Visibility - AI answers, summaries, and references are becoming part of how people discover brands, services, and expertise. Being excluded from that layer limits your reach even if your SEO is strong. AI tools like Claude or ChatGPT can drive awareness and referral traffic, but only if they can read your content.
Brand Authority - When AI tools cite or reference your domain, it reinforces credibility. When they cannot access your content, your competitors may fill that gap instead. Being cited directly by intelligent assistants boosts trust and domain visibility.
Future Monetization - Cloudflare and other platforms are experimenting with structured models where AI access to content can be monitored or monetized. Whether you opt in or out of that conversation depends on how you configure your site today.
The practical reality is this: AI visibility is becoming a new form of discoverability. You need to decide whether you want in or out.
Cloudflare’s shift in AI crawler controls
Cloudflare is one of the most widely used edge networks and security layers on the web. Historically, websites were “crawlable” by default. Cloudflare primarily focused on blocking malicious bots, scrapers, or abusive traffic, rather than AI specifically. But for content owners, AI crawling has grown into a challenge - chatbots and generative AI systems scrape huge volumes of data without sending meaningful traffic back to the original content source.
Over the last couple of years, that posture has changed.
Cloudflare now treats AI crawlers as a distinct category of traffic. Instead of assuming they are welcome, Cloudflare gives site owners explicit control over whether AI systems can access their content.
AI crawl control
Cloudflare now provides tools under AI Crawl Control that let you explicitly allow or block specific AI crawlers. With these, you can:
See AI crawler activity and how frequently bots request content
Choose to allow or block each AI crawler at the network edge
Optionally enable monetization through pay-per-crawl (currently in beta)
This approach means your site no longer assumes AI bots are welcome by default. You make the choice.
Default blocking on new sites
In a major shift in 2025, Cloudflare began blocking AI crawlers by default on new domains unless you actively enable access. This flips an old assumption. Previously, if you did nothing, bots could usually crawl. Now, if you do nothing, some AI tools may be blocked before they even reach your server; instead of AI accessibility being automatic, it’s opt-in.
That makes configuration essential; if you want LLMs and AI tools to read your content, you need to actively permit them.
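If you’re not sure whether your domain is affected, a quick spot-check is to request a page while declaring an AI crawler’s user agent and compare it to a normal browser request. Treat this as a rough signal only - Cloudflare also verifies real crawlers by IP range and other signals, so a spoofed user agent won’t behave exactly like the real bot - but a 403 or challenge response for the bot user agent suggests the edge is blocking AI crawlers. In the sketch below, example.com is a placeholder for your own domain.

```bash
# Compare how the edge responds to a browser vs. a declared AI crawler.
# GPTBot is OpenAI's crawler user agent; ClaudeBot is Anthropic's.
curl -s -o /dev/null -w "Browser UA:   %{http_code}\n" \
  -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" https://example.com/

curl -s -o /dev/null -w "GPTBot UA:    %{http_code}\n" \
  -A "GPTBot" https://example.com/

curl -s -o /dev/null -w "ClaudeBot UA: %{http_code}\n" \
  -A "ClaudeBot" https://example.com/
```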
Step-by-step: Allowing LLM crawling on Cloudflare
Now that we’ve established how vital crawler access is, here’s a practical checklist to make sure your site is crawlable by AI tools like Claude.
1. Confirm your Cloudflare AI Crawler settings
Log in to the Cloudflare Dashboard and go to your domain’s settings.
Look for:
AI Crawl Control - This is where you can explicitly allow specific crawlers by name or category.
Bot traffic settings - Cloudflare has toggles like “Block AI bots” that may be active by default. For AI visibility, you may want this turned off so that AI crawlers aren’t blocked unintentionally.
Because Cloudflare can block AI crawlers at the network layer, your first task is to ensure these aren’t set to “block”.
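If you prefer the command line, the same settings can be inspected through Cloudflare’s v4 API. This is a rough sketch that assumes your plan exposes the bot-management settings endpoint; ZONE_ID and API_TOKEN are placeholders, and the exact fields returned (including any AI-bot flag) vary by plan, so read the JSON rather than relying on a specific field name.

```bash
# Read the zone's bot-management configuration and pretty-print the JSON.
# Check the output for any AI-bot blocking flag and confirm it isn't set
# to block before expecting LLMs to crawl the site.
curl -s "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/bot_management" \
  -H "Authorization: Bearer ${API_TOKEN}" | python3 -m json.tool
```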
2. Review your robots.txt file
A robots.txt file at the root of your site communicates crawl directives to bots. Traditionally, it’s been used for search engine crawlers, but modern AI bots generally respect this file too.
To allow AI crawlers, make sure:
The sections you want indexed do not have disallow rules for known AI user agents.
You have a valid sitemap reference, so crawlers know where your main pages are.
Robots.txt might look like this to allow general crawling:
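The snippet below is an illustrative sketch: the AI user agent tokens and the sitemap URL are examples, so check each vendor’s documentation for its current crawler names and substitute your own domain.

```
# Allow all crawlers (including AI bots) to access the whole site
User-agent: *
Allow: /

# Optionally call out specific AI crawlers by name
# (GPTBot = OpenAI, ClaudeBot = Anthropic, PerplexityBot = Perplexity)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Point crawlers at your sitemap (replace with your own URL)
Sitemap: https://www.example.com/sitemap.xml
```

After updating the file, confirm it’s reachable at yourdomain.com/robots.txt and that Cloudflare isn’t challenging requests for it.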