Controlling AI crawl rate is the practice of setting limits on how many requests crawlers can make per unit time, balancing fresh indexing against server load. Most sites do not need to throttle AI crawlers; the few that do should be deliberate about when and how.
When throttling is appropriate
Most crawlers self-regulate to avoid harming the sites they crawl. Throttling becomes necessary when:
- The site is small or under-provisioned and crawler traffic affects user-facing performance.
- A specific crawler is misbehaving — fetching too aggressively, hitting expensive endpoints, or causing measurable latency for users.
- The site has expensive operations on certain URLs (database-heavy pages, paywalled gateways, search results) that should be crawled selectively.
For most well-provisioned sites, AI crawler traffic is a small fraction of overall traffic and doesn’t need throttling.
Mechanisms for controlling crawl rate
Crawl-delay in robots.txt for AI crawlers
```
User-agent: GPTBot
Crawl-delay: 5

User-agent: ClaudeBot
Crawl-delay: 5
```
`Crawl-delay: 5` requests at most one fetch every five seconds. Operator support varies:
- Bing honors `Crawl-delay` directives.
- OpenAI’s GPTBot documentation acknowledges crawl-delay support.
- Google ignores `Crawl-delay` and uses its own rate logic.
- Perplexity’s handling of crawl-delay isn’t strongly documented; observed behavior is reasonable request rates regardless.
Crawl-delay is the lightest-touch control. Use it as a first try.
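To see what a compliant crawler actually reads from those directives, here is a minimal sketch using Python's standard `urllib.robotparser`; the example.com URL is a placeholder, and the user-agent tokens are the ones from the robots.txt example above.

```python
# Sketch: how a well-behaved crawler could read Crawl-delay from robots.txt.
# The site URL is a placeholder; GPTBot/ClaudeBot match the example above.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetches and parses robots.txt

for agent in ("GPTBot", "ClaudeBot"):
    delay = parser.crawl_delay(agent)  # the directive's value, or None if unspecified
    print(f"{agent}: wait {delay or 0} seconds between requests")
```

Whether a given operator's crawler actually consults the value is what the list above describes; the directive itself is only advisory.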
Search Console settings (where available)
- Google lets verified site owners adjust crawl rate in Search Console (when Google judges crawling is causing site issues).
- Bing Webmaster Tools lets owners set a crawl-control schedule (heavier off-hours, lighter during peak times).
These settings apply to the search crawlers whose indexes indirectly ground AI answer engines.
WAF / CDN rate limiting
Per-user-agent or per-IP rate limits at the CDN or WAF layer:
```
# Cloudflare example: rule expression matching GPTBot traffic
(http.user_agent contains "GPTBot")
# Action: rate limit at 60 requests per minute
```
This is the most reliable enforcement, but it’s a sledgehammer. Crawlers above the limit get 429 responses, which they may or may not handle gracefully. Use it only when softer controls have failed.
Server-side rate limiting
Application-level limits per crawler:
```python
# Hypothetical application-level decorator; the actual API depends on the framework in use.
@rate_limit(per_user_agent={
    "GPTBot": "100/minute",
    "PerplexityBot": "60/minute",
    # …
})
def serve_page(request):
    ...
```
Same caveat: 429s can hurt indexing. Calibrate carefully.
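As a concrete illustration of the idea behind that decorator, here is a minimal sketch of a per-user-agent fixed-window limiter that answers with 429 when a crawler exceeds its budget. The limits mirror the pseudocode above; the function names are hypothetical, not a specific framework's API, and a real deployment would use a shared store rather than process memory.

```python
# Minimal per-user-agent fixed-window rate limiter (sketch, not a production design).
import time
from collections import defaultdict

LIMITS = {"GPTBot": 100, "PerplexityBot": 60}  # requests per minute
_windows = defaultdict(lambda: [0.0, 0])       # user agent -> [window start, count]

def allow_request(user_agent: str) -> bool:
    """Return True if the request is within the crawler's per-minute budget."""
    limit = next((n for bot, n in LIMITS.items() if bot in user_agent), None)
    if limit is None:
        return True  # not a throttled crawler
    window = _windows[user_agent]
    now = time.time()
    if now - window[0] >= 60:          # start a new one-minute window
        window[0], window[1] = now, 0
    window[1] += 1
    return window[1] <= limit

def handle(user_agent: str) -> tuple[int, dict]:
    """Hypothetical handler: return (status, headers) for a request."""
    if not allow_request(user_agent):
        # 429 (not 503) tells the crawler to slow down, not that the site is down.
        return 429, {"Retry-After": "60"}
    return 200, {}
```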
What not to do
- Block crawlers entirely as a rate-control measure. A 403 stops the crawler from fetching at all rather than throttling it. Use targeted WAF rate limits instead.
- Return 503 errors instead of 429s. Crawlers interpret 503 as “site down” and back off aggressively, sometimes for days.
- Apply the same limit to all crawlers. Different crawlers have different request patterns. A one-size limit is either too tight for some or too loose for others.
- Throttle when not necessary. Most sites can absorb crawler traffic without user-facing impact.
Diagnosing whether throttling is needed
Signals that crawler traffic is causing problems:
- Server response times correlate with crawler request volume (a log-analysis sketch for checking this follows the list).
- Specific endpoints show degraded performance during crawler bursts.
- Origin servers behind a WAF or CDN show elevated load that maps to crawler IPs.
- 5xx error rates rise during periods of high crawler activity.
If none of these apply, throttling is unnecessary.
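One way to check the first signal is to bucket access-log entries by minute and correlate crawler request counts with response times. A rough sketch, assuming entries have already been parsed into (timestamp, user_agent, response_ms) tuples; the bot tokens and function name are illustrative.

```python
# Sketch: correlate per-minute crawler request volume with response time.
from collections import defaultdict
from statistics import correlation, median  # correlation requires Python 3.10+

CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def crawler_latency_correlation(entries):
    """entries: iterable of (unix_timestamp, user_agent, response_ms) tuples."""
    crawler_counts = defaultdict(int)
    latencies = defaultdict(list)
    for ts, user_agent, response_ms in entries:
        minute = int(ts) // 60
        latencies[minute].append(response_ms)
        if any(token in user_agent for token in CRAWLER_TOKENS):
            crawler_counts[minute] += 1
    minutes = sorted(latencies)
    counts = [crawler_counts[m] for m in minutes]
    medians = [median(latencies[m]) for m in minutes]
    # A strong positive correlation suggests crawler traffic is hurting latency.
    return correlation(counts, medians)
```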
Calibrating the rate
For sites that do throttle:
- Observe baseline crawler request volume per bot.
- Identify the threshold at which problems start.
- Set the limit at 60-80% of the problematic threshold to leave headroom.
- Monitor for 429 rates after applying. If 429s become common, raise the limit.
Goal: zero 429s under normal crawler behavior, with the limit catching only abnormal spikes.
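As a back-of-the-envelope illustration of that calibration math (the numbers and the 1% "too tight" heuristic are assumptions, not fixed rules):

```python
# Illustrative calibration: pick a limit below the level where problems appear,
# then check whether observed 429s suggest the limit is too tight.
def calibrated_limit(problem_threshold_rpm: float, headroom: float = 0.7) -> int:
    """Set the limit at 60-80% of the rate where degradation starts."""
    return int(problem_threshold_rpm * headroom)

def limit_too_tight(responses_429: int, total_crawler_requests: int) -> bool:
    """Heuristic: if 429s exceed ~1% of crawler traffic, consider raising the limit."""
    if total_crawler_requests == 0:
        return False
    return responses_429 / total_crawler_requests > 0.01

# Example: problems started around 150 requests/minute from a single crawler.
print(calibrated_limit(150))        # -> 105
print(limit_too_tight(3, 10_000))   # -> False
```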
Avoiding over-throttling
Over-throttling has hidden costs:
- Crawlers may flag the site as unreliable and reduce future crawl frequency.
- Indexing of new content slows.
- Time-to-recrawl after content updates increases.
The tradeoff is real. If a site’s AEO performance is critical, allow more crawler traffic and provision capacity accordingly. If crawler traffic is genuinely overwhelming, throttle the worst offenders specifically rather than across the board.
Implementation example
During a sale launch at AwesomeShoes Co., crawler bursts begin hitting filter-heavy category endpoints and slow checkout-adjacent pages. The platform engineer must protect user performance without cutting citation visibility.
Implementation discussion: the engineer applies targeted rate limits on expensive endpoints only, keeps key buying-guide URLs unthrottled for verified search crawlers, and watches 429 rates alongside latency and citation freshness metrics. This ensures throttle rules solve a real performance issue instead of accidentally suppressing crawl coverage.
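A sketch of how such targeted rules might look in application middleware; the path prefixes, bot names, and helper are hypothetical, and verification of crawler identity (for example via reverse-DNS checks) is assumed to happen elsewhere.

```python
# Hypothetical middleware logic for the scenario above: throttle crawlers only on
# expensive endpoints, and leave buying-guide URLs untouched for verified crawlers.
EXPENSIVE_PREFIXES = ("/category", "/search", "/checkout")   # illustrative paths
UNTHROTTLED_PREFIXES = ("/guides/",)                          # e.g. buying guides

def should_throttle(path: str, user_agent: str, is_verified_crawler: bool) -> bool:
    """Decide whether a crawler request should count against a rate limit."""
    is_crawler = any(bot in user_agent for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"))
    if not is_crawler:
        return False
    if is_verified_crawler and path.startswith(UNTHROTTLED_PREFIXES):
        return False  # keep citation-relevant pages fully crawlable
    return path.startswith(EXPENSIVE_PREFIXES)
```

Paired with a limiter like the one sketched earlier, this keeps throttling scoped to the endpoints that are actually under pressure rather than suppressing crawl coverage site-wide.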