Controlling AI crawl rate is the practice of setting limits on how many requests crawlers can make per unit time, balancing fresh indexing against server load. Most sites do not need to throttle AI crawlers; the few that do should be deliberate about when and how.
When throttling is appropriate
Most crawlers self-regulate to avoid harming the sites they crawl. Throttling becomes necessary when:
- The site is small or under-provisioned and crawler traffic affects user-facing performance.
- A specific crawler is misbehaving — fetching too aggressively, hitting expensive endpoints, or causing measurable latency for users.
- The site has expensive operations on certain URLs (database-heavy pages, paywalled gateways, search results) that should be crawled selectively.
For most well-provisioned sites, AI crawler traffic is a small fraction of overall traffic and doesn’t need throttling.
Mechanisms for controlling crawl rate
Crawl-delay in robots.txt for AI crawlers
```
User-agent: GPTBot
Crawl-delay: 5

User-agent: ClaudeBot
Crawl-delay: 5
```
`Crawl-delay: 5` requests at most one fetch every five seconds. Operator support varies:
- Bing honors `Crawl-delay` directives.
- OpenAI’s GPTBot documentation acknowledges crawl-delay support.
- Google ignores `Crawl-delay` and uses its own rate logic.
- Perplexity’s handling of crawl-delay isn’t strongly documented; observed behavior is reasonable request rates regardless.
Crawl-delay is the lightest-touch control. Use it as a first try.
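To see what a compliant crawler actually reads from those directives, here is a minimal sketch using Python's standard `urllib.robotparser`; the example.com URL is a placeholder, and the user-agent tokens are the ones from the robots.txt example above.

```python
# Sketch: how a well-behaved crawler could read Crawl-delay from robots.txt.
# The site URL is a placeholder; GPTBot/ClaudeBot match the example above.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetches and parses robots.txt

for agent in ("GPTBot", "ClaudeBot"):
    delay = parser.crawl_delay(agent)  # the directive's value, or None if unspecified
    print(f"{agent}: wait {delay or 0} seconds between requests")
```

Whether a given operator's crawler actually consults the value is what the list above describes; the directive itself is only advisory.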
Search Console settings (where available)
- Google lets verified site owners adjust crawl rate in Search Console (when Google judges crawling is causing site issues).
- Bing Webmaster Tools lets owners set a crawl-control schedule (heavier off-hours, lighter during peak times).
These settings apply to the search crawlers whose indexes indirectly ground AI answer engines.
WAF / CDN rate limiting
Per-user-agent or per-IP rate limits at the CDN or WAF layer:
```
# Cloudflare example: rule expression matching GPTBot traffic
(http.user_agent contains "GPTBot")
# Action: rate limit at 60 requests per minute
```
This is the most reliable enforcement, but it’s a sledgehammer. Crawlers above the limit get 429 responses, which they may or may not handle gracefully. Use it only when softer controls have failed.
Server-side rate limiting
Application-level limits per crawler:
```python
# Hypothetical application-level decorator; the actual API depends on the framework in use.
@rate_limit(per_user_agent={
    "GPTBot": "100/minute",
    "PerplexityBot": "60/minute",
    # …
})
def serve_page(request):
    ...
```
Same caveat: 429s can hurt indexing. Calibrate carefully.
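As a concrete illustration of the idea behind that decorator, here is a minimal sketch of a per-user-agent fixed-window limiter that answers with 429 when a crawler exceeds its budget. The limits mirror the pseudocode above; the function names are hypothetical, not a specific framework's API, and a real deployment would use a shared store rather than process memory.

```python
# Minimal per-user-agent fixed-window rate limiter (sketch, not a production design).
import time
from collections import defaultdict

LIMITS = {"GPTBot": 100, "PerplexityBot": 60}  # requests per minute
_windows = defaultdict(lambda: [0.0, 0])       # user agent -> [window start, count]

def allow_request(user_agent: str) -> bool:
    """Return True if the request is within the crawler's per-minute budget."""
    limit = next((n for bot, n in LIMITS.items() if bot in user_agent), None)
    if limit is None:
        return True  # not a throttled crawler
    window = _windows[user_agent]
    now = time.time()
    if now - window[0] >= 60:          # start a new one-minute window
        window[0], window[1] = now, 0
    window[1] += 1
    return window[1] <= limit

def handle(user_agent: str) -> tuple[int, dict]:
    """Hypothetical handler: return (status, headers) for a request."""
    if not allow_request(user_agent):
        # 429 (not 503) tells the crawler to slow down, not that the site is down.
        return 429, {"Retry-After": "60"}
    return 200, {}
```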
What not to do
- Block crawlers entirely as a rate-control measure. A 403 stops the crawler from fetching at all rather than throttling it. Use targeted WAF rate limits instead.
- Return 503 errors instead of 429s. Crawlers interpret 503 as “site down” and back off aggressively, sometimes for days.
- Apply the same limit to all crawlers. Different crawlers have different request patterns. A one-size limit is either too tight for some or too loose for others.
- Throttle when not necessary. Most sites can absorb crawler traffic without user-facing impact.
Diagnosing whether throttling is needed
Signals that crawler traffic is causing problems:
- Server response times correlate with crawler request volume (a log-analysis sketch for checking this follows the list).
- Specific endpoints show degraded performance during crawler bursts.
- Origin servers behind a WAF or CDN show elevated load that maps to crawler IPs.
- 5xx error rates rise during periods of high crawler activity.
If none of these apply, throttling is unnecessary.
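One way to check the first signal is to bucket access-log entries by minute and correlate crawler request counts with response times. A rough sketch, assuming entries have already been parsed into (timestamp, user_agent, response_ms) tuples; the bot tokens and function name are illustrative.

```python
# Sketch: correlate per-minute crawler request volume with response time.
from collections import defaultdict
from statistics import correlation, median  # correlation requires Python 3.10+

CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def crawler_latency_correlation(entries):
    """entries: iterable of (unix_timestamp, user_agent, response_ms) tuples."""
    crawler_counts = defaultdict(int)
    latencies = defaultdict(list)
    for ts, user_agent, response_ms in entries:
        minute = int(ts) // 60
        latencies[minute].append(response_ms)
        if any(token in user_agent for token in CRAWLER_TOKENS):
            crawler_counts[minute] += 1
    minutes = sorted(latencies)
    counts = [crawler_counts[m] for m in minutes]
    medians = [median(latencies[m]) for m in minutes]
    # A strong positive correlation suggests crawler traffic is hurting latency.
    return correlation(counts, medians)
```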
Calibrating the rate
For sites that do throttle:
- Observe baseline crawler request volume per bot.
- Identify the threshold at which problems start.
- Set the limit at 60-80% of the problematic threshold to leave headroom.
- Monitor for 429 rates after applying. If 429s become common, raise the limit.
Goal: zero 429s under normal crawler behavior, with the limit catching only abnormal spikes.
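As a back-of-the-envelope illustration of that calibration math (the numbers and the 1% "too tight" heuristic are assumptions, not fixed rules):

```python
# Illustrative calibration: pick a limit below the level where problems appear,
# then check whether observed 429s suggest the limit is too tight.
def calibrated_limit(problem_threshold_rpm: float, headroom: float = 0.7) -> int:
    """Set the limit at 60-80% of the rate where degradation starts."""
    return int(problem_threshold_rpm * headroom)

def limit_too_tight(responses_429: int, total_crawler_requests: int) -> bool:
    """Heuristic: if 429s exceed ~1% of crawler traffic, consider raising the limit."""
    if total_crawler_requests == 0:
        return False
    return responses_429 / total_crawler_requests > 0.01

# Example: problems started around 150 requests/minute from a single crawler.
print(calibrated_limit(150))        # -> 105
print(limit_too_tight(3, 10_000))   # -> False
```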
Avoiding over-throttling
Over-throttling has hidden costs:
- Crawlers may flag the site as unreliable and reduce future crawl frequency.
- Indexing of new content slows.
- Time-to-recrawl after content updates increases.
The tradeoff is real. If a site’s AEO performance is critical, allow more crawler traffic and provision capacity accordingly. If crawler traffic is genuinely overwhelming, throttle the worst offenders specifically rather than across the board.
Implementation example
During a sale launch at AwesomeShoes Co., crawler bursts begin hitting filter-heavy category endpoints and slow checkout-adjacent pages. The platform engineer must protect user performance without cutting citation visibility.
Implementation discussion: the engineer applies targeted rate limits on expensive endpoints only, keeps key buying-guide URLs unthrottled for verified search crawlers, and watches 429 rates alongside latency and citation freshness metrics. This ensures throttle rules solve a real performance issue instead of accidentally suppressing crawl coverage.
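A sketch of how such targeted rules might look in application middleware; the path prefixes, bot names, and helper are hypothetical, and verification of crawler identity (for example via reverse-DNS checks) is assumed to happen elsewhere.

```python
# Hypothetical middleware logic for the scenario above: throttle crawlers only on
# expensive endpoints, and leave buying-guide URLs untouched for verified crawlers.
EXPENSIVE_PREFIXES = ("/category", "/search", "/checkout")   # illustrative paths
UNTHROTTLED_PREFIXES = ("/guides/",)                          # e.g. buying guides

def should_throttle(path: str, user_agent: str, is_verified_crawler: bool) -> bool:
    """Decide whether a crawler request should count against a rate limit."""
    is_crawler = any(bot in user_agent for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"))
    if not is_crawler:
        return False
    if is_verified_crawler and path.startswith(UNTHROTTLED_PREFIXES):
        return False  # keep citation-relevant pages fully crawlable
    return path.startswith(EXPENSIVE_PREFIXES)
```

Paired with a limiter like the one sketched earlier, this keeps throttling scoped to the endpoints that are actually under pressure rather than suppressing crawl coverage site-wide.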