
This is a working reference of the AI crawlers that meaningfully affect AEO. User agents and IP ranges change; the canonical source is each operator’s own documentation. The list below covers the bots a site needs to make a deliberate decision about.

OpenAI

| Bot | Purpose | User-agent string contains |
|---|---|---|
| GPTBot | Training | GPTBot |
| OAI-SearchBot | Search index for ChatGPT search | OAI-SearchBot |
| ChatGPT-User | On-demand fetches when a ChatGPT user triggers a browse | ChatGPT-User |

OpenAI publishes each bot’s IP ranges separately. Allowlisting by IP plus user agent is more reliable than user agent alone (see IP whitelisting).
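
The combined check can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's method: the `PUBLISHED_RANGES` table and the `is_verified_crawler` helper are hypothetical names, and the CIDR blocks are reserved documentation ranges standing in for the real published lists, which would need to be loaded from the operator's documentation.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation blocks).
# Real ranges must come from each operator's published list.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "OAI-SearchBot": ["198.51.100.0/24"],
    "ChatGPT-User": ["203.0.113.0/24"],
}


def is_verified_crawler(user_agent: str, remote_ip: str) -> bool:
    """Return True only if the UA names a known bot AND the IP is in that bot's ranges."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as unverified

    ua = user_agent.lower()
    for token, cidrs in PUBLISHED_RANGES.items():
        if token.lower() in ua:
            # The user agent claims to be this bot; confirm the source IP.
            return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)
    return False  # no known bot token in the user agent
```

A request that claims to be GPTBot but arrives from outside the listed ranges fails the check, which is the case a user-agent-only rule would miss.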

Anthropic

| Bot | Purpose | User-agent string contains |
|---|---|---|
| ClaudeBot | Training | ClaudeBot |
| Claude-User | On-demand fetches when a Claude user triggers web search | Claude-User |
| Claude-SearchBot | Anthropic’s search index | Claude-SearchBot |

Perplexity

| Bot | Purpose | User-agent string contains |
|---|---|---|
| PerplexityBot | Search index | PerplexityBot |
| Perplexity-User | On-demand fetches when a Perplexity user asks a question | Perplexity-User |

Perplexity publishes IP ranges as JSON files at perplexity.com/perplexitybot.json and perplexity.com/perplexity-user.json.

Google

| Bot | Purpose | User-agent string contains |
|---|---|---|
| Googlebot | Classic search index, also feeds AI Overviews and Google AI Mode | Googlebot |
| Google-Extended | Opt-out token for AI training, not a separate bot | Google-Extended |

Google-Extended is unusual: it’s a separately controllable signal in robots.txt that opts a site out of training data for Gemini and other Google AI products without affecting Search visibility. Blocking it does not block Googlebot.
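
As a rough illustration of how the two tokens are handled as separate groups, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL, and the `can_fetch` wrapper are assumptions for the example, and Python's parser follows its own matching rules rather than Google's, so this only shows the group separation, not Google's actual behavior.

```python
from urllib.robotparser import RobotFileParser

# Sample policy (an assumption for illustration): opt out of AI training
# via Google-Extended while leaving every other agent allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str = "https://example.com/article") -> bool:
    """Check whether the sample robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended"):
        print(agent, can_fetch(agent))
```

Under this sample policy the Google-Extended group is disallowed while Googlebot falls through to the wildcard group and stays allowed.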

Microsoft / Bing

| Bot | Purpose | User-agent string contains |
|---|---|---|
| Bingbot | Search index, feeds Bing Copilot, ChatGPT search, and others that ground in Bing | bingbot |

Microsoft has not introduced a separate AI training bot at the time of writing. Bing’s index serves both classic search and AI features.

Smaller and emerging crawlers

A non-exhaustive list of crawlers that may appear in logs:

| Bot | Operator | Purpose |
|---|---|---|
| Applebot, Applebot-Extended | Apple | Search and AI training (separately controllable) |
| FacebookBot | Meta | Training |
| Meta-ExternalAgent | Meta | Training and AI features |
| Bytespider | ByteDance | Training |
| YouBot | You.com | Search index |
| Cohere-AI | Cohere | Training |
| AI2Bot | Allen Institute for AI | Research crawls |
| Diffbot | Diffbot | Knowledge graph construction |
| CCBot | Common Crawl | Open dataset; many AI engines train on this |

CCBot is worth a separate decision: it powers Common Crawl, which is used as training data by many models that don’t run their own crawlers. Allowing CCBot effectively opts a site into a wide range of training datasets.

How to keep the list current

User agents change. New operators appear. Sites with serious AEO programs:

  • Subscribe to each major operator’s documentation feed.
  • Audit server logs monthly for unfamiliar user agents.
  • Review robots.txt rules for AI crawlers and WAF configuration quarterly against the current canonical lists.
  • Maintain a single internal source of truth, not scattered rules.
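
The monthly log audit in the list above could be sketched as follows. This assumes access logs in the common "combined" format, where the user agent is the last quoted field; `KNOWN_TOKENS`, `AUTOMATION_HINTS`, and `unfamiliar_agents` are hypothetical names, and the substring heuristic for spotting automated clients is deliberately rough.

```python
import re
from collections import Counter

# Tokens already covered by the internal list (taken from the tables above).
KNOWN_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Perplexity-User", "Googlebot",
    "bingbot", "Applebot", "CCBot", "Bytespider",
]

# Rough heuristic (an assumption): substrings that suggest an automated client.
AUTOMATION_HINTS = ("bot", "crawl", "spider")

# In combined log format the user agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def unfamiliar_agents(log_lines, known_tokens=KNOWN_TOKENS):
    """Count user agents that look automated but match no known token."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        ua = match.group(1)
        lowered = ua.lower()
        if any(token.lower() in lowered for token in known_tokens):
            continue  # already has a policy decision
        if any(hint in lowered for hint in AUTOMATION_HINTS):
            counts[ua] += 1
    return counts
```

The output is a review queue, not a block list: each unfamiliar agent still needs a deliberate decision and a check against its operator's documentation.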

Implementation example

At AwesomeShoes Co., crawler rules had grown ad hoc across CDN, WAF, and app configs, causing contradictory allow and block behavior. The security engineer and AEO manager created one internal crawler registry that maps each bot to its purpose, policy decision, and verification source.

Implementation discussion: the registry is version-controlled, refreshed monthly against operator documentation, and used as the only source for robots and WAF updates. The operations analyst reviews unknown user agents from logs and routes decisions to the security owner, keeping policy readable and actionable for future releases.
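
One way such a registry might look in code is sketched below. The `CrawlerPolicy` fields mirror the three attributes named above (purpose, policy decision, verification source), but the class, the `render_robots` helper, and the allow or block decisions shown are all hypothetical and are not recommendations or a description of any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerPolicy:
    token: str    # user-agent token, e.g. "GPTBot"
    purpose: str  # what the bot is used for
    allow: bool   # policy decision
    source: str   # where the entry was last verified


# Hypothetical decisions, for illustration only.
REGISTRY = [
    CrawlerPolicy("GPTBot", "training", False, "operator documentation"),
    CrawlerPolicy("OAI-SearchBot", "search index", True, "operator documentation"),
    CrawlerPolicy("CCBot", "open dataset", False, "operator documentation"),
]


def render_robots(registry) -> str:
    """Render robots.txt groups from the registry, sorted by token for stable diffs."""
    blocks = []
    for policy in sorted(registry, key=lambda p: p.token.lower()):
        rule = "Allow: /" if policy.allow else "Disallow: /"
        blocks.append(
            f"# {policy.purpose}; verified against: {policy.source}\n"
            f"User-agent: {policy.token}\n"
            f"{rule}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_robots(REGISTRY), end="")
```

Generating robots.txt (and, by the same pattern, WAF rules) from one version-controlled list keeps the outputs from drifting apart, since no rule is edited by hand in two places.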
