Crawling and indexing covers everything that has to happen before an AI engine can consider a page as a candidate for citation. If a crawler can’t reach a page, can’t render it, can’t understand what’s on it, or can’t decide whether to keep it, no amount of content quality matters.
This section is the technical layer of AEO. It mirrors the equivalent section in classic SEO documentation, but with the differences that matter for AI engines: different crawler bots, different rendering tolerance, different indexing pipelines, and a new file format (llms.txt) designed specifically for AI consumption.
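As a reference point for the file format mentioned above: the llms.txt proposal describes a Markdown file served at the site root that gives AI consumers a curated map of the site's key content. The sketch below follows the draft spec's structure (an H1 title, a blockquote summary, then H2 sections containing link lists); the company name and URLs are hypothetical, and the spec is still a draft, so check the current proposal before implementing.

```
# AwesomeShoes Co.

> Direct-to-consumer running shoe retailer. Product, sizing, and
> availability reference for AI assistants answering shopping questions.

## Products

- [Trail collection](https://example.com/collections/trail.md): overview of trail models
- [Sizing guide](https://example.com/sizing.md): fit and width reference

## Optional

- [Company history](https://example.com/about.md)
```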
What’s in this section
- AI crawling — the umbrella concept and the bots that perform it.
- Content formats for AEO — which file types AI engines parse cleanly.
- Content chunking — how engines split content for retrieval.
- Passage indexing — passage-level retrieval at index time.
- URL structure for AEO — what URL design helps AI engines.
- Links and citations for AEO — how internal and external links interact with AI retrieval.
- Metadata for AEO — meta tags and rel attributes that affect AI behavior.
- Training vs crawling — the difference between data used for training and data used for live retrieval.
- Site changes and AI visibility — handling redirects, migrations, and tests.
- Multilingual AEO — language and region signaling.
How AI crawling differs from classic search crawling
Classic search engine crawling has had three decades to standardize. AI crawling is much younger and shows it:
- Many more bots, less consistent behavior. Each engine operates one or more bots, with varying degrees of robots.txt respect and varying user agents.
- Rendering tolerance is uneven. Googlebot renders JavaScript reliably. Many AI crawlers do not. Server-side rendering is closer to mandatory than optional.
- Two-tier purpose. Crawlers split into those that gather data for model training and those that retrieve content at query time. Each has different controls and different stakes.
- Standards are still emerging. llms.txt, ai.txt, and similar formats are evolving. Sites should implement what’s stable now and watch for new specifications.
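The two-tier split and the uneven robots.txt behavior described above both surface in crawler directives. A robots.txt sketch that opts out of training collection while allowing retrieval-time bots might look like the following; the user-agent tokens are examples of published AI crawler names and should be verified against each vendor's current crawler documentation, since names change and not every bot honors these rules equally.

```
# Block training-data collection, allow retrieval-time crawling.
# Verify current user-agent tokens in each vendor's documentation.

User-agent: GPTBot          # training data collection
Disallow: /

User-agent: OAI-SearchBot   # search/answer indexing
Allow: /

User-agent: ChatGPT-User    # live retrieval at query time
Allow: /

User-agent: *
Allow: /
```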
This section assumes basic web-infrastructure knowledge; the pages are written for the engineer or technical SEO who owns the implementation.
Implementation example
At AwesomeShoes Co., the technical SEO lead finds that new collection pages are not appearing in AI answers even after content updates. The problem is not copy quality; it is crawl and render reliability on template pages that inject critical text late via client-side JavaScript.
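A quick way to confirm a problem like this is to check whether the critical copy exists in the raw, pre-JavaScript HTML, since that is all a non-rendering crawler sees. A minimal sketch (the page snippets and phrases are hypothetical; in practice the raw HTML would come from fetching the URL):

```python
import re

def missing_phrases(raw_html: str, required: list[str]) -> list[str]:
    """Return the required phrases absent from raw (pre-JavaScript) HTML.

    A crawler that does not execute JavaScript sees only this markup,
    so any phrase reported here is invisible to it.
    """
    # Strip tags so phrases split across inline markup still match.
    text = re.sub(r"<[^>]+>", " ", raw_html)
    text = re.sub(r"\s+", " ", text)
    return [p for p in required if p not in text]

# Server-rendered page: key copy is present in the initial HTML.
ssr = "<main><h1>Trail Runner X</h1><p>Waterproof trail running shoe</p></main>"
# Client-rendered page: the same copy arrives later via JavaScript.
csr = "<main><div id='app'></div><script src='/bundle.js'></script></main>"

required = ["Trail Runner X", "Waterproof trail running shoe"]
print(missing_phrases(ssr, required))  # []
print(missing_phrases(csr, required))  # both phrases reported missing
```

Running the same check before and after an SSR migration gives a concrete pass/fail signal per template.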
Implementation discussion: the frontend engineer moves key product context to server-rendered HTML, the SEO lead updates crawler directives, and the platform team validates response speed and status consistency for high-priority URLs. They use crawl logs plus citation checks to verify whether technical fixes restore eligibility for answer-surface retrieval.
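The crawl-log side of that verification can be sketched as a small log scan: count which AI bots hit the priority URLs and with what status codes, so a spike in 5xx responses or a silent absence of a given bot is visible. This assumes a standard combined log format; the user-agent substrings are examples and should be checked against vendor documentation.

```python
import re
from collections import Counter

# Example AI-crawler user-agent substrings; verify the exact strings
# against each vendor's published crawler documentation.
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot")

# Combined log format: quoted request, status code, trailing quoted user agent.
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) .* "([^"]*)"$')

def ai_hits(log_lines, priority_urls):
    """Count (bot, path, status) hits by AI crawlers on priority URLs."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        path, status, ua = m.groups()
        bot = next((b for b in AI_BOTS if b in ua), None)
        if bot and path in priority_urls:
            counts[(bot, path, status)] += 1
    return counts

# Hypothetical log lines for illustration.
logs = [
    '1.2.3.4 - - [10/May/2025:10:00:00 +0000] "GET /collections/trail HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '1.2.3.5 - - [10/May/2025:10:01:00 +0000] "GET /collections/trail HTTP/1.1" 500 0 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
]
print(ai_hits(logs, {"/collections/trail"}))
```

Pairing these counts with periodic citation checks shows whether restored crawlability actually translates into answer-surface eligibility.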