Your sitemap is configured. Your Core Web Vitals score is green. Your product catalog is perfectly structured. And yet when a user asks ChatGPT for products you sell, your store doesn't appear.
Most of the time, the reason is a single file: robots.txt.
Specifically — a robots.txt written for Google in 2019 and never updated for the ten AI crawlers that now determine your visibility in ChatGPT, Gemini, Claude, and Perplexity.
🔗 Originally published at angeo.dev/magento-2-robots-txt-chatgpt-gemini-ai-bots
Why robots.txt Is AEO Signal #1
The angeo/module-aeo-audit checks robots.txt first and marks it Critical because it is a gate. Every other AEO signal — llms.txt, Product schema, AI product feed — is irrelevant if the AI crawler cannot enter your store.
OpenAI states this without ambiguity:
"Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers."
Not "may not appear." Will not appear. If OAI-SearchBot is blocked — by an explicit Disallow or caught in a wildcard rule — your store is excluded from ChatGPT search answers regardless of everything else.
The Three Types of AI Bots — Why the Difference Matters
Before listing every bot, understand what each one actually does. Conflating them causes the most common robots.txt misconfiguration.
| Type | What it does | Examples | Recommendation |
|---|---|---|---|
| Search & indexing | Builds the live index used when users ask AI questions. Cites sources, links back to your store. | `OAI-SearchBot`, `Claude-SearchBot`, `PerplexityBot`, `Google-Extended` | ✅ Always allow |
| User-initiated | Fetches your page when a user asks AI to visit a specific URL. May cite your product page directly. | `ChatGPT-User`, `Claude-User`, `Perplexity-User` | ✅ Always allow |
| Training crawlers | Collects content for model training. No attribution, no traffic back. | `GPTBot`, `ClaudeBot`, `Applebot-Extended` | ⚠️ Your choice |
The most common mistake: blocking `GPTBot` (training) in the belief that it removes you from ChatGPT search results. It does not. `GPTBot` and `OAI-SearchBot` are entirely separate bots. Blocking training crawlers has zero effect on AI search visibility — but blocking search crawlers makes you invisible immediately.
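That separation can be sanity-checked with Python's stdlib robots.txt parser. This is a sketch — the rules and the `yourstore.com` URL are placeholders, and `urllib.robotparser`'s user-agent matching is approximate — but it mirrors the real behavior here:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that opts out of training but stays in ChatGPT search
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The training crawler is blocked ...
print(rp.can_fetch("GPTBot", "https://yourstore.com/product"))         # False
# ... but the search crawler that controls visibility is unaffected
print(rp.can_fetch("OAI-SearchBot", "https://yourstore.com/product"))  # True
```

Each bot is matched against its own `User-agent` group: opting out of training never touches the search index, and vice versa.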
Every AI Bot That Matters in 2026
| Bot | Platform | Type | Impact if blocked |
|---|---|---|---|
| `OAI-SearchBot` | ChatGPT | Search index | Invisible in all ChatGPT search answers. Most critical. |
| `GPTBot` | ChatGPT | Training | Excluded from future GPT training. No effect on current search. |
| `ChatGPT-User` | ChatGPT | User-initiated | ChatGPT can't fetch your pages for users. |
| `Claude-SearchBot` | Claude | Search index | Invisible in Claude's real-time web search answers. |
| `ClaudeBot` | Claude | Training | Excluded from future Claude training data. |
| `Claude-User` | Claude | User-initiated | Claude can't fetch your pages for users. |
| `PerplexityBot` | Perplexity | Search index | Invisible in Perplexity answers and recommendations. |
| `Perplexity-User` | Perplexity | User-initiated | Perplexity can't fetch your pages for users. |
| `Google-Extended` | Gemini | Search + training | Not cited in Gemini AI Overviews or Google Shopping AI. |
| `Applebot-Extended` | Apple Intelligence | Training | Excluded from Apple Intelligence training data. |
| `anthropic-ai` | Anthropic | Deprecated | Legacy name for ClaudeBot. Keep rules for backwards compatibility. |
Anthropic expanded to three bots in early 2026. Sites that only reference `ClaudeBot` are now missing `Claude-SearchBot` (live search) and `Claude-User` (user-initiated fetching). If your robots.txt hasn't been updated since 2024, this almost certainly applies to your store.
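The practical consequence: a rule group for `ClaudeBot` does not cover the two newer bots. A quick sketch with Python's stdlib parser (hypothetical 2024-era rules, placeholder URL) shows the newer bot falling through to the wildcard:

```python
from urllib.robotparser import RobotFileParser

# A 2024-era config that only names ClaudeBot
rules = """\
User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("ClaudeBot", "https://yourstore.com/p"))         # True
# Claude-SearchBot has no matching group, so it inherits the
# wildcard block — invisible in Claude's live search
print(rp.can_fetch("Claude-SearchBot", "https://yourstore.com/p"))  # False
```

User-agent tokens are matched per bot name; each of the three Anthropic bots needs its own group (or a deliberate wildcard policy).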
The Default Magento robots.txt Problem
Magento's default robots.txt starts with a wildcard:
```
User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /checkout/
Disallow: /app/
...
```
This wildcard establishes a baseline that every bot inherits. If your deployment script, hosting provider, or a staging migration has added `Disallow: /` anywhere, AI bots are caught in it silently — no error is logged anywhere.

Check this right now. Open `https://yourstore.com/robots.txt` and look for `Disallow: /` on its own line. If it exists without an explicit `Allow: /` for each AI bot listed above it — every one of those bots is blocked.
```shell
# Quick check — if this returns output, you have a problem
curl -s https://yourstore.com/robots.txt | grep -n "^Disallow: /$"
```
Where Magento Serves robots.txt — Two Scenarios
Before editing, identify which method your store uses. Editing the wrong one has no effect.
Scenario A: Magento Admin (most common)
```shell
# Check if Magento manages robots.txt
bin/magento config:show design/search_engine_robots/default_robots
```
If it returns a value, edit via: Content → Design → Configuration → [Store view] → Edit → Search Engine Robots → Edit custom instruction of robots.txt
Scenario B: Static file in pub/
```shell
# Check if a static file exists and is being served
ls -la /var/www/html/pub/robots.txt
curl -I https://yourstore.com/robots.txt
# If no X-Magento headers appear, the file is served statically
```
Edit pub/robots.txt directly, or remove it to let Magento Admin take over.
Multi-store: Each store view can have its own robots.txt. If you run multiple stores on different domains, configure each store view separately in Admin → Design → Configuration.
The Complete robots.txt for Magento 2
Paste this as your full configuration. AI bot entries must appear before the wildcard User-agent: * block.
```
# ============================================================
# AI SEARCH & INDEXING BOTS — Always allow
# Blocking these makes your store invisible in AI search answers.
# ============================================================
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# ============================================================
# USER-INITIATED FETCHERS — Always allow
# ============================================================
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

# ============================================================
# TRAINING CRAWLERS — your choice
# Blocking does NOT affect search visibility.
# Change Allow to Disallow to opt out of training data collection.
# ============================================================
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Applebot-Extended
Allow: /

# ============================================================
# TRADITIONAL SEARCH ENGINES
# ============================================================
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# ============================================================
# ALL OTHER BOTS — standard Magento rules
# AI bots above are explicitly allowed before this wildcard.
# ============================================================
User-agent: *
Allow: /
# Magento paths — block from all crawlers
Disallow: /admin/
Disallow: /adminhtml/
Disallow: /api/
Disallow: /rest/
Disallow: /graphql
Disallow: /cron.php
Disallow: /var/
Disallow: /lib/
Disallow: /dev/
Disallow: /index.php/
Disallow: /*?SID=
Disallow: /*?___store=
Disallow: /checkout/
Disallow: /customer/
Disallow: /wishlist/
Disallow: /review/

# ============================================================
# SITEMAPS
# ============================================================
Sitemap: https://yourstore.com/sitemap.xml
Sitemap: https://yourstore.com/llms.txt
```
⚠️ Keep the order shown above. A strictly RFC 9309-compliant crawler obeys the most specific matching `User-agent` group wherever it appears in the file, but many simpler parsers scan top-down and stop at the first match. If `User-agent: *` with `Disallow: /` appears before the AI bot entries and the parser stops there, those bots are blocked and the rules below are never reached. Placing explicit AI bot groups above the wildcard is safe under both behaviors.
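The wildcard-inheritance behavior is easy to verify with Python's stdlib parser (a sketch with placeholder rules and URL; real crawlers may differ in edge cases, but the principle holds):

```python
from urllib.robotparser import RobotFileParser

def allowed(rules: str, agent: str) -> bool:
    """Would `agent` be allowed to fetch a product page under `rules`?"""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch(agent, "https://yourstore.com/product")

# A bare wildcard block: every bot without its own group inherits it
print(allowed("User-agent: *\nDisallow: /\n", "OAI-SearchBot"))  # False

# An explicit group rescues the bot from the wildcard
fixed = (
    "User-agent: OAI-SearchBot\nAllow: /\n\n"
    "User-agent: *\nDisallow: /\n"
)
print(allowed(fixed, "OAI-SearchBot"))  # True
```

The only reliable fix is an explicit group per AI bot — a wildcard `Disallow: /` with no explicit group blocks the bot in every parser.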
Four Mistakes That Block AI Bots
❌ Mistake 1 — Disallow: / left on from staging
Many Magento stores use Disallow: / on staging. This is frequently copied to production and never removed.
Common sources:
- Magento Admin: Stores → Configuration → General → Design → Search Engine Robots
- Manually edited `pub/robots.txt` copied from staging
- CI/CD pipelines that sync the full staging filesystem to production
- Managed hosts (Hypernode, Nexcess, Cloudways) applying restrictive defaults on new environments
❌ Mistake 2 — Wildcard block placed before AI bot rules
```
# ❌ WRONG — a top-down parser matches the wildcard first and
# never reaches the AI bot rules below it
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
```

```
# ✅ CORRECT — explicit AI rules appear before the wildcard
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /checkout/
```
❌ Mistake 3 — Using outdated Anthropic bot names
```
# Deprecated — no longer reflect Anthropic's bot infrastructure
User-agent: Claude-Web        # retired 2024
User-agent: Anthropic-AI      # retired 2024
```

```
# Current names (2026)
User-agent: ClaudeBot         # training
User-agent: Claude-SearchBot  # live search index — missing from most configs
User-agent: Claude-User       # user-initiated fetching — missing from most configs
```
❌ Mistake 4 — robots.txt served from cache or static file
Some hosting setups bypass Magento when serving robots.txt:

- Nginx static rule serves `pub/robots.txt` before Magento handles the request
- CDN caching — Cloudflare or Fastly caches the old file with long TTLs
- Varnish returns a cached response without hitting the application
```shell
# Diagnosis — if no X-Magento headers, file is served statically
curl -I https://yourstore.com/robots.txt
```
Fix for Nginx — ensure this location block is present:
```
location = /robots.txt {
    try_files $uri $uri/ /index.php$is_args$args;
}
```
Fix for Cloudflare — purge cache for /robots.txt via dashboard, or add a Cache Rule to bypass caching for that path.
Verify the Fix
```shell
# Check all critical AI search bots have Allow: /
curl -s https://yourstore.com/robots.txt | grep -A1 -E \
  "OAI-SearchBot|Claude-SearchBot|PerplexityBot|Google-Extended"

# Check server logs to confirm bots are actually crawling
grep -Ei "OAI-SearchBot|Claude-SearchBot|PerplexityBot" \
  /var/log/nginx/access.log | tail -20
```

```shell
# Via AEO audit module — checks all bots + validates rule order
composer require angeo/module-aeo-audit
bin/magento setup:upgrade
bin/magento angeo:aeo:audit

# ✓ PASS robots.txt — AI Bot Access
#   OAI-SearchBot ✓  Claude-SearchBot ✓  ChatGPT-User ✓
#   ClaudeBot ✓  Claude-User ✓
#   PerplexityBot ✓  Google-Extended ✓

# Multi-store — run per store view
bin/magento angeo:aeo:audit --store=de
bin/magento angeo:aeo:audit --store=fr
```
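If you'd rather script the check than eyeball grep output, a small sketch with Python's stdlib parser works on any robots.txt body. The `sample` rules and `yourstore.com` URL are placeholders — feed it your live file, e.g. the output of `curl -s https://yourstore.com/robots.txt`:

```python
from urllib.robotparser import RobotFileParser

# Bots whose access gates AI search visibility (from the table above)
CRITICAL_BOTS = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Google-Extended",
    "ChatGPT-User", "Claude-User", "Perplexity-User",
]

def audit(robots_txt: str) -> dict:
    """Map each critical bot to whether it may fetch the storefront."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, "https://yourstore.com/") for bot in CRITICAL_BOTS}

# Placeholder body — replace with your live robots.txt
sample = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /
"""

for bot, ok in audit(sample).items():
    print(f"{'✓ allowed' if ok else '✗ BLOCKED'}  {bot}")
```

On this sample it flags every bot except `OAI-SearchBot` — exactly the silent wildcard failure described above.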
After robots.txt — What's Next
Fixing robots.txt is the access layer. Once AI bots can reach your store, the signals that determine whether you actually appear in AI answers are:
- llms.txt — machine-readable index of your catalog that AI assistants parse in seconds → angeo/module-llms-txt
- Product JSON-LD schema — structured markup that tells AI exactly what your products cost and where to buy them → checked by angeo/module-aeo-audit
- AI product feed — structured feed required for ChatGPT Shopping eligibility → angeo/module-openai-product-feed
```shell
# See all 8 AEO signals and your complete score
bin/magento angeo:aeo:audit
```
FAQ
Q: Does blocking GPTBot affect ChatGPT recommendations?
No — but blocking OAI-SearchBot does. These are separate bots. GPTBot collects training data; OAI-SearchBot builds the index ChatGPT uses for search answers. Blocking GPTBot has no effect on whether your store appears in ChatGPT results. Blocking OAI-SearchBot removes you from ChatGPT search answers entirely.
Q: What is OAI-SearchBot and why does it matter more than GPTBot?
OAI-SearchBot is OpenAI's crawler that powers ChatGPT real-time search results and product recommendations. It determines whether your store appears in ChatGPT answers today. GPTBot collects training data for future model versions — its effects are long-term and indirect. Both should be allowed, but OAI-SearchBot is the one that determines your current search visibility.
Q: How do I check which AI bots are blocked in Magento 2?
```shell
curl https://yourstore.com/robots.txt | grep -i \
  'searchbot\|gptbot\|claude\|perplexity\|google-extended'
```
If nothing appears, bots may be blocked by a wildcard Disallow: / rule. The fastest automated check:
```shell
composer require angeo/module-aeo-audit
bin/magento setup:upgrade
bin/magento angeo:aeo:audit
```
It checks all bots, validates rule order, and reports exact status for each one.
Q: Should I allow training bots like GPTBot and ClaudeBot?
For most ecommerce stores, yes. Allowing training crawlers means your product descriptions and category content contribute to how AI models understand and describe products in your category — which indirectly improves recommendation quality over time. If you have legal or privacy reasons to opt out, block only the training bots (GPTBot, ClaudeBot, Applebot-Extended) — your search visibility is not affected.
Check all AI bots and 7 other AEO signals in one command — free, MIT licensed:
📦 Install AEO Audit Module
🌐 Free Web Self-Assessment