Bot Crawl Statistics
GEO is empirical, not theoretical. You cannot optimize what you cannot measure. This page explains how Geogroup tracks AI crawler activity and why transparency in measurement is fundamental to effective GEO.
Bot crawl activity is Signal 3 in the 8-Signal GEO Framework — the empirical proof that your infrastructure is being consumed by AI systems.
What We Monitor
Every AI bot visit to a GEO-optimized site is captured and categorized. The telemetry system tracks the following dimensions across all monitored properties:
Bot Identity by Crawler
Individual identification of every AI crawler: GPTBot and ChatGPT-User (OpenAI), ClaudeBot and Claude-Web (Anthropic), PerplexityBot (Perplexity AI), Google-Extended (Google Gemini), Bingbot (Microsoft Copilot), Applebot, Amazonbot, Meta-ExternalAgent, YouBot, DuckAssistBot, and cohere-ai. Each is tracked independently.
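In practice, identifying these crawlers reduces to user-agent pattern matching. A minimal TypeScript sketch follows; the pattern table and function name are illustrative, not Geogroup's production implementation, though the crawler names are the ones listed above.

```typescript
// Illustrative pattern table mapping user-agent tokens to crawler names.
// Geogroup's maintained list may differ; these tokens are the crawlers
// named in this section.
const BOT_PATTERNS: [RegExp, string][] = [
  [/GPTBot/i, "GPTBot"],
  [/ChatGPT-User/i, "ChatGPT-User"],
  [/ClaudeBot/i, "ClaudeBot"],
  [/Claude-Web/i, "Claude-Web"],
  [/PerplexityBot/i, "PerplexityBot"],
  [/Google-Extended/i, "Google-Extended"],
  [/Bingbot/i, "Bingbot"],
  [/Applebot/i, "Applebot"],
  [/Amazonbot/i, "Amazonbot"],
  [/Meta-ExternalAgent/i, "Meta-ExternalAgent"],
  [/YouBot/i, "YouBot"],
  [/DuckAssistBot/i, "DuckAssistBot"],
  [/cohere-ai/i, "cohere-ai"],
];

// Return the crawler name for a user-agent string, or null for non-bot traffic.
function identifyBot(userAgent: string): string | null {
  for (const [pattern, name] of BOT_PATTERNS) {
    if (pattern.test(userAgent)) return name;
  }
  return null;
}
```

Each crawler gets its own entry so activity can be tracked independently, as described above.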
Page-Level Crawl Distribution
Which pages each AI system requests and how often. This reveals content preferences — which pages AI systems find most valuable, which are under-crawled, and where optimization effort should be focused. Page-level data drives targeted GEO improvements.
Hourly and Daily Trends
Crawl frequency patterns over time — hourly granularity for real-time monitoring, daily rollups for trend analysis. Trend data reveals whether AI visibility is growing, stable, or declining, and correlates crawler behavior with infrastructure changes.
CDN Cache Behavior
Whether bot requests were served from the CDN cache (cache HIT) or by the origin edge function (cache MISS). This distinction is critical because server-side logs never see cache HITs at all — without middleware-level telemetry, you undercount actual bot traffic by a significant margin.
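The undercount can be made concrete with a toy log (the requests and cache statuses below are invented for illustration): origin logs record only the MISS rows, while middleware sees every request.

```typescript
type BotRequest = { bot: string; path: string; cache: "HIT" | "MISS" };

// Invented sample: five bot requests, three served from the CDN cache.
const requests: BotRequest[] = [
  { bot: "GPTBot", path: "/", cache: "HIT" },
  { bot: "GPTBot", path: "/pricing", cache: "MISS" },
  { bot: "ClaudeBot", path: "/", cache: "HIT" },
  { bot: "PerplexityBot", path: "/docs", cache: "HIT" },
  { bot: "Bingbot", path: "/", cache: "MISS" },
];

// Origin logs only see requests that missed the cache.
const originCount = requests.filter(r => r.cache === "MISS").length;
// Middleware-level telemetry sees every request.
const middlewareCount = requests.length;
// Fraction of bot traffic invisible to origin logs.
const undercount = 1 - originCount / middlewareCount;
```

In this toy sample, origin logs would miss three of five bot visits — a 60% undercount.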
Why Transparency Matters
Most businesses have zero visibility into how AI systems interact with their website. Standard analytics tools like Google Analytics do not track bot visits. Server access logs miss CDN-cached requests. Without purpose-built telemetry, GEO optimization is guesswork.
Geogroup believes GEO must be empirical. Every claim about AI visibility should be backed by data from production telemetry, not theoretical models or assumed crawler behavior. Transparency in measurement is what separates GEO infrastructure from GEO marketing.
The Measurement Principle
If you cannot prove that AI crawlers are visiting your site, consuming your structured data, and returning regularly, then your GEO infrastructure is not validated. Bot crawl statistics are the ground truth of AI visibility — everything else is assumption.
How the Telemetry Works
Middleware-Level Capture
Bot detection runs at the middleware layer (Vercel edge middleware), before the CDN cache check. This captures 100% of bot traffic regardless of cache status. Every request from a known AI crawler is logged with bot identity, page path, full user-agent string, and timestamp. The detection uses maintained user-agent pattern matching covering all major AI crawlers.
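The capture step can be sketched as a pure function that builds a log record from the request fields named above. The record shape and function names are hypothetical; only the placement (running in edge middleware, before the cache check) reflects the mechanism described here.

```typescript
// One crawl-log record: bot identity, page path, full user-agent string,
// and timestamp, as described above. Field names are illustrative.
type CrawlRecord = {
  bot: string;
  path: string;
  userAgent: string;
  timestamp: string; // ISO 8601
};

// Build a log record for a known AI crawler, or return null for other
// traffic. In production this logic would run inside Vercel edge middleware,
// before the CDN cache check, so cache HITs are captured as well as MISSes.
function captureBotRequest(
  userAgent: string,
  path: string,
  now: Date,
  identify: (ua: string) => string | null,
): CrawlRecord | null {
  const bot = identify(userAgent);
  if (bot === null) return null; // not a known AI crawler: log nothing
  return { bot, path, userAgent, timestamp: now.toISOString() };
}
```

Injecting the `identify` function keeps the capture logic independent of the user-agent pattern list, which is updated as new crawlers appear.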
Hourly Aggregation
Raw crawl logs are aggregated hourly via the increment_bot_crawl RPC into the bot_crawl_hourly table. This provides efficient time-series data without querying raw logs. Aggregation groups by bot name, page path, and hour — enabling both crawler-level and page-level analysis at hourly resolution.
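The grouping step looks roughly like the sketch below. The record shape and bucket-key format are assumptions for illustration; the real increment_bot_crawl RPC presumably performs an equivalent upsert server-side into bot_crawl_hourly.

```typescript
type RawCrawl = { bot: string; path: string; timestamp: string }; // ISO 8601

// Aggregate raw crawl logs into (bot, path, hour) buckets, mirroring the
// hourly grouping described above. Bucket keys are "bot|path|hourISO".
function aggregateHourly(logs: RawCrawl[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const log of logs) {
    const hour = log.timestamp.slice(0, 13) + ":00:00Z"; // truncate to the hour
    const key = `${log.bot}|${log.path}|${hour}`;
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return buckets;
}
```

Because the key includes both bot name and page path, the same table supports crawler-level and page-level queries at hourly resolution.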
Daily Rollups for Trend Analysis
Daily rollup queries produce trend data: total crawls per day by bot, crawler diversity metrics (how many distinct AI systems visited), page coverage analysis (what percentage of pages received bot attention), and week-over-week growth rates. These rollups power the monthly client reports and daily GEO audit emails.
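Two of those rollup metrics can be sketched directly. The formulas below are assumptions (week-over-week growth as this week's total over last week's, minus one; coverage as the fraction of pages with at least one bot visit), not Geogroup's exact queries.

```typescript
// Week-over-week growth from a series of daily totals, oldest first.
// Returns null when there is not enough data or the baseline week is zero.
function weekOverWeekGrowth(dailyTotals: number[]): number | null {
  if (dailyTotals.length < 14) return null; // need two full weeks
  const last14 = dailyTotals.slice(-14);
  const prevWeek = last14.slice(0, 7).reduce((a, b) => a + b, 0);
  const thisWeek = last14.slice(7).reduce((a, b) => a + b, 0);
  if (prevWeek === 0) return null; // growth undefined from a zero baseline
  return thisWeek / prevWeek - 1;
}

// Page coverage: fraction of site pages that received at least one bot visit.
function pageCoverage(crawledPaths: Set<string>, allPaths: string[]): number {
  if (allPaths.length === 0) return 0;
  const visited = allPaths.filter(p => crawledPaths.has(p)).length;
  return visited / allPaths.length;
}
```

Crawler diversity is simpler still: the count of distinct bot names seen in the period.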
Health Check Integration
The health-check-daily function verifies that bot crawl logging is operational as part of its daily sweep. It confirms that all bot-facing endpoints return valid HTML, AI surface files are accessible, and the telemetry pipeline is writing data. Results feed into the automated GEO audit email.
What Clients Receive
Every Geogroup client with an ongoing management engagement receives monthly crawl reports with the following data, drawn directly from production telemetry:
Crawler Activity Summary
Which AI systems are visiting, total crawl volume per crawler, and month-over-month changes. Identifies which AI platforms are most actively indexing your content and where growth or decline is occurring.
Page-Level Analysis
Crawl distribution across your pages — which content AI systems prioritize, which pages may need optimization, and how page-level patterns change over time. Actionable data for content strategy decisions.
Frequency and Trend Data
How often each AI system returns, whether crawl frequency is increasing or decreasing, and correlation analysis between infrastructure changes and crawler response. Weekly and monthly trend charts.
Anomaly Alerts
Automated detection of significant changes: sudden drops in crawl activity (possible infrastructure issue), new AI crawlers appearing (expanding visibility), or unusual page-level patterns (crawler behavior changes).
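Two of these detections can be sketched as simple rules. The 50% threshold and the trailing 7-day baseline below are invented for illustration, not Geogroup's production values.

```typescript
// Flag a sudden drop: today's crawl count below half the trailing 7-day mean.
// Threshold and window size are illustrative assumptions.
function isSuddenDrop(history: number[], today: number): boolean {
  if (history.length < 7) return false; // not enough baseline data
  const window = history.slice(-7);
  const mean = window.reduce((a, b) => a + b, 0) / window.length;
  return today < 0.5 * mean;
}

// Flag new AI crawlers: bots seen today that are absent from the known set.
function newCrawlers(knownBots: Set<string>, todayBots: string[]): string[] {
  return todayBots.filter(b => !knownBots.has(b));
}
```

A drop alert points at a possible infrastructure issue; a new-crawler alert signals expanding visibility, as described above.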
Live Dashboard Coming Soon
A real-time crawl statistics dashboard is in development, providing clients with self-service access to their bot crawl data, trend visualizations, and GEO signal status. Until then, all data is delivered via monthly reports and daily automated GEO audit emails.
Get started with GEO telemetry → | Learn about the 8-Signal Framework →