GEO Technical Architecture
Blueprint-level documentation of how Geolocus implements Generative Engine Optimization at the infrastructure layer. Every pattern described here is deployed in production and validated by real AI crawler traffic.
For the scoring framework these systems support, see the 8-Signal GEO Methodology.
The Clean-Room HTML Pattern
The core architectural insight of GEO is that AI crawlers and human visitors have fundamentally different rendering capabilities. Human visitors run browsers with full JavaScript engines. AI crawlers send HTTP requests and parse the HTML response — they do not execute JavaScript, hydrate React components, or wait for client-side API calls to complete.
The clean-room HTML pattern solves this with a dual-path architecture: bot detection at the edge determines whether the request comes from an AI crawler or a human browser. Bots receive fully-rendered semantic HTML from Supabase edge functions. Humans receive the interactive single-page application (React + Vite) with client-side rendering. Same content, optimized delivery.
The term "clean-room" reflects the design philosophy: bot-facing HTML is built from scratch with zero framework dependencies. No React, no Tailwind runtime, no hydration scripts, no external CSS files. Every byte in the response serves a purpose that an AI crawler can consume.
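As a concrete illustration, the edge-side routing decision can be as simple as matching the User-Agent against known crawler signatures. The following TypeScript sketch uses a small, illustrative pattern list — the production middleware's signature library is more comprehensive:

```typescript
// Illustrative (not exhaustive) AI crawler signatures.
const BOT_PATTERNS: RegExp[] = [
  /GPTBot/i,
  /ClaudeBot/i,
  /PerplexityBot/i,
  /Google-Extended/i,
  /Bingbot/i,
];

function isAiCrawler(userAgent: string): boolean {
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Dual-path routing: bots get clean-room HTML, humans get the SPA.
function routeFor(userAgent: string): "clean-room-html" | "spa" {
  return isAiCrawler(userAgent) ? "clean-room-html" : "spa";
}
```

The same check runs in the Vercel middleware before any rendering work happens, so the human path pays no cost for bot handling.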
Request Flow Diagram
Client Request
|
v
Vercel Edge (middleware.js)
|
+-- User-Agent Analysis
|
+--[Bot detected]------> api/html.js (Vercel proxy)
| |
| v
| Supabase Edge Function
| (serve-bot-*-html)
| |
| +-- Render clean-room HTML
| +-- Embed JSON-LD structured data
| +-- Log bot visit (fire-and-forget)
| |
| v
| Return: semantic HTML
| Cache-Control: s-maxage=43200
|
+--[Human detected]---> SPA (React + Vite)
|
v
Client-side rendering
Interactive experience
The proxy layer (api/html.js) handles Content-Type normalization, CORS headers, and CDN cache coordination between Vercel's edge and Supabase's edge.
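The normalization step of that proxy can be sketched as a pure function over response headers. This is a hypothetical reconstruction of the proxy's role, not the production api/html.js code:

```typescript
// Hypothetical sketch: normalize headers on the response coming back from
// the Supabase edge function before it reaches the crawler.
function normalizeProxyHeaders(upstream: Headers): Headers {
  const headers = new Headers(upstream);
  headers.set("Content-Type", "text/html; charset=utf-8");
  headers.set("Access-Control-Allow-Origin", "*");
  // Only apply a default Cache-Control when the origin did not set one,
  // so tier-specific policies (and no-store errors) pass through intact.
  if (!headers.has("Cache-Control")) {
    headers.set(
      "Cache-Control",
      "public, max-age=0, s-maxage=43200, stale-while-revalidate=60",
    );
  }
  return headers;
}
```

Keeping this logic in one place is what lets Vercel's CDN and Supabase's edge layer agree on a single caching policy.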
Edge Function Architecture
Each page has its own dedicated Deno edge function deployed to Supabase (powered by Deno Deploy). Edge functions execute at the CDN layer — not in a centralized server — delivering sub-100ms response times globally. There is no cold start penalty for frequently-accessed functions because Deno Deploy keeps them warm.
Function Structure
Every serve-bot-*-html function follows an identical pattern built on three shared helpers:
- _shared/layout.ts — HTML document wrapper with navbar, footer, design tokens, and default Organization JSON-LD. Accepts title, body content, active nav path, and optional description/JSON-LD overrides.
- _shared/response.ts — Response builders with standardized Cache-Control, CORS headers, and Content-Type. Exports htmlResponse(), errorResponse(), and handleOptions().
- _shared/log-bot-visit.ts — Fire-and-forget bot crawl logger. Extracts User-Agent from request headers, detects bot identity via pattern matching, and inserts a row into bot_crawl_logs without blocking the response.
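A hedged sketch of what the _shared/response.ts builders might look like, inferred from the description above — exact signatures and header values in the repository may differ:

```typescript
// Shared CORS headers applied to every builder (values are assumptions).
const CORS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Headers": "authorization, content-type",
};

// Successful renders: standardized Cache-Control with a tunable edge TTL.
function htmlResponse(html: string, sMaxAge = 43200): Response {
  return new Response(html, {
    status: 200,
    headers: {
      ...CORS,
      "Content-Type": "text/html; charset=utf-8",
      "Cache-Control": `public, max-age=0, s-maxage=${sMaxAge}, stale-while-revalidate=60`,
    },
  });
}

// Errors are never cached, so a broken render cannot persist at the edge.
function errorResponse(message: string, status = 500): Response {
  return new Response(message, {
    status,
    headers: { ...CORS, "Content-Type": "text/plain", "Cache-Control": "no-store" },
  });
}

// CORS preflight handler.
function handleOptions(): Response {
  return new Response(null, { status: 204, headers: CORS });
}
```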
The 10 Bot-Facing Pages
Geolocus deploys 10 marketing page edge functions, each with its own JSON-LD schema type:
| Path | Edge Function | JSON-LD Type |
|---|---|---|
| / | serve-bot-home-html | Organization |
| /services | serve-bot-services-html | Service |
| /methodology | serve-bot-methodology-html | WebPage |
| /architecture | serve-bot-architecture-html | TechArticle |
| /whitepaper | serve-bot-whitepaper-html | ScholarlyArticle |
| /about | serve-bot-about-html | AboutPage |
| /faq | serve-bot-faq-html | FAQPage |
| /case-studies | serve-bot-case-studies-html | Article |
| /crawl-stats | serve-bot-crawl-stats-html | WebPage |
| /contact | serve-bot-contact-html | ContactPage |
In addition to marketing pages, three internal edge functions handle operational tasks: send-email (Gmail OAuth transactional email), geo-audit-email (automated daily GEO audit reports), and health-check-daily (infrastructure health monitoring).
Bot Crawl Telemetry
Every bot visit to every page is logged in real time. The telemetry pipeline works as follows:
- Detection: The log-bot-visit.ts helper extracts the User-Agent header (checking x-forwarded-user-agent first for proxy transparency) and matches it against a comprehensive pattern library of known AI crawler signatures.
- Logging: Identified bot visits are inserted into bot_crawl_logs with bot name, page path, full user agent string (truncated to 500 chars), and server timestamp. The insert is fire-and-forget — it adds zero latency to the HTML response.
- Aggregation: The bot_crawl_hourly table aggregates raw logs into hourly counts by bot name, page path, and source. This powers dashboards and trend analysis without querying the high-volume raw log table.
- Analysis: Bot-by-bot breakdowns reveal which AI systems are most actively crawling, which pages they prioritize, crawl frequency trends over time, and whether new AI crawlers are discovering the site.
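The detection and fire-and-forget steps can be sketched as follows. The pattern list is a small illustrative subset, and the insert callback stands in for the real database client:

```typescript
// Illustrative subset of the crawler signature library.
const KNOWN_BOTS: Record<string, RegExp> = {
  GPTBot: /GPTBot/i,
  ClaudeBot: /ClaudeBot/i,
  PerplexityBot: /PerplexityBot/i,
};

function detectBot(userAgent: string): string | null {
  for (const [name, pattern] of Object.entries(KNOWN_BOTS)) {
    if (pattern.test(userAgent)) return name;
  }
  return null;
}

// Fire-and-forget: the insert is not awaited and failures are swallowed,
// so logging adds zero latency to the HTML response.
function logBotVisit(
  req: Request,
  pagePath: string,
  insert: (row: object) => Promise<void>,
): void {
  // Prefer x-forwarded-user-agent so the proxy layer stays transparent.
  const ua =
    req.headers.get("x-forwarded-user-agent") ??
    req.headers.get("user-agent") ?? "";
  const bot = detectBot(ua);
  if (!bot) return;
  insert({
    bot_name: bot,
    page_path: pagePath,
    user_agent: ua.slice(0, 500), // truncated to 500 chars, per the schema
  }).catch(() => {});
}
```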
Recognized AI Crawlers
| Vendor | Crawlers |
|---|---|
| OpenAI | GPTBot, ChatGPT-User, OAI-SearchBot |
| Anthropic | ClaudeBot, Claude-Web, anthropic-ai |
| Google | Google-Extended, Googlebot |
| Perplexity | PerplexityBot, Perplexity-User |
| Microsoft | Bingbot, BingPreview |
| Others | Applebot, Amazonbot, Meta-ExternalAgent, YouBot, DuckAssistBot, cohere-ai, CCBot |
Structured Data Layer
JSON-LD structured data is generated server-side within each edge function and embedded directly in the clean-room HTML response. This guarantees that structured data is always present when AI crawlers parse the page — unlike client-side injection approaches that fail when JavaScript is not executed.
Each page's JSON-LD uses the most specific Schema.org type applicable to its content:
- Organization: Homepage and brand pages — name, URL, description, contact points
- WebSite: Site-level metadata paired with Organization for search and AI indexing
- Service: Service offering pages with provider, name, description, and service type
- TechArticle: Technical documentation with proficiency level and about topics
- ScholarlyArticle: Research and thought-leadership content with named authors
- FAQPage: Question/Answer pairs structured for direct extraction by AI systems
- ContactPage: Contact information with organization details
- DefinedTermSet: Glossary and framework definitions (e.g., the 8 GEO Signals)
The layout helper provides a default Organization JSON-LD fallback, ensuring that even if a specific function omits custom structured data, the page still has valid machine-readable metadata.
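Server-side JSON-LD embedding can be illustrated with a minimal sketch. Helper names and field values here are placeholders, not Geolocus's actual metadata or layout code:

```typescript
type JsonLd = Record<string, unknown>;

// Placeholder Organization schema; real pages carry richer metadata.
function organizationJsonLd(name: string, url: string): JsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "Organization",
    name,
    url,
  };
}

// The JSON-LD is serialized into the HTML string itself, so crawlers that
// never execute JavaScript still receive the structured data.
function embedJsonLd(bodyHtml: string, jsonLd: JsonLd): string {
  const script =
    `<script type="application/ld+json">${JSON.stringify(jsonLd)}</script>`;
  return `<!doctype html><html><head>${script}</head><body>${bodyHtml}</body></html>`;
}
```

This is the key contrast with client-side injection: the `<script type="application/ld+json">` tag exists in the raw HTTP response, not only in the hydrated DOM.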
CDN Caching Strategy
Bot-facing pages use a two-tier caching strategy that balances content freshness (more frequent crawls see updated content sooner) with edge efficiency (reduced origin requests):
Tier 1: Static Marketing Pages
Cache-Control: public, max-age=0, s-maxage=43200, stale-while-revalidate=60
12-hour CDN edge cache. No browser cache (max-age=0), so content updates propagate on the next crawler visit after a redeploy. The 60-second stale-while-revalidate window provides seamless transitions during deploys.
Tier 2: Dynamic Data Pages
Cache-Control: public, max-age=0, s-maxage=60, stale-while-revalidate=30
60-second CDN cache for pages with live data (e.g., crawl-stats). Ensures AI crawlers see near-real-time data without hammering the database on every request.
Error Responses
Cache-Control: no-store
Error responses are never cached, ensuring a broken render does not persist at the edge and poison subsequent crawler visits.
The Vercel proxy at api/html.js coordinates caching between Vercel's edge CDN and Supabase's edge function layer, ensuring consistent Cache-Control headers reach the final response.
Monitoring Stack: 10 GEO Infrastructure Tables
The entire GEO infrastructure is monitored through 10 purpose-built Postgres tables that track every dimension of AI visibility, from raw crawl logs to composite scoring:
| Table | Purpose |
|---|---|
| geo_ledger_entries | Immutable audit trail of every GEO optimization action — what changed, when, impact score, responsible agent |
| geo_signal_status | Current PASS/FAIL/PARTIAL state of each of the 8 GEO signals per monitored site |
| geo_score_dimensions | Dimension-level scores (0-100) with weights for the 7 active composite categories |
| bot_crawl_logs | Raw log of every AI bot visit — bot name, page path, user agent, timestamp |
| bot_crawl_hourly | Hourly aggregated crawl counts by bot, page, and source for efficient trend queries |
| site_health_checks | TTFB, status codes, SSL validity, and response size for monitored endpoints |
| health_monitor_runs | Execution log of health monitoring jobs — start time, duration, pass/fail counts |
| health_check_daily_runs | Daily rollup of health check results for long-term uptime and performance trending |
| cron_heartbeats | Heartbeat pings from scheduled cron jobs — ensures automated tasks are running on schedule |
| middleware_heartbeat | Vercel middleware liveness signal — confirms bot detection is active at the edge |
Automated Audit Pipeline
Daily at 06:00 MST, a pg_cron job triggers the geo-audit-email edge function, which:
- Queries all 10 infrastructure tables for the trailing 24-hour window
- Computes the current GEO composite score from dimension weights
- Identifies notable pages (highest/lowest crawl volume, new bot appearances)
- Generates a terse, data-dense email report
- Sends via Gmail OAuth through the send-email edge function
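The composite-score step can be sketched as a weighted average over rows from geo_score_dimensions. The column names mirror the table description; the exact production formula is an assumption:

```typescript
// Shape of a row from geo_score_dimensions (inferred, not the actual schema).
interface ScoreDimension {
  name: string;
  score: number;  // 0-100
  weight: number; // relative weight within the composite
}

// Weighted average, rounded to a whole-number composite score.
function compositeScore(dimensions: ScoreDimension[]): number {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (totalWeight === 0) return 0;
  const weighted = dimensions.reduce((sum, d) => sum + d.score * d.weight, 0);
  return Math.round(weighted / totalWeight);
}
```

For example, two dimensions scoring 80 (weight 1) and 60 (weight 3) combine to a composite of 65.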
Explore the Framework
Understand the scoring methodology behind this architecture, or read the whitepaper on why GEO infrastructure is the defining investment of the AI era.