Table of Contents
- The Problem with Today's AI Crawlers
- What Agent Name Service Actually Does
- How Web Bot Auth Works
- Implementation Guide for Site Owners
- Implementation for Agent Developers
- Monetization and Access Control Options
- What This Means for Content Publishers
The Problem with Today's AI Crawlers
Right now, AI crawlers show up at your site claiming to be GPTBot or Claude-Web, and you have exactly two options: believe the User-Agent string or don't. That's it. You can't verify the identity. You can't charge for access. You can't audit what they took.
Analogy: This is like letting anyone into your building as long as they're wearing a delivery uniform. No ID check, no signature, no tracking.
The old crawler contract worked because Google needed your content and you needed Google's traffic. Everyone played nice because the incentives aligned. AI training models broke that contract. They scrape once, train forever, and send you nothing back.
Cloudflare and GoDaddy just announced a technical fix: Agent Name Service (ANS) and Web Bot Auth. These aren't policy statements or gentlemen's agreements. They're actual protocols that give every AI agent a verifiable identity and let you set real access rules.
What Agent Name Service Actually Does
ANS is a DNS-based registry that anchors AI agent identities to domain names. Think of it as a phone book, but for bots.
When an AI company registers an agent with ANS, they publish a DNS TXT record at a subdomain like _ans.agent.company.com. That record contains:
- Agent identifier
- Public key for signature verification
- Links to transparency logs
- Contact information
The transparency log piece matters. Every ANS registration gets logged in a Certificate Transparency-style append-only log. You can audit who registered what and when. No stealth agents.
Here's what a typical ANS lookup returns:
| Field | Example Value | Purpose |
|---|---|---|
| Agent ID | openai.com/gptbot | Canonical identifier |
| Public Key | JWK format | Verify signatures |
| Log Entry | CT log URL | Audit trail |
| Contact | abuse@openai.com | Report issues |
The DNS anchoring is clever. If someone tries to impersonate GPTBot, they'd need to compromise OpenAI's DNS. That's a much higher bar than spoofing a User-Agent header.
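To make the lookup concrete, here is a minimal sketch of turning such a TXT record into the fields from the table. It assumes the semicolon-separated `v=ANS1; id=...; key=...; log=...` layout used in the agent developer section of this article; the real ANS record syntax may differ, `parseAnsRecord` is a hypothetical helper name, and the sample values are placeholders.

```javascript
// Hypothetical helper: parse an ANS-style TXT record into an object.
// Assumes "name=value" pairs separated by semicolons, as in this article's example.
function parseAnsRecord(txt) {
  const fields = {};
  for (const part of txt.split(';')) {
    const trimmed = part.trim();
    if (!trimmed) continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // skip malformed segments
    // Split on the first "=" only, so values may themselves contain "=".
    fields[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  if (fields.v !== 'ANS1') {
    throw new Error('not an ANS1 record');
  }
  return { id: fields.id, key: fields.key, log: fields.log };
}

const record = parseAnsRecord(
  'v=ANS1; id=company.com/myagent; key=<JWK>; log=<CT-URL>'
);
console.log(record.id); // company.com/myagent
```

Rejecting anything without the expected version tag keeps unrelated TXT records on the same name (SPF, domain verification tokens) from being mistaken for an agent record.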
How Web Bot Auth Works
Web Bot Auth is the authentication layer. It's an IETF draft, built on HTTP Message Signatures (RFC 9421), that lets agents prove their identity on every request using cryptographic signatures.
The flow looks like this:
The agent includes these HTTP headers with every request:
- Signature-Agent: The ANS identifier
- Signature: Cryptographic signature of the request
- Signature-Input: Details about what was signed
Your server fetches the public key from ANS, verifies the signature, and knows exactly who's asking.
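Before checking the signature itself, a server has to read what the agent claims to have signed. The sketch below splits a `Signature-Input` value into its covered components and parameters and checks the validity window. It assumes the layout used in the signing code later in this article (a parenthesized component list followed by `;name=value` parameters). The helper names are hypothetical, and a production verifier would use a proper structured-field parser rather than string splitting.

```javascript
// Hypothetical helper: split a Signature-Input value into the covered
// components and its parameters (created, expires, nonce, keyid, ...).
function parseSignatureInput(value) {
  const close = value.indexOf(')');
  if (!value.startsWith('(') || close === -1) {
    throw new Error('malformed Signature-Input');
  }
  const components = value
    .slice(1, close)
    .split(' ')
    .filter(Boolean)
    .map((c) => c.replace(/"/g, ''));
  const params = {};
  for (const part of value.slice(close + 1).split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    const name = part.slice(0, eq).trim();
    const raw = part.slice(eq + 1).trim();
    // Quoted values are strings; bare values here are integer timestamps.
    params[name] = raw.startsWith('"') ? raw.slice(1, -1) : Number(raw);
  }
  return { components, params };
}

// Reject signatures that are not yet valid or have already expired.
function isFresh(params, nowSeconds) {
  return params.created <= nowSeconds && nowSeconds < params.expires;
}

const parsed = parseSignatureInput(
  '("@method" "@authority");created=100;expires=200;nonce="abc";keyid="k1"'
);
console.log(parsed.components); // [ '@method', '@authority' ]
console.log(isFresh(parsed.params, 150)); // true
```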
Implementation Guide for Site Owners
If you run a content site, you have three implementation paths:
Option 1: Use a CDN that supports it
Cloudflare is building native ANS/Web Bot Auth support. You'll get a dashboard where you can:
- Allow verified agents automatically
- Block unverified crawlers
- Set rate limits per agent
- Track which agents accessed what
DataDome already added Web Bot Auth to their bot protection service. If you use either platform, this becomes a configuration change.
Option 2: Implement verification yourself
The verification logic is straightforward:
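    // fetchANSPublicKey and buildSignatureInput are placeholders you must supply.
    // Assumes the Signature header carries a plain base64 value.
    async function verifyBotRequest(request) {
        const agentId = request.headers.get('Signature-Agent');
        const signatureHeader = request.headers.get('Signature');
        if (!agentId || !signatureHeader) return null; // unsigned request

        // Fetch the agent's public key from ANS (must resolve to a CryptoKey)
        const publicKey = await fetchANSPublicKey(agentId);

        // crypto.subtle.verify works on bytes, not strings:
        // decode the base64 signature and encode the signed string
        const signature = Uint8Array.from(atob(signatureHeader), (c) => c.charCodeAt(0));
        const data = new TextEncoder().encode(buildSignatureInput(request));

        // Verify signature
        const isValid = await crypto.subtle.verify(
            { name: 'RSASSA-PKCS1-v1_5' },
            publicKey,
            signature,
            data
        );
        return isValid ? agentId : null;
    }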
The hard part is deciding what to do with verified vs. unverified traffic.
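One way to frame that decision is a small policy lookup that takes the result of a verification function like the one above (a verified agent ID, or null) and returns an action. This is only a sketch: the action names, the default choices, and the example agent IDs are assumptions for illustration, not part of ANS or Web Bot Auth.

```javascript
// Hypothetical policy table keyed by verified agent ID.
const policy = new Map([
  ['openai.com/gptbot', 'allow'],
  ['company.com/myagent', 'rate-limit'],
]);

// agentId is the verified ID, or null when verification failed
// or the request carried no signature at all.
function decideAccess(agentId, policyMap) {
  if (agentId === null) return 'block'; // unverified traffic
  // Verified but not listed: let it in slowly until you have reviewed it.
  return policyMap.get(agentId) ?? 'rate-limit';
}

console.log(decideAccess('openai.com/gptbot', policy)); // allow
console.log(decideAccess('example.org/newbot', policy)); // rate-limit
console.log(decideAccess(null, policy)); // block
```

Keeping the policy as data rather than as branching code makes it easy to review and change as new agents show up.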
Option 3: Proxy through an auth service
Several startups are building ANS verification as a service. You route crawler traffic through their API, they handle verification, you get back a clean decision.
Implementation for Agent Developers
If you're building an AI agent or crawler, Web Bot Auth is your credibility signal.
First, register with ANS. Publish your DNS record:
_ans.myagent.company.com TXT "v=ANS1; id=company.com/myagent; key=<JWK>; log=<CT-URL>"
Then sign every request. Here's the critical code from Stytch's implementation guide:
    // Note: signRequest is not defined in this excerpt. It must sign the
    // given string with the agent's private key.
    async function createWebBotAuthHeaders(url, signatureAgent, publicJWK, privateKey) {
        const now = Math.floor(Date.now() / 1000); // Unix time in seconds
        const tomorrow = now + (24 * 60 * 60); // signature expires after 24 hours
        const nonce = crypto.randomUUID(); // unique per request
        const hostname = new URL(url).hostname; // not used in this excerpt
        // Covered components, followed by the signature parameters
        const signatureInput = `("@method" "@target-uri" "@authority" "signature-agent" "signature-nonce" "signature-created" "signature-expires");created=${now};expires=${tomorrow};nonce="${nonce}";alg="rs256";keyid="${publicJWK.kid}"`;
        const signature = await signRequest(signatureInput, privateKey);
        return {
            'Signature-Agent': signatureAgent,
            'Signature-Nonce': nonce,
            'Signature-Created': now.toString(),
            'Signature-Expires': tomorrow.toString(),
            'Signature': signature,
            'Signature-Input': signatureInput
        };
    }
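The function above calls `signRequest` without defining it. The sketch below is one possible stand-in, not the implementation from the guide. It assumes Node.js, an RSA keypair, RSASSA-PKCS1-v1_5 with SHA-256 (which the `alg="rs256"` parameter suggests), and base64 output. `verifySignature` is a hypothetical counterpart showing what a receiver would run with the public key.

```javascript
const crypto = require('crypto');

// Assumption: a throwaway 2048-bit RSA keypair for the demo. A real agent
// would load its long-lived private key from secure storage.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Hypothetical stand-in for signRequest: RSASSA-PKCS1-v1_5 over SHA-256,
// returned as a base64 string.
async function signRequest(signatureInput, key) {
  return crypto
    .sign('sha256', Buffer.from(signatureInput, 'utf8'), key)
    .toString('base64');
}

// Counterpart check a receiver could run with the agent's public key.
function verifySignature(signatureInput, signatureB64, key) {
  return crypto.verify(
    'sha256',
    Buffer.from(signatureInput, 'utf8'),
    key,
    Buffer.from(signatureB64, 'base64')
  );
}

signRequest('("@method");created=1', privateKey).then((sig) => {
  console.log(verifySignature('("@method");created=1', sig, publicKey)); // true
});
```

Like the function above, this signs exactly the string it is handed, so the receiver has to rebuild the same string before verifying.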
Key points:
- Use a real keypair, not a shared secret
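- Include created and expires timestamps to limit how long a captured request can be replayed
- Sign the full request context (method, URL, authority)
- Use a unique nonce per request so the receiver can reject repeats inside that window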
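Timestamps and nonces only help if the receiving side checks them. Here is a minimal sketch of a verifier-side replay guard, assuming an in-memory store on a single server; the `ReplayGuard` class is hypothetical, and a multi-server deployment would need a shared cache instead.

```javascript
// Hypothetical verifier-side replay guard: remembers nonces until they expire.
class ReplayGuard {
  constructor() {
    this.seen = new Map(); // nonce -> expiry (seconds since epoch)
  }

  // Returns true if the request is inside its validity window
  // and its nonce has not been seen before.
  accept(nonce, created, expires, nowSeconds) {
    // Forget nonces whose signatures have expired; they can no longer be replayed.
    for (const [n, exp] of this.seen) {
      if (exp <= nowSeconds) this.seen.delete(n);
    }
    if (created > nowSeconds || expires <= nowSeconds) return false; // outside window
    if (this.seen.has(nonce)) return false; // replayed nonce
    this.seen.set(nonce, expires);
    return true;
  }
}

const guard = new ReplayGuard();
console.log(guard.accept('n1', 100, 200, 150)); // true
console.log(guard.accept('n1', 100, 200, 160)); // false (replayed nonce)
console.log(guard.accept('n2', 100, 200, 250)); // false (expired)
```

The longer the expiry window, the more nonces the verifier has to remember, which is one reason to keep signatures short-lived.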
Monetization and Access Control Options
This is where it gets interesting for publishers. Once you can verify agent identity, you can start charging.
Here are the access models publishers are testing:
| Model | How It Works | Best For |
|---|---|---|
| Allowlist | Only verified agents allowed | High-value content |
| Tiered Access | Free tier + paid API | News sites |
| Pay-per-request | Micropayments via header | Research databases |
| Attribution Required | Free but must cite source | Academic content |
| Training-only License | Different price for training vs. inference | All content |
The pay-per-request model is particularly clever. An agent includes a payment proof header (think macaroon tokens or Lightning Network invoices), and you verify payment before serving content.
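As a rough sketch of that flow, the handler below answers with HTTP 402 (Payment Required) when no proof is attached and only serves content once the proof checks out. The `payment-proof` header name, the `handlePaidRequest` function, and the `verifyPayment` callback are all assumptions for illustration; no specific payment scheme is implied.

```javascript
// Hypothetical handler: gate content behind a payment proof header.
// headers is a plain object with lowercase keys; verifyPayment is supplied
// by the caller and returns true when the proof is valid for this agent.
function handlePaidRequest(headers, agentId, verifyPayment) {
  const proof = headers['payment-proof']; // assumed header name
  if (!proof) {
    return { status: 402, body: 'Payment Required' };
  }
  if (!verifyPayment(proof, agentId)) {
    return { status: 403, body: 'Invalid payment proof' };
  }
  return { status: 200, body: 'content' };
}

// Toy verifier for the demo: accepts a single hard-coded token.
const demoVerify = (proof) => proof === 'paid-token';

console.log(handlePaidRequest({}, 'company.com/myagent', demoVerify).status); // 402
console.log(
  handlePaidRequest({ 'payment-proof': 'paid-token' }, 'company.com/myagent', demoVerify).status
); // 200
```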
Some publishers are thinking bigger. If you can identify the agent, you can track ROI. Did that Perplexity crawl lead to traffic? Did Claude's training run include proper attribution? You finally have audit data.
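Once every request carries a verified agent ID, that audit data is a simple aggregation away. The sketch below assumes a hypothetical log format with one `{ agentId, path }` entry per verified request; the entries are made-up sample data.

```javascript
// Hypothetical aggregation: count requests and distinct paths per agent.
function summarizeByAgent(entries) {
  const summary = new Map();
  for (const { agentId, path } of entries) {
    if (!summary.has(agentId)) {
      summary.set(agentId, { requests: 0, paths: new Set() });
    }
    const stats = summary.get(agentId);
    stats.requests += 1;
    stats.paths.add(path);
  }
  return summary;
}

const summary = summarizeByAgent([
  { agentId: 'openai.com/gptbot', path: '/articles/1' },
  { agentId: 'openai.com/gptbot', path: '/articles/1' },
  { agentId: 'company.com/myagent', path: '/articles/2' },
]);
console.log(summary.get('openai.com/gptbot').requests); // 2
console.log(summary.get('openai.com/gptbot').paths.size); // 1
```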
What This Means for Content Publishers
The practical takeaway: you're about to get real control over AI access to your content.
In six months, your crawler management will probably look like this:
- Verified agents with good track records get automatic access
- Unknown crawlers get blocked or rate-limited
- Commercial AI companies pay per request or via license
- You get logs showing exactly what was accessed
This doesn't solve every problem. Agents can still lie about what they'll do with your content. Enforcement still requires legal action. And some crawlers will ignore the new standards entirely.
But it solves the identity problem. You'll know who's asking. That's the foundation for everything else.
The bigger shift is philosophical. The old web ran on implicit permissions and good faith. The agentic web is moving toward explicit permissions and cryptographic proof. That's probably the right direction when billions of dollars of AI training are at stake.
For content owners: start thinking about your access policy now. Cloudflare's tools will be shipping soon. You'll need a plan for which agents to allow, which to charge, and which to block.
For agent developers: implement Web Bot Auth before site owners start blocking unverified traffic by default. Your credibility as a good actor depends on it.
The crawler contract is being rewritten. This time in code, not just convention.