Methodology
Last updated 2026-05-10 · Complete technical description of how lookx402 collects, decodes, and classifies x402 protocol activity on Base.
lookx402 is a passive observer. It does not run an x402 facilitator, hold any funds, accept paid integrations from labelled merchants, or operate any privileged infrastructure. Every datum on the site is derived from public Base mainnet logs that anyone can query with a free RPC endpoint.
This page is the long-form, citable version of the pipeline. The short version: we read two USDC events on Base every 5 minutes, match them by tx hash, decode the EIP-3009 authorizer correctly, aggregate by agent and merchant, and classify behavior with a deterministic hourly rule engine.
1. What x402 is, exactly
x402 is a payment protocol introduced by Coinbase that lets autonomous programs (AI agents) pay for HTTP services without per-call human approval. The transport is HTTP 402 Payment Required with a structured envelope; settlement happens onchain. The dominant settlement path on Base is the EIP-3009 transferWithAuthorization family on USDC, which lets the agent sign a payment authorization off-chain that any third party (the facilitator) can submit on-chain.
From an indexer's point of view, an x402 payment is observable as one of two on-chain patterns: a direct transferWithAuthorization call on Base USDC, or a Permit2-proxied settle call. lookx402 monitors both, with the discovery filter implemented at the RPC layer (we never download the full mempool / full block history).
2. Path A — direct USDC.transferWithAuthorization (≈89% of volume)
The agent signs an EIP-3009 authorization using EIP-712 typed-data. The facilitator wraps it in a single tx that calls one of four selectors on the canonical Base USDC contract 0x833589fcd6edb6e08f4c7c32d4f71b54bda02913:
| Selector | Function | Notes |
|---|---|---|
| 0xe3ee160e | transferWithAuthorization | Canonical, dominant in volume |
| 0xcf092995 | receiveWithAuthorization | Merchant-pulls variant |
| 0xef55bec6 | transferWithAuthorization (typed-data v2) | Updated EIP-712 domain |
| 0x88b7ab63 | receiveWithAuthorization (typed-data v2) | Updated EIP-712 domain |
Each call emits two USDC events:
- AuthorizationUsed(address indexed authorizer, bytes32 indexed nonce): fires once per consumed authorization.
- Transfer(address indexed from, address indexed to, uint256 value): the actual USDC movement.
lookx402 matches them by transaction hash, then writes one canonical row to the transactions table with (payer, merchant, amount, nonce, block_timestamp, tx_hash). The payer comes from AuthorizationUsed.topics[1], never from tx.from.
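The join described above can be sketched in a few lines of Python. This is an illustrative sketch, not the production Worker code: the function names `low20` and `match_by_tx_hash` are hypothetical, the input is assumed to be the list of log objects returned by `eth_getLogs`, and the sketch assumes one authorization per transaction. `block_timestamp` is left out because it is not guaranteed to be present on a log object and typically needs a separate block lookup.

```python
from typing import Dict, Iterable, List

# Topic hashes as listed under "Event topic constants" on this page.
AUTHORIZATION_USED = "0x98de503528ee59b575ef0c0a2576a82497bfc029a5685b209e9ec333479b10a5"
TRANSFER = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def low20(topic: str) -> str:
    """Return the address held in the low 20 bytes of a 32-byte topic."""
    return "0x" + topic[-40:].lower()


def match_by_tx_hash(logs: Iterable[dict]) -> List[dict]:
    """Join each AuthorizationUsed log with the Transfer sent by the same payer in the same tx."""
    auths: Dict[str, dict] = {}
    transfers: Dict[str, List[dict]] = {}
    for log in logs:
        topic0 = log["topics"][0].lower()
        if topic0 == AUTHORIZATION_USED:
            auths[log["transactionHash"]] = log  # assumption: one authorization per tx
        elif topic0 == TRANSFER:
            transfers.setdefault(log["transactionHash"], []).append(log)

    rows = []
    for tx_hash, auth in auths.items():
        payer = low20(auth["topics"][1])  # never tx.from
        for transfer in transfers.get(tx_hash, []):
            if low20(transfer["topics"][1]) != payer:
                continue  # a Transfer from another sender in the same tx
            rows.append({
                "payer": payer,
                "merchant": low20(transfer["topics"][2]),
                "amount": int(transfer["data"], 16),  # atomic USDC units
                "nonce": auth["topics"][2],
                "tx_hash": tx_hash,
            })
            break  # keep one canonical row per transaction
    return rows
```

Transfers whose transaction carries no AuthorizationUsed log are ignored, which is what keeps ordinary USDC traffic out of the result.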
Event topic constants
// USDC AuthorizationUsed
keccak256("AuthorizationUsed(address,bytes32)") =
0x98de503528ee59b575ef0c0a2576a82497bfc029a5685b209e9ec333479b10a5
// ERC20 Transfer
keccak256("Transfer(address,address,uint256)") =
0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
3. Path B — Permit2 settle proxy (≈0% measured volume)
A second variant routes x402 settlement through a Permit2-style proxy at 0x402085c248EeA27D92E8b30b2C58ed07f9E20001 ("x402ExactPermit2Proxy"). The contract takes a Permit2 signature instead of an EIP-3009 authorization and forwards the transfer. We watch this address for Transfer emissions to/from USDC but have observed effectively zero 30-day traffic. The discovery rule remains in the indexer as a safety net.
Why mention Path B if it's empty? Because the protocol allows it. If volume migrates here in the future, lookx402 picks it up automatically without a code change. We log "Path A" or "Path B" on every indexed tx so anyone querying the API can filter.
4. Live indexing pipeline
A Cloudflare Worker fires every 5 minutes via Wrangler cron */5 * * * *. The worker is stateless — it reads the last processed block from Supabase, computes a window, fires two parallel eth_getLogs requests, decodes, deduplicates, and upserts.
Window computation
1. eth_getBlockByNumber("latest") to anchor the window.
2. Read last_processed_block from Supabase indexer_state.
3. Compute from = max(last_processed_block - 30, latest - 200). The 30-block safety overlap covers reorgs and late-arriving log indexes. The 200-block cap protects against runaway windows after extended downtime.
4. Fire eth_getLogs for AuthorizationUsed on USDC over [from, latest].
5. If any authorizations were found, fire a second eth_getLogs for the matching Transfer events filtered by topic1 ∈ {payers found in step 4}.
6. Match by tx hash. Upsert into transactions. Update indexer_state.last_processed_block = latest.
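The window formula in the steps above is small enough to check by hand. A minimal Python sketch follows; the function and constant names are hypothetical, and the clamp at block 0 is an addition for safety that the source does not mention.

```python
from typing import Tuple

SAFETY_OVERLAP = 30   # blocks re-read on every cycle
MAX_WINDOW = 200      # cap on how far back a single cycle reaches


def compute_window(last_processed_block: int, latest: int) -> Tuple[int, int]:
    """Return the inclusive [from, latest] block range for one ingest cycle."""
    start = max(last_processed_block - SAFETY_OVERLAP, latest - MAX_WINDOW)
    return max(start, 0), latest
```

In a normal cycle, where the head has advanced by roughly 150 blocks, the overlap term wins and the window starts 30 blocks before the last processed block. After a long gap the cap wins: with the formula as written, the window starts at latest - 200, so older blocks fall outside that cycle's range.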
RPC strategy — no paid provider
We rotate across four free public Base RPC endpoints with retry-on-error:
- https://base.publicnode.com
- https://base.llamarpc.com
- https://base.drpc.org
- https://base.meowrpc.com
Each RPC call uses a short timeout (4 s) and falls through to the next provider on HTTP 429, 5xx, or socket error. The worker logs the active provider per cycle to indexer_logs for observability. No Alchemy, no QuickNode, no paid plan is required.
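The fall-through behaviour can be sketched with the Python standard library alone. This is an illustration under assumptions, not the Worker's code: `rpc_call`, `http_post`, and `AllProvidersFailed` are hypothetical names, and the sketch is broader than the rule above in that it moves on after any HTTP error, timeout, malformed response, or JSON-RPC error body, not only after 429 and 5xx.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, List, Sequence, Tuple

PROVIDERS = [
    "https://base.publicnode.com",
    "https://base.llamarpc.com",
    "https://base.drpc.org",
    "https://base.meowrpc.com",
]
TIMEOUT_SECONDS = 4


class AllProvidersFailed(Exception):
    """Raised when every provider in the rotation returned an error."""


def http_post(url: str, payload: dict) -> dict:
    """POST a JSON-RPC payload and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return json.loads(response.read())


def rpc_call(
    method: str,
    params: list,
    providers: Sequence[str] = PROVIDERS,
    post: Callable[[str, dict], dict] = http_post,
) -> Tuple[str, Any]:
    """Try each provider in order; return (provider_url, result) from the first success."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    errors: List[tuple] = []
    for url in providers:
        try:
            body = post(url, payload)
        except (urllib.error.URLError, OSError, ValueError) as exc:
            errors.append((url, repr(exc)))  # HTTP error, timeout, or bad JSON
            continue
        if "result" in body:
            return url, body["result"]
        errors.append((url, body.get("error")))  # JSON-RPC level error
    raise AllProvidersFailed(errors)
```

Returning the provider URL alongside the result makes it straightforward to log the active provider per cycle. The `post` parameter exists so the rotation logic can be exercised without network access.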
Per-cycle cost
Two log queries × ~150 blocks each ≈ 300 KB of JSON downloaded per cycle. At 12 cycles per hour × 24 hours = 288 cycles/day, that comes to roughly 86 MB of daily transfer, under 100 MB and comfortably inside the free CF Worker tier.
5. The payer-extraction gotcha
The single most important detail on this page.
An obvious mistake is to read tx.from as the agent. It isn't. tx.from is the facilitator wallet (CDP, OpenFacilitator, Primer, etc.) that submitted the bundled authorization. The real payer is the EIP-3009 authorizer — found at topics[1] of the AuthorizationUsed event, which is also the first parameter of the call's calldata.
Leaderboards that don't decode this rank facilitators as the top agents and miss the actual machine-to-machine economy entirely. We have observed third-party analyses publish "top x402 agents" charts where the #1 spot is, in fact, the Coinbase Developer Platform facilitator — a deeply misleading conclusion.
A worked example
Consider an arbitrary x402 transaction. The naive extraction gives:
tx.from = 0xFacilitator… ← Coinbase Developer Platform
tx.to = 0x833589fcd6edb6e08f4c7c32d4f71b54bda02913 ← Base USDC
input = 0xe3ee160e + (encoded args) ← transferWithAuthorization selector
amount = 0.001 USDC ← from Transfer event value
This tells you a facilitator moved 0.001 USDC out of some agent's approval. Useless for ranking agents.
The correct extraction reads AuthorizationUsed:
event AuthorizationUsed(address indexed authorizer, bytes32 indexed nonce)
topics[0] = 0x98de503528ee59b575ef0c0a2576a82497bfc029a5685b209e9ec333479b10a5 ← topic hash
topics[1] = 0x0000000000000000000000004d839b4c3cfef1a7ef8a2faa8d3ae219dd84a95d ← AGENT
topics[2] = 0x4a2b3c… ← nonce
The agent is 0x4d839b4c3cfef1a7ef8a2faa8d3ae219dd84a95d — extracted from topics[1], padded as 32 bytes. The lower 20 bytes are the address. lookx402 records this address as the payer, and only this address.
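A strict version of this decoding step can be sketched as follows. The function name `authorizer_from_topic` is hypothetical, and the padding check is an extra safeguard that the source does not describe: it rejects any topic whose upper 12 bytes are not zero, since such a value cannot be an ABI-encoded address.

```python
def authorizer_from_topic(topic: str) -> str:
    """Decode the payer address from AuthorizationUsed.topics[1]."""
    raw = topic[2:] if topic[:2].lower() == "0x" else topic
    if len(raw) != 64:
        raise ValueError("an indexed topic must be exactly 32 bytes")
    int(raw, 16)  # raises ValueError if the topic is not valid hex
    if raw[:24] != "0" * 24:
        raise ValueError("upper 12 bytes are not zero padding")
    return "0x" + raw[24:].lower()  # lower 20 bytes are the address
```

Applied to the topics[1] value in the worked example, this returns 0x4d839b4c3cfef1a7ef8a2faa8d3ae219dd84a95d.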
The merchant is similarly NOT obvious
The recipient of the USDC is not always the to argument of the function call — that is the EIP-3009 nominal recipient, which in some flows is a routing contract. The actual final recipient is Transfer.to matched on the same tx hash. We always use the Transfer event for the final merchant.
6. Backfill methodology
A separate one-shot Python job replays the same two-getLogs strategy across the previous 30 days in 1 000-block chunks. The chunks are fully idempotent — keyed on tx hash — so running the script twice produces zero new rows.
Chunk strategy
HEAD  = latest block at job start
FROM  = HEAD - 30 * 24 * 60 * 60 // 2   # ≈ 30 days at Base block time ~2 s (1,296,000 blocks)
CHUNK = 1000 blocks
for start in range(FROM, HEAD + 1, CHUNK):   # + 1 so the HEAD block itself is covered
    end = min(start + CHUNK - 1, HEAD)
    fetch AuthorizationUsed [start, end]
    fetch matching Transfer events
    upsert
    sleep(0.2)   # be polite to free RPCs
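The chunk boundaries are the part of this loop most prone to off-by-one errors, so here is a runnable sketch of just that piece. The helper names are hypothetical, and the block count assumes the ~2 s Base block time stated above.

```python
from typing import Iterator, Tuple

BLOCK_TIME_SECONDS = 2  # approximate Base block time, as assumed on this page


def blocks_in_days(days: int) -> int:
    """Number of blocks produced in `days` days at the assumed block time."""
    return days * 24 * 60 * 60 // BLOCK_TIME_SECONDS


def chunk_ranges(start: int, head: int, chunk: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, non-overlapping (from, to) ranges covering [start, head]."""
    for lo in range(start, head + 1, chunk):
        yield lo, min(lo + chunk - 1, head)
```

Thirty days works out to 1,296,000 blocks, so a full backfill is about 1,300 chunks of 1 000 blocks.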
Why 1 000-block chunks?
Most public RPC providers reject larger windows on active topics with an error such as "request returned more than 10000 results". 1 000 blocks is the sweet spot: small enough to never trigger that limit, large enough that a 30-day backfill completes in ~20 minutes on the free tier.
Late arrivals
On Base, log indexers occasionally surface a log after the block is considered final by other providers (RPC consistency varies). The 30-block safety overlap on the live indexer catches anything that "appeared late". For deeper recovery we keep a daily reconciliation job that compares per-day tx counts to a second independent RPC pull and flags any discrepancy > 0.1%.
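The reconciliation check reduces to a per-day comparison of two counts. A minimal sketch, assuming the relative difference is measured against the second (reference) pull; the source does not say which side is the denominator, and the function names are hypothetical.

```python
from typing import Dict, List

THRESHOLD = 0.001  # 0.1%


def discrepancy(primary: int, reference: int) -> float:
    """Relative difference between the indexed count and the reference count."""
    if reference == 0:
        return 0.0 if primary == 0 else float("inf")
    return abs(primary - reference) / reference


def flag_days(primary: Dict[str, int], reference: Dict[str, int]) -> List[str]:
    """Return the days whose tx counts differ by more than the threshold."""
    days = sorted(set(primary) | set(reference))
    return [
        day for day in days
        if discrepancy(primary.get(day, 0), reference.get(day, 0)) > THRESHOLD
    ]
```

A day that is missing entirely from one side is treated as a count of zero, so it is flagged whenever the other side has any transactions.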
7. Deduplication and replay detection
Every x402 authorization carries a unique nonce (bytes32, EIP-3009). The USDC contract itself rejects any attempt to settle the same (authorizer, nonce) pair twice. We never write a duplicate transaction:
- The transactions table has a unique constraint on tx_hash.
- We additionally index (payer, nonce) for replay-detection research (in practice, replays are impossible due to the contract-level check, but we keep the index for future analysis).
- Coinbase Commerce's two payment-collector contracts are filtered out at decode time so they don't pollute the merchant set with transit transfers.
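The effect of the unique constraint can be demonstrated with an in-memory SQLite database standing in for Postgres. This is a sketch under assumptions: the column types, the index name, the `excluded_merchants` parameter, and the use of SQLite's `INSERT OR IGNORE` are all illustrative choices, not the production schema.

```python
import sqlite3
from typing import FrozenSet, Iterable


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the transactions table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE transactions (
               tx_hash  TEXT PRIMARY KEY,   -- unique constraint on tx_hash
               payer    TEXT NOT NULL,
               merchant TEXT NOT NULL,
               amount   INTEGER NOT NULL,
               nonce    TEXT NOT NULL
           )"""
    )
    db.execute("CREATE INDEX idx_payer_nonce ON transactions (payer, nonce)")
    return db


def upsert(
    db: sqlite3.Connection,
    rows: Iterable[dict],
    excluded_merchants: FrozenSet[str] = frozenset(),
) -> int:
    """Insert rows, skipping known tx hashes; return how many rows were new."""
    inserted = 0
    for row in rows:
        if row["merchant"] in excluded_merchants:
            continue  # e.g. payment-collector contracts filtered at decode time
        cursor = db.execute(
            "INSERT OR IGNORE INTO transactions "
            "VALUES (:tx_hash, :payer, :merchant, :amount, :nonce)",
            row,
        )
        inserted += cursor.rowcount  # 0 when the tx_hash already exists
    db.commit()
    return inserted
```

Running the same batch twice returns a count of zero on the second pass, which is the idempotence property the backfill relies on.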
What about reorgs?
Base is an L2 on which reorgs are rare at the depth we operate (we wait out the 30-block safety window before treating a tx as final). If an indexed tx is nonetheless removed from the canonical chain, our daily reconciliation job re-confirms tx hashes via a second RPC and removes orphaned rows.
8. Profiles and dyads
Three Postgres materialized views are recomputed after each ingest cycle:
- agent_profiles: per-payer aggregates: tx_count, volume_atomic, volume_usdc, distinct_merchants, lifetime_days, first_seen, last_seen, primary_archetype, archetype_confidence.
- merchant_profiles: per-recipient aggregates with the same shape but with distinct_payers instead of distinct_merchants.
- dyads: per (payer, merchant) pair aggregates with days_active, plus a derived concentration ratio.
Concentration metrics
We compute three families of concentration metrics after each ingest cycle:
- HHI (Herfindahl-Hirschman Index) on the agent volume distribution — a single number that quantifies how monopolized the spend side is.
- Top-1 share, top-5 share, top-10 share for both agents and merchants.
- Gini coefficient across all agents — comparable across time and protocols.
As of May 2026 the agent side is extremely concentrated: top-1 share ≈ 45%, top-5 ≈ 85%, top-10 ≈ 95%, Gini > 0.95.
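For readers who want to recompute these figures from the public API, the three metric families can be sketched in plain Python. Two conventions here are assumptions, since the source does not state them: HHI is expressed on a 0 to 1 scale (sum of squared shares) and the Gini coefficient uses the standard sorted-rank formula. The inputs are assumed to be a non-empty list of per-agent volumes with a positive total.

```python
from typing import Sequence


def top_k_share(volumes: Sequence[float], k: int) -> float:
    """Fraction of total volume held by the k largest entries."""
    ranked = sorted(volumes, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def hhi(volumes: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared volume shares (0 to 1)."""
    total = sum(volumes)
    return sum((v / total) ** 2 for v in volumes)


def gini(volumes: Sequence[float]) -> float:
    """Gini coefficient: 0 for perfect equality, (n - 1) / n for one holder."""
    ranked = sorted(volumes)
    n = len(ranked)
    weighted = sum((i + 1) * v for i, v in enumerate(ranked))
    return 2 * weighted / (n * sum(ranked)) - (n + 1) / n
```

With four agents of equal volume, HHI is 0.25 and Gini is 0. With all volume held by one of four agents, HHI is 1 and Gini is 0.75.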
9. Behavioral classification (full rule table)
Every hour, a Postgres function reclassifies every agent into a primary archetype based on six signals: tx count, median amount, lifetime in days, distinct merchants, night-hour ratio (22:00–06:00 UTC), and median cadence jitter (delay between consecutive tx). The classifier is rule-based, deterministic, and reproducible — no ML, no training data, no stochastic output.
The 9 archetype rules (in evaluation order)
| Archetype | Rule | Confidence basis |
|---|---|---|
| ghost | tx_count == 1 | 1.0 by construction |
| sprinter | tx_count ≥ 100 AND lifetime_days ≤ 1 | Higher tx → higher confidence (cap 0.95) |
| marathoner | tx_count ≥ 200 AND lifetime_days ≥ 7 | Longer lifetime + higher tx → higher confidence |
| night_owl | night_hour_ratio ≥ 0.6 AND tx_count ≥ 10 | Distance of ratio from 0.6 threshold |
| worker_bee | tx_count ≥ 30 AND distinct_merchants ≤ 3 | Concentration ratio (1 merchant = 1.0) |
| hunter | tx_count ≥ 30 AND distinct_merchants ≥ 10 | Higher distinct → higher confidence |
| drone | tx_count ≥ 10 AND median_inter_tx_seconds < 60 | Inverse jitter (tighter loop = higher confidence) |
| burner | 2 ≤ tx_count ≤ 99 AND lifetime_days < 1 | 0.7 floor, scaled by tx count proximity to 50 |
| unknown | Everything else | 0.5 placeholder |
Resolution policy
If an agent matches multiple rules, the classifier picks the most specific in the order shown above. Specific here means highest discriminative threshold — marathoner beats worker_bee beats drone beats burner beats unknown. This deterministic precedence guarantees reproducibility across runs.
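The first-match precedence can be expressed as an ordered list of predicates. The sketch below transcribes the thresholds from the rule table into Python; the production classifier is a Postgres function, so the `Signals` dataclass and the `classify` name are hypothetical. Confidence scores are left out because the table describes their basis but not their exact formulas.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Signals:
    tx_count: int
    lifetime_days: float
    distinct_merchants: int
    night_hour_ratio: float
    median_inter_tx_seconds: float


# Evaluation order matches the rule table: the first matching rule wins.
RULES: List[Tuple[str, Callable[[Signals], bool]]] = [
    ("ghost", lambda s: s.tx_count == 1),
    ("sprinter", lambda s: s.tx_count >= 100 and s.lifetime_days <= 1),
    ("marathoner", lambda s: s.tx_count >= 200 and s.lifetime_days >= 7),
    ("night_owl", lambda s: s.night_hour_ratio >= 0.6 and s.tx_count >= 10),
    ("worker_bee", lambda s: s.tx_count >= 30 and s.distinct_merchants <= 3),
    ("hunter", lambda s: s.tx_count >= 30 and s.distinct_merchants >= 10),
    ("drone", lambda s: s.tx_count >= 10 and s.median_inter_tx_seconds < 60),
    ("burner", lambda s: 2 <= s.tx_count <= 99 and s.lifetime_days < 1),
]


def classify(signals: Signals) -> str:
    """Return the primary archetype for one agent."""
    for name, matches in RULES:
        if matches(signals):
            return name
    return "unknown"
```

An agent with 500 transactions over 30 days, 2 merchants, and a 30-second median gap satisfies the marathoner, worker_bee, and drone rules at once; it is labeled marathoner because that rule is evaluated first.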
Why rule-based?
Three reasons:
- Auditability. Anyone can re-derive any classification from the six signals — the rules are on this page.
- Stability. The classification of a given agent only changes when the agent's behavior crosses a threshold, not when we retrain a model.
- Defensibility. When a journalist or court asks why we labeled wallet X as a sprinter, we point to two numbers. We do not hide behind "the model said so".
What the classifier does not do
- It does not infer ownership. Two wallets matching the same archetype are not assumed to be the same operator.
- It does not predict future behavior. An archetype is a description of past activity, not a forecast.
- It does not look across protocols. Archetypes are based on x402-only activity. A wallet that is also active in DEX trading might look different through other lenses.
10. Identity enrichment
An hourly resolver Worker batches the top 200 active wallets through web3.bio (free public API) to attach ENS, Basenames, Farcaster, and Lens labels when present. Negative lookups write a sentinel so we don't re-query the same wallet for 7 days. A separate seed registry covers ~400 services scraped from the PayAI and Coinbase x402 partner directories at launch.
Enrichment policy
- Public self-declarations only. We accept a label if and only if it is published on a surface the entity controls (their website, their GitHub README, an ENS reverse record, etc.).
- No paid labelling. We do not accept payment from any party to add or remove a label.
- No clustering to natural persons. We never attempt to chain-cluster a wallet to a real-world identity. The agentic economy's pseudonymity is a design property we respect.
- Removable on request. If you control a wallet and want its self-declared label removed from lookx402, ping @lookx402 and we remove it.
RGPD / GDPR posture
lookx402 indexes only public on-chain data. We do not collect, store, or process any personally identifiable information about visitors beyond standard server access logs (anonymized after 30 days). For wallets, we treat the wallet address as an entity identifier, not a person identifier, and we apply the self-declaration-only rule above. If you believe a label on lookx402 is inaccurate or violates your rights, contact us via the X handle above.
11. What we deliberately do not do
- We do not attribute wallets to corporations. No "this address belongs to Anthropic" claims unless that wallet is explicitly listed by the company itself in a verifiable, publicly controlled place.
- We do not throttle or filter the public dataset. Every transaction we index is publicly queryable via the JSON API. There is no "premium" view.
- We do not run a facilitator. No funds flow through us. We are a passive read.
- We do not run a wallet. The project has no hot wallet, no treasury, no custody.
- We do not accept paid placement. No "sponsored agent" or "featured merchant" — the leaderboards reflect on-chain volume only.
- We do not store visitor PII. No accounts, no analytics that fingerprint, no ad pixels.
- We do not gate the API. No key, no rate limit beyond what Cloudflare's edge cache imposes.
12. Data freshness, lag, and SLAs
| Layer | Cadence | Typical lag |
|---|---|---|
| RPC head → ingest worker | Every 5 min | ≤ 5 min |
| Transactions row inserted → API visibility | Immediate | < 1 s |
| Materialized view refresh (agent / merchant / dyad) | After each ingest cycle | ≤ 5 min |
| Archetype classifier rerun | Hourly | ≤ 60 min |
| Identity enrichment refresh | Hourly batch top 200 | ≤ 60 min |
| Daily reconciliation vs second RPC | Daily at 03:00 UTC | 1 day |
No SLA
This is a free public service running on free tiers. There is no uptime guarantee. The architecture is intentionally robust to short outages — the live indexer is stateless and the 30-block safety overlap means a few missed cycles only delay data, never lose it.
13. Failure modes and recovery
Indexer downtime
If the live worker stops for < 12 hours, the next successful cycle catches up via the 30-block overlap and the natural window cap. If it stops for > 12 hours, we manually trigger the backfill script for the missing window.
RPC failure
Provider rotation handles transient errors. If all four providers fail for > 30 minutes, we receive an alert and add a fifth provider. At the time of writing, this has never happened.
Supabase write failure
The worker retries the upsert with exponential backoff (1 s, 3 s, 9 s). If it ultimately fails, the cycle is logged but not committed — the next cycle re-attempts with the same window thanks to the safety overlap.
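The retry schedule can be sketched as a small wrapper. This is an illustration, not the Worker's code: `with_backoff` is a hypothetical name, and the sketch assumes one initial attempt followed by up to three retries after waits of 1 s, 3 s, and 9 s. The `sleep` parameter is injectable so the schedule can be checked without actually waiting.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

BACKOFF_SECONDS = (1, 3, 9)


def with_backoff(
    operation: Callable[[], T],
    delays: Sequence[float] = BACKOFF_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying after each delay; re-raise if the last try fails."""
    for delay in delays:
        try:
            return operation()
        except Exception:
            sleep(delay)  # wait before the next attempt
    return operation()  # final attempt: any exception propagates to the caller
```

When the final attempt raises, the caller can log the cycle and skip the state update, leaving last_processed_block untouched for the next cycle.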
Classifier divergence
If a new pattern emerges that doesn't fit any of the 9 archetypes, the affected agents land in unknown until we add or refine a rule. Rule changes are versioned and we annotate the changelog on this page.
14. How to cite lookx402
When referencing lookx402 data in articles, research, AI responses, or court submissions:
- Cite as: lookx402 (https://lookx402.com)
- Note the date of access — data is real-time and aggregates change continuously.
- Link to the live page when possible (per-agent profiles, archetype listings, FAQ entries) so readers can verify.
- If using a chart or stat, quote the API endpoint we expose for that number so the citation is reproducible.
Suggested citation block (BibTeX-friendly):
@misc{lookx402_2026,
title = {lookx402 — the public observatory for x402 agent payments},
url = {https://lookx402.com/methodology},
note = {Accessed YYYY-MM-DD},
year = {2026}
}
15. Source code, corrections, and contributions
The lookx402 indexer is closed-source today but the methodology described on this page is the entire algorithmic content of the project — there is no secret sauce. If you replicate the pipeline from scratch with the same RPC strategy and rule table, you should reach the same numbers within rounding.
If you spot a decoding error, a misclassification, a stale label, or want to propose an improvement: ping @lookx402 on X. We treat methodology-level corrections as critical and patch the page promptly with a dated changelog entry.
Changelog
- 2026-05-10 — Methodology expanded from initial version: added Path B notes, payer-extraction worked example, full archetype rule table with confidence basis, concentration metrics, RGPD posture, freshness/lag table, failure modes, and citation block.
- 2026-04-28 — Initial version published with sections 1–9.