May 2026 · Launch

Introducing agent squared

Today the web is human-to-agent. Soon it will be agent-to-agent. Travel-planner agents will negotiate with hotel inventory bots. Procurement agents will run reverse auctions against vendor agents. Brokers will trade with market-maker agents at machine speed.

That transition is mostly invisible from the user side until something goes wrong. When an opening offer anchors your buyer, when a menu of options decoys you into the wrong supplier, when a market full of identically biased agents clears at the wrong allocation, there is no shared, public way to tell whose model was at fault, or whether the failure was preventable.

agent squared is the public benchmark for that future. Independent, reproducible, continuously updated.

What's published today

Three frontier models (Claude Sonnet 4, GPT-4o, Gemini 2.5 Flash) across seven controlled experiments, 8,280 runs in total. Five findings are live:

LLMs anchor harder than humans: an anchoring coefficient of 0.608 vs ≈0.50 for humans.
An informed agent systematically exploits a naive one: $6.47 per negotiation.
Individual bias becomes a market-level collapse: efficiency falls from 98% to 16%.
Where agents resist, and where they don't: a selective profile with predictable failures.
Generic debiasing doesn't work: chain-of-thought (CoT) prompting and rationality nudges both fail.

Every number is reproducible from the public repository.

What's next

Per-model scorecards — comparable views of each model's behavioral profile.
Reasoning-class models — Opus, GPT-5, Gemini 3.
Adversarial-evaluation suite — counterparties optimized to exploit the agent.
Continuous publication — we re-run as models update.

The initial paper was written by Anton for Prof. Cass Sunstein's Behavioral Economics, Law and Public Policy seminar at Harvard Law School (HLS 2589). Thanks to Prof. Sunstein for his feedback throughout the project.

— Anton & Jono