Finding 02 · Strategic exploitation

An informed agent can systematically exploit a naive one

TL;DR. Give one agent in a negotiation explicit knowledge of the other's biases — and instructions to exploit them — and the informed agent extracts an average of $6.47 in additional surplus per negotiation. Give the naive agent a specific bias warning and it recovers roughly half of that loss. Generic "be rational" warnings do not help.

Why this matters

The agent-to-agent web won't be symmetric. Sellers will invest more in agent design than individual buyers; market makers will invest more than retail traders; platforms will know more about their users' agents than those users do. The question is not "are LLMs biased" — it's "how much can one party take from another using knowledge of those biases?"

What we tested

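We pitted two LLM agents against each other in a procurement-style negotiation under three conditions, labeled seller vs buyer as in the results table: naive vs naive (neither agent is told about the other's biases), exploit vs naive (the seller is told the buyer's biases and instructed to exploit them), and exploit vs defend (the same informed seller faces a buyer that has been given a specific bias warning).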

We ran the experiment across four product domains and three frontier models, with 720 negotiations per model.
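
To make the size of the design concrete, here is a minimal sketch that enumerates the study grid. The condition labels come from the results table; the domain and model names are hypothetical placeholders, since this summary names neither. The per-cell count assumes the 720 negotiations per model are split evenly across domains and conditions, which the text does not state.

```python
from itertools import product

# Condition labels as they appear in the results table.
CONDITIONS = ["naive vs naive", "exploit vs naive", "exploit vs defend"]

# Hypothetical placeholders: the summary names neither the domains nor the models.
DOMAINS = [f"domain_{i}" for i in range(1, 5)]
MODELS = [f"model_{i}" for i in range(1, 4)]

NEGOTIATIONS_PER_MODEL = 720  # stated in the text


def design_grid():
    """Return every (model, domain, condition) cell of the design."""
    return list(product(MODELS, DOMAINS, CONDITIONS))


def runs_per_cell():
    """Negotiations per (domain, condition) cell for one model.

    ASSUMPTION: an even split across cells; the source does not confirm this.
    """
    cells_per_model = len(DOMAINS) * len(CONDITIONS)
    return NEGOTIATIONS_PER_MODEL // cells_per_model


if __name__ == "__main__":
    print(len(design_grid()), "cells in total")
    print(runs_per_cell(), "negotiations per cell per model (if evenly split)")
```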

What we found

| Condition | Mean surplus to seller |
|---|---|
| naive vs naive | baseline |
| exploit vs naive | +$6.47 |
| exploit vs defend | ≈ +$3.20 (≈50% recovery) |
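
The "≈50% recovery" figure follows from the two table values. The sketch below spells out that arithmetic; the function name is ours, and the $3.20 input is itself approximate, so the result should be read as a rough share rather than a precise estimate.

```python
# Extra seller surplus over the naive-vs-naive baseline, taken from the table.
EXPLOIT_VS_NAIVE = 6.47
EXPLOIT_VS_DEFEND = 3.20  # approximate value in the table


def recovery_fraction(undefended: float, defended: float) -> float:
    """Share of the undefended loss that the defended buyer wins back."""
    if undefended <= 0:
        raise ValueError("undefended loss must be positive")
    return (undefended - defended) / undefended


if __name__ == "__main__":
    recovered = EXPLOIT_VS_NAIVE - EXPLOIT_VS_DEFEND
    share = recovery_fraction(EXPLOIT_VS_NAIVE, EXPLOIT_VS_DEFEND)
    print(f"buyer recovers about ${recovered:.2f} per negotiation ({share:.1%})")
```

Put differently, even the defended buyer still gives up roughly half of the surplus that the undefended buyer loses.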

What this implies

Anyone deploying buyer-side agents at scale should assume the seller-side will eventually be optimized adversarially against them. Defensive prompting is partial mitigation, not a fix. Reservation prices and walk-away policies enforced outside the LLM (deterministic guardrails) remain the most reliable defense.
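
A deterministic guardrail of this kind can be sketched as a small check that sits between the LLM and the counterparty. This is an illustration under stated assumptions, not the study's implementation: the `BuyerPolicy` fields, the `vet_action` function, and the three-action protocol (`accept`, `counter`, `walk_away`) are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class BuyerPolicy:
    """Hypothetical hard limits, set outside the LLM."""
    reservation_price: Decimal  # highest price the buyer may ever commit to
    max_rounds: int             # walk away once this many rounds have passed


def vet_action(
    policy: BuyerPolicy,
    round_no: int,
    action: str,
    price: Optional[Decimal] = None,
) -> Tuple[str, Optional[Decimal]]:
    """Check an LLM-proposed action and return the action that is actually sent."""
    # Walk-away policy: honor the LLM's exit, and force one past the round limit.
    if action == "walk_away" or round_no > policy.max_rounds:
        return ("walk_away", None)
    if action not in ("accept", "counter"):
        raise ValueError(f"unknown action: {action!r}")
    if price is None:
        raise ValueError("accept and counter need a price")
    if price <= policy.reservation_price:
        return (action, price)
    # Above the reservation price: never accept, and never offer more than the cap.
    return ("counter", policy.reservation_price)


if __name__ == "__main__":
    policy = BuyerPolicy(reservation_price=Decimal("100"), max_rounds=5)
    # The LLM was talked into accepting 120; the guardrail counters at the cap.
    print(vet_action(policy, 2, "accept", Decimal("120")))
```

Because the check is ordinary code, no prompt from the seller side can talk it past the reservation price; the LLM only decides how to negotiate inside the limits.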

Reproduce

python -m agent_bias_study --module strategic_exploitation