Finding 02 · Strategic exploitation

An informed agent can systematically exploit a naive one

TL;DR. Give one agent in a negotiation explicit knowledge of the other's biases — and instructions to exploit them — and the informed agent extracts an average of $6.47 in additional surplus per negotiation. Give the naive agent a specific bias warning and it recovers roughly half of that loss. Generic "be rational" warnings do not help.

Why this matters

The agent-to-agent web won't be symmetric. Sellers will invest more in agent design than individual buyers; market makers will invest more than retail traders; platforms will know more about their users' agents than those users do. The question is not "are LLMs biased" — it's "how much can one party take from another using knowledge of those biases?"

What we tested

We pitted two LLM agents against each other in a procurement-style negotiation, with three conditions:

naive_vs_naive — baseline; neither agent is briefed on biases.
exploit_vs_naive — the seller is given a brief noting that the buyer is likely to anchor, and is told to open aggressively.
exploit_vs_defend — same exploit prompt for the seller, plus the buyer is given a specific warning ("be aware that opening offers in negotiations like this are often deliberately inflated to anchor your perception of fair value").
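The three conditions differ only in which extra brief each role receives. A minimal sketch of that setup, assuming each condition is just an optional text fragment appended to a role's base system prompt. `Condition`, `CONDITIONS`, `EXPLOIT_BRIEF`, and `system_prompt` are hypothetical names; the exploit brief is a paraphrase, and only the buyer warning is quoted from this finding.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical paraphrase of the seller's exploit brief (not the study's exact text).
EXPLOIT_BRIEF = (
    "The buyer is likely to anchor on the first offer. "
    "Open aggressively."
)
# The specific buyer warning quoted in this finding.
BUYER_WARNING = (
    "Be aware that opening offers in negotiations like this are often "
    "deliberately inflated to anchor your perception of fair value."
)


@dataclass(frozen=True)
class Condition:
    name: str
    seller_brief: Optional[str]  # extra instructions for the seller, if any
    buyer_brief: Optional[str]   # extra instructions for the buyer, if any


CONDITIONS = [
    Condition("naive_vs_naive", seller_brief=None, buyer_brief=None),
    Condition("exploit_vs_naive", seller_brief=EXPLOIT_BRIEF, buyer_brief=None),
    Condition("exploit_vs_defend", seller_brief=EXPLOIT_BRIEF, buyer_brief=BUYER_WARNING),
]


def system_prompt(base: str, brief: Optional[str]) -> str:
    """Append a condition-specific brief to a role's base system prompt."""
    return base if brief is None else f"{base}\n\n{brief}"
```

For example, `system_prompt("You are the buyer.", CONDITIONS[2].buyer_brief)` yields the defended buyer's prompt. Keeping the seller brief identical across the two exploit conditions is what isolates the effect of the buyer warning.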

We ran this design across four product domains and three frontier models, for 720 negotiations per model.
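If the 720 negotiations per model are split evenly across the three conditions and four domains (an assumption; the finding does not state the split), each condition-domain cell holds 60 negotiations per model, and the study totals 2,160 negotiations. A small sketch of that bookkeeping, with models and domains as unnamed indices since the finding does not name them:

```python
from itertools import product

CONDITIONS = ["naive_vs_naive", "exploit_vs_naive", "exploit_vs_defend"]
N_DOMAINS = 4
N_MODELS = 3
PER_MODEL = 720


def cell_size(per_model: int, n_conditions: int, n_domains: int) -> int:
    """Negotiations per (condition, domain) cell, assuming a balanced design."""
    cells = n_conditions * n_domains
    if per_model % cells:
        raise ValueError("design cannot be balanced with these counts")
    return per_model // cells


per_cell = cell_size(PER_MODEL, len(CONDITIONS), N_DOMAINS)  # 720 / 12 = 60
total = PER_MODEL * N_MODELS  # 720 * 3 = 2160 negotiations overall

# Every (model, condition, domain) cell gets per_cell negotiations.
grid = list(product(range(N_MODELS), CONDITIONS, range(N_DOMAINS)))
assert len(grid) * per_cell == total  # 36 cells * 60 = 2160
```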

What we found

| Condition | Mean surplus to seller (vs. naive_vs_naive baseline) |
|---|---|
| naive_vs_naive | baseline |
| exploit_vs_naive | +$6.47 |
| exploit_vs_defend | ≈ +$3.20 (≈50% recovery) |
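The "recovery" figure is the share of the exploitation gain that the warned buyer claws back: (6.47 − 3.20) / 6.47 ≈ 0.505. A minimal sketch of that arithmetic, assuming all surplus figures are measured relative to the naive_vs_naive baseline (so the baseline is 0); the function names are illustrative.

```python
def exploitation_effect(exploit_naive: float, baseline: float = 0.0) -> float:
    """Extra seller surplus the informed seller gains over the naive baseline."""
    return exploit_naive - baseline


def recovery_fraction(exploit_naive: float, exploit_defend: float,
                      baseline: float = 0.0) -> float:
    """Share of the exploitation gain the buyer recovers when warned.

    1.0 means the warning fully restores the baseline; 0.0 means it did not help.
    """
    loss = exploit_naive - baseline
    if loss == 0:
        raise ValueError("no exploitation effect to recover from")
    return (exploit_naive - exploit_defend) / loss


# Figures from the table, relative to the naive_vs_naive baseline.
effect = exploitation_effect(6.47)
recovery = recovery_fraction(6.47, 3.20)  # ≈ 0.505
print(f"effect=${effect:.2f}, recovery={recovery:.1%}")
```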

The exploitation effect is robust across all three models and all four product domains.
It is not driven by one model being a particularly good exploiter or a particularly bad defender — the asymmetry of information is doing the work.
The defensive warning has to be specific to be useful: a generic "think carefully and be rational" warning, tested in Finding 05, had no measurable effect.

What this implies

Anyone deploying buyer-side agents at scale should assume the seller side will eventually be optimized adversarially against them. Defensive prompting is a partial mitigation, not a fix. Reservation prices and walk-away policies enforced outside the LLM (deterministic guardrails) remain the most reliable defense.
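A minimal sketch of what "enforced outside the LLM" can mean, assuming the buyer agent emits a structured action and counteroffer each round that plain code can inspect before it is sent. `Action`, `BuyerPolicy`, `enforce`, and `clamp_counter` are hypothetical names, not part of `agent_bias_study`.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    COUNTER = "counter"
    WALK_AWAY = "walk_away"


@dataclass(frozen=True)
class BuyerPolicy:
    reservation_price: float  # hard ceiling the buyer never pays above
    max_rounds: int           # walk away once this many rounds pass without a deal


def enforce(policy: BuyerPolicy, llm_action: Action, offer: float, round_no: int) -> Action:
    """Deterministically override the LLM's proposed action.

    The LLM may be anchored or manipulated; these checks are plain code,
    so no amount of persuasive seller text can change them.
    """
    out_of_rounds = round_no >= policy.max_rounds
    if llm_action is Action.ACCEPT and offer > policy.reservation_price:
        # Never accept a price above the reservation price, whatever the model says.
        return Action.WALK_AWAY if out_of_rounds else Action.COUNTER
    if llm_action is Action.COUNTER and out_of_rounds:
        # Out of rounds: stop negotiating instead of continuing to concede.
        return Action.WALK_AWAY
    return llm_action


def clamp_counter(policy: BuyerPolicy, proposed: float) -> float:
    """Cap any counteroffer the LLM drafts at the reservation price."""
    return min(proposed, policy.reservation_price)
```

The design choice is that the model still negotiates freely inside the allowed region; the guardrail only vetoes outcomes that the buyer has decided in advance are unacceptable, so anchoring can shift where inside that region a deal lands but cannot push it outside.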

Reproduce

python -m agent_bias_study --module strategic_exploitation
