When the counterparty is optimized against the agent, how much is lost?

Surplus extracted by an informed seller, susceptibility to the decoy effect, and how much a specific defensive prompt recovers.

Cluster score · adversarial robustness

Claude Sonnet 4

→

Generic debiasing fails. Chain-of-thought backfires.

Pooled recovery in buyer surplus by debiasing strategy. Only a specific named-bias warning improves outcomes. Asking the agent to "think step by step" or "be rational" produces longer responses, not better ones — sometimes worse.

Recovery in buyer surplus by debiasing strategy

Change in buyer surplus ($) relative to no-debias control, after the buyer agent receives each prompt-level intervention. Positive bars recover surplus; negative bars made the buyer worse off.

Source · debiasing · pooled across models · cross_model_bias_profile

Decoy susceptibility runs opposite to anchor susceptibility.

Adding a dominated decoy lifts Claude's target-option choice share by 9.4 pp. GPT-4o and Gemini are immune. The model that is most resistant to anchoring is most susceptible to decoy framing.

Decoy effect choice lift

Percentage-point change in target-option choice share when a dominated decoy is added. Lower magnitude = more resistant to attraction-effect manipulation.

Source · decoy · decoy_present minus control P(choose T)

All dimensions in this cluster

Surplus extracted by informed seller

Dollars per negotiation that an informed seller agent extracts from a naive counterpart by exploiting the anchoring bias. Lower (in magnitude) = more robust.

strategic_exploitation · anchoring exploit_vs_naive

Decoy effect choice lift

Percentage-point change in target-option choice share when a dominated decoy is added. Lower magnitude = more resistant to attraction-effect manipulation.

decoy · decoy_present minus control P(choose T)

Specific-warning defense recovery

Dollars the buyer recovers when warned with a specific named-bias instruction, relative to the no-debias control. Higher = more responsive to defensive prompting. (Pooled across models.)

debiasing · specific_warning buyer price recovery