Methodology

Reproducible. Open. Independent.

Seven experiments, three frontier models, deterministic seeding, 8,280 runs in v1. All code, prompts, and raw logs are public.

Models

Claude Sonnet 4 · Anthropic
GPT-4o · OpenAI
Gemini 2.5 Flash · Google DeepMind

Two sampling temperatures (0.0, 0.7) · 30 replications per cell · 8,280 total runs.

Experiments

Negotiation · Anchoring, framing, surplus division · 300 runs / model
Decoy effect · Attraction effect from a dominated option · 540 runs / model
Information overload · Accuracy vs. description complexity · 480 runs / model
Auction · Bidding vs. game-theoretic equilibrium · 180 runs / model
Strategic exploitation · Informed vs. naive counterparty · 720 runs / model
Debiasing · Prompt-level bias interventions · 240 runs / model
Market simulation · Multi-agent double auctions · 300 runs / model
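As a sanity check, the per-model counts can be tallied against the stated total. This sketch assumes a fully crossed design in which every treatment/variant cell runs at both temperatures with 30 replications. The labels are only the display names from this page, not the repository's module identifiers.

```python
# Per-model run counts from the experiment list (labels are display names only).
RUNS_PER_MODEL = {
    "Negotiation": 300,
    "Decoy effect": 540,
    "Information overload": 480,
    "Auction": 180,
    "Strategic exploitation": 720,
    "Debiasing": 240,
    "Market simulation": 300,
}
N_MODELS = 3
N_TEMPERATURES = 2   # 0.0 and 0.7
N_REPLICATIONS = 30  # replications per cell


def cells_per_temperature(runs: int) -> int:
    """Cells per temperature, ASSUMING every cell runs at both temps x 30 reps."""
    cells, remainder = divmod(runs, N_TEMPERATURES * N_REPLICATIONS)
    if remainder:
        raise ValueError(f"{runs} runs do not divide evenly into cells")
    return cells


per_model_total = sum(RUNS_PER_MODEL.values())
grand_total = per_model_total * N_MODELS

print(per_model_total)  # → 2760
print(grand_total)      # → 8280
for name, runs in RUNS_PER_MODEL.items():
    print(f"{name}: {cells_per_temperature(runs)} cells per temperature")
```

All seven counts divide evenly by 60 (2 temperatures × 30 replications). For example, 540 / 60 = 9 cells for the decoy experiment. That is consistent with 2,760 runs per model and 8,280 runs overall.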

Reproducibility

Every run is seeded from SHA256(module:model:treatment:variant:temp:run_idx). Interrupted runs resume from existing JSONL logs.
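The following is a minimal sketch of the seeding and resume scheme. It assumes the six fields are joined with `:` via `str()` and that the integer seed is read from the first 8 bytes of the digest; the repository may convert the hash differently. The JSONL field names and the helpers `run_seed`, `completed_run_keys`, and `pending_runs` are hypothetical and are not the project's API.

```python
import hashlib
import json
from pathlib import Path


def run_seed(module: str, model: str, treatment: str, variant: str,
             temp: float, run_idx: int) -> int:
    """Seed from SHA256("module:model:treatment:variant:temp:run_idx").

    ASSUMPTION: fields are str()-joined with ':' and the seed is the first
    8 bytes of the digest as a big-endian unsigned 64-bit integer.
    """
    key = ":".join(str(f) for f in (module, model, treatment, variant, temp, run_idx))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def completed_run_keys(log_path: Path) -> set:
    """Keys of runs already recorded in a JSONL log (one JSON object per line).

    ASSUMPTION: each record stores the six seeding fields under these names.
    """
    done = set()
    if not log_path.exists():
        return done
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a final line truncated by an interrupted run
            done.add((rec["module"], rec["model"], rec["treatment"],
                      rec["variant"], rec["temp"], rec["run_idx"]))
    return done


def pending_runs(planned: list, log_path: Path) -> list:
    """Planned run keys not yet present in the log, so a restart resumes."""
    done = completed_run_keys(log_path)
    return [key for key in planned if key not in done]


# Usage with hypothetical labels: the seed depends only on the run's coordinates.
key = ("negotiation", "claude", "anchor_high", "v1", 0.7, 0)
seed = run_seed(*key)
```

Because the seed is a pure function of the run's coordinates, a cell reproduces the same seed regardless of execution order. A restarted job then only needs to diff its planned keys against the log, for example `pending_runs(planned, Path("logs/negotiation.jsonl"))`. Any record that fails to parse, such as one truncated by a crash, is skipped, so that run is simply redone.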

Statistical reporting: treatment means, effect sizes (Cohen's d for continuous outcomes, Cohen's h for proportions), bootstrap 95% CIs, and chi-square / t-tests with false discovery rate (FDR) correction across the experiment family.
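The statistics named above are standard, and the stdlib-only sketch below shows one common form of each. It assumes a pooled-SD Cohen's d, a percentile bootstrap for the difference in means, and Benjamini-Hochberg as the FDR procedure; the project may use different variants. Raw chi-square and t-test p-values would typically come from a library such as `scipy.stats` (`chi2_contingency`, `ttest_ind`) and are not reimplemented here.

```python
import math
import random
import statistics


def cohens_d(a: list, b: list) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def cohens_h(p1: float, p2: float) -> float:
    """Effect size for two proportions via the arcsine transform."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def bootstrap_ci(a: list, b: list, n_boot: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> tuple:
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group."""
    rng = random.Random(seed)  # seeded so the interval itself is reproducible
    diffs = sorted(
        statistics.mean(rng.choices(a, k=len(a)))
        - statistics.mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lo = diffs[round(alpha / 2 * n_boot)]
    hi = diffs[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def benjamini_hochberg(pvals: list) -> list:
    """BH-adjusted p-values (FDR q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

As an illustration of the correction, `benjamini_hochberg([0.01, 0.04, 0.03])` returns approximately `[0.03, 0.04, 0.04]`. Each hypothesis is then declared significant if its adjusted value falls below the chosen FDR level.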

Run it yourself

git clone https://github.com/antonhantel/behavior-of-A2A-commerce
cd behavior-of-A2A-commerce
pip install -r requirements.txt

# Full pipeline
python -m agent_bias_study

# Single experiment / model
python -m agent_bias_study --module negotiation
python -m agent_bias_study --model claude
