Methodology

Reproducible. Open. Independent.

Seven experiments, three frontier models, deterministic seeding, 8,280 runs in v1. All code, prompts, and raw logs are public.

Models

  • Claude Sonnet 4 · Anthropic
  • GPT-4o · OpenAI
  • Gemini 2.5 Flash · Google DeepMind

Two sampling temperatures (0.0, 0.7) · 30 replications per cell · 8,280 total runs.

Experiments

  • Negotiation · Anchoring, framing, surplus division · 300 runs per model
  • Decoy effect · Attraction effect from a dominated option · 540 runs per model
  • Information overload · Accuracy vs. description complexity · 480 runs per model
  • Auction · Bidding vs. game-theoretic equilibrium · 180 runs per model
  • Strategic exploitation · Informed vs. naive counterparty · 720 runs per model
  • Debiasing · Prompt-level bias interventions · 240 runs per model
  • Market simulation · Multi-agent double auctions · 300 runs per model
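The per-experiment counts add up to the stated totals. The short Python check below reproduces that arithmetic. It assumes that a "cell" is one experimental condition at one temperature; the number of conditions per experiment is inferred from the counts, not stated in the source.

```python
# Per-model run counts, copied from the experiment list.
RUNS_PER_MODEL = {
    "negotiation": 300,
    "decoy_effect": 540,
    "information_overload": 480,
    "auction": 180,
    "strategic_exploitation": 720,
    "debiasing": 240,
    "market_simulation": 300,
}
N_MODELS = 3
N_TEMPERATURES = 2    # 0.0 and 0.7
N_REPLICATIONS = 30   # per cell

per_model_total = sum(RUNS_PER_MODEL.values())
grand_total = per_model_total * N_MODELS

# Assumption: one cell = one condition at one temperature, so dividing out
# temperatures and replications gives the number of conditions per experiment.
runs_per_condition = N_TEMPERATURES * N_REPLICATIONS
conditions = {}
for name, runs in RUNS_PER_MODEL.items():
    assert runs % runs_per_condition == 0, f"{name} is not a multiple of {runs_per_condition}"
    conditions[name] = runs // runs_per_condition

print(per_model_total, grand_total)  # 2760 8280
print(conditions)
```

Under that assumption, the counts imply 5, 9, 8, 3, 12, 4 and 5 conditions respectively (46 per model).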

Reproducibility

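Every run is seeded deterministically from SHA-256(module:model:treatment:variant:temp:run_idx), so a run's seed depends only on its configuration and never on execution order. Interrupted runs resume from the existing JSONL logs.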
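A minimal Python sketch of how such seeding and resumption could work. The helper names `derive_seed`, `completed_keys` and `pending_runs`, the `run_key` field in each log record, and the choice to keep the first 32 bits of the digest are all illustrative assumptions, not the repository's actual code.

```python
import hashlib
import json
from pathlib import Path

def derive_seed(module, model, treatment, variant, temp, run_idx):
    """Hypothetical helper: deterministic seed from the colon-joined run key."""
    key = f"{module}:{model}:{treatment}:{variant}:{temp}:{run_idx}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Assumption: keep the first 8 hex digits (32 bits) so the value fits
    # RNGs that expect a 32-bit seed.
    return int(digest[:8], 16)

def completed_keys(log_path):
    """Hypothetical helper: run keys already recorded in a JSONL log."""
    path = Path(log_path)
    done = set()
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A crash mid-write can leave a truncated final line; ignore it
                # so that run is simply redone.
                continue
            done.add(record["run_key"])  # assumed field name
    return done

def pending_runs(all_keys, log_path):
    """Hypothetical helper: keys still to run, in their original order."""
    done = completed_keys(log_path)
    return [k for k in all_keys if k not in done]
```

Because the seed is a pure function of the run key, a resumed run gets the same seed it would have had in an uninterrupted pass. The hash input is a string, so formatting matters: a temperature passed as the string `"0.70"` rather than the float `0.7` would produce a different seed.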

Statistical reporting: treatment means; effect sizes as Cohen's d (continuous outcomes) or Cohen's h (proportions); bootstrap 95% CIs; chi-square or t-tests, with false discovery rate (FDR) correction applied across the experiment family.
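The effect sizes, intervals and multiple-testing correction named above can be sketched with the Python standard library. This sketch assumes a pooled-SD Cohen's d, a percentile bootstrap, and Benjamini-Hochberg as the FDR procedure; the source does not say which variants the study uses. The chi-square and t-tests themselves would typically come from SciPy (`scipy.stats.chi2_contingency`, `scipy.stats.ttest_ind`).

```python
import math
import random
import statistics

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (continuous outcomes)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled

def cohens_h(p1, p2):
    """Cohen's h for two proportions: difference of arcsine-transformed values."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def bootstrap_ci(data, stat=statistics.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, take the alpha/2 tails."""
    rng = random.Random(seed)  # seeded so the interval itself is reproducible
    n = len(data)
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = boots[round(alpha / 2 * n_boot)]
    hi = boots[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: one reject flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # ... and reject every hypothesis ranked at or below k.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

The step-up rule means a p-value can be rejected even if it misses its own threshold, as long as a larger p-value passes its threshold: with `[0.03, 0.04]` at q = 0.05, both are rejected.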

Run it yourself

git clone https://github.com/antonhantel/behavior-of-A2A-commerce
cd behavior-of-A2A-commerce
pip install -r requirements.txt

# Full pipeline
python -m agent_bias_study

# Single experiment / model
python -m agent_bias_study --module negotiation
python -m agent_bias_study --model claude