Methodology
Reproducible. Open. Independent.
Seven experiments, three frontier models, deterministic seeding, 8,280 runs in v1. All code, prompts, and raw logs are public.
Models
- Claude Sonnet 4 · Anthropic
- GPT-4o · OpenAI
- Gemini 2.5 Flash · Google DeepMind
Two sampling temperatures (0.0, 0.7) · 30 replications per cell · 8,280 total runs.
Experiments
- Negotiation · Anchoring, framing, surplus division · 300 / model
- Decoy effect · Attraction effect from a dominated option · 540 / model
- Information overload · Accuracy vs. description complexity · 480 / model
- Auction · Bidding vs. game-theoretic equilibrium · 180 / model
- Strategic exploitation · Informed vs. naive counterparty · 720 / model
- Debiasing · Prompt-level bias interventions · 240 / model
- Market simulation · Multi-agent double auctions · 300 / model
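The per-model counts above can be checked against the 8,280 total with a short tally. This is a sketch that uses only the numbers stated on this page. The last step assumes that a "cell" means one experimental condition at one temperature, which is an interpretation rather than something the page states, and the variable names are illustrative.

```python
# Run counts per model, as listed in the Experiments section.
RUNS_PER_MODEL = {
    "Negotiation": 300,
    "Decoy effect": 540,
    "Information overload": 480,
    "Auction": 180,
    "Strategic exploitation": 720,
    "Debiasing": 240,
    "Market simulation": 300,
}
MODELS = ["Claude Sonnet 4", "GPT-4o", "Gemini 2.5 Flash"]
TEMPERATURES = [0.0, 0.7]
REPLICATIONS = 30

per_model = sum(RUNS_PER_MODEL.values())
total = per_model * len(MODELS)

# ASSUMPTION: one cell = one condition at one temperature, so each
# condition contributes len(TEMPERATURES) * REPLICATIONS runs per model.
runs_per_condition = len(TEMPERATURES) * REPLICATIONS
conditions = {name: runs // runs_per_condition for name, runs in RUNS_PER_MODEL.items()}

print(per_model, total)
print(conditions)
```

Under that reading, every per-model count divides evenly by 60 (2 temperatures × 30 replications), which is consistent with the stated design.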
Reproducibility
Every run is seeded from SHA256(module:model:treatment:variant:temp:run_idx), so the same configuration always produces the same seed. Interrupted runs resume from existing JSONL logs.
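A minimal sketch of how such a seed could be derived and how a resume step might skip finished runs. The key layout follows the string above. Taking the first 8 bytes of the digest as the integer seed, the function names, the `key` field in each JSONL record, and the example treatment and variant labels are all assumptions for illustration, not the repository's confirmed implementation.

```python
import hashlib
import json
from pathlib import Path


def derive_seed(module, model, treatment, variant, temp, run_idx):
    """Map a run configuration to a reproducible integer seed."""
    key = f"{module}:{model}:{treatment}:{variant}:{temp}:{run_idx}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # ASSUMPTION: use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


def completed_keys(log_path):
    """Collect the run keys already present in a JSONL log."""
    done = set()
    path = Path(log_path)
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A half-written last line from an interrupted run is skipped.
                continue
            # ASSUMPTION: each record stores its run key under "key".
            if record.get("key") is not None:
                done.add(record["key"])
    return done


# Hypothetical treatment and variant labels, for illustration only.
seed = derive_seed("negotiation", "claude", "anchor_high", "v1", 0.7, 0)
print(seed)
```

Because the seed depends only on the configuration string, a resumed run gets the same seed it would have had in the original pass.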
Statistical reporting: treatment means, effect sizes (Cohen's d for mean differences, Cohen's h for differences in proportions), bootstrap 95% CIs, and chi-square / t-tests with FDR correction across the experiment family.
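The reported quantities can be sketched with the standard library alone. This is an illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, the bootstrap uses the percentile method, and the FDR step uses the Benjamini-Hochberg procedure, which the page does not name explicitly.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def cohens_h(p1, p2):
    """Effect size for two proportions via the arcsine transform."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in means (ASSUMED method)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (ASSUMED FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Passing every p-value from the experiment family to `benjamini_hochberg` in a single call is what makes the correction apply across the family rather than per experiment.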
Run it yourself
git clone https://github.com/antonhantel/behavior-of-A2A-commerce
cd behavior-of-A2A-commerce
pip install -r requirements.txt
# Full pipeline
python -m agent_bias_study
# Single experiment / model
python -m agent_bias_study --module negotiation
python -m agent_bias_study --model claude