Scorecard · Google DeepMind · Apr 2025
Gemini 2.5 Flash
Behavior across the four agent squared benchmark clusters. Composite is the mean of dimension scores (0–100) normalised within the current roster.
Composite score · 0 / 100
In context
How Gemini compares to the rest of the roster.
Cluster scores by model
Composite score (0–100, higher = better) on each of the four benchmark clusters. Scores are normalised within the current model roster, so adding a new model rescales all polygons.
Source · composite of all dimensions per cluster
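The page does not say which normalisation is used. As a minimal sketch, assuming plain min-max rescaling within the roster, the snippet below shows why adding a model can change every existing score. The function name `min_max_scores`, the model names, the raw values, and the tie-handling rule are all hypothetical and are not taken from this scorecard.

```python
def min_max_scores(raw_by_model, higher_is_better=True):
    """Rescale raw values to 0-100 within the given roster.

    Assumption: min-max normalisation. The scorecard only says scores are
    "normalised within the current roster" and does not name the method.
    """
    lo = min(raw_by_model.values())
    hi = max(raw_by_model.values())
    if hi == lo:
        # Hypothetical tie rule: a roster with identical values all scores 100.
        return {model: 100.0 for model in raw_by_model}
    scores = {}
    for model, raw in raw_by_model.items():
        frac = (raw - lo) / (hi - lo)
        scores[model] = 100.0 * (frac if higher_is_better else 1.0 - frac)
    return scores


# Hypothetical raw values, for illustration only.
roster = {"model_a": 0.5, "model_b": 0.75}
before = min_max_scores(roster)

# Adding a stronger model moves the top of the range, so model_b's score drops
# even though its raw value is unchanged.
roster["model_c"] = 1.0
after = min_max_scores(roster)

print(before)
print(after)
```

Under this assumption a two-model roster can only produce scores of 0 and 100 on any dimension, which is one reason roster-relative scores are best read alongside the raw values.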
Every dimension · raw values
01 · Negotiation integrity · Cluster 0 / 100
Anchor shift · 0 / 100 · $7.38 · anchoring price shift under high anchor
Loss framing · 0 / 100 · -$2.38 · loss-framing price shift
Outside option · 0 / 100 · 0.0% · outside-option enforcement rate
02 · Market rationality · Cluster 67 / 100
1st-price vs BNE · 0 / 100 · 0.899 · first-price bid / value ratio
2nd-price truth · 100 / 100 · 1.000 · second-price truthful bidding
Info overload (3) · 100 / 100 · 98.3% · accuracy under adversarial info (3 attrs)
03 · Adversarial robustness · Cluster 100 / 100
Exploit loss · 100 / 100 · $6.47 · surplus extracted by informed seller
Decoy lift · 100 / 100 · 0.0 pp · decoy effect choice lift
Defense recovery · 100 / 100 · $4.16 · specific-warning defense recovery
04 · Market stability · Cluster 33 / 100
Baseline efficiency · 0 / 100 · 91.1% · double-auction baseline efficiency
Anchored efficiency · 100 / 100 · 34.0% · efficiency under shared anchors
All-debiased · 0 / 100 · 90.2% · efficiency when all agents debiased
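The cluster figures can be checked against the dimension scores listed above. The sketch below assumes each cluster score is the unweighted mean of its three dimension scores, rounded to the nearest integer, and that the composite is the unweighted mean of all twelve dimension scores, as the page description suggests. The variable names are illustrative only.

```python
from statistics import mean

# Dimension scores (0-100) exactly as listed on this scorecard.
clusters = {
    "Negotiation integrity": {
        "Anchor shift": 0, "Loss framing": 0, "Outside option": 0,
    },
    "Market rationality": {
        "1st-price vs BNE": 0, "2nd-price truth": 100, "Info overload (3)": 100,
    },
    "Adversarial robustness": {
        "Exploit loss": 100, "Decoy lift": 100, "Defense recovery": 100,
    },
    "Market stability": {
        "Baseline efficiency": 0, "Anchored efficiency": 100, "All-debiased": 0,
    },
}

# Assumption: cluster score = unweighted mean of its dimensions, rounded.
cluster_scores = {
    name: round(mean(dims.values())) for name, dims in clusters.items()
}

# Assumption: composite = unweighted mean of all twelve dimension scores.
all_scores = [score for dims in clusters.values() for score in dims.values()]
composite = mean(all_scores)

print(cluster_scores)
print(composite)
```

Under these assumptions the cluster values come out as 0, 67, 100 and 33, matching the figures shown for each cluster. The same rule gives a composite of 50, so the 0 shown under "Composite score" may be a display placeholder rather than the computed value; the page itself does not confirm either reading.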