ConstellationBench Leaderboard

The first open benchmark for behavioral AI persona fidelity.

22,200+ LLM calls | 15 models | 17 personas | 44 experimental layers | $115 total cost

"The most expensive AI model we tested was the worst at being someone."

Which models hold behavioral personas best?

Model Leaderboard

10 | gemini-2.5-flash | Moonshot AI | Frontier | 0.754 | 0.373 | 0.73 | 0.776 | 0.412 | 0.00006 | 0.568