Multi-Agent Debate (MAD) has been touted as a paradigm for collaborative reasoning, but the truth may be simpler: majority voting alone, without any debate, often achieves results comparable to MAD.
We show this empirically, and back it with a formal proof, in our NeurIPS 2025 paper, ‘Debate or Vote: Which Works Better for Decision-Making in Multi-Agent Large Models?’
Key findings:
- Empirical results: Across 7 different NLP benchmarks, most of MAD’s performance gains can be attributed to simple majority voting (ensemble) rather than debate.
- Theoretical proof: We prove that, under a Bayesian belief-update model, MAD induces a martingale over agents’ beliefs in the correct answer. This means the debate itself doesn’t systematically improve or weaken those beliefs: belief updates are driven entirely by random influences from peer responses.
  - Some debate trajectories strengthen belief in the correct answer (correction).
  - Others weaken it (subversion).
  - These local fluctuations change the posterior counts, but the expected belief in the correct answer stays equal to the initial belief, i.e., the belief agents hold before any debate.
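A minimal Monte Carlo sketch of the martingale claim follows. It uses a simplified toy model that is our own assumption, not the paper's exact formulation. Each agent holds a Dirichlet belief over answer options (concentration vector `alpha`) and answers by sampling from its posterior predictive. It then adds each peer's answer as one pseudo-count. All agents start with the same total concentration. Under these conditions, the agent-averaged posterior-mean belief in the correct answer is a martingale. Its expectation after any number of rounds should therefore match its initial value up to sampling noise, even though individual trajectories drift up (correction) or down (subversion). The names `debate_round`, `mean_belief` and `simulate` are hypothetical.

```python
import random

def sample_answer(alpha, rng):
    """Draw one answer index from the agent's posterior predictive, alpha / sum(alpha)."""
    return rng.choices(range(len(alpha)), weights=alpha, k=1)[0]

def debate_round(alphas, rng):
    """Toy debate round (assumed model): every agent answers, then adds peers' answers as pseudo-counts."""
    answers = [sample_answer(a, rng) for a in alphas]
    new_alphas = []
    for i, a in enumerate(alphas):
        updated = list(a)
        for j, r in enumerate(answers):
            if j != i:  # only peer responses update agent i's belief
                updated[r] += 1.0
        new_alphas.append(updated)
    return new_alphas

def mean_belief(alphas, correct):
    """Average over agents of the posterior-mean belief in the correct answer."""
    return sum(a[correct] / sum(a) for a in alphas) / len(alphas)

def simulate(initial_alphas, correct, rounds, trials, seed=0):
    """Monte Carlo estimate of the expected agent-averaged belief after `rounds` debate rounds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        alphas = [list(a) for a in initial_alphas]
        for _ in range(rounds):
            alphas = debate_round(alphas, rng)
        total += mean_belief(alphas, correct)
    return total / trials

# Hypothetical setup: three agents, three options, equal total concentration (5.0 each); option 0 is correct.
initial = [[3.0, 1.0, 1.0], [1.0, 3.0, 1.0], [2.0, 2.0, 1.0]]
print(mean_belief(initial, correct=0))                      # belief before any debate
print(simulate(initial, correct=0, rounds=3, trials=10000))  # expected belief after 3 rounds
```

The equal-total assumption is what makes the toy a martingale. Let each agent have total concentration A, let there be N agents, and let p_i be agent i's current belief. Then agent i's expected updated belief is (A·p_i + Σ_{j≠i} p_j)/(A + N − 1). Averaging over agents returns the current mean belief exactly.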
In other words, in our theoretical model, debate itself doesn’t improve expected accuracy; the real performance boost comes from majority voting.
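For contrast, the voting baseline needs no belief updates at all. The sketch below uses hypothetical helper names. It takes each agent's final answer and returns the most common one. It also computes the probability that a majority of an odd number of agents is correct. That calculation adds an assumption, often unrealistic, that agents are independent and each correct with probability `p` on a binary question.

```python
from collections import Counter
from math import comb

def majority_vote(answers):
    """Return the most common answer; ties go to the answer encountered first (Counter ordering)."""
    return Counter(answers).most_common(1)[0][0]

def majority_accuracy(n_agents, p):
    """P(majority is correct) for an odd number of independent agents, each correct w.p. p (binary task)."""
    if n_agents % 2 == 0:
        raise ValueError("use an odd number of agents to avoid ties")
    # Sum binomial probabilities over all outcomes where more than half the agents are correct.
    return sum(
        comb(n_agents, k) * p**k * (1 - p) ** (n_agents - k)
        for k in range(n_agents // 2 + 1, n_agents + 1)
    )

print(majority_vote(["B", "A", "B", "C", "B"]))  # prints B
print(majority_accuracy(5, 0.6))                 # ≈ 0.683
```

With five independent agents at 60% individual accuracy, the vote is right about 68% of the time. Correlated agents gain less, because their errors coincide.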
- Practical implications: For MAD to be truly effective, the martingale must be broken by introducing local asymmetry in the stochastic process. Based on this insight, we propose interventions that bias belief updates toward retaining correct signals, allowing debate to…