A curated collection of papers and resources on unlocking the potential of test-time scaling for reasoning in large language models
In recent years, the capabilities of Large Language Models (LLMs) have advanced at an unprecedented rate. This progress has been largely attributed to the scaling of model parameters, training data, and computational resources. At the same time, inference-time performance has been significantly improved by extending the computational “length” through methods like Chain-of-Thought, which enables models to formulate a reasoning process before delivering a final answer.
This raises a compelling question: beyond scaling the “depth” (model layers) and “length” (sequential reasoning), can we unlock new potential by introducing a “width” dimension to test-time computation? This collection explores this emerging frontier of parallel reasoning, which broadens the computational scope at inference time. Instead of pursuing a single line of thought, this paradigm generates and evaluates multiple diverse reasoning paths or hypotheses in parallel. Conceptual examples include approaches where a model considers several hypotheses at once before proceeding, as well as advanced multi-agent systems, such as those explored by Gemini and Anthropic, where a lead agent coordinates multiple parallel agents to accomplish a goal.
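As a minimal sketch of this “width” idea, the snippet below samples several chain-of-thought completions in parallel and majority-votes over their final answers, in the spirit of self-consistency. The `sample_completion` stub and the `Answer:` extraction pattern are placeholders, not any particular paper's implementation; substitute whatever model call and answer format you use.

```python
# Sketch: parallel test-time scaling via sampled reasoning paths + majority vote.
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sample_completion(prompt: str) -> str:
    """Placeholder: replace with a real LLM call sampled at temperature > 0."""
    raise NotImplementedError("plug in your model / API client here")


def extract_answer(completion: str) -> str | None:
    """Pull the final answer from a completion that ends with 'Answer: <x>'."""
    match = re.search(r"Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else None


def parallel_reasoning(prompt: str, num_paths: int = 8) -> str | None:
    """Launch num_paths reasoning paths concurrently and return the majority answer."""
    with ThreadPoolExecutor(max_workers=num_paths) as pool:
        completions = list(pool.map(sample_completion, [prompt] * num_paths))
    answers = [a for a in (extract_answer(c) for c in completions) if a]
    return Counter(answers).most_common(1)[0][0] if answers else None
```

Many of the papers below refine this basic pattern, e.g. by adapting the number of paths, scoring paths instead of voting, or letting paths interact during generation.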
The adoption of parallel reasoning offers a dual advantage. First, it significantly expands the effective computational budget for any given query, enhancing the robustness and quality of the final output. Second, it holds immense practical value by drastically reducing latency, a critical factor for improving user experience in real-world applications.
To systematically survey this exciting area, this collection curates key papers and resources, organized into the following categories:
[2203] Self-Consistency Improves Chain of Thought Reasoning in Language Models Code 💻
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
[2305] Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs Code 💻
Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam
[2402] Soft Self-Consistency Improves Language Model Agents Code 💻
Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
[2305] Tree of Thoughts: Deliberate Problem Solving with Large Language Models Code 💻
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
[2308] Graph of Thoughts: Solving Elaborate Problems with Large Language Models Code 💻
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler
[2505] Learning from Peers in Reasoning Models Code 💻
Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang
[2504] Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Code 💻
Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Erik Schultheis, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh
[2504] Learning Adaptive Parallel Reasoning with Language Models Code 💻
Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
[2506] Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Code 💻
Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen
[2506] How we built our multi-agent research system
Anthropic Team
[2506] Learning to Reason Across Parallel Samples for LLM Reasoning
Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, Eunsol Choi
[2501] Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents
Chengbo He, Bochao Zou, Xin Li, Jiansheng Chen, Junliang Xing, Huimin Ma
[2505] SSR: Speculative Parallel Scaling Reasoning in Test-time
Yuanlin, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu
[2505] Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu
[2502] Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Ziyao Wang, Muneeza Azmat, Ang Li, Raya Horesh, Mikhail Yurochkin
[2502] Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents
Chenyang Shao, Xinyuan Hu, Yutang Lin, Fengli Xu