Awesome-Parallel-Reasoning

License: MIT · Awesome · GitHub

A curated collection of papers and resources on unlocking the potential of test-time scaling of reasoning in large language models.

Overview

In recent years, the capabilities of Large Language Models (LLMs) have advanced at an unprecedented rate. This progress has been largely attributed to the scaling of model parameters, training data, and computational resources. At the same time, inference-time performance has been significantly improved by extending the computational “length” through methods like Chain-of-Thought, which enables models to formulate a reasoning process before delivering a final answer.

This raises a compelling question: beyond scaling the “depth” (model layers) and “length” (sequential reasoning), can we unlock new potential by introducing a “width” dimension to test-time computation? This collection explores this emerging frontier of parallel reasoning, which focuses on broadening the computational scope at inference time. Instead of pursuing a single line of thought, this paradigm involves generating and evaluating multiple, diverse reasoning paths or hypotheses in parallel. Conceptual examples can be seen in approaches where a model considers several hypotheses at once before proceeding, or in advanced multi-agent systems, like those explored by Gemini and Anthropic, where a lead agent coordinates multiple parallel agents to accomplish a goal.
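
To make the fan-out/fan-in structure concrete, below is a minimal sketch of parallel reasoning with majority-vote aggregation (the idea popularized by Self-Consistency). The `sample_reasoning_path` function is a hypothetical stand-in for a single sampled LLM call; only the parallel fan-out and the aggregation step are the point.

```python
# Minimal sketch (not from any specific paper): fan out several independently
# sampled reasoning paths, then fan in by majority vote over their final answers.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import random


def sample_reasoning_path(question: str) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought LLM call.

    A real implementation would call a model with temperature > 0 and
    extract the final answer from the generated reasoning trace.
    """
    return random.choice(["42", "42", "41"])  # toy stochastic answer


def parallel_reason(question: str, n_paths: int = 8) -> str:
    # "Width": the paths are independent, so they can be generated concurrently.
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        answers = list(pool.map(sample_reasoning_path, [question] * n_paths))
    # Aggregation: keep the most consistent final answer across paths.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(parallel_reason("What is 6 * 7?"))
```

Richer variants let the parallel paths interact during generation or coordinate through a lead agent, as in the interactive and multi-agent methods collected below.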

The adoption of parallel reasoning offers a dual advantage. First, it significantly expands the effective computational budget for any given query, enhancing the robustness and quality of the final output. Second, it holds immense practical value by drastically reducing latency, a critical factor for improving user experience in real-world applications.

To systematically survey this exciting area, this collection curates key papers and resources, organized into the following categories:

  1. Foundational Techniques: Papers on precursor methods that laid the groundwork for parallel reasoning by generating multiple independent samples, such as Self-Consistency and Tree of Thoughts (ToT).
  2. Parallel Reasoning Strategies:
    • Interactive Parallelism: Methods where multiple reasoning processes are generated simultaneously and can interact internally during inference.
    • Collaborative Agent Systems: Frameworks that leverage multiple, coordinated agents to solve a single, complex task.
  3. Parallel Inference Acceleration: Techniques designed to speed up decoding, such as speculative decoding and novel schemes for allocating computational resources for parallel generation (see the simplified speculative-decoding sketch after this list).
  4. Infrastructure: Frameworks and systems built to support and accelerate parallel reasoning at scale.
  5. Industry Applications: Showcases how parallel reasoning techniques are implemented in prominent products and systems.
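
As a companion to item 3, here is a simplified sketch of speculative decoding with greedy verification: a cheap draft model proposes a block of tokens, the target model checks the whole block in one (parallelizable) pass, and the longest agreeing prefix is accepted. The `draft_next_token` and `target_verify` functions are toy placeholders for real model calls, and production systems use a probabilistic accept/reject rule rather than exact matching.

```python
# Simplified sketch of speculative decoding (greedy-verification variant).
# `draft_next_token` and `target_verify` are toy placeholders for real model
# calls; real systems score all drafted positions in one target forward pass.
from typing import List


def draft_next_token(tokens: List[int]) -> int:
    """Placeholder for the cheap draft model's greedy next-token prediction."""
    return (tokens[-1] + 1) % 50  # toy deterministic rule


def target_verify(tokens: List[int], proposal: List[int]) -> List[int]:
    """Placeholder for the target model: its greedy prediction at every prefix
    of tokens + proposal, which a real model computes in a single pass."""
    out, ctx = [], list(tokens)
    for drafted in proposal:
        out.append((ctx[-1] + 1) % 50)  # toy rule standing in for argmax logits
        ctx.append(drafted)             # teacher-force the drafted token
    return out


def speculative_decode(prompt: List[int], max_new: int = 16, block: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) Draft model proposes `block` tokens autoregressively (cheap).
        proposal, ctx = [], list(tokens)
        for _ in range(block):
            nxt = draft_next_token(ctx)
            proposal.append(nxt)
            ctx.append(nxt)
        # 2) Target model verifies the whole drafted block at once.
        verified = target_verify(tokens, proposal)
        # 3) Accept the longest agreeing prefix, then one corrected target token.
        n_match = 0
        while n_match < block and proposal[n_match] == verified[n_match]:
            n_match += 1
        tokens.extend(proposal[:n_match])
        if n_match < block:
            tokens.append(verified[n_match])
    return tokens[: len(prompt) + max_new]


if __name__ == "__main__":
    print(speculative_decode([1, 2, 3]))
```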

📄 Papers

Foundational Techniques

Self-Consistency

[2203] Self-Consistency Improves Chain of Thought Reasoning in Language Models Code 💻
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Adaptive Self-Consistency

[2305] Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs Code 💻
Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam

Soft Self-Consistency

[2402] Soft Self-Consistency Improves Language Model Agents Code 💻
Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Tree of Thoughts

[2305] Tree of Thoughts: Deliberate Problem Solving with Large Language Models Code 💻
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan

Graph of Thoughts

[2308] Graph of Thoughts: Solving Elaborate Problems with Large Language Models Code 💻
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler


Interactive Parallelism

LeaP

[2505] Learning from Peers in Reasoning Models Code 💻
Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang

Hogwild

[2504] Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Code 💻
Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Erik Schultheis, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh

Adaptive Parallel Reasoning

[2504] Learning Adaptive Parallel Reasoning with Language Models Code 💻
Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr

Multiverse

[2506] Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Code 💻
Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen


Collaborative Agent Systems

Anthropic Research System

[2506] How we built our multi-agent research system
Anthropic Team

SSA

[2506] Learning to Reason Across Parallel Samples for LLM Reasoning
Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, Eunsol Choi

RR-MP

[2501] Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents
Chengbo He, Bochao Zou, Xin Li, Jiansheng Chen, Junliang Xing, Huimin Ma


Parallel Inference Acceleration

SSR

[2505] SSR: Speculative Parallel Scaling Reasoning in Test-time
Yuanlin, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

SpecSearch

[2505] Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu

CoSD

[2502] Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Ziyao Wang, Muneeza Azmat, Ang Li, Raya Horesh, Mikhail Yurochkin

DoT

[2502] Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents
Chenyang Shao, Xinyuan Hu, Yutang Lin, Fengli Xu


Infrastructure

vLLM (PagedAttention) Code 💻

SGLang (RadixAttention) Code 💻

