In this paper from Google DeepMind, the authors explore a new way for large language models (LLMs) to reason without the usual need for crafted prompts. More specifically, they study how LLMs can generate reasoning paths on their own, simply by changing how they decode. Until now, getting LLMs to reason involved guiding them with carefully designed prompts. In contrast, this work introduces CoT-decoding and shows that by changing how models consider different possible next tokens during decoding, they can start reasoning through problems without any prompting. This finding suggests that LLMs may have a deeper in-built ability to reason than previously thought, one they display when allowed to consider alternatives during the decoding process.

Overview

When the model explores beyond the most likely next token by considering the alternative top-k tokens at the first decoding step, and then applies greedy decoding for the remaining steps, an interesting pattern unfolds: models often reveal natural chain-of-thought (CoT) reasoning. For example, on a math problem from the GSM8K dataset, a coherent CoT reasoning path emerges when the model considers the ninth most likely option.
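
To make the branch-then-greedy idea concrete, here is a minimal sketch using the Hugging Face transformers library. The model name, generation length, and variable names are illustrative assumptions, not details from the paper (which studies much larger models); the question is the paper's running GSM8K-style example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper experiments with much larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# The GSM8K-style question from the paper's running example.
prompt = ("Q: I have 3 apples, my dad has 2 more apples than me. "
          "How many apples do we have in total?\nA:")
inputs = tokenizer(prompt, return_tensors="pt")

k = 10  # number of alternative first tokens to branch on
with torch.no_grad():
    first_logits = model(**inputs).logits[0, -1]   # logits for the first answer token
top_k_ids = torch.topk(first_logits, k).indices    # the k most likely first tokens

paths = []
for token_id in top_k_ids:
    # Branch on one alternative first token, then continue greedily.
    branched = torch.cat([inputs["input_ids"], token_id.view(1, 1)], dim=-1)
    out = model.generate(branched, max_new_tokens=100, do_sample=False)
    continuation = out[0, inputs["input_ids"].shape[-1]:]
    paths.append(tokenizer.decode(continuation, skip_special_tokens=True))
```

Each entry of `paths` is one candidate decoding; the question then becomes how to pick among them, which is where the confidence measure below comes in.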

Greedy vs. alternative top-k decoding paths. As can be seen, valid CoT reasoning appears at various values of k.

While chain-of-thought (CoT) reasoning paths occur naturally in large language models (LLMs), distinguishing them from non-CoT paths during decoding is challenging. The difficulty is that CoT paths do not consistently rank higher than non-CoT paths under the model's probability estimates, so standard methods such as self-consistency cannot reliably identify the correct reasoning path. However, the authors observe that the presence of a CoT path typically causes the model to decode the final answer with greater confidence, shown in blue in the image above. CoT-decoding therefore computes the confidence of the final answer, measured as the probability gap between the top two tokens at each answer position, and selects the path whose answer is decoded most confidently.
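
As a sketch of this selection rule: a path's confidence can be computed as the average top-1 vs. top-2 probability gap over the tokens of the final answer (the paper's Delta measure). The function below assumes the answer token positions have already been identified within the decoded text; the names and signature are illustrative:

```python
import torch

def answer_confidence(step_logits: torch.Tensor, answer_positions: list[int]) -> float:
    """Average probability gap between the top-1 and top-2 tokens over the
    positions that decode the final answer (the paper's Delta measure)."""
    gaps = []
    for t in answer_positions:
        probs = torch.softmax(step_logits[t], dim=-1)  # distribution at step t
        top2 = torch.topk(probs, 2).values             # two most likely tokens
        gaps.append((top2[0] - top2[1]).item())
    return sum(gaps) / len(gaps)

# Among the k branched paths, keep the one whose final answer is decoded
# most confidently, e.g. (hypothetical path objects):
# best = max(paths, key=lambda p: answer_confidence(p.logits, p.answer_positions))
```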

Results

CoT-decoding yields a significant boost in performance compared to standard greedy decoding across multiple tasks, demonstrating the effectiveness of the proposed method:

Results on math reasoning

Results on symbolic reasoning

Qualitative results

Conclusion

This research explores how large language models can generate reasoning paths (CoT) without specific prompting. By examining alternative top-k tokens during decoding, it shows that these models naturally possess reasoning capabilities. A key finding is that the presence of a CoT path correlates with higher confidence in the model's decoded answer. Leveraging this insight, the study introduces CoT-decoding, a method for identifying more reliable decoding paths, thereby improving the models' reasoning performance. More details can be found in the paper.

Congrats to the authors for their work!

Wang, Xuezhi, and Denny Zhou. "Chain-of-Thought Reasoning Without Prompting." arXiv preprint arXiv:2402.10200 (2024).
