> Yao, Shunyu, et al. "Tree of thoughts: Deliberate problem solving with large language models." _arXiv preprint arXiv:2305.10601_ (2023).
# Tree of Thoughts: Deliberate Problem Solving with Large Language Models
## **Motivation**
Large language models (LLMs) like GPT-4 are powerful but **typically generate text in a simple, left-to-right, token-by-token manner**. This approach is limited for complex problem-solving tasks that require exploration, planning, lookahead, or backtracking: capabilities akin to human "System 2" deliberate reasoning.
## **Key Contribution: Tree of Thoughts (ToT) Framework**
==**The paper introduces Tree of Thoughts (ToT), a new inference framework for LLMs that generalizes the popular "Chain of Thought" (CoT) prompting.**== Instead of generating a single sequence of intermediate steps ("thoughts"), ToT enables the model to:
- **Explore Multiple Reasoning Paths:** At each step, the model generates several possible "thoughts" (coherent text units), forming a tree structure of possible solutions.
- **Evaluate and Select:** The model self-evaluates these thoughts using heuristics (also expressed in language), allowing it to prune less promising paths.
- **Plan, Lookahead, and Backtrack:** By using search algorithms (like breadth-first or depth-first search), ToT enables systematic exploration, lookahead, and backtracking, similar to human problem-solving.
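To make the branching and self-evaluation concrete, here is a minimal sketch of a single expansion step in Python. The `complete` argument stands in for any LLM text-completion call, and the `propose_thoughts` / `score_state` helpers and their prompt wording are illustrative assumptions, not the authors' actual prompts (those are in the repository linked below).

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """A partial solution: the problem plus the thoughts chosen so far."""
    problem: str
    thoughts: list[str] = field(default_factory=list)

def propose_thoughts(complete, state: State, k: int = 5) -> list[str]:
    """Sample k candidate next thoughts for a partial solution."""
    prompt = (
        f"Problem: {state.problem}\n"
        "Steps so far:\n" + "\n".join(state.thoughts) +
        "\nPropose one plausible next step."
    )
    # `complete` is any function mapping a prompt string to a model completion.
    return [complete(prompt) for _ in range(k)]

def score_state(complete, state: State) -> float:
    """Language-based heuristic: ask the model to rate a partial solution."""
    prompt = (
        f"Problem: {state.problem}\n"
        "Steps so far:\n" + "\n".join(state.thoughts) +
        "\nOn a scale of 0 to 10, how likely are these steps to lead to a "
        "correct solution? Answer with a single number."
    )
    try:
        return float(complete(prompt).strip())
    except ValueError:
        return 0.0  # unparseable answers count as unpromising
```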
## **How ToT Works**

1. **Thought Decomposition:** Problems are broken down into intermediate "thought" steps, each being a meaningful chunk (e.g., a sentence, equation, or paragraph).
2. **Thought Generation:** At each node in the tree, the LLM generates multiple candidate thoughts for the next step.
3. **State Evaluation:** The LLM evaluates the promise of each partial solution using language-based heuristics (e.g., assigning a value or voting for the best).
4. **Search Algorithm:** The process uses search strategies (BFS or DFS) to expand and explore the tree, looking ahead and backtracking as needed.
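Putting the four steps together, the sketch below is a minimal breadth-first variant of the search, reusing the hypothetical `State`, `propose_thoughts`, and `score_state` helpers from the earlier sketch (again an illustration, not the authors' implementation): at each level it expands every kept state into candidate children, scores them, and prunes to the `breadth` most promising. A depth-first variant would instead follow the best child and backtrack whenever a state's score drops below a threshold.

```python
def tree_of_thoughts_bfs(complete, problem: str,
                         depth: int = 3, breadth: int = 5, k: int = 5) -> State:
    """Breadth-first ToT: keep the `breadth` best partial solutions per level."""
    frontier = [State(problem)]
    for _ in range(depth):
        # Thought generation: branch each kept state into k candidate next steps.
        candidates = [
            State(problem, state.thoughts + [thought])
            for state in frontier
            for thought in propose_thoughts(complete, state, k)
        ]
        # State evaluation + pruning: keep the most promising partial solutions.
        candidates.sort(key=lambda s: score_state(complete, s), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]  # highest-scoring solution after `depth` levels
```

The depth of the tree, the branching factor `k`, and the beam width `breadth` are the main knobs, and the paper chooses them per task.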
## **Empirical Results**
The authors test ToT on three challenging tasks:
- **Game of 24 (an arithmetic puzzle: combine four given numbers with basic arithmetic operations to reach exactly 24)**
- **Creative Writing (story generation)**
- **Mini Crosswords (word puzzles)**
**Key findings:**
- **==ToT dramatically improves problem-solving performance.== For example, on Game of 24, GPT-4 with standard CoT prompting solves only 4% of problems, while ToT achieves a 74% success rate.**
- **==ToT is flexible and can be adapted to different tasks by changing the thought decomposition, evaluation, and search strategy.==**
## **Significance**
- **Generalization:** ToT subsumes existing approaches (input-output (IO) prompting, Chain-of-Thought, self-consistency) as special cases; for example, plain CoT is a tree with a single branch and no evaluation.
- **Modularity:** Each component (generation, evaluation, search) can be independently modified.
- **Human-Like Reasoning:** ToT brings LLMs closer to deliberate, strategic human problem-solving by enabling planning, exploration, and self-correction.
## **Resources**
- **Code and prompts:** [https://github.com/princeton-nlp/tree-of-thought-llm](https://github.com/princeton-nlp/tree-of-thought-llm)
==**In essence, Tree of Thoughts transforms LLM inference from a linear process into a flexible, tree-based search, unlocking much stronger reasoning and problem-solving capabilities.**==
---
_Note: this page was at least partly written using generative AI._