> Yao, Shunyu, et al. "Tree of thoughts: Deliberate problem solving with large language models." _arXiv preprint arXiv:2305.10601_ (2023).

# Tree of Thoughts: Deliberate Problem Solving with Large Language Models

## **Motivation**

Large language models (LLMs) like GPT-4 are powerful but **typically generate text in a simple, left-to-right, token-by-token manner**. This approach is limited for complex problem-solving tasks that require exploration, planning, lookahead, or backtracking: capabilities akin to human "System 2" deliberate reasoning.

## **Key Contribution: Tree of Thoughts (ToT) Framework**

==**The paper introduces Tree of Thoughts (ToT), a new inference framework for LLMs that generalizes the popular "Chain of Thought" (CoT) prompting.**==

Instead of generating a single sequence of intermediate steps ("thoughts"), ToT enables the model to:

- **Explore Multiple Reasoning Paths:** At each step, the model generates several possible "thoughts" (coherent text units), forming a tree structure of possible solutions.
- **Evaluate and Select:** The model self-evaluates these thoughts using heuristics (also expressed in language), allowing it to prune less promising paths.
- **Plan, Look Ahead, and Backtrack:** By using search algorithms (such as breadth-first or depth-first search), ToT enables systematic exploration, lookahead, and backtracking, similar to human problem-solving.

## **How ToT Works**

![ToT|600](https://i.imgur.com/xXChB1A.png)

1. **Thought Decomposition:** Problems are broken down into intermediate "thought" steps, each being a meaningful chunk (e.g., a sentence, equation, or paragraph).
2. **Thought Generation:** At each node in the tree, the LLM generates multiple candidate thoughts for the next step.
3. **State Evaluation:** The LLM evaluates the promise of each partial solution using language-based heuristics (e.g., assigning a value or voting for the best candidate).
4. **Search Algorithm:** A search strategy (BFS or DFS) expands and explores the tree, looking ahead and backtracking as needed. (A minimal code sketch of this loop appears at the end of the page.)

## **Empirical Results**

The authors test ToT on three challenging tasks:

- **Game of 24 (arithmetic puzzle)**
- **Creative Writing (story generation)**
- **Mini Crosswords (word puzzles)**

**Key findings:**

- **==ToT dramatically improves problem-solving performance.== For example, on Game of 24, GPT-4 with standard CoT prompting solves only 4% of problems, while ToT achieves a 74% success rate.**
- **==ToT is flexible and can be adapted to different tasks by changing the thought decomposition, evaluation, and search strategy.==**

## **Significance**

- **Generalization:** ToT subsumes existing approaches (IO prompting, Chain-of-Thought, self-consistency) as special cases.
- **Modularity:** Each component (generation, evaluation, search) can be modified independently.
- **Human-Like Reasoning:** ToT brings LLMs closer to deliberate, strategic human problem-solving by enabling planning, exploration, and self-correction.

## **Resources**

- **Code and prompts:** [https://github.com/princeton-nlp/tree-of-thought-llm](https://github.com/princeton-nlp/tree-of-thought-llm)

==**In essence, Tree of Thoughts transforms LLM inference from a linear process into a flexible, tree-based search, unlocking much stronger reasoning and problem-solving capabilities.**==

---

_Note: this page was at least partly written using generative AI._
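
## **Appendix: Minimal BFS-Style ToT Sketch**

To make the generate → evaluate → prune loop concrete, here is a minimal, illustrative sketch of a BFS-style Tree of Thoughts in Python. It is not the paper's implementation (see the linked repository for that); `propose_thoughts` and `score_state` are hypothetical placeholders for LLM-backed generation and evaluation prompts, and the toy stand-ins at the bottom exist only so the sketch runs end-to-end without an LLM.

```python
from typing import Callable, List


def tot_bfs(
    problem: str,
    propose_thoughts: Callable[[str, str], List[str]],  # (problem, partial state) -> candidate next thoughts
    score_state: Callable[[str, str], float],            # (problem, partial state) -> heuristic value
    steps: int = 3,                                       # depth of the thought tree
    breadth: int = 5,                                     # states kept after each level (the BFS beam)
) -> str:
    """Breadth-first Tree-of-Thoughts search: expand, score, and prune partial solutions."""
    frontier = [""]  # start from an empty partial solution
    for _ in range(steps):
        # 1. Thought generation: expand every surviving state with candidate next thoughts.
        candidates = [
            state + thought + "\n"
            for state in frontier
            for thought in propose_thoughts(problem, state)
        ]
        # 2. State evaluation: score each partial solution with a (language-based) heuristic.
        ranked = sorted(candidates, key=lambda s: score_state(problem, s), reverse=True)
        # 3. Pruning: keep only the most promising states for the next level.
        frontier = ranked[:breadth]
    return frontier[0]  # best chain of thoughts found


if __name__ == "__main__":
    # Toy stand-ins: "thoughts" are single letters, and the heuristic prefers states with more vowels.
    toy_propose = lambda problem, state: ["a", "b", "e"]
    toy_score = lambda problem, state: float(sum(state.count(v) for v in "aeiou"))
    print(tot_bfs("toy problem", toy_propose, toy_score, steps=2, breadth=3))
```

In the paper's actual tasks, both callables would wrap LLM prompts (e.g., sampling or proposing next thoughts, and valuing or voting on states), and DFS with backtracking can replace the BFS beam where deeper exploration is needed.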