Chain of Thought (CoT) enables transformers to execute serial computations, expanding their problem-solving capabilities beyond parallel-only limitations and augmenting their expressiveness on inherently sequential problems.
📚 https://arxiv.org/pdf/2402.12875
Original Problem 😕:
LLMs exhibit exceptional reasoning capabilities when they generate intermediate steps (chain of thought, CoT) before the final answer. The mechanism behind CoT's effectiveness remains unclear, especially given that it helps even in zero-shot settings and when the demonstrated reasoning is incorrect.
-----
Key Insights from this Paper 💡:
• Without CoT, constant-depth transformers with constant-bit precision and poly(n) embedding size can only solve problems in AC0
• With T steps of CoT, constant-depth transformers using constant-bit precision and O(log n) embedding size can solve any problem solvable by boolean circuits of size T
• CoT drastically improves accuracy of low-depth transformers on inherently serial problems
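To make "inherently serial" concrete, here is a minimal sketch (my own illustration, not the paper's data pipeline) of the permutation-composition task over S5: the running composition after k permutations depends on the result after k-1, and a CoT trace can spell out every intermediate permutation for the model to condition on.

```python
import random

def compose(p, q):
    """Return the permutation p composed with q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def make_example(num_perms=8, n=5, with_cot=True):
    """Build one illustrative example: a list of permutations -> their composition.

    With CoT, every intermediate running composition is written out, so each
    generated step needs only one group operation; without CoT, the final
    answer must be produced in a single shot.
    """
    perms = [tuple(random.sample(range(n), n)) for _ in range(num_perms)]
    running = perms[0]
    steps = [running]
    for p in perms[1:]:
        running = compose(p, running)
        steps.append(running)

    prompt = " ".join(map(str, perms)) + " ="
    target = " -> ".join(map(str, steps)) if with_cot else str(steps[-1])
    return prompt, target

random.seed(0)
prompt, target = make_example()
print(prompt)
print(target)
```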
-----
Solution in this Paper 🔍:
• Defines new complexity class CoT[T(n), d(n), s(n), e(n)] for problems solvable by constant-depth transformers with:
- T(n) CoT steps
- d(n) embedding size
- s(n) precision bits
- e(n) exponent bits
• Proves theoretical bounds on expressiveness of transformers with different CoT lengths and embedding sizes
• Analyzes transformer expressiveness through circuit complexity lens
• Empirically evaluates CoT's impact on four arithmetic problems: modular addition, permutation composition, iterated squaring, and circuit value problem
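As a concrete illustration of one of these tasks, the sketch below (an assumption about the setup, not the paper's exact prompt format) computes the iterated-squaring trace that a CoT answer can spell out, one squaring per generated step.

```python
def iterated_squaring_trace(x, k, p):
    """Compute x^(2^k) mod p, recording every intermediate square.

    Each step depends on the previous one, so a bounded-depth model without
    CoT must collapse k sequential squarings into one forward pass, while a
    CoT model can emit one intermediate value per step.
    """
    trace = [x % p]
    for _ in range(k):
        trace.append((trace[-1] * trace[-1]) % p)
    return trace

# Example: 7^(2^5) mod 101, with the full chain of intermediate values.
print(iterated_squaring_trace(7, 5, 101))
```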
-----
Results 📊:
• CoT[poly(n), log n, 1] = P/poly, showing that polynomially many CoT steps make constant-depth transformers strictly more powerful (since AC0 ⊊ P/poly)
• Empirical results on inherently serial problems:
- Permutation composition (S5): CoT solves the task well, while the base model performs no better than random guessing
- Iterated squaring: CoT achieves perfect performance, while the base model struggles
- Circuit value problem: CoT significantly outperforms the base model
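For context on the last task: the circuit value problem asks for the output of a given boolean circuit on given inputs, and gate-by-gate evaluation is the natural serial CoT trace. The gate encoding below is a simplified illustration of my own, not the paper's.

```python
def eval_circuit(inputs, gates):
    """Evaluate a boolean circuit gate by gate, in topological order.

    inputs: list of 0/1 input bits (wires 0..len(inputs)-1).
    gates:  list of (op, a, b) tuples, where a and b index previously
            computed wires; for "NOT", b is ignored.
    Returns all wire values; the last entry is the circuit's output.
    """
    wires = list(inputs)
    for op, a, b in gates:
        if op == "AND":
            wires.append(wires[a] & wires[b])
        elif op == "OR":
            wires.append(wires[a] | wires[b])
        elif op == "NOT":
            wires.append(1 - wires[a])
        else:
            raise ValueError(f"unknown gate: {op}")
    return wires

# Example: (x0 AND x1) OR (NOT x2) on inputs 1, 0, 0 -> output 1.
print(eval_circuit([1, 0, 0], [("AND", 0, 1), ("NOT", 2, 2), ("OR", 3, 4)]))
```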