LLMs can reason quickly by using compressed thought patterns instead of full explanations
Chain-of-Thought decoding helps LLMs reason better but slows them down. This paper introduces Compressed Chain-of-Thought (CCoT), which compresses explicit reasoning chains into short sequences of dense contemplation tokens, recovering much of CoT's accuracy benefit at far lower latency.
-----
https://arxiv.org/abs/2412.13171
🤔 Original Problem:
→ Chain-of-Thought (CoT) improves reasoning but adds significant generation latency, taking up to 10x longer to generate answers
→ Current solutions using fixed-length contemplation tokens lack semantic meaning and interpretability
-----
🔧 Solution in this Paper:
→ CCoT generates variable-length, contentful contemplation tokens that compress explicit reasoning chains into dense hidden representations
→ Trains two LoRA modules (ranks 128 and 64): one generates the contemplation tokens, the other decodes answers from them
→ Selects the compressed subset of hidden states at layer 3 and generates contemplation tokens autoregressively from layer 15 (toy sketch after this list)
→ Implemented on LLAMA2-7B-CHAT as the base model
→ Because contemplation tokens are grounded in the hidden states of real reasoning chains, the reasoning can be inspected post hoc
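A minimal toy sketch of the generation loop, assuming a stand-in ToyLM for the base model; ccot_generate, the layer indices, and all sizes mirror the bullets above but are illustrative, not the authors' implementation:

```python
# Toy sketch of CCoT-style inference (illustrative, not the paper's code).
import math
import torch
import torch.nn as nn

D_MODEL, N_LAYERS = 64, 16          # toy sizes; LLaMA2-7B itself has 32 layers

class ToyLM(nn.Module):
    """Stand-in for a decoder-only LM that exposes per-layer hidden states."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(D_MODEL, D_MODEL) for _ in range(N_LAYERS))
    def forward(self, x):                        # x: [seq, d_model]
        states = [x]
        for layer in self.layers:
            x = torch.tanh(layer(x))
            states.append(x)
        return states                            # hidden states at every layer

def ccot_generate(ccot_lm, query_emb, n_cot_tokens, ratio=0.10, layer_gen=15):
    """Emit k = ceil(ratio * m) dense contemplation embeddings, each taken
    from an intermediate layer (layer_gen) instead of the vocabulary.
    (Training-time subset selection at layer 3 is omitted from this sketch.)"""
    k = math.ceil(ratio * n_cot_tokens)
    seq, contemplation = query_emb, []
    for _ in range(k):
        states = ccot_lm(seq)
        z = states[layer_gen][-1:]               # mid-layer state at last position
        contemplation.append(z)
        seq = torch.cat([seq, z], dim=0)         # feed it back in, like a token
    return torch.cat(contemplation, dim=0)       # [k, d_model]

# Compress a hypothetical 100-token reasoning chain into 10 dense tokens;
# a second LoRA module would then decode the answer from query + these.
lm = ToyLM()
query = torch.randn(8, D_MODEL)                  # 8 embedded query tokens
z = ccot_generate(lm, query, n_cot_tokens=100, ratio=0.10)
print(z.shape)                                   # torch.Size([10, 64])
```

The key departure from ordinary decoding is that the fed-back "token" is a hidden state rather than a vocabulary embedding, which is what lets each one carry a compressed chunk of the reasoning chain.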
-----
💡 Key Insights:
→ Contemplation tokens enhance computational width through parallel operations
→ Autoregressive decoding provides additional computational depth
→ In principle, a model with L layers can solve a task requiring serial depth D by decoding roughly D/L additional tokens (quick arithmetic below)
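A back-of-the-envelope check on that claim, with illustrative numbers (L = 32 matches LLaMA2-7B; the task depth D is hypothetical):

```python
import math

L = 32                    # transformer layers (LLaMA2-7B has 32)
D = 96                    # hypothetical serial depth the task requires
extra = math.ceil(D / L)  # D/L additional decoded tokens, per the claim above
print(extra)              # 3
```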
-----
📊 Results:
→ With compression ratio 0.10: 9-point accuracy gain with only 0.4s extra generation time
→ At ratio 0.05: 6-point improvement with just 0.15s additional time
→ More data-efficient than previous approaches, training on ~9,000 instances versus 400,000