"Compressed Chain of Thought: Efficient Reasoning Through Dense Representations"

A podcast on this paper was generated with Google's Illuminate.

LLMs can reason faster by using compressed, dense thought representations instead of generating full written explanations

Chain-of-Thought decoding helps LLMs reason better but slows them down. This paper introduces Compressed Chain-of-Thought (CCoT), which replaces long explicit reasoning chains with a much shorter sequence of dense contemplation tokens, keeping most of the accuracy gain at far lower latency.

-----

https://arxiv.org/abs/2412.13171

🤔 Original Problem:

→ Chain-of-Thought (CoT) improves reasoning but adds significant generation latency, taking up to 10x longer to produce an answer

→ Existing approaches use fixed-length contemplation tokens whose representations carry no semantic content, so the reasoning cannot be inspected

-----

🔧 Solution in this Paper:

→ CCoT generates variable-length contentful contemplation tokens that compress explicit reasoning chains

→ Uses LoRA finetuning, with ranks 128 and 64 for the two trained modules (contemplation-token generation and answer decoding)

→ Uses hidden states from layer 3 for subset selection and from layer 15 for autoregressive generation of the compressed states

→ Implemented on LLAMA2-7B-CHAT as the base model

→ Allows post-hoc inspection of reasoning through grounded representations (a minimal sketch of the inference flow follows below)
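
A minimal, self-contained sketch of the two-module inference flow, under toy assumptions: `CCOTModule`, `DecodeModule`, the tiny linear layers, and all sizes here are illustrative stand-ins, not the paper's LoRA-adapted LLAMA2 modules. It shows the core idea: autoregressively generate k = ceil(r · m) dense hidden states, then decode the answer conditioned on them.

```python
# Toy sketch of CCoT-style inference (hypothetical module names and shapes;
# the real system uses LoRA-adapted LLAMA2-7B-CHAT layers, not tiny Linears).
import math
import torch
import torch.nn as nn

HIDDEN = 64    # toy hidden size (LLAMA2-7B uses 4096)
RATIO = 0.10   # compression ratio r: dense tokens per explicit reasoning token

class CCOTModule(nn.Module):
    """Autoregressively generates k dense contemplation states."""
    def __init__(self, hidden):
        super().__init__()
        self.step = nn.Linear(hidden, hidden)  # stand-in for the finetuned LM

    def forward(self, prompt_state, k):
        states, h = [], prompt_state
        for _ in range(k):                     # one dense "thought" per step
            h = torch.tanh(self.step(h))
            states.append(h)
        return torch.stack(states)             # (k, hidden)

class DecodeModule(nn.Module):
    """Decodes the answer conditioned on prompt + contemplation states."""
    def __init__(self, hidden, vocab=32000):
        super().__init__()
        self.head = nn.Linear(hidden, vocab)

    def forward(self, prompt_state, thoughts):
        ctx = torch.cat([prompt_state.unsqueeze(0), thoughts]).mean(dim=0)
        return self.head(ctx)                  # next-token logits

m = 120                           # length of the full explicit reasoning chain
k = math.ceil(RATIO * m)          # -> 12 contemplation tokens at r = 0.10
prompt_state = torch.randn(HIDDEN)
thoughts = CCOTModule(HIDDEN)(prompt_state, k)
logits = DecodeModule(HIDDEN)(prompt_state, thoughts)
print(k, thoughts.shape, logits.shape)
```

The key saving: generation length scales with k = r · m rather than m, so at r = 0.10 the model decodes roughly one dense state per ten reasoning tokens it would otherwise have written out.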

-----

💡 Key Insights:

→ Contemplation tokens enhance computational width through parallel operations

→ Autoregressive decoding provides additional computational depth

→ A model with L layers can solve tasks requiring computational depth D with roughly D/L additional tokens, since each autoregressively generated token passes through all L layers again (see the arithmetic below)
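
A back-of-the-envelope check of that claim; the task depth D is hypothetical, and L = 32 is simply LLAMA2-7B's layer count.

```python
# Each autoregressive step re-runs all L layers, adding L units of sequential
# depth, so a task of depth D needs about ceil(D / L) extra tokens.
import math

L = 32                            # transformer layers in LLAMA2-7B
D = 200                           # hypothetical sequential depth the task needs
extra_tokens = math.ceil(D / L)
print(extra_tokens)               # -> 7 contemplation tokens, in principle
```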

-----

📊 Results:

→ With compression ratio 0.10: a 9-point accuracy gain for only ~0.4s of extra generation time

→ At ratio 0.05: a 6-point gain for just ~0.15s of extra time

→ More data-efficient than previous approaches (~9,000 vs ~400,000 training instances); the short calculation below shows what these ratios mean in token counts
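
As a quick sanity check on what those compression ratios mean, assuming a hypothetical 100-token explicit reasoning chain (the chain length is illustrative; the gains and timings are the numbers reported above):

```python
# Ratio r -> (accuracy gain in points, extra seconds), as reported above.
results = {0.10: (9, 0.40), 0.05: (6, 0.15)}
m = 100                            # hypothetical explicit-chain length in tokens
for r, (gain, secs) in results.items():
    k = round(r * m)               # dense contemplation tokens generated instead
    print(f"r={r:.2f}: ~{k} dense tokens vs {m} explicit, +{gain} pts, +{secs:.2f}s")
```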
