Want your AI to count better? Just space things out!
Proper tokenization helps LLMs count better by preventing character grouping
📚 https://arxiv.org/abs/2410.19730
🤖 Original Problem:
Transformers in LLMs lack recurrent connections, limiting them to constant-depth computation. This makes them theoretically incapable of solving tasks whose required reasoning depth grows with input length, such as counting.
-----
🔧 Solution in this Paper:
• Analyzed how Byte Pair Encoding (BPE) tokenization impacts counting ability
• Used delimiters (spaces/commas) to force item-separated tokenization (see the tokenization sketch after this list)
• Implemented supervised Chain of Thought (CoT) with explicit step templates
• Manipulated tokenization through string formatting to improve counter extraction
• Extended reasoning from latent space to text space using natural language sequences
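Here's a minimal sketch of the delimiter idea, using the open-source tiktoken tokenizer rather than the paper's own code (the tokenizer choice and the example strings are my assumptions):

```python
# Sketch: show how inserting delimiters keeps a BPE tokenizer from merging
# the items we want the model to count. Uses tiktoken's cl100k_base
# vocabulary; the paper's exact setup may differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_strings(text: str) -> list[str]:
    """Decode each BPE token id back to its surface string."""
    return [enc.decode([tok_id]) for tok_id in enc.encode(text)]

raw = "aaaabaaab"          # packed: BPE merges runs of 'a' into multi-char tokens
spaced = " ".join(raw)     # "a a a a b a a a b": roughly one token per item

print(token_strings(raw))     # a few multi-character tokens
print(token_strings(spaced))  # item-separated tokens, easier to count over
```

With item separation, each character the model must count sits in its own token, which is what the paper's tokenization manipulation aims for.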
💡 How does Chain of Thought (CoT) help overcome Transformers' limitations?
CoT extends reasoning from latent space to text space, using natural language sequences to relay intermediate computations in the absence of recurrence. Because each generated token is fed back into the model as input, this restores a form of recurrent processing and makes higher-complexity tasks like counting feasible.
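As a hypothetical illustration of what a supervised CoT step template for counting could look like (the function name and exact wording below are my own, not taken from the paper):

```python
# Hypothetical supervised-CoT data builder: one explicit counting step per
# character, so the running count is carried in generated text rather than
# in the model's latent state. At training time the gold steps supervise
# the CoT; at inference time only the question is given.
def build_counting_cot(string: str, target: str) -> str:
    question = (
        f"How many times does '{target}' appear in '{string}'?\n"
        "Go character by character and keep a running count.\n"
    )
    count = 0
    steps = []
    for i, ch in enumerate(string, start=1):
        if ch == target:
            count += 1
        steps.append(f"Step {i}: char '{ch}' -> count = {count}")
    return question + "\n".join(steps) + f"\nAnswer: {count}"

print(build_counting_cot("abcaba", "a"))
```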
-----
💡 Key Insights:
• BPE tokenization groups multiple characters, causing up to 80% degradation in counting accuracy
• Lower-frequency letters (e.g., z, 0.07% English letter frequency) are counted more accurately than high-frequency ones (e, 12.7%), since rare letters are less likely to be merged into multi-character BPE tokens (see the frequency sketch after this list)
• Proper tokenization combined with CoT can overcome Transformers' theoretical limitations
• Clear token separation improves counting accuracy significantly
• Supervised CoT outperforms unsupervised CoT across all tokenization methods
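To see the frequency effect for yourself, here is a quick sketch (again with tiktoken, not the paper's tokenizer; the specific letters and run length are arbitrary choices):

```python
# Compare how a BPE vocabulary splits runs of a frequent letter vs. a rare one.
# Frequent letters have more learned merges, so their runs tend to collapse
# into fewer, longer tokens, hiding characters the model needs to count.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for letter in ("e", "z"):
    run = letter * 20
    ids = enc.encode(run)
    print(f"'{letter}' * 20 -> {len(ids)} tokens: {[enc.decode([i]) for i in ids]}")
```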
-----
📊 Results:
• Accuracy drops from 96% to 56% as string length increases from [10,20] to [30,40]
• Item-separated tokenization improves performance by 13-40% over pure BPE
• Rare tokens show 6-12% better counting performance than frequent tokens
• Supervised CoT achieves up to 70.8% accuracy on longer sequences