"Large Concept Models: Language Modeling in a Sentence Representation Space"

Generated a podcast on this paper with Google's Illuminate.

Finally, an AI that thinks in full sentences instead of playing word games.

Such a classic paper from @Meta.

Sentence-level language modeling enables true multilingual AI without translation overhead.

Large Concept Models (LCMs) operate on sentence-level embeddings instead of tokens, enabling more human-like reasoning and better cross-lingual capabilities without language-specific training.
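
A minimal sketch of that concept-level loop, assuming hypothetical `sonar_encode` / `sonar_decode` helpers in place of the real SONAR encoder/decoder, and a plain PyTorch transformer as a stand-in for the paper's architecture:

```python
# Illustrative sketch only: `sonar_encode` / `sonar_decode` are hypothetical
# placeholders for the real SONAR text encoder/decoder, and ConceptLM is a
# generic transformer, not the paper's model.
import torch
import torch.nn as nn

EMB_DIM = 1024  # SONAR sentence embeddings are fixed-size vectors

def sonar_encode(sentences: list[str]) -> torch.Tensor:
    # Placeholder: map each sentence to one "concept" embedding.
    return torch.randn(len(sentences), EMB_DIM)

def sonar_decode(embedding: torch.Tensor) -> str:
    # Placeholder: map a concept embedding back to a sentence.
    return "<decoded sentence>"

class ConceptLM(nn.Module):
    """Autoregressive model over concept embeddings, not tokens."""
    def __init__(self, dim: int = EMB_DIM, layers: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, dim)  # predicts the next concept embedding

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # Causal mask so each position only attends to previous concepts.
        n = concepts.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        hidden = self.backbone(concepts, mask=mask)
        return self.head(hidden[:, -1])  # next-concept prediction

# Usage: the model never sees tokens, only one embedding per sentence.
sentences = ["The cat sat on the mat.", "It was warm in the sun."]
concepts = sonar_encode(sentences).unsqueeze(0)   # (1, n_sentences, dim)
next_concept = ConceptLM()(concepts)              # (1, dim)
print(sonar_decode(next_concept[0]))
```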

-----

https://arxiv.org/abs/2412.08821

🤔 Original Problem:

Current LLMs process text at the token level, lacking the explicit hierarchical reasoning and planning that humans naturally use when processing information or generating content.

-----

🛠️ Solution in this Paper:

→ Introduces Large Concept Models that operate on sentence-level embeddings, called "concepts", in SONAR, a multilingual and multimodal sentence embedding space

→ Uses diffusion-based models to generate sequences of sentence embeddings, with both One-Tower and Two-Tower architectures

→ Implements classifier-free guidance and noise-scheduling techniques to improve generation quality (see the sketch after this list)

→ Scales to 7B parameters while maintaining efficient processing of long contexts
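
Roughly, this is what classifier-free guidance looks like in a diffusion-style next-concept sampler. The `denoiser`, noise schedule, and update rule below are illustrative stand-ins, not the paper's exact recipe:

```python
# Hedged sketch of classifier-free guidance for next-concept generation.
# `denoiser(x_t, t, context)` is a hypothetical network predicting the clean
# concept embedding; the schedule and update step are simplified stand-ins.
import torch

def sample_next_concept(denoiser, context, dim=1024, steps=40, guidance=3.0):
    """Iteratively denoise a random vector into the next concept embedding."""
    x = torch.randn(1, dim)                  # start from pure noise
    null_ctx = torch.zeros_like(context)     # "unconditional" (dropped) context
    for step in reversed(range(steps)):
        t = torch.full((1,), step / steps)   # normalized noise level
        # Classifier-free guidance: blend conditional and unconditional
        # predictions, pushing the sample toward the given context.
        pred_cond = denoiser(x, t, context)
        pred_uncond = denoiser(x, t, null_ctx)
        pred = pred_uncond + guidance * (pred_cond - pred_uncond)
        # Simple interpolation toward the predicted clean embedding
        # (a stand-in for a proper DDPM/DDIM update rule).
        x = x + (pred - x) / (step + 1)
    return x

# Smoke test with a dummy denoiser (ignores t, mixes x and context).
dummy = lambda x, t, ctx: 0.5 * x + 0.5 * ctx.mean(dim=0, keepdim=True)
context = torch.randn(5, 1024)               # embeddings of preceding sentences
print(sample_next_concept(dummy, context).shape)  # torch.Size([1, 1024])
```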

-----

🧠 Key Insights:

→ Operating at the sentence level shortens sequences by 10-20x compared to token-based models (e.g., a 2,000-token document becomes roughly 100-200 concept embeddings)

→ Zero-shot cross-lingual performance achieved through language-agnostic concept space

→ More efficient handling of long contexts due to shorter sequence lengths

→ Explicit hierarchical planning improves output coherence

-----

📊 Results:

→ 7B LCM matches Llama-3 performance on summarization tasks

→ Outperforms token-based models in zero-shot cross-lingual tasks across 200 languages

→ Shows 30% better computational efficiency for long documents

→ Achieves 23.71 ROUGE-L on XSum compared to 20.35 for Llama-3
