AI doesn't just learn words - it builds entire geometric worlds.
Turns out, AI thinks in shapes - from tiny crystals to whole galaxies.
An LLM's internal concepts form geometric patterns similar to the human brain's organization.
And LLMs organize this knowledge in a three-tiered architecture spanning atomic, brain, and galaxy scales.
📚 https://arxiv.org/abs/2410.19750
🎯 Original Problem:
Sparse autoencoders (SAEs) have discovered interpretable features in LLMs, but we don't understand how these features are organized in high-dimensional space or what patterns they form.
-----
🔧 Solution in this Paper:
→ Analyzed SAE feature space at three distinct scales:
- Atomic scale: Studied parallelogram/trapezoid structures (like man:woman::king:queen)
- Brain scale: Identified functional "lobes" where similar features cluster
- Galaxy scale: Examined large-scale point cloud structure
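The atomic-scale "crystal" idea can be made concrete with vector arithmetic: an analogy a:b::c:d forms a parallelogram when b - a ≈ d - c. A minimal sketch with toy 2-D vectors (illustrative values, not the paper's actual SAE features):

```python
import numpy as np

# Toy 2-D "feature vectors" (hypothetical values): one axis loosely
# encodes gender, the other royalty.
vecs = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
}

def parallelogram_error(a, b, c, d, vecs):
    """Distance of d from the point completing the parallelogram a:b::c:d,
    i.e. ||(vec(b) - vec(a) + vec(c)) - vec(d)||. Zero = perfect crystal."""
    return np.linalg.norm(vecs[b] - vecs[a] + vecs[c] - vecs[d])

print(parallelogram_error("man", "woman", "king", "queen", vecs))  # 0.0
```

In real SAE features this error is rarely zero; the paper's point is that it shrinks sharply once distractor dimensions are projected out.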
→ Used Linear Discriminant Analysis (LDA) to remove "distractor features"
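The LDA step can be sketched as follows on synthetic data (a stand-in, not the paper's pipeline): a few informative dimensions carry cluster structure, many high-variance distractor dimensions drown it out, and LDA recovers the subspace that separates the labeled clusters.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic stand-in for SAE features: 2 informative dims + 8 distractors.
n, d_noise = 200, 8
labels = rng.integers(0, 4, size=n)                    # 4 hypothetical clusters
signal = np.eye(4)[labels] @ rng.normal(size=(4, 2))   # cluster-dependent signal
noise = rng.normal(size=(n, d_noise)) * 3.0            # large irrelevant variance
X = np.hstack([signal, noise])

# LDA finds the projection that best separates the labeled clusters,
# suppressing the distractor dimensions in the process.
lda = LinearDiscriminantAnalysis(n_components=2)
X_clean = lda.fit_transform(X, labels)
print(X_clean.shape)  # (200, 2)
```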
→ Applied multiple statistical tests and metrics:
- Phi coefficient for functional clustering
- k-NN entropy estimation
- Mutual information metrics
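The phi coefficient from the list above is just Pearson correlation applied to binary "does this feature fire in this document?" vectors. A self-contained sketch (toy firing patterns, not real SAE activations):

```python
import numpy as np

def phi_coefficient(a, b):
    """Phi coefficient between two binary vectors: do the features co-fire?
    Equivalent to Pearson correlation for 0/1 data; ranges from -1 to 1."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    n11 = np.sum(a & b)      # both fire
    n10 = np.sum(a & ~b)     # only a fires
    n01 = np.sum(~a & b)     # only b fires
    n00 = np.sum(~a & ~b)    # neither fires
    denom = np.sqrt(float((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Two features that mostly fire in the same documents:
f1 = [1, 1, 0, 0, 1, 0, 1, 0]
f2 = [1, 1, 0, 0, 1, 0, 0, 0]
print(round(phi_coefficient(f1, f2), 3))  # 0.775
```

High phi between feature pairs that are also geometric neighbors is what supports the "functional lobes" claim.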
-----
💡 Key Insights:
→ SAE features form crystal-like structures at small scales, but these are hidden by irrelevant dimensions
→ Features that fire together in documents are geometrically co-located, forming brain-like functional lobes
→ Middle layers show steeper power-law slopes (-0.47) compared to early/late layers (-0.24/-0.25)
→ Found distinct lobes for code/math, short messages/dialogue, and scientific papers
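One way to obtain a power-law slope like the ones quoted above is to fit log(eigenvalue) against log(rank) for the covariance spectrum of the feature point cloud. A sketch on a synthetic anisotropic cloud (my own construction, chosen so the true slope is about -1):

```python
import numpy as np

def spectrum_slope(X):
    """Slope of log(eigenvalue) vs log(rank) for the covariance spectrum of a
    point cloud X. A steeper (more negative) slope means variance is
    concentrated in fewer directions."""
    eigvals = np.linalg.eigvalsh(np.cov(X.T))[::-1]   # sort descending
    ranks = np.arange(1, len(eigvals) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), 1)
    return slope

rng = np.random.default_rng(0)
# Column k has variance ~1/k, so eigenvalues follow rank^(-1): slope near -1.
X = rng.normal(size=(5000, 20)) * (np.arange(1, 21) ** -0.5)
print(round(spectrum_slope(X), 2))
```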
-----
📊 Results:
→ LDA significantly improved parallelogram/trapezoid structure quality
→ Phi coefficient showed best correspondence between functional and geometric clustering
→ Middle layers demonstrated reduced clustering entropy, suggesting more concentrated feature representation
→ Statistical significance: 954σ for mutual information, 74σ for logistic regression tests
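The "clustering entropy" result rests on estimating the differential entropy of a point cloud directly from samples. A minimal sketch of a nearest-neighbor (Kozachenko-Leonenko) estimator, tested on synthetic Gaussian clouds rather than real SAE features:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def knn_entropy(X):
    """Kozachenko-Leonenko nearest-neighbor estimate of differential entropy
    in nats. A lower value means a more concentrated point cloud."""
    n, d = X.shape
    tree = cKDTree(X)
    # k=2 because the nearest neighbor of each point is the point itself.
    r = tree.query(X, k=2)[0][:, 1]
    vol_unit_ball = np.pi ** (d / 2) / gamma(d / 2 + 1)
    return (d * np.mean(np.log(r)) + np.log(vol_unit_ball)
            + digamma(n) - digamma(1))

rng = np.random.default_rng(0)
spread = rng.normal(size=(2000, 2))          # broad Gaussian cloud
tight = rng.normal(size=(2000, 2)) * 0.1     # 10x more concentrated
print(knn_entropy(tight) < knn_entropy(spread))  # True
```

Under this kind of estimator, the middle layers' lower entropy means their feature clouds occupy a smaller effective volume than those of early or late layers.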