Making neural networks true language recognizers instead of proxy task solvers
https://arxiv.org/abs/2411.07107
🎯 Original Problem:
Current methods test neural networks' computational power using proxy tasks such as language modeling, creating a mismatch with formal language theory, which deals with recognizers (machines that classify strings as belonging to a language or not).
-----
🛠️ Solution in this Paper:
→ Introduces the FLaRe (Formal Language Recognition) benchmark for training neural networks directly as binary classifiers of formal languages
→ Develops an efficient algorithm for length-controlled sampling from regular languages using counting semirings (a minimal sketch follows below)
→ Implements balanced positive and negative sampling with two types of negative examples: uniform random strings and perturbed positive examples (sketched below)
→ Uses binary cross-entropy as the primary objective, with optional auxiliary tasks such as language modeling (sketched below)
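To make the length-controlled sampling idea concrete, here is a minimal Python sketch of the underlying counting trick for a plain DFA: count, for each state and remaining length, how many accepted strings remain, then pick each next symbol in proportion to those counts. This is an illustrative reconstruction under assumed data structures (a `delta` transition dict and an `accept` set), not the paper's exact counting-semiring implementation.

```python
import random
from collections import defaultdict

def count_table(delta, accept, alphabet, n_max):
    """count[n][q] = number of accepted strings of length n when starting in state q."""
    states = {q for (q, _) in delta} | set(delta.values()) | set(accept)
    count = [defaultdict(int) for _ in range(n_max + 1)]
    for q in states:
        count[0][q] = 1 if q in accept else 0
    for n in range(1, n_max + 1):
        for q in states:
            count[n][q] = sum(count[n - 1][delta[(q, a)]]
                              for a in alphabet if (q, a) in delta)
    return count

def sample_string(delta, start, alphabet, count, length):
    """Sample uniformly among strings of exactly `length` symbols accepted from `start`."""
    assert count[length][start] > 0, "no accepted string of this length"
    q, out = start, []
    for remaining in range(length, 0, -1):
        # Weight each next symbol by how many accepting completions it leaves open.
        options = [(a, count[remaining - 1][delta[(q, a)]])
                   for a in alphabet if (q, a) in delta]
        symbols, weights = zip(*options)
        a = random.choices(symbols, weights=weights, k=1)[0]
        out.append(a)
        q = delta[(q, a)]
    return "".join(out)

# Toy example: strings over {a, b} containing an even number of 'a's.
alphabet = ["a", "b"]
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
table = count_table(delta, accept={"even"}, alphabet=alphabet, n_max=20)
print(sample_string(delta, "even", alphabet, table, length=12))
```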
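A hedged sketch of how the two kinds of negative examples could be generated, assuming a membership oracle `in_language` (e.g., running the automaton from the previous sketch) to reject candidates that accidentally fall in the language; the paper's exact perturbation operations may differ.

```python
import random

def random_string(alphabet, length):
    return "".join(random.choice(alphabet) for _ in range(length))

def perturb(s, alphabet):
    """Apply one random edit (substitution, insertion, or deletion) to s."""
    i = random.randrange(len(s) + 1)
    op = random.choice(["sub", "ins", "del"]) if s else "ins"
    if op == "ins":
        return s[:i] + random.choice(alphabet) + s[i:]
    i = min(i, len(s) - 1)              # keep the index inside the string
    if op == "sub":
        return s[:i] + random.choice(alphabet) + s[i + 1:]
    return s[:i] + s[i + 1:]            # deletion

def make_negatives(positives, alphabet, in_language, max_tries=100):
    """One uniform-random and one perturbation-based negative per positive example."""
    negatives = []
    for pos in positives:
        candidates = (lambda: random_string(alphabet, len(pos)),
                      lambda: perturb(pos, alphabet))
        for make_candidate in candidates:
            for _ in range(max_tries):
                cand = make_candidate()
                if not in_language(cand):       # reject accidental members
                    negatives.append(cand)
                    break
    return negatives
```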
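And a small PyTorch-style sketch of the training objective: binary cross-entropy on the recognizer's accept/reject logit, with an optional language-modeling auxiliary term. The `aux_weight` knob and the exact way the two losses are combined here are assumptions for illustration, not the paper's reported setup.

```python
import torch
import torch.nn.functional as F

def recognizer_loss(accept_logits, labels, lm_logits=None, token_ids=None, aux_weight=0.0):
    """Binary cross-entropy on the accept/reject decision, plus an optional
    next-token language-modeling term weighted by aux_weight (assumed name)."""
    loss = F.binary_cross_entropy_with_logits(accept_logits, labels.float())
    if aux_weight > 0.0 and lm_logits is not None:
        # lm_logits: (batch, seq_len, vocab); token_ids: (batch, seq_len)
        lm_loss = F.cross_entropy(
            lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
            token_ids[:, 1:].reshape(-1),
        )
        loss = loss + aux_weight * lm_loss
    return loss

# Example shapes: accept_logits (batch,), labels (batch,) with 1 = in-language.
loss = recognizer_loss(torch.randn(8), torch.randint(0, 2, (8,)))
```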
-----
💡 Key Insights:
→ RNNs and LSTMs often outperform transformers on formal language recognition tasks
→ Auxiliary objectives like language modeling help specific architectures but show no consistent improvement
→ The proposed sampling algorithm reduces the time complexity of length-controlled sampling by a factor of O(n_max^2)
→ Transformers show a preference for low-sensitivity Boolean functions
-----
📊 Results:
→ Achieved scalable sampling for strings up to length n_max = 500
→ RNNs and LSTMs consistently outperformed the transformer architecture across multiple formal languages
→ The binary cross-entropy objective proved highly effective without requiring complex auxiliary tasks