EXLM tackles context corruption in Masked Language Models (MLMs) by expanding each masked token into multiple hidden states and modeling the dependencies between them.
This enhances context representation and improves semantic modeling.
-----
Paper - https://arxiv.org/abs/2501.13397
Original Problem 🤔:
→ Masked Language Models (MLMs) struggle with the corrupted semantics and unreal [MASK] tokens introduced by masking.
→ Corrupted semantics hurts performance more than unreal tokens do.
-----
Solution in this Paper 💡:
→ EXLM expands each [MASK] token into multiple hidden states, enlarging the semantic space so the model can capture richer semantic information.
→ A transition matrix models dependencies between these expanded states, further improving semantic representation.
→ A state-alignment algorithm based on dynamic programming aligns target tokens with the expanded states, providing efficient supervision during training (a sketch of the expansion idea follows this list).
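To make the expansion step concrete, here is a minimal PyTorch-style sketch. Everything in it is illustrative rather than the paper's exact architecture: the names `MaskExpander`, `expand`, `transitions`, and the choice of k = 4 expanded states per mask are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class MaskExpander(nn.Module):
    """Expand each [MASK] hidden state into k expanded states (sketch)."""
    def __init__(self, hidden_size: int, k: int = 4):
        super().__init__()
        self.k = k
        # One linear map produces k distinct states per mask,
        # enlarging the semantic space available for prediction.
        self.expand = nn.Linear(hidden_size, k * hidden_size)
        # Pairwise transition scores between expanded states,
        # modeling the dependencies among them.
        self.transitions = nn.Parameter(torch.zeros(k, k))

    def forward(self, mask_hidden: torch.Tensor):
        # mask_hidden: (num_masks, hidden_size)
        n, h = mask_hidden.shape
        # (num_masks, hidden_size) -> (num_masks, k, hidden_size)
        states = self.expand(mask_hidden).view(n, self.k, h)
        return states, self.transitions
```

In a full model, the k expanded states would feed the MLM prediction head and the transition scores would enter a structured training objective; the paper's actual parameterization may differ from this sketch.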
-----
Key Insights from this Paper 😲:
→ Corrupted semantics is a major challenge for MLM performance, exceeding the impact of unreal tokens.
→ Expanding the semantic space and explicitly modeling dependencies between expanded states improves MLM performance.
→ Dynamic programming offers an efficient way to align target tokens with multiple expanded states (a DP sketch follows this list).
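The alignment step can be illustrated with a generic monotonic-alignment DP: match L target tokens to K expanded states (L ≤ K) in left-to-right order, maximizing total log-likelihood. This is a sketch under that assumption; the function name and the exact recurrence are illustrative, not taken from the paper.

```python
import numpy as np

def align_targets_to_states(log_probs: np.ndarray) -> list[int]:
    """log_probs[j, i] = log p(target token i | expanded state j).

    Returns, for each of the L target tokens, the index of the
    expanded state it is aligned to (strictly increasing).
    """
    K, L = log_probs.shape
    NEG = -1e18
    # dp[i, j]: best score aligning the first i tokens to the first j states.
    dp = np.full((L + 1, K + 1), NEG)
    dp[0, :] = 0.0  # zero tokens aligned costs nothing
    # choice[i, j]: True if state j-1 emits token i-1 on the best path.
    choice = np.zeros((L + 1, K + 1), dtype=bool)
    for i in range(1, L + 1):
        for j in range(i, K + 1):  # need at least i states for i tokens
            skip = dp[i, j - 1]                               # state j-1 unused
            take = dp[i - 1, j - 1] + log_probs[j - 1, i - 1]  # state emits token
            dp[i, j] = max(skip, take)
            choice[i, j] = take >= skip
    # Backtrack to recover which state each token was aligned to.
    align, i, j = [], L, K
    while i > 0:
        if choice[i, j]:
            align.append(j - 1)
            i -= 1
        j -= 1
    return align[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lp = np.log(rng.random((6, 4)))  # K=6 expanded states, L=4 target tokens
    print(align_targets_to_states(lp))  # monotonically increasing state indices
```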
-----
Results 😎:
→ Achieves state-of-the-art performance on SQuAD 2.0 and on 7 of 8 GLUE tasks.
→ Outperforms baseline models on 5 of 7 molecular property prediction tasks in MoleculeNet.
→ Significantly outperforms SMILES-BERT, a comparable MLM baseline, on molecular tasks.