EXLM tackles context corruption in Masked Language Models (MLMs) by expanding each masked token into multiple hidden states and modeling the dependencies between them.
This enhances context representation and improves semantic modeling.
-----
Paper - https://arxiv.org/abs/2501.13397
Original Problem 🤔:
→ Masked Language Models (MLMs) struggle with the corrupted semantics and unreal [MASK] tokens introduced by masking.
→ Corrupted semantics hurts performance more than unreal tokens do.
-----
Solution in this Paper 💡:
→ EXLM expands each [MASK] token into multiple hidden states, enlarging the semantic space so the model can capture richer semantic information.
→ A transition matrix models dependencies between these expanded states, further improving semantic representation.
→ A state-alignment algorithm based on dynamic programming aligns target tokens with the expanded states, providing efficient supervision during training (a sketch of the expansion idea follows this list).
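To make the expansion step concrete, here is a minimal PyTorch-style sketch. Everything in it is illustrative rather than the paper's exact architecture: the names `MaskExpander`, `expand`, `transitions`, and the choice of k = 4 expanded states per mask are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class MaskExpander(nn.Module):
    """Expand each [MASK] hidden state into k expanded states (sketch)."""
    def __init__(self, hidden_size: int, k: int = 4):
        super().__init__()
        self.k = k
        # One linear map produces k distinct states per mask,
        # enlarging the semantic space available for prediction.
        self.expand = nn.Linear(hidden_size, k * hidden_size)
        # Pairwise transition scores between expanded states,
        # modeling the dependencies among them.
        self.transitions = nn.Parameter(torch.zeros(k, k))

    def forward(self, mask_hidden: torch.Tensor):
        # mask_hidden: (num_masks, hidden_size)
        n, h = mask_hidden.shape
        # (num_masks, hidden_size) -> (num_masks, k, hidden_size)
        states = self.expand(mask_hidden).view(n, self.k, h)
        return states, self.transitions
```

In a full model, the k expanded states would feed the MLM prediction head and the transition scores would enter a structured training objective; the paper's actual parameterization may differ from this sketch.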
-----
Key Insights from this Paper 😲:
→ Corrupted semantics is a major challenge for MLM performance, exceeding the impact of unreal tokens.
→ Expanding the semantic space and explicitly modeling dependencies between expanded states improves MLM performance.
→ Dynamic programming offers an efficient way to align target tokens with multiple expanded states (a DP sketch follows this list).
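The alignment step can be illustrated with a generic monotonic-alignment DP: match L target tokens to K expanded states (L ≤ K) in left-to-right order, maximizing total log-likelihood. This is a sketch under that assumption; the function name and the exact recurrence are illustrative, not taken from the paper.

```python
import numpy as np

def align_targets_to_states(log_probs: np.ndarray) -> list[int]:
    """log_probs[j, i] = log p(target token i | expanded state j).

    Returns, for each of the L target tokens, the index of the
    expanded state it is aligned to (strictly increasing).
    """
    K, L = log_probs.shape
    NEG = -1e18
    # dp[i, j]: best score aligning the first i tokens to the first j states.
    dp = np.full((L + 1, K + 1), NEG)
    dp[0, :] = 0.0  # zero tokens aligned costs nothing
    # choice[i, j]: True if state j-1 emits token i-1 on the best path.
    choice = np.zeros((L + 1, K + 1), dtype=bool)
    for i in range(1, L + 1):
        for j in range(i, K + 1):  # need at least i states for i tokens
            skip = dp[i, j - 1]                               # state j-1 unused
            take = dp[i - 1, j - 1] + log_probs[j - 1, i - 1]  # state emits token
            dp[i, j] = max(skip, take)
            choice[i, j] = take >= skip
    # Backtrack to recover which state each token was aligned to.
    align, i, j = [], L, K
    while i > 0:
        if choice[i, j]:
            align.append(j - 1)
            i -= 1
        j -= 1
    return align[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lp = np.log(rng.random((6, 4)))  # K=6 expanded states, L=4 target tokens
    print(align_targets_to_states(lp))  # monotonically increasing state indices
```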
-----
Results 😎:
→ Achieves state-of-the-art performance on SQuAD 2.0 and on 7 of 8 GLUE tasks.
→ Outperforms baseline models on 5 of 7 molecular property prediction tasks in MoleculeNet.
→ Significantly outperforms SMILES-BERT, a comparable MLM baseline, on molecular tasks.