ChatGPT's obsession with "delve"
This paper identifies and analyzes words overused by LLMs in scientific writing.
https://arxiv.org/abs/2412.11385
🛠️ Methods in this Paper:
→ The study analyzed 5.2 billion tokens from 26.7 million PubMed abstracts.
→ A three-step process identified focal words whose usage increased significantly since 2020.
→ Researchers compared AI-generated and human-written abstracts to pinpoint overrepresented words.
→ Potential causes were investigated, including model architecture, training data, and RLHF.
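The word-selection and comparison steps above can be illustrated with a minimal sketch. It is not the paper's pipeline. The post does not spell out the three steps, so the tokenizer, the per-million normalization, the `eps` smoothing constant, the `min_ratio` thresholds, and all function names are hypothetical choices made here for illustration. The sketch first flags words whose frequency rose between an older and a newer corpus, then keeps only those that are also more frequent in AI-generated text than in human-written text.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical simplification: lowercase, keep alphabetic runs, no lemmatization.
    return re.findall(r"[a-z]+", text.lower())


def per_million(docs: list[str]) -> dict[str, float]:
    """Occurrences of each word per million tokens across a list of abstracts."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def frequency_ratio(numer: dict[str, float], denom: dict[str, float],
                    word: str, eps: float = 1.0) -> float:
    # eps (per million) is an assumed smoothing constant that avoids
    # division by zero when a word is absent from the denominator corpus.
    return (numer.get(word, 0.0) + eps) / (denom.get(word, 0.0) + eps)


def rising_words(old_docs: list[str], new_docs: list[str],
                 min_ratio: float = 2.0) -> dict[str, float]:
    """Words whose per-million frequency grew by at least min_ratio (assumed threshold)."""
    old, new = per_million(old_docs), per_million(new_docs)
    ratios = {w: frequency_ratio(new, old, w) for w in new}
    return {w: r for w, r in ratios.items() if r >= min_ratio}


def overrepresented(ai_docs: list[str], human_docs: list[str],
                    candidates, min_ratio: float = 2.0) -> list[str]:
    """Candidates that are also at least min_ratio times more frequent in AI text."""
    ai, human = per_million(ai_docs), per_million(human_docs)
    return sorted(w for w in candidates
                  if frequency_ratio(ai, human, w) >= min_ratio)
```

Feeding toy abstracts through `rising_words` and then `overrepresented` narrows a list of rising words down to those that AI text favors, such as "delve" and "intricate". Smoothing keeps the ratio finite for words that are new in one corpus. At the scale of 5.2 billion tokens, a real analysis would also need lemmatization and significance testing, which this sketch omits.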
-----
💡 Key Insights from this Paper:
→ 21 focal words, including "delve" and "intricate," show an unprecedented increase in usage in scientific abstracts
→ LLMs are becoming major drivers of global language change
→ RLHF may contribute to word overuse through human evaluator biases
→ Lack of transparency in LLM development hinders thorough investigation
-----
📊 Results:
→ A significant spike in focal-word usage coincides with the adoption of LLMs in scientific writing
→ No strong evidence that model architecture or training data causes the overuse
→ RLHF emerged as a possible contributor to word overrepresentation
→ The phenomenon persists in current LLM versions