"What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models"

A podcast on this paper was generated with Google's Illuminate.

LLMs can be fooled by characters that humans can't see, but GPT-4 fights back.

This paper studies how imperceptible character attacks affect LLM code comprehension, revealing GPT-4's stronger defenses compared to GPT-3.5 models.

-----

https://arxiv.org/abs/2412.08098

🔍 Original Problem:

→ LLMs excel at code tasks but are vulnerable to adversarial attacks built from special characters: the perturbed code looks identical to the clean version to a human reader, yet it confuses the model.

-----

🛠️ Solution in this Paper:

→ The researchers developed four types of imperceptible attacks: reordering, invisible characters, deletions, and homoglyphs.

→ They tested these attacks on three ChatGPT models (two GPT-3.5 versions and GPT-4) using a dataset of 2,644 LeetCode questions.

→ Each attack used special Unicode characters so that the perturbed code looks unchanged to a human reader while its underlying character sequence, and hence its meaning to the model, is altered (a minimal sketch of these perturbations follows this list).

→ The study measured model confidence via token log probabilities, alongside response correctness.
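
To make the attack types concrete, here is a minimal Python sketch of how such perturbations can be injected into a code snippet. The specific code points and injection sites are illustrative assumptions, not the paper's exact attack generator.

```python
# Minimal sketch of the four imperceptible perturbation families (invisible
# characters, homoglyphs, reorderings, deletions) applied to a small Python
# snippet. The chosen code points and injection sites are illustrative
# assumptions, not the paper's exact attack generator.

CLEAN = "def add(a, b):\n    return a + b"

# Invisible character: zero-width space (U+200B) hidden inside an identifier.
invisible = CLEAN.replace("add", "ad\u200bd")

# Homoglyph: Latin 'a' swapped for the visually identical Cyrillic 'а' (U+0430).
homoglyph = CLEAN.replace("a", "\u0430", 1)

# Reordering: bidirectional controls (RLO U+202E ... PDF U+202C) make the
# rendered order differ from the logical character order the model reads.
reordered = CLEAN.replace("a + b", "\u202eb + a\u202c")

# Deletion: an extra character followed by backspace (U+0008) renders as nothing
# in many viewers but still reaches the model as two additional code points.
deleted = CLEAN.replace("return", "returX\u0008n")

for name, attacked in [("invisible", invisible), ("homoglyph", homoglyph),
                       ("reordering", reordered), ("deletion", deleted)]:
    hidden = [hex(ord(c)) for c in attacked
              if not (0x20 <= ord(c) < 0x7F) and c != "\n"]
    # Each variant can render like CLEAN, yet compares unequal and carries extra
    # or substituted code points that change the token sequence the model sees.
    print(f"{name:10s} equal_to_clean={attacked == CLEAN} hidden={hidden}")
```

Each variant can render exactly like the clean snippet while carrying extra or substituted code points, so the tokenizer, and therefore the model, receives a different input.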

-----

💡 Key Insights:

→ GPT-4 has built-in defenses against imperceptible attacks, unlike GPT-3.5

→ Deletion attacks had the strongest impact on model performance

→ Homoglyph attacks were least effective due to limited character substitution options

→ All models showed 100% accuracy on clean code

-----

📊 Results:

→ GPT-3.5 models showed a linear decline in performance as the level of perturbation increased

→ GPT-4 rejected almost all perturbed inputs (99% detection rate)

→ For GPT-3.5, confidence scores dropped from 95% to as low as 1.74% in the worst cases
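
For context on these confidence numbers, below is a minimal sketch of how a per-answer confidence score can be derived from token log probabilities with the OpenAI Python SDK. The model name, prompt, and mean-probability aggregation are assumptions for illustration, not the paper's exact evaluation harness.

```python
# Minimal sketch: querying a ChatGPT model about a perturbed snippet and turning
# the returned token log probabilities into a confidence-style score.
# Assumes the openai>=1.0 Python SDK and an OPENAI_API_KEY in the environment;
# model name, prompt, and the mean-probability aggregation are illustrative.
import math
from openai import OpenAI

client = OpenAI()

PERTURBED_CODE = "def \u0430dd(a, b):\n    return a + b"  # Cyrillic 'а' homoglyph

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; swap in the model under test
    messages=[{"role": "user",
               "content": f"What does this function return?\n\n{PERTURBED_CODE}"}],
    logprobs=True,   # ask the API for per-token log probabilities
    max_tokens=64,
)

tokens = response.choices[0].logprobs.content
mean_logprob = sum(t.logprob for t in tokens) / len(tokens)
confidence = math.exp(mean_logprob)  # geometric-mean token probability

print(f"answer: {response.choices[0].message.content!r}")
print(f"confidence proxy: {confidence:.2%}")
```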
