0:00
/
0:00
Transcript

"LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language"

Below podcast is generated with Google's Illuminate.

LLMs can be easily manipulated into promoting harmful biases under the guise of scientific inquiry.

LLMs exhibit increased bias and toxicity when prompted with malicious requests disguised as scientific language.

-----

Paper - https://arxiv.org/abs/2501.14073

Methods in this Paper 💡:

→ It uses malicious prompts disguised as scientific language.

→ Prompts misinterpret social science and psychology studies.

→ These prompts present stereotypes as scientifically beneficial.

→ LLMs are shown to respond by increasing biases and toxicity.

→ LLMs can be manipulated to fabricate scientific arguments.

→ These fabricated arguments falsely claim biases are beneficial.

→ This method jailbreaks even strong models like GPT.

→ Mentioning author names and venues enhances prompt persuasiveness.

Discussion about this video