LLMs can be easily manipulated into promoting harmful biases under the guise of scientific inquiry.
When malicious requests are disguised as scientific language, LLMs respond with increased bias and toxicity.
-----
Paper - https://arxiv.org/abs/2501.14073
Methods in this Paper 💡:
→ The attack disguises malicious prompts as scientific language.
→ These prompts deliberately misinterpret social science and psychology studies to present stereotypes as scientifically beneficial.
→ LLMs respond to such prompts with increased bias and toxicity (a toxicity-scoring sketch follows this list).
→ Models can also be manipulated into fabricating scientific arguments that falsely claim these biases are beneficial.
→ The method jailbreaks even strong models like GPT.
→ Adding author names and publication venues makes the prompts more persuasive.
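As an illustration of how such a bias/toxicity shift could be quantified, here is a minimal sketch assuming the open-source Detoxify classifier and hypothetical response lists; the paper's actual evaluation pipeline is not described in this post and may differ.

```python
# Minimal sketch (not the paper's code): score model responses for toxicity
# with the open-source `detoxify` package and compare plain vs. disguised
# prompt conditions. The response lists below are hypothetical placeholders.
from detoxify import Detoxify

detector = Detoxify("original")  # pretrained multi-label toxicity classifier

def mean_toxicity(responses: list[str]) -> float:
    """Average Detoxify toxicity score over a batch of model responses."""
    scores = detector.predict(responses)["toxicity"]
    return sum(scores) / len(scores)

# Hypothetical data: responses collected from the model under test.
plain_responses = ["<response to a plainly worded request>"]
disguised_responses = ["<response to the same request framed as science>"]

print(f"plain:     {mean_toxicity(plain_responses):.3f}")
print(f"disguised: {mean_toxicity(disguised_responses):.3f}")
```

A higher mean score in the disguised condition would correspond to the paper's claim that scientific framing increases toxic output.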