DirtyFlipping, proposed in this paper, shows how vulnerable our voice AI really is
Audio neural networks can be completely hijacked by combining label flips with stealthy sound triggers
DirtyFlipping is a novel backdoor attack that poisons audio deep neural networks by flipping labels and injecting audio triggers such as clapping sounds. The attack achieves a 100% attack success rate while maintaining high model accuracy on clean data, making it particularly stealthy and effective against speech recognition systems.
-----
https://arxiv.org/abs/2410.10254
🎯 Original Problem:
Audio deep neural networks trained on public datasets are vulnerable to data poisoning attacks. Existing methods lack precision and stealth, making them easily detectable. A more sophisticated approach is needed to demonstrate real security risks.
-----
🔧 Solution in this Paper:
→ DirtyFlipping uses a two-step process combining audio triggers with label manipulation.
→ The attack injects carefully crafted audio triggers (such as clapping sounds) into clean samples while flipping their labels (sketched in the code after this list).
→ It employs a "dirty label-on-label" mechanism that maintains high performance on benign data while ensuring backdoor activation.
→ The method works across multiple model architectures including CNNs, RNNs, and transformer models.
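A minimal Python sketch of this poisoning step, assuming float waveforms normalized to [-1, 1]; the trigger waveform, mixing gain, and default poison rate here are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def poison_sample(waveform, trigger, target_label, gain=0.1):
    """Overlay an audio trigger (e.g. a clap) on a clean waveform and
    flip its label to the attacker-chosen target class.
    `trigger`, `gain`, and `target_label` are illustrative assumptions."""
    # Align the trigger with the clean sample (pad or crop as needed).
    t = np.zeros_like(waveform)
    n = min(len(waveform), len(trigger))
    t[:n] = trigger[:n]

    # Additively mix the trigger at low gain so it stays unobtrusive.
    poisoned = np.clip(waveform + gain * t, -1.0, 1.0)

    # "Dirty label" step: relabel the poisoned sample as the target class.
    return poisoned, target_label

def build_poisoned_set(x_train, y_train, trigger, target_label,
                       poison_rate=0.01, seed=0):
    """Poison a small fraction (e.g. 1%) of the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_train), size=int(poison_rate * len(x_train)),
                     replace=False)
    x_poisoned, y_poisoned = list(x_train), list(y_train)
    for i in idx:
        x_poisoned[i], y_poisoned[i] = poison_sample(
            x_train[i], trigger, target_label)
    return x_poisoned, y_poisoned
```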
-----
💡 Key Insights:
→ Label manipulation combined with audio triggers creates more effective backdoors than modifying input data alone
→ The attack remains undetectable by current defense mechanisms such as activation defense and spectral signatures (the kind of check sketched after this list)
→ The method requires minimal data poisoning (only 1% of training data) to achieve successful attacks
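For context, a rough sketch of the kind of activation-clustering check such defenses perform; the clustering setup and thresholds below are assumptions for illustration, not the paper's evaluation protocol:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def activation_clustering_check(activations, silhouette_threshold=0.2):
    """Cluster penultimate-layer activations of samples that share one
    predicted class into two groups; a well-separated small cluster is
    treated as a sign of poisoning. Thresholds are illustrative.

    activations: (n_samples, n_features) array for a single class.
    Returns True if the class looks suspicious (possibly backdoored)."""
    if len(activations) < 10:
        return False  # too few samples to cluster meaningfully

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(activations)
    score = silhouette_score(activations, labels)

    # Clearly separated clusters with an unusually small minority cluster
    # are the classic signature of dirty-label poisoning.
    minority_frac = min(np.bincount(labels)) / len(labels)
    return score > silhouette_threshold and minority_frac < 0.35
```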
-----
📊 Results:
→ CNN models: 97.31% benign accuracy, 100% attack success rate
→ Pre-trained transformers (Wav2Vec2-BERT): 95.63% benign accuracy, 100% attack success rate
→ Bypassed all current backdoor detection methods (metric computation sketched below)
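Roughly how these two metrics are computed, assuming a hypothetical `model.predict()` that returns a class index for a single waveform and the same low-gain trigger mixing as in the poisoning sketch above:

```python
import numpy as np

def _stamp_trigger(waveform, trigger, gain=0.1):
    """Overlay the trigger at low gain (same mixing as the poisoning sketch)."""
    t = np.zeros_like(waveform)
    n = min(len(waveform), len(trigger))
    t[:n] = trigger[:n]
    return np.clip(waveform + gain * t, -1.0, 1.0)

def benign_accuracy(model, x_clean, y_true):
    """Accuracy on clean, untriggered test audio."""
    preds = np.array([model.predict(x) for x in x_clean])
    return float(np.mean(preds == np.asarray(y_true)))

def attack_success_rate(model, x_clean, trigger, target_label, gain=0.1):
    """Fraction of trigger-stamped test samples classified as the target class."""
    hits = sum(int(model.predict(_stamp_trigger(x, trigger, gain)) == target_label)
               for x in x_clean)
    return hits / len(x_clean)
```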