Prompt Tuning for Audio Deepfake Detection: Computationally Efficient Test-time Domain Adaptation with Limited Target Dataset
Lightweight prompt tuning helps catch fake audio across different domains with minimal compute, adding only about 0.02% extra trainable parameters.
Teaching AI to spot fake voices with just a whisper of extra code.
Original Problem 🔍:
Audio deepfake detectors struggle with domain shift, scarce target-domain data, and the high computational cost of fine-tuning large pre-trained models.
Solution in this Paper 🛠️:
• Introduces prompt tuning for audio deepfake detection
• Inserts trainable prompt parameters into intermediate feature vectors (see the sketch after this list)
• Can be integrated with state-of-the-art transformer models
• Requires minimal additional trainable parameters (on the order of 0.001-0.03%)
• Compatible with other fine-tuning approaches
• Effective with small prompt lengths (5-10)
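A minimal PyTorch sketch of the core idea: trainable prompt vectors are prepended to the intermediate feature sequence of a frozen encoder layer. The class name, `prompt_len`, and `hidden_dim` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class PromptedEncoderLayer(nn.Module):
    """Wraps a frozen encoder layer and prepends trainable prompt vectors."""

    def __init__(self, frozen_layer: nn.Module, prompt_len: int = 10, hidden_dim: int = 1024):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():   # keep pre-trained weights frozen
            p.requires_grad = False
        # The only new trainable parameters: prompt_len x hidden_dim
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim) intermediate feature vectors
        batch = x.size(0)
        prompts = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompts, x], dim=1)   # insert prompts into the feature sequence
        out = self.layer(x)
        # drop the prompt positions so downstream layers see the original sequence length
        return out[:, self.prompt.size(0):, :]
```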
Key Insights from this Paper 💡:
• Prompt tuning improves or maintains equal error rates (EERs) across multiple target domains
• Performs well with as few as 10 target domain samples
• Outperforms full fine-tuning on very small target datasets
• Provides efficient domain adaptation with minimal computational resources
• Potentially applicable to other audio classification tasks facing similar challenges
Results 📊:
• Improves EERs across 7 target domains for two SOTA models (W2V and WSP)
• Effective with limited target data (10-1000 samples)
• Minimal additional parameters: 0.00161% (W2V) and 0.0251% (WSP); a rough sanity-check calculation follows this list
• Outperforms full fine-tuning on small datasets (e.g., 10 samples)
• Rapid performance saturation with short prompt lengths (5-10)
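For intuition on why the overhead is so small, a back-of-the-envelope calculation; the prompt length, hidden dimension, and backbone size below are illustrative assumptions, not values taken from the paper.

```python
prompt_len = 5                  # short prompts (5-10) are already effective
hidden_dim = 1024               # assumed transformer feature width
backbone_params = 300_000_000   # assumed order of magnitude of a large speech front-end

extra = prompt_len * hidden_dim  # 5,120 new trainable parameters
print(f"extra parameters: {extra} ({100 * extra / backbone_params:.4f}% of backbone)")
# -> extra parameters: 5120 (0.0017% of backbone)
```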
How the proposed prompt tuning method works 🛠️:
• Inserts trainable prompt parameters into the intermediate feature vectors of the model's front-end
• Fine-tunes only the prompt parameters and, optionally, the last linear layer on the target dataset (see the sketch after this list)
• Keeps the remaining pre-trained model parameters frozen
• Can be combined with other fine-tuning approaches
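A hedged sketch of the adaptation step: only parameters whose names contain "prompt" and, optionally, the final linear head receive gradients, while the pre-trained backbone stays frozen. Here `model`, `classifier`, and the "prompt" naming convention are placeholders for illustration, not the paper's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def adapt_to_target(model: nn.Module,
                    classifier: nn.Linear,
                    target_loader: DataLoader,
                    epochs: int = 20,
                    lr: float = 1e-4) -> None:
    # Freeze everything, then re-enable only the prompt parameters.
    for name, param in model.named_parameters():
        param.requires_grad = "prompt" in name
    trainable = [p for p in model.parameters() if p.requires_grad]
    trainable += list(classifier.parameters())   # optional: also tune the last linear layer

    optimizer = torch.optim.Adam(trainable, lr=lr)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for audio, label in target_loader:        # e.g. only 10-1000 target-domain samples
            logits = classifier(model(audio))
            loss = criterion(logits, label)
            optimizer.zero_grad()
            loss.backward()                        # gradients reach only prompts + head
            optimizer.step()
```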