"Automated Robustness Testing for LLM-based NLP Software"

A podcast on this paper was generated with Google's Illuminate.

Automated testing framework that thinks like an LLM to break an LLM.

This paper introduces AORTA, the first automated testing framework for LLM-based NLP software, along with a novel testing method called ABS that uses adaptive beam search to identify robustness vulnerabilities.

-----

https://arxiv.org/abs/2412.21016

Original Problem 🔍:

→ Current NLP software relies heavily on LLMs but lacks automated methods to test their robustness against unpredictable real-world inputs. Manual testing is inefficient and costly, while existing DNN-based testing methods aren't effective for LLMs.

-----

Solution in this Paper 🛠️:

→ The AORTA framework casts testing as a combinatorial optimization problem with four components: a goal function, perturbations, constraints, and a search method (see the interface sketch after this list).

→ ABS, the key innovation, uses beam search with an adaptive width to explore the expansive feature space of LLMs (sketched in the second code block below).

→ The search can backtrack to revisit promising earlier candidates instead of getting stuck in dead ends.

→ Perturbations use synonym replacement guided by the victim model's output confidence.
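
A minimal sketch of how AORTA's four-component formulation might look as code, assuming a classification-style victim model; all names here (`TestCase`, `GoalFn`, and so on) are illustrative stand-ins, not identifiers from the paper.

```python
# Hypothetical interfaces for AORTA's four-component decomposition.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TestCase:
    text: str     # candidate input (perturbed prompt and/or example)
    score: float  # goal-function value for this candidate

# Goal function: scores how close a candidate is to changing the LLM's
# output, e.g. the drop in confidence for the originally predicted label.
GoalFn = Callable[[str], float]

# Perturbation: proposes neighboring candidates, e.g. synonym swaps.
PerturbFn = Callable[[str], List[str]]

# Constraints: reject unnatural candidates (grammar, semantic similarity).
ConstraintFn = Callable[[str], bool]
```

Framing testing this way lets any search strategy, ABS included, be plugged in over the same goal, perturbation, and constraint pieces.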

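Building on those interfaces, the following is a hedged sketch of confidence-guided synonym replacement and an adaptive-width beam search with backtracking. The width-adaptation and backtracking heuristics are a plausible reading of the general idea, not the paper's exact ABS rules, and `get_synonyms` and `confidence` are assumed helper functions.

```python
import heapq

def synonym_perturb(text: str, get_synonyms, confidence) -> list[str]:
    """Confidence-guided synonym replacement (illustrative)."""
    words = text.split()
    base = confidence(text)  # victim model's confidence on the original
    variants = []
    for i, word in enumerate(words):
        for syn in get_synonyms(word):
            candidate = " ".join(words[:i] + [syn] + words[i + 1:])
            # Keep only swaps that lower the model's confidence.
            if confidence(candidate) < base:
                variants.append(candidate)
    # Most confidence-reducing swaps first (the "confidence-guided" part).
    return sorted(variants, key=confidence)

def adaptive_beam_search(seed: str, goal: GoalFn, perturb: PerturbFn,
                         ok: ConstraintFn, min_width: int = 2,
                         max_width: int = 10, max_steps: int = 50) -> TestCase:
    """Adaptive-width beam search with backtracking (illustrative)."""
    beam = [TestCase(seed, goal(seed))]
    archive = list(beam)  # everything seen so far, for backtracking
    width, best = min_width, beam[0].score

    for _ in range(max_steps):
        # Expand each beam member with constraint-satisfying neighbors.
        children = [TestCase(t, goal(t))
                    for tc in beam for t in perturb(tc.text) if ok(t)]
        if not children:
            # Backtrack: resume from the best candidates seen earlier.
            beam = heapq.nlargest(width, archive, key=lambda c: c.score)
            continue
        archive.extend(children)
        top = max(c.score for c in children)
        # Adapt the width: widen when progress stalls, narrow otherwise.
        width = (min(max_width, width + 1) if top <= best
                 else max(min_width, width - 1))
        best = max(best, top)
        beam = heapq.nlargest(width, children, key=lambda c: c.score)
        if best >= 1.0:  # e.g. the victim model's label has flipped
            break
    return max(archive, key=lambda c: c.score)
```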
-----

Key Insights 💡:

→ Adaptive beam width significantly improves test effectiveness compared to fixed-width approaches

→ Backtracking helps avoid local optima and enables more comprehensive testing

→ Combined prompt-and-example testing is more effective than testing each separately (see the sketch below)
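
As a rough illustration of that last insight, the search can perturb the prompt and an in-context example jointly; the helper below is hypothetical, prunes nothing, and ignores the cost of the cross product.

```python
def joint_perturb(prompt: str, example: str, perturb) -> list[tuple[str, str]]:
    """Perturb prompt and example together (illustrative)."""
    # Including the originals lets the search change one side at a time
    # as well as both at once, exploiting interactions between them.
    prompt_variants = perturb(prompt) + [prompt]
    example_variants = perturb(example) + [example]
    return [(p, e) for p in prompt_variants for e in example_variants
            if (p, e) != (prompt, example)]
```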

-----

Results 📊:

→ 86.138% average test success rate across different datasets

→ Reduces computational overhead by 3441.895 seconds per test case

→ Requires 218.762 times fewer model queries than the PWWS baseline

→ Generated test cases show higher naturalness and transferability than those from the baselines
