
"Adaptive Testing for LLM-Based Applications: A Diversity-based Approach"

The podcast below was generated with Google's Illuminate.

Adaptive testing with diversity-based selection improves LLM testing efficiency and output variety.

This paper presents a diversity-based adaptive testing method for LLM applications, inspired by Adaptive Random Testing (ART). It prioritizes diverse test inputs, enhancing failure detection and output variety with reduced testing costs.

-----

Paper - https://arxiv.org/abs/2501.13480

Original Problem 😞:

→ Testing LLM-based applications is costly due to expensive LLM queries and manual output analysis.

→ Existing LLM testing frameworks lack optimized test suites.

→ Exhaustive testing is infeasible due to infinite input variability.

-----

Key Insights 🤔:

→ Diversity-based testing like ART can be effective for LLM prompt templates.

→ Adaptively selecting diverse inputs can improve failure discovery and output variety.

→ Different distance metrics suit different tasks and input distributions.
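To make the last insight concrete, here is a minimal sketch of two text distances that could serve as the pluggable metric in diversity-based selection. Neither is mandated by the paper; they simply illustrate how the right metric depends on the input style (short structured strings vs. free-form natural language).

```python
# Illustrative distance metrics for diversity-based test selection.
# These are common textbook metrics, not necessarily the ones the paper uses.

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: suits short, structured inputs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def jaccard_distance(a: str, b: str) -> float:
    """Word-set distance: suits longer, free-form natural-language inputs."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)
```

For example, `levenshtein("kitten", "sitting")` is 3, while `jaccard_distance` ignores character order entirely and compares only word overlap.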

-----

Solution in this Paper 💡:

→ An adaptive test selection method prioritizes new inputs farthest from previously selected ones using distance metrics.

→ This method adapts ART by selecting candidates from an existing pool, not generating new ones.

→ A selective reference set strategy uses only passing tests to calculate distances, promoting diverse failing inputs.
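The three steps above can be sketched as a greedy selection loop. This is a hypothetical reconstruction under stated assumptions, not the paper's implementation: each round picks the pooled candidate whose minimum distance to the reference set is largest (the farthest-candidate rule from ART), and only passing tests enter the reference set, so inputs resembling past failures stay "far" and keep being explored. All names are illustrative.

```python
# Sketch of adaptive, diversity-based test selection with a selective
# reference set (passing tests only). Assumed helper signatures:
#   run_test(input) -> bool (True = pass), dist(a, b) -> float.
from typing import Callable, List, Tuple

def adaptive_select(
    pool: List[str],
    budget: int,
    run_test: Callable[[str], bool],
    dist: Callable[[str, str], float],
) -> List[Tuple[str, bool]]:
    remaining = list(pool)
    reference: List[str] = []            # selective reference set: passes only
    results: List[Tuple[str, bool]] = []
    while remaining and len(results) < budget:
        if reference:
            # Farthest-candidate rule: maximize the minimum
            # distance to the already-selected passing tests.
            pick = max(remaining,
                       key=lambda c: min(dist(c, r) for r in reference))
        else:
            pick = remaining[0]          # seed with any candidate
        remaining.remove(pick)
        passed = run_test(pick)
        results.append((pick, passed))
        if passed:
            reference.append(pick)       # failures are excluded on purpose
    return results
```

Note the key adaptation of ART mentioned above: candidates come from an existing pool rather than being freshly generated, which is what keeps query costs bounded.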

-----

Results 📊:

→ Improves the Average Percentage of Failure Detection (APFD) by 7.24% on average, and by up to 34.3%.

→ Produces outputs with 9.5% more unique words.

→ Reduces test execution costs compared to other diversity-based methods like TSDm.
