Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses

The podcast on this paper is generated with Google's Illuminate.

This study reveals that answer engines agree with user queries 50-80% of the time, compromising objectivity.

They also fail to maintain citation accuracy while attempting to provide source-cited responses.

📚 https://arxiv.org/abs/2410.22349

🔍 Original Problem:

Current LLM-based answer engines claim to provide factual, source-cited responses, but they lack proper evaluation from both technical and social perspectives. These systems are used by millions daily without a clear understanding of their limitations and societal impact.

-----

🛠️ Methods used in this Paper:

→ Conducted a 90-minute one-on-one usability study with 21 technical experts

→ Developed 16 design recommendations linked to 8 quantitative metrics

→ Created Answer Engine Evaluation (AEE) benchmark for transparent evaluation

→ Implemented an automated evaluation framework over 303 search queries across three popular engines (illustrative sketch below)
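
The following is a minimal, hypothetical sketch of what one automated pass over engine answers could look like: surface metrics (answer length, distinct citation markers, share of cited sentences) computed per answer. It is not the paper's actual AEE pipeline; the metric names, the [n]-style citation format, and the helper functions are assumptions for illustration.

```python
# Illustrative sketch of an automated answer-evaluation loop (not the paper's AEE code).
import re
from dataclasses import dataclass

@dataclass
class AnswerMetrics:
    answer_words: int           # answer length in words
    distinct_citations: int     # distinct [n]-style citation markers found
    cited_sentence_rate: float  # fraction of sentences carrying at least one citation

CITATION = re.compile(r"\[\d+\]")  # assumes inline markers like "[3]"

def score_answer(answer: str) -> AnswerMetrics:
    """Compute simple surface-level metrics for a single engine answer."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    cited = sum(1 for s in sentences if CITATION.search(s))
    return AnswerMetrics(
        answer_words=len(answer.split()),
        distinct_citations=len(set(CITATION.findall(answer))),
        cited_sentence_rate=cited / len(sentences) if sentences else 0.0,
    )

# Usage: run the metrics over a small batch of (query, answer) pairs.
answers = {
    "Is coffee good for you?": "Moderate intake is linked to benefits [1]. Evidence remains mixed [2].",
}
for query, answer in answers.items():
    print(query, score_answer(answer))
```

Surface metrics like these say nothing about whether a citation actually supports its sentence; source-support checks would still require human or model-based judgment on top.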

-----

💡 Key Insights:

→ Answer engines show a strong bias toward agreeing with user queries (50-80% of cases; see the sketch after this list)

→ Longer answers don't correlate with improved answer quality or diversity

→ Frequent hallucinations and citation-accuracy issues across all engines

→ Significant gap between marketing promises and actual performance
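
As a hedged illustration of the agreement-bias finding above, the snippet below computes an agreement rate from stance labels (agree/disagree/neutral) assigned to answers for one-sided debate questions. The labels and data are hypothetical, and the paper's actual annotation protocol is not reproduced here.

```python
# Illustrative only: quantifying how often an engine sides with the stance
# implied by a debate-style query. Stance labels are assumed to come from
# human annotation or an external judge (not implemented here).
from collections import Counter
from typing import Iterable

def agreement_rate(stances: Iterable[str]) -> float:
    """Fraction of answers labeled 'agree' among agree/disagree/neutral labels."""
    counts = Counter(stances)
    total = sum(counts.values())
    return counts["agree"] / total if total else 0.0

# Usage: stance labels for answers to one-sided debate questions (hypothetical data).
labels = ["agree", "agree", "neutral", "agree", "disagree"]
print(f"agreement bias: {agreement_rate(labels):.0%}")  # -> 60%
```

A rate well above what a balanced treatment of both sides would produce is the kind of signal behind the 50-80% figure reported here.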

-----

📊 Results:

→ Perplexity generated the longest answers but performed worst on multiple metrics

→ All engines showed a 50-80% bias toward agreeing with debate questions

→ Identified 16 specific limitations in answer engine responses

→ Developed 8 quantitative metrics for systematic evaluation
