"ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2502.00989
The paper addresses the problem that LLMs used for chart question answering often produce incorrect answers due to hallucination and weak grounding in the chart's visual elements, and that current methods struggle to provide fine-grained citations within charts.
This paper introduces ChartCitor, a multi-agent framework that provides precise bounding-box citations for answers to chart-based questions by identifying and highlighting the supporting visual evidence within chart images.
-----
📌 ChartCitor's multi-agent system effectively decomposes the complex chart question answering task. Each agent leverages LLMs for specialized sub-tasks like table extraction and evidence retrieval, enhancing overall accuracy.
📌 The two-stage retrieval using pre-filtering and re-ranking is a smart approach to improve citation precision. Pre-filtering reduces noise, allowing the re-ranking agent to focus on relevant table cells for better grounding.
📌 The visual self-reflection mechanism in the table extraction and cell localization agents checks consistency and correctness. This feedback loop makes the generated bounding-box citations in charts more reliable.
-----
Methods Explored in this Paper 🔧:
→ ChartCitor uses a multi-agent system orchestrated by LLMs.
→ It begins with a Chart2Table Extraction Agent, which uses GPT-4V to convert the chart image into a structured HTML table, applying visual self-reflection to verify the extraction (see the first sketch after this list).
→ An Answer Reformulation Agent then breaks down complex answers into individual factual statements to facilitate precise citation.
→ The Entity Captioning Agent enriches the table data. It uses GPT-4o to generate contextual descriptions for rows, columns, and cells to handle ambiguities in chart data.
→ Next, a two-stage retrieval process finds the supporting evidence. The LLM Pre-filtering Agent uses chain-of-thought and Plan-and-Solve prompting to assign relevance scores and filter out irrelevant table rows and columns (see the second sketch after this list).
→ The LLM Re-ranking Agent then uses RankGPT to re-rank the remaining cells and selects the most relevant ones as evidence. It also produces layer-of-thought explanations for its ranking decisions to improve transparency.
→ Finally, the Cell Localization Agent maps the cited table cells back to the chart image. It uses DETR, fine-tuned on ChartQA data, together with GPT-4V set-of-marks prompting to identify and highlight the corresponding visual elements with bounding boxes, again applying visual self-reflection to verify the localization (see the third sketch after this list).
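
To make the extraction step concrete, here is a minimal sketch of a visual self-reflection loop, assuming a hypothetical `vision_llm` callable that wraps a GPT-4V-style multimodal API; the prompts, loop structure, and stopping rule are illustrative, not the paper's exact implementation:

```python
# Minimal sketch of the Chart2Table self-reflection loop.
# `vision_llm` is a hypothetical wrapper around a GPT-4V-style API;
# prompts and stopping criteria are illustrative assumptions.

def extract_table(chart_image: bytes, vision_llm, max_rounds: int = 3) -> str:
    """Convert a chart image to an HTML table, refining via self-reflection."""
    table_html = vision_llm(
        image=chart_image,
        prompt="Extract the underlying data of this chart as an HTML table.",
    )
    for _ in range(max_rounds):
        critique = vision_llm(
            image=chart_image,
            prompt=(
                "Does this HTML table faithfully match the chart? "
                "Answer 'OK' or list the discrepancies.\n" + table_html
            ),
        )
        if critique.strip() == "OK":
            break  # extraction judged consistent with the chart
        table_html = vision_llm(
            image=chart_image,
            prompt="Revise the table to fix these issues:\n"
            + critique + "\n" + table_html,
        )
    return table_html
```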
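The two-stage retrieval can be sketched the same way. This assumes a generic text `llm` callable; the prompts, the 0.5 score threshold, and the JSON output format are assumptions rather than the paper's exact prompts:

```python
# Minimal sketch of the two-stage evidence retrieval: pre-filter rows and
# columns by relevance score, then RankGPT-style listwise re-ranking of
# the surviving cells. All prompts here are illustrative.

import json

def prefilter(llm, table_rows, columns, fact, threshold=0.5):
    """Stage 1: score rows/columns for relevance and drop the noise."""
    scores = json.loads(llm(
        "Plan the steps, then score each row and column 0-1 for relevance "
        "to this statement, as JSON {'rows': [...], 'cols': [...]}.\n"
        f"Statement: {fact}\nRows: {table_rows}\nColumns: {columns}"
    ))
    kept_rows = [r for r, s in zip(table_rows, scores["rows"]) if s >= threshold]
    kept_cols = [c for c, s in zip(columns, scores["cols"]) if s >= threshold]
    return kept_rows, kept_cols

def rerank(llm, cells, fact, top_k=3):
    """Stage 2: listwise re-ranking of the surviving cells."""
    order = json.loads(llm(
        "Rank these table cells by how strongly they support the statement; "
        "return a JSON list of indices, most relevant first.\n"
        f"Statement: {fact}\nCells: {list(enumerate(cells))}"
    ))
    return [cells[i] for i in order[:top_k]]
```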
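And a minimal sketch of the set-of-marks localization step, assuming hypothetical `detr_detect`, `overlay_marks`, and `vision_llm` helpers (the paper fine-tunes DETR on ChartQA; everything else here is illustrative):

```python
# Minimal sketch of set-of-marks cell localization: detect candidate chart
# elements, overlay numeric marks on them, then ask a vision LLM which
# marks correspond to the cited table cells.

import json

def localize_cells(chart_image, cited_cells, detr_detect, overlay_marks, vision_llm):
    """Map cited table cells to bounding boxes in the chart image."""
    boxes = detr_detect(chart_image)            # candidate element boxes
    marked = overlay_marks(chart_image, boxes)  # draw numeric IDs on the boxes
    picks = json.loads(vision_llm(
        image=marked,
        prompt=("Which numbered marks correspond to these table cells? "
                f"Return a JSON list of mark IDs.\nCells: {cited_cells}"),
    ))
    return [boxes[i] for i in picks]
```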
-----
Key Insights 💡:
→ Fine-grained visual citations within charts are crucial for enhancing user trust in LLM-generated answers for chart question answering.
→ A multi-agent approach, leveraging specialized LLM agents for different tasks like table extraction, answer processing, and evidence retrieval, can effectively address the challenges of chart attribution.
→ Combining pre-filtering and re-ranking techniques improves the accuracy of evidence retrieval by reducing noise and focusing on relevant information.
→ Visual self-reflection and layer-of-thought explanations enhance the reliability and transparency of the citation process.
-----
Results 📊:
→ ChartCitor achieves an Intersection over Union (IoU) score of 27.4% on chart attribution (IoU measures the overlap between predicted and ground-truth boxes; see the sketch below).
→ This outperforms baselines such as GPT-4V with direct bounding-box decoding at 12.5%, Claude-3.5 Sonnet with direct bounding-box decoding at 13.8%, and DETR + set-of-marks prompting at 18.6%.
→ User studies show that 41% of ChartCitor attributions are rated as "Completely Accurate" compared to 28% for direct GPT-4o prompting.
→ User feedback indicates ChartCitor significantly reduces the time needed to verify chart-based answers.
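
For reference, IoU is the standard overlap metric between a predicted citation box and the ground-truth box. A minimal sketch of the computation (standard formula, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```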