This paper introduces Domaino1s, an LLM framework that enhances reasoning and explainability in high-stakes domains like finance and law through supervised fine-tuning and tree search.
-----
Paper - https://arxiv.org/abs/2501.14431
Solution in this Paper 💡:
→ Domaino1s uses supervised fine-tuning on newly created CoT-stock-2k and CoT-legal-2k datasets.
→ These datasets are built using GPT-4o to decompose reasoning into structured steps.
→ Domaino1s employs Selective Tree Exploration during inference.
→ Selective Tree Exploration uses perplexity to guide the search for optimal reasoning paths at each step.
→ This method balances reasoning quality and computational cost by selectively expanding reasoning paths when needed.
→ A new metric, PROOF-Score, is introduced to evaluate explainability considering reasoning, safety, and factual accuracy.
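The perplexity-guided expansion idea behind Selective Tree Exploration can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `sample_step` stands in for an LLM decoding call returning a candidate reasoning step with its per-token log-probabilities, and the threshold value is illustrative.

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token log-probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

def selective_step(sample_step, threshold=2.0, beam_size=3):
    """Choose one reasoning step via selective expansion.

    sample_step() -> (text, per-token logprobs) is a hypothetical
    stand-in for a model call. The tree is expanded (more candidates
    sampled) only when the first sample's perplexity exceeds the
    threshold, trading extra compute for quality only when uncertain.
    """
    text, lps = sample_step()
    ppl = perplexity(lps)
    if ppl <= threshold:
        # Model is confident: keep the single sample, no expansion.
        return text, ppl
    # Model is uncertain: expand the beam and keep the best path.
    candidates = [(text, ppl)]
    for _ in range(beam_size - 1):
        t, l = sample_step()
        candidates.append((t, perplexity(l)))
    return min(candidates, key=lambda c: c[1])

# Toy usage: the first sample is high-perplexity, so the beam expands
# and the lowest-perplexity candidate wins.
samples = iter([
    ("step A", [-2.0, -2.0]),   # ppl ≈ 7.39, above threshold
    ("step B", [-0.1, -0.1]),   # ppl ≈ 1.11
    ("step C", [-1.0, -1.0]),   # ppl ≈ 2.72
])
best, best_ppl = selective_step(lambda: next(samples))
# best == "step B"
```

Applied at every reasoning step, this gives the cost/quality trade-off the paper describes: cheap single-path decoding when the model is confident, wider search only where it hesitates.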
-----
Key Insights from this Paper 🧠:
→ Multi-stage reasoning, like in o1-type models, improves accuracy compared to single-pass Chain-of-Thought.
→ Fine-tuning with structured reasoning data enhances domain-specific reasoning capabilities of Large Language Models.
→ Selective Tree Exploration improves reasoning efficiently by widening the search for better solution paths only when needed.
→ Evaluating Large Language Models in high-stakes domains requires metrics beyond accuracy; explainability is critical.
-----
Results 📈:
→ Domaino1s-legal achieves 88.64% average accuracy on legal reasoning tasks, outperforming baselines like Qwen-2.5-Instruct at 73.86%.
→ Domaino1s-finance achieves 51.98% accuracy and 0.021 MCC on stock investment recommendation, exceeding baselines.
→ Domaino1s achieves the highest PROOF-Score on both stock and legal tasks, indicating superior explainability.
→ Selective Tree Exploration with beam size 3 achieves 89.14% accuracy on the SCALR dataset with 15.18s inference time.