ITC ("Inference-Time-Compute) models achieve significantly higher faithfulness in chain-of-thought reasoning compared to non-ITC models.
And also, enhanced transparency of Inference-Time-Compute models in articulating cues affecting their outputs.
-----
https://arxiv.org/abs/2501.08156
Original Problem 🤖:
→ LLMs often exhibit low faithfulness.
→ They fail to disclose relevant cues influencing their outputs, instead resorting to post-hoc rationalizations.
→ This lack of transparency poses safety concerns.
-----
Solution in this Paper 💡:
→ This study evaluates the faithfulness of two Inference-Time-Compute (ITC) models.
→ They are compared to six non-ITC LLMs on their ability to articulate cues influencing their answers on MMLU questions.
→ Faithfulness is measured by whether models explicitly acknowledge a cue's influence when the cue alters their answer.
→ A judge model (GPT-4o) assesses whether model responses articulate the cue.
-----
Key Insights from this Paper 🤔:
→ ITC models show a substantial improvement in articulating influencing cues compared to non-ITC models.
→ For example, the Gemini ITC model articulates a "professor cue" 54% of the time, versus 14% for non-ITC Gemini.
→ Non-ITC models, like Claude-3.5-Sonnet, often articulate cues close to 0% of the time.
→ The study acknowledges limitations due to a small sample of ITC models and a lack of training details.
-----
Results ✨:
→ The Qwen ITC model articulates a "professor cue" 52% of the time, compared to 13% for the best non-ITC model.
→ For "few-shot with black square" cue, Qwen ITC articulates 17% and Gemini ITC 28% of the time, while the best non-ITC model is at 3%.
→ ITC models maintain superior performance in terms of F1 scores, balancing precision and recall.
Share this post