Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs
LLMs excel at dense information but stumble when key details are spread apart.
Distance between key information pieces degrades LLM performance more than their absolute positions
Original Problem:
Current research on positional bias in LLMs mainly focuses on single-piece information effects like "lost in the middle". However, real applications often require processing multiple relevant pieces across long contexts.
Solution in this Paper:
• Introduced LONGPIBENCH - a benchmark evaluating two types of positional biases:
  Absolute position (location within the entire context)
  Relative position (spacing between multiple relevant pieces)
• Spans input lengths from 32K to 256K tokens
• Tests 3 tasks: Table SQL, Timeline Reordering, Equation Solving
• Evaluates 11 models (5 commercial, 6 open-source)
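To make the two bias types concrete, here is a minimal sketch (not the actual LONGPIBENCH code; all names are illustrative assumptions) of how a test context can be built so that the absolute position of the relevant pieces and the distance between them are controlled independently:

```python
# Hypothetical helper: place two relevant pieces inside a sea of distractor
# paragraphs. `start` controls absolute position; `gap` controls the
# relative distance between the two pieces.
def build_context(relevant: list[str], distractors: list[str],
                  start: int, gap: int) -> str:
    total = len(distractors) + len(relevant)
    slots = [None] * total
    slots[start] = relevant[0]          # first relevant piece
    slots[start + gap] = relevant[1]    # second piece, `gap` slots later
    filler = iter(distractors)
    for i, s in enumerate(slots):
        if s is None:
            slots[i] = next(filler)     # pad remaining slots with distractors
    return "\n\n".join(slots)

# Example: same absolute start, dense vs sparse placement of the two facts.
fillers = [f"Distractor paragraph {i}." for i in range(8)]
facts = ["Fact A: the key is 42.", "Fact B: the lock is blue."]
dense = build_context(facts, fillers, start=0, gap=1)   # adjacent pieces
sparse = build_context(facts, fillers, start=0, gap=9)  # maximally separated
```

Sweeping `gap` while holding `start` fixed isolates relative-position bias; sweeping `start` while holding `gap` fixed isolates absolute-position bias.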
Key Insights:
• Modern LLMs show improved robustness against the "lost in the middle" phenomenon
• Performance declines sharply as the distance between relevant pieces increases
• Increasing model parameters helps with absolute position bias but not relative position bias
• Query placement significantly impacts decoder-only models' performance
• Timeline Reordering and Equation Solving tasks proved too challenging for current models
Results:
• Commercial models show a 20-30% reduction in recall rates due to relative position bias
• Qwen 2.5 (7B) drops from 85.5% to 45% accuracy across absolute positions
• Query placement at the beginning vs the end shows up to a 40% performance difference
• Models maintain ~95% accuracy for dense information, dropping to ~65% for sparse
→ Significant biases exist related to the spacing between relevant information pieces - performance declines sharply as distance increases before stabilizing
→ Increasing model parameters helps with absolute position bias but not relative position bias
→ Query placement (beginning vs end) significantly impacts performance for decoder-only models
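The query-placement finding is easy to probe directly: the same question is prepended or appended to an otherwise identical long context. A minimal sketch (function and prompt format are assumptions for illustration, not the paper's exact templates):

```python
# Hypothetical prompt builder: the only variable is where the query sits
# relative to the long context.
def make_prompt(query: str, context: str, query_first: bool) -> str:
    if query_first:
        return f"Question: {query}\n\nContext:\n{context}\n\nAnswer:"
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"

context = "...long document containing multiple relevant pieces..."
query = "Which entries satisfy the filter condition?"

prompt_query_begin = make_prompt(query, context, query_first=True)
prompt_query_end = make_prompt(query, context, query_first=False)
```

Scoring the same model on both variants, with everything else held constant, attributes any accuracy gap to query placement alone.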



