"Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities"

LLMs are better at understanding consequences than causes in discourse.

This paper benchmarks LLMs' discourse capabilities using the Implicit Causality (IC) biases that certain verbs induce in coreference, coherence, and referring-expression choices, comparing diverse LLMs against human performance on these biases. For example, after an NP1-biased verb such as "fascinate" ("Mary fascinated John because..."), people tend to continue about the subject, whereas NP2-biased verbs such as "admire" pull continuations toward the object.

-----

Paper - https://arxiv.org/abs/2501.12980

Original Problem 🤔:

→ LLMs struggle with complex linguistic phenomena like discourse, especially where linguistic and non-linguistic cognition interact.

→ Current research on Implicit Causality in LLMs focuses mainly on the coreference bias, neglecting the associated coherence and referring-expression biases.

-----

Methods in this Paper 💡:

→ This paper compares mono- and multilingual LLMs of varying sizes against human data in an experimental setting.

→ The study assesses LLM performance on three types of biases associated with IC verbs: coreference, coherence, and referring expression choice.

→ It employs a semi-automatic annotation scheme to analyze LLM-generated continuations of IC verb prompts, mirroring human experiments (a sketch of such a scheme is given below).

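As an illustration, here is a minimal sketch in Python of what a semi-automatic annotation pass over LLM continuations could look like. The English prompts, verb-bias labels, and keyword heuristics are assumptions for illustration only; the paper's actual scheme and (German) stimuli are not reproduced here.

```python
# Minimal sketch (not the authors' code) of annotating LLM continuations of
# IC-verb prompts for the three biases discussed in the paper.

# Hypothetical English prompts; "verb_bias" marks the expected coreference bias.
PROMPTS = [
    {"prompt": "Mary fascinated John.", "np1": "Mary", "np2": "John", "verb_bias": "NP1"},
    {"prompt": "Mary admired John.",    "np1": "Mary", "np2": "John", "verb_bias": "NP2"},
]

EXPLANATION_CONNECTIVES = {"because", "since"}           # explanation relation
TEMPORAL_CONNECTIVES = {"then", "afterwards", "later"}   # temporal relation
PRONOUNS = {"he", "she", "they"}

def annotate(continuation: str, np1: str, np2: str) -> dict:
    """Annotate one continuation for coherence, coreference, and form."""
    tokens = continuation.lower().replace(",", " ").replace(".", " ").split()

    # 1) Coherence relation: read off an explicit leading connective, if any.
    coherence = "other"
    if tokens and tokens[0] in EXPLANATION_CONNECTIVES:
        coherence = "explanation"
    elif tokens and tokens[0] in TEMPORAL_CONNECTIVES:
        coherence = "temporal"

    # 2) First referring expression after any leading connective.
    rest = tokens[1:] if coherence != "other" else tokens
    first = rest[0] if rest else ""

    # Referring-expression choice (form bias): pronoun vs. repeated name.
    if first in PRONOUNS:
        form = "pronoun"
    elif first in {np1.lower(), np2.lower()}:
        form = "name"
    else:
        form = "other"

    # 3) Coreference: which referent is re-mentioned first. The gendered-pronoun
    #    mapping only works for these illustrative items; real annotation would
    #    involve manual checks or a coreference resolver (the "semi-automatic" part).
    if first in {np1.lower(), "she"}:
        coreference = "NP1"
    elif first in {np2.lower(), "he"}:
        coreference = "NP2"
    else:
        coreference = "unclear"

    return {"coherence": coherence, "coreference": coreference, "form": form}

# Example: annotate a hand-written continuation for the first prompt.
print(annotate("because she told such vivid stories.", "Mary", "John"))
# -> {'coherence': 'explanation', 'coreference': 'NP1', 'form': 'pronoun'}
```

Aggregating such annotations per verb and per model is what allows the LLM distributions to be compared against the human experimental data.
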
-----

Key Insights from this Paper 🔑:

→ Implicit Causality provides a robust framework for benchmarking LLM discourse capabilities.

→ LLM performance on Implicit Causality depends on both model size and model type (monolingual vs. multilingual).

→ The coreference bias is easier for LLMs to learn than the coherence bias.

-----

Results 💯:

→ Only German Bloom 6.4B (the largest monolingual LLM) showed a human-like IC bias.

→ No LLM displayed a human-like coherence bias. Temporal relations were preferred over explanations.

→ No LLM displayed the referring-expression (form) bias; however, they did show the grammatical-function focus effect, producing a higher proportion of pronouns for subject coreference.
