"CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era"

The podcast below on this paper was generated with Google's Illuminate.

Knowledge graphs become LLM-friendly when split into focused domains with clean schemas.

CypherBench transforms complex knowledge graphs into domain-specific views that LLMs can efficiently query using Cypher, addressing the oversized-schema problem of modern knowledge bases.

-----

https://arxiv.org/abs/2412.18702

🤔 Original Problem:

Modern knowledge graphs like Wikidata are hard for LLMs to query directly because of their enormous schemas (4M+ entity types), reliance on resource identifiers, overlapping relation types, and inconsistent units.

-----

🔍 Solution in this Paper:

→ Transforms RDF knowledge graphs into multiple focused property graphs for different domains

→ Creates schema-enforced views that standardize units and enforce type constraints

→ Implements a custom RDF-to-property-graph conversion engine that handles datatype conversion and unit standardization

→ Develops systematic pipeline for generating text-to-Cypher tasks

→ Uses Cypher as unified query interface for both RDF and property graphs
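
The transformation steps above can be illustrated with a minimal Python sketch. It is not the paper's conversion engine: the triple format, the `build_view` function, the `RELATION_SCHEMA` and `LABEL_OF` tables, and the sample data are all assumptions made up for this example. It only shows the general idea of keeping in-domain entity types, converting quantities to one unit, and rejecting relations that violate the schema's type constraints.

```python
# Illustrative mapping from RDF-style type identifiers to readable labels
# (assumed for this sketch; a real view would cover a whole domain).
LABEL_OF = {"Q5": "Person", "Q11424": "Movie"}

# Hypothetical schema: each relation fixes its (source label, target label).
RELATION_SCHEMA = {"DIRECTED": ("Person", "Movie")}

# Conversion factors to the single unit this view standardizes on (minutes).
TO_MINUTES = {"second": 1 / 60, "minute": 1.0, "hour": 60.0}


def build_view(triples):
    """Convert RDF-style triples into a small schema-enforced property graph."""
    nodes = {}  # node id -> {"label": str, "props": dict}
    edges = []  # (source id, relation name, target id)

    # Pass 1: create typed nodes; entities with out-of-domain types are dropped.
    for s, p, o in triples:
        if p == "instance_of" and o in LABEL_OF:
            nodes[s] = {"label": LABEL_OF[o], "props": {}}

    # Pass 2: attach properties and relations to in-domain nodes only.
    for s, p, o in triples:
        if s not in nodes:
            continue
        if p == "name":
            nodes[s]["props"]["name"] = o
        elif p == "duration":
            value, unit = o
            # Unit standardization: store every runtime in minutes.
            nodes[s]["props"]["runtime_min"] = value * TO_MINUTES[unit]
        elif p == "director" and o in nodes:
            # Type constraint: DIRECTED must go from a Person to a Movie.
            src_label, dst_label = RELATION_SCHEMA["DIRECTED"]
            if nodes[o]["label"] == src_label and nodes[s]["label"] == dst_label:
                edges.append((o, "DIRECTED", s))
    return nodes, edges


triples = [
    ("m1", "instance_of", "Q11424"),
    ("m1", "name", "Film A"),
    ("m1", "duration", (2, "hour")),
    ("m2", "instance_of", "Q11424"),
    ("m2", "name", "Film B"),
    ("m2", "duration", (95, "minute")),
    ("p1", "instance_of", "Q5"),
    ("p1", "name", "Director X"),
    ("m1", "director", "p1"),
    ("m2", "director", "m1"),         # violates the type constraint, rejected
    ("x9", "instance_of", "Q99999"),  # out-of-domain type, dropped
]
nodes, edges = build_view(triples)
```

In this toy run the view keeps three nodes and one `DIRECTED` edge, and both runtimes end up in minutes, so a query over `runtime_min` never has to reason about units.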

-----

💡 Key Insights:

→ Breaking down massive knowledge graphs into domain-specific views makes them LLM-friendly

→ Property graphs with clean schemas outperform raw RDF for LLM queries

→ Standardized units and type constraints are crucial for accurate aggregation queries
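
To see why unit standardization matters for aggregation, consider a small Python sketch. The height values and the `TO_METRES` table are invented for illustration and do not come from the paper; the point is only that averaging raw values recorded in mixed units silently produces a meaningless number.

```python
# Raw values as they might appear in a source graph: (value, unit).
raw_heights = [(1.80, "metre"), (175, "centimetre"), (1.65, "metre")]

# Assumed conversion table to the standardized unit (metres).
TO_METRES = {"metre": 1.0, "centimetre": 0.01}

# Naive aggregation ignores units and mixes metres with centimetres.
naive_avg = sum(value for value, _ in raw_heights) / len(raw_heights)

# Standardize first, then aggregate.
normalized = [value * TO_METRES[unit] for value, unit in raw_heights]
correct_avg = sum(normalized) / len(normalized)

print(f"naive average:      {naive_avg:.2f}")
print(f"normalized average: {correct_avg:.2f}")
```

The naive average lands near 59 while the normalized one is about 1.73 metres. A generated query has no way to detect this from the schema alone, which is why the normalization has to happen in the view itself.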

-----

📊 Results:

→ Best model (claude3.5-sonnet) achieves 61.58% execution accuracy

→ Open-source models top out at 41.87% accuracy

→ Models under 10B parameters achieve less than 20% accuracy
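
For readers unfamiliar with the metric, execution accuracy is commonly computed by running both the predicted and the reference query and comparing their results. The Python sketch below assumes that general definition; the paper's exact matching rules may differ. The `canonical` and `execution_accuracy` helpers and the sample results are hypothetical, and a failed query is represented here as `None`.

```python
from collections import Counter


def canonical(rows):
    """Turn a list of result rows (dicts) into an order-insensitive multiset."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def execution_accuracy(predicted_results, gold_results):
    """Fraction of questions whose predicted result equals the gold result.

    A prediction of None stands for a query that failed to execute and
    always counts as wrong.
    """
    matches = sum(
        pred is not None and canonical(pred) == canonical(gold)
        for pred, gold in zip(predicted_results, gold_results)
    )
    return matches / len(gold_results)


gold = [
    [{"name": "A"}, {"name": "B"}],
    [{"count": 3}],
    [{"name": "C"}],
]
predicted = [
    [{"name": "B"}, {"name": "A"}],  # same rows in a different order: match
    [{"count": 4}],                  # wrong value: no match
    None,                            # query failed to execute: no match
]
accuracy = execution_accuracy(predicted, gold)
```

This sketch compares rows including their column names, so a prediction that returns the right values under a different alias would be scored as wrong; a production evaluator would need an explicit policy for that case.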
