Knowledge graphs become LLM-friendly when split into focused domains with clean schemas.
CypherBench turns large RDF knowledge graphs such as Wikidata into domain-specific property graph views that LLMs can query efficiently with Cypher, addressing the oversized schemas of modern knowledge bases.
-----
https://arxiv.org/abs/2412.18702
🤔 Original Problem:
Modern knowledge graphs like Wikidata are hard for LLMs to query directly: their schemas are enormous (4M+ entity types), entities and relations are referenced by opaque resource identifiers, relation types overlap, and units are inconsistent.
-----
🔍 Solution in this Paper:
→ Transforms RDF knowledge graphs into multiple focused property graphs for different domains
→ Creates schema-enforced views that standardize units and enforce type constraints
→ Implements a custom RDF-to-property-graph conversion engine that handles datatype conversion and unit standardization
→ Develops a systematic pipeline for generating text-to-Cypher tasks
→ Uses Cypher as a unified query interface for both RDF and property graphs
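The conversion step described in the bullets above can be pictured with a minimal sketch. This is not the paper's engine: the schema layout, the predicate identifiers (`p_label`, `p_elevation`), the entity IDs, and the `convert` helper are all hypothetical names chosen for illustration. It only shows the general idea of keeping what fits a small domain schema, converting datatypes, and normalizing units.

```python
from dataclasses import dataclass, field

# Hypothetical domain schema: node label -> {property: (python type, canonical unit)}
SCHEMA = {
    "Mountain": {"name": (str, None), "elevation_m": (float, "metre")},
}

# Conversion factors to the canonical length unit (illustrative subset)
TO_METRE = {"metre": 1.0, "kilometre": 1000.0, "foot": 0.3048}

# Hypothetical mapping from RDF-style predicate IDs to schema property names
PREDICATE_MAP = {"p_label": "name", "p_elevation": "elevation_m"}


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


def convert(triples, entity_types):
    """Build property-graph nodes from (subject, predicate, value, unit) tuples.

    Anything that does not fit the domain schema is dropped, which is what
    keeps the resulting view small and type-consistent.
    """
    nodes = {}
    for subj, pred, value, unit in triples:
        label = entity_types.get(subj)
        prop = PREDICATE_MAP.get(pred)
        if label not in SCHEMA or prop not in SCHEMA[label]:
            continue  # outside the domain schema
        expected_type, canonical_unit = SCHEMA[label][prop]
        if canonical_unit is not None:
            if unit not in TO_METRE:
                continue  # unknown unit: skip rather than guess
            value = float(value) * TO_METRE[unit]  # unit standardization
        node = nodes.setdefault(subj, Node(label))
        node.properties[prop] = expected_type(value)  # datatype conversion
    return nodes


# Made-up sample data
triples = [
    ("e1", "p_label", "Peak A", None),
    ("e1", "p_elevation", "10000", "foot"),
    ("e2", "p_label", "Town B", None),  # dropped: "City" is not in this domain
]
entity_types = {"e1": "Mountain", "e2": "City"}
nodes = convert(triples, entity_types)
```

Because every elevation ends up in metres, a later aggregation such as an average over `elevation_m` does not silently mix feet and metres.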
-----
💡 Key Insights:
→ Breaking down massive knowledge graphs into domain-specific views makes them LLM-friendly
→ Property graphs with clean schemas outperform raw RDF for LLM queries
→ Standardized units and type constraints are crucial for accurate aggregation queries
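To make the first two insights concrete, here is a rough sketch of why a small, clean schema is easy to hand to an LLM: the whole domain view can be rendered as a few lines of text and placed in a text-to-Cypher prompt. The rendering format, the prompt wording, and the function names `render_schema` and `build_prompt` are assumptions for illustration, not the format used in the paper.

```python
def render_schema(node_types, relationships):
    """Render a domain schema as compact, Cypher-like text."""
    lines = ["Node types:"]
    for label, props in node_types.items():
        prop_list = ", ".join(f"{name}: {typ}" for name, typ in props.items())
        lines.append(f"  (:{label} {{{prop_list}}})")
    lines.append("Relationships:")
    for src, rel, dst in relationships:
        lines.append(f"  (:{src})-[:{rel}]->(:{dst})")
    return "\n".join(lines)


def build_prompt(schema_text, question):
    """Assemble a text-to-Cypher prompt (hypothetical wording)."""
    return (
        "Translate the question into a Cypher query for this graph schema.\n\n"
        f"{schema_text}\n\n"
        f"Question: {question}\n"
        "Cypher:"
    )


# Made-up two-type domain; the unit is encoded in the property name
node_types = {
    "Mountain": {"name": "STRING", "elevation_m": "FLOAT"},
    "Country": {"name": "STRING"},
}
relationships = [("Mountain", "LOCATED_IN", "Country")]

prompt = build_prompt(
    render_schema(node_types, relationships),
    "What is the average elevation of mountains in each country?",
)
print(prompt)
```

A schema with millions of entity types could not be serialized this way within a context window, which is the practical argument for per-domain views.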
-----
📊 Results:
→ Best model (Claude 3.5 Sonnet) achieves 61.58% execution accuracy
→ Open-source models top out at 41.87% execution accuracy
→ Models under 10B parameters achieve less than 20% accuracy
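For readers unfamiliar with the metric, execution accuracy is generally computed by running the predicted query and the reference query and comparing their results, rather than comparing query strings. The sketch below assumes results are lists of value tuples, ignores row order, and counts a query that failed to execute (represented as `None`) as wrong; the paper's exact comparison rules may differ.

```python
from collections import Counter


def normalize(rows):
    """Treat a query result as an order-insensitive multiset of rows."""
    return Counter(tuple(row) for row in rows)


def execution_accuracy(predicted_results, gold_results):
    """Fraction of questions whose predicted result matches the gold result.

    Each element is the list of rows returned by one query; a predicted
    query that failed to execute is represented as None.
    """
    if not gold_results or len(predicted_results) != len(gold_results):
        raise ValueError("need equally sized, non-empty result lists")
    correct = sum(
        pred is not None and normalize(pred) == normalize(gold)
        for pred, gold in zip(predicted_results, gold_results)
    )
    return correct / len(gold_results)


# Made-up example: the first prediction matches up to row order,
# the second prediction failed to execute.
gold = [[("A",), ("B",)], [(3,)]]
pred = [[("B",), ("A",)], None]
score = execution_accuracy(pred, gold)
```

Comparing executed results gives credit to queries that are written differently from the reference but return the same answer.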