Handling Multi-Hop and Multifaceted Queries in LLM Search
Graph-Based Search and Multi-Hop Reasoning
Embedding-Based Retrieval for Multifaceted Queries
Retrieval-Augmented Generation (RAG) for Complex Queries
Knowledge Graphs for Structured Multi-Hop Reasoning
Large Language Models (LLMs) often struggle with multi-hop queries (questions requiring reasoning over multiple pieces of information) and multifaceted queries (broad queries with multiple sub-intents or aspects). Recent research (2024–2025) has explored advanced search algorithms to address these challenges. Key techniques include graph-based retrieval methods, improved embedding strategies, retrieval-augmented generation (RAG) frameworks, and integration of knowledge graphs. Below, we review each in turn, highlighting representative arXiv works.
Graph-Based Search and Multi-Hop Reasoning
Graph Neural Retrieval: Instead of treating retrieved passages independently, graph-based methods capture relationships among pieces of information. Li et al. (2024) propose constructing a graph of passages by linking related passages (e.g. contiguous text or shared keywords) and applying a Graph Neural Network to enhance retrieval (Graph Neural Network Enhanced Retrieval for Question Answering of LLMs). This approach, GNN-Ret, is extended for multi-hop questions via a recurrent GNN (RGNN) that iteratively integrates information from previous hops. By chaining retrieval steps through the graph, their RGNN-Ret achieved state-of-the-art accuracy on multi-hop QA benchmarks (improving accuracy by ~10% on 2WikiMQA). Similarly, Liu et al. (2025) introduce HopRAG, which augments RAG with logical graph exploration (HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation). HopRAG builds a graph where text chunks are vertices and edges represent “logical” connections inferred by LLM-generated queries; at query time it performs a retrieve–reason–prune traversal, exploring multi-hop neighbors guided by the LLM to find truly relevant passages. This logic-aware graph search yields large gains in answer accuracy for complex queries (over 75% higher accuracy than standard retrieval in their experiments). These results show that graph-based retrieval – whether via GNNs linking related passages or explicit graph traversal of knowledge – can substantially improve multi-hop reasoning by connecting intermediate context and enabling the system to follow chains of facts.
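As a toy illustration of the graph-traversal idea (not the papers' actual algorithms: the passages, edges, and the keyword-overlap scorer below are all invented), a retriever can seed from a similarity search and then expand along passage-graph edges, keeping neighbors that still look useful:

```python
from collections import deque

# Toy passage graph: nodes are passage IDs, edges link related passages
# (e.g. shared entities), in the spirit of GNN-Ret / HopRAG.
PASSAGES = {
    "p1": "Marie Curie was born in Warsaw.",
    "p2": "Warsaw is the capital of Poland.",
    "p3": "Marie Curie won two Nobel Prizes.",
    "p4": "Python is a programming language.",
}
EDGES = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"], "p4": []}

def score(query: str, passage: str) -> int:
    """Toy relevance: count of shared lowercase words (stands in for a dense retriever)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def multi_hop_retrieve(query: str, seeds: int = 1, hops: int = 2, min_score: int = 1):
    """Retrieve seed passages, then expand along graph edges for up to `hops`
    steps, pruning neighbors that look irrelevant once the hop budget is spent
    (a crude retrieve-reason-prune loop)."""
    ranked = sorted(PASSAGES, key=lambda p: score(query, PASSAGES[p]), reverse=True)
    frontier = deque((p, 0) for p in ranked[:seeds])
    selected, seen = [], set()
    while frontier:
        pid, depth = frontier.popleft()
        if pid in seen:
            continue
        seen.add(pid)
        selected.append(pid)
        if depth < hops:
            for nb in EDGES[pid]:
                # Keep a neighbor if it scores on its own, or if we still have
                # hop budget left to reach relevant passages through it.
                if score(query, PASSAGES[nb]) >= min_score or depth + 1 < hops:
                    frontier.append((nb, depth + 1))
    return selected
```

Note how "p2" shares no words with the question yet is reached through the graph; that bridging of lexically disjoint evidence is exactly what single-shot retrieval misses on multi-hop questions.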
Embedding-Based Retrieval for Multifaceted Queries
Vector Retrieval Limitations: Standard dense retrieval represents a query with a single embedding, which can falter on multifaceted queries that span diverse topics. If a question has multiple aspects, the single vector may only align with one dominant aspect, missing others. Recent work tackles this by enriching or expanding the embedding space. Multi-vector approaches like Multi-Head RAG (MRAG) generate multiple query embeddings to capture different facets. Besta et al. (2024) note that queries needing documents with very different content are hard to satisfy with one embedding, as relevant passages may lie in distant regions of vector space (Multi-Head RAG: Solving Multi-Aspect Problems with LLMs). MRAG’s solution is to leverage multiple attention heads of a transformer to produce several key vectors for retrieval, rather than one. Each attention head can learn to focus on a different semantic aspect of the query; using their activations as separate query vectors allows retrieving documents covering various facets. This multi-embedding strategy significantly improved retrieval of heterogeneous information, boosting relevance by up to 20% over single-vector baselines. Another technique is augmenting document representations themselves. Question-oriented embeddings (Neeser et al., 2025) propose generating hypothetical questions for each document chunk and embedding those along with the text. By indexing what questions a passage can answer, the system aligns documents to potential query intents. This method (called QuOTE) better captures context-dependent relevance and ambiguity, yielding higher retrieval accuracy on challenging tasks including multi-hop QA (QuOTE: Question-Oriented Text Embeddings). In essence, it enriches the embedding space with query-focused signals.
These approaches illustrate how embedding-based retrieval can be adapted for complex queries: either by representing a single query with multiple vectors (to cover multiple angles) or by representing documents in a more query-aware way. Together with hybrid retrieval (combining dense and sparse searches), such techniques ensure that different aspects of a multifaceted query can find matching evidence in the vector space.
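A minimal sketch of the multi-vector idea, using invented documents and a bag-of-words stand-in for a dense encoder (MRAG derives its extra vectors from attention-head activations; here each facet is simply a sub-query string, which keeps the sketch self-contained):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stands in for a dense encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = {
    "d1": "nutritional benefits of a vegetarian diet",
    "d2": "carbon footprint and environmental impact of meat production",
    "d3": "history of the roman empire",
}

def multi_vector_retrieve(facet_queries, k: int = 1):
    """One retrieval pass per facet embedding; union the per-facet top-k hits,
    so each aspect of a multifaceted query finds its own nearest documents."""
    hits = []
    for fq in facet_queries:
        qv = embed(fq)
        ranked = sorted(DOCS, key=lambda d: cosine(qv, embed(DOCS[d])), reverse=True)
        for d in ranked[:k]:
            if d not in hits:
                hits.append(d)
    return hits
```

A single averaged query vector for "health benefits and carbon footprint of diets" would tend to sit between the two topical clusters; issuing one vector per facet retrieves both "d1" and "d2".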
Retrieval-Augmented Generation (RAG) for Complex Queries
RAG Fundamentals: Retrieval-augmented generation has become a standard paradigm for LLMs tackling knowledge-intensive questions. In RAG, the LLM’s prompt is augmented with content fetched from external sources (documents, websites, etc.), which helps mitigate hallucinations and keep answers up-to-date. While vanilla RAG pipelines work well for factoid queries, complex multi-hop or open-ended queries demand more sophisticated retrieval and ranking strategies. Recent research has extended RAG to better handle these scenarios. Wang et al. (2024) address broad queries with multiple sub-intents through RichRAG, a framework explicitly designed to produce comprehensive answers (RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation). RichRAG first uses a sub-aspect explorer to decompose a broad query into its potential subtopics, then employs a multi-faceted retriever to gather a diverse set of documents for each sub-aspect. A generative ranker then selects the top supporting passages across aspects, tuned to cover all facets and to suit the LLM’s preferences, before the final answer is generated. This approach leads to rich, long-form answers covering all query aspects, outperforming baselines on tasks requiring comprehensive responses. Meanwhile, other works modify RAG’s retrieval module to incorporate reasoning. The aforementioned HopRAG can be seen as an advanced RAG that reasons over a graph of documents to fetch multi-hop evidence (HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation). Multi-Head RAG (described above in embeddings) is another extension – it stays within the RAG paradigm but improves the retriever by issuing multiple queries for multi-aspect problems (Multi-Head RAG: Solving Multi-Aspect Problems with LLMs). Beyond these, researchers have explored iterative retrieval, where the LLM breaks a complex query into sub-queries and searches step by step.
For example, some systems prompt the LLM to ask follow-up questions (chain-of-thought style) and retrieve in stages, effectively performing multi-hop search in dialogue with itself. This idea underpins the RGNN-Ret approach (using a recurrent retrieval loop) (Graph Neural Network Enhanced Retrieval for Question Answering of LLMs) and other multi-step RAG pipelines. Additionally, retrieval orchestration has been studied: Seabra et al. (2024) combine multiple specialized retrievers (for text, databases, etc.) with a router agent that chooses how to answer each part of a query. Such modular RAG systems can handle queries that span different data sources or formats. In summary, RAG remains a powerful framework for complex queries, and current research is making it more facet-aware and reasoning-capable – whether by splitting queries into subparts (RichRAG), using multiple retrieval heads (MRAG), or integrating logical reasoning into the retrieval step (HopRAG). These advances help LLMs generate correct and complete answers for questions that are not answerable with a single direct lookup.
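The staged search-in-dialogue loop can be sketched as follows; `fake_llm` and `fake_search` are canned stand-ins (hard-coded responses, invented facts) for a real LLM and retriever, so only the control flow is meaningful:

```python
def fake_llm(prompt: str) -> str:
    """Canned stand-in for an LLM that plans the next retrieval step: it issues
    a follow-up search until the accumulated evidence lets it answer."""
    if "Christopher Nolan" not in prompt:
        return "SEARCH: director of Inception"
    if "Interstellar" not in prompt:
        return "SEARCH: films directed by Christopher Nolan"
    return "ANSWER: Interstellar, also directed by Christopher Nolan."

def fake_search(query: str) -> str:
    """Canned stand-in for a retriever over a two-document corpus."""
    corpus = {
        "director of Inception": "Inception was directed by Christopher Nolan.",
        "films directed by Christopher Nolan": "Christopher Nolan directed Interstellar and Dunkirk.",
    }
    return corpus.get(query, "")

def iterative_rag(question: str, max_steps: int = 4) -> str:
    """Retrieve in stages: each turn, the 'LLM' sees the question plus all
    evidence so far and either requests another search or answers."""
    evidence = []
    for _ in range(max_steps):
        action = fake_llm(question + "\n" + "\n".join(evidence))
        if action.startswith("SEARCH:"):
            evidence.append(fake_search(action[len("SEARCH:"):].strip()))
        else:
            return action[len("ANSWER:"):].strip()
    return "no answer"
```

The second search query only becomes answerable after the first hop's evidence ("Christopher Nolan") is in the prompt, which is the essence of multi-hop retrieval in stages.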
Knowledge Graphs for Structured Multi-Hop Reasoning
Role of Knowledge Graphs: Knowledge Graphs (KGs) are structured databases of entities and relations that naturally support multi-hop queries through graph traversal. Integrating KGs with LLMs offers a way to perform semantic, structured search that complements unstructured text retrieval. One straightforward approach is to use the LLM to translate a complex question into a formal query language (e.g. SPARQL or Cypher) and execute it on the knowledge graph. This semantic parsing technique has been applied in recent systems (Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data). For example, an LLM might generate a SPARQL query from the user’s question, retrieve the relevant facts from the KG, and then either return them directly or feed them back into the LLM. This ensures precise retrieval (the KG can return the exact entities or relationships needed), but the reasoning is handled mostly outside the LLM. An example is the Subgraph Retrieval Augmented Generation (SG-RAG) method, which first converts the input question into a Cypher query to fetch the appropriate subgraph from a domain-specific KG (SG-RAG: Multi-Hop Question Answering with Subgraph Retrieval Augmented Generation). The retrieved subgraph (a collection of interconnected triples) is then transformed into textual triples and provided to the LLM in a prompt, allowing the model to generate the answer using those facts. Such a pipeline demonstrates how LLMs can leverage KGs to obtain multi-hop evidence in a structured way, essentially offloading the search step to the graph database and then using the LLM for fluent answer synthesis.
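A heavily simplified sketch of that pipeline, with a hand-rolled in-memory triple store in place of a real graph database and query language (all entities and relations below are invented; a real SG-RAG system would emit Cypher or SPARQL against a graph backend):

```python
# Toy triple store standing in for a graph database.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]

def query_kg(subject=None, predicate=None, obj=None):
    """Match triples against a (subject, predicate, object) pattern; None is a wildcard."""
    return [t for t in TRIPLES
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

def fetch_subgraph(start: str, hops: int = 2):
    """Follow outgoing edges from `start` for `hops` steps: the structured
    multi-hop search that the graph database performs for the LLM."""
    frontier, subgraph = {start}, []
    for _ in range(hops):
        nxt = set()
        for s in frontier:
            for t in query_kg(subject=s):
                subgraph.append(t)
                nxt.add(t[2])
        frontier = nxt
    return subgraph

def verbalize(subgraph) -> str:
    """Turn retrieved triples into plain text to place in the LLM prompt."""
    return " ".join(f"{s} {p.replace('_', ' ')} {o}." for s, p, o in subgraph)
```

Calling `verbalize(fetch_subgraph("Inception", hops=2))` yields the textual facts an LLM would need to answer, say, "Where was the director of Inception born?" without the model having to search at all.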
Knowledge Graph-Augmented LLMs: Another line of research keeps the KG in the loop with the LLM’s reasoning process. One method is injecting KG knowledge directly into the LLM’s context. Researchers have tried encoding KGs as text or special prompts that the LLM can parse. For instance, Wu & Tsioutsiouliklis (2024) propose representing KG facts as programmatic code (Python data structures) embedded in the prompt. Because many LLMs are trained on code and can understand its structured syntax, presenting knowledge triples in code form helps the model interpret complex relationships more accurately than if they were given in free text. Experiments showed that LLMs given KG information in a structured code format outperformed those given equivalent information in plain natural language on multi-hop reasoning tasks. This suggests that structured representations aligned with the model’s training (like code) can preserve the graph’s semantics for the LLM. Alternatively, graph neural networks can be used to encode entire subgraphs into vector representations which are then fed as soft prompts or initial hidden states to the LLM. This KG embedding approach attempts to inject the structured knowledge into the model’s latent space. However, aligning a GNN’s vector outputs with an LLM’s language modeling is challenging, and such methods require careful tuning.
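The code-as-prompt idea can be sketched as follows (the rendering format and example facts are our own illustration, not the paper's exact prompt template): render the triples as a valid Python literal, then splice that string into the prompt instead of a free-text paraphrase.

```python
def triples_as_code(triples) -> str:
    """Render KG facts as a Python data structure for the prompt. The point is
    that code-trained LLMs parse this structured syntax more reliably than the
    same facts written out as free-form sentences."""
    lines = ["knowledge_graph = {"]
    for s, p, o in triples:
        lines.append(f"    ({s!r}, {p!r}): {o!r},")
    lines.append("}")
    return "\n".join(lines)

prompt_facts = triples_as_code([
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "field", "Physics"),
])
prompt = f"Given these facts:\n{prompt_facts}\nIn what field did Marie Curie's spouse work?"
```

Because the output is itself executable Python, the structure is unambiguous: each (subject, relation) key maps to exactly one object, with no prose to misparse.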
Dynamic Knowledge and Updating: Knowledge graphs are also being used as dynamic memory to keep LLMs up-to-date and improve multi-hop consistency. Chen et al. (2024) introduce GMeLLo (Graph Memory-based Editing), which merges an LLM with a KG for continual knowledge updates (LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments). In their framework, the LLM converts new factual inputs into structured triples and updates the KG; when answering multi-hop questions, the LLM can issue structured queries to the KG to fetch the latest relevant facts, performing stepwise reasoning over the graph. This hybrid approach allowed accurate multi-hop QA even as facts changed, outperforming prior knowledge-editing techniques. Likewise, Lu et al. (2024) present KEDKG, which builds a dynamic knowledge graph of revised information to reliably answer multi-hop queries (Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering). KEDKG constructs a graph of updated facts (resolving any conflicts), then uses a fine-grained retrieval strategy with entity/relation detectors to pull the needed subgraph for the LLM’s generation. By preserving knowledge in a structured form, the LLM’s reasoning remains consistent and grounded, yielding more accurate answers on dynamic multi-hop benchmarks. These efforts highlight that incorporating knowledge graphs can address two pain points of multifaceted queries: the need for multi-step logical reasoning (by following paths in the graph) and the need for up-to-date or reliable facts (by querying an external KG rather than the static model memory).
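A minimal sketch of the dynamic-KG idea (invented facts; real systems extract triples with an LLM and resolve conflicts far more carefully): store at most one object per (subject, relation) pair so that an edit overwrites the stale fact, and answer multi-hop questions by chaining lookups over the current graph.

```python
class DynamicKG:
    """Minimal editable triple store in the spirit of GMeLLo/KEDKG, heavily
    simplified: a new fact for the same (subject, relation) pair overwrites
    the stale one, so multi-hop lookups always chain over current facts."""

    def __init__(self):
        self.facts = {}  # (subject, relation) -> object

    def update(self, subject: str, relation: str, obj: str) -> None:
        """Insert a fact, silently replacing any conflicting earlier fact."""
        self.facts[(subject, relation)] = obj

    def hop(self, subject: str, *relations: str):
        """Follow a chain of relations from `subject`; None on a missing link."""
        node = subject
        for rel in relations:
            node = self.facts.get((node, rel))
            if node is None:
                return None
        return node

kg = DynamicKG()
kg.update("United Kingdom", "prime_minister", "Boris Johnson")
kg.update("Boris Johnson", "spouse", "Carrie Johnson")
# A knowledge edit arrives: the later fact replaces the stale one, and the
# 2-hop question "Who is the spouse of the UK PM?" now tracks the update.
kg.update("United Kingdom", "prime_minister", "Rishi Sunak")
kg.update("Rishi Sunak", "spouse", "Akshata Murty")
```

After the edit, `kg.hop("United Kingdom", "prime_minister", "spouse")` chains through the updated first hop, which is exactly the failure mode (answering from stale parametric memory) that these dynamic-KG methods target.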
References
Zijian Li et al. (2024). Graph Neural Network Enhanced Retrieval for Question Answering of LLMs. arXiv:2406.06572.
Hao Liu et al. (2025). HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation. arXiv:2502.12442.
Maciej Besta et al. (2024). Multi-Head RAG: Solving Multi-Aspect Problems with LLMs. arXiv:2406.05085.
Andrew Neeser et al. (2025). QuOTE: Question-Oriented Text Embeddings. arXiv:2502.10976.
Shuting Wang et al. (2024). RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation. arXiv:2406.12566.
Antony Seabra et al. (2024). Dynamic Multi-Agent Orchestration and Retrieval for Multi-Source Question-Answer Systems using LLMs. arXiv:2412.17964.
Ruirui Chen et al. (2024). LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments (GMeLLo). arXiv:2408.15903.
Yifan Lu et al. (2024). Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering (KEDKG). arXiv:2412.13782.
Xue Wu & Kostas Tsioutsiouliklis (2024). Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data. arXiv:2412.10654.
J. M. Abou-Jaoude et al. (2024). SG-RAG: Multi-Hop Question Answering with Subgraph Retrieval Augmented Generation. ICNLSP 2024.