Query Planning Failures in Multi-hop RAG: Patterns and Solutions
SOTAAZ
You added Query Decomposition, so why does it still fail? Decomposition is only the first step; the real problems emerge in Sequencing and Grounding.
What is Query Planning?
Processing complex questions in Multi-hop RAG requires three stages:
Query Planning = Decomposition + Sequencing + Grounding

Most Multi-hop RAG failures occur in one of these three stages.
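In code terms, the pipeline is three composable steps. The sketch below is only an outline; the function names are illustrative placeholders, not a specific library's API.

# Minimal sketch of the three-stage pipeline (all names are illustrative placeholders)
def plan_and_ground(query, decompose, sequence, ground):
    sub_queries = decompose(query)          # 1. Decomposition: split into sub-queries
    execution_plan = sequence(sub_queries)  # 2. Sequencing: order/group sub-queries by dependency
    evidence = ground(execution_plan)       # 3. Grounding: retrieve evidence per sub-query
    return evidence                         # answer generation happens downstream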
Failure Pattern #1: Decomposition Failures
1-A. Over-decomposition
query = "How did Tesla's stock react after they cut prices?"
# Over-decomposed result
decomposed = [
"What kind of company is Tesla?", # Unnecessary
"What products does Tesla make?", # Unnecessary
"Did Tesla cut prices?",
"When did Tesla cut prices?",
"How much was the price cut?",
"What is Tesla's current stock price?", # Wrong timeframe
"What caused the stock movement?" # Too abstract
]Problems:
- Unnecessary sub-queries fetch noisy documents
- Token waste + context pollution
- The actual "stock reaction after price cut" gets diluted
Diagnostic criteria:
def is_over_decomposed(decomposed, original):
    # Over-decomposed if sub-queries exceed entities × 2
    entities = extract_entities(original)
    return len(decomposed) > len(entities) * 2

1-B. Under-decomposition
query = "What did Microsoft's CEO say when Sam Altman was fired from OpenAI, and who served as interim CEO before Sam returned?"
# Under-decomposed result
decomposed = [
    "Sam Altman firing and Microsoft CEO reaction",
    "Interim CEO before Sam Altman's return",
]

Problems:
- First sub-query is still Multi-hop
- "Firing date" → "Microsoft CEO statement" sequence is implicit
- Second also needs "return date" → "CEO before that" sequence
Diagnostic criteria:
def is_under_decomposed(sub_query):
    # Under-decomposed if temporal relation keywords remain
    temporal_markers = ["when", "after", "before", "following", "prior to"]
    return any(marker in sub_query.lower() for marker in temporal_markers)

1-C. Implicit Condition Loss
query = "What was the best-selling EV last year?"
# Lost decomposition
decomposed = [
    "What is the best-selling EV?"  # "last year" disappeared
]

# Correct decomposition
decomposed = [
    "What were the EV sales rankings in 2024?"  # Time condition explicit
]

Problem:
- Relative time expressions ("last year", "recently") not converted to absolute time
- Time filtering impossible during search
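A minimal sketch of the fix, assuming the planner knows today's date and only needs to handle a few relative expressions (the replacement table is illustrative, not exhaustive):

from datetime import date

def normalize_temporal(query, today=None):
    """Rewrite a few common relative time expressions into absolute years.

    Sketch only: a real system needs a proper date parser plus document
    timestamps to filter on.
    """
    today = today or date.today()
    replacements = {
        "last year": str(today.year - 1),
        "this year": str(today.year),
    }
    normalized = query
    for phrase, absolute in replacements.items():
        normalized = normalized.replace(phrase, absolute)
    return normalized

# → "What was the best-selling EV 2024?"
print(normalize_temporal("What was the best-selling EV last year?", today=date(2025, 1, 15)))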
Failure Pattern #2: Sequencing Failures
2-A. Dependency Ignorance
query = "How was Microsoft's stock on the day OpenAI's CEO was fired?"
# Parallel execution ignoring dependencies
decomposed = [
    "When was OpenAI's CEO fired?",  # → 2023-11-17
    "How was Microsoft's stock?",    # → When? (dependency broken)
]

# Actual execution
results = parallel_search(decomposed)  # Second query fails

Problem:
- Sub-query 2 needs sub-query 1's result (the date)
- Parallel execution breaks the dependency chain
Dependency graph:

[Q1: Firing date] ──→ [Q2: Stock on that date]
        ↓
   2023-11-17 (this value is needed for Q2)
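A minimal sketch of respecting that dependency: run the hops in order and inject the extracted date into the second query. Here search and extract_date are hypothetical stand-ins for the retriever and an answer-extraction step.

# Hypothetical helpers: search is your retriever, extract_date pulls the date
# out of the first hop's answer (e.g. via an LLM call or a regex).
def answer_with_dependency(search, extract_date):
    q1 = "When was OpenAI's CEO fired?"
    a1 = search(q1)                  # e.g. "Sam Altman was fired on 2023-11-17"
    firing_date = extract_date(a1)   # "2023-11-17"

    # The dependent hop is only issued once the date is known.
    q2 = f"How was Microsoft's stock on {firing_date}?"
    return search(q2)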
query = "When Company A's CEO moved to Company B, how did both companies' stocks react?"
# Wrongly decomposed as circular
decomposed = [
    "When did Company A's CEO move to Company B?",
    "How was Company B's stock then?",         # Depends on Q1
    "How was Company A's stock then?",         # Depends on Q1
    "What's the correlation between the two?"  # Depends on Q2, Q3
]

# Q2 and Q3 only depend on Q1 → Can be parallel
# But if LLM wrongly detects circular dependency → Deadlock

2-C. Serial Processing of Parallelizable Queries
query = "Compare Tesla and BYD sales in 2023"
# Serial processing (inefficient)
step1 = search("2023 Tesla sales")
step2 = search("2023 BYD sales")  # Waits for step1 to complete
answer = compare(step1, step2)

# Parallel processing (efficient)
results = parallel_search([
    "2023 Tesla sales",
    "2023 BYD sales",
])
answer = compare(*results)

Problem:
- Processing independent queries serially doubles latency
- Performance bottleneck at scale
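The parallel_search helper above is left undefined; a minimal sketch with a thread pool, reasonable when search is I/O-bound (search itself is a placeholder for your retriever):

from concurrent.futures import ThreadPoolExecutor

def parallel_search(queries, search):
    """Run independent sub-queries concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        return list(pool.map(search, queries))

# Example with a stub retriever; swap the lambda for a real retrieval call.
results = parallel_search(
    ["2023 Tesla sales", "2023 BYD sales"],
    search=lambda q: f"<documents for: {q}>",
)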
Failure Pattern #3: Grounding Failures
3-A. Query-Document Mismatch
query = "When was the first layoff after Elon Musk acquired Twitter?"
# Well decomposed
decomposed = [
    "When did Elon Musk acquire Twitter?",
    "When was Twitter's first mass layoff?",
]

# Search fails
search("Twitter first mass layoff")
# → 0 results (documents stored as "X" or "Twitter layoffs")

Problem:
- User query expression ≠ Document expression
- "Twitter" vs "X", "layoff" vs "workforce reduction"
Solution pattern:
def expand_query(query):
    synonyms = {
        "Twitter": ["Twitter", "X", "Twitter (X)"],
        "layoff": ["layoffs", "workforce reduction", "job cuts", "fired"],
    }
    return generate_variations(query, synonyms)
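generate_variations is left undefined above; a minimal sketch assuming plain substring substitution is enough (a production system would more likely use alias tables or an LLM rewriter):

def generate_variations(query, synonyms):
    """Produce query variants by swapping each known term for its synonyms.

    Sketch only: substring replacement ignores word boundaries and casing.
    """
    variations = {query}
    for term, alternatives in synonyms.items():
        if term in query:
            for alt in alternatives:
                variations.add(query.replace(term, alt))
    return list(variations)

# Yields variants such as "X first mass layoff" and "Twitter first mass workforce reduction"
print(generate_variations(
    "Twitter first mass layoff",
    {"Twitter": ["Twitter", "X"], "layoff": ["layoffs", "workforce reduction"]},
))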
query = "Which new feature announced by Apple's CEO at WWDC got the best reception?"
# Decomposition
decomposed = [
    "Who is Apple's CEO?",                         # → Tim Cook
    "What new features were announced at WWDC?",   # → Vision Pro, iOS 18, ...
    "Which feature got the best reception?",       # → By what metric? Which WWDC?
]

# Entity connection failure
# "Tim Cook" ↔ "Apple's CEO" connected ✓
# "WWDC" → Which year? Not connected
# "reception" → Stock? Tweets? Reviews? Ambiguous metric

Problems:
- Can't connect different expressions of the same entity
- Implicit context (year, metric) not propagated
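A minimal sketch of alias-based Entity Resolution, assuming a hand-maintained alias table; real systems typically back this with a knowledge base or an entity-linking model, and the WWDC-year default below is an assumption:

# Hypothetical alias table: surface forms → canonical entity.
ALIASES = {
    "apple's ceo": "Tim Cook",
    "tim cook": "Tim Cook",
    "wwdc": "WWDC 2024",  # assumption: default to the latest event when the year is implicit
}

def resolve_entities(text):
    """Return canonical entities mentioned in text (sketch: substring matching)."""
    lowered = text.lower()
    return {canonical for alias, canonical in ALIASES.items() if alias in lowered}

# Both phrasings now resolve to the same entity, so hops can be joined.
print(resolve_entities("Who is Apple's CEO?"))                  # {'Tim Cook'}
print(resolve_entities("What did Tim Cook announce at WWDC?"))  # {'Tim Cook', 'WWDC 2024'} (order may vary)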
3-C. Intermediate Result Loss
# Step 1
q1 = "OpenAI CEO firing date"
a1 = "Sam Altman was fired on November 17, 2023"

# Step 2 (only partial a1 used)
q2 = "Microsoft reaction on November 17, 2023"  # "Sam Altman" info lost

# Step 3 (worse loss)
q3 = "Impact of that reaction"  # Date, person, company all lost

Problem:
- Rich context from previous steps not passed to next steps
- Each hop runs independently, diluting context
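A minimal sketch of the fix: carry a structured state of key facts across hops and phrase each follow-up query from it. search and extract_facts are hypothetical helpers for retrieval and key-information extraction.

def run_hops(hop_templates, search, extract_facts):
    """Run hops in order while accumulating key facts for later hops.

    Sketch only: extract_facts is assumed to pull dates, people, and companies
    out of each answer so the next hop keeps the full context.
    """
    state = {}   # e.g. {"person": "Sam Altman", "date": "2023-11-17"}
    answers = []
    for template in hop_templates:
        query = template.format(**state)     # e.g. "Microsoft reaction on {date} to {person}'s firing"
        answer = search(query)
        state.update(extract_facts(answer))  # carry rich context forward instead of dropping it
        answers.append(answer)
    return answers, state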
Solutions: Pattern-Specific Strategies
Decomposition Failure Solutions
class SmartDecomposer:
    def decompose(self, query):
        # 1. Extract entities
        entities = self.extract_entities(query)

        # 2. Extract relations
        relations = self.extract_relations(query)

        # 3. Normalize temporal expressions
        query = self.normalize_temporal(query)  # "last year" → "2024"

        # 4. Generate sub-queries (entities × relations)
        sub_queries = []
        for entity in entities:
            for relation in relations:
                if self.is_relevant(entity, relation):
                    sub_queries.append(
                        self.generate_sub_query(entity, relation)
                    )

        # 5. Check over-decomposition
        if len(sub_queries) > len(entities) * 2:
            sub_queries = self.merge_similar(sub_queries)

        return sub_queries
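merge_similar is referenced but not defined; a minimal sketch using difflib from the standard library (embedding similarity would be the more realistic choice):

from difflib import SequenceMatcher

def merge_similar(sub_queries, threshold=0.9):
    """Drop sub-queries that are near-duplicates of one already kept.

    Sketch only: string similarity is a weak proxy for semantic similarity.
    """
    kept = []
    for query in sub_queries:
        if all(SequenceMatcher(None, query, existing).ratio() < threshold
               for existing in kept):
            kept.append(query)
    return kept

# → ['Did Tesla cut prices?', 'When did Tesla cut prices?'] (exact duplicate dropped)
print(merge_similar(["Did Tesla cut prices?", "Did Tesla cut prices?", "When did Tesla cut prices?"]))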
Sequencing Failure Solutions

class DependencyAwareSequencer:
    def sequence(self, sub_queries):
        # 1. Build dependency graph
        graph = self.build_dependency_graph(sub_queries)

        # 2. Check circular dependencies
        if self.has_cycle(graph):
            graph = self.break_cycle(graph)

        # 3. Topological sort for execution order
        execution_order = self.topological_sort(graph)

        # 4. Group parallelizable queries
        parallel_groups = self.group_independent(execution_order)
        return parallel_groups

    def build_dependency_graph(self, queries):
        """Identify dependencies between sub-queries"""
        graph = {}
        for i, q in enumerate(queries):
            deps = []
            for j, other in enumerate(queries):
                if i != j and self.depends_on(q, other):
                    deps.append(j)
            graph[i] = deps
        return graph
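The topological sort and parallel grouping don't need custom code; a minimal sketch with Python's standard graphlib, which also reports circular dependencies via CycleError:

from graphlib import CycleError, TopologicalSorter

def parallel_groups(graph):
    """Group sub-queries into batches that can run concurrently.

    graph maps each query index to the indices it depends on,
    matching the output of build_dependency_graph above.
    """
    sorter = TopologicalSorter(graph)
    try:
        sorter.prepare()
    except CycleError as err:
        raise ValueError(f"Circular dependency detected: {err.args[1]}") from err

    groups = []
    while sorter.is_active():
        ready = sorter.get_ready()   # everything whose dependencies are satisfied
        groups.append(list(ready))
        sorter.done(*ready)
    return groups

# Q2 and Q3 both depend only on Q1; Q4 depends on Q2 and Q3.
print(parallel_groups({0: [], 1: [0], 2: [0], 3: [1, 2]}))  # → [[0], [1, 2], [3]]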
Grounding Failure Solutions

class RobustGrounder:
    def ground(self, sub_query, previous_results):
        # 1. Inject context
        enriched_query = self.inject_context(sub_query, previous_results)

        # 2. Expand query (synonyms, aliases)
        expanded_queries = self.expand_synonyms(enriched_query)

        # 3. Multi-search strategy
        results = []
        for eq in expanded_queries:
            results.extend(self.search(eq))

        # 4. Verify results
        verified = self.verify_relevance(results, sub_query)

        # 5. Entity Resolution
        resolved = self.resolve_entities(verified, previous_results)
        return resolved

    def inject_context(self, query, previous):
        """Inject previous results' context into current query"""
        context = self.extract_key_info(previous)
        return f"{query} (context: {context})"

Practical Debugging Checklist
Step 1: Decomposition Verification

def debug_decomposition(original, decomposed):
    checks = {
        "over_decomposed": len(decomposed) > 5,
        "under_decomposed": any(is_still_complex(q) for q in decomposed),
        "temporal_lost": has_temporal(original) and not any(has_temporal(q) for q in decomposed),
        "entity_missing": missing_entities(original, decomposed),
    }
    return {k: v for k, v in checks.items() if v}
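has_temporal and is_still_complex are referenced but not defined; a minimal keyword-based sketch (an LLM judge or a dependency parse would be more robust):

import re

TEMPORAL_PATTERN = re.compile(
    r"\b(when|after|before|following|prior to|last year|recently|\d{4})\b", re.IGNORECASE
)
RELATION_PATTERN = re.compile(r"\b(when|after|before|following|prior to)\b", re.IGNORECASE)

def has_temporal(text):
    """True if the text contains an explicit or relative time reference."""
    return bool(TEMPORAL_PATTERN.search(text))

def is_still_complex(sub_query):
    """Heuristic: still multi-hop if it keeps a temporal chain or joins two asks with 'and'."""
    return bool(RELATION_PATTERN.search(sub_query)) or " and " in sub_query.lower()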
Step 2: Sequencing Verification

def debug_sequencing(decomposed, execution_order):
    checks = {
        "broken_dependency": has_broken_dependencies(decomposed, execution_order),
        "unnecessary_serial": has_unnecessary_serial(decomposed, execution_order),
        "circular_dependency": has_circular_dependency(decomposed),
    }
    return {k: v for k, v in checks.items() if v}

Step 3: Grounding Verification
def debug_grounding(sub_query, search_results, expected):
    checks = {
        "no_results": len(search_results) == 0,
        "low_relevance": avg_relevance(search_results) < 0.5,
        "entity_mismatch": entity_mismatch(sub_query, search_results),
        "context_lost": context_lost(sub_query, expected),
    }
    return {k: v for k, v in checks.items() if v}

Integrated Debugging Checklist
1. Decomposition Check
- Sub-query count appropriate? (2-5)
- Each sub-query solvable with single search?
- Time/conditions explicit?
2. Sequencing Check
- Dependency graph is DAG?
- Parallelizable queries grouped?
- Execution order respects dependencies?
3. Grounding Check
- Each sub-query has search results?
- Previous result context propagated?
- Entities consistently connected?
Conclusion
Query Planning failure isn't simply "bad question decomposition." You need to diagnose which of the three stages—Decomposition, Sequencing, or Grounding—broke down.
✓ Decomposition: Appropriate sub-query count, explicit conditions
✓ Sequencing: Respect dependencies, optimize parallelization
✓ Grounding: Query expansion, context propagation, Entity Resolution

Multi-hop RAG performance depends more on the robustness of the Query Planning pipeline than on the retrieval model or LLM.