Retrieval Planning: ReAct vs Self-Ask vs Plan-and-Solve

Now that we've diagnosed Query Planning failures, it's time to fix them. This post compares three planning patterns and shows when each one shines.
Why Retrieval Planning?
In the previous post, we examined three failure points in Query Planning:
- Decomposition: Breaking questions incorrectly
- Sequencing: Wrong execution order
- Grounding: Queries not matching documents
Three main approaches solve these problems:
Pattern 1: ReAct (Reasoning + Acting)
Core Structure
Thought → Action → Observation → Thought → Action → ... → Answer

ReAct alternates between reasoning and acting at each step. It decides the next action based on search results, adapting flexibly to unexpected situations.
How It Works
```python
class ReActAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str, max_steps: int = 5) -> str:
        context = f"Question: {query}\n"
        for step in range(max_steps):
            # 1. Thought: Reason about what to do next
            thought = self.llm.generate(
                f"{context}\nThought {step+1}:"
            )
            context += f"Thought {step+1}: {thought}\n"
            # Check termination
            if "Final Answer:" in thought:
                return self.extract_answer(thought)
            # 2. Action: Decide the search query
            action = self.llm.generate(
                f"{context}\nAction {step+1}: Search["
            )
            search_query = action.split("]")[0]
            context += f"Action {step+1}: Search[{search_query}]\n"
            # 3. Observation: Execute search and observe results
            results = self.retriever.search(search_query)
            observation = self.format_results(results)
            context += f"Observation {step+1}: {observation}\n"
        return "Could not find answer within max steps"
```

Execution Example
Question: What did Microsoft's CEO say when OpenAI's CEO was fired?
Thought 1: I need to find out when OpenAI's CEO was fired first.
Action 1: Search[OpenAI CEO fired date]
Observation 1: Sam Altman was fired by OpenAI's board on November 17, 2023.
Thought 2: Now I need to find what Microsoft's CEO said on that date.
Action 2: Search[Satya Nadella November 17 2023 Sam Altman]
Observation 2: Satya Nadella expressed support for Sam Altman and...
Thought 3: I have enough information to answer.
Final Answer: Satya Nadella expressed support for Sam Altman...
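The `extract_answer` helper referenced in `ReActAgent` is left undefined above. A minimal sketch, assuming the model emits a literal `Final Answer:` marker as in the trace, might be:

```python
def extract_answer(thought: str) -> str:
    """Return the text after the 'Final Answer:' marker, if present."""
    marker = "Final Answer:"
    idx = thought.find(marker)
    if idx == -1:
        return thought.strip()  # No marker: fall back to the whole thought
    return thought[idx + len(marker):].strip()
```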
When to Use
- Questions are unpredictable (diverse domains, open-ended)
- Strategy needs to change based on results
- Debugging is important (need to trace reasoning)
Pattern 2: Self-Ask
Core Structure
Question → Follow-up Question → Intermediate Answer → ... → Final Answer

Self-Ask repeatedly asks "What do I need to know first to answer this?" It explicitly generates sub-questions, answers each, then combines them into the final answer.
How It Works
```python
class SelfAskAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        context = f"Question: {query}\n"
        context += "Are follow-up questions needed here: "
        while True:
            # Decide whether a follow-up is needed
            needs_followup = self.llm.generate(context)
            # startswith avoids false matches on words containing "No" (e.g. "November")
            if needs_followup.strip().startswith("No") or "Final Answer" in needs_followup:
                # Generate the final answer
                return self.llm.generate(
                    f"{context}\nSo the final answer is:"
                )
            # Generate a follow-up question
            context += "Yes.\n"
            followup = self.llm.generate(
                f"{context}Follow-up question:"
            )
            context += f"Follow-up question: {followup}\n"
            # Search and answer the follow-up
            results = self.retriever.search(followup)
            intermediate = self.generate_intermediate_answer(followup, results)
            context += f"Intermediate answer: {intermediate}\n"
            context += "Are follow-up questions needed here: "
```

Execution Example
Question: Who was CEO before Sam Altman returned?
Are follow-up questions needed here: Yes.
Follow-up question: When did Sam Altman return as OpenAI CEO?
Intermediate answer: He returned on November 22, 2023.
Are follow-up questions needed here: Yes.
Follow-up question: Who was OpenAI CEO just before November 22, 2023?
Intermediate answer: Emmett Shear was interim CEO from November 20.
Are follow-up questions needed here: No.
So the final answer is: The CEO before Sam Altman's return was Emmett Shear.
When to Use
- Chain-structured multi-hop questions (A → B → C)
- Need to cache or verify intermediate results
- Question decomposition structure is clear
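Caching intermediate results (the second bullet) can be as simple as memoizing answers keyed on a normalized sub-question. A hypothetical sketch, where `answer_fn` stands in for the retrieve-and-answer step:

```python
import re

class IntermediateAnswerCache:
    """Memoize sub-question answers so repeated follow-ups skip retrieval."""

    def __init__(self, answer_fn):
        self.answer_fn = answer_fn  # e.g. searches and answers one follow-up
        self.cache = {}
        self.hits = 0

    def _key(self, question: str) -> str:
        # Normalize case, punctuation, and whitespace so near-duplicates collide
        return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()

    def answer(self, question: str) -> str:
        key = self._key(question)
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = self.answer_fn(question)
        return self.cache[key]
```

When the same sub-question recurs across hops (or across user queries), it costs one retrieval instead of several.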
Pattern 3: Plan-and-Solve
Core Structure
Question → Plan (all steps) → Execute Step 1 → Execute Step 2 → ... → Answer

Plan-and-Solve creates a complete plan first, then executes it sequentially. Dependencies and parallelization opportunities are identified during planning.
How It Works
```python
class PlanAndSolveAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        # 1. Planning: Create the complete plan
        plan = self.create_plan(query)
        # 2. Execution: Follow the plan
        results = {}
        for step in plan.steps:
            # Inject results from dependent steps
            resolved_query = self.resolve_dependencies(step, results)
            # Execute search
            search_results = self.retriever.search(resolved_query)
            results[step.id] = self.extract_answer(step, search_results)
        # 3. Synthesis: Combine results
        return self.synthesize(query, results)

    def create_plan(self, query: str) -> Plan:
        prompt = f"""
Question: {query}
Create a step-by-step plan to answer this question.
For each step, specify:
- step_id: unique identifier
- query: what to search for
- depends_on: list of step_ids this depends on (empty if none)
Output as JSON.
"""
        plan_json = self.llm.generate(prompt)
        return Plan.from_json(plan_json)
```

Execution Example
Question: How did Tesla's stock and competitors react after they cut prices?
=== PLANNING PHASE ===
{
"steps": [
{"id": "s1", "query": "Tesla price cut date", "depends_on": []},
{"id": "s2", "query": "Tesla stock reaction {s1.date}", "depends_on": ["s1"]},
{"id": "s3", "query": "Competitor reaction Tesla price cut", "depends_on": ["s1"]},
{"id": "s4", "query": "Synthesis", "depends_on": ["s2", "s3"]}
]
}
=== EXECUTION PHASE ===
Step s1: Tesla cut prices on January 13, 2023.
Step s2 (parallel): Tesla stock rose 8%.
Step s3 (parallel): Competitors responded with their own price cuts.
Step s4: [Synthesis]
=== FINAL ANSWER ===
After Tesla's January 2023 price cut, its stock rose 8% and competitors responded with their own cuts.
When to Use
- Question structure is predictable
- Need parallel processing for speed
- Need to review/approve plan before execution
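The `resolve_dependencies` step referenced in `PlanAndSolveAgent` is not shown above. One minimal sketch resolves `{step_id}` placeholders from a dict of completed step results; note the execution example's `{s1.date}` implies structured per-step results, while this sketch flattens each step's result to a single string:

```python
import re

def resolve_dependencies(query_template: str, results: dict) -> str:
    """Replace {step_id} placeholders with answers from completed steps."""
    def substitute(match: re.Match) -> str:
        step_id = match.group(1)
        if step_id not in results:
            raise KeyError(f"Step {step_id} has not been executed yet")
        return str(results[step_id])
    return re.sub(r"\{(\w+)\}", substitute, query_template)
```

For example, `resolve_dependencies("Tesla stock reaction {s1}", {"s1": "January 13, 2023"})` yields `"Tesla stock reaction January 13, 2023"`. Raising on a missing step also catches sequencing bugs: a step executed before its dependency fails loudly instead of searching a literal placeholder.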
Pattern Comparison
Structural Comparison
ReAct: Think → Act → Observe → Think → Act → ... (loop)
Self-Ask: Question → Follow-up → Answer → Follow-up → ... (chain)
Plan-Solve: Plan all steps → Execute s1 → Execute s2 → ... (sequential/parallel)
Decision Flow
Assess question type
│
├─ Unpredictable, open-ended ──────────→ ReAct
│
├─ Clear chain structure (A→B→C) ──────→ Self-Ask
│
└─ Parallelizable, clear structure ────→ Plan-and-Solve

Hybrid Approaches: Mix Them in Practice
In production, hybrids are often more effective than pure patterns.
Plan-then-ReAct
```python
class HybridAgent:
    """Start with Plan-and-Solve, fall back to ReAct on failure"""

    def run(self, query: str) -> str:
        # 1. Try planning first
        plan = self.create_plan(query)
        # 2. Execute the plan, collecting per-step results
        results = {}
        for step in plan.steps:
            try:
                result = self.execute_step(step)
                if not self.is_valid(result):
                    raise InvalidResultError()
                results[step.id] = result
            except Exception:
                # 3. Fall back to ReAct on failure
                return self.react_fallback(query, step)
        return self.synthesize(results)

    def react_fallback(self, query: str, failed_step: Step) -> str:
        """Switch to ReAct mode for flexible resolution"""
        context = f"Original question: {query}\n"
        context += f"Failed at: {failed_step.query}\n"
        context += "Switching to exploratory mode...\n"
        return self.react_agent.run(context)
```

Self-Ask with Parallel Execution
```python
from functools import partial

class ParallelSelfAsk:
    """Decompose with Self-Ask, execute independent questions in parallel"""

    def run(self, query: str) -> str:
        # 1. Generate all follow-up questions first
        followups = self.generate_all_followups(query)
        # 2. Analyze dependencies
        deps = self.analyze_dependencies(followups)
        # 3. Parallel within a group, sequential across dependent groups
        results = {}
        for group in self.topological_groups(deps):
            # Pass callables, not results: a comprehension of
            # answer_followup(q) calls would already run sequentially
            group_results = parallel_execute(
                [partial(self.answer_followup, q) for q in group]
            )
            results.update(group_results)
        return self.synthesize(query, results)
```

Implementation Tips
1. Clear Termination Conditions
```python
# Prevent infinite loops in ReAct
MAX_STEPS = 7
CONFIDENCE_THRESHOLD = 0.8

def should_stop(thought: str, step: int, confidence: float) -> bool:
    if step >= MAX_STEPS:
        return True
    if "Final Answer" in thought:
        return True
    if confidence > CONFIDENCE_THRESHOLD:
        return True
    return False
```

2. Search Failure Handling
```python
from typing import List

def search_with_fallback(query: str) -> List[Document]:
    # Primary: Exact search
    results = retriever.search(query)
    if results:
        return results
    # Secondary: Query expansion
    expanded = expand_query(query)
    results = retriever.search(expanded)
    if results:
        return results
    # Tertiary: Keyword extraction
    keywords = extract_keywords(query)
    return retriever.search(" ".join(keywords))
```

3. Context Compression
```python
def compress_context(context: str, max_tokens: int = 2000) -> str:
    """Compress long context to save tokens"""
    if count_tokens(context) <= max_tokens:
        return context
    # Keep only the most recent steps verbatim
    steps = parse_steps(context)
    recent = steps[-3:]  # Last 3 steps
    # Summarize earlier steps
    summary = summarize(steps[:-3])
    return f"[Summary of earlier steps: {summary}]\n" + format_steps(recent)
```

Conclusion
The three patterns are complementary, not competing.
ReAct: Maximum flexibility, strong for unpredictable questions
Self-Ask: Structured decomposition, optimal for chain questions
Plan-Solve: Maximum efficiency, enables parallelization and review

Production Recommendations:
- Default to Plan-and-Solve (efficient)
- Fall back to ReAct on plan failure (flexible)
- Use Self-Ask for clear chain questions (structured)
Multi-hop RAG performance ultimately depends on choosing the right pattern for the situation.
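The recommendations above can be sketched as a simple dispatcher. The keyword heuristics here are illustrative placeholders only; in practice you would classify the question with an LLM or a trained router:

```python
def choose_pattern(question: str) -> str:
    """Pick a planning pattern per the recommendations above.

    The keyword checks are stand-ins for a real question classifier.
    """
    q = question.lower()
    # Chain-shaped questions (A -> B -> C) suit Self-Ask
    chain_markers = ["before", "after", "who was", "at the time"]
    if any(marker in q for marker in chain_markers):
        return "self-ask"
    # Open-ended, unpredictable questions suit ReAct
    open_markers = ["why", "how did", "explain"]
    if any(marker in q for marker in open_markers):
        return "react"
    # Default: Plan-and-Solve, with ReAct as a runtime fallback
    return "plan-and-solve"
```

Whatever replaces the heuristics, keeping the router as a separate, testable function makes the pattern choice observable, which matters when debugging multi-hop failures.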