Retrieval Planning: ReAct vs Self-Ask vs Plan-and-Solve

Now that we've diagnosed Query Planning failures, it's time to fix them. This post compares three planning patterns and shows when each one shines.
Why Retrieval Planning?
In the previous post, we examined three failure points in Query Planning:
- Decomposition: Breaking questions incorrectly
- Sequencing: Wrong execution order
- Grounding: Queries not matching documents
Three main approaches solve these problems:
Pattern 1: ReAct (Reasoning + Acting)
Core Structure
Thought → Action → Observation → Thought → Action → ... → Answer

ReAct alternates between reasoning and acting at each step. It decides the next action based on search results, adapting flexibly to unexpected situations.
How It Works
```python
class ReActAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str, max_steps: int = 5) -> str:
        context = f"Question: {query}\n"
        for step in range(max_steps):
            # 1. Thought: Reason about what to do next
            thought = self.llm.generate(
                f"{context}\nThought {step+1}:"
            )
            context += f"Thought {step+1}: {thought}\n"
            # Check termination
            if "Final Answer:" in thought:
                return self.extract_answer(thought)
            # 2. Action: Decide the search query
            action = self.llm.generate(
                f"{context}\nAction {step+1}: Search["
            )
            search_query = action.split("]")[0]
            context += f"Action {step+1}: Search[{search_query}]\n"
            # 3. Observation: Execute search and observe results
            results = self.retriever.search(search_query)
            observation = self.format_results(results)
            context += f"Observation {step+1}: {observation}\n"
        return "Could not find answer within max steps"
```

Execution Example
Question: What did Microsoft's CEO say when OpenAI's CEO was fired?
Thought 1: I need to find out when OpenAI's CEO was fired first.
Action 1: Search[OpenAI CEO fired date]
Observation 1: Sam Altman was fired by OpenAI's board on November 17, 2023.
Thought 2: Now I need to find what Microsoft's CEO said on that date.
Action 2: Search[Satya Nadella November 17 2023 Sam Altman]
Observation 2: Satya Nadella expressed support for Sam Altman and...
Thought 3: I have enough information to answer.
Final Answer: Satya Nadella expressed support for Sam Altman...
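The `extract_answer` helper referenced in `ReActAgent` is left undefined above. A minimal sketch, assuming the model emits a literal `Final Answer:` marker as in the trace, might be:

```python
def extract_answer(thought: str) -> str:
    """Return the text after the 'Final Answer:' marker, if present."""
    marker = "Final Answer:"
    idx = thought.find(marker)
    if idx == -1:
        return thought.strip()  # No marker: fall back to the whole thought
    return thought[idx + len(marker):].strip()
```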
When to Use
- Questions are unpredictable (diverse domains, open-ended)
- Strategy needs to change based on results
- Debugging is important (need to trace reasoning)
Pattern 2: Self-Ask
Core Structure
Question → Follow-up Question → Intermediate Answer → ... → Final Answer

Self-Ask repeatedly asks "What do I need to know first to answer this?" It explicitly generates sub-questions, answers each, then combines them into the final answer.
How It Works
```python
class SelfAskAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        context = f"Question: {query}\n"
        context += "Are follow-up questions needed here: "
        while True:
            # Decide whether a follow-up is needed
            needs_followup = self.llm.generate(context)
            # startswith avoids false matches on words containing "No" (e.g. "November")
            if needs_followup.strip().startswith("No") or "Final Answer" in needs_followup:
                # Generate the final answer
                return self.llm.generate(
                    f"{context}\nSo the final answer is:"
                )
            # Generate a follow-up question
            context += "Yes.\n"
            followup = self.llm.generate(
                f"{context}Follow-up question:"
            )
            context += f"Follow-up question: {followup}\n"
            # Search and answer the follow-up
            results = self.retriever.search(followup)
            intermediate = self.generate_intermediate_answer(followup, results)
            context += f"Intermediate answer: {intermediate}\n"
            context += "Are follow-up questions needed here: "
```

Execution Example
Question: Who was CEO before Sam Altman returned?
Are follow-up questions needed here: Yes.
Follow-up question: When did Sam Altman return as OpenAI CEO?
Intermediate answer: He returned on November 22, 2023.
Are follow-up questions needed here: Yes.
Follow-up question: Who was OpenAI CEO just before November 22, 2023?
Intermediate answer: Emmett Shear was interim CEO from November 20.
Are follow-up questions needed here: No.
So the final answer is: The CEO before Sam Altman's return was Emmett Shear.
When to Use
- Chain-structured multi-hop questions (A → B → C)
- Need to cache or verify intermediate results
- Question decomposition structure is clear
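Caching intermediate results (the second bullet) can be as simple as memoizing answers keyed on a normalized sub-question. A hypothetical sketch, where `answer_fn` stands in for the retrieve-and-answer step:

```python
import re

class IntermediateAnswerCache:
    """Memoize sub-question answers so repeated follow-ups skip retrieval."""

    def __init__(self, answer_fn):
        self.answer_fn = answer_fn  # e.g. searches and answers one follow-up
        self.cache = {}
        self.hits = 0

    def _key(self, question: str) -> str:
        # Normalize case, punctuation, and whitespace so near-duplicates collide
        return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()

    def answer(self, question: str) -> str:
        key = self._key(question)
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = self.answer_fn(question)
        return self.cache[key]
```

When the same sub-question recurs across hops (or across user queries), it costs one retrieval instead of several.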
Pattern 3: Plan-and-Solve
Core Structure
Question → Plan (all steps) → Execute Step 1 → Execute Step 2 → ... → Answer

Plan-and-Solve creates a complete plan first, then executes it sequentially. Dependencies and parallelization opportunities are identified during planning.
How It Works
```python
class PlanAndSolveAgent:
    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def run(self, query: str) -> str:
        # 1. Planning: Create the complete plan
        plan = self.create_plan(query)
        # 2. Execution: Follow the plan
        results = {}
        for step in plan.steps:
            # Inject results from dependent steps
            resolved_query = self.resolve_dependencies(step, results)
            # Execute search
            search_results = self.retriever.search(resolved_query)
            results[step.id] = self.extract_answer(step, search_results)
        # 3. Synthesis: Combine results
        return self.synthesize(query, results)

    def create_plan(self, query: str) -> Plan:
        prompt = f"""
Question: {query}
Create a step-by-step plan to answer this question.
For each step, specify:
- step_id: unique identifier
- query: what to search for
- depends_on: list of step_ids this depends on (empty if none)
Output as JSON.
"""
        plan_json = self.llm.generate(prompt)
        return Plan.from_json(plan_json)
```

Execution Example
Question: How did Tesla's stock and competitors react after they cut prices?
=== PLANNING PHASE ===
{
"steps": [
{"id": "s1", "query": "Tesla price cut date", "depends_on": []},
{"id": "s2", "query": "Tesla stock reaction {s1.date}", "depends_on": ["s1"]},
{"id": "s3", "query": "Competitor reaction Tesla price cut", "depends_on": ["s1"]},
{"id": "s4", "query": "Synthesis", "depends_on": ["s2", "s3"]}
]
}
=== EXECUTION PHASE ===
Step s1: Tesla cut prices on January 13, 2023.
Step s2 (parallel): Tesla stock rose 8%.
Step s3 (parallel): Competitors responded with their own price cuts.
Step s4: [Synthesis]
=== FINAL ANSWER ===
After Tesla's January 2023 price cut, its stock rose 8% and competitors responded with their own cuts.
When to Use
- Question structure is predictable
- Need parallel processing for speed
- Need to review/approve plan before execution
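The `resolve_dependencies` step referenced in `PlanAndSolveAgent` is not shown above. One minimal sketch resolves `{step_id}` placeholders from a dict of completed step results; note the execution example's `{s1.date}` implies structured per-step results, while this sketch flattens each step's result to a single string:

```python
import re

def resolve_dependencies(query_template: str, results: dict) -> str:
    """Replace {step_id} placeholders with answers from completed steps."""
    def substitute(match: re.Match) -> str:
        step_id = match.group(1)
        if step_id not in results:
            raise KeyError(f"Step {step_id} has not been executed yet")
        return str(results[step_id])
    return re.sub(r"\{(\w+)\}", substitute, query_template)
```

For example, `resolve_dependencies("Tesla stock reaction {s1}", {"s1": "January 13, 2023"})` yields `"Tesla stock reaction January 13, 2023"`. Raising on a missing step also catches sequencing bugs: a step executed before its dependency fails loudly instead of searching a literal placeholder.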
Pattern Comparison
Structural Comparison
ReAct: Think → Act → Observe → Think → Act → ... (loop)
Self-Ask: Question → Follow-up → Answer → Follow-up → ... (chain)
Plan-Solve: Plan all steps → Execute s1 → Execute s2 → ... (sequential/parallel)
Decision Flow
Assess question type
│
├─ Unpredictable, open-ended ──────────→ ReAct
│
├─ Clear chain structure (A→B→C) ──────→ Self-Ask
│
└─ Parallelizable, clear structure ────→ Plan-and-Solve

Hybrid Approaches: Mix Them in Practice
In production, hybrids are often more effective than pure patterns.
Plan-then-ReAct
```python
class HybridAgent:
    """Start with Plan-and-Solve, fall back to ReAct on failure"""

    def run(self, query: str) -> str:
        # 1. Try planning first
        plan = self.create_plan(query)
        # 2. Execute the plan, collecting per-step results
        results = {}
        for step in plan.steps:
            try:
                result = self.execute_step(step)
                if not self.is_valid(result):
                    raise InvalidResultError()
                results[step.id] = result
            except Exception:
                # 3. Fall back to ReAct on failure
                return self.react_fallback(query, step)
        return self.synthesize(results)

    def react_fallback(self, query: str, failed_step: Step) -> str:
        """Switch to ReAct mode for flexible resolution"""
        context = f"Original question: {query}\n"
        context += f"Failed at: {failed_step.query}\n"
        context += "Switching to exploratory mode...\n"
        return self.react_agent.run(context)
```

Self-Ask with Parallel Execution
```python
from functools import partial

class ParallelSelfAsk:
    """Decompose with Self-Ask, execute independent questions in parallel"""

    def run(self, query: str) -> str:
        # 1. Generate all follow-up questions first
        followups = self.generate_all_followups(query)
        # 2. Analyze dependencies
        deps = self.analyze_dependencies(followups)
        # 3. Parallel within a group, sequential across dependent groups
        results = {}
        for group in self.topological_groups(deps):
            # Pass callables, not results: a comprehension of
            # answer_followup(q) calls would already run sequentially
            group_results = parallel_execute(
                [partial(self.answer_followup, q) for q in group]
            )
            results.update(group_results)
        return self.synthesize(query, results)
```

Implementation Tips
1. Clear Termination Conditions
```python
# Prevent infinite loops in ReAct
MAX_STEPS = 7
CONFIDENCE_THRESHOLD = 0.8

def should_stop(thought: str, step: int, confidence: float) -> bool:
    if step >= MAX_STEPS:
        return True
    if "Final Answer" in thought:
        return True
    if confidence > CONFIDENCE_THRESHOLD:
        return True
    return False
```

2. Search Failure Handling
```python
from typing import List

def search_with_fallback(query: str) -> List[Document]:
    # Primary: Exact search
    results = retriever.search(query)
    if results:
        return results
    # Secondary: Query expansion
    expanded = expand_query(query)
    results = retriever.search(expanded)
    if results:
        return results
    # Tertiary: Keyword extraction
    keywords = extract_keywords(query)
    return retriever.search(" ".join(keywords))
```

3. Context Compression
```python
def compress_context(context: str, max_tokens: int = 2000) -> str:
    """Compress long context to save tokens"""
    if count_tokens(context) <= max_tokens:
        return context
    # Keep only the most recent steps verbatim
    steps = parse_steps(context)
    recent = steps[-3:]  # Last 3 steps
    # Summarize earlier steps
    summary = summarize(steps[:-3])
    return f"[Summary of earlier steps: {summary}]\n" + format_steps(recent)
```

Conclusion
The three patterns are complementary, not competing.
ReAct: Maximum flexibility, strong for unpredictable questions
Self-Ask: Structured decomposition, optimal for chain questions
Plan-Solve: Maximum efficiency, enables parallelization and review

Production Recommendations:
- Default to Plan-and-Solve (efficient)
- Fall back to ReAct on plan failure (flexible)
- Use Self-Ask for clear chain questions (structured)
Multi-hop RAG performance ultimately depends on choosing the right pattern for the situation.
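The recommendations above can be sketched as a simple dispatcher. The keyword heuristics here are illustrative placeholders only; in practice you would classify the question with an LLM or a trained router:

```python
def choose_pattern(question: str) -> str:
    """Pick a planning pattern per the recommendations above.

    The keyword checks are stand-ins for a real question classifier.
    """
    q = question.lower()
    # Chain-shaped questions (A -> B -> C) suit Self-Ask
    chain_markers = ["before", "after", "who was", "at the time"]
    if any(marker in q for marker in chain_markers):
        return "self-ask"
    # Open-ended, unpredictable questions suit ReAct
    open_markers = ["why", "how did", "explain"]
    if any(marker in q for marker in open_markers):
        return "react"
    # Default: Plan-and-Solve, with ReAct as a runtime fallback
    return "plan-and-solve"
```

Whatever replaces the heuristics, keeping the router as a separate, testable function makes the pattern choice observable, which matters when debugging multi-hop failures.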