ReAct Paper: Synergizing Reasoning and Acting
This lesson provides an in-depth analysis of the influential ReAct paper ("ReAct: Synergizing Reasoning and Acting in Language Models", Yao et al., ICLR 2023), which introduced a new paradigm for how language models solve complex tasks.
Paper Overview
The ReAct paper demonstrates that language models can achieve better performance on complex tasks by interleaving reasoning traces with task-specific actions.
Core Thesis: Combining reasoning (thinking) with acting (doing) enables language models to solve tasks that require both planning and interaction with external environments.
Key Contributions
1. The ReAct Framework
The paper introduces a simple yet powerful prompting method:
Thought: <reasoning about current state and what to do next>
Action: <action to take>
Observation: <result of action>
... (repeat until task complete)
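The loop this template implies can be sketched in a few lines of Python. The two helpers below are illustrative stubs, not the paper's code: they stand in for an LLM call and a tool call respectively.

def generate_thought_and_action(context: str) -> tuple[str, str]:
    """Stub: a real agent would call an LLM here."""
    return "I have enough information to answer.", "Finish[stub answer]"

def execute(action: str) -> str:
    """Stub: a real agent would call a tool (e.g., Wikipedia search) here."""
    return f"[observation for {action}]"

def react_loop(task: str, max_steps: int = 7) -> str:
    """Minimal sketch of the ReAct control loop."""
    context = task
    for step in range(1, max_steps + 1):
        thought, action = generate_thought_and_action(context)
        if action.startswith("Finish["):
            return action[7:-1]  # extract the answer and stop
        observation = execute(action)
        # Append the full step so the next iteration sees the whole history
        context += (f"\nThought {step}: {thought}"
                    f"\nAction {step}: {action}"
                    f"\nObservation {step}: {observation}")
    return "No answer within the step budget"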
2. Empirical Results
The paper demonstrates ReAct's effectiveness across multiple benchmarks:
- HotpotQA: Multi-hop question answering
- FEVER: Fact verification
- ALFWorld: Interactive decision-making
- WebShop: Web navigation and shopping
Core Methodology
Prompt Structure
The paper uses carefully designed prompts that include:
def create_react_prompt(question: str, examples: list) -> str:
"""
Create ReAct prompt following the paper's methodology.
The prompt includes:
1. Task description
2. Few-shot examples showing thought-action-observation patterns
3. The actual question to solve
"""
prompt = """Solve a question answering task with interleaving Thought, Action, Observation steps.
Thought can reason about the current situation, and Action can be three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
Here are some examples:
"""
# Add few-shot examples
for example in examples:
prompt += f"\n{example}\n"
# Add the question
prompt += f"\nQuestion: {question}\n"
prompt += "Thought 1:"
return prompt
# Example from the paper - HotpotQA
few_shot_example_1 = """
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector extends into, then find the elevation range of that area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector extends into the High Plains. I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions: (1) the Great Plains, or (2) the High Plains (United States).
Thought 4: I need to search High Plains (United States) instead.
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]
"""
Comparison with Baselines
The paper compares ReAct against several baselines:
from dataclasses import dataclass
from typing import List, Optional
@dataclass
class MethodComparison:
"""Comparison of different reasoning methods."""
method: str
uses_reasoning: bool
uses_actions: bool
success_rate: float
avg_steps: float
# Illustrative comparison (the numbers below are directional;
# see the paper's Table 1 for exact per-benchmark figures)
comparisons = [
MethodComparison(
method="Standard Prompting",
uses_reasoning=False,
uses_actions=False,
success_rate=0.34,
avg_steps=1
),
MethodComparison(
method="Chain-of-Thought (CoT)",
uses_reasoning=True,
uses_actions=False,
success_rate=0.41,
avg_steps=1
),
MethodComparison(
method="Act-only",
uses_reasoning=False,
uses_actions=True,
success_rate=0.46,
avg_steps=5.2
),
MethodComparison(
method="ReAct (Reasoning + Acting)",
uses_reasoning=True,
uses_actions=True,
success_rate=0.60,
avg_steps=5.8
)
]
def analyze_results():
"""Analyze performance across methods."""
print("Method Performance Comparison")
print("=" * 70)
for comp in comparisons:
print(f"\n{comp.method}:")
print(f" Reasoning: {'✓' if comp.uses_reasoning else '✗'}")
print(f" Actions: {'✓' if comp.uses_actions else '✗'}")
print(f" Success Rate: {comp.success_rate:.1%}")
print(f" Avg Steps: {comp.avg_steps}")
# Key insight
print("\n" + "=" * 70)
print("KEY INSIGHT: ReAct outperforms both CoT-only and Act-only approaches")
print("by combining reasoning with action-taking capabilities.")
analyze_results()
Paper Finding: ReAct's advantage is clearest when reasoning and acting are combined: it outperforms both Chain-of-Thought and Act-only prompting on FEVER, and on HotpotQA the strongest results come from combining ReAct with CoT self-consistency (see the paper's Table 1 for exact figures).
Implementation from Paper
Here's a faithful implementation of the paper's approach:
from openai import OpenAI  # requires openai>=1.0
from typing import List, Tuple, Optional
import re
class PaperReActAgent:
"""
ReAct agent implementation following the original paper.
"""
    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        max_steps: int = 7,
        temperature: float = 0.0
    ):
        self.model = model
        self.max_steps = max_steps
        self.temperature = temperature
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
def run(
self,
question: str,
few_shot_examples: List[str]
) -> Tuple[str, List[dict]]:
"""
Run ReAct on a question with few-shot examples.
Args:
question: The question to answer
few_shot_examples: List of example trajectories
Returns:
Tuple of (final_answer, trajectory)
"""
trajectory = []
context = self._build_initial_prompt(question, few_shot_examples)
        for step in range(1, self.max_steps + 1):
            # Generate thought and action
            response = self._generate_step(context, step)
            # The prompt ends with "Thought {step}:", so the model usually
            # omits that prefix; restore it so parsing stays consistent
            if not response.startswith(f"Thought {step}"):
                response = f"Thought {step}: {response}"
            # Parse thought and action
            thought, action = self._parse_response(response, step)
            if thought:
                trajectory.append({"type": "thought", "step": step, "content": thought})
            if action is None:
                # No parseable action; stop instead of looping on malformed output
                break
            trajectory.append({"type": "action", "step": step, "content": action})
            # Check if finished
            if action.startswith("Finish["):
                answer = action[7:-1]  # extract answer from Finish[...]
                return answer, trajectory
            # Execute action and get observation
            observation = self._execute_action(action)
            trajectory.append({"type": "observation", "step": step, "content": observation})
            # Update context, stripping the restored prefix since the
            # context already ends with "Thought {step}:"
            continuation = response[len(f"Thought {step}:"):].lstrip()
            context += f" {continuation}\nObservation {step}: {observation}\n"
            context += f"Thought {step + 1}:"
return "Failed to complete task", trajectory
def _build_initial_prompt(
self,
question: str,
examples: List[str]
) -> str:
"""Build initial prompt with examples."""
prompt = """Solve a question answering task with interleaving Thought, Action, Observation steps.
Thought can reason about the current situation, and Action can be three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
"""
# Add examples
for example in examples:
prompt += f"{example}\n\n"
# Add question
prompt += f"Question: {question}\nThought 1:"
return prompt
    def _generate_step(self, context: str, step: int) -> str:
        """Generate the next thought and action via the chat completions API."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "user", "content": context}
            ],
            temperature=self.temperature,
            max_tokens=200,
            stop=[f"\nObservation {step}:"]
        )
        return response.choices[0].message.content.strip()
def _parse_response(
self,
response: str,
step: int
) -> Tuple[Optional[str], Optional[str]]:
"""Parse thought and action from response."""
thought = None
action = None
# Extract thought
thought_match = re.search(
f"Thought {step}: (.+?)(?=Action {step}:|$)",
response,
re.DOTALL
)
if thought_match:
thought = thought_match.group(1).strip()
# Extract action
action_match = re.search(
f"Action {step}: (.+?)$",
response,
re.DOTALL
)
if action_match:
action = action_match.group(1).strip()
return thought, action
def _execute_action(self, action: str) -> str:
"""
Execute action and return observation.
In production, integrate with Wikipedia API, etc.
"""
if action.startswith("Search["):
entity = action[7:-1]
return self._search_wikipedia(entity)
elif action.startswith("Lookup["):
keyword = action[7:-1]
return self._lookup_keyword(keyword)
else:
return "Invalid action format"
def _search_wikipedia(self, entity: str) -> str:
"""Search Wikipedia (simplified simulation)."""
# In production: use Wikipedia API
# import wikipedia
# return wikipedia.summary(entity, sentences=2)
return f"[Wikipedia search result for '{entity}' would appear here]"
def _lookup_keyword(self, keyword: str) -> str:
"""Lookup keyword in current passage (simplified)."""
return f"[Next sentence containing '{keyword}' would appear here]"
Key Insights from the Paper
1. Synergy Between Reasoning and Acting
The paper demonstrates that reasoning and acting are complementary:
- Reasoning helps acting: Thought traces guide which actions to take
- Acting helps reasoning: Observations ground reasoning in facts
class InsightDemonstrator:
"""Demonstrate key insights from the paper."""
@staticmethod
def demonstrate_synergy():
"""Show how reasoning and acting complement each other."""
print("INSIGHT 1: Reasoning Guides Acting")
print("-" * 50)
print("Thought: I need information about X, but the search")
print(" returned information about Y instead.")
print("Conclusion: Use Lookup to find X within the results,")
print(" or Search for a more specific term.")
print("\n\nINSIGHT 2: Acting Grounds Reasoning")
print("-" * 50)
print("Without action: Model might hallucinate facts")
print("With action: Model uses actual observations to reason")
print("Result: More factual and verifiable answers")
Paper Insight: The synergy between reasoning and acting reduces both task-solving errors and reasoning hallucinations.
2. Human-Like Problem Solving
ReAct mirrors human problem-solving patterns:
class HumanLikeBehavior:
"""Examples of human-like behaviors enabled by ReAct."""
behaviors = {
"error_recovery": {
"description": "Recognizing when an action failed and trying alternatives",
"example": """
Thought: The search didn't return the specific information I need.
Action: Lookup[more specific term]
"""
},
"information_gathering": {
"description": "Systematically collecting needed information",
"example": """
Thought: I need both the birth year and death year to calculate age.
Action: Search[person name]
... observe ...
Thought: I found birth year, now I need death year.
Action: Lookup[death]
"""
},
"plan_adjustment": {
"description": "Modifying strategy based on new information",
"example": """
Thought: My initial search was too broad. I need to be more specific.
Action: Search[more specific query]
"""
}
}
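Iterating over the table prints each behavior alongside its example trace:

for name, behavior in HumanLikeBehavior.behaviors.items():
    print(f"\n{name}: {behavior['description']}")
    print(behavior["example"])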
3. Interpretability
ReAct provides transparent decision-making:
def analyze_trajectory(trajectory: List[dict]) -> dict:
"""
Analyze a ReAct trajectory for interpretability.
This demonstrates one of the paper's key benefits:
the reasoning process is fully transparent.
"""
analysis = {
"total_steps": 0,
"thoughts": [],
"actions": [],
"observations": [],
"decision_points": []
}
for i, step in enumerate(trajectory):
analysis["total_steps"] += 1
if step["type"] == "thought":
analysis["thoughts"].append(step["content"])
# Identify decision points (when strategy changes)
if i > 0 and any(word in step["content"].lower()
for word in ["instead", "need to", "should"]):
analysis["decision_points"].append({
"step": step["step"],
"decision": step["content"]
})
elif step["type"] == "action":
analysis["actions"].append(step["content"])
elif step["type"] == "observation":
analysis["observations"].append(step["content"])
return analysis
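A small synthetic trajectory shows the analysis in action; the "instead" in the second thought is flagged as a decision point:

sample_trajectory = [
    {"type": "thought", "step": 1, "content": "I should search X first."},
    {"type": "action", "step": 1, "content": "Search[X]"},
    {"type": "observation", "step": 1, "content": "X is a broad topic."},
    {"type": "thought", "step": 2, "content": "Too broad. I should search Y instead."},
    {"type": "action", "step": 2, "content": "Finish[Y]"},
]
analysis = analyze_trajectory(sample_trajectory)
print(f"Entries: {analysis['total_steps']}")
print(f"Decision points: {analysis['decision_points']}")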
Limitations Discussed in Paper
The paper honestly addresses limitations:
Limitations:
- Requires access to external tools/APIs
- Performance depends on quality of few-shot examples
- Can be slower than direct prompting due to multiple LLM calls
- May get stuck in loops without proper termination logic
class LimitationMitigations:
"""Strategies to address ReAct limitations."""
@staticmethod
def prevent_loops(trajectory: List[dict], max_repeats: int = 3) -> bool:
"""Detect and prevent action loops."""
if len(trajectory) < max_repeats * 2:
return False
recent_actions = [
step["content"] for step in trajectory[-max_repeats*2:]
if step["type"] == "action"
]
# Check for repeated actions
if len(recent_actions) >= max_repeats:
if len(set(recent_actions[-max_repeats:])) == 1:
return True # Loop detected
return False
@staticmethod
def optimize_few_shot_examples(
examples: List[str],
question: str
) -> List[str]:
"""Select most relevant few-shot examples."""
# In practice: use embedding similarity
# Return top-k most similar examples
return examples[:2] # Simplified
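The example-selection stub can be made concrete without any external services. The sketch below ranks examples by simple token overlap (Jaccard similarity) with the question; this is a stand-in for the embedding-based similarity a production system would use:

from typing import List

def select_examples_by_overlap(
    examples: List[str],
    question: str,
    k: int = 2
) -> List[str]:
    """Rank few-shot examples by word overlap with the question."""
    question_tokens = set(question.lower().split())
    def jaccard(example: str) -> float:
        example_tokens = set(example.lower().split())
        union = question_tokens | example_tokens
        return len(question_tokens & example_tokens) / len(union) if union else 0.0
    # Highest-overlap examples first; keep the top k
    return sorted(examples, key=jaccard, reverse=True)[:k]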
Experimental Setup
The paper's experimental methodology:
from dataclasses import dataclass
from typing import Dict
@dataclass
class ExperimentConfig:
"""Configuration matching paper's experiments."""
dataset: str
model: str
temperature: float
max_steps: int
num_examples: int
evaluation_metric: str
paper_experiments = {
"hotpotqa": ExperimentConfig(
dataset="HotpotQA",
model="PaLM-540B",
temperature=0.0,
max_steps=7,
num_examples=6,
evaluation_metric="Exact Match + F1"
),
"fever": ExperimentConfig(
dataset="FEVER",
model="PaLM-540B",
temperature=0.0,
max_steps=7,
num_examples=6,
evaluation_metric="Accuracy"
),
"alfworld": ExperimentConfig(
dataset="ALFWorld",
model="PaLM-540B",
temperature=0.0,
max_steps=50,
num_examples=6,
evaluation_metric="Success Rate"
)
}
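Looping over the configurations gives a quick per-benchmark summary of the setup:

for config in paper_experiments.values():
    print(f"{config.dataset}: {config.num_examples}-shot, "
          f"max {config.max_steps} steps, metric: {config.evaluation_metric}")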
Key Takeaways
- Synergy matters: Combining reasoning and acting outperforms either alone
- Few-shot is powerful: ReAct works well with just 6 examples
- Interpretability: Explicit reasoning traces make decisions transparent
- Generality: Works across diverse tasks (QA, fact-checking, navigation)
- Foundation for agents: ReAct laid groundwork for modern AI agents
Quiz
Test your understanding of the ReAct paper: