LangGraph Research Agent Fundamentals: A Detailed Walkthrough

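📚 Overview

The Research Agent is the core execution unit of a Deep Research system: it does the actual information gathering. This document explains how the agent runs multiple rounds of search on its own, reflects on the results, and decides what to do next.

Core capabilities:

🔍 Autonomous search - uses the Tavily Search API
🤔 Reflective decision-making - uses think_tool to analyze results
📊 Context management - compresses research results to avoid token blow-up
⚡ Smart termination - knows when to stop searching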

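🎯 Core concept: the agent tool-calling loop

What is the agent loop?

At the heart of the agent pattern is a continuous decide-and-execute loop:

Research topic input
    ↓
┌─────────────────────────────────┐
│  LLM decision node              │
│  - Analyze the current state    │
│  - Decide which tools to call,  │
│    or give the final answer     │
└─────────────────────────────────┘
    ↓
Any tool calls?
    ├─ YES → Execute tools
    │         ↓
    │    think_tool reflection
    │         ↓
    │    Back to the LLM decision node
    │
    └─ NO → Compress research
              ↓
          End, return results

Key points:

🔄 Looped execution - the agent can run multiple rounds of search
🤔 Reflection every round - think_tool prevents blind searching
🛑 Smart termination - the LLM decides when it has enough information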

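⚠️ Core challenge: the spin-out problem

Problem description

Spin-out (the agent spinning out of control) is one of the most common failure modes of agent systems: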

python

# ❌ Spin-out example
Agent:
  Search 1: "best coffee SF"
  Search 2: "top coffee shops SF"
  Search 3: "SF coffee recommendations"
  Search 4: "best rated coffee SF"
  Search 5: "SF specialty coffee"
  Search 6: "coffee shops San Francisco"
  ... (20+ similar searches returning overlapping content)

Causes:

The agent is never satisfied with the results it already has
It keeps trying slightly reworded queries
Nothing forces it to stop
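
One way to make this failure visible in code is to measure how much each new query overlaps with earlier ones. The sketch below is illustrative only and is not part of the original agent: `tokenize`, `jaccard`, `is_near_duplicate`, and the 0.5 threshold are assumptions chosen for the example.

```python
import re


def tokenize(query: str) -> set[str]:
    """Lowercase a query and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", query.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(query: str, history: list[str], threshold: float = 0.5) -> bool:
    """Return True if the query overlaps heavily with any earlier query."""
    tokens = tokenize(query)
    return any(jaccard(tokens, tokenize(past)) >= threshold for past in history)


history = ["best coffee SF", "top coffee shops SF"]
# Shares 3 of 4 distinct words with "best coffee SF"
print(is_near_duplicate("best rated coffee SF", history))
# Shares only "coffee" with the earlier queries
print(is_near_duplicate("Coffee Review scores for Sightglass", history))
```

Word overlap is a crude signal: it does not recognize that "SF" and "San Francisco" mean the same thing, so a check like this would complement the prompt-level measures rather than replace them.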

Solution: a three-pronged approach

1. Hard limits

python

# State the limits explicitly in the prompt
"""
<Hard Limits>
- Simple queries: 2-3 search calls maximum
- Complex queries: Up to 5 search calls maximum
- Always stop: After 5 searches if you cannot find answers
</Hard Limits>
"""

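2. Forced reflection with think_tool

python

from langchain_core.tools import tool

@tool
def think_tool(reflection: str) -> str:
    """
    Force the agent to reflect after every search:
    - What key information did I find?
    - What is still missing?
    - Do I have enough information to answer the question?
    - Should I keep searching or give the answer?
    """
    return f"Reflection recorded: {reflection}"

Key point: think_tool creates a "thinking pause" so the agent evaluates its progress instead of pressing on blindly.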

3. Prompt engineering: concrete heuristics

python

research_agent_prompt = """
<Instructions>
Think like a human researcher with limited time:

1. **Read the question carefully** - What specific information is needed?
2. **Start with broader searches** - Use comprehensive queries first
3. **After each search, pause and assess** - Do I have enough? What's missing?
4. **Execute narrower searches** - Fill in the gaps
5. **Stop when you can answer confidently** - Don't search for perfection
</Instructions>

<Stop Immediately When>
- You can answer the question comprehensively
- You have 3+ relevant examples/sources
- Your last 2 searches returned similar information
</Stop Immediately When>
"""

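Before and after:

Scenario	Without optimization	With optimization
"Best coffee shops in SF"	20+ searches	3-4 searches
Search content	Highly repetitive	Progressively refined
Final quality	Information overload	Focused answer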

🔧 Core implementation

1. State definition

python

import operator
from typing import List, Sequence

from typing_extensions import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class ResearcherState(TypedDict):
    """State of the research agent."""
    # Message history (tool calls, tool results, and so on)
    researcher_messages: Annotated[Sequence[BaseMessage], add_messages]
    # Tool-call iteration counter (used to enforce limits)
    tool_call_iterations: int
    # Research topic
    research_topic: str
    # Compressed research findings
    compressed_research: str
    # Raw, uncompressed notes
    raw_notes: Annotated[List[str], operator.add]
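
The Annotated metadata in the state tells LangGraph how to merge a node's return value into the existing state: fields with a reducer are combined, while fields without one are overwritten. The following is a simplified, LangGraph-free imitation of that merge, written only to illustrate the idea. `DemoState` and `apply_update` are hypothetical names, and the real add_messages reducer does more (for example, matching messages by ID) than this sketch shows.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints


class DemoState(TypedDict):
    # Reducer attached: updates are appended with operator.add
    raw_notes: Annotated[List[str], operator.add]
    # No reducer: updates overwrite the previous value
    compressed_research: str


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, honoring reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types carry their extra arguments in __metadata__
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            reducer = metadata[0]
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"raw_notes": ["note from round 1"], "compressed_research": ""}
state = apply_update(state, {"raw_notes": ["note from round 2"]})
state = apply_update(state, {"compressed_research": "summary v1"})
print(state["raw_notes"])            # both notes are kept
print(state["compressed_research"])  # the latest value wins
```

This is why a node can return only the new notes or messages it produced: the reducer takes care of appending them to what is already there.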

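2. Research tool: Tavily Search

python

from tavily import TavilyClient
from langchain_core.tools import tool

# With no arguments, the client is expected to read the key from the TAVILY_API_KEY environment variable
tavily_client = TavilyClient()

@tool
def tavily_search(query: str) -> str:
    """
    Run a web search through the Tavily API.

    Handles automatically:
    - Running the search
    - Extracting content
    - Summarizing each page (stripping ads, navigation, and other noise)
    """
    # Run the search
    result = tavily_client.search(
        query,
        max_results=3,
        include_raw_content=True,  # fetch the full page content
        topic="general"
    )

    # Deduplicate by URL (helper defined elsewhere; returns a dict keyed by URL)
    unique_results = deduplicate_by_url(result['results'])

    # Summarize every page (important: this cuts token usage)
    summarized_results = {}
    for url, page in unique_results.items():
        if page.get("raw_content"):
            # Use an LLM to extract the key information
            summary = summarize_webpage_content(page['raw_content'])
        else:
            # No raw content available: fall back to Tavily's short snippet
            summary = page['content']

        summarized_results[url] = {
            'title': page['title'],
            'content': summary
        }

    # Format the output (helper defined elsewhere; returns a string)
    return format_search_output(summarized_results)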
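
tavily_search relies on two helpers, deduplicate_by_url and format_search_output, that this document does not show. The sketch below is one plausible shape for them, inferred only from how they are called above (a list of result dicts in, a URL-keyed dict out, then a single string). The actual implementations may differ, and the output layout here is an assumption.

```python
from typing import Dict, List


def deduplicate_by_url(results: List[dict]) -> Dict[str, dict]:
    """Keep the first result seen for each URL, preserving order."""
    unique: Dict[str, dict] = {}
    for page in results:
        url = page["url"]
        if url not in unique:
            unique[url] = page
    return unique


def format_search_output(summarized: Dict[str, dict]) -> str:
    """Render URL-keyed summaries as one numbered, source-labeled string."""
    if not summarized:
        return "No search results found."
    blocks = []
    for i, (url, page) in enumerate(summarized.items(), start=1):
        blocks.append(
            f"--- SOURCE {i}: {page['title']} ---\nURL: {url}\n\n{page['content']}"
        )
    return "\n\n".join(blocks)


# Hypothetical search results; the second entry repeats the first URL
results = [
    {"url": "https://example.com/a", "title": "A", "content": "first"},
    {"url": "https://example.com/a", "title": "A (dup)", "content": "again"},
    {"url": "https://example.com/b", "title": "B", "content": "second"},
]
print(format_search_output(deduplicate_by_url(results)))
```

Labeling each block with its URL keeps the source attached to the content, which matters later when the compression step is asked to preserve citations.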

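Key optimization: summarizing page content

python

from langchain_core.messages import HumanMessage
from pydantic import BaseModel, Field

# `llm` is assumed to be a chat model created elsewhere (for example with init_chat_model)

def summarize_webpage_content(webpage_content: str) -> str:
    """
    Compress a raw web page (which may contain a lot of noise) into a structured summary.

    This is the first compression layer of context engineering.
    """
    class Summary(BaseModel):
        summary: str = Field(description="Concise summary")
        key_excerpts: str = Field(description="Key excerpts")

    structured_model = llm.with_structured_output(Summary)
    result = structured_model.invoke([
        HumanMessage(content=f"""
        Web page content: {webpage_content}

        Extract:
        1. A summary of the core information (2-3 sentences)
        2. Key quotes and data points

        Ignore ads, navigation, and boilerplate
        """)
    ])

    return f"<summary>{result.summary}</summary>\n<key_excerpts>{result.key_excerpts}</key_excerpts>"

Why summarize?

A raw web page can run to 10k+ tokens
Most of it is useless (ads, navigation)
A summary usually needs only 200-500 tokens
Result: a compression ratio of roughly 20:1 or better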

3. Core agent nodes

LLM decision node

python

from langchain.chat_models import init_chat_model
from langchain_core.messages import SystemMessage

model = init_chat_model(model="anthropic:claude-sonnet-4-20250514")
tools = [tavily_search, think_tool]
model_with_tools = model.bind_tools(tools)

def llm_call(state: ResearcherState):
    """
    The agent's decision center:
    1. Analyze research progress so far
    2. Decide whether to call tools or give the answer
    """
    return {
        "researcher_messages": [
            model_with_tools.invoke(
                [SystemMessage(content=research_agent_prompt)] +
                state["researcher_messages"]
            )
        ]
    }

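Tool execution node

python

from langchain_core.messages import ToolMessage

# Look tools up by the name the LLM uses in its tool calls
tools_by_name = {tool.name: tool for tool in tools}

def tool_node(state: ResearcherState):
    """
    Execute every tool call requested by the last AI message.
    """
    tool_calls = state["researcher_messages"][-1].tool_calls

    # Run all the tools
    observations = []
    for tool_call in tool_calls:
        tool = tools_by_name[tool_call["name"]]
        observation = tool.invoke(tool_call["args"])
        observations.append(observation)

    # Wrap each result in a tool message tied to its originating call
    tool_outputs = [
        ToolMessage(
            content=obs,
            name=tc["name"],
            tool_call_id=tc["id"]
        )
        for obs, tc in zip(observations, tool_calls)
    ]

    return {"researcher_messages": tool_outputs}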

Routing logic

python

from typing import Literal

def should_continue(state: ResearcherState) -> Literal["tool_node", "compress_research"]:
    """
    Decide whether to keep searching or to finish.
    """
    last_message = state["researcher_messages"][-1]

    if last_message.tool_calls:
        return "tool_node"  # keep going: run the requested tools
    else:
        return "compress_research"  # finish: compress the results
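
To see how should_continue drives the loop, here is a dependency-free sketch that replaces the model and the tools with stubs. Everything in it (`FakeAIMessage`, the scripted decisions in `SCRIPT`, the `run_agent` driver, and a `should_continue` that takes a bare message list) is a stand-in invented for illustration. In the real agent, LangGraph performs this routing once the nodes and the conditional edge are registered on a graph.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeAIMessage:
    """Stand-in for an AI message: text plus any requested tool calls."""
    content: str
    tool_calls: List[dict] = field(default_factory=list)


# Scripted model decisions: search, reflect, then answer without tools.
SCRIPT = [
    FakeAIMessage("", [{"name": "tavily_search", "args": {"query": "SF coffee ratings"}}]),
    FakeAIMessage("", [{"name": "think_tool", "args": {"reflection": "enough data"}}]),
    FakeAIMessage("Final answer based on the gathered notes."),
]


def should_continue(messages: list) -> str:
    """Route to the tool node while the last message requests tools."""
    return "tool_node" if messages[-1].tool_calls else "compress_research"


def run_agent(script: List[FakeAIMessage]) -> list:
    """Drive the decide -> route -> execute loop until no tools are requested."""
    messages: list = []
    decisions = iter(script)
    while True:
        messages.append(next(decisions))        # plays the role of llm_call
        if should_continue(messages) == "compress_research":
            return messages                     # the loop ends here
        for call in messages[-1].tool_calls:    # plays the role of tool_node
            messages.append(f"result of {call['name']}")


trace = run_agent(SCRIPT)
print(len(trace))  # two tool rounds plus the final answer
```

The only thing that ends the loop is an AI message with no tool calls, which is why the prompt-level stopping rules matter so much.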

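🗜️ Core technique: context engineering

Why compress?

The problem:

python

# 5 search rounds, 3 pages per round, 500 tokens per page (already summarized)
5 * 3 * 500 = 7,500 tokens

# Add the tool calls and think_tool reflections
The total can reach 10,000+ tokens

# All of it sits in researcher_messages
# Passing it straight to the next node → token blow-up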
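
The arithmetic above generalizes into a quick back-of-the-envelope estimate. The function below is a sketch: `estimate_history_tokens` is a hypothetical helper, and the overhead of 500 tokens per round for tool-call and reflection messages is an assumed figure, not a measurement.

```python
def estimate_history_tokens(
    rounds: int,
    pages_per_round: int,
    tokens_per_page: int,
    overhead_per_round: int = 0,
) -> int:
    """Estimate tokens accumulated in researcher_messages after all rounds."""
    search_tokens = rounds * pages_per_round * tokens_per_page
    return search_tokens + rounds * overhead_per_round


# Summarized pages only: 5 rounds x 3 pages x 500 tokens
print(estimate_history_tokens(5, 3, 500))        # 7500
# Same run with an assumed 500 tokens per round for tool calls and reflections
print(estimate_history_tokens(5, 3, 500, 500))   # 10000
# Without page-level summaries (10k tokens per raw page) the history is far larger
print(estimate_history_tokens(5, 3, 10_000))     # 150000
```

The last line shows why the page-level summaries come first: without them, a five-round run would already exceed the context window of many models before any research-level compression could happen.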

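The two-layer compression strategy

Layer 1: page-level compression (covered above)

Raw web page → summary (compression ratio about 20:1)

Layer 2: research-level compression (the key step!)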

python

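from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage, filter_messages

def compress_research(state: ResearcherState) -> dict:
    """
    Compress the full history of multi-round tool calls into key findings.

    This is the second compression layer of context engineering.
    """

    # Compression prompt
    system_message = """
    You are an expert at summarizing research. Compress the research process below into its key findings.

    Requirements:
    1. Extract all key information (data, ratings, rankings)
    2. Keep source citations
    3. Organize the output in a structured format
    4. Remove duplicated content
    """

    # Important: append a Human Message at the end to restate the task
    human_reminder = f"""
    Original research topic: {state['research_topic']}

    Make sure the compressed result contains all information relevant to this topic,
    especially data, ratings, and concrete examples.
    """

    messages = (
        [SystemMessage(content=system_message)] +
        state["researcher_messages"] +
        [HumanMessage(content=human_reminder)]
    )

    # Use a model that supports long outputs
    compress_model = init_chat_model(
        model="openai:gpt-4.1",
        max_tokens=32000  # key: keeps the output from being truncated
    )

    response = compress_model.invoke(messages)

    # Extract the raw notes (used later for the final report)
    raw_notes = [
        str(m.content)
        for m in filter_messages(
            state["researcher_messages"],
            include_types=["tool", "ai"]
        )
    ]

    return {
        "compressed_research": str(response.content),
        "raw_notes": ["\n".join(raw_notes)]
    }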

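Three key points about compression

1. Why append a Human Message at the end?

python

# ❌ Without a reminder
system_prompt + 10,000 tokens of research history
→ The LLM may "forget" the task and produce a generic summary

# ✅ With a reminder
system_prompt + 10,000 tokens of research history + "Remember: the research topic is X"
→ The LLM stays focused and keeps the relevant information

This trick comes from practical experience: with a long context, LLMs are prone to the "lost in the middle" problem, where content buried in a long prompt is recalled less reliably than content near its edges. Restating the task at the very end keeps it right next to the point where generation starts.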

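2. Why set max_tokens=32000?

python

# ❌ Relying on the default (depending on the provider integration, it may be as low as 1024)
compress_model = init_chat_model("openai:gpt-4.1")
→ The output may be cut off at the 1024th token
→ "Sextant Coffee Ro..." ← information lost!

# ✅ Set it explicitly
compress_model = init_chat_model("openai:gpt-4.1", max_tokens=32000)
→ Allows the full output (GPT-4.1 can emit up to 32,768 output tokens)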
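
Even with an explicit limit, it is worth checking whether a response was cut off. Chat APIs report why generation stopped: OpenAI-style responses use a finish_reason of "length", and Anthropic-style responses use a stop_reason of "max_tokens". The helper below is a sketch that inspects a plain metadata dictionary. `was_truncated` is a hypothetical name, and where this metadata is exposed (for example, on a LangChain message's response_metadata) should be verified for the integration in use.

```python
def was_truncated(metadata: dict) -> bool:
    """Return True if the stop reason indicates the output hit its token limit."""
    return (
        metadata.get("finish_reason") == "length"       # OpenAI-style
        or metadata.get("stop_reason") == "max_tokens"  # Anthropic-style
    )


print(was_truncated({"finish_reason": "length"}))     # hit the limit
print(was_truncated({"finish_reason": "stop"}))       # finished normally
print(was_truncated({"stop_reason": "max_tokens"}))   # hit the limit
```

A compression node could use a check like this to log a warning or retry with a larger limit instead of silently passing a cut-off summary downstream.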

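3. Why keep raw_notes as well?

python

return {
    "compressed_research": compressed,  # for downstream agent processing
    "raw_notes": raw_notes              # for final report generation
}

Reasons:

compressed_research - a high-level summary that the supervisor uses for decisions
raw_notes - the full detail, used for report generation
This saves tokens without losing information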

🔄 End-to-end execution example

Example: researching San Francisco coffee shops

python

Research topic input: "Research the best coffee shops in San Francisco based on coffee quality..."

┌─────────────────────────────────────────┐
│ Round 1: initial search                 │
├─────────────────────────────────────────┤
│ LLM decision:                           │
│   "I need to find professional ratings  │
│    for SF coffee shops"                 │
│                                         │
│ Tool Call 1: tavily_search(             │
│   "San Francisco specialty coffee       │
│    quality ratings Coffee Review 2024"  │
│ )                                       │
│                                         │
│ Result: Sightglass 94 pts, Blue Bottle  │
│                                         │
│ Tool Call 2: think_tool(                │
│   "Found professional ratings, but      │
│    need more examples..."               │
│ )                                       │
└─────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────┐
│ Round 2: follow-up search               │
├─────────────────────────────────────────┤
│ LLM decision:                           │
│   "Need user reviews to cross-check     │
│    the professional ratings"            │
│                                         │
│ Tool Call 3: tavily_search(             │
│   "SF coffee shops Yelp Google quality" │
│ )                                       │
│                                         │
│ Result: Ritual 4.4/5, Saint Frank 4.3/5 │
│                                         │
│ Tool Call 4: think_tool(                │
│   "I now have combined data on 5 coffee │
│    shops, with both professional        │
│    ratings and user reviews; that is    │
│    enough to answer"                    │
│ )                                       │
└─────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────┐
│ Round 3: decide to stop                 │
├─────────────────────────────────────────┤
│ LLM decision:                           │
│   "I have enough information; give the  │
│    final answer"                        │
│                                         │
│ No tool calls → compress_research runs  │
└─────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────┐
│ Compress the research results           │
├─────────────────────────────────────────┤
│ Input: 10,000+ tokens (4 tool calls)    │
│ Output: 2,000 tokens (structured)       │
│                                         │
│ Key findings:                           │
│ 1. Sightglass - 94/100 (Coffee Review)  │
│ 2. Blue Bottle - 90+/100                │
│ 3. Ritual - 4.4/5 (user reviews)        │
│ 4. Saint Frank - 4.3/5                  │
│ 5. Four Barrel - 88-92/100              │
│                                         │
│ Quality signals: professional ratings,  │
│ user feedback, certifications           │
└─────────────────────────────────────────┘

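Key observations:

✅ Only 4 tool calls were needed (2 searches + 2 reflections)
✅ A reflection followed every search, so the agent never pressed on blindly
✅ After the second reflection the agent judged the information sufficient, so there was no spin-out
✅ Compression ratio of about 5:1 (10k → 2k tokens)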

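📊 Performance considerations

Model comparison

Dimension	GPT-4.1	Claude Sonnet 4
Compression latency	38s	99s
Output quality	Excellent	Excellent
Max output	~32k tokens	64k tokens
Cost	Medium	Higher
Recommended use	Compression node	LLM decision node

Practical advice:

Agent decisions (need reasoning) → Claude Sonnet 4
Compression node (needs speed) → GPT-4.1
Page summaries (simple task) → GPT-4.1-mini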

💡 Best practices

1. Golden rules of prompt design

python

"""
✅ DO: Think like a human researcher with limited time
✅ DO: Provide concrete heuristics (2-3 for simple, 5 for complex)
✅ DO: Use think_tool after each search
✅ DO: Stop when you have 3+ relevant sources

❌ DON'T: Search for perfection
❌ DON'T: Repeat similar queries
❌ DON'T: Ignore the hard limits
"""

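2. Tool-call budget

python

MAX_ITERATIONS = 5  # example value, chosen to match the hard limit in the prompt

# Track the iteration count in the state
class ResearcherState(TypedDict):
    tool_call_iterations: int  # counter

# Check it inside the node
def llm_call(state):
    if state["tool_call_iterations"] >= MAX_ITERATIONS:
        # Force the loop to end: a message without tool calls routes to compress_research
        return {"researcher_messages": [AIMessage(content="Search limit reached; answering from the information gathered so far...")]}
    # Otherwise call the model as usual and increment tool_call_iterations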
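
A fuller picture of the budget check, including where the counter is incremented, might look like the following. It is a sketch that uses plain dictionaries instead of LangChain messages; `MAX_ITERATIONS = 5`, `fake_model`, and the message shape are assumptions made for the example.

```python
MAX_ITERATIONS = 5  # assumed to mirror the hard limit stated in the prompt


def fake_model(messages: list) -> dict:
    """Stub model that always asks for another search (a spin-out)."""
    return {"role": "ai", "content": "", "tool_calls": [{"name": "tavily_search"}]}


def llm_call(state: dict) -> dict:
    """Call the model unless the tool-call budget is already spent."""
    if state["tool_call_iterations"] >= MAX_ITERATIONS:
        # This message requests no tools, so routing ends the loop.
        forced = {"role": "ai", "content": "Search limit reached.", "tool_calls": []}
        return {
            "researcher_messages": state["researcher_messages"] + [forced],
            "tool_call_iterations": state["tool_call_iterations"],
        }
    reply = fake_model(state["researcher_messages"])
    return {
        "researcher_messages": state["researcher_messages"] + [reply],
        "tool_call_iterations": state["tool_call_iterations"] + 1,
    }


state = {"researcher_messages": [], "tool_call_iterations": 0}
while True:
    state = llm_call(state)
    if not state["researcher_messages"][-1]["tool_calls"]:
        break

# Five model calls were allowed, then the sixth was replaced by the forced stop
print(state["tool_call_iterations"], len(state["researcher_messages"]))
```

Unlike the limits written into the prompt, this check does not depend on the model following instructions, so it acts as a backstop when the prompt-level limits are ignored.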

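3. Error handling

python

def tool_node(state):
    tool_calls = state["researcher_messages"][-1].tool_calls
    tool_outputs = []
    for tool_call in tool_calls:
        try:
            # Run the tool
            content = tools_by_name[tool_call["name"]].invoke(tool_call["args"])
        except Exception as e:
            # Fallback: report the failure to the LLM instead of crashing the graph
            content = f"Search failed: {e}. Continue with the information gathered so far."
        # Every tool call gets a reply, whether it succeeded or failed
        tool_outputs.append(
            ToolMessage(content=content, name=tool_call["name"], tool_call_id=tool_call["id"])
        )
    return {"researcher_messages": tool_outputs}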

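🎓 Key takeaways

The agent loop pattern

python

while True:
    decision = llm_call(state)

    if no_tool_calls(decision):
        break  # end the loop

    results = execute_tools(decision)
    state = update_state(results)

Two-layer compression in context engineering

Page level - raw content → summary (20:1)
Research level - multi-round results → key findings (5:1)
Overall compression ratio - about 100:1

What think_tool does

Forces a pause for reflection
Prevents spin-out
Improves decision quality

Prompt engineering essentials

Think like a human researcher
Give concrete heuristics
Set hard limits to prevent runaway loops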

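🚀 Next steps

With this section complete, you now understand the core mechanics of the Research Agent.

Next chapter: 9.3 MCP Integration - learn how to use the Model Context Protocol to extend the agent's tools with access to file systems, databases, and other external resources!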
