
One Article to Understand Claude Skills, SubAgents, and What Lies Behind Them

Published: 2026-02-03 08:24:52

Author: 从码农到工匠


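Picture this scenario: you go to see a general practitioner. Before starting work, this doctor memorized an entire medical encyclopedia. From colds and fevers to heart surgery, in theory he "knows everything." But once he actually faces a patient, problems appear:

🧠 Memory overload: medical knowledge is too vast, and there is a limit to how much he can remember
🔍 Insufficient depth: he knows a little about every field, but is not truly expert in any of them
⏰ Efficiency bottleneck: when handling a complex case, he has to flip back and forth through the encyclopedia in his head

This is an accurate picture of the traditional prompt-based AI conversation model: a "knowledgeable general practitioner" constrained by a single context window and a single role.

This article explains how Claude Skills and SubAgents apply a simple but powerful engineering philosophy, divide and conquer, to solve the context bloat problem.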

1. Claude Skills

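To solve the general practitioner problem, we first need this doctor to stop shouldering every medical task alone and instead become an intelligent coordinator who knows when to call in a specialist:

Carried at all times: a slim directory of specialists (the list of Skills)
On a specialist problem: call the right expert immediately (load the specific Skill)
The expert steps in: the cardiac surgeon arrives with the full surgical guide and professional equipment (the complete Skill prompt and tool permissions)
Task complete: the expert leaves, the doctor returns to normal, and the directory stays slim

This is the divide-and-conquer approach of Claude Skills: it moves the AI from "trying to remember everything" to "knowing how to find professional help." As usual, we start with hands-on practice and then work step by step toward the principles behind it.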

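1.1 Building a Skill That Handles PDFs

PDF processing is fairly specialized tool work, so we can create a Skill dedicated to it. First, create the folder .claude/skills/pdf/ under the current working directory. Then create SKILL.md in that folder. Its content consists mainly of frontmatter plus instructions (detailed guidance); part of it is shown below: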

---
name: pdf
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
---

# PDF Processing Guide

## Overview

This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.

## Python Libraries

### pypdf - Basic Operations

#### Merge PDFs

from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

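Remaining content omitted...

The complete SKILL.md is available in Anthropic's official repository: https://github.com/anthropics/skills/tree/main/skills/pdf

1.1.1 Viewing Available Skills

To see which Skills are currently available, we can type the following directly into Claude: List all available Skills

Our run produced the result below. As you can see, the pdf Skill we just added is already available.

The error message appears because Claude first tries to look for Skills in ~/.claude/skills, which follows from Claude's Skills discovery strategy (https://code.claude.com/docs/en/skills#view-available-skills).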

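1.1.2 Testing Basic Skill Capabilities

We start with a simple PDF merge task. For the demonstration, we first place two PDF files, CTS.pdf and Docker.pdf, in the current working directory.

Then we type into Claude: Merge CTS.pdf and Docker.pdf into merged.pdf

The execution result is shown below: with the help of the pdf Skill, the PDF merge completed successfully.

For a simple operation like merging PDFs, the LLM only needs to consult the Merge PDFs section of SKILL.md.

1.1.3 Testing Advanced Skill Capabilities

Filling in a PDF form is a comparatively complex task that involves a series of operations: parsing the PDF, inspecting the form, filling in the fields, and generating a new PDF. If merging PDFs is a "cold or fever," then filling in a PDF form is a "difficult case" that needs special handling.

For this, we need to add forms.md plus some scripts that help complete the task. The complete Skill directory looks like this: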

  .claude/
  ├── 📁 skills/
  │   └── 📁 pdf/
  │       ├── 📄 SKILL.md
  │       ├── 📄 forms.md
  │       └── 📁 scripts/
  │           ├── 📄 check_fillable_fields.py
  │           ├── 📄 convert_pdf_to_images.py
  │           ├── 📄 extract_form_field_info.py
  │           ├── 📄 fill_fillable_fields.py
  │           └── 📄 fill_pdf_form_with_annotations.py

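forms.md describes in detail how to fill in PDF forms. Its full content is available here: https://github.com/anthropics/skills/blob/main/skills/pdf/forms.md

Next, we prepare a PDF file with a form, application.pdf, and then type into Claude: Please fill in the form in application.pdf completely; the name is Frank, and fill in the other fields as you see fit

In the run output we can see that forms.md is additionally loaded to complete the form-filling task.

1.2 How It Works

1.2.1 Progressive Disclosure

The process described above, in which the PDF form-filling capability is exposed through forms.md, is in fact the core mechanism by which Skills keep the context clean. It is called progressive disclosure.

That is, at the very beginning the Agent exposes only minimal metadata to the LLM. For our pdf Skill, this is the frontmatter of SKILL.md: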

---
name: pdf
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
---

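The Agent then assembles the Skill metadata into the system prompt sent to the model, so that the model knows which Skills are available. For Claude models, this part of the system prompt is usually in XML format (the example below is generic, so its skill names differ from our pdf Skill):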

<available_skills>
  <skill>
    <name>pdf-processing</name>
    <description>Extracts text and tables from PDF files, fills forms, merges documents.</description>
    <location>/path/to/skills/pdf-processing/SKILL.md</location>
  </skill>
  <skill>
    <name>data-analysis</name>
    <description>Analyzes datasets, generates charts, and creates summary reports.</description>
    <location>/path/to/skills/data-analysis/SKILL.md</location>
  </skill>
</available_skills>
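
The assembly step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Claude Code does it internally: the function name build_available_skills_prompt and the metadata dictionary keys (name, description, path) are assumptions made for this example.

```python
from xml.sax.saxutils import escape


def build_available_skills_prompt(skills: list[dict]) -> str:
    """Render skill metadata as an <available_skills> XML block."""
    lines = ["<available_skills>"]
    for skill in skills:
        lines.append("  <skill>")
        # escape() protects against &, < and > inside free-text descriptions
        lines.append(f"    <name>{escape(skill['name'])}</name>")
        lines.append(f"    <description>{escape(skill['description'])}</description>")
        lines.append(f"    <location>{escape(skill['path'])}/SKILL.md</location>")
        lines.append("  </skill>")
    lines.append("</available_skills>")
    return "\n".join(lines)


prompt = build_available_skills_prompt([
    {
        "name": "pdf",
        "description": "Merge & split PDFs, fill forms.",
        "path": "/path/to/skills/pdf",
    },
])
print(prompt)
```

Only the name, description, and location of each Skill reach the system prompt here; the body of SKILL.md is not read at this stage.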

Beyond the metadata, the instructions section of SKILL.md also states clearly that form filling requires reading forms.md:

# PDF Processing Guide

## Overview

This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. **If you need to fill out a PDF form, read forms.md and follow its instructions**.

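When the LLM needs to handle a PDF form-filling task, it follows the pointer in SKILL.md and goes on to read forms.md. This on-demand way of loading information is what progressive disclosure means. Progressive disclosure is the core design principle that makes Agent Skills flexible and scalable.

It is like a well-organized manual that starts with a table of contents, then moves to specific chapters, and ends with detailed appendices. Skills let Claude load information on demand: when performing a particular task, an agent with file system and code execution tools does not need to read the entire Skill into the context window. This means the amount of context that can be bundled into a Skill is effectively unbounded.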
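
To make the on-demand idea concrete, here is a minimal sketch that simulates a context into which Skill files are loaded only when asked for. The SkillContext class and its methods are hypothetical names invented for this example, and the file contents are placeholders written to a temporary directory.

```python
import tempfile
from pathlib import Path


class SkillContext:
    """Tracks which skill files have been loaded into a simulated context."""

    def __init__(self, skill_dir: Path):
        self.skill_dir = skill_dir
        self.loaded: dict[str, str] = {}  # file name -> text currently in context

    def load(self, file_name: str) -> str:
        """Read one file of the skill on demand and remember that it is in context."""
        if file_name not in self.loaded:
            self.loaded[file_name] = (self.skill_dir / file_name).read_text(encoding="utf-8")
        return self.loaded[file_name]

    def context_size(self) -> int:
        """Total number of characters loaded so far."""
        return sum(len(text) for text in self.loaded.values())


# Demo with a throwaway skill directory holding placeholder content
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp)
    (skill_dir / "SKILL.md").write_text(
        "If you need to fill out a PDF form, read forms.md.", encoding="utf-8")
    (skill_dir / "forms.md").write_text(
        "Detailed form-filling steps. " * 10, encoding="utf-8")

    ctx = SkillContext(skill_dir)
    ctx.load("SKILL.md")                 # a simple merge task stops here
    size_after_skill = ctx.context_size()
    ctx.load("forms.md")                 # a form task follows the pointer in SKILL.md
    size_after_forms = ctx.context_size()
    loaded_files = sorted(ctx.loaded)
```

The merge task never pays for forms.md; only the form-filling task grows the context.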

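1.2.2 Implementing an Agent with Skills Support

Like MCP, Skills is a specification defined by Anthropic, and it is not necessarily tied to Claude Code. As long as you follow the Skills specification (https://agentskills.io/home), implementing Skills that meet your own Agent's requirements is not difficult.

To implement your own Agent with Skills support, the first requirement is Skills discovery. In Claude Code, Skills are discovered automatically by Claude from three sources:

Personal Skills: ~/.claude/skills/
Project Skills: .claude/skills/
Plugin Skills: bundled with installed plugins

If we implement this ourselves, the root directory does not have to be .claude/. In that case we need to follow our own convention when developing Skills, although staying compatible with .claude/ takes only a line of code.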
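
A discovery step for the two file-system sources above could look roughly like this. It is a sketch under the assumption that a Skill is any sub-directory containing a SKILL.md file; discover_skills is a hypothetical helper, and plugin Skills are not covered.

```python
import tempfile
from pathlib import Path


def discover_skills(roots: list[Path]) -> list[Path]:
    """Return every sub-directory of the given roots that contains a SKILL.md."""
    found = []
    for root in roots:
        if not root.is_dir():
            continue  # a missing root (e.g. no personal Skills) is not an error
        for child in sorted(root.iterdir()):
            if (child / "SKILL.md").is_file():
                found.append(child)
    return found


# The two file-system locations listed above
default_roots = [
    Path.home() / ".claude" / "skills",  # personal Skills
    Path.cwd() / ".claude" / "skills",   # project Skills
]

# Demo against a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".claude" / "skills"
    (root / "pdf").mkdir(parents=True)
    (root / "pdf" / "SKILL.md").write_text("---\nname: pdf\n---\n", encoding="utf-8")
    (root / "notes").mkdir()  # no SKILL.md, so it is ignored
    demo_names = [p.name for p in discover_skills([root, Path(tmp) / "missing"])]
```

Supporting a different root directory is then just a matter of passing another path in the roots list.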

Second, we need to read the metadata from SKILL.md. This is simple; code along the following lines will do.

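from pathlib import Path

import yaml  # PyYAML


def parse_metadata(skill_path: str) -> dict:
    """Read name and description from the YAML frontmatter of SKILL.md."""
    content = (Path(skill_path) / "SKILL.md").read_text(encoding="utf-8")
    # The frontmatter sits between the first two '---' delimiters
    _, frontmatter_text, _ = content.split("---", 2)
    frontmatter = yaml.safe_load(frontmatter_text)

    return {
        "name": frontmatter["name"],
        "description": frontmatter["description"],
        "path": skill_path,
    }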

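After that come the details of implementing progressive disclosure and interacting with the LLM. Note that once a Skill invocation finishes, the context should be cleaned up promptly. For example, after the PDF form-filling task is done, the content of forms.md can be released from the context, which keeps the context concise and clean.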
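
One simple way to picture this cleanup is to tag every message that was injected on behalf of a Skill and filter those messages out once the task is done. The "skill" key on messages and the release_skill_context function are assumptions made for this sketch. A real Agent would also have to keep tool calls and tool results paired as its model API requires.

```python
def release_skill_context(messages: list[dict], skill_name: str) -> list[dict]:
    """Drop messages that were injected on behalf of the given skill."""
    return [m for m in messages if m.get("skill") != skill_name]


messages = [
    {"role": "system", "content": "<available_skills>...</available_skills>"},
    {"role": "user", "content": "Fill in application.pdf"},
    # Hypothetical tag marking content loaded for the pdf skill
    {"role": "tool", "content": "(contents of forms.md)", "skill": "pdf"},
    {"role": "assistant", "content": "The form has been filled in."},
]

cleaned = release_skill_context(messages, "pdf")
```

After the call, the Skill list in the system prompt is still present, but the forms.md content no longer occupies the context.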

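Skills are a modular encapsulation of specialized capabilities together with context isolation. SubAgents are similar: they are another way to encapsulate capabilities.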

2. LangChain SubAgents

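To address the context bloat problem, LangChain released SubAgents. When an Agent calls tools that produce large outputs (such as web search, file reads, or database queries), intermediate results quickly fill the context window. SubAgents isolate this detailed work: the Main Agent receives only the final result, not the hundreds of tool calls that produced it.

SubAgents are therefore similar to the A2A protocol (Agent to Agent Protocol) we covered previously, except that SubAgents are different Agents collaborating within a single process. The difference from A2A is somewhat like the difference between a method call and an RPC.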
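
The isolation idea can be shown without any framework. In the sketch below, noisy_search and run_subagent are hypothetical stand-ins (no network, no model): the SubAgent accumulates bulky tool output in its own private message list, and only a short summary crosses back into the Main Agent's context.

```python
def noisy_search(query: str) -> list[str]:
    """Stand-in for a tool with large output (hypothetical, no network access)."""
    return [f"raw page {i} for {query!r}: " + "x" * 500 for i in range(20)]


def run_subagent(task: str) -> str:
    """Do the detailed work in a private message list and return only a summary."""
    private_context = [{"role": "user", "content": task}]
    for page in noisy_search(task):
        private_context.append({"role": "tool", "content": page})
    # A real SubAgent would ask the model to summarize; here we only count pages
    return f"Summary for {task!r}: consulted {len(private_context) - 1} pages."


main_context = [{"role": "user", "content": "Hangzhou today temperature"}]
main_context.append(
    {"role": "tool", "content": run_subagent("Hangzhou today temperature")})

main_context_chars = sum(len(m["content"]) for m in main_context)
```

The twenty raw pages, roughly ten thousand characters, live and die inside run_subagent; the Main Agent's context grows by one short message.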

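2.1 Building an Agent with SubAgents

LangChain's SubAgents are implemented on top of the Middleware mechanism (https://docs.langchain.com/oss/python/deepagents/middleware). We can wrap internet search as a SubAgent in the way shown below. Internet search data is large and messy, and we do not want that raw data to pollute our Main Agent.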

import os
from typing import Literal

from deepagents.middleware.subagents import SubAgentMiddleware
from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware
from langchain_core.tools import tool
from tavily import TavilyClient

from DeepSeek_model import llm  # the author's local module that provides the chat model


# Read the API key from the environment instead of hard-coding it in source
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@tool
def internet_search(
        query: str,
        max_results: int = 2,
        topic: Literal["general", "news", "finance"] = "general",
        include_raw_content: bool = False,
):
    """Run a web search"""
    return tavily_client.search(
        query,
        max_results=max_results,
        include_raw_content=include_raw_content,
        topic=topic,
        verify=False  # only if supported
    )

agent = create_agent(
    model=llm,
    system_prompt="Use subagents for specialized tasks.",
    middleware = [
        TodoListMiddleware(
            system_prompt="Use the write_todos tool to help you manage and plan complex objectives"  # Optional: Custom addition to the system prompt
        ),
        SubAgentMiddleware(
            default_model=llm,
            subagents=[
                {
                    "name": "web search",
                    "description": "search in web, if you need to access internet",
                    "system_prompt": "Use this to run an internet search for a given query.",
                    "tools": [internet_search],
                    "middleware": [],
                }
            ],
        )
    ],
)

result = agent.invoke({"messages": [{"role": "user", "content": "compare china hangzhou's today's temperature with yesterday"}]})

# Print the agent's response
print(result["messages"][-1].content)

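In the code above, TodoListMiddleware handles task decomposition, and SubAgentMiddleware declares the SubAgents.

When interacting with the model, this code is converted into the Function Calling prompt below. The model can then use its Function Calling capability to decompose tasks and query the internet.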

"tools": [
    {
      "type": "function",
      "function": {
        "name": "write_todos",
        "description": "Use this tool to create and manage a structured task list for your current work session...(omitted)...",
        "parameters": {
          "properties": {
            "todos": {
              "items": {
                "description": "A single todo item with content and status.",
                "properties": {
                  "content": {
                    "type": "string"
                  },
                  "status": {
                    "enum": [
                      "pending",
                      "in_progress",
                      "completed"
                    ],
                    "type": "string"
                  }
                },
                "required": [
                  "content",
                  "status"
                ],
                "type": "object"
              },
              "type": "array"
            }
          },
          "required": [
            "todos"
          ],
          "type": "object"
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "task",
        "description": "Launch an ephemeral subagent to handle complex, multi-step independent tasks with isolated context windows..... This agent has access to all tools as the main agent.\n- web search: search in web, if you need to access internet\n\nWhen using the Task tool, you must specify a subagent_type parameter to select which agent type to use.\n\n## Usage notes:\n1. Launch multiple agents concurrently whenever possible, to maximize performance; ...(omitted)...",
        "parameters": {
          "properties": {
            "description": {
              "type": "string"
            },
            "subagent_type": {
              "type": "string"
            }
          },
          "required": [
            "description",
            "subagent_type"
          ],
          "type": "object"
        }
      }
    }
  ]
}
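
To see why the SubAgent declaration ends up as a single task tool, here is a rough sketch of how such a schema could be generated from a list of SubAgent specs. This is not the actual deepagents implementation: build_task_tool is a hypothetical helper, and the description text is heavily shortened compared with the real prompt shown above.

```python
import json


def build_task_tool(subagents: list[dict]) -> dict:
    """Build a function schema that exposes all subagents through one `task` tool."""
    # One bullet per subagent, so the model can pick a subagent_type by name
    catalog = "\n".join(f"- {s['name']}: {s['description']}" for s in subagents)
    return {
        "type": "function",
        "function": {
            "name": "task",
            "description": "Launch an ephemeral subagent.\nAvailable subagent types:\n" + catalog,
            "parameters": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "subagent_type": {"type": "string"},
                },
                "required": ["description", "subagent_type"],
            },
        },
    }


task_tool = build_task_tool([
    {"name": "web search", "description": "search in web, if you need to access internet"},
])
print(json.dumps(task_tool, indent=2))
```

The name and description of each SubAgent become plain text inside the tool description, which is how the model learns that a "web search" SubAgent exists.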

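So no matter how different the upper-layer architecture looks, the lowest-level technology of an AI Agent is nothing more than Function Calling + Prompt. A solid understanding of Function Calling is therefore important. If you are not yet clear on it, see my earlier article 《一文搞懂MCP、Function Calling和A2A》.

2.2 Running a Task to See the Underlying Essence

Next, we give this Agent a task that requires internet access: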

compare china hangzhou's today's temperature with yesterday

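For this task, by our design, the processing we expect is: the Main Agent first writes a todo list, then launches two web search SubAgents in parallel to fetch the weather data, and finally combines the SubAgents' information into the final result. The whole process looks like this:

Main Agent → TodoList: create the task list
Main Agent → SubAgent1: "Hangzhou today temperature"
Main Agent → SubAgent2: "Hangzhou yesterday temperature"

SubAgent1 → Tavily API: search request
Tavily API → SubAgent1: returns today's weather data
SubAgent1 → Main Agent: passes back today's weather result

SubAgent2 → Tavily API: search request
Tavily API → SubAgent2: returns yesterday's weather data
SubAgent2 → Main Agent: passes back yesterday's weather result

Main Agent: combines the data and generates the final response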
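
The fan-out and fan-in shape of this flow can be sketched with the standard library alone. The run_search_subagent function is a hypothetical stand-in that returns canned text rather than calling Tavily or a model; the point is only the parallel dispatch and the final merge.

```python
from concurrent.futures import ThreadPoolExecutor


def run_search_subagent(query: str) -> str:
    """Stand-in for a web search SubAgent (hypothetical; returns canned text)."""
    return f"result for: {query}"


queries = ["Hangzhou today temperature", "Hangzhou yesterday temperature"]

with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    # map() preserves input order, so results line up with queries
    results = list(pool.map(run_search_subagent, queries))

# The Main Agent sees only the two final results and merges them
final_answer = " | ".join(results)
```

Because the two lookups are independent, nothing forces them to run one after the other, which is exactly what the tool description encourages the model to exploit.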

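Observing the interaction log between the Agent and the model (I describe how to print these logs in 《一文搞懂LangChain及背后原理》) confirms that the Agent does work as expected. Below is the Response from one Agent-model exchange during this task. It shows clearly that two SubAgents were launched in parallel to fetch internet content for the web search task.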

{
      "content": "Now I'll launch two parallel subagents to research both temperatures simultaneously. This will save time and allow both research tasks to be completed independently.",
      "role": "assistant",
      "tool_calls": [
        {
          "type": "function",
          "id": "call_00_RB8pgMCNhmLLMin0Ha77DAt6",
          "function": {
            "name": "task",
            "arguments": "{\"subagent_type\": \"web search\", \"description\": \"Research today's current temperature in Hangzhou, China.\"}"
          }
        },
        {
          "type": "function",
          "id": "call_01_usg162FRsosiCNQXuZxZw6xa",
          "function": {
            "name": "task",
            "arguments": "{\"subagent_type\": \"web search\", \"description\": \"Research yesterday's temperature in Hangzhou, China. \"}"
          }
        }
      ]
    }
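
On the Agent side, a response like this has to be turned back into actual SubAgent runs. The sketch below shows one plausible shape of that step, assuming the OpenAI-style tool_calls layout seen in the log. The run_subagent and handle_tool_calls functions are hypothetical, and the call ids are shortened placeholders.

```python
import json

# Trimmed copy of the logged response (ids shortened for readability)
response = {
    "role": "assistant",
    "tool_calls": [
        {"type": "function", "id": "call_00", "function": {
            "name": "task",
            "arguments": json.dumps({
                "subagent_type": "web search",
                "description": "Research today's current temperature in Hangzhou, China."})}},
        {"type": "function", "id": "call_01", "function": {
            "name": "task",
            "arguments": json.dumps({
                "subagent_type": "web search",
                "description": "Research yesterday's temperature in Hangzhou, China."})}},
    ],
}


def run_subagent(subagent_type: str, description: str) -> str:
    """Hypothetical stand-in for actually running a SubAgent."""
    return f"[{subagent_type}] done: {description}"


def handle_tool_calls(message: dict) -> list[dict]:
    """Run every `task` call and return one tool-result message per call."""
    results = []
    for call in message.get("tool_calls", []):
        if call["function"]["name"] != "task":
            continue
        args = json.loads(call["function"]["arguments"])  # arguments are a JSON string
        output = run_subagent(args["subagent_type"], args["description"])
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results


tool_messages = handle_tool_calls(response)
```

Each tool-result message carries the id of the call it answers, so the model can match the two SubAgent summaries to the two requests it made.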