The key 2025 breakthrough in AI Agent prompt engineering: drop ineffective templates and master the core techniques of Context Engineering. Key topics: 1. The root causes of traditional prompt engineering's failure 2. The five core techniques and advanced patterns for 2025 3. Production-grade deployment methods and how to avoid common pitfalls
According to Anthropic's Context Engineering research, what really matters in 2025 is not "prompt engineering" but "context engineering". The question is no longer "how do I craft the perfect prompt" but "which combination of context elicits the desired behavior".
I'll walk you through what current research (Anthropic, OpenAI, Google, Wharton) concludes about AI Agent prompting, and how to apply it concretely in n8n workflows.
You will learn:
A pattern I see constantly: someone finds a "perfect" prompt template on Reddit, pastes it into the AI Agent Node, and expects it to work like magic.
Spoiler: it won't.
Why copied templates fail:
Anthropic's prompt engineering guidance emphasizes finding the "right altitude": specific enough to guide, yet leaving room for reasoning. For your particular use case, a template is almost always at the wrong altitude.
Problem two: overly complex prompts
The "more is better" mindset causes serious problems:
The solution: generate prompts with the model
The real game changer: let the model write the prompt for you.
Instead of spending hours polishing a prompt, give the model:
The model generates a prompt optimized for your use case. You test it, iterate with the model, and refine.
Why it works: the model knows its own "preferences" best. It knows which phrasing, structure, and examples work.
I'll show the exact process later.
The most basic and most common mistake in n8n AI Agent prompting: confusing the System Message with the User Prompt.
The AI Agent Node has two distinct prompt areas:
System Message (Options → System Message):
User Prompt (the main input):
Why it matters: token economics and prompt caching
Both are sent with every API call. But separating them correctly is critical for both cost and performance:
Wrong (stuffing everything into the User Prompt):
"You are Senior Support Engineer. Tools: search_docs, create_ticket.
Use search_docs first. Max 150 words. Friendly.
User question: {{$json.message}}"
At 1,000 requests per day, 400 tokens each:
= 400,000 redundant tokens
= $1.20/day at Claude Sonnet pricing ($3/M)
= $36/month of pure redundant context
Right:
System Message (defined once):
You are Senior Support Engineer.
TOOLS:
- search_docs(query): Search Product Docs
- create_ticket(title, priority): Create Support Ticket
WORKFLOW:
1. FAQ → search_docs
2. Complex Issue → create_ticket
BEHAVIOR:
- Max 150 words
- When uncertain: Create ticket, don't guess
The User Prompt is then just: {{$json.message}}
= 50 tokens per request instead of 400
= savings: 350K tokens per day
= roughly $31.50/month (with Claude Sonnet)
Prompt caching: why the System Message should stay as static as possible
Anthropic and OpenAI offer prompt caching: the System Message is cached instead of being reprocessed on every call. This can cut latency by 50-80% and reduce the cost of cached tokens to as little as 10% of the normal price.
But: as soon as you change the System Message, the cache is invalidated. Therefore:
Example of the caching impact:
Without caching:
Request 1: 500-token System Message = $0.0015
Request 2: 500-token System Message = $0.0015
Request 1000: 500-token System Message = $0.0015
Total: $1.50 for 1,000 requests
With caching (stable System Message):
Request 1: 500 tokens = $0.0015 (cache write)
Request 2: 500-token cache hit = $0.00015 (90% cheaper)
Request 1000: 500-token cache hit = $0.00015
Total: ~$0.15 per 1,000 requests = 90% savings
Dynamic System Messages: powerful, but use with care
You can make the System Message dynamic with n8n Expressions, but mind the cache:
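The arithmetic above fits into a small cost model. A sketch in Python, assuming Claude Sonnet's $3 per million input tokens and the 90% discount on cached tokens quoted above (the function name is mine):

```python
# Rough cost model for re-sending a System Message, with and without prompt caching.
# Assumes $3 per 1M input tokens (Claude Sonnet) and cache hits at 10% of normal price.

PRICE_PER_TOKEN = 3 / 1_000_000   # $3 per million input tokens
CACHE_DISCOUNT = 0.10             # cached tokens cost 10% of the normal price

def system_message_cost(tokens: int, requests: int, cached: bool) -> float:
    """Total cost of sending a System Message across many requests."""
    if not cached:
        return tokens * requests * PRICE_PER_TOKEN
    # First request writes the cache at full price; the rest are cache hits.
    first = tokens * PRICE_PER_TOKEN
    rest = tokens * (requests - 1) * PRICE_PER_TOKEN * CACHE_DISCOUNT
    return first + rest

print(f"without caching: ${system_message_cost(500, 1000, cached=False):.2f}")  # $1.50
print(f"with caching:    ${system_message_cost(500, 1000, cached=True):.2f}")   # $0.15
```

This also makes the trade-off of a dynamic System Message concrete: every cache miss pushes you back toward the uncached column.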
You are Support Engineer for {{$('Get Config').item.json.company_name}}.
PRODUCT: {{$('Get Config').item.json.product_description}}
TONE: {{$('Get Config').item.json.support_tone}}
When to use it: multi-tenant systems where one workflow serves multiple customer configurations.
Workflow: Webhook (Customer ID) → DB Lookup → AI Agent (dynamic System Message) → Response
Caching trade-off: a dynamic System Message breaks the cache, so use it only when necessary.
Research from Anthropic, OpenAI, and Google in 2024-2025 shows a set of fundamental techniques that work across all models. These five matter most:
Anthropic's prompt engineering guide calls this finding the "right altitude": specific enough to guide, flexible enough to leave room for reasoning.
The "colleague test": if a smart colleague couldn't follow the instruction, neither will the AI.
Bad example:
Classify emails intelligently and accurately.
What does "intelligently" mean? What are the categories? What output format?
Good example:
Classify emails into: sales, support, billing, general
URGENCY CRITERIA:
- high: contains "urgent", "asap", "immediately", "broken"
- medium: time-related request without extremity
- low: everything else
OUTPUT: JSON
{
"category": "support",
"urgency": "high",
"confidence": 0.92
}
Why it works:
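The urgency criteria above are concrete enough to mirror deterministically, which is useful as a test oracle for the agent's output. A minimal sketch: the high-urgency keywords come from the prompt, while the list of "time-related" words is my own illustrative assumption:

```python
# Deterministic mirror of the prompt's URGENCY CRITERIA.
HIGH_KEYWORDS = {"urgent", "asap", "immediately", "broken"}     # from the prompt
TIME_KEYWORDS = {"today", "tomorrow", "deadline", "by friday"}  # illustrative assumption

def assess_urgency(message: str) -> str:
    """Apply the same high/medium/low rules the prompt states."""
    text = message.lower()
    if any(k in text for k in HIGH_KEYWORDS):
        return "high"
    if any(k in text for k in TIME_KEYWORDS):
        return "medium"
    return "low"

print(assess_urgency("URGENT: System down!"))         # high
print(assess_urgency("Can you reply by Friday?"))     # medium
print(assess_urgency("How do I reset my password?"))  # low
```

If a rule can be checked this mechanically, it belongs in the prompt verbatim; that is exactly what "right altitude" specificity looks like.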
Bsharat et al. (2024) found that positive instructions clearly outperform negative ones. Rewriting "don't do X" as "do Y" yielded an average quality improvement of 57%.
Why negative instructions fail:
Negative example:
Don't be too wordy.
Don't use technical jargon.
Don't make assumptions about customer intent.
Positive rewrite:
Keep responses under 150 words.
Use plain language that a non-technical customer understands.
When customer intent is unclear, ask clarifying questions.
Real-world impact:
In a production email-classification agent, a negative instruction ("don't misclassify urgent requests") caused a 31% miss rate. The positive rewrite ("flag every request containing a time constraint as urgent") cut misses to 8%.
Few-shot examples are highly effective, but most people use them wrong.
The research consensus:
Bad few-shot (too similar):
EXAMPLES:
1. "How do I reset my password?" → category: support, urgency: low
2. "Where is the password reset option?" → category: support, urgency: low
3. "I can't find password settings." → category: support, urgency: low
All the same kind of question. The model learns nothing about boundary handling.
Good few-shot (diverse, with edge cases):
Example 1 (Standard):
Input: "How do I reset my password?"
Output: {"category": "support", "urgency": "low", "confidence": 0.95}
Example 2 (Urgent):
Input: "URGENT: System down, can't access customer data!"
Output: {"category": "support", "urgency": "high", "confidence": 0.98}
Example 3 (Mixed Intent):
Input: "I want to upgrade my plan but also report a billing error."
Output: {"category": "billing", "urgency": "medium", "confidence": 0.78, "note": "Multiple intents detected"}
Example 4 (Edge Case - Unclear):
Input: "help"
Output: {"category": "general", "urgency": "low", "confidence": 0.45, "action": "request_clarification"}
Why it works:
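Keeping examples diverse is easier when they live in data rather than prose. A sketch that assembles a few-shot block from (input, output) pairs in the Input/Output format above, so adding a new edge case is a one-line change (the function name is mine):

```python
import json

# Few-shot examples as data: one standard case, one urgent, one edge case.
EXAMPLES = [
    ("How do I reset my password?",
     {"category": "support", "urgency": "low", "confidence": 0.95}),
    ("URGENT: System down, can't access customer data!",
     {"category": "support", "urgency": "high", "confidence": 0.98}),
    ("help",
     {"category": "general", "urgency": "low", "confidence": 0.45,
      "action": "request_clarification"}),
]

def build_few_shot_block(examples) -> str:
    """Render (input, output) pairs in the Example/Input/Output format above."""
    lines = []
    for i, (inp, out) in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f'Input: "{inp}"')
        lines.append(f"Output: {json.dumps(out)}")
    return "\n".join(lines)

print(build_few_shot_block(EXAMPLES))
```

Generating the block also makes it trivial to audit coverage: a quick scan of the data tells you whether every category and urgency level appears at least once.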
One of the big problems with AI agents: hallucination. When they can't find an answer, they invent one.
The solution: explicit constraints that "ground" the agent.
Bad (no constraints):
Answer customer support questions using ONLY information from the documentation you can access via search_docs tool.
CONSTRAINTS:
- If information is not in docs: "I don't have that information in our current documentation. I'll create a ticket for our team to help you."
- Never make assumptions about features or functionality
- Never provide workarounds that aren't documented
- If multiple solutions exist: Present all documented options
ESCALATION CRITERIA:
- Customer mentions "urgent", "broken", "down" → create ticket immediately
- Question requires account-specific data → create ticket with details
- Documentation is incomplete/contradictory → create ticket noting the issue
Why it works:
Production impact:
In a support agent handling 2,000+ inquiries per month, adding constraints cut the hallucination rate from 23% to 3%. Escalated tickets also became noticeably more useful, because they documented the specific documentation gaps.
Anthropic's research is clear: it's not about more context, it's about the right context.
The principle: the smallest high-signal token set
Bad context (verbose, repetitive):
You are a helpful AI assistant designed to help customers with their questions and concerns. You should always be polite, professional, and courteous in your responses. Make sure to read the customer's question carefully and provide a thorough and complete answer that addresses all of their concerns. If you're not sure about something, it's better to say you don't know than to provide incorrect information...
350 tokens of filler with almost no actionable guidance.
Good context (dense, specific):
You are Support Agent.
RESPONSE REQUIREMENTS:
- Max 150 words
- Plain language (non-technical)
- Structure: Problem acknowledgment → Solution → Next steps
TOOLS:
- search_docs(query) → search product documentation
- create_ticket(title, priority, details) → escalate to human team
WORKFLOW:
1. Search docs for relevant information
2. If found: Provide answer with doc reference
3. If not found OR customer mentions "urgent"/"broken": Create ticket
110 tokens, high signal density. Every line carries actionable information.
The token audit:
For every sentence in your prompt, ask: "If I delete this, does the agent get worse?" If not, delete it.
The core techniques apply everywhere. The advanced patterns below are powerful, but only when matched to the right problem.
A June 2025 Wharton study offers the most comprehensive analysis to date: Chain-of-Thought (CoT) helps with complex reasoning but is hit-or-miss on simple tasks.
When to use CoT:
When not to use CoT:
Implementation in n8n:
TASK: Analyze customer request and determine best resolution path.
REASONING PROCESS (think step-by-step):
1. IDENTIFY: What is the core issue? (Quote specific parts of message)
2. CLASSIFY: Which category? (sales/support/billing/general)
3. ASSESS URGENCY: Time-sensitive keywords? Tone indicators?
4. CHECK PREREQUISITES: Can we resolve with available tools?
5. DECIDE: Route to appropriate handler with reasoning
Think through each step explicitly before providing your final answer.
Performance impact:
Bottom line: use CoT only when the accuracy gain outweighs the cost and latency trade-off.
When your agent needs:
then RAG is essential.
The basic RAG flow in n8n:
Webhook/Trigger
↓
Extract Query (user's question)
↓
Vector Search (retrieve relevant chunks from knowledge base)
↓
AI Agent (answer using retrieved context)
↓
Response
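For local testing outside n8n, the Vector Search step can be approximated with simple word-overlap scoring. A toy sketch (a stand-in for a real embedding search, not production retrieval; all names are mine):

```python
import re

# Toy retrieval step for the RAG flow above: score chunks by word overlap
# with the query and keep the top matches. A stand-in for a real vector DB.
def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, chunks: list, top_k: int = 2) -> list:
    q = words(query)
    scored = [(len(q & words(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

docs = [
    "To reset your password, open Settings and choose Security.",
    "Billing runs on the first of each month.",
    "Password rules: at least 12 characters.",
]
print(retrieve("how do I reset my password", docs))
```

The interface is the point: whatever retrieval backend you use, the AI Agent node only ever sees the returned chunks, so the quality ceiling is set here.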
Key RAG considerations (based on kapa.ai's analysis):
Example RAG prompt:
Answer the customer's question using ONLY the information provided below.
CONTEXT FROM DOCUMENTATION:
{{$json.retrieved_chunks}}
CUSTOMER QUESTION:
{{$json.user_message}}
INSTRUCTIONS:
- Base answer strictly on provided context
- If context doesn't contain the answer: "I don't have that information in our current documentation."
- Include source reference: "According to [doc_title]..."
- If multiple relevant sections: Synthesize information from all
CONFIDENCE ASSESSMENT:
- High confidence: Answer directly stated in context
- Medium confidence: Answer can be inferred from context
- Low confidence: Context is incomplete → escalate
Wang et al. (2024) found that the order of context matters significantly.
Key findings:
Optimal ordering strategy:
Example (RAG context):
MOST RELEVANT DOCUMENTATION:
[Chunk with highest relevance score]
ADDITIONAL CONTEXT:
[Supporting chunks]
CONSTRAINTS (IMPORTANT):
- Answer only from provided context
- If uncertain: Escalate to human team
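The ordering strategy can be made mechanical: sort retrieved chunks by relevance score, then append constraints last so they sit closest to the point of generation. A sketch using the section labels from the example above (scores and function name are mine):

```python
def order_context(chunks: list, constraints: list) -> str:
    """Most relevant chunk first, supporting chunks next, constraints last.

    `chunks` is a list of (relevance_score, text) pairs.
    """
    ranked = sorted(chunks, key=lambda pair: pair[0], reverse=True)
    parts = ["MOST RELEVANT DOCUMENTATION:", ranked[0][1]]
    if len(ranked) > 1:
        parts.append("ADDITIONAL CONTEXT:")
        parts.extend(text for _, text in ranked[1:])
    parts.append("CONSTRAINTS (IMPORTANT):")
    parts.extend(f"- {c}" for c in constraints)
    return "\n".join(parts)

prompt = order_context(
    [(0.61, "Billing FAQ excerpt"), (0.92, "Password reset guide")],
    ["Answer only from provided context", "If uncertain: Escalate to human team"],
)
print(prompt)
```

In n8n this would live in a Code node between the vector search and the AI Agent node, so the agent always receives context in the same proven order.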
OpenAI's Structured Outputs (GPT-4o) and similar capabilities in other models solve a major problem: getting consistent, parseable output.
The problem with traditional prompting:
Output format: JSON with fields category, urgency, confidence
The model might return:
You have to build fallbacks for every one of these cases.
The Structured Outputs approach:
Define a JSON schema and use the Structured Output Parser node to catch deviations.
Example schema:
{
  "type": "object",
  "properties": {
    "category": {
      "type": "string",
      "enum": ["sales", "support", "billing", "general"]
    },
    "urgency": {
      "type": "string",
      "enum": ["low", "medium", "high"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "reasoning": {
      "type": "string"
    }
  },
  "required": ["category", "urgency", "confidence"]
}
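Even with Structured Outputs, a cheap local check before downstream nodes is worthwhile. A plain-Python sketch validating the enum and range constraints from the schema above, without pulling in a JSON Schema library (the function name is mine):

```python
# Minimal validator for the schema above: required fields, enums, confidence range.
CATEGORIES = {"sales", "support", "billing", "general"}
URGENCIES = {"low", "medium", "high"}

def validate_output(data: dict) -> list:
    """Return a list of violations; an empty list means the output is valid."""
    errors = []
    for field in ("category", "urgency", "confidence"):
        if field not in data:
            errors.append(f"missing required field: {field}")
    if data.get("category") not in CATEGORIES:
        errors.append("category not in enum")
    if data.get("urgency") not in URGENCIES:
        errors.append("urgency not in enum")
    conf = data.get("confidence")
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        errors.append("confidence outside [0, 1]")
    return errors

print(validate_output({"category": "support", "urgency": "high", "confidence": 0.92}))  # []
print(validate_output({"category": "spam", "confidence": 1.7}))
```

Any non-empty result can route straight into the same human-review fallback used for low confidence.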
The benefits:
When to use it:
This changed how I build AI agents: stop hand-writing prompts and let the model generate them.
The process:
Example meta-prompt:
I'm building an AI agent for customer support email classification. Help me create an optimal system message prompt.
REQUIREMENTS:
- Classify emails into: sales, support, billing, general
- Assess urgency: low, medium, high
- Output format: JSON with category, urgency, confidence
- Must handle edge cases: unclear intent, multiple topics, spam
TOOLS AVAILABLE:
- search_docs(query): Search documentation
- create_ticket(title, priority, description): Escalate to humans
EXAMPLES OF DESIRED BEHAVIOR:
[Include 3-5 diverse examples with input and expected output]
CONSTRAINTS:
- Never make up information
- When uncertain (confidence < 0.7): Escalate
- Response under 150 words for direct answers
- Include reasoning in output
Generate an optimized system message that will consistently produce these results.
The model generates a system message that:
Why it works:
Most "model-specific tricks" don't hold up. A few differences do matter, though:
Claude(Anthropic):
GPT-4o(OpenAI):
GPT-4o-mini:
Gemini(Google):
Rules of thumb for model selection:
A good prompt is not enough; you need a production-grade workflow.
Test with real edge cases, not just the happy path:
Test cases for email triager:
✓ Standard support request
✓ Angry customer (caps, exclamation marks)
✓ Sales inquiry with technical questions (mixed intent)
✓ Very short message ("help")
✓ Wrong language (if only English supported)
✓ Spam/irrelevant content
AI agents can fail, so build fallbacks:
n8n workflow:
AI Agent Node
→ IF Error OR confidence < 0.7:
→ Fallback: Route to Human
→ ELSE:
→ Continue with automated workflow
The matching System Message convention for confidence:
If you're uncertain (confidence < 70%):
Set "needs_human_review": true in output
At high volume, every token counts:
Track the key metrics:
In n8n: lightweight logging via Webhook → Google Sheets:
After AI Agent Node:
→ Set Node (Extract Metrics):
- latency: {{$now - $('AI Agent').json.startTime}}
- input_tokens: {{$('AI Agent').json.usage.input_tokens}}
- output_tokens: {{$('AI Agent').json.usage.output_tokens}}
- confidence: {{$('AI Agent').json.confidence}}
→ Google Sheets (Append Row)
Before going live:
Prompt quality:
Testing:
Performance:
Monitoring:
Iteration:
The five universal core techniques:
Situational advanced patterns:
The meta-takeaway:
Your next step: pick one existing n8n AI Agent workflow and apply the five core techniques above. Compare token usage before and after. You will typically see costs drop significantly while output quality actually improves.
That is the difference between prompting that barely works and prompting that scales to production.