
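RAG in Practice (1): Simple RAG

Published: 2025-11-13 05:49:57 Views: 1521

Author: 影子的分享站

Search WeChat for “影子的分享站” to follow the account.

This is the start of a hands-on RAG tutorial series covering a range of RAG scenarios. All tutorial code is in Python, and a conservative estimate is 30+ articles. This first article walks through the simplest RAG example. All of the code is included in the article; for convenience, the complete project code is also available through the Knowledge Planet (知识星球) community.

Environment Setup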

conda create -n rag-learning python=3.10.0

conda activate rag-learning

Install the dependencies listed in requirements.txt

• PyMuPDF (imported in code as fitz; the separate PyPI package named fitz is a different project)
• numpy
• openai
• requests
• rank-bm25
• scikit-learn
• networkx
• matplotlib
• tqdm
• Pillow
• faiss-cpu

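Simple RAG Pipeline Overview

The following implements a simple RAG pipeline in these steps:

1. Text extraction: load and preprocess the text data
2. Chunking: split the data into smaller chunks to improve retrieval performance
3. Embedding creation: use an embedding model to convert the text chunks into numeric vectors
4. Semantic search: retrieve the chunks relevant to the user's query
5. Response generation: use an AI model to generate a response from the retrieved text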
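To make the data flow between these steps concrete before any API is involved, here is a minimal offline sketch. It is not the article's implementation: `build_vocab`, `toy_embed`, and `retrieve` are hypothetical helpers, and word counts stand in for a real embedding model. Generation is left out; the sketch stops at the prompt that would be sent to the model.

```python
import numpy as np

def build_vocab(texts):
    """Map every distinct lowercase word in the corpus to a column index."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}

def toy_embed(texts, vocab):
    """Hypothetical stand-in for an embedding model: L2-normalized word counts."""
    vecs = np.zeros((len(texts), len(vocab)), dtype=np.float32)
    for row, text in enumerate(texts):
        for word in text.lower().split():
            if word in vocab:
                vecs[row, vocab[word]] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)  # avoid division by zero

def retrieve(query, chunks, chunk_vecs, vocab, top_k=2):
    """Rank chunks by cosine similarity (dot product of unit vectors)."""
    scores = chunk_vecs @ toy_embed([query], vocab)[0]
    order = np.argsort(-scores)[:top_k]
    return [(chunks[i], float(scores[i])) for i in order]

# Steps 1-2: pretend these chunks came out of extraction and chunking.
chunks = [
    "an agent uses tools to act",
    "faiss builds a vector index",
    "chunking splits long text",
]
query = "what is an agent"

# Step 3: embed the chunks. Step 4: search. Step 5 would send `prompt` to an LLM.
vocab = build_vocab(chunks)
chunk_vecs = toy_embed(chunks, vocab)
hits = retrieve(query, chunks, chunk_vecs, vocab)
prompt = "\n".join(f"Context {i + 1}:\n{text}" for i, (text, _) in enumerate(hits))
prompt += f"\nQuestion: {query}"
print(prompt)
```

The rest of the article replaces each toy piece with a real one: PyMuPDF for extraction, an embedding API for vectors, and FAISS for search.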

Import Dependencies

import fitz
import os
import numpy as np
import json
from openai import OpenAI
import faiss

from enum import Enum

class LLMsProvider(Enum):
    OPENAI = "openai"

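def get_llms_provider(provider: LLMsProvider, model_id: str = None):
    # Returns an OpenAI-compatible client pointed at the DashScope endpoint.
    # model_id is accepted but not used in this example.
    if provider == LLMsProvider.OPENAI:
        from openai import OpenAI
        import os
        client = OpenAI(
            api_key=os.getenv("DASHSCOPE_API_KEY"),
            base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
        )
        return client
    # Fail loudly instead of silently returning None for an unknown provider
    raise ValueError(f"Unsupported provider: {provider}")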

PDF Parsing

In this example, we use the PyMuPDF library to extract text from a PDF file.

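def extract_text_from_pdf(pdf_path):
    my_pdf = fitz.open(pdf_path)
    all_text = ""
    for page_num in range(my_pdf.page_count):
        page = my_pdf[page_num]
        text = page.get_text("text")
        # Append each page's text. With plain assignment (all_text = text) only the
        # last page survives, which is the endnotes-only text in the sample outputs
        # shown in this article.
        all_text += text
    return all_text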
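The pipeline overview mentions preprocessing, and text extracted from PDFs often carries tabs and trailing spaces. As one possible preprocessing step (not part of the original project), a small helper could collapse that whitespace before chunking. `normalize_whitespace` is a hypothetical name, and the sample string is an excerpt in the style of the extracted endnotes.

```python
import re

def normalize_whitespace(text):
    """Collapse runs of spaces/tabs, trim each line, and drop empty lines."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)

sample = (
    "Agents\n42\n1.\t Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing "
    "Reasoning and Acting in Language Models'. Available at: \n"
    "https://arxiv.org/abs/2210.03629\n"
)
cleaned = normalize_whitespace(sample)
print(cleaned)
```

Whether to keep line breaks or join lines into paragraphs depends on the document; this sketch keeps them so that list entries stay on separate lines.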

Running the PDF Parser

pdf_path = "../data/google-ai-Agents-whitepaper.pdf"

extract_text = extract_text_from_pdf(pdf_path)
extract_text

"Agents\n42\nSeptember 2024\nEndnotes\n1.\t Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: \nhttps://arxiv.org/abs/2210.03629\n2.\t Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. \nAvailable at: https://arxiv.org/pdf/2201.11903.pdf.\n3.\t Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'. \nAvailable at: https://arxiv.org/abs/2203.11171.\n4.\t Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at: \nhttps://arxiv.org/pdf/2302.12246.pdf.\n5.\t Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at: \nhttps://arxiv.org/abs/2302.00923.\n6.\t Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: \nhttps://arxiv.org/abs/2305.10601.\n7.\t Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Available at: \nhttps://arxiv.org/abs/2305.08291.\n8.\t Google. 'Google Gemini Application'. Available at: http://gemini.google.com.\n9.\t Swagger. 'OpenAPI Specification'. Available at: https://swagger.io/specification/.\n10.\tXie, M., 2022, 'How does in-context learning work? A framework for understanding the differences from \ntraditional supervised learning'. Available at: https://ai.stanford.edu/blog/understanding-incontext/.\n11.\t Google Research. 'ScaNN (Scalable Nearest Neighbors)'. Available at: \nhttps://github.com/google-research/google-research/tree/master/scann.\n12.\t LangChain. 'LangChain'. Available at: https://python.langchain.com/v0.2/docs/introduction/.\n"

Text Chunking

Once the text has been extracted, we split it into smaller, overlapping chunks to improve retrieval accuracy.

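def chunk_text(text, chunk_size=1000, overlap=200):
    # Slide a window of chunk_size characters, advancing by chunk_size - overlap
    # so that consecutive chunks share `overlap` characters.
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(text), step):
        chunks.append(text[i:i + chunk_size])
    return chunks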

text_chunks = chunk_text(extract_text, 1000, 200)
print("Number of text chunks:", len(text_chunks))

print("\n First text chunk:")
print(text_chunks[0])

Number of text chunks: 3

 First text chunk:
Agents
42
September 2024
Endnotes
1.     Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: 
https://arxiv.org/abs/2210.03629
2.     Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. 
Available at: https://arxiv.org/pdf/2201.11903.pdf.
3.     Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'. 
Available at: https://arxiv.org/abs/2203.11171.
4.     Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at: 
https://arxiv.org/pdf/2302.12246.pdf.
5.     Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at: 
https://arxiv.org/abs/2302.00923.
6.     Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: 
https://arxiv.org/abs/2305.10601.
7.     Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Av
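To see exactly where the chunk boundaries fall, the sketch below computes the character offsets that a sliding window with overlap produces. `chunk_spans` is a hypothetical helper written for illustration; the guard on `overlap` is an addition, since a step of zero or less would make the window stop advancing.

```python
def chunk_spans(text_len, chunk_size, overlap):
    """Return the (start, end) character offsets of each chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    return [
        (start, min(start + chunk_size, text_len))
        for start in range(0, text_len, step)
    ]

spans = chunk_spans(text_len=25, chunk_size=10, overlap=4)
print(spans)
# [(0, 10), (6, 16), (12, 22), (18, 25), (24, 25)]
```

Note the final span: it is a short tail that the previous chunk already covers. The same effect is visible in the retrieval output later in this article, where one returned chunk is only the last few dozen characters of another.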

Text Embedding

def create_embeddings(text, client, model="text-embedding-v4"):
    response = client.embeddings.create(model=model, input=text)

    embeddings = [response.data[i].embedding for i in range(len(response.data))]
    passages_embs = np.array(embeddings).astype(np.float32)
    return passages_embs,passages_embs.shape[0],passages_embs.shape[1]

client = get_llms_provider(LLMsProvider.OPENAI)
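`create_embeddings` sends every chunk in a single request. Embedding endpoints commonly cap the number of inputs per request, so a longer document may need batching. The sketch below is one way to do that, not the article's code: `embed_in_batches` is a hypothetical helper, `batch_size=10` is an assumed limit that should be checked against the provider's documentation, and `FakeEmbeddings` is an offline stand-in so the example runs without an API key.

```python
from types import SimpleNamespace

import numpy as np

def embed_in_batches(texts, client, model="text-embedding-v4", batch_size=10):
    """Embed texts in slices of batch_size and stack the rows in input order."""
    rows = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        # Sort by the index field so rows line up with the inputs of this batch.
        for item in sorted(response.data, key=lambda d: d.index):
            rows.append(item.embedding)
    return np.asarray(rows, dtype=np.float32)

class FakeEmbeddings:
    """Offline stand-in for client.embeddings (hypothetical, demo only)."""
    def __init__(self):
        self.batch_sizes = []

    def create(self, model, input):
        self.batch_sizes.append(len(input))
        data = [
            SimpleNamespace(index=i, embedding=[float(len(t)), 1.0])
            for i, t in enumerate(input)
        ]
        return SimpleNamespace(data=data)

fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
texts = ["x" * n for n in range(1, 24)]  # 23 texts of increasing length
vectors = embed_in_batches(texts, fake_client, batch_size=10)
print(vectors.shape, fake_client.embeddings.batch_sizes)
```

With the real client from `get_llms_provider`, the call would be `embed_in_batches(text_chunks, client)`.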

Building the Knowledge Base

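def build_index(passages):
    passages_embs, num_vec, vec_dim = create_embeddings(passages, client)
    # Coarse quantizer plus an IVF index with about sqrt(N) clusters,
    # both scored by inner product
    quantizer = faiss.IndexFlatIP(vec_dim)
    faiss_index = faiss.IndexIVFFlat(
        quantizer, vec_dim, int(np.sqrt(num_vec)), faiss.METRIC_INNER_PRODUCT
    )
    # L2-normalize in place so that inner product equals cosine similarity
    faiss.normalize_L2(passages_embs)
    # An IVF index must be trained (clustered) before vectors can be added
    faiss_index.train(passages_embs)
    faiss_index.add(passages_embs)
    return faiss_index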

faiss_index = build_index(text_chunks)

WARNING clustering 3 points to 1 centroids: please provide at least 39 training points
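The warning appears because the IVF index clusters the vectors during training, and with only 3 vectors there is far too little data per centroid. For a corpus this small, exact brute-force search is a reasonable alternative; in FAISS that role is played by `faiss.IndexFlatIP`, which needs no training. The NumPy sketch below shows what such an exact inner-product search over normalized vectors computes. `normalize_rows` and `exact_search` are hypothetical helpers and the vectors are made-up toy data.

```python
import numpy as np

def normalize_rows(x):
    """Scale each row to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def exact_search(query_vecs, passage_vecs, top_k):
    """Brute-force cosine search: returns (scores, indices), best match first."""
    queries = normalize_rows(query_vecs)
    passages = normalize_rows(passage_vecs)
    sims = queries @ passages.T  # cosine similarity matrix
    top_k = min(top_k, passages.shape[0])
    idx = np.argsort(-sims, axis=1)[:, :top_k]
    return np.take_along_axis(sims, idx, axis=1), idx

passage_vecs = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
query_vec = np.array([[2.0, 0.1]])
scores, idx = exact_search(query_vec, passage_vecs, top_k=2)
print(idx[0], scores[0])
```

Approximate indexes such as IVF start to pay off only when the corpus is large enough that scanning every vector becomes the bottleneck.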

Knowledge Base Retrieval

def search(query, faiss_index, passages, recall_topk=3):
    query_emb, _, _ = create_embeddings([query], client)
    # Normalize the query like the passages so the score is cosine similarity
    faiss.normalize_L2(query_emb)
    res_distance, res_index = faiss_index.search(query_emb, recall_topk)
    candidate_query_score_list = []
    for index, score in zip(res_index[0], res_distance[0]):
        # FAISS pads with -1 when fewer than recall_topk results are found
        if index < 0:
            continue
        candidate_query_score_list.append({"text": passages[index], "score": float(score)})
    return candidate_query_score_list

query = "What is an agent"

top_chunks = search(query, faiss_index, text_chunks)

for i, chunk in enumerate(top_chunks):
    print(f"Context {i + 1}:\n{chunk}\n==============")

Context 1:
{'text': "Agents\n42\nSeptember 2024\nEndnotes\n1.\t Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: \nhttps://arxiv.org/abs/2210.03629\n2.\t Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. \nAvailable at: https://arxiv.org/pdf/2201.11903.pdf.\n3.\t Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'. \nAvailable at: https://arxiv.org/abs/2203.11171.\n4.\t Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at: \nhttps://arxiv.org/pdf/2302.12246.pdf.\n5.\t Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at: \nhttps://arxiv.org/abs/2302.00923.\n6.\t Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: \nhttps://arxiv.org/abs/2305.10601.\n7.\t Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Av", 'score': 0.27010855078697205}
==============
Context 2:
{'text': 'vailable at: https://python.langchain.com/v0.2/docs/introduction/.\n', 'score': 0.15880665183067322}
==============
Context 3:
{'text': " 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: \nhttps://arxiv.org/abs/2305.10601.\n7.\t Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Available at: \nhttps://arxiv.org/abs/2305.08291.\n8.\t Google. 'Google Gemini Application'. Available at: http://gemini.google.com.\n9.\t Swagger. 'OpenAPI Specification'. Available at: https://swagger.io/specification/.\n10.\tXie, M., 2022, 'How does in-context learning work? A framework for understanding the differences from \ntraditional supervised learning'. Available at: https://ai.stanford.edu/blog/understanding-incontext/.\n11.\t Google Research. 'ScaNN (Scalable Nearest Neighbors)'. Available at: \nhttps://github.com/google-research/google-research/tree/master/scann.\n12.\t LangChain. 'LangChain'. Available at: https://python.langchain.com/v0.2/docs/introduction/.\n", 'score': 0.14151382446289062}
==============

LLM Response

def generate_response(system_prompt, user_message, model="qwen-plus"):
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            # The retrieved context and the question go in the user turn
            {"role": "user", "content": user_message},
        ]
    )
    return response

system_prompt = "You are an AI Agent assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"
# Create the user prompt based on the top chunks
user_prompt = "\n".join([f"Context {i + 1}:\n{chunk['text']}\n=====================================\n" for i, chunk in enumerate(top_chunks)])
user_prompt = f"{user_prompt}\nQuestion: {query}"

# Generate AI response
ai_response = generate_response(system_prompt, user_prompt)
print(f"ai_response:{ai_response}")

ai_response:ChatCompletion(id='chatcmpl-f4693ca5-c995-4af3-b9b9-572ad33258db', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I do not have enough information to answer that.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1762959661, model='qwen-plus', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=723, total_tokens=733, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))
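The printed object is the full ChatCompletion; the answer itself sits at `choices[0].message.content`. The sketch below wraps prompt assembly and answer extraction in two small helpers. `build_user_prompt` and `answer_text` are hypothetical names, the separator length is arbitrary, and a `SimpleNamespace` stands in for the API response so the example runs offline.

```python
from types import SimpleNamespace

SEPARATOR = "=" * 37  # visual divider between contexts; the length is arbitrary

def build_user_prompt(query, chunks):
    """Join retrieved chunks into numbered context blocks, then append the question."""
    blocks = [
        f"Context {i + 1}:\n{chunk['text']}\n{SEPARATOR}\n"
        for i, chunk in enumerate(chunks)
    ]
    return "\n".join(blocks) + f"\nQuestion: {query}"

def answer_text(response):
    """Return the assistant's text from a ChatCompletion-shaped object."""
    return response.choices[0].message.content

# Offline stand-ins for the retrieval result and the API response.
demo_chunks = [{"text": "first chunk", "score": 0.9}, {"text": "second chunk", "score": 0.5}]
demo_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="I do not have enough information to answer that."))]
)

demo_prompt = build_user_prompt("What is an agent", demo_chunks)
print(demo_prompt)
print(answer_text(demo_response))
```

With the real objects from this article, `answer_text(ai_response)` would return just the sentence inside `content`.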

RAG Evaluation

# Define the system prompt for the evaluation system
evaluate_system_prompt = "You are an intelligent evaluation system tasked with assessing the AI assistant's responses. If the AI assistant's response is very close to the true response, assign a score of 1. If the response is incorrect or unsatisfactory in relation to the true response, assign a score of 0. If the response is partially aligned with the true response, assign a score of 0.5."

# Create the evaluation prompt by combining the user query, AI response, true response, and evaluation system prompt
answer = "Agents are autonomous and can act independently of human intervention, especially when provided with proper goals or objectives they are meant to achieve"

evaluation_prompt = f"User Query: {query}\nAI Response:\n{ai_response.choices[0].message.content}\nTrue Response: {answer}\n{evaluate_system_prompt}"

# Generate the evaluation response using the evaluation system prompt and evaluation prompt
evaluation_response = generate_response(evaluate_system_prompt, evaluation_prompt)

# Print the evaluation response
print(evaluation_response.choices[0].message.content)

evaluation_response

0



ChatCompletion(id='chatcmpl-7a18d6c1-0b86-4806-b4c8-6fff1f59f81a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='0', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1762959661, model='qwen-plus', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1, prompt_tokens=215, total_tokens=216, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))
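The evaluator replies with free text, so code that aggregates scores over many queries needs to turn that reply into a number and reject anything unexpected. A minimal sketch, assuming the judge follows the 0 / 0.5 / 1 scheme from the evaluation prompt; `parse_score` is a hypothetical helper.

```python
import re

def parse_score(text, allowed=(0.0, 0.5, 1.0)):
    """Extract the first number in the judge's reply; return None if it is not an allowed score."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None
    value = float(match.group())
    return value if value in allowed else None

print(parse_score("0"))           # 0.0
print(parse_score("Score: 0.5"))  # 0.5
print(parse_score("no score"))    # None
```

Applied to this article's run, `parse_score(evaluation_response.choices[0].message.content)` would give `0.0`.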

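The project code above is available through the Knowledge Planet (知识星球) community.

Knowledge Planet Service

Accumulated materials that save you time and help you avoid detours.

1. Backend: a structured learning path, with systematic articles on computer science fundamentals, Java backend frameworks, and microservices, plus interview materials
2. Big data: a structured learning path, with interview materials on Hadoop, Spark, Flink, and the rest of the big data stack
3. AI engineering: getting-started tutorials for the Spring AI && Spring AI Alibaba frameworks

Other:

• One free question to me per month; my days are busy with work, but I try to reply within 48 hours
• Suitable internship, campus recruitment, and experienced-hire opportunities are shared in the community

Materials under construction

• Natural language processing: the development of LLMs, the early evolution from BERT to GPT, and recommended projects
• Personal compounding growth: looking to the future, how to build up your own "assets" to handle risk
• Funds and investing: 缅北 A, the Nasdaq index, 摩根 (Morgan); what strategy for regular fixed investment?
• ...

Trial operation phase: after joining, write a newcomer introduction post (《新人报道》) of at least 150 characters to receive one free résumé revision. Places are limited; further résumé revisions are available for a fee of 199 yuan each.

About the author: I'm 影子, born in 2000. In 2022 I ranked first in my major by weighted overall score (1/86) and was recommended from Hunan Agricultural University for graduate study without examination; in 2025 I received my master's degree from the University of Science and Technology Beijing. I have published one SCI paper on natural language processing. My bachelor's and master's degrees are both in mathematics; out of passion and self-motivation, I taught myself traditional Java backend microservices, the big data stack, and natural language processing. As a student I worked at Baidu, Li Auto (理想), and Kuaishou, with more than a year of internship experience in total, and I also serve as a Committer in the Spring Ai Alibaba open-source community.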
