
Combining LlamaIndex with Ragflow to Build High-Performance LLM RAG Applications

Published: 2025-04-28 20:31:09

Author: AI科技论谈


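LlamaIndex and Ragflow together: a powerful combination for building large language model (LLM) applications.
LlamaIndex and Ragflow are two open-source tools that make developers' work considerably easier. LlamaIndex is a data framework that connects LLMs to many kinds of external data: structured data (SQL and NoSQL databases), unstructured data (documents, web pages), and private data (accessed through APIs). Ragflow is a workflow orchestration tool focused on managing the execution of complex LLM pipelines, so that every stage of the processing runs in an orderly way.

The two complement each other. Together they cover what is needed to build powerful, scalable LLM applications, helping developers build and experiment in this area more efficiently.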

1 Definitions

1.1 LlamaIndex

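LlamaIndex lets developers connect LLMs to a variety of external data sources, including structured data (SQL and non-relational databases), unstructured data (documents, web pages), and private data (APIs). With it, developers can build LLM applications that draw on, and reason over, a broad range of information.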

LlamaIndex offers several key features:

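Data connectors: a library of prebuilt connectors covers common data sources, so developers do not have to write custom loading code for sources that are already supported.
Indexing: external data is indexed so that information can be searched and retrieved quickly, even across large datasets.
Question answering: questions can be answered from external data sources, which makes it straightforward to build Q&A applications over a specific topic or set of documents.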
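To make "indexing" and "retrieval" concrete, here is a toy, pure-Python sketch of what a vector index does conceptually: store one embedding per chunk of text, then return the top-k chunks most similar to a query by cosine similarity. The `ToyVectorIndex` class and the hand-written 3-dimensional vectors are hypothetical stand-ins; LlamaIndex's `VectorStoreIndex` uses a real embedding model and its own storage, and this is not its implementation.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ToyVectorIndex:
    """Hypothetical in-memory index holding one (text, embedding) pair per chunk."""
    entries: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, embedding: list[float]) -> None:
        self.entries.append((text, embedding))

    def retrieve(self, query_embedding: list[float], top_k: int = 2) -> list[tuple[str, float]]:
        """Return the top_k chunks ranked by cosine similarity to the query."""
        scored = [(text, cosine(query_embedding, emb)) for text, emb in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Hand-made 3-d "embeddings" stand in for a real embedding model
index = ToyVectorIndex()
index.add("Llama 2 pretraining data", [0.9, 0.1, 0.0])
index.add("RLHF and reward models", [0.1, 0.9, 0.1])
index.add("Safety evaluation", [0.0, 0.2, 0.9])

for text, score in index.retrieve([0.8, 0.3, 0.0], top_k=2):
    print(f"{score:.3f}  {text}")
```

The query vector points mostly along the first axis, so the pretraining chunk ranks first and the unrelated safety chunk is left out of the top 2.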

1.2 Ragflow

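Ragflow is a workflow orchestration tool that manages the execution of complex LLM pipelines. This makes it a solid foundation for LLM applications that need to carry out several kinds of tasks, including: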

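Data retrieval: Ragflow can retrieve data from external data sources.
Data processing: Ragflow can process data, for example by cleaning, transforming, and summarizing it.
LLM inference: Ragflow can run LLM inference tasks.
Output generation: Ragflow can produce output in several formats, such as text, tables, or charts.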
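As a rough illustration of what "orchestrating a pipeline" means, the sketch below chains the four kinds of tasks listed above (retrieval, processing, inference, output generation) as stages that run in order over a shared state dictionary. Every stage body, the `run_pipeline` helper, and the canned "LLM" answer are hypothetical placeholders; this is a generic pattern, not Ragflow's API.

```python
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], State]


def retrieve(state: State) -> State:
    # Placeholder retrieval: keyword filter over a tiny in-memory corpus
    corpus = ["  Llama 2 was pretrained on 2T tokens. ", "Unrelated note."]
    state["docs"] = [d for d in corpus if "Llama" in d]
    return state


def process(state: State) -> State:
    # Cleaning step: strip surrounding whitespace from each document
    state["docs"] = [d.strip() for d in state["docs"]]
    return state


def infer(state: State) -> State:
    # Stand-in for an LLM call: use the first document as the "answer"
    docs = state["docs"]
    state["answer"] = docs[0] if docs else "No relevant documents."
    return state


def render(state: State) -> State:
    # Output generation: format the answer as plain text
    state["output"] = f"Q: {state['query']}\nA: {state['answer']}"
    return state


def run_pipeline(stages: list[Stage], state: State) -> State:
    """Run the stages in order, logging each one, and return the final state."""
    for stage in stages:
        print(f"running {stage.__name__}")
        state = stage(state)
    return state


final = run_pipeline([retrieve, process, infer, render], {"query": "How was Llama 2 trained?"})
print(final["output"])
```

Because each stage only reads and writes the shared state, stages can be reordered, swapped, or reused without changing the runner.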

1.3 How LlamaIndex and Ragflow Work Together

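LlamaIndex and Ragflow can be used together to build powerful LLM applications.

LlamaIndex handles the data side: it connects the LLM to various data sources and indexes and queries that data, widening the range of information the model can draw on. Ragflow handles workflow orchestration, managing the execution of complex LLM pipelines.

Combined, they make it possible to build versatile LLM applications for tasks such as question answering, text generation, and data analysis, covering a wide range of use cases.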

2 Code Implementation

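Next, we implement the LlamaIndex and Ragflow combination step by step. The pipeline is built with LlamaIndex's workflow module (`llama_index.core.workflow`): one run ingests documents into a vector index, and each query run goes through retrieval, LLM reranking, and response synthesis. The code targets a Jupyter notebook (it uses `!` shell commands and top-level `await`) and calls OpenAI models, so an OpenAI API key is required: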

Step 1: Install the library, set the API key, and download the data

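# The llama-index package bundles the core library plus the OpenAI LLM and embedding integrations used below
!pip install -U llama-index

# Set the OpenAI API key (replace with your own key)
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."

# Download the Llama 2 paper into ./data
!mkdir -p data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"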
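The `!` lines are Jupyter shell escapes. Outside a notebook, the same download can be done in plain Python; the sketch below uses only the standard library and sends the same `Mozilla` User-Agent header as the `wget` command. The helper names `build_request` and `download_pdf` are illustrative and not part of LlamaIndex.

```python
import os
import urllib.request

PAPER_URL = "https://arxiv.org/pdf/2307.09288.pdf"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that sends the same User-Agent as the wget command."""
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla"})


def download_pdf(url: str, dest: str, timeout: float = 60.0) -> str:
    """Save url to dest, creating the parent directory first (like `mkdir -p`)."""
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        data = resp.read()
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```

Calling `download_pdf(PAPER_URL, "data/llama2.pdf")` produces the same `data/llama2.pdf` file that the workflow later ingests from the `data` directory.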

Step 2: Define the workflow events

from llama_index.core.workflow import Event
from llama_index.core.schema import NodeWithScore


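class RetrieverEvent(Event):
    """Result of running retrieval (emitted by retrieve, consumed by rerank)."""
    nodes: list[NodeWithScore]


class RerankEvent(Event):
    """Result of reranking the retrieved nodes (emitted by rerank, consumed by synthesize)."""
    nodes: list[NodeWithScore]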

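These event classes are what connect the steps. Roughly speaking, in a LlamaIndex workflow each step declares the event type it accepts through the annotation of its `ev` parameter, and the event it returns triggers whichever step accepts that type. The toy dispatcher below illustrates that routing idea in plain `asyncio`; the event classes, the `handles` decorator, and the `run` loop are hypothetical simplifications, not how LlamaIndex implements `Workflow`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Start:
    query: str


@dataclass
class Retrieved:
    nodes: list[str]


@dataclass
class Reranked:
    nodes: list[str]


@dataclass
class Stop:
    result: str


# Registry mapping each event type to the step that consumes it
HANDLERS: dict[type, Callable[..., Awaitable[object]]] = {}


def handles(event_type: type):
    """Decorator: register a coroutine as the step for event_type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register


@handles(Start)
async def retrieve(ev: Start) -> Retrieved:
    # Placeholder retrieval result
    return Retrieved(nodes=["chunk about RLHF", "chunk about pretraining"])


@handles(Retrieved)
async def rerank(ev: Retrieved) -> Reranked:
    # Placeholder ordering; a real reranker would score relevance to the query
    return Reranked(nodes=sorted(ev.nodes, key=len))


@handles(Reranked)
async def synthesize(ev: Reranked) -> Stop:
    return Stop(result=" | ".join(ev.nodes))


async def run(start: Start) -> str:
    """Route each event to the step registered for its type until a Stop appears."""
    ev: object = start
    while not isinstance(ev, Stop):
        step_fn = HANDLERS[type(ev)]
        print(f"{type(ev).__name__} -> {step_fn.__name__}")
        ev = await step_fn(ev)
    return ev.result


print(asyncio.run(run(Start(query="How was Llama2 trained?"))))
```

The runner never hard-codes the order of the steps; the chain Start, Retrieved, Reranked, Stop emerges from which step handles which event type.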
Step 3: The complete workflow

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.postprocessor.llm_rerank import LLMRerank
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding


class RAGWorkflow(Workflow):
    """Ingest documents into a vector index, then answer queries with
    retrieve -> rerank -> synthesize. (`X | None` annotations need Python 3.10+.)"""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The index lives on the workflow instance: by default every w.run()
        # starts with a fresh Context, so an index stored in the context
        # during the ingestion run would be gone by the query run.
        self._vector_index = None

    # Steps receive the per-run Context through the `ctx: Context` parameter
    @step
    async def ingest(self, ctx: Context, ev: StartEvent) -> StopEvent | None:
        """Ingestion entry point, triggered by a StartEvent that carries `dirname`."""
        dirname = ev.get("dirname")
        if not dirname:
            return None

        documents = SimpleDirectoryReader(dirname).load_data()
        self._vector_index = VectorStoreIndex.from_documents(
            documents=documents,
            embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
        )
        return StopEvent(result=f"Indexed {len(documents)} documents.")

    @step
    async def retrieve(
        self, ctx: Context, ev: StartEvent
    ) -> RetrieverEvent | None:
        """RAG entry point, triggered by a StartEvent that carries `query`."""
        query = ev.get("query")
        if not query:
            return None

        print(f"Query the database with: {query}")

        # Store the query in this run's context for the rerank and synthesize steps
        await ctx.set("query", query)

        index = self._vector_index
        if index is None:
            print("Index is empty, load some documents before querying!")
            return None

        # Fetch the 2 chunks most similar to the query
        retriever = index.as_retriever(similarity_top_k=2)
        nodes = await retriever.aretrieve(query)
        print(f"Retrieved {len(nodes)} nodes.")
        return RetrieverEvent(nodes=nodes)

    @step
    async def rerank(self, ctx: Context, ev: RetrieverEvent) -> RerankEvent:
        """Rerank the retrieved nodes with an LLM."""
        # Only 2 nodes arrive from retrieve, so top_n=3 keeps at most 2 of them
        ranker = LLMRerank(
            choice_batch_size=5, top_n=3, llm=OpenAI(model="gpt-4o-mini")
        )
        query = await ctx.get("query")
        new_nodes = ranker.postprocess_nodes(ev.nodes, query_str=query)
        print(f"Reranked nodes to {len(new_nodes)}")
        return RerankEvent(nodes=new_nodes)

    @step
    async def synthesize(self, ctx: Context, ev: RerankEvent) -> StopEvent:
        """Return a streaming response built from the reranked nodes."""
        llm = OpenAI(model="gpt-4o-mini")
        summarizer = CompactAndRefine(llm=llm, streaming=True, verbose=True)
        query = await ctx.get("query")

        response = await summarizer.asynthesize(query, nodes=ev.nodes)
        return StopEvent(result=response)

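The `rerank` step hands the retrieved nodes to `LLMRerank` with `choice_batch_size=5` and `top_n=3`. As we understand it, `LLMRerank` shows the LLM the candidate nodes one batch of `choice_batch_size` at a time, asks which ones are relevant and how relevant, and finally keeps the `top_n` highest-scoring picks. The pure-Python sketch below mirrors only that control flow; `keyword_scorer` is a hypothetical stand-in for the LLM call that scores by word overlap, and none of this is LlamaIndex's actual code.

```python
from typing import Callable

# A scorer sees one batch and returns (position in batch, relevance score)
# for each text it considers relevant; texts it does not list are dropped.
Scorer = Callable[[str, list[str]], list[tuple[int, float]]]


def keyword_scorer(query: str, batch: list[str]) -> list[tuple[int, float]]:
    """Hypothetical stand-in for the LLM: score each text by word overlap with the query."""
    words = set(query.lower().split())
    picks: list[tuple[int, float]] = []
    for i, text in enumerate(batch):
        overlap = len(words & set(text.lower().split()))
        if overlap:  # zero overlap means "not chosen"
            picks.append((i, float(overlap)))
    return picks


def batched_rerank(
    query: str,
    nodes: list[str],
    scorer: Scorer,
    choice_batch_size: int = 5,
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score nodes one batch at a time, then keep the top_n highest-scoring picks."""
    scored: list[tuple[str, float]] = []
    for start in range(0, len(nodes), choice_batch_size):
        batch = nodes[start:start + choice_batch_size]
        for pos, score in scorer(query, batch):
            scored.append((batch[pos], score))
    # sort() is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


nodes = [
    "llama 2 was pretrained on public data",
    "rlhf refined llama 2 chat",
    "the weather is sunny",
    "ppo was used during rlhf",
]
for text, score in batched_rerank(
    "how was llama 2 trained with rlhf", nodes, keyword_scorer, choice_batch_size=2, top_n=3
):
    print(f"{score:.1f}  {text}")
```

Here the unrelated weather sentence is never picked, and the remaining three nodes come back ordered by score, with the two ties kept in their original order.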
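Step 4: Run the workflow

The cells below use top-level `await`, which works in a Jupyter notebook.

# Raise the timeout: ingesting and embedding the whole PDF can take longer than the default limit
w = RAGWorkflow(timeout=120)
# Ingest the documents
await w.run(dirname="data")
# Run a query and stream the answer as it is generated
result = await w.run(query="How was Llama2 trained?")
async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)

Sample output from one run (the generated answer will vary between runs):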

Query the database with: How was Llama2 trained?
Retrieved 2 nodes.
Llama 2 was trained through a multi-step process that began with pretraining using publicly available online sources. This was followed by the creation of an initial version of Llama 2-Chat through supervised fine-tuning. The model was then iteratively refined using Reinforcement Learning with Human Feedback (RLHF) methodologies, which included techniques like rejection sampling and Proximal Policy Optimization (PPO). 

During pretraining, the model utilized an optimized auto-regressive transformer architecture, incorporating robust data cleaning, updated data mixes, and training on a significantly larger dataset of 2 trillion tokens. The training process also involved increased context length and the use of grouped-query attention (GQA) to enhance inference scalability. 

The training employed the AdamW optimizer with specific hyperparameters, a cosine learning rate schedule, and gradient clipping. The models were pretrained on Meta's Research SuperCluster and internal production clusters, utilizing NVIDIA A100 GPUs for the training process.
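The `synthesize` step relies on `CompactAndRefine`. Roughly, "compact" first packs as many retrieved chunks as fit into each prompt, and "refine" then answers from the first packed prompt and asks the LLM to improve that answer with each following one. The sketch below shows that control flow only; it counts characters instead of tokens, and `fake_llm` is a hypothetical stub that records prompts, so this is a simplification rather than LlamaIndex's implementation.

```python
def compact(chunks: list[str], budget: int) -> list[str]:
    """Greedily pack chunks into as few texts as possible, each at most `budget` chars.

    A single chunk longer than the budget gets a text of its own (no splitting here).
    """
    packed: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if len(candidate) <= budget or not current:
            current = candidate
        else:
            packed.append(current)
            current = chunk
    if current:
        packed.append(current)
    return packed


def refine(query: str, packed: list[str], llm) -> str:
    """Answer from the first packed text, then refine the answer with each later one."""
    if not packed:
        raise ValueError("no context to answer from")
    answer = llm(f"Context:\n{packed[0]}\nQuestion: {query}")
    for context in packed[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context:\n{context}\n"
            f"Refine the answer to: {query}"
        )
    return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stub LLM: records each prompt and reports how many it has seen."""
    calls.append(prompt)
    return f"answer v{len(calls)}"


chunks = ["Pretrained on 2T tokens.", "Fine-tuned with SFT.", "Refined with RLHF (PPO)."]
packed = compact(chunks, budget=50)
print(packed)
print(refine("How was Llama 2 trained?", packed, fake_llm))
```

With a 50-character budget the first two chunks share one prompt and the third gets its own, so the stub LLM is called twice: once to answer and once to refine that answer.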