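This article uses a large language model to explore the relationships between characters in Romance of the Three Kingdoms (《三国演义》), offering a new angle on text analysis. Key points:
1. Applying large language models to entity and relation extraction from text.
2. Putting the Chinese-developed ERNIE 4.5 (文心4.5) model to work with the LangChain toolchain.
3. The steps for processing the text of Romance of the Three Kingdoms and building an entity-relationship graph.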
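import jieba

# Split the text into paragraphs.
def text2paragraphs(text):
    # splitlines() handles both '\r\n' and '\n' line endings; blank lines are dropped.
    result = [line.strip() for line in text.splitlines() if line.strip()]
    print(f"The text splits into {len(result)} paragraphs.")
    return result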
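# Return True if the token ends a sentence; extend the list as needed.
def is_sentence_end(token):
    # Both full-width and ASCII forms are listed, since source texts mix them.
    return token in ['。', '!', '?', '!', '?', '”']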
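# Search forward from the chunk's lower bound for a sentence end, so that the
# chunk boundary is adjusted dynamically and each chunk ends on a complete sentence.
def find_sentence_boundary_forward(tokens, chunk_size):
    end = len(tokens)
    for i in range(chunk_size, len(tokens)):  # search forward starting at chunk_size
        if is_sentence_end(tokens[i]):
            end = i + 1  # include the sentence-ending token
            break
    return end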
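# Search backward from position start for the end of the previous sentence, so that
# the overlap between chunks begins at the start of a complete sentence.
def find_sentence_boundary_backward(tokens, start):
    for i in range(start - 1, -1, -1):
        if is_sentence_end(tokens[i]):
            return i + 1  # first token after the sentence-ending token
    return 0  # no sentence end found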
# Chunk the text. chunk_size is the target chunk length and overlap is the target
# overlap between consecutive chunks, both counted in tokens.
# Both values float: they are adjusted per chunk so that every chunk ends on a
# complete sentence and every overlap starts on one.
def chunk_text(text, chunk_size=300, overlap=50):
    if chunk_size <= overlap:  # parameter check
        raise ValueError("chunk_size must be greater than overlap.")
    # Split into paragraphs first and read them in whole, since a paragraph is a semantic unit.
    paragraphs = text2paragraphs(text)
    chunks = []
    buffer = []
    # Process paragraph by paragraph.
    i = 0
    while i < len(paragraphs):
        # Fill the buffer with whole paragraphs until it holds at least chunk_size tokens.
        while len(buffer) < chunk_size and i < len(paragraphs):
            tokens = jieba.lcut(paragraphs[i])  # tokenize the paragraph
            buffer.extend(tokens)
            i += 1
        # Cut chunks from the current buffer.
        while len(buffer) >= chunk_size:
            # Adjust the chunk size so the chunk ends on a complete sentence.
            end = find_sentence_boundary_forward(buffer, chunk_size)
            chunk = buffer[:end]  # includes the sentence-ending token
            chunks.append(chunk)  # keep the tokens (not a string) so they can be counted later
            # Make the overlap start at the beginning of a complete sentence.
            start_next = find_sentence_boundary_backward(buffer, end - overlap)
            if start_next == 0:
                start_next = find_sentence_boundary_backward(buffer, end - 1)
            if start_next == 0:
                start_next = end - overlap
            buffer = buffer[start_next:]
    if buffer:  # tokens are left in the buffer
        # The remainder may already sit at the end of the last chunk, kept only as overlap.
        # Compare token lists, and guard against an empty chunk list for short texts.
        if not chunks or chunks[-1][-len(buffer):] != buffer:
            chunks.append(buffer)  # not just overlap, so it is the final chunk
    return chunks
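To see how the two boundary searches interact without installing jieba, the following minimal sketch treats every character as one token. That is an assumption made only for illustration (the article tokenizes with jieba), and `boundary_forward` and `boundary_backward` are shortened stand-ins for the helpers above.

```python
# Sentence-ending tokens, in both full-width and ASCII forms.
SENTENCE_END = {"。", "!", "?", "!", "?", "”"}

def boundary_forward(tokens, chunk_size):
    """Return the index just past the first sentence end at or after chunk_size."""
    for i in range(chunk_size, len(tokens)):
        if tokens[i] in SENTENCE_END:
            return i + 1
    return len(tokens)

def boundary_backward(tokens, start):
    """Return the index just past the last sentence end before start, or 0 if none."""
    for i in range(start - 1, -1, -1):
        if tokens[i] in SENTENCE_END:
            return i + 1
    return 0

# Three short placeholder sentences, one character per token.
tokens = list("甲乙丙。丁戊己庚。辛壬癸。")

end = boundary_forward(tokens, chunk_size=6)      # target size 6, stretched to the sentence end
next_start = boundary_backward(tokens, end - 2)   # target overlap 2, widened to a sentence start

first_chunk = "".join(tokens[:end])
overlap_text = "".join(tokens[next_start:end])
print(first_chunk)   # 甲乙丙。丁戊己庚。
print(overlap_text)  # 丁戊己庚。
```

The requested chunk size of 6 grows to 9 tokens and the requested overlap of 2 grows to 5, which is the "floating" behaviour described in the comments on `chunk_text`.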
import os
from langchain.prompts import PromptTemplate
from langchain.chains import SequentialChain, LLMChain
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
# Set the environment variables; AI_STUDIO_API_KEY is available from your personal account.
os.environ["AI_STUDIO_API_KEY"] = "XXX"  # replace with your own API key
os.environ["MODEL_URL"] = "https://aistudio.baidu.com/llm/lmapi/v3"
os.environ["DeepSeek_MODEL"] = "deepseek-r1"
os.environ["ERNIE_MODEL"] = "ERNIE-4.5-8K-preview"
# Configure the LLM
llm = ChatOpenAI(
    base_url=os.environ.get("MODEL_URL"),
    api_key=os.environ.get("AI_STUDIO_API_KEY"),
    model=os.environ.get("ERNIE_MODEL"),
    max_tokens=2048,
)
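# This prompt is adapted from Microsoft's GraphRAG.
system_template="""
-Goal-
Given a relevant text document and a list of entity types, identify all entities of those types in the text and all relationships among the identified entities.
-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: name of the entity
- entity_type: one of the following types: [{entity_types}]
- entity_description: comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)
2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_type: type of the relationship; keep relationship types consistent and general, preferring generic, tense-free types
- relationship_description: explanation of why you think the source entity and the target entity are related
- relationship_strength: a numeric score indicating the strength of the relationship between the source entity and the target entity
Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_type>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)
3. Output all attributes of entities and relationships in Chinese. Return all entities and relationships identified in steps 1 and 2 as a single list, using **{record_delimiter}** as the list delimiter.
4. When finished, output {completion_delimiter}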
######################
-示例-
######################
Example 1:
Entity_types: [person, technology, mission, organization, location]
Text:
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.
Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. “If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us.”
The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.
It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
################
Output:
("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is a character who experiences frustration and is observant of the dynamics among other characters."){record_delimiter}
("entity"{tuple_delimiter}"Taylor"{tuple_delimiter}"person"{tuple_delimiter}"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective."){record_delimiter}
("entity"{tuple_delimiter}"Jordan"{tuple_delimiter}"person"{tuple_delimiter}"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device."){record_delimiter}
("entity"{tuple_delimiter}"Cruz"{tuple_delimiter}"person"{tuple_delimiter}"Cruz is associated with a vision of control and order, influencing the dynamics among other characters."){record_delimiter}
("entity"{tuple_delimiter}"The Device"{tuple_delimiter}"technology"{tuple_delimiter}"The Device is central to the story, with potential game-changing implications, and is revered by Taylor."){record_delimiter}
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Taylor"{tuple_delimiter}"workmate"{tuple_delimiter}"Alex is affected by Taylor's authoritarian certainty and observes changes in Taylor's attitude towards the device."{tuple_delimiter}7){record_delimiter}
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Jordan"{tuple_delimiter}"workmate"{tuple_delimiter}"Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision."{tuple_delimiter}6){record_delimiter}
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"Jordan"{tuple_delimiter}"workmate"{tuple_delimiter}"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce."{tuple_delimiter}8){record_delimiter}
("relationship"{tuple_delimiter}"Jordan"{tuple_delimiter}"Cruz"{tuple_delimiter}"workmate"{tuple_delimiter}"Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order."{tuple_delimiter}5){record_delimiter}
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"The Device"{tuple_delimiter}"study"{tuple_delimiter}"Taylor shows reverence towards the device, indicating its importance and potential impact."{tuple_delimiter}9){completion_delimiter}
"""
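from langchain_core.messages import HumanMessage, SystemMessage
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
# The source does not show how the two message prompts are built. The definitions
# below are an assumed reconstruction: the human template only needs to carry the
# {input_text} variable that is passed to chain.invoke() later on.
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)
human_template = "Entity_types: {entity_types}\nText: {input_text}\nOutput:"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, MessagesPlaceholder("chat_history"), human_message_prompt]
)
chain = chat_prompt | llm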
tuple_delimiter = " : "
record_delimiter = "\n"
completion_delimiter = "\n\n"
entity_types = ["人物", "职位", "兵器", "战役", "地点"]
chat_history = []
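One detail worth checking: `entity_types` is a Python list, so substituting it into `[{entity_types}]` inserts the list's printed form, brackets and quotes included. The sketch below uses plain `str.format` to show the difference from a comma-joined string. It assumes that LangChain's default f-string style templates substitute values in the same way, which should be verified against the installed version; `line` is a stand-in for the corresponding prompt line.

```python
entity_types = ["人物", "职位", "兵器", "战役", "地点"]

# Stand-in for the prompt line that lists the allowed entity types.
line = "entity_type: [{entity_types}]"

# Passing the list itself inserts its repr, giving doubled brackets and quotes.
as_list = line.format(entity_types=entity_types)

# Joining first gives a plain comma-separated list.
as_text = line.format(entity_types=", ".join(entity_types))

print(as_list)  # entity_type: [['人物', '职位', '兵器', '战役', '地点']]
print(as_text)  # entity_type: [人物, 职位, 兵器, 战役, 地点]
```

Models usually cope with either form, so this is a tidiness choice rather than a required fix.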
import time
# `chunks` is assumed to come from chunk_text() applied to the novel's text (loading is not shown).
input_text = ''.join(chunks[0])
print("原文如下>>> \n")  # prints the heading "Source text >>>"
print(input_text)
t0 = time.time()
answer = chain.invoke({
    "chat_history": chat_history,
    "entity_types": entity_types,
    "tuple_delimiter": tuple_delimiter,
    "record_delimiter": record_delimiter,
    "completion_delimiter": completion_delimiter,
    "input_text": input_text
})
t1 = time.time()
print("该模型耗时:", t1 - t0, "秒")  # prints the elapsed model time in seconds
print("\n")
print(answer.content)
print("\n")
原文如下>>>
却说孙坚被刘表围住,亏得程普、黄盖、韩当三将死救得脱,折兵大半,夺路引兵回江东。自此孙坚与刘表结怨。且说袁绍屯兵河内,缺少粮草。冀州牧韩馥,遣人送粮以资军用。谋士逢纪说绍曰:“大丈夫纵横天下,何待人送粮为食!冀州乃钱粮广盛之地,将军何不取之?”绍曰:“未有良策。”纪曰:“可暗使人驰书与公孙瓒,令进兵取冀州,约以夹攻,瓒必兴兵。韩馥无谋之辈,必请将军领州事;就中取事,唾手可得。”绍大喜,即发书到瓒处。瓒得书,见说共攻冀州,平分其地,大喜,即日兴兵。绍却使人密报韩馥。馥慌聚荀谌、辛评二谋士商议。谌曰:“公孙瓒将燕、代之众,长驱而来,其锋不可当。兼有刘备、关、张助之,难以抵敌。今袁本初智勇过人,手下名将极广,将军可请彼同治州事,彼必厚待将军,无患公孙瓒矣。”韩馥即差别驾关纯去请袁绍。长史耿武谏曰:“袁绍孤客穷军,仰我鼻息,譬如婴儿在股掌之上,绝其乳哺,立可饿死。奈何欲以州事委之?此引虎入羊群也。”馥曰:“吾乃袁氏之故吏,才能又不如本初。古者择贤者而让之,诸君何嫉妒耶?”耿武叹曰:“冀州休矣!”于是弃职而去者三十余人。独耿武与关纯伏于城外,以待袁绍。数日后,绍引兵至。耿武、关纯拔刀而出,欲刺杀绍。绍将颜良立斩耿武,文丑砍死关纯。绍入冀州,以馥为奋威将军,以田丰、沮授、许攸、逢纪分掌州事,尽夺韩馥之权。馥懊悔无及,遂弃下家小,匹马往投陈留太守张邈去了。却说公孙瓒知袁绍已据冀州,遣弟公孙越来见绍,欲分其地。绍曰:“可请汝兄自来,吾有商议。”越辞归。行不到五十里,道旁闪出一彪军马,口称:“我乃董丞相家将也!”乱箭射死公孙越。从人逃回见公孙瓒,报越已死。瓒大怒曰:“袁绍诱我起兵攻韩馥,他却就里取事;今又诈董卓兵射死吾弟,此冤如何不报!”尽起本部兵,杀奔冀州来。绍知瓒兵至,亦领军出。二军会于磐河之上:绍军于磐河桥东,瓒军于桥西。瓒立马桥上,大呼曰:“背义之徒,何敢卖我!
该模型耗时: 122.43553447723389 秒
**
("entity" : "孙坚" : "人物" : "孙坚被刘表围住,后得程普、黄盖、韩当三将死救得脱,自此与刘表结怨。")
("entity" : "刘表" : "人物" : "刘表围住孙坚,与孙坚结怨。")
("entity" : "程普" : "人物" : "程普是孙坚的部将,参与救援孙坚。")
("entity" : "黄盖" : "人物" : "黄盖是孙坚的部将,参与救援孙坚。")
("entity" : "韩当" : "人物" : "韩当是孙坚的部将,参与救援孙坚。")
("entity" : "江东" : "地点" : "孙坚被刘表围住后,夺路引兵回江东。")
("entity" : "袁绍" : "人物" : "袁绍屯兵河内,缺少粮草,后听取逢纪建议,图谋冀州。")
("entity" : "河内" : "地点" : "袁绍屯兵之处。")
("entity" : "冀州" : "地点" : "冀州牧韩馥所在地,袁绍图谋之地。")
("entity" : "韩馥" : "人物" : "冀州牧,遣人送粮以资袁绍军用,后被袁绍用计夺取冀州。")
("entity" : "逢纪" : "人物" : "袁绍的谋士,建议袁绍图谋冀州。")
("entity" : "公孙瓒" : "人物" : "被袁绍暗使人驰书约其进兵取冀州,后知袁绍已据冀州,遣弟公孙越来见绍欲分其地。")
("entity" : "荀谌" : "人物" : "韩馥的谋士,建议韩馥请袁绍同治州事。")
("entity" : "辛评" : "人物" : "韩馥的谋士,与荀谌一同商议应对公孙瓒之策。")
("entity" : "刘备" : "人物" : "助公孙瓒攻冀州。")
("entity" : "关羽" : "人物" : "助公孙瓒攻冀州,与刘备、张飞一同。")
("entity" : "张飞" : "人物" : "助公孙瓒攻冀州,与刘备、关羽一同。")
("entity" : "关纯" : "人物" : "韩馥差别驾去请袁绍,后被文丑砍死。")
("entity" : "耿武" : "人物" : "韩馥的长史,谏阻韩馥请袁绍同治州事,后被颜良立斩。")
("entity" : "颜良" : "人物" : "袁绍的将领,立斩耿武。")
("entity" : "文丑" : "人物" : "袁绍的将领,砍死关纯。")
("entity" : "奋威将军" : "职位" : "韩馥被袁绍入冀州后所任的职位。")
("entity" : "田丰" : "人物" : "袁绍入冀州后,分掌州事之一。")
("entity" : "沮授" : "人物" : "袁绍入冀州后,分掌州事之一。")
("entity" : "许攸" : "人物" : "袁绍入冀州后,分掌州事之一。")
("entity" : "陈留" : "地点" : "韩馥弃下家小后往投陈留太守张邈之处。")
("entity" : "张邈" : "人物" : "陈留太守,韩馥往投之人。")
("entity" : "公孙越" : "人物" : "公孙瓒之弟,被董丞相家将乱箭射死。")
("entity" : "董丞相" : "人物" : "家将射死公孙越,未明确指出具体身份,但可推断为当时有权势的丞相,如董卓。")
("entity" : "磐河" : "地点" : "公孙瓒与袁绍二军会战之处。")
**
("relationship" : "孙坚" : "刘表" : "敌对" : "孙坚被刘表围住,后结怨。" : 8)
("relationship" : "孙坚" : "程普" : "部将" : "程普是孙坚的部将,参与救援孙坚。" : 9)
("relationship" : "孙坚" : "黄盖" : "部将" : "黄盖是孙坚的部将,参与救援孙坚。" : 9)
("relationship" : "孙坚" : "韩当" : "部将" : "韩当是孙坚的部将,参与救援孙坚。" : 9)
("relationship" : "孙坚" : "江东" : "撤退至" : "孙坚被刘表围住后,夺路引兵回江东。" : 7)
("relationship" : "袁绍" : "逢纪" : "谋士" : "逢纪是袁绍的谋士,建议袁绍图谋冀州。" : 8)
("relationship" : "袁绍" : "冀州" : "图谋" : "袁绍听取逢纪建议,图谋冀州。" : 8)
("relationship" : "袁绍" : "韩馥" : "敌对-利用" : "袁绍用计夺取韩馥的冀州。" : 7)
("relationship" : "韩馥" : "荀谌" : "谋士" : "荀谌是韩馥的谋士,建议韩馥请袁绍同治州事。" : 7)
("relationship" : "韩馥" : "辛评" : "谋士" : "辛评是韩馥的谋士,与荀谌一同商议应对公孙瓒之策。" : 7)
("relationship" : "公孙瓒" : "刘备" : "盟友" : "刘备助公孙瓒攻冀州。" : 6)
("relationship" : "公孙瓒" : "关羽" : "盟友" : "关羽助公孙瓒攻冀州。" : 6)
("relationship" : "公孙瓒" : "张飞" : "盟友" : "张飞助公孙瓒攻冀州。" : 6)
("relationship" : "韩馥" : "关纯" : "派遣" : "韩馥差别驾关纯去请袁绍。" : 7)
("relationship" : "韩馥" : "耿武" : "部属" : "耿武是韩馥的长史,谏阻韩馥请袁绍同治州事。" : 7)
("relationship" : "耿武" : "颜良" : "敌对" : "颜良立斩耿武。" : 9)
("relationship" : "关纯" : "文丑" : "敌对" : "文丑砍死关纯。" : 9)
("relationship" : "袁绍" : "奋威将军" : "任命" : "袁绍入冀州后,任命韩馥为奋威将军。" : 7)
("relationship" : "袁绍" : "田丰" : "分掌州事" : "袁绍入冀州后,田丰分掌州事之一。" : 7)
("relationship" : "袁绍" : "沮授" : "分掌州事" : "袁绍入冀州后,沮授分掌州事之一。" : 7)
("relationship" : "袁绍" : "许攸" : "分掌州事" : "袁绍入冀州后,许攸分掌州事之一。" : 7)
("relationship" : "韩馥" : "张邈" : "投奔" : "韩馥弃下家小后往投陈留太守张邈。" : 6)
("relationship" : "公孙瓒" : "公孙越" : "兄弟" : "公孙越是公孙瓒之弟。" : 9)
("relationship" : "公孙越" : "董丞相" : "敌对" : "公孙越被董丞相家将乱箭射死。" : 8)
("relationship" : "公孙瓒" : "袁绍" : "敌对" : "公孙瓒与袁绍因冀州问题敌对。" : 8)
("relationship" : "公孙瓒" : "磐河" : "会战" : "公孙瓒与袁绍二军会战于磐河之上。" : 7)
("relationship" : "袁绍" : "磐河" : "会战" : "公孙瓒与袁绍二军会战于磐河之上。" : 7)
**
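The parsing code that follows iterates over a list named `results`, which the code shown so far never builds. One plausible way to produce it, sketched here under assumptions, is to run the extraction over every chunk and keep the text of each reply. `extract_all`, `invoke_fn`, and the retry settings are hypothetical names and values, not part of the article; `invoke_fn` stands in for a small wrapper around `chain.invoke(...)` that returns `answer.content`.

```python
import time

def extract_all(chunks, invoke_fn, max_retries=2, pause=0.0):
    """Call invoke_fn on the joined text of each chunk and collect the replies.

    chunks: list of token lists, as returned by chunk_text().
    invoke_fn: hypothetical callable that takes the chunk text and returns the reply string.
    """
    results = []
    for tokens in chunks:
        text = "".join(tokens)
        for attempt in range(max_retries + 1):
            try:
                results.append(invoke_fn(text))
                break
            except Exception:
                if attempt == max_retries:
                    results.append("")  # give up, but keep positions aligned with chunks
                else:
                    time.sleep(pause)   # wait before retrying
    return results

# Stand-in for the real model call, used only to exercise the loop.
def fake_invoke(text):
    return f'("relationship" : "A" : "B" : "test" : "{len(text)} chars" : 5)'

demo_results = extract_all([["却说", "孙坚"], ["袁绍", "屯兵", "河内"]], fake_invoke)
print(len(demo_results))  # 2
```

Given that the single call above took about two minutes, a full run over the book would be slow, so writing each reply to disk as it arrives may be worth adding.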
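import re
import pandas as pd
# Extract the structured records with a regular expression.
pattern = r'''
\(                           # opening parenthesis
"relationship"\s*:\s*        # fixed prefix
"([^"]+)"\s*:\s*             # group 1: source (any characters except a double quote)
"([^"]+)"\s*:\s*             # group 2: target
"([^"]+)"\s*:\s*             # group 3: type
"((?:[^"]|\\")*)"\s*:\s*     # group 4: description (escaped quotes allowed)
(\d+)                        # group 5: weight (an integer)
\)                           # closing parenthesis
'''
result_matches = []
# `results` is assumed to be the list of model replies (answer.content), one per chunk.
for text in results:
    # Find all matches; re.VERBOSE lets the pattern contain whitespace and comments.
    matches = re.findall(pattern, text, re.VERBOSE)
    result_matches.extend(matches)
df = pd.DataFrame(result_matches, columns=['source', 'target', 'type', 'description', 'weight'])
df['weight'] = df['weight'].astype(int)  # regex groups are strings; store the strength as a number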
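The pattern above keeps only the relationship records. Entity records follow the same layout with three fields, so a similar pattern can capture them, for example to attach a type to each node later. This is a sketch: `ENTITY_PATTERN` and `parse_entities` are illustrative names, and it assumes the ` : ` tuple delimiter configured earlier.

```python
import re

# ("entity" : "<name>" : "<type>" : "<description>")
ENTITY_PATTERN = re.compile(
    r'\("entity"\s*:\s*"([^"]+)"\s*:\s*"([^"]+)"\s*:\s*"([^"]*)"\)'
)

def parse_entities(text):
    """Return one dict per entity record found in a model reply."""
    return [
        {"name": name, "type": etype, "description": desc}
        for name, etype, desc in ENTITY_PATTERN.findall(text)
    ]

# Three records copied from the output shown earlier; the last is a relationship.
sample = '''("entity" : "孙坚" : "人物" : "孙坚被刘表围住,后得程普、黄盖、韩当三将死救得脱,自此与刘表结怨。")
("entity" : "江东" : "地点" : "孙坚被刘表围住后,夺路引兵回江东。")
("relationship" : "孙坚" : "刘表" : "敌对" : "孙坚被刘表围住,后结怨。" : 8)'''

entities = parse_entities(sample)
print([(e["name"], e["type"]) for e in entities])  # [('孙坚', '人物'), ('江东', '地点')]
```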
import networkx as nx
import matplotlib.pyplot as plt
# Create a knowledge graph
G = nx.Graph()
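for _, row in df.iterrows():
    G.add_edge(row['source'], row['target'], label=row['type'], weight=row['weight'])
# Draw the nodes (entities), the edges (relations), and their labels.
# Chinese labels render correctly only if matplotlib is configured with a CJK-capable font.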
pos = nx.spring_layout(G, seed=42, k=0.9)
labels = nx.get_edge_attributes(G, 'label')
plt.figure(figsize=(20, 10))
nx.draw(G, pos, with_labels=True, font_size=10, node_size=700, node_color='lightblue', edge_color='gray', alpha=0.6)
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels, font_size=8, label_pos=0.3, verticalalignment='baseline')
plt.title('Relation for SanGuo')
plt.show()
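Because neighbouring chunks overlap, the same relation can be extracted more than once when the whole book is processed, and `G.add_edge` on an `nx.Graph` overwrites the attributes of an existing edge with those from the latest call. One possible pre-processing step, sketched below, merges duplicates and keeps the highest strength. `merge_relations` is a hypothetical helper; treating a pair as unordered and keeping the maximum are assumptions, not choices made in the article.

```python
def merge_relations(rows):
    """Merge duplicate relations, keeping the record with the highest weight.

    rows: iterable of (source, target, type, description, weight) tuples,
    in the same column order as the DataFrame built above.
    """
    merged = {}
    for source, target, rel_type, description, weight in rows:
        # The graph is undirected, so (A, B) and (B, A) count as the same pair.
        key = (frozenset((source, target)), rel_type)
        weight = int(weight)
        if key not in merged or weight > merged[key][4]:
            merged[key] = (source, target, rel_type, description, weight)
    return list(merged.values())

rows = [
    ("公孙瓒", "袁绍", "敌对", "公孙瓒与袁绍因冀州问题敌对。", "8"),
    # Made-up duplicate, as might come from an overlapping chunk.
    ("袁绍", "公孙瓒", "敌对", "袁绍与公孙瓒会战于磐河。", "9"),
    ("孙坚", "刘表", "敌对", "孙坚被刘表围住,后结怨。", "8"),
]

merged_rows = merge_relations(rows)
print(len(merged_rows))  # 2
```

Summing or averaging the weights would be equally defensible; the maximum is used here only because it is the simplest rule to reason about.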