微信扫码
添加专属顾问
我要投稿
pip install -U optimum[neural-compressor] intel-extension-for-transformers
def quantize(model_name: str, output_path: str, calibration_set: "datasets.Dataset"):
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
def preprocess_function(examples):
return tokenizer(examples["text"], padding="max_length", max_length=512, truncation=True)
vectorized_ds = calibration_set.map(preprocess_function, num_proc=10)
vectorized_ds = vectorized_ds.remove_columns(["text"])
quantizer = INCQuantizer.from_pretrained(model)
quantization_config = PostTrainingQuantConfig(approach="static", backend="ipex", domain="nlp")
quantizer.quantize(
quantization_config=quantization_config,
calibration_dataset=vectorized_ds,
save_directory=output_path,
batch_size=1,
)
tokenizer.save_pretrained(output_path)
# 数据集地址https://huggingface.co/datasets/allenai/qasper
from optimum.intel import IPEXModelmodel = IPEXModel.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")
inputs = tokenizer(sentences, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
# get the [CLS] token
embeddings = outputs[0][:, 0]
从上面的结果可以看出,通过量化后模型的延迟和吞吐量都有大幅提升。大家是不是学会的呢。下篇我们继续介绍一个相关工具,辅助我们高效管理RAG流程。
53AI,企业落地大模型首选服务商
产品:场景落地咨询+大模型应用平台+行业解决方案
承诺:免费场景POC验证,效果验证后签署服务协议。零风险落地应用大模型,已交付160+中大型企业
2025-04-30
聊聊AI智能体框架MetaGPT下的RAG实践
2025-04-30
如何用大模型+RAG给宠物做一个AI健康助手(干货分享)?
2025-04-30
HiRAG:基于层级知识索引和检索的高精度RAG
2025-04-29
教程|通义Qwen 3 +Milvus,混合推理模型才是优化RAG成本的最佳范式
2025-04-29
RAG开发框架LangChain与LlamaIndex对比解析:谁更适合你的AI应用?
2025-04-29
RAG性能暴增20%!清华等推出“以笔记为中心”的深度检索增强生成框架,复杂问答效果飙升
2025-04-29
超神了,ChatWiki 支持GraphRAG,让 AI 具备垂直深度推理能力!
2025-04-29
AI 产品思维:我如何把一个 AI 应用从基础 RAG 升级到 multi-agent 架构
2024-10-27
2024-09-04
2024-07-18
2024-05-05
2024-06-20
2024-06-13
2024-07-09
2024-07-09
2024-05-19
2024-07-07
2025-04-30
2025-04-29
2025-04-29
2025-04-26
2025-04-25
2025-04-22
2025-04-22
2025-04-20