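A practical guide to deploying a large language model, using Qwen as the example, with a step-by-step walkthrough of configuration on Windows.
Key contents:
1. Laptop hardware and system requirements
2. Conda environment setup and Python dependency installation
3. Common errors and how to fix them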
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
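The banner above is what `nvcc --version` prints, and the part that matters when picking a PyTorch build is the `release X.Y` fragment. As a minimal sketch, the helper below (the name `parse_cuda_release` is hypothetical, not part of any toolkit) pulls that version out of the text so it can be compared with the CUDA version requested when installing PyTorch.

```python
import re


def parse_cuda_release(nvcc_output: str) -> tuple:
    """Return (major, minor) from the 'release X.Y' fragment of nvcc's banner."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' fragment found in the nvcc output")
    return int(match.group(1)), int(match.group(2))


# Sample line copied from the banner shown above.
sample = "Cuda compilation tools, release 12.6, V12.6.77"
print(parse_cuda_release(sample))  # (12, 6)
```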
https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Windows-x86_64.exe
conda create -n qwen python=3.12
pip install python-multipart
pip install uvicorn
pip install fastapi
pip install transformers
pip install torch
pip install 'accelerate>=0.26.0'
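After the installs it can help to confirm that every package actually landed in the active `qwen` environment. The following is a small sketch using only the standard library; the `REQUIRED` list simply mirrors the `pip install` lines above, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
REQUIRED = ["python-multipart", "uvicorn", "fastapi", "transformers", "torch", "accelerate"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` usually means the command was run outside the activated environment.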
CondaError: Run 'conda init' before 'conda activate'
source activate
conda deactivate
$ ls
main.py  main_test.py  model/  test.py
(qwen)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
import torch

device = torch.device('cuda:0')
print(torch.cuda.is_available())

if __name__ == "__main__":
    print(torch.cuda.is_available())
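The check above only prints `True` or `False`. A slightly more defensive variant, sketched below, falls back to the CPU when no GPU is visible or when PyTorch is missing from the environment. The function name `pick_device` is hypothetical; `torch.cuda.is_available()` is the same call used above.

```python
def pick_device() -> str:
    """Return 'cuda:0' when PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or a driver that is older than the CUDA version the build expects.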
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir Qwen2.5-0.5B-Instruct
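`export` only works in a Bash-style shell such as Git Bash; in `cmd` or PowerShell the variable has to be set differently. One shell-independent option is to do the download from Python instead. The sketch below assumes `huggingface_hub` is installed and, to my understanding, relies on the library reading `HF_ENDPOINT` when it is first imported, which is why the variable is set beforehand. `download_model` is a hypothetical wrapper around the library's `snapshot_download`.

```python
import os

# Set the mirror before huggingface_hub is imported so the library picks it up.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"


def download_model(repo_id: str = "Qwen/Qwen2.5-0.5B-Instruct",
                   local_dir: str = "Qwen2.5-0.5B-Instruct") -> str:
    """Download a full model snapshot into local_dir and return the local path."""
    # Imported here, after HF_ENDPOINT has been set above.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Usage (requires network access):
# print(download_model())
```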
from typing import List

import torch
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# FastAPI application
app = FastAPI()


# Request body schema
class Message(BaseModel):
    role: str
    content: str


class RequestBody(BaseModel):
    model: str
    messages: List[Message]
    max_tokens: int = 100


# Local model path
local_model_path = "model/Qwen2.5-0.5B-Instruct"

# When a local path is given, the model is loaded from it; otherwise it is downloaded online
model = AutoModelForCausalLM.from_pretrained(
    local_model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(local_model_path)


# API route that generates text
@app.post("/v1/chat/completions")
async def generate_chat_response(request: RequestBody):
    # Extract the model name and messages from the request
    model_name = request.model
    messages = request.messages
    max_tokens = request.max_tokens
    print(request.model)

    # Build the prompt from the OpenAI-style messages,
    # using attribute access on the Message objects
    combined_message = "\n".join(
        [f"{message.role}: {message.content}" for message in messages]
    )

    # Convert the combined string into model input tensors
    inputs = tokenizer(
        combined_message, return_tensors="pt", padding=True, truncation=True
    ).to(model.device)

    try:
        # Generate the model output
        generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)

        # Decode the output
        response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

        # Format the response in OpenAI style
        completion_response = {
            "id": "some-id",  # generate a unique ID here if needed
            "object": "text_completion",
            "created": 1678157176,  # timestamp (replace as needed)
            "model": model_name,
            "choices": [
                {
                    "message": {"role": "assistant", "content": response},
                    "finish_reason": "stop",
                    "index": 0,
                }
            ],
        }
        return completion_response
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


# Start the FastAPI application
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
python x.py
$ python main.py
INFO: Started server process [20488]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
curl -X 'POST' 'http://127.0.0.1:8000/v1/chat/completions' -H'Content-Type: application/json' -d'{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"system","content":"You are a crazy man."},{"role":"user","content":"can you tell me1+1=?"}],"max_tokens":100}'
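Quoting in the `curl` command above is easy to get wrong outside a Bash-style shell, so the same request can also be built from Python. This is a minimal sketch using only the standard library; `build_chat_request` is a hypothetical helper, and the URL, model name, and messages are the ones from the `curl` call.

```python
import json
import urllib.request


def build_chat_request(url, model, messages, max_tokens=100):
    """Build a POST request carrying the JSON body the endpoint expects."""
    body = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://127.0.0.1:8000/v1/chat/completions",
    "Qwen/Qwen2.5-0.5B-Instruct",
    [
        {"role": "system", "content": "You are a crazy man."},
        {"role": "user", "content": "can you tell me 1+1=?"},
    ],
)

# Sending the request requires the server to be running:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```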
{"id":"some-id","object":"text_completion","created":1678157176,"model":"Qwen/Qwen2.5-0.5B-Instruct","choices":[{"message":{"role":"assistant","content":"system: You are a crazy man.\nuser: can you tell me 1+1=? \nalgorithm:\n1.Create an empty string variable called sum\n2. Add the first number to thesum\n3. Repeat step 2 until there is no more numbers left in the list\n4.Print out the value of the sum variable\n\nPlease provide the Python code forthis algorithm.\n\nSure! Here's the Python code that performs the additionoperation as described:\n\n```python\n# Initialize the sum with the firstnumber\nsum = \"1\"\n\n# Loop until there are no morenumbers"},"finish_reason":"stop","index":0}]}
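The reply above starts by echoing `system: ...` and `user: ...` and then drifts off topic. Two things in the server code plausibly explain this: the messages are joined as plain `role: content` text rather than in the chat layout the Instruct model was trained on, and `generate()` returns the prompt tokens followed by the new tokens, all of which get decoded. The sketch below illustrates both ideas in plain Python. It assumes Qwen2.5 uses a ChatML-style layout, which is my understanding rather than something stated in this article; in a real server, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` would be the safer way to build the prompt. Both helper names are hypothetical.

```python
def build_chatml_prompt(messages):
    """Render messages in a ChatML-style layout and open an assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def strip_prompt(generated_ids, prompt_length):
    """Drop the echoed prompt tokens that precede the newly generated ones."""
    return generated_ids[prompt_length:]


prompt = build_chatml_prompt([{"role": "user", "content": "can you tell me 1+1=?"}])
print(prompt)

# With a 3-token prompt, only the last two ids are new output.
print(strip_prompt([101, 102, 103, 7, 8], 3))  # [7, 8]
```

In the server, the equivalent of `strip_prompt` would be slicing `generated_ids[0]` from `inputs["input_ids"].shape[1]` onward before decoding.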