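A case study in large-model deployment: DeepSeek-R1-671B running successfully. Key points: 1. Successful deployment of DeepSeek-R1-671B across four servers 2. Details of the consulting and deployment services on offer 3. Ongoing optimization of the deployment process, with output from the working deployment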
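I have successfully deployed the full-size DeepSeek-R1-671B on 4 servers. A brief overview follows. I am now taking consulting, guidance, and deployment orders. The deployment process is still being optimized and refined, and everyone is welcome to learn together. The output below was captured after the deployment succeeded.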
(self-llm) deepseek@deepseek2:~$ curl http://10.119.85.138:8000/metrics
...
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 37427.0
python_gc_objects_collected_total{generation="1"} 14232.0
python_gc_objects_collected_total{generation="2"} 16818.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 3033.0
python_gc_collections_total{generation="1"} 267.0
python_gc_collections_total{generation="2"} 315.0
...
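The /metrics endpoint returns plain text in the Prometheus exposition format, so the counters can be read by a script instead of by eye. What follows is a minimal Python sketch and not part of the original deployment: the helper names `parse_metrics` and `fetch_metrics` are hypothetical, and the parser only handles simple `name{labels} value` lines like the ones above.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: name{label="value"} 123.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return {(metric_name, sorted_label_tuple): float_value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if m is None:  # ignore anything that is not a sample line
            continue
        labels = tuple(sorted(LABEL_RE.findall(m.group("labels") or "")))
        samples[(m.group("name"), labels)] = float(m.group("value"))
    return samples


def fetch_metrics(url):
    """Download the metrics page (hypothetical helper; pass your own server URL)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    sample = (
        "# TYPE python_gc_collections_total counter\n"
        'python_gc_collections_total{generation="0"} 3033.0\n'
        'python_gc_collections_total{generation="1"} 267.0\n'
    )
    parsed = parse_metrics(sample)
    print(parsed[("python_gc_collections_total", (("generation", "0"),))])
```

Calling `parse_metrics(fetch_metrics("http://10.119.85.138:8000/metrics"))` would return the samples from the server used here, assuming that address is reachable from wherever the script runs.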
# 10.119.85.138 is the IP address of the IB (InfiniBand) NIC on node deepseek2
(self-llm) deepseek@deepseek2:~$ curl 10.119.85.138:8000/v1/models -H "Authorization: Bearer zY0MrQwXV9Oo3g==" | jq
# Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   523  100   523    0     0   105k      0 --:--:-- --:--:-- --:--:--  127k
{
"object": "list",
"data": [
{
"id": "DeepSeek-R1-671B",
"object": "model",
"created": 1740405511,
"owned_by": "vllm",
"root": "/root/.cache/huggingface/hub/models/unsloth/DeepSeek-R1-BF16/",
"parent": null,
"max_model_len": 32768,
"permission": [
{
"id": "modelperm-ced685e8156b4618b593580109205165",
"object": "model_permission",
"created": 1740405511,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
]
}
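The same check can be scripted. The sketch below uses only the Python standard library to call the OpenAI-compatible /v1/models route and pull out the model id and context length seen in the JSON above. The function names and the `base_url` / `api_key` parameters are hypothetical placeholders, not something taken from the original setup.

```python
import json
from urllib.request import Request, urlopen


def summarize_models(payload):
    """Extract (id, max_model_len) pairs from a /v1/models response body."""
    return [(m["id"], m.get("max_model_len")) for m in payload.get("data", [])]


def list_models(base_url, api_key):
    """Query the model list; base_url and api_key are supplied by the caller."""
    req = Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(req, timeout=10) as resp:
        return summarize_models(json.load(resp))


if __name__ == "__main__":
    # Offline demo on a trimmed copy of the response shown above.
    demo = {"object": "list",
            "data": [{"id": "DeepSeek-R1-671B", "max_model_len": 32768}]}
    print(summarize_models(demo))
```

`max_model_len` is worth checking before sending long prompts, since it is the context limit the server reports for this model (32768 tokens here).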
Meanwhile, the window where the vllm serve command is running shows the corresponding request logs.
(self-llm) deepseek@deepseek2:~$ curl -X POST "http://10.119.85.138:8000/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer zY0MrQwXV9Oo3g==" -d '{ "model": "DeepSeek-R1-671B", "messages": [{"role": "user", "content": "你好"}]}'
(self-llm) deepseek@deepseek2:~$ curl -X POST "http://10.119.85.138:8000/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer zY0MrQwXV9Oo3g==" -d '{ "model": "DeepSeek-R1-671B", "messages": [{"role": "user", "content": "请证明勾股定理"}]}'
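# Response (the two prompts above are "你好", i.e. "Hello", and "请证明勾股定理", i.e. "Please prove the Pythagorean theorem"; the model's Chinese answer is kept verbatim and truncated)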
{"id":"chatcmpl-11ae1ddf321343af848b5c683e67b72d","object":"chat.completion","created":1740411348,"model":"deepseek-r1","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"<think>\n嗯,用户让我证明勾股定理。勾股定理是数学里非常基础但又重要的定理,肯定有很多不同的证明方法。先回忆一下,勾股定理是说在直角三角形中,斜边的平方等于两条直角边的平方和,也就是a² + b² = c²。现在得选一
种合适的证明方式,可能是几何的或者代数的。\n\n首先想到的是几何证明中的拼接法,通过将四个直角三角形拼成一个大的正方形,然后比较面积。我要不要试试这个方法?比如说,四个全等的直角三角形,设它们的直角边为a和b,
斜边为c,拼起来的话中间应该会形成一个边长为(a+b)的正方形,中间的空隙可能是小正方形,边长是c或者别的?\n\n不对,应该会形成一个边长为c的正方形,或者这个?等一下,可能得仔细画个图想象一下。假设我们把四个三角
形每个的直角边朝外,那斜边就会组成里面的一个正方形,这时候
...
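In the response above, `reasoning_content` is null and the chain of thought arrives inside `content`, wrapped in a `<think>` tag. A client that wants to show only the final answer has to separate the two itself. Below is a minimal sketch; it assumes the reasoning is closed with `</think>`, which the truncated output above does not show, and `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(content):
    """Split '<think>reasoning</think>answer' into (reasoning, answer).

    Assumes the reasoning block, if present, comes first and is closed
    with '</think>'. Returns ('', content) when there is no block.
    """
    open_tag, close_tag = "<think>", "</think>"
    text = content.lstrip()
    if not text.startswith(open_tag):
        return "", content.strip()
    body = text[len(open_tag):]
    reasoning, sep, answer = body.partition(close_tag)
    if not sep:
        # No closing tag: the reasoning was cut off, so there is no answer yet.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    reasoning, answer = split_reasoning("<think>\nstep 1\n</think>\n\nfinal answer")
    print(repr(reasoning), repr(answer))
```

The function would be applied to `choices[0]["message"]["content"]` of the JSON returned by /v1/chat/completions.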
# While the question is being answered, the window running vllm serve prints the average token generation throughput:
INFO 02-24 17:21:12 metrics.py:455] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:17 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 37.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:22 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
...
# Aggregate throughput is even higher with several concurrent requests (3 running here)
INFO 02-24 23:32:00 metrics.py:455] Avg prompt throughput: 442.9 tokens/s, Avg generation throughput: 38.8 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:05 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 102.4 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:07 async_llm_engine.py:179] Finished request chatcmpl-03add50cba264c84afe98fd6cce9907f.
INFO 02-24 23:32:10 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 79.4 tokens/s, Running: 2 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
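These log lines have a regular shape, so throughput can be extracted with a regular expression. The sketch below is only an illustration: the pattern is written against the lines shown above and may need adjusting for other vLLM versions, and the function names are hypothetical.

```python
import re

# Written against the metrics.py log lines shown above.
LOG_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs"
)


def parse_throughput(lines):
    """Return (prompt_tps, generation_tps, running_requests) per matching line."""
    rows = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:  # lines such as "Finished request ..." are skipped
            rows.append((float(m["prompt"]), float(m["gen"]), int(m["running"])))
    return rows


def per_request_generation(rows):
    """Generation tokens/s divided by running requests, skipping idle samples."""
    return [gen / running for _, gen, running in rows if running > 0]


if __name__ == "__main__":
    log = [
        "INFO 02-24 23:32:05 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, "
        "Avg generation throughput: 102.4 tokens/s, Running: 3 reqs, Swapped: 0 reqs, "
        "Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.",
    ]
    rows = parse_throughput(log)
    print(rows, per_request_generation(rows))
```

On the 23:32:05 sample, 102.4 tokens/s is shared by 3 running requests, which works out to roughly 34 tokens/s per request, close to the single-request figures logged earlier. The division is only a rough estimate, because each line is an average over a short window during which requests may start or finish.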
#apt install nvtop
(self-llm) deepseek@deepseek1:~/installPkgs$ nvtop
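# Output of the `nvtop` command (shown as a screenshot in the original post)
# 10.119.85.138 is the IP address of the IB (InfiniBand) NIC on node deepseek2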
(self-llm) deepseek@deepseek2:~$ curl http://10.119.85.138:18080
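# Alternatively, open the address above directly in a browser. The first user to register becomes the administrator by default. After registering, log in and start asking questions.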
Notes:
(1) The 10 GbE NICs were not used in this deployment.
(2) Details of the NVIDIA A800 GPUs are as follows:
deepseek@deepseek1:~$ nvidia-smi
Fri Feb 21 09:25:35 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A800-SXM4-80GB          On  |   00000000:3D:00.0 Off |                    0 |
| N/A   33C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A800-SXM4-80GB          On  |   00000000:42:00.0 Off |                    0 |
| N/A   29C    P0             58W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A800-SXM4-80GB          On  |   00000000:61:00.0 Off |                    0 |
| N/A   30C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A800-SXM4-80GB          On  |   00000000:67:00.0 Off |                    0 |
| N/A   33C    P0             64W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A800-SXM4-80GB          On  |   00000000:AD:00.0 Off |                    0 |
| N/A   32C    P0             57W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A800-SXM4-80GB          On  |   00000000:B1:00.0 Off |                    0 |
| N/A   29C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A800-SXM4-80GB          On  |   00000000:D0:00.0 Off |                    0 |
| N/A   30C    P0             62W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A800-SXM4-80GB          On  |   00000000:D3:00.0 Off |                    0 |
| N/A   32C    P0             60W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
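A rough memory budget helps explain why four such servers are enough. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes all four servers carry 8 GPUs like the node listed above, that the weights are stored in BF16 (2 bytes per parameter, as the model path DeepSeek-R1-BF16 suggests), and it counts weights only, ignoring KV cache, activations, and framework overhead.

```python
PARAMS = 671e9        # parameter count implied by the model name
BYTES_PER_PARAM = 2   # BF16
NODES = 4             # servers, as stated in the article
GPUS_PER_NODE = 8     # assumption: every node matches the nvidia-smi listing
MIB_PER_GPU = 81920   # per-GPU memory reported by nvidia-smi


def memory_budget(params=PARAMS, bytes_per_param=BYTES_PER_PARAM,
                  nodes=NODES, gpus_per_node=GPUS_PER_NODE,
                  mib_per_gpu=MIB_PER_GPU):
    """Return (weights_gib, total_gpu_gib, fraction_used_by_weights)."""
    weights_gib = params * bytes_per_param / 2**30
    total_gib = nodes * gpus_per_node * mib_per_gpu / 1024
    return weights_gib, total_gib, weights_gib / total_gib


if __name__ == "__main__":
    weights, total, frac = memory_budget()
    print(f"weights ~{weights:.0f} GiB of {total:.0f} GiB ({frac:.0%})")
```

Under these assumptions the weights take about 1250 GiB out of 2560 GiB, a little under half of the total GPU memory, or roughly 39 GiB per GPU when spread evenly over 32 GPUs. The remainder is what is left for KV cache and runtime overhead.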
Physical server OS: Ubuntu 22.04.4 LTS (x86_64)
NVIDIA driver version: 550.90.07
CUDA runtime version: 12.1.105 (inside the node container), V12.4.99 (on the physical server)
nvidia-fabricmanager version: 550.90.07
NVLink: 3.0
NVSwitch: 2.0
PyTorch version: 2.5.1+cu124
CUDA used to build PyTorch: 12.4
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version: version 3.31.4
Libc version: glibc-2.35
Python version: 3.12.9 (main, Feb 5 2025, 08:49:00) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA_MODULE_LOADING set to: LAZY
Is XNNPACK available: True
CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz, 112 cores
numpy==1.26.4
torch==2.5.1
torchaudio==2.5.1
torchvision==0.20.1
triton==3.1.0