
Successfully deployed the full DeepSeek-R1-671B; available for consulting or deployment work

Published: 2025-03-01 05:26:09  Views: 1798

Author: 蒋小颖乱侃


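I have successfully deployed the full DeepSeek-R1-671B across 4 servers. A brief summary follows. I am now taking on consulting, guidance, and deployment work. The deployment process is still being optimized step by step, and I welcome learning together with others. The sections below show the deployment once it was up and running.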

Full DeepSeek-R1-671B showcase

Ray cluster status

Production Metrics

(self-llm) deepseek@deepseek2:~$ curl http://10.119.85.138:8000/metrics
...
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 37427.0
python_gc_objects_collected_total{generation="1"} 14232.0
python_gc_objects_collected_total{generation="2"} 16818.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 3033.0
python_gc_collections_total{generation="1"} 267.0
python_gc_collections_total{generation="2"} 315.0
...
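The metrics endpoint returns the Prometheus text exposition format, so the counters can be read programmatically rather than by eye. The following is a minimal sketch, not part of the original deployment: the function name `parse_prometheus_text` is hypothetical, the sample is copied from the output above, and the parser assumes no trailing timestamps on sample lines (none appear in this output). When capturing the text with curl, the `-s` flag suppresses the progress meter so it does not get mixed into the metrics.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text-format samples into {series: value}.

    Assumes each sample line is "<series> <value>" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Sample copied from the /metrics output shown above.
SAMPLE = """\
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 37427.0
python_gc_objects_collected_total{generation="1"} 14232.0
python_gc_objects_collected_total{generation="2"} 16818.0
"""

metrics = parse_prometheus_text(SAMPLE)
total_collected = sum(
    value
    for series, value in metrics.items()
    if series.startswith("python_gc_objects_collected_total")
)
print(total_collected)  # 68477.0
```

For production use, a real Prometheus scraper or client library is a better fit than hand parsing; this sketch is only meant for quick spot checks.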

OpenAI-compatible API test

# 10.119.85.138 is the IP of the IB (InfiniBand) NIC on the deepseek2 node
(self-llm) deepseek@deepseek2:~$ curl 10.119.85.138:8000/v1/models -H "Authorization: Bearer zY0MrQwXV9Oo3g==" | jq  
# Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   523  100   523    0     0   105k      0 --:--:-- --:--:-- --:--:--  127k
{
  "object": "list",
  "data": [
    {
      "id": "DeepSeek-R1-671B",
      "object": "model",
      "created": 1740405511,
      "owned_by": "vllm",
      "root": "/root/.cache/huggingface/hub/models/unsloth/DeepSeek-R1-BF16/",
      "parent": null,
      "max_model_len": 32768,
      "permission": [
        {
          "id": "modelperm-ced685e8156b4618b593580109205165",
          "object": "model_permission",
          "created": 1740405511,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
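The same check can be scripted. The sketch below is an illustration under assumptions: `fetch_models` and `summarize_models` are hypothetical helper names, the API key is a placeholder, and the sample payload is a trimmed copy of the response above. Only the standard library is used.

```python
import json
import urllib.request


def fetch_models(base_url, api_key):
    """GET {base_url}/v1/models with a Bearer token and return the raw JSON text."""
    request = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def summarize_models(response_text):
    """Return (model id, max_model_len) for each entry in a /v1/models response."""
    payload = json.loads(response_text)
    return [(m["id"], m.get("max_model_len")) for m in payload.get("data", [])]


# Trimmed copy of the response shown above, so the sketch runs offline.
SAMPLE_RESPONSE = """
{"object": "list",
 "data": [{"id": "DeepSeek-R1-671B", "object": "model",
           "owned_by": "vllm", "max_model_len": 32768}]}
"""

models = summarize_models(SAMPLE_RESPONSE)
print(models)  # [('DeepSeek-R1-671B', 32768)]
```

Against the live server, the call would look like `summarize_models(fetch_models("http://10.119.85.138:8000", "YOUR_API_KEY"))`.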

At the same time, the window running the vllm serve command shows the following output

Service functionality check

(self-llm) deepseek@deepseek2:~$ curl -X POST "http://10.119.85.138:8000/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer zY0MrQwXV9Oo3g=="  -d '{ "model": "DeepSeek-R1-671B", "messages": [{"role": "user", "content": "你好"}]}'
    
(self-llm) deepseek@deepseek2:~$ curl -X POST "http://10.119.85.138:8000/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer zY0MrQwXV9Oo3g=="  -d '{ "model": "DeepSeek-R1-671B", "messages": [{"role": "user", "content": "请证明勾股定理"}]}'
# Response to the prompt "请证明勾股定理" ("Please prove the Pythagorean theorem"); output truncated
{"id":"chatcmpl-11ae1ddf321343af848b5c683e67b72d","object":"chat.completion","created":1740411348,"model":"deepseek-r1","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"<think>\n嗯，用户让我证明勾股定理。勾股定理是数学里非常基础但又重要的定理，肯定有很多不同的证明方法。先回忆一下，勾股定理是说在直角三角形中，斜边的平方等于两条直角边的平方和，也就是a² + b² = c²。现在得选一
种合适的证明方式，可能是几何的或者代数的。\n\n首先想到的是几何证明中的拼接法，通过将四个直角三角形拼成一个大的正方形，然后比较面积。我要不要试试这个方法？比如说，四个全等的直角三角形，设它们的直角边为a和b，
斜边为c，拼起来的话中间应该会形成一个边长为（a+b）的正方形，中间的空隙可能是小正方形，边长是c或者别的？\n\n不对，应该会形成一个边长为c的正方形，或者这个？等一下，可能得仔细画个图想象一下。假设我们把四个三角
形每个的直角边朝外，那斜边就会组成里面的一个正方形，这时候
...
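In the response above, `reasoning_content` is null and the model's chain of thought arrives inline in `content`, starting with `<think>`. A client that wants to show only the final answer therefore has to split the two itself. The sketch below is one way to do that, with assumptions: `split_reasoning` is a hypothetical helper, and the closing tag is assumed to be `</think>`, which the truncated excerpt above does not show.

```python
import re

# Assumption: reasoning is wrapped as <think>...</think> inside "content".
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Split a message into (reasoning, answer).

    If the closing tag is missing (for example, a truncated response),
    everything after <think> is treated as reasoning and the answer is empty.
    """
    match = THINK_RE.search(content)
    if match:
        reasoning = match.group(1).strip()
        answer = (content[:match.start()] + content[match.end():]).strip()
        return reasoning, answer
    stripped = content.lstrip()
    if stripped.startswith("<think>"):
        return stripped[len("<think>"):].strip(), ""
    return "", content.strip()


complete = split_reasoning("<think>\nstep one\n</think>\n\nFinal answer")
truncated = split_reasoning("<think>\nstill thinking")
print(complete)   # ('step one', 'Final answer')
print(truncated)  # ('still thinking', '')
```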

# While the question is being answered, the window running vllm serve shows the following, reporting the average token generation throughput
INFO 02-24 17:21:12 metrics.py:455] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:17 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 37.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:22 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
...
# Even higher aggregate throughput with concurrent requests (Running: 3 reqs)
INFO 02-24 23:32:00 metrics.py:455] Avg prompt throughput: 442.9 tokens/s, Avg generation throughput: 38.8 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:05 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 102.4 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:07 async_llm_engine.py:179] Finished request chatcmpl-03add50cba264c84afe98fd6cce9907f.
INFO 02-24 23:32:10 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 79.4 tokens/s, Running: 2 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
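
The throughput figures in these log lines are aggregated over all running requests, so it helps to read them alongside the `Running` count. The following sketch extracts both fields from log lines copied from above and derives a rough per-request rate. The helper name `generation_throughput` is hypothetical, and the per-request division is only an approximation, since each figure averages over a reporting window in which requests may start or finish.

```python
import re

# Matches the two fields of interest in the vLLM metrics log lines shown above.
GEN_RE = re.compile(
    r"Avg generation throughput: ([\d.]+) tokens/s, Running: (\d+) reqs"
)


def generation_throughput(log_text):
    """Return a list of (tokens_per_second, running_requests) tuples."""
    return [(float(tps), int(reqs)) for tps, reqs in GEN_RE.findall(log_text)]


# Log lines copied verbatim from the output above.
LOG = """\
INFO 02-24 17:21:12 metrics.py:455] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:05 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 102.4 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:10 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 79.4 tokens/s, Running: 2 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
"""

samples = generation_throughput(LOG)
# Rough per-request rate: aggregate throughput divided by running requests.
per_request = [round(tps / reqs, 1) for tps, reqs in samples if reqs > 0]
print(samples)      # [(36.5, 1), (102.4, 3), (79.4, 2)]
print(per_request)  # [36.5, 34.1, 39.7]
```

Read this way, the higher numbers reflect concurrency: each request still generates at roughly 34 to 40 tokens/s, and the aggregate rises with the number of running requests.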

#apt install nvtop
(self-llm) deepseek@deepseek1:~/installPkgs$ nvtop
# Output of the `nvtop` command:

open-webui chat interface

# 10.119.85.138 is the IP of the IB (InfiniBand) NIC on the deepseek2 node
(self-llm) deepseek@deepseek2:~$ curl http://10.119.85.138:18080
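# Or open the address above directly in a browser. The first user to register becomes the administrator by default. After registering, log in and ask questions.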

II. Hardware and software used for the successful deployment

Server information

Notes:

(1) The 10 GbE NICs in these servers were not used during deployment.

(2) NVIDIA A800 details are as follows:

deepseek@deepseek1:~$ nvidia-smi 
Fri Feb 21 09:25:35 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A800-SXM4-80GB          On  |   00000000:3D:00.0 Off |                    0 |
| N/A   33C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A800-SXM4-80GB          On  |   00000000:42:00.0 Off |                    0 |
| N/A   29C    P0             58W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A800-SXM4-80GB          On  |   00000000:61:00.0 Off |                    0 |
| N/A   30C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A800-SXM4-80GB          On  |   00000000:67:00.0 Off |                    0 |
| N/A   33C    P0             64W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A800-SXM4-80GB          On  |   00000000:AD:00.0 Off |                    0 |
| N/A   32C    P0             57W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A800-SXM4-80GB          On  |   00000000:B1:00.0 Off |                    0 |
| N/A   29C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A800-SXM4-80GB          On  |   00000000:D0:00.0 Off |                    0 |
| N/A   30C    P0             62W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A800-SXM4-80GB          On  |   00000000:D3:00.0 Off |                    0 |
| N/A   32C    P0             60W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
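
A back-of-the-envelope calculation shows why the deployment spans several servers. The sketch below rests on stated assumptions: 671e9 parameters (from the model name), 2 bytes per parameter (the model path above points to a BF16 checkpoint), and all 4 servers carrying the same 8 x 81920 MiB GPUs as deepseek1. It counts weights only; KV cache, activations, and runtime overhead come on top.

```python
PARAMS = 671e9             # assumption: parameter count implied by "671B"
BYTES_PER_PARAM = 2        # BF16 stores each parameter in 2 bytes
SERVERS = 4                # from the article
GPUS_PER_SERVER = 8        # from nvidia-smi on deepseek1; assumed identical elsewhere
GPU_MEM_MIB = 81920        # per-GPU memory reported by nvidia-smi

weights_gib = PARAMS * BYTES_PER_PARAM / 2**30
single_server_gib = GPUS_PER_SERVER * GPU_MEM_MIB / 1024
total_gib = SERVERS * single_server_gib

print(f"weights:       {weights_gib:.0f} GiB")
print(f"one server:    {single_server_gib:.0f} GiB")
print(f"four servers:  {total_gib:.0f} GiB")
print(f"weights share: {weights_gib / total_gib:.0%} of total GPU memory")
```

Under these assumptions the weights alone come to roughly 1250 GiB, which exceeds the 640 GiB of a single server and sits just under the 1280 GiB of two, leaving little headroom there; across four servers (2560 GiB) the weights take about half of the GPU memory.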

Software information

Physical server OS: Ubuntu 22.04.4 LTS (x86_64)
Nvidia driver version: 550.90.07
CUDA runtime version: 12.1.105 (inside the node container), V12.4.99 (on the physical server)
nvidia-fabricmanager version: 550.90.07
nvlink: 3.0
nvswitch: 2.0

PyTorch version: 2.5.1+cu124
CUDA used to build PyTorch: 12.4
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version: version 3.31.4
Libc version: glibc-2.35
Python version: 3.12.9 (main, Feb  5 2025, 08:49:00) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
Is CUDA available: True

CUDA_MODULE_LOADING set to: LAZY
Is XNNPACK available: True
CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz, 112 cores
numpy==1.26.4
torch==2.5.1
torchaudio==2.5.1
torchvision==0.20.1
triton==3.1.0
