
Surpassing Qwen: ByteDance Open-Sources an LLM for the First Time, Seed-OSS

Published: 2025-08-21 09:07:15

Author: AI小小将


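Last night, ByteDance's Seed team open-sourced its first LLM, Seed-OSS-36B. The model offers long-context, reasoning, agentic, and general capabilities, is developer-friendly, and is optimized primarily for international (i18n) use cases. Although it was trained on only 12T tokens, it performs strongly on many mainstream benchmarks, and it is released under the Apache-2.0 license, which permits commercial use.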

Models: https://huggingface.co/collections/ByteDance-Seed/seed-oss-68a609f4201e788db05b5dcd
Code: https://github.com/ByteDance-Seed/seed-oss

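Seed-OSS is a dense model with 36B parameters. Its architecture uses RoPE, GQA attention, RMSNorm, and the SwiGLU activation function. The detailed model configuration is shown below:

[Table: Seed-OSS-36B model configuration]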
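For readers unfamiliar with two of the components named above, here is a minimal pure-Python sketch of RMSNorm and the SwiGLU gating step. It uses the textbook definitions and is not taken from the Seed-OSS code; the function names and the `eps` default are illustrative assumptions, and a real implementation would operate on tensors with learned projection matrices.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def swiglu_gate(gate, up):
    """Elementwise SwiGLU gating: silu(gate) * up.

    In a full feed-forward block, `gate` and `up` would be two linear
    projections of the same input, followed by a down projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


hidden = [3.0, 4.0]
normed = rms_norm(hidden, weight=[1.0, 1.0])
print(normed)                                   # unit RMS after normalization
print(swiglu_gate(gate=[0.0, 2.0], up=[1.0, 1.0]))
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so it needs only one reduction per vector.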

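The main features of Seed-OSS-36B are:

Flexible control of thinking budget: users can adjust the reasoning length as needed, so the reasoning process can be controlled dynamically, which improves inference efficiency in practical applications.
Enhanced reasoning capability: the model is specifically optimized for reasoning tasks while keeping balanced, strong general capabilities.
Agentic intelligence: the model performs well on agentic tasks such as tool use and problem solving.
Research-friendly: because including synthetic instruction data in pretraining may affect post-training research, the team released pretrained models both with and without instruction data, giving the research community more options.
Native long context: the model natively supports a context window of up to 512K.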
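GQA matters most at long context lengths, because the key/value cache grows linearly with both sequence length and the number of KV heads. The sketch below estimates KV-cache size for a single sequence. The layer count, head counts, and head size are illustrative placeholders, not figures from this article; read the real values from the model's config before relying on the result.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_value=2 corresponds to a 16-bit dtype such as bfloat16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 2**30
context = 512 * 1024  # a 512K-token window

# Placeholder configuration (assumed for illustration only).
gqa = kv_cache_bytes(context, num_layers=64, num_kv_heads=8, head_dim=128)
mha = kv_cache_bytes(context, num_layers=64, num_kv_heads=80, head_dim=128)

print(gqa / GIB)  # 128.0
print(mha / GIB)  # 1280.0
```

With these placeholder numbers, sharing 8 KV heads across 80 query heads cuts the cache tenfold compared with giving every query head its own KV head.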

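A distinctive feature of Seed-OSS-36B is that users can set a thinking budget parameter to specify how much the model may think, similar to Google's Gemini 2.5 Flash. The figure below shows performance curves on different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score fluctuates as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the CoT is longer, and the score improves as the thinking budget increases.

[Figure: performance versus thinking budget on different tasks]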

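Below is an example with the thinking budget set to 512. During reasoning, the model periodically reflects on itself to estimate how much of the budget it has consumed and how much remains, and it gives the final answer once the budget is exhausted or the reasoning is complete: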

<seed:think>
Got it, let's try to solve this problem step by step. The problem says ... ...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule, ... ...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that ... ...
<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
Because if ... ...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
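The self-reflection checkpoints in the example above are plain text, so they can be pulled out with a regular expression. The sketch below assumes the checkpoint wording matches the sample exactly; the helper name `budget_trace` and the abbreviated sample string are illustrative, and the pattern would need adjusting if the model phrases a checkpoint differently.

```python
import re

# Matches the checkpoint sentence as it appears in the example above.
REFLECT_RE = re.compile(
    r"<seed:cot_budget_reflect>I have used (\d+) tokens, "
    r"and there are (\d+) tokens remaining for use\.</seed:cot_budget_reflect>"
)


def budget_trace(text):
    """Return the (used, remaining) pairs reported inside the thinking block."""
    return [(int(used), int(left)) for used, left in REFLECT_RE.findall(text)]


# Abbreviated copy of the example output.
sample = (
    "<seed:think>Got it, ..."
    "<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>"
    "Using the power rule, ..."
    "<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>"
    "Alternatively, ..."
    "<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>"
    "Because if ..."
    "<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>"
    "</seed:think>To solve the problem, ..."
)

trace = budget_trace(sample)
print(trace)                                  # [(129, 383), (258, 254), (393, 119)]
print({used + left for used, left in trace})  # {512}
```

Every checkpoint sums to 512, the budget that was requested. The final "exhausted" message carries no numbers, so the pattern skips it.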

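If no thinking budget is set (the default mode), Seed-OSS thinks with unlimited length. If you do specify a thinking budget, prefer an integer multiple of 512 (such as 512, 1K, 2K, 4K, 8K, or 16K), because the model was trained extensively on these intervals. When the thinking budget is set to 0, the model outputs its answer directly; any budget below 512 is best set to 0.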
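The recommendation above can be turned into a small helper that cleans up a requested budget before it is passed on. This is a sketch of one possible policy, not part of the Seed-OSS code: the name `normalize_thinking_budget` is hypothetical, and rounding down (rather than to the nearest multiple) is an assumption chosen so the result never exceeds what the caller asked for.

```python
def normalize_thinking_budget(requested):
    """Map a requested thinking budget to a recommended value.

    None      -> None (no budget set, i.e. the default unlimited mode)
    below 512 -> 0    (answer directly, without thinking)
    otherwise -> the largest multiple of 512 not exceeding the request
    """
    if requested is None:
        return None
    if requested < 0:
        raise ValueError("thinking budget must be non-negative")
    if requested < 512:
        return 0
    return (requested // 512) * 512


for value in (None, 300, 512, 1000, 2048):
    print(value, "->", normalize_thinking_budget(value))
```

When the helper returns `None`, the caller would omit the `thinking_budget` argument altogether instead of passing a number.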

Seed-OSS-36B comprises three models: Seed-OSS-36B-Base, Seed-OSS-36B-Base-woSyn, and Seed-OSS-36B-Instruct. The first two are pretrained models: Seed-OSS-36B-Base was pretrained with synthetic data, while Seed-OSS-36B-Base-woSyn was trained without it.

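Seed-OSS-36B-Base-woSyn outperforms Qwen3-30B-A3B-Base-2507 and Qwen2.5-32B-Base on mainstream benchmarks, and Seed-OSS-36B-Base, which adds synthetic data, improves performance further:

[Table: benchmark results for the base models]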

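The post-trained version, Seed-OSS-36B-Instruct, also beats OpenAI's gpt-oss-20b, Alibaba's Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B, and Google's Gemma3-27B on most mainstream benchmarks:

[Table: benchmark results for the instruct model]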

The Seed-OSS team has submitted a PR to the transformers library. For now, you can install the specified transformers branch to use the model:

# pip3 install -r requirements.txt
# pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"

# Load the tokenizer and the model weights.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here

messages = [
    {"role": "user", "content": "How to make pasta?"},
]

# Build the prompt with the chat template and tokenize it.
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=512,  # control the thinking budget
)

# Generate and decode; the decoded text includes the prompt and the thinking block.
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
output_text = tokenizer.decode(outputs[0])
print(output_text)
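The decoded text still contains the thinking block, so an application will usually want to separate the reasoning from the final answer. The sketch below does this with the `<seed:think>` and `<seed:cot_budget_reflect>` tags shown in the earlier example. The helper name `split_response` is hypothetical, and the sketch assumes at most one thinking block; any prompt text or special tokens left in the decoded string are not stripped here.

```python
import re

THINK_RE = re.compile(r"<seed:think>(.*?)</seed:think>", re.DOTALL)
REFLECT_RE = re.compile(
    r"<seed:cot_budget_reflect>.*?</seed:cot_budget_reflect>", re.DOTALL
)


def split_response(decoded):
    """Split decoded model output into (thinking, answer).

    Budget checkpoints are removed from the thinking text. If no thinking
    block is present, the whole string is treated as the answer.
    """
    match = THINK_RE.search(decoded)
    if match is None:
        return "", decoded.strip()
    thinking = REFLECT_RE.sub("", match.group(1)).strip()
    answer = decoded[match.end():].strip()
    return thinking, answer


demo = (
    "<seed:think>Boil water first."
    "<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens "
    "remaining for use.</seed:cot_budget_reflect>"
    " Then add salt.</seed:think>Bring salted water to a boil and cook the pasta."
)
thinking, answer = split_response(demo)
print(thinking)  # Boil water first. Then add salt.
print(answer)    # Bring salted water to a boil and cook the pasta.
```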