
Qwen3 ships yet another new model, Qwen3-Coder-Flash: a small MoE-30B-A3B positioned as a budget alternative to the 480B

Published: 2025-08-01 09:45:13

Author: 包包算法笔记


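The Qwen team has also learned to ship updates seven days a week: starting with the split of hybrid thinking into separate models, it has released a model on several consecutive working days. Hybrid thinking is apparently not a free lunch, and it remains to be seen whether other vendors will drop it as well.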

The following is the original text of the Hugging Face page:

Highlight

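Qwen3-Coder-30B-A3B-Instruct has been released under the official name Qwen3-Coder-Flash. With only about 3B activated parameters, it balances quality and efficiency. The main improvements are: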

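Strong performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks.
Native long-context support of 256K tokens, extendable to 1M tokens with Yarn, optimized for repository-scale understanding.
Agentic Coding support for mainstream platforms such as Qwen Code and CLINE, with a specially designed function call format.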
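
The jump from the native 256K window to 1M tokens implies a scaling factor of about 4. The sketch below only derives that factor and packs it into a dictionary. The key names follow the `rope_scaling` convention used in transformers model configs, and applying them to this particular model is an assumption, as is reading "1M" as 4 x 262,144 tokens. `yarn_rope_scaling` is a hypothetical helper, not part of any library.

```python
# Sketch: derive a YaRN scaling factor from the context lengths quoted above.
NATIVE_CONTEXT = 262_144     # native context length stated by the model card
TARGET_CONTEXT = 1_048_576   # assumption: "1M tokens" read as 4 x 262,144


def yarn_rope_scaling(native_len: int, target_len: int) -> dict:
    """Build a rope_scaling-style dict (key names are an assumption)."""
    if target_len < native_len:
        raise ValueError("target length must not be shorter than the native length")
    return {
        "rope_type": "yarn",
        "factor": target_len / native_len,
        "original_max_position_embeddings": native_len,
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, TARGET_CONTEXT)
print(rope_scaling)
```

Whether a given inference stack reads this dictionary from the model config or expects it as a launch option varies, so the stack's own documentation should be checked before relying on it.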

Model Overview

Qwen3-Coder-30B-A3B-Instruct has the following features:

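Type: Causal Language Models
Training Stage: Pretraining & Post-training
Number of Parameters: 30.5B in total, 3.3B activated
Number of Layers: 48
Number of Attention Heads (GQA): 32 for Q and 4 for KV
Number of Experts: 128
Number of Activated Experts: 8
Context Length: 262,144 natively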
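
These figures allow a rough estimate of how much memory the KV cache needs at long context, which is what usually dominates once the prompt grows. The sketch below is a back-of-the-envelope calculation, not a measurement. The layer and KV head counts come from the list above, while `HEAD_DIM = 128` and a 16-bit cache are assumptions that the overview does not state. Model weights and activations are not counted.

```python
# Rough KV-cache size estimate from the architecture figures above.
NUM_LAYERS = 48        # from the model overview
NUM_Q_HEADS = 32       # from the model overview
NUM_KV_HEADS = 4       # from the model overview (GQA)
HEAD_DIM = 128         # assumption: not stated in the overview
BYTES_PER_VALUE = 2    # assumption: 16-bit (bf16/fp16) cache


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens of one sequence."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return per_token * num_tokens


# With GQA, each KV head is shared by several query heads.
queries_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS
print(f"query heads per KV head: {queries_per_kv_head}")

for tokens in (32_768, 262_144):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with context length, so an 8x shorter context needs an 8x smaller cache.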

Note: this model supports only non-thinking mode and does not generate <think></think> blocks in its output, so setting enable_thinking=False is no longer required.

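For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and documentation.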

Quick Start

We recommend using the latest version of transformers.
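
The minimum release mentioned in this section is 4.51.0, so a small check before loading the model can catch an outdated install early. The sketch below is an assumption-laden convenience rather than an official procedure: `parse_version` and `is_supported` are hypothetical helpers, and the parser deliberately ignores pre-release suffixes, so `4.51.0.dev0` is treated as `4.51.0`.

```python
import re

MIN_TRANSFORMERS = (4, 51, 0)  # minimum release mentioned in this section


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '4.51.0.dev0' -> (4, 51, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_supported(version: str) -> bool:
    """True if the version is at least MIN_TRANSFORMERS."""
    return parse_version(version) >= MIN_TRANSFORMERS


try:
    import transformers
except ImportError:
    print("transformers is not installed")
else:
    status = "ok" if is_supported(transformers.__version__) else "too old, please upgrade"
    print(f"transformers {transformers.__version__}: {status}")
```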
With transformers<4.51.0, loading the model fails because that version does not recognize the Qwen3 MoE architecture.

The following code snippet shows how to use the model to generate content from a given input.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Run text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

Tip: if you run into out-of-memory (OOM) issues, consider reducing the context length to a smaller value such as 32,768.
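
One way to apply a smaller context length in practice is to cap the number of new tokens so that prompt plus output stays within the chosen limit. The sketch below shows only that arithmetic. `clamp_new_tokens` is a hypothetical helper, and the 32,768 default simply mirrors the value suggested in the tip.

```python
def clamp_new_tokens(
    prompt_tokens: int,
    context_limit: int = 32_768,         # reduced context length from the tip
    requested_new_tokens: int = 65_536,  # output length used in the quick-start code
) -> int:
    """Return a max_new_tokens value that keeps prompt + output within context_limit."""
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone exceeds the context limit; shorten it first")
    return min(requested_new_tokens, remaining)


budget = clamp_new_tokens(prompt_tokens=1_200)
print(budget)
```

With the quick-start code, the prompt length can be read from `model_inputs.input_ids.shape[1]` and the result passed as `max_new_tokens` to `model.generate`.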

For local use, Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers already support Qwen3.

Agentic Coding

Qwen3-Coder excels at tool calling.

You can define or use any tool simply, as in the following example.

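# Your tool implementation
def square_the_number(input_num: float) -> float:
    # The parameter name matches the "input_num" property in the tool schema
    return input_num ** 2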

# Define the tools
tools=[
    {
        "type":"function",
        "function":{
            "name": "square_the_number",
            "description": "output the square of the number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    'input_num': {
                        'type': 'number',
                        'description': 'input_num is a number that will be squared'
                        }
                },
            }
        }
    }
]

from openai import OpenAI
# Define the LLM client
client = OpenAI(
    # Use a custom endpoint compatible with the OpenAI API
    base_url='http://localhost:8000/v1',  # api_base
    api_key="EMPTY"
)

messages = [{'role': 'user', 'content': 'square the number 1024'}]

completion = client.chat.completions.create(
    messages=messages,
    model="Qwen3-Coder-30B-A3B-Instruct",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
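
The request above only returns the model's tool call. Executing the tool and sending the result back is left to the caller. The sketch below shows one possible dispatch step under the assumption that the server returns OpenAI-style tool calls, where the arguments arrive as a JSON string. `TOOL_REGISTRY`, `run_tool_call`, and `tool_result_message` are hypothetical names, and the tool is redefined here so the block runs on its own.

```python
import json


def square_the_number(input_num: float) -> float:
    """Tool implementation; the parameter name matches the tool schema."""
    return input_num ** 2


# Map tool names from the schema to local Python callables.
TOOL_REGISTRY = {"square_the_number": square_the_number}


def run_tool_call(name: str, arguments_json: str):
    """Look up the named tool and call it with the JSON-encoded arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**kwargs)


def tool_result_message(tool_call_id: str, result) -> dict:
    """Wrap a tool result as an OpenAI-style 'tool' message for the next request."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


# Stand-in for a real response: the name and arguments a tool call would carry.
result = run_tool_call("square_the_number", '{"input_num": 1024}')
message = tool_result_message("call_0", result)
print(message)
```

With a real response, the name, arguments, and id would come from `completion.choices[0].message.tool_calls[0]` (its `function.name`, `function.arguments`, and `id` fields). The tool message is then appended to `messages` after the assistant message that requested the call, and the request is sent again.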

Best Practices

For the best performance, we recommend the following settings:

Sampling parameters:

We suggest temperature=0.7, top_p=0.8, top_k=20, and repetition_penalty=1.05.
Adequate output length: an output length of 65,536 tokens is recommended for most queries, which is sufficient for instruct models.
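
How these values are passed depends on the inference path. The sketch below collects them into keyword arguments for the two paths used in this article. For `model.generate`, sampling has to be switched on with `do_sample=True`. For an OpenAI-compatible endpoint, `top_k` and `repetition_penalty` are not standard Chat Completions fields, so the sketch forwards them through the client's `extra_body` argument. Whether the server honors them is an assumption about the serving backend. Both helper functions are hypothetical.

```python
# Recommended sampling values from the best-practices list above.
SAMPLING = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
}


def transformers_generate_kwargs(max_new_tokens: int = 65_536) -> dict:
    """Keyword arguments for model.generate(**model_inputs, **kwargs)."""
    return {"do_sample": True, **SAMPLING, "max_new_tokens": max_new_tokens}


def openai_chat_kwargs(max_tokens: int = 65_536) -> dict:
    """Keyword arguments for client.chat.completions.create(...)."""
    return {
        "temperature": SAMPLING["temperature"],
        "top_p": SAMPLING["top_p"],
        "max_tokens": max_tokens,
        # Assumption: the backend accepts these non-standard fields in the request body.
        "extra_body": {
            "top_k": SAMPLING["top_k"],
            "repetition_penalty": SAMPLING["repetition_penalty"],
        },
    }


print(transformers_generate_kwargs())
print(openai_chat_kwargs())
```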

Citation

If you find our work helpful, feel free to cite it.

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}