support qkdim!=vdim by chang-wenbin · Pull Request #8023 · PaddlePaddle/FastDeploy

chang-wenbin · 2026-06-08T12:48:42Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter · 2026-06-08T13:24:52Z

Codecov Report

❌ Patch coverage is 52.77778% with 17 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@edc885d). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/layers/linear.py	57.69%	10 Missing and 1 partial ⚠️
...l_executor/layers/attention/append_attn_backend.py	50.00%	2 Missing and 2 partials ⚠️
...astdeploy/model_executor/ops/triton_ops/do_rope.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8023   +/-   ##
==========================================
  Coverage           ?   67.67%           
==========================================
  Files              ?      471           
  Lines              ?    66360           
  Branches           ?    10217           
==========================================
  Hits               ?    44912           
  Misses             ?    18576           
  Partials           ?     2872

Flag	Coverage Δ
GPU	`77.73% <52.77%> (?)`
XPU	`6.99% <0.00%> (?)`

PaddlePaddle-bot · 2026-06-09T00:24:20Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-22 12:17:55 UTC+08:00

The CI report is generated from the following code (refreshed every 30 minutes):
PR commit: 94fa1f9 | Merge base: edc885d (branch: develop)

1 Required tasks: 8/10 passed

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
42(0)	42	37	5	0	0	0

Task	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue (suspected)	Low	Job
`Extracted partial CE model tasks to run in CI. / run_ce_cases`	PR issue (suspected)	Low	Job

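2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (suspected) (confidence: low)

Analyzer: generic analysis (fallback)

Failed cases:

Case	Error summary
No specific pytest case retrieved	The deep log returned neither `log_file_path` nor `unittest_details`; only the job failure with exit code 8 is confirmed

Key logs:

[FAILURE]: Process completed with exit code 8.
Deep log: (log retrieval failed; no error information could be extracted)
.github/workflows/_unit_test_coverage.yml:246 bash scripts/coverage_run.sh || TEST_EXIT_CODE=8
.github/workflows/_unit_test_coverage.yml:364-378 exits the unit test job when TEST_EXIT_CODE=8

Root cause summary: Unit tests failed but the logs are missing; a qkdim/vdim adaptation problem is suspected.
Exit code 8 from coverage_run.sh means a failure in the unit test stage, not a diff coverage threshold failure. The PR's changes are concentrated in QKV/V dimension splitting, RoPE/cache writes, and tests/model_executor/test_linear.py. Within these, QKVParallelLinear.load_weight still splits v_tensor by head_dim in the kv_num_heads < tp_size branch, and append_attention_with_output has no only_do_attn parameter, which potentially conflicts with the manual norm/rope/cache flow used when external_norm_rope=True. Because the failure log contains no traceback, this is an inference from the related changes and cannot confirm the specific failing case.

Fix suggestions:

- First re-fetch or supply failed_tests.log / unittest_logs.tar.gz to identify the specific failing case.
- If the failure reproduces in a qkdim/vdim scenario, focus on verifying that the V weight in fastdeploy/model_executor/layers/linear.py is split by v_head_dim, and whether the with-output path in fastdeploy/model_executor/layers/attention/append_attn_backend.py repeats norm/rope/cache.
- Add a unit test with v_head_dim != head_dim and tp_size > 1. The newly added tests still use head_dim == v_head_dim and cannot cover the scenario this PR targets.

Related changes: fastdeploy/model_executor/layers/linear.py, fastdeploy/model_executor/layers/attention/append_attn_backend.py, fastdeploy/model_executor/ops/triton_ops/do_rope.py, tests/model_executor/test_linear.py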
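
The concern about the kv_num_heads < tp_size branch can be illustrated with a small, self-contained sketch. It is not FastDeploy's loader: `replicated_kv_slice` is a hypothetical helper, and the layout (each rank loads exactly one KV head, with consecutive ranks sharing the same head) is an assumption made for illustration. The dimensions 192 and 128 come from the commit message "support gqa qkdim=192 vdim=128".

```python
def replicated_kv_slice(rank, kv_num_heads, tp_size, per_head_dim):
    """Column range of a K or V weight that `rank` loads when kv_num_heads < tp_size.

    Hypothetical helper: assumes consecutive ranks replicate the same KV head.
    """
    if tp_size % kv_num_heads != 0:
        raise ValueError("tp_size must be a multiple of kv_num_heads")
    replicas = tp_size // kv_num_heads   # ranks sharing one KV head
    head = rank // replicas              # KV head assigned to this rank
    start = head * per_head_dim
    return start, start + per_head_dim


kv_num_heads, tp_size = 2, 4
head_dim, v_head_dim = 192, 128
v_width = kv_num_heads * v_head_dim      # total columns in the V weight

for rank in range(tp_size):
    k_slice = replicated_kv_slice(rank, kv_num_heads, tp_size, head_dim)
    v_slice = replicated_kv_slice(rank, kv_num_heads, tp_size, v_head_dim)
    # Slicing V with head_dim (the suspected stale path) can run past the V weight.
    stale = replicated_kv_slice(rank, kv_num_heads, tp_size, head_dim)
    print(rank, k_slice, v_slice, "stale V slice in bounds:", stale[1] <= v_width)
```

Under these assumptions, ranks that own the second KV head get a V slice that ends beyond the V weight when head_dim is used, which is the kind of shape error a test with tp_size > 1 and v_head_dim != head_dim would surface.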

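🔴 Extracted partial CE model tasks to run in CI. / run_ce_cases: PR issue (suspected) (confidence: low)

Analyzer: generic analysis (fallback)

Failed cases:

Case	Error summary
No specific CE case retrieved	The deep log returned neither `log_file_path` nor `unittest_details`; only the job failure with exit code 123 is confirmed

Key logs:

[FAILURE]: Process completed with exit code 123.
Deep log: (log retrieval failed; no error information could be extracted)
.github/workflows/_pre_ce_test.yml:206 bash scripts/run_pre_ce.sh
scripts/run_pre_ce.sh:36 timeout 600 python -m pytest --disable-warnings -sv "$file"

Root cause summary: The CE task failed but the logs are missing; the same set of attention/weight-loading changes is suspected of affecting real model scenarios.
run_pre_ce.sh iterates over tests/ci_use/*/test_*.py and exits at the first failing file, but the tool obtained neither the failing file name nor a traceback. The PR's changes touch the real model inference path. In particular, the TP split in QKVGateParallelLinear.gate_weight_loader still computes the block size from head_dim, while the gate shard size has been changed to v_head_dim. If CE covers a model with v_head_dim != head_dim, this may trigger shape, weight-loading, or inference errors. This judgment lacks log evidence, so confidence is low.

Fix suggestions:

- First rerun the CE job or supply its logs to locate the first failing tests/ci_use/*/test_*.py file.
- If the failing model uses v_head_dim != head_dim, first check that the offset, block size, and cache value shape of QKV/Gate weight loading are consistent end to end.
- Add minimal unit tests for the model scenarios covered by CE, so that dimension mismatches do not surface only in long-running CE jobs.

Related changes: fastdeploy/model_executor/layers/linear.py, fastdeploy/model_executor/layers/attention/append_attn_backend.py, fastdeploy/model_executor/ops/triton_ops/do_rope.py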
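
The block-size mismatch described for the gate loader can be checked with plain arithmetic. The sketch below is illustrative only: `gate_rank_slice` and `gate_tiling_is_exact` are hypothetical names, and it assumes the gate weight holds one v_head_dim-wide block per query head, which the report does not state explicitly.

```python
def gate_rank_slice(rank, num_heads, tp_size, block_dim):
    """Column range of the gate weight taken by `rank` for a given per-head block size."""
    heads_per_rank = num_heads // tp_size
    start = rank * heads_per_rank * block_dim
    return start, start + heads_per_rank * block_dim


def gate_tiling_is_exact(num_heads, tp_size, block_dim, gate_head_dim):
    """True if the per-rank slices cover the gate weight exactly once, with no gap or overrun."""
    total = num_heads * gate_head_dim  # assumed gate width: one block per query head
    slices = [gate_rank_slice(r, num_heads, tp_size, block_dim) for r in range(tp_size)]
    contiguous = all(slices[i][1] == slices[i + 1][0] for i in range(tp_size - 1))
    return slices[0][0] == 0 and contiguous and slices[-1][1] == total


# Block size taken from v_head_dim matches the gate shard size.
print(gate_tiling_is_exact(num_heads=8, tp_size=4, block_dim=128, gate_head_dim=128))
# Block size taken from head_dim overruns a gate sized by v_head_dim.
print(gate_tiling_is_exact(num_heads=8, tp_size=4, block_dim=192, gate_head_dim=128))
```

A check of this shape is cheap enough to run as a unit test, which fits the suggestion to avoid discovering such mismatches only in CE.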

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-11 02:07:41 Asia/Shanghai

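📋 Review summary

PR overview: Adds qkdim != vdim support to the attention/QKV path.
Scope of changes: AppendAttention backend, QKV/QKVG linear loader, Triton RoPE, linear layer unit tests.
Impact tags: [OP] [KVCache] [Loader]

Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/model_executor/layers/attention/append_attn_backend.py:324`	When `v_head_dim != head_dim`, q/k RMSNorm runs unconditionally, so models without qk norm enabled pass `None` into the Triton kernel
🟡 Suggestion	`tests/model_executor/test_linear.py:291`	The new test still uses `v_head_dim == head_dim` and does not cover the PR's core unequal-dimension branch

📝 PR convention check

The title lacks an official tag, and the Motivation / Modifications / Usage or Command / Accuracy Tests sections of the PR description are still empty.

Suggested title (can be copied directly):

[OP] Support qkdim != vdim in attention and QKV loading

Suggested PR description (can be copied directly)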

## Motivation
Support models whose Q/K head dimension differs from V head dimension (`qkdim != vdim`) in FastDeploy attention and QKV projection paths.

## Modifications
- Add `v_head_dim` to attention backend constructors and use it for value cache shape.
- Pass `model_config.v_head_dim` from GPU model runner to the selected attention backend.
- Update QKV/QKVG parallel linear output sizing and V shard placement to use `v_head_dim`.

## Usage or Command
N/A

## Accuracy Tests
N/A. Current PR does not include accuracy or regression test results.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

PaddlePaddle-bot · 2026-06-10T18:12:14Z

-
-        if getattr(layer, "only_do_attn", False):
+        if self.external_norm_rope:
+            qk_rmsnorm_fused(


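Once external_norm_rope is tied only to v_head_dim != head_dim, qk_rmsnorm_fused runs for every layer whose qk and v dimensions differ. But Attention defaults to use_qk_norm=False, and q_norm_weight / k_norm_weight are created only when it is enabled; several existing Attention(...) constructions do not pass use_qk_norm. As soon as such a model is configured with v_head_dim != head_dim, None is passed here as a Triton pointer, and qk_rmsnorm_fused_kernel then fails at tl.load(q_weight_ptr + ...). Suggestion: separate "needs external rope/write_cache" from "needs q/k norm". Run the fused norm only when the weights exist or layer.use_qk_norm is true; models without q/k norm should still do only RoPE/write_cache.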
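
One way to picture the suggested separation is a small planning function that decides which external steps run. This is a sketch under stated assumptions, not the backend's code: `plan_external_steps` is a hypothetical name, the step labels are placeholders for the real kernels, and raising on a missing weight is an illustrative choice.

```python
def plan_external_steps(head_dim, v_head_dim, q_norm_weight=None,
                        k_norm_weight=None, use_qk_norm=False):
    """Return the external steps to run before attention, in order.

    Hypothetical sketch: the external RoPE/write_cache path is chosen by the
    dimension mismatch, while q/k norm is chosen by the presence of its weights.
    """
    external_norm_rope = v_head_dim != head_dim
    if not external_norm_rope:
        return []  # fused path handles everything; nothing to do externally

    has_norm_weights = q_norm_weight is not None and k_norm_weight is not None
    if use_qk_norm and not has_norm_weights:
        # Fail early in Python instead of passing None to a kernel.
        raise ValueError("use_qk_norm is set but q/k norm weights are missing")

    steps = []
    if has_norm_weights:
        steps.append("qk_rmsnorm")
    steps += ["rope", "write_cache"]
    return steps


# Model without q/k norm: only RoPE and cache write run externally.
print(plan_external_steps(head_dim=192, v_head_dim=128))
# Model with q/k norm weights: norm runs first.
print(plan_external_steps(192, 128, q_norm_weight=[1.0], k_norm_weight=[1.0],
                          use_qk_norm=True))
```

Keeping the two conditions independent means a model with v_head_dim != head_dim but no q/k norm never reaches the norm step with missing weights.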

PaddlePaddle-bot · 2026-06-10T18:12:14Z

        kv_num_heads_per_rank=1,
        num_kv_head_replicas=2,
+        head_dim=2,
+        v_head_dim=2,


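This new case still sets v_head_dim=2 and head_dim=2, so it does not cover the PR's most critical qkdim != vdim branch: V shard size, param offset, shared KV slice, and so on still pass through the old path. Suggest adding at least fused/split load assertions with v_head_dim != head_dim (for example head_dim=2, v_head_dim=3) to verify that the V segment does not overlap with the following offset/gate.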
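
A minimal version of the suggested assertion might look like the following. It does not use the real QKVParallelLinear; `fused_segments` and `load_fused` are hypothetical stand-ins that model a fused [Q | K | V] buffer on one rank, using the review's example values head_dim=2 and v_head_dim=3.

```python
def fused_segments(q_heads, kv_heads, head_dim, v_head_dim):
    """Return {name: (start, end)} for a fused [Q | K | V] buffer and its total width."""
    sizes = [("q", q_heads * head_dim),
             ("k", kv_heads * head_dim),
             ("v", kv_heads * v_head_dim)]  # V is sized by v_head_dim, not head_dim
    segments, offset = {}, 0
    for name, size in sizes:
        segments[name] = (offset, offset + size)
        offset += size
    return segments, offset


def load_fused(q_heads, kv_heads, head_dim, v_head_dim):
    """Write each segment's tag into the buffer, failing if two segments overlap."""
    segments, total = fused_segments(q_heads, kv_heads, head_dim, v_head_dim)
    buf = [None] * total
    for name, (start, end) in segments.items():
        for i in range(start, end):
            assert buf[i] is None, f"{name} overlaps another segment at {i}"
            buf[i] = name
    return buf


buf = load_fused(q_heads=2, kv_heads=1, head_dim=2, v_head_dim=3)
assert None not in buf          # every slot is written exactly once
assert buf.count("v") == 3      # V occupies v_head_dim slots, not head_dim
```

With head_dim == v_head_dim, a loader that sizes V by head_dim would pass the same checks, which is why the unequal case is the one worth asserting.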

support qkdim!=vdim

44f547c

chang-wenbin had a problem deploying to Metax_ci June 8, 2026 12:48 — with GitHub Actions Failure

Merge remote-tracking branch 'origin/develop' into qkdim_vdim

fc06ae2

chang-wenbin added 2 commits June 9, 2026 11:58

Merge remote-tracking branch 'origin/develop' into qkdim_vdim

19a7044

support gqa qkdim=192 vdim=128

2958fda

chang-wenbin had a problem deploying to Metax_ci June 10, 2026 05:01 — with GitHub Actions Failure

chang-wenbin added 2 commits June 10, 2026 17:40

support qkdim!=vdim

303ad42

merge develop

29dce63

chang-wenbin had a problem deploying to Metax_ci June 10, 2026 10:30 — with GitHub Actions Error

update v_head_dim

db7d260

chang-wenbin had a problem deploying to Metax_ci June 10, 2026 10:32 — with GitHub Actions Failure

fix & update

94fa1f9

chang-wenbin had a problem deploying to Metax_ci June 10, 2026 11:06 — with GitHub Actions Failure

chang-wenbin requested a review from PaddlePaddle-bot June 10, 2026 12:17

PaddlePaddle-bot suggested changes Jun 10, 2026

support qkdim!=vdim #8023
chang-wenbin wants to merge 8 commits into PaddlePaddle:develop from chang-wenbin:qkdim_vdim

3 participants
