[Cherry-Pick][BugFix][PD Disaggregation] remove redundant block allocation of prefill tasks in decode instance (#8022) #8021

Merged
liyonghua0910 merged 1 commit into PaddlePaddle:release/2.6 from liyonghua0910:release/2.6+20260608_fix_decode_scheduler
Jun 9, 2026
Conversation

@liyonghua0910 liyonghua0910 commented Jun 8, 2026

Collaborator

Motivation

In the disaggregated prefill-decode serving mode (splitwise_role != "mixed"), the decode instance incorrectly applied the mixed-role block reservation logic when determining whether to admit a new prefill request. This caused the scheduler to over-provision KV-cache blocks for running decode requests, unnecessarily blocking prefill task admission and reducing throughput. Additionally, get_new_block_nums could return a negative block count when a request already had sufficient blocks allocated, potentially leading to downstream scheduling errors.
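To make the negative-count case concrete, here is a minimal sketch. It is not the FastDeploy implementation: the function names, parameters, and the block size of 64 are hypothetical and chosen only for illustration. It assumes the count is derived by rounding the total token count up to whole blocks and subtracting the blocks the request already holds.

```python
def new_blocks_unclamped(num_computed_tokens: int, num_new_tokens: int,
                         block_size: int, num_allocated_blocks: int) -> int:
    """Blocks still needed, WITHOUT clamping (hypothetical helper)."""
    total_tokens = num_computed_tokens + num_new_tokens
    # Ceiling division: number of blocks required to hold total_tokens.
    required_blocks = (total_tokens + block_size - 1) // block_size
    return required_blocks - num_allocated_blocks


def new_blocks_clamped(num_computed_tokens: int, num_new_tokens: int,
                       block_size: int, num_allocated_blocks: int) -> int:
    """Same computation, clamped so the result is never negative."""
    return max(0, new_blocks_unclamped(num_computed_tokens, num_new_tokens,
                                       block_size, num_allocated_blocks))


# A request that already holds 3 blocks but only needs 2 for 120 tokens.
over_allocated = dict(num_computed_tokens=100, num_new_tokens=20,
                      block_size=64, num_allocated_blocks=3)
unclamped = new_blocks_unclamped(**over_allocated)
clamped = new_blocks_clamped(**over_allocated)
```

In this example the unclamped helper yields a negative value for the over-allocated request, while the clamped variant reports that no further blocks are needed.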

Modifications

  • fastdeploy/engine/sched/resource_manager_v1.py
    1. _get_can_schedule_prefill_threshold_block: Return num_chunk_new_block directly when enabling prefill-decode disaggregation, skipping the reserve-block estimation for running requests.
    2. get_new_block_nums: Clamp the result to non-negative, preventing negative values when num_computed_tokens + num_new_tokens fits within already-allocated blocks.
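The role-dependent threshold in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in resource_manager_v1.py: the function names, the `reserve_blocks_per_request` parameter, and the way the mixed-role reserve is estimated (a fixed number of blocks per running request) are all hypothetical. Only `splitwise_role`, the `"mixed"` value, and `num_chunk_new_block` come from the PR description.

```python
def prefill_threshold_blocks(num_chunk_new_block: int,
                             splitwise_role: str,
                             num_running_requests: int,
                             reserve_blocks_per_request: int) -> int:
    """Free blocks required before a prefill chunk is admitted (sketch)."""
    if splitwise_role != "mixed":
        # Disaggregated prefill/decode roles: use the actual new block
        # count and skip the reserve estimation for running requests.
        return num_chunk_new_block
    # Mixed role: additionally reserve blocks for running requests.
    # ASSUMPTION: a flat per-request reserve, for illustration only.
    return num_chunk_new_block + num_running_requests * reserve_blocks_per_request


def can_admit_prefill(free_blocks: int, threshold_blocks: int) -> bool:
    """Admit the prefill chunk only if enough free blocks remain."""
    return free_blocks >= threshold_blocks


decode_threshold = prefill_threshold_blocks(4, "decode", 10, 2)
mixed_threshold = prefill_threshold_blocks(4, "mixed", 10, 2)
```

With 10 free blocks, the decode-role threshold admits the chunk while the mixed-role threshold does not, which mirrors the over-provisioning described in the Motivation section.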

Usage or Command

No new configuration or command required. The fix applies automatically in disaggregated prefill-decode deployments.

Accuracy Tests

This PR only modifies scheduling logic (block allocation thresholds) and does not affect model forward computation or kernel outputs, so accuracy is not impacted.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@liyonghua0910 liyonghua0910 changed the title [BugFix][PD Disaggregation] remove redundant block allocation of prefill tasks in decode instance [Cherry-Pick][BugFix][PD Disaggregation] remove redundant block allocation of prefill tasks in decode instance (#8022) Jun 8, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (release/2.6@a869a06).

Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #8021   +/-   ##
==============================================
  Coverage               ?   71.55%           
==============================================
  Files                  ?      386           
  Lines                  ?    55740           
  Branches               ?     8753           
==============================================
  Hits                   ?    39886           
  Misses                 ?    13034           
  Partials               ?     2820           
Flag Coverage Δ
GPU 71.55% <100.00%> (?)

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-08 21:37:20

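📋 Review Summary

PR overview: Adjusts the prefill block calculation and scheduling threshold in ResourceManagerV1 so that a decode instance in PD disaggregation does not redundantly reserve or allocate blocks for prefill tasks.
Scope of changes: fastdeploy/engine/sched/resource_manager_v1.py
Impact tags: [Scheduler] [PD Disaggregation]

Issues

Severity File Summary
- - No blocking issues found. PR convention issues are reported in the section below; do not repeat them here.

📝 PR Convention Check

Non-compliant: the target branch is release/2.6, but the title does not use the Cherry-Pick format required for release PRs. In addition, the Motivation, Modifications, Usage or Command, and Accuracy Tests sections of the PR description are still the empty template, and the Checklist has not been ticked to reflect the actual state. Neither the current PR information nor the commit message provides the original develop PR ID, so the author needs to supply the original PR number in the title.

Suggested title (can be copied directly once the original develop PR ID is filled in):

  • [Cherry-Pick][BugFix] remove redundant block allocation of prefill tasks in decode instance(#<original develop PR number>)
Suggested PR description (can be copied directly)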
## Motivation
Fix redundant KV-cache block reservation/allocation for prefill tasks on decode instances in PD disaggregation. Decode-only instances should not apply the mixed-role prefill reserve threshold when checking whether a prefill-related task can be admitted.

## Modifications
- `fastdeploy/engine/sched/resource_manager_v1.py`: clamp `get_new_block_nums()` after speculative block adjustment so over-allocated requests do not request another redundant block.
- `fastdeploy/engine/sched/resource_manager_v1.py`: limit `_get_can_schedule_prefill_threshold_block()` reserve logic to `splitwise_role == "mixed"`; prefill/decode roles use the actual new block count.
- `fastdeploy/engine/sched/resource_manager_v1.py`: unify running prefill scheduling threshold calculation through `_get_can_schedule_prefill_threshold_block()`.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

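Overall Assessment

The scheduling logic change is small in scope, and at the code level it appears consistent with the goal of avoiding redundant prefill block reservation for the decode role. Before merging, please complete the release Cherry-Pick metadata and the PR description so that maintainers can confirm the origin and the scope of verification.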

@liyonghua0910 liyonghua0910 merged commit bb122aa into PaddlePaddle:release/2.6 Jun 9, 2026
50 of 55 checks passed
4 participants