[BugFix][PD Disaggregation] remove redundant block allocation of prefill tasks in decode instance by liyonghua0910 · Pull Request #8022 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-06-08T11:36:50Z

Motivation

In the disaggregated prefill-decode serving mode (splitwise_role != "mixed"), the decode instance incorrectly applied the mixed-role block reservation logic when determining whether to admit a new prefill request. This caused the scheduler to over-provision KV-cache blocks for running decode requests, unnecessarily blocking prefill task admission and reducing throughput. Additionally, get_new_block_nums could return a negative block count when a request already had sufficient blocks allocated, potentially leading to downstream scheduling errors.
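The negative count mentioned above can be reproduced with a small arithmetic sketch. It assumes the count is computed as the ceiling of the token total over the block size, minus the blocks the request already holds. The function name, parameter names, and the block size of 64 are illustrative and are not taken from `resource_manager_v1.py`.

```python
def unclamped_new_block_num(num_computed_tokens, num_new_tokens, block_size, num_allocated_blocks):
    """Hypothetical, simplified version of the pre-fix calculation."""
    # Blocks needed to hold all tokens after this step (ceiling division).
    needed = (num_computed_tokens + num_new_tokens + block_size - 1) // block_size
    # Subtracting the blocks already held goes negative when the request
    # was preallocated more blocks than this chunk requires.
    return needed - num_allocated_blocks


# Assumed scenario: 8 blocks were preallocated, but a 300-token chunk
# with 64-token blocks only needs 5.
deficit = unclamped_new_block_num(0, 300, 64, 8)
print(deficit)  # 5 needed - 8 held = -3
```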

Modifications

fastdeploy/engine/sched/resource_manager_v1.py
1. _get_can_schedule_prefill_threshold_block: When prefill-decode disaggregation is enabled (splitwise_role != "mixed"), return num_chunk_new_block directly and skip the reserve-block estimation for running requests.
2. get_new_block_nums: Clamp the result to a non-negative value, so the function no longer returns a negative count when num_computed_tokens + num_new_tokens fits within the already-allocated blocks.
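The two modifications can be sketched as standalone functions. This is a simplified illustration under stated assumptions, not the code in `resource_manager_v1.py`: the signatures are hypothetical, the role string "decode" is assumed, and the mixed-role reserve formula (a fixed reserve per running request) is a placeholder for whatever estimate the scheduler actually uses.

```python
def get_new_block_nums(num_computed_tokens, num_new_tokens, block_size, num_allocated_blocks):
    """Sketch of the clamped calculation; parameter names are illustrative."""
    # Ceiling division: blocks needed to hold all tokens after this step.
    needed = -(-(num_computed_tokens + num_new_tokens) // block_size)
    # Clamp so that already-sufficient allocations yield 0, never a negative count.
    return max(needed - num_allocated_blocks, 0)


def prefill_threshold_blocks(splitwise_role, num_chunk_new_block, num_running, reserve_per_running):
    """Sketch of the admission threshold; the mixed-role formula is a placeholder."""
    if splitwise_role != "mixed":
        # Disaggregated roles: only the current chunk's demand counts.
        return num_chunk_new_block
    # Mixed role: additionally reserve blocks for running decode requests
    # (assumed form of the reserve estimate).
    return num_chunk_new_block + num_running * reserve_per_running


# Assumed values: 64-token blocks, 10 running requests, 2 reserved blocks each.
print(get_new_block_nums(0, 300, 64, 8))
print(prefill_threshold_blocks("decode", 5, 10, 2))
print(prefill_threshold_blocks("mixed", 5, 10, 2))
```

With these assumed values, the disaggregated role admits the prefill chunk once 5 free blocks are available, while the mixed-role branch would wait for 25.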

Usage or Command

No new configuration or command required. The fix applies automatically in disaggregated prefill-decode deployments.

Accuracy Tests

This PR only modifies scheduling logic (block allocation thresholds) and does not affect model forward computation or kernel outputs, so accuracy is not impacted.

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


codecov-commenter · 2026-06-08T12:16:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@f18078f). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8022   +/-   ##
==========================================
  Coverage           ?   66.84%           
==========================================
  Files              ?      468           
  Lines              ?    65992           
  Branches           ?    10176           
==========================================
  Hits               ?    44110           
  Misses             ?    19070           
  Partials           ?     2812

Flag	Coverage Δ
GPU	`76.83% <100.00%> (?)`
XPU	`7.02% <0.00%> (?)`


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-08 21:50:29

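📋 Review Summary

PR overview: Fixes the case where, in a decode instance, a redundant or negative number of new blocks could still be computed after blocks had been preallocated for a prefill task, and makes the scheduling threshold for non-mixed roles stop reserving extra blocks for running decode requests.
Change scope: fastdeploy/engine/sched/resource_manager_v1.py
Impact tag: [Scheduler]

Issues

Severity	File	Summary
-	-	No blocking issues found. PR convention issues are reported in the section below and are not repeated here.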

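📝 PR Convention Check

PR convention issues found: the PR title currently contains two official tags, whereas checklist §D1 requires the title to contain exactly one official tag. The PR description keeps the template structure, but Motivation, Modifications, Usage or Command, and Accuracy Tests are still empty or contain only placeholder content, and the Checklist has not been ticked to reflect the actual state.

Suggested title (can be copied directly):

[BugFix] remove redundant block allocation of prefill tasks in decode instance

Suggested PR description (click to expand, can be copied directly)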

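## Motivation
Fix the redundant block demand that arises when a PD Disaggregation decode instance preallocates resources for prefill tasks. The demand is caused by the scheduling threshold and the allocated-block calculation. The fix avoids requesting extra KV blocks when the existing block_tables already exceed what the current chunk needs.

## Modifications
- `fastdeploy/engine/sched/resource_manager_v1.py`: In `get_new_block_nums()`, clamp the number of new blocks to a non-negative value.
- `fastdeploy/engine/sched/resource_manager_v1.py`: In `_get_can_schedule_prefill_threshold_block()`, have non-`mixed` splitwise roles check available blocks directly against the current chunk's demand, without adding the running-decode reserve threshold used in mixed mode.

## Usage or Command
N/A

## Accuracy Tests
N/A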

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

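Overall Assessment

The code change is small, and its handling of block thresholds for preallocated requests and non-mixed roles is consistent with the semantics of the call chain. No code issues were found that should block merging. We recommend completing the PR convention information, especially the single-tag title and the verification/testing notes.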

PaddlePaddle-bot · 2026-06-09T00:14:54Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-09 08:14:04 UTC+08:00

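CI report generated from the following code (updated every 30 minutes):
PR commit: ddffd8c | Merge base: f18078f (branch: develop)

1 Required tasks: 10/10 passed

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
41(0)	41	38	3	0	0	0

None

2 Failure details

None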

[BugFix] remove redundant block allocation of prefill tasks in decode…

ddffd8c

… instance

liyonghua0910 had a problem deploying to Metax_ci June 8, 2026 11:36 — with GitHub Actions Failure

liyonghua0910 changed the title ~~[BugFix] remove redundant block allocation of prefill tasks in decode instance~~ [BugFix][PD Disaggregation] remove redundant block allocation of prefill tasks in decode instance Jun 8, 2026

juncaipeng approved these changes Jun 8, 2026


PaddlePaddle-bot reviewed Jun 8, 2026


liyonghua0910 merged commit 055b623 into PaddlePaddle:develop Jun 9, 2026
39 of 42 checks passed

liyonghua0910 merged 1 commit into PaddlePaddle:develop from liyonghua0910:develop+20260608_fix_decode_scheduler


4 participants
