
[Optim] Parallel BOS feature download #8018

Merged
Jiang-Jia-Jun merged 3 commits into PaddlePaddle:release/online/20260415 from xiaoxiaohehe001:speed_mm
Jun 10, 2026
@xiaoxiaohehe001 xiaoxiaohehe001 commented Jun 8, 2026

Collaborator

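Background

This PR addresses a performance bottleneck in the multimodal prefill path:

  1. BOS multimodal feature downloads are serial: within a single request, multiple links (for example, multi-frame video) are downloaded one after another, so network wait time cannot be overlapped and end-to-end TTFT is high.
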
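Main changes

1. Parallel BOS feature download (fastdeploy/utils.py + fastdeploy/envs.py)

  • download_from_bos is refactored into _fetch_one + ThreadPoolExecutor and yields results in submission order, so callers that assemble results in order (for example, video shards) are unaffected.
  • A single link, or max_workers<=1, takes the original serial path with identical behavior.
  • On failure, tasks that have not started are cancelled and the results of in-flight tasks are discarded.
  • Adds the environment variable FD_BOS_DOWNLOAD_PARALLEL (default 8; set it to 1 to fall back to serial downloads).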
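
The ordering and cancellation behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `fetch_in_order` and its `fetch_one` callback are hypothetical names, and the sketch assumes failures surface as exceptions, whereas the real `_fetch_one` returns an `(ok, data)` pair.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def fetch_in_order(
    links: Iterable[str],
    fetch_one: Callable[[str], bytes],  # hypothetical stand-in for the PR's _fetch_one
    max_workers: int = 8,
) -> Iterator[bytes]:
    """Yield fetch_one(link) for every link, in the order the links were given."""
    links = list(links)

    # Serial path: a single link, or parallelism disabled.
    if max_workers <= 1 or len(links) <= 1:
        for link in links:
            yield fetch_one(link)
        return

    pool = ThreadPoolExecutor(max_workers=min(max_workers, len(links)))
    futures = [pool.submit(fetch_one, link) for link in links]
    try:
        # Iterating the futures in submission order (not completion order)
        # keeps the output aligned with the input links.
        for future in futures:
            yield future.result()
    finally:
        # On failure or early exit: cancel tasks that have not started yet.
        # Tasks already running finish in the background and their results
        # are simply never read.
        for future in futures:
            future.cancel()
        pool.shutdown(wait=False)
```

A caller could pass something like `int(os.getenv("FD_BOS_DOWNLOAD_PARALLEL", "8"))` as `max_workers`; how the PR actually parses the variable is not shown in this thread, so that call is an assumption.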

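Compatibility

  • All new behavior is controlled by environment variables, and the default behavior stays consistent with the previous version:
    • FD_BOS_DOWNLOAD_PARALLEL defaults to 8 and has no effect on single-link requests.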

codecov-commenter commented Jun 8, 2026

Codecov Report

❌ Patch coverage is 69.69697% with 10 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/online/20260415@aaf6e77). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|:------:|:------:|:------:|
| fastdeploy/utils.py | 69.69% | 6 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@                    Coverage Diff                     @@
##             release/online/20260415    #8018   +/-   ##
==========================================================
  Coverage                           ?   72.90%           
==========================================================
  Files                              ?      388           
  Lines                              ?    54158           
  Branches                           ?     8497           
==========================================================
  Hits                               ?    39486           
  Misses                             ?    11944           
  Partials                           ?     2728           
| Flag | Coverage Δ |
|:------:|:------:|
| GPU | 72.90% <69.69%> (?) |

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-08 19:23:20

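📋 Review summary

PR overview: Optimizes MM chunk splitting for multimodal prefill and parallelizes BOS feature downloads.
Scope: fastdeploy/engine/common_engine.py, fastdeploy/envs.py, fastdeploy/utils.py
Impact tags: [Engine] [DataProcessor]

Issues

| Severity | File | Summary |
|:------:|:------:|:------:|
| 🔴 Bug | fastdeploy/engine/common_engine.py:64 | The Python balanced split treats a contiguous video patch run as a single atomic block and loses the cut points at grid_thw row boundaries, so long videos may produce an oversized chunk |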

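📝 PR convention check

The title does not follow the convention: [Optim] is not an official tag in the checklist, and because the target branch is release/online/20260415, the Cherry-Pick title format should be used. The description also does not use the required template, and the files listed under "Other", fastdeploy/model_executor/layers/moe/ep.py and fastdeploy/worker/gpu_model_runner.py, do not appear in this diff, which is inconsistent with the implementation.

Suggested title (can be copied directly):

  • [Cherry-Pick][Optimization] Balanced MM chunking and parallel BOS feature download(#8018)
Suggested PR description (can be copied directly):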
## Motivation
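Optimizes the multimodal prefill path, where unbalanced MM chunking and serial BOS multimodal feature downloads lead to high TTFT.

## Modifications
- `fastdeploy/engine/common_engine.py`: adds `FD_MM_CHUNK_STEP` to control the MM chunk step, and adds a Python balanced-splitting path enabled by `FD_MM_BALANCED_CHUNKING=1`; the default path still uses the `get_mm_split_fuse` kernel.
- `fastdeploy/envs.py`: adds the `FD_BOS_DOWNLOAD_PARALLEL` environment variable, which controls the BOS feature download concurrency within a single request.
- `fastdeploy/utils.py`: `download_from_bos` now accepts `max_workers`; with multiple links it downloads concurrently through `ThreadPoolExecutor` and yields results in submission order.

## Usage or Command
- Set `FD_MM_BALANCED_CHUNKING=1` to enable Python balanced MM chunking; use `FD_MM_CHUNK_STEP` to adjust the MM chunk step.
- Set `FD_BOS_DOWNLOAD_PARALLEL=1` to fall back to serial BOS downloads; the default value is 8.

## Accuracy Tests
N/A. This PR mainly changes performance optimizations and the scheduling/download paths; no accuracy data is provided.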

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

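Overall assessment

The direction of this optimization is clear, but the new Python balanced split does not yet reproduce the old kernel's semantics of valid cut points based on grid_thw rows, which undermines the balancing goal of chunked prefill for video and other contiguous visual-token inputs. The cut-point generation logic should be fixed first, with unit tests added that cover contiguous video patch runs.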

Comment thread fastdeploy/engine/common_engine.py Outdated
# an image patch span: not (is_patch[p-1] and is_patch[p]).
splittable_mask = np.ones(n + 1, dtype=bool)
if n >= 2:
    inside = is_patch[:-1] & is_patch[1:]

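🔴 Bug: This treats any run of consecutive image_patch_id tokens as a single unsplittable atomic span, which discards the valid cut points inside a video at grid_thw row boundaries.

When the processor generates a video, it writes the visual tokens contiguously and uses grid_thw / image_type_ids to represent the multiple frames; the outer layer of this function then splits grid_thw into multiple [2, h, w] rows. The old kernel builds cut points per row, at (h * w) // 4 tokens each, so a single contiguous visual-token run can still be cut at frame-group boundaries. With inside = is_patch[:-1] & is_patch[1:], all of those boundaries become unsplittable, so a long video degenerates into one oversized chunk that exceeds target/mm_chunk_step and breaks the token budget and balancing goal of chunked prefill.

Suggested fix:
Use grid_thw_np to generate the cut points inside visual spans: scan input_ids_np to find each patch run, then mark span_start + cumulative as splittable at each cumulative offset of per_img = (h * w) // 4. Only cut points that fall inside a single grid_thw row should be forbidden, not every cut point inside a contiguous patch run.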
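
One way to read the suggested fix is sketched below. It is an illustration only: `row_aware_splittable_mask` is a hypothetical helper, and it assumes that `grid_thw` has already been expanded into per-row `[t, h, w]` entries, that each row contributes `(h * w) // 4` visual tokens (as stated in the comment above), and that the rows appear in the same order as the patch tokens.

```python
import numpy as np


def row_aware_splittable_mask(input_ids, grid_thw, image_patch_id):
    """Return a bool mask of length n + 1; mask[p] is True when a cut between
    token p-1 and token p is allowed.

    Hypothetical helper, not the PR's implementation.
    """
    ids = np.asarray(input_ids)
    n = ids.shape[0]
    is_patch = ids == image_patch_id

    # Start like the reviewed code: forbid every cut that lies strictly
    # inside a run of consecutive patch tokens.
    mask = np.ones(n + 1, dtype=bool)
    if n >= 2:
        inside = is_patch[:-1] & is_patch[1:]
        mask[1:n][inside] = False

    # Then re-enable the cuts at grid_thw row boundaries: after each row's
    # (h * w) // 4 patch tokens have been consumed.
    patch_positions = np.flatnonzero(is_patch)
    consumed = 0
    for _, h, w in grid_thw:
        per_row = (int(h) * int(w)) // 4
        if per_row <= 0:
            continue
        consumed += per_row
        if consumed > patch_positions.shape[0]:
            break  # grid_thw describes more tokens than are present
        # The cut sits right after the last patch token of this row.
        mask[patch_positions[consumed - 1] + 1] = True
    return mask
```

With two rows of two tokens each, the boundary in the middle of a four-token patch run becomes a valid cut, while the cuts inside each row stay forbidden.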

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-09 06:23:59 UTC+08:00

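This CI report is generated from the following code (updated every 30 minutes):
PR commit: 7f124b5 | Merge base: aaf6e77 (branch: release/online/20260415)


1 Required jobs: 6/7 passed

Note: two tables are used here, with no additional text description
| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|:------:|:------:|:-------:|:-------:|:---------:|:---------:|:-----------:|
| 19(0) | 19 | 18 | 1 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
|:------:|:------:|:------:|:------:|
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue: diff coverage below 80% | | Job |

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed case: diff coverage check

| Case | Error summary |
|:------:|:------:|
| diff-cover python_coverage_all.xml --fail-under=80 | Coverage of lines added or modified by the PR is 34%, below the 80% threshold |

Key logs: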

Failure. Coverage is below 80%.
fastdeploy/engine/common_engine.py (12.7%): Missing lines 54-56,58,62-66,69-71,73-84,86-91,93-94,97-102,104,107-110,113,115-128,130,847-848,851
fastdeploy/utils.py (81.8%): Missing lines 1155,1163-1164,1186-1188
total_num_lines=104 total_num_violations=68 total_percent_covered=34
TEST_EXIT_CODE=0 COVERAGE_EXIT_CODE=9
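  • Root cause summary: insufficient diff coverage for the logic added by the PR
    The body of the new _balanced_mm_chunks and the FD_MM_BALANCED_CHUNKING branch are almost entirely uncovered by unit tests; diff coverage for fastdeploy/engine/common_engine.py is only 12.7%. A few exception branches of download_from_bos are also uncovered, but that file itself reaches 81.8%; the main gap blocking the merge is the new balanced-splitting logic in common_engine.py.

Suggested fixes:

  1. In tests/engine/test_common_engine.py, add direct unit tests for _balanced_mm_chunks that cover empty input, no mid-run cuts inside consecutive image patches, grid_thw image counting, and the target chunk constraint.
  2. In the update_mm_requests_chunk_size test cases in tests/engine/test_common_engine.py, patch FD_MM_BALANCED_CHUNKING=1 and FD_MM_CHUNK_STEP to cover the balanced branch at fastdeploy/engine/common_engine.py:847 and fastdeploy/engine/common_engine.py:851.
  3. In tests/utils/test_utils.py, add cases for download_from_bos covering the env fallback and the cancel-on-parallel-failure branch, to cover fastdeploy/utils.py:1155, fastdeploy/utils.py:1163, fastdeploy/utils.py:1164, fastdeploy/utils.py:1186, and fastdeploy/utils.py:1188.

Related changes: fastdeploy/engine/common_engine.py:42, fastdeploy/engine/common_engine.py:847, fastdeploy/utils.py:1155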

Comment thread fastdeploy/utils.py Outdated
# Sequential path: keep behavior identical for single link or when parallel disabled.
if max_workers <= 1 or len(bos_links) <= 1:
    for link in bos_links:
        ok, data = _fetch_one(link)

Collaborator

The variable naming here could be tidied up to follow conventions.

Comment thread fastdeploy/engine/common_engine.py Outdated
self.partial_chunked_tokens[1],
2048,
)
# Step dedicated to MM splitting: defaults to partial_chunked_tokens[1];

Collaborator

Does the v1 path reach this code?

kevincheng2 previously approved these changes Jun 9, 2026
Comment thread fastdeploy/engine/common_engine.py Outdated
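# Option 2: global balanced splitting (Python layer). When enabled, it bypasses the get_mm_split_fuse kernel
# and uses binary search + greedy selection to pick K cut points among all splittable points so that the maximum chunk length is minimized.
# Enabled with FD_MM_BALANCED_CHUNKING=1; off by default.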
use_balanced = os.getenv("FD_MM_BALANCED_CHUNKING", "0") == "1"

Collaborator

Does this also apply generally to open-source models such as 45-vl and qwen-vl?

@xiaoxiaohehe001 xiaoxiaohehe001 changed the title [Optim] Balanced MM chunking & parallel BOS feature download [Optim] Parallel BOS feature download Jun 9, 2026
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 2160527 into PaddlePaddle:release/online/20260415 Jun 10, 2026
21 checks passed