[Iluvatar] Support CINN for PaddleOCR-VL by converting max_seqlens to Tensor inputs #7997
wuyujiji wants to merge 1 commit into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7997 +/- ##
==========================================
Coverage ? 67.79%
==========================================
Files ? 475
Lines ? 66613
Branches ? 10261
==========================================
Hits ? 45163
Misses ? 18566
Partials ? 2884
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-06-12 09:58:36
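📋 Review Summary
PR overview: changes `max_seqlens_q/k` of the Iluvatar `cuinfer_flash_attn_unpadded` op from scalar attrs to Tensor inputs, and updates the PaddleOCR-VL CINN documentation and CI commands to match.
Scope of changes: Iluvatar custom op / PaddleOCR-VL attention call / Iluvatar documentation and CI script
Impact tags: [OP] [Iluvatar] [Graph Optimization] [Docs] [CI]
Issues
No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.
Status of historical findings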
| Finding | Issue | Status |
|---|---|---|
| F1 | Missing `numel` check before dereferencing the `max_seqlens_q/k` Tensors | Not yet fixed |
| F2 | `paddle.to_tensor` is called on the attention forward hot path | Not yet fixed |
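
The two open findings can be illustrated with a minimal pure-Python sketch. This is not the PR's code: `read_scalar` and `MaxSeqlenCache` are hypothetical names, a plain list stands in for the contents of a CPU Tensor, and `make_tensor` is a placeholder for whatever constructor the real code would use (for example `paddle.to_tensor`). The first helper checks the element count before reading the value (F1). The second builds each distinct value once and reuses it rather than constructing a new object on every forward call (F2).

```python
from typing import Callable, Dict, Sequence


def read_scalar(data: Sequence[int], name: str) -> int:
    """Return the single value held in `data`, refusing anything else.

    `data` stands in for the flattened contents of a CPU tensor; the length
    check plays the role of a numel check before reading element 0.
    """
    if len(data) != 1:
        raise ValueError(f"{name} must hold exactly 1 element, got {len(data)}")
    value = int(data[0])
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


class MaxSeqlenCache:
    """Build one tensor-like object per distinct max_seqlen and reuse it."""

    def __init__(self, make_tensor: Callable[[int], object]) -> None:
        self._make_tensor = make_tensor  # hypothetical constructor hook
        self._cache: Dict[int, object] = {}
        self.builds = 0  # how many times the constructor actually ran

    def get(self, max_seqlen: int) -> object:
        # Construct only on the first request for this value.
        if max_seqlen not in self._cache:
            self._cache[max_seqlen] = self._make_tensor(max_seqlen)
            self.builds += 1
        return self._cache[max_seqlen]
```

Whether a cache like this is appropriate depends on how many distinct lengths occur at runtime, which the review does not state; treat it as one possible direction rather than the required fix.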
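📝 PR Convention Check
The Modifications, Usage or Command, and Accuracy Tests sections are all filled in with "Pass" and provide no substantive content. Please complete them according to the template.
Suggested title (can be copied directly):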
[Iluvatar] Support CINN for PaddleOCR-VL by converting max_seqlens to Tensor inputs
Suggested PR description (can be copied directly):
## Motivation
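On the Iluvatar (天数智芯) platform, the `cuinfer_flash_attn_unpadded` operator previously registered `max_seqlens_q/k` as scalar attrs, which prevented CINN from handling dynamic sequence lengths. This PR changes them to Tensor inputs so that PaddleOCR-VL can enable CINN (`graph_opt_level: 2`) on Iluvatar hardware.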
## Modifications
- `custom_ops/iluvatar_ops/flash_attn_unpadded.cu`:
  - `FlashAttnUnpaddedKernel` / `FlashAttnUnpadded` function signatures: `int max_seqlens_q/k` → `const paddle::Tensor& max_seqlens_q_/k_`
  - `PD_BUILD_STATIC_OP`: move `max_seqlens_q/k` from `.Attrs` to `.Inputs`
  - `FlashAttnUnpaddedInferShape` / `FlashAttnUnpaddedInferDtype`: add the corresponding parameters
- `custom_ops/setup_ops.py`: append `-std=c++17` to the Iluvatar compile flags
- `docs/`: update the container name, mount paths, and launch command parameters (`max-num-seqs: 240`, `gpu-memory-utilization: 0.7`, `graph_opt_level: 2`)
- `scripts/run_ci_iluvatar.sh`: update `graph-optimization-config` in the CI script to match
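
For context on what the `max_seqlens_q/k` inputs carry: in unpadded (variable-length) attention, sequences are commonly packed back to back and described by cumulative offsets, and the maximum sequence length is the largest gap between adjacent offsets. The sketch below assumes that convention. `cu_seqlens` and `max_seqlen_from_offsets` are illustrative names, not taken from this PR. It returns a one-element list to mirror passing the value as a one-element Tensor input rather than as a scalar attr.

```python
from typing import List, Sequence


def max_seqlen_from_offsets(cu_seqlens: Sequence[int]) -> List[int]:
    """Return [max sequence length] given cumulative sequence offsets.

    Assumes cu_seqlens has the form [0, len_0, len_0 + len_1, ...].
    """
    if len(cu_seqlens) < 2:
        raise ValueError("need at least two offsets to describe one sequence")
    # Length of each sequence is the difference between adjacent offsets.
    lengths = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
    if any(n < 0 for n in lengths):
        raise ValueError("offsets must be non-decreasing")
    return [max(lengths)]
```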
## Usage or Command
```bash
python3 -m fastdeploy.entrypoints.openai.api_server \
--model /data1/fastdeploy/PaddleOCR-VL \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--max-num-seqs 240 \
--block-size 16 \
--workers 2 \
--gpu-memory-utilization 0.7 \
--graph-optimization-config '{"graph_opt_level":2, "use_cudagraph": true}'
```
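
When the server is launched from a script, the JSON passed to `--graph-optimization-config` is easy to misquote. The following is a small sketch that builds the same argument list as the command above using `json.dumps` and `shlex.join` from the standard library. The helper name `build_server_args` is hypothetical; the flags and values are copied from the command above.

```python
import json
import shlex
from typing import Dict, List


def build_server_args(model_path: str, graph_opt: Dict[str, object]) -> List[str]:
    """Assemble the api_server command as an argv list (no shell quoting needed)."""
    return [
        "python3", "-m", "fastdeploy.entrypoints.openai.api_server",
        "--model", model_path,
        "--max-model-len", "16384",
        "--max-num-batched-tokens", "16384",
        "--max-num-seqs", "240",
        "--block-size", "16",
        "--workers", "2",
        "--gpu-memory-utilization", "0.7",
        # json.dumps emits valid JSON (true/false, double quotes) from the dict.
        "--graph-optimization-config", json.dumps(graph_opt),
    ]


args = build_server_args(
    "/data1/fastdeploy/PaddleOCR-VL",
    {"graph_opt_level": 2, "use_cudagraph": True},
)
# shlex.join produces a shell-safe single line for logging or copy-paste.
print(shlex.join(args))
```

The argv list can be handed to `subprocess.run(args)` directly, which avoids shell quoting of the JSON altogether.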
## Accuracy Tests
On Iluvatar hardware, PaddleOCR-VL inference accuracy is consistent with the results before enabling CINN (or attach specific metrics).
## Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
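- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
This round traced the call chain from the PaddleOCR-VL vision encoder to the Iluvatar custom op, in order of risk. The source of the max_seqlen Tensor is consistent with the CPU Tensor constraint, and no new issues suitable for inline comments were confirmed. The two historical suggestions remain unfixed; the boundary check and the hot-path Tensor construction should be addressed together in a follow-up.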
Motivation
Support CINN for PaddleOCR-VL on Iluvatar hardware.
Modifications
N/A
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
- Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.