Reranker & Embedding: single-QPC support with KV cache eliminated #1045

Open: quic-amitraj wants to merge 1 commit into quic:main from quic-amitraj:reranker_single_qpc

quic-amitraj (Contributor) commented on Jun 5, 2026

Summary

This PR finalizes the single-QPC flow for the Qwen3-VL reranker and embedding models, along with the related validation updates:

  • Keeps single-QPC behavior as the default path (kv_offload=False) for reranker/embedding use cases.
  • Includes updated example scripts and supporting model/runtime wiring for Qwen3-VL reranker + embedding flows.
  • Includes updated configs and test coverage for both MAD parity and unit validation.

Validation Environment

  • qwen-vl-utils: 0.0.14

Example Validation Results

| Script | Model | Result | Notes |
|---|---|---|---|
| examples/embeddings/qwen3vl/qwen3_vl_embedding.py | Qwen/Qwen3-VL-Embedding-8B | PASS | End-to-end run completed and produced similarity scores. |
| examples/reranker/qwen3vl/qwen3_vl_reranker.py | Qwen/Qwen3-VL-Reranker-2B | PASS | End-to-end run completed and produced reranker scores. |
| examples/reranker/qwen3vl/qwen3_vl_reranker.py | Qwen/Qwen3-VL-Reranker-8B | PASS | End-to-end run completed and produced reranker scores. |
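
For readers unfamiliar with what "similarity scores" means here, the following is a minimal, illustrative sketch of how such scores are commonly derived from embedding vectors. It assumes cosine similarity, which this PR does not confirm as the metric used by the example script. The function names `cosine_similarity` and `rank_documents` are hypothetical and are not part of this repository or its API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for model embeddings; not real model output.
    query = [1.0, 0.0]
    docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for index, score in rank_documents(query, docs):
        print(index, round(score, 4))
```

In a real run the vectors would come from the embedding model's output rather than hand-written lists; the ranking step stays the same.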

quic-amitraj changed the title from "Reranker single qpc" to "Reranker & Embedding: single-QPC support with KV cache eliminated" on Jun 6, 2026
quic-amitraj force-pushed the reranker_single_qpc branch from 52be851 to 3941ee0 on June 6, 2026 07:21
quic-amitraj marked this pull request as ready for review on June 6, 2026 07:40
quic-amitraj force-pushed the reranker_single_qpc branch 3 times, most recently from d484b02 to b265e87, on June 6, 2026 14:00
quic-rishinr added the "1.22" label (Release 1.22 candidate) on Jun 9, 2026
quic-amitraj force-pushed the reranker_single_qpc branch 2 times, most recently from dea8b26 to a7f8f9e, on June 11, 2026 03:50
quic-rishinr force-pushed the release/v1.22.0_tmp branch 2 times, most recently from 8273e7f to e9330a8, on June 11, 2026 09:22
quic-rishinr changed the base branch from release/v1.22.0_tmp to main on June 15, 2026 05:59
quic-amitraj force-pushed the reranker_single_qpc branch 4 times, most recently from 1920a28 to 86dd92a, on June 15, 2026 06:39

        "additional_params": {}
    },
    {
        "model_name": "Qwen/Qwen3-VL-Reranker-8B",

Contributor

update to dummy models

@@ -0,0 +1,28 @@
[
    {
        "model_name": "Qwen/Qwen3-VL-Reranker-2B",

Contributor

update to dummy models

quic-amitraj force-pushed the reranker_single_qpc branch 2 times, most recently from 87f95d6 to 8ed9fda, on June 17, 2026 12:15
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
quic-amitraj force-pushed the reranker_single_qpc branch from 8ed9fda to 468df71 on June 18, 2026 10:39
Labels

1.22 (Release 1.22 candidate)

2 participants