[VL] Fix broadcast hash table reuse for reused exchanges by wecharyu · Pull Request #12264 · apache/gluten

wecharyu · 2026-06-08T16:24:35Z

What changes are proposed in this pull request?

Cache hash table data in the build-side relation keyed by the droppedDuplicates flag, because a reused relation can produce different hash table data depending on that flag.

How was this patch tested?

Add UT.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5.5

github-actions · 2026-06-08T16:48:17Z

Run Gluten Clickhouse CI on x86

liujiayi771 · 2026-06-09T11:19:14Z

    }
  }

-  override def doCanonicalizeForBroadcastMode(mode: BroadcastMode): BroadcastMode = {


Is this part still needed when buildHashTableOncePerExecutor is disabled?

doCanonicalizeForBroadcastMode() is not invoked anywhere; broadcast canonicalization goes through ColumnarBroadcastExchangeExec.doCanonicalize().

This code is actually useful: when buildHashTableOncePerExecutor is disabled, it provides more opportunities to reuse broadcast exchanges. Moreover, I now think that even when buildHashTableOncePerExecutor is enabled, the comment in doCanonicalizeForBroadcastMode still holds true: we still broadcast byte arrays and build the HashRelation on the executor side.
@JkSelf Can you explain why this was removed? It allows us to reuse broadcast exchanges for different build keys over the same data.
We should either restore the code that existed before ColumnarBroadcastExchangeExec.doCanonicalize, or at least follow the original logic when buildHashTableOncePerExecutor is disabled.

Replied in https://github.com/apache/gluten/pull/8931/changes#r3385848526.

@JkSelf Thank you for the explanation, I understand now. @wecharyu According to the instructions here, you can restore the original behavior of doCanonicalizeForBroadcastMode when enableBroadcastBuildOncePerExecutor=false.
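
The trade-off in this thread can be illustrated with a toy model. This is only a sketch: `ToyMode`, `ToyExchange`, `canonicalize` and `reusable` are hypothetical names, not Spark's `BroadcastMode` or Gluten's `ColumnarBroadcastExchangeExec`, and it assumes that the restored canonicalization erases the build keys from the mode when the build-once-per-executor option is off. The actual Gluten logic may differ.

```scala
object ExchangeReuseSketch {
  // Hypothetical stand-ins for a broadcast mode and a broadcast exchange.
  final case class ToyMode(buildKeys: Seq[String], nullAware: Boolean)
  final case class ToyExchange(childPlan: String, mode: ToyMode)

  // Assumption: with the option off, only raw bytes are broadcast and each join
  // builds its own hash table, so the mode is erased during canonicalization.
  // With the option on, the mode is kept and takes part in the comparison.
  def canonicalize(e: ToyExchange, buildOncePerExecutor: Boolean): ToyExchange =
    if (buildOncePerExecutor) e
    else e.copy(mode = ToyMode(Seq.empty, nullAware = false))

  // Two exchanges are candidates for reuse when their canonical forms are equal.
  def reusable(a: ToyExchange, b: ToyExchange, buildOncePerExecutor: Boolean): Boolean =
    canonicalize(a, buildOncePerExecutor) == canonicalize(b, buildOncePerExecutor)
}
```

In this model, two exchanges over the same child plan with different build keys compare equal only when the option is off, which is the extra reuse opportunity described above.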

JkSelf · 2026-06-10T06:57:25Z

@wecharyu Thanks for your work.

For ColumnarBroadcastExchangeExec, reuse is already decided by doCanonicalize(). It now directly uses mode.canonicalized, and that canonicalized BroadcastMode already captures the key BHJ semantics such as bound build keys and null-aware behavior. So we no longer need this handling for the unique broadcastId.

wecharyu · 2026-06-10T08:24:54Z

@JkSelf Thanks for the prompt response. HashedRelationBroadcastMode does not contain joinType, which can also change the built hash table data. For example, in this PR's test the LEFT SEMI JOIN and INNER JOIN would use the same hash table, which caused the test to fail before this PR. https://github.com/apache/spark/blob/62ae4db28f3be8a0ca2c3016d27ca5a62f02915d/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L1152-L1153

And since Gluten currently still broadcasts the raw build plan data bytes instead of the hash table, I agree with @liujiayi771 that BroadcastExchangeExec can be reused for the same build plan even when the build keys differ. This reuse avoids broadcasting the same data multiple times.

I think we can now make the following improvements:

- Move doCanonicalizeForBroadcastMode() back so that the broadcast exchange is reused as much as possible.
- Associate the constructed hash table with the plan_id, build_keys, drop_duplicates, and null_aware flag, as these attributes uniquely identify the hash table.
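
A minimal sketch of the second point, assuming a simple in-memory cache on the executor. `HashTableKey`, `HashTableCache` and the `build` callback are hypothetical names, not Gluten APIs, and the `Long` handle only stands in for a native hash table pointer.

```scala
import scala.collection.mutable

object HashTableCacheSketch {
  // Hypothetical key: every attribute that can change the contents of the built table.
  final case class HashTableKey(
      planId: String,
      buildKeys: Seq[String],
      dropDuplicates: Boolean,
      nullAware: Boolean)

  final class HashTableCache {
    private val tables = mutable.Map.empty[HashTableKey, Long]

    // Returns the cached handle for `key`, or builds and stores one if absent.
    def getOrBuild(key: HashTableKey)(build: => Long): Long = synchronized {
      tables.getOrElseUpdate(key, build)
    }

    def size: Int = synchronized(tables.size)
  }
}
```

Two joins that share a broadcast but differ in any key field, for example `dropDuplicates`, get separate tables, while identical requests share one.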

JkSelf · 2026-06-11T09:46:32Z

> HashedRelationBroadcastMode does not contain joinType, which can also change the built hash table data. For example, in this PR's test the LEFT SEMI JOIN and INNER JOIN would use the same hash table, which caused the test to fail before this PR.

Gluten's implementation is the same as vanilla Spark's. Just out of curiosity, can Spark pass this failed test?

wecharyu · 2026-06-14T16:16:26Z

> Gluten's implementation is the same as vanilla Spark's. Just out of curiosity, can Spark pass this failed test?

Vanilla Spark does share the same hash relation, but it does not drop duplicates, so the two join types do not produce incorrect data. Gluten's native hash table build, however, drops duplicates for the LEFT SEMI JOIN, and that hash table is then reused by the INNER JOIN, which causes the issue.
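
To make the failure mode concrete, here is a minimal sketch using plain Scala collections rather than Gluten or Velox APIs. The names `buildTable`, `leftSemi` and `inner` are hypothetical, and the model assumes only what is described above: a table built for LEFT SEMI JOIN keeps one row per key, while INNER JOIN needs every matching build row.

```scala
object SharedHashTableDemo {
  type Row = (Int, String) // (join key, payload)

  // Toy stand-in for a hash table build; `dropDuplicates` keeps only the first row per key.
  def buildTable(rows: Seq[Row], dropDuplicates: Boolean): Map[Int, Seq[String]] = {
    val grouped = rows.groupBy(_._1).map { case (k, rs) => k -> rs.map(_._2) }
    if (dropDuplicates) grouped.map { case (k, vs) => k -> vs.take(1) } else grouped
  }

  // LEFT SEMI JOIN only asks whether a key exists on the build side.
  def leftSemi(probe: Seq[Int], table: Map[Int, Seq[String]]): Seq[Int] =
    probe.filter(k => table.contains(k))

  // INNER JOIN emits one output row per matching build row.
  def inner(probe: Seq[Int], table: Map[Int, Seq[String]]): Seq[(Int, String)] =
    probe.flatMap(k => table.getOrElse(k, Seq.empty).map(v => (k, v)))

  val build: Seq[Row] = Seq((1, "a"), (1, "b"), (2, "c"))
  val probe: Seq[Int] = Seq(1, 2, 3)

  val dedupedTable: Map[Int, Seq[String]] = buildTable(build, dropDuplicates = true)
  val fullTable: Map[Int, Seq[String]] = buildTable(build, dropDuplicates = false)
}
```

With these definitions, `leftSemi` gives the same answer for both tables, but `inner(probe, dedupedTable)` returns two rows while `inner(probe, fullTable)` returns three. The missing row is the kind of data issue that appears when the INNER JOIN reuses the table built for the LEFT SEMI JOIN.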

wecharyu · 2026-06-16T07:27:14Z

@JkSelf @liujiayi771 Please take another look when you have time. Thanks!

JkSelf · 2026-06-16T08:12:12Z

+case class BroadcastHashTable(
+    pointer: Long,
+    relation: BuildSideRelation,
+    droppedDuplicates: Boolean)


Do we need to consider ExistenceJoin and null-aware anti join value? Could you help to add tests to cover? Thanks.

wecharyu mentioned this pull request Jun 8, 2026

[GLUTEN-7548][VL] Optimize BHJ in velox backend #8931

Merged

github-actions Bot added the CORE (works for Gluten Core) and VELOX labels Jun 8, 2026

liujiayi771 reviewed Jun 9, 2026

wecharyu force-pushed the fix_shared_bhj_table branch from 075dc89 to 7552606 Compare June 9, 2026 16:07

wecharyu force-pushed the fix_shared_bhj_table branch from d7d69eb to d88b181 Compare June 10, 2026 02:08

wecharyu added 2 commits June 16, 2026 13:06

[VL] Fix broadcast hash table reuse for reused exchanges

93730a3

remove unused enableHashTableBuildOncePerExecutor()

701f053

wecharyu force-pushed the fix_shared_bhj_table branch from d88b181 to 8e7e9b5 Compare June 16, 2026 05:09

restore doCanonicalizeForBroadcastMode when enableBroadcastBuildOncePerExecutor is disabled

da8f1f6

wecharyu force-pushed the fix_shared_bhj_table branch from 8e7e9b5 to da8f1f6 Compare June 16, 2026 07:24

JkSelf reviewed Jun 16, 2026
