Consider optimizing RightSemi to eliminate dups from build side #22930

@neilconway

Description

Is your feature request related to a problem or challenge?

#22914 means that we will no longer produce intermediate duplicate output rows for RightSemi, but we still store duplicate build-side rows in the join hash table. We could consider eliminating them. This would reduce hash join memory consumption; the tradeoff is that we might do wasted work deduplicating rows that would never have matched a probe-side row in the first place. This merits some further study; we could perhaps make deduplication conditional on the fraction of duplicate hash values we observe on the build side as we execute the operator.
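
To make the idea concrete, here is a rough Python sketch of what conditional build-side deduplication might look like. This is not DataFusion code: `build_side`, `right_semi`, `sample_size`, and `min_dup_fraction` are hypothetical names chosen for illustration. It assumes a plain equi-join with no additional join filter, so the build-side key alone decides whether a probe row matches, and it uses the fraction of duplicate keys in an initial sample as a stand-in for the fraction of duplicate hash values.

```python
from typing import Dict, Hashable, Iterable, List


def build_side(
    keys: Iterable[Hashable],
    sample_size: int = 1024,        # hypothetical: rows to observe before deciding
    min_dup_fraction: float = 0.25, # hypothetical: keep deduplicating above this
) -> Dict[Hashable, List[int]]:
    """Build a key -> build-row-indices table, dropping duplicates when worthwhile."""
    table: Dict[Hashable, List[int]] = {}
    seen = 0
    dups = 0
    dedup = True  # deduplicate while sampling, then re-evaluate once

    for idx, key in enumerate(keys):
        chain = table.get(key)
        if chain is None:
            table[key] = [idx]   # first row for this key is always stored
        elif dedup:
            dups += 1            # duplicate key: drop the row to save memory
        else:
            chain.append(idx)    # dedup disabled: store the row as before
        seen += 1
        if seen == sample_size:
            # Decide once, based on the duplicate fraction seen in the sample.
            dedup = dups / seen >= min_dup_fraction
    return table


def right_semi(build_keys, probe_keys, **kwargs) -> list:
    """Emit each probe-side key that has at least one build-side match."""
    table = build_side(build_keys, **kwargs)
    return [k for k in probe_keys if k in table]


def stored_rows(table: Dict[Hashable, List[int]]) -> int:
    """Number of build-side rows actually retained in the table."""
    return sum(len(chain) for chain in table.values())
```

Either way the join output is identical; only the number of retained build-side rows changes. A real implementation would also have to account for the cost of the key comparison itself, which this sketch does not model.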

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)