You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
To evaluate a semi join, we support two orientations: LeftSemi or RightSemi (analogously for anti and mark joins; I'll just refer to semijoins here to simplify the discussion). Under RightSemi, we build the non-preserved ("filter") input and stream the preserved input; these are swapped for LeftSemi. While it might seem like these two orientations are symmetrical, there are actually significant differences in evaluation behavior between them:
The build-side hash table has to be resident in memory; all else being equal, building the smaller join input is a good general rule, and that's the main rule we follow today.
RightSemi only needs to store the join keys for the build side; LeftSemi needs to store wider rows. By definition, the consumer of a semijoin can't be interested in any values from the filter side of the join. So even if the filter side has more rows than the preserved side, building the hash table on the filter side might still require less memory.
RightSemi preserves the partitioning of the preserved input, whereas LeftSemi + CollectLeft emits with UnknownPartitioning.
RightSemi works better with dynamic filter pushdown: I don't know the dynamic filter code super well, but I'd imagine that since RightSemi builds the filter side before streaming the preserved side, that gives us more information we can use to push down filters into the preserved-side scan.
RightSemi allows emitting join results incrementally: as we see each probe row, we can immediately determine if it should be output or not. Whereas LeftSemi consumes the entire non-preserved side, marking which of the preserved-side rows matched, and only at the end of the non-preserved input stream can we do a pass over the matched bitmap to determine which preserved-side rows to emit. This is not fundamental though; probably worth fixing LeftSemi to emit incrementally (Consider optimizing LeftSemi to emit output incrementally #22929)
The current optimizer rules don't reflect this:
LeftSemi and RightSemi are considered symmetrically; whichever semijoin input is predicted to be smaller is placed on the build side
If there are absent stats, LeftSemi is chosen
I think revising these rules as follows would make more sense:
Prefer RightSemi over LeftSemi, unless the non-preserved input is k times larger than the preserved input. Choosing k is a bit arbitrary, but a value in the range of 2-4 seems reasonable.
Is your feature request related to a problem or challenge?
To evaluate a semi join, we support two orientations:
LeftSemiorRightSemi(analogously for anti and mark joins; I'll just refer to semijoins here to simplify the discussion). UnderRightSemi, we build the non-preserved ("filter") input and stream the preserved input; these are swapped forLeftSemi. While it might seem like these two orientations are symmetrical, there are actually significant differences in evaluation behavior between them:RightSemionly needs to store the join keys for the build side;LeftSemineeds to store wider rows. By definition, the consumer of a semijoin can't be interested in any values from the filter side of the join. So even if the filter side has more rows than the preserved side, building the hash table on the filter side might still require less memory.RightSemipreserves the partitioning of the preserved input, whereasLeftSemi+CollectLeftemits withUnknownPartitioning.RightSemiworks better with dynamic filter pushdown: I don't know the dynamic filter code super well, but I'd imagine that sinceRightSemibuilds the filter side before streaming the preserved side, that gives us more information we can use to push down filters into the preserved-side scan.Two additional factors that might change:
RightSemionly needs to build on distinct values from the non-preserved side. In the future, we can optimizeRightSemito discard duplicate build-side rows. We don't do that today but we might in the future (Consider optimizingRightSemito eliminate dups from build side #22930)RightSemiallows emitting join results incrementally: as we see each probe row, we can immediately determine if it should be output or not. WhereasLeftSemiconsumes the entire non-preserved side, marking which of the preserved-side rows matched, and only at the end of the non-preserved input stream can we do a pass over the matched bitmap to determine which preserved-side rows to emit. This is not fundamental though; probably worth fixing LeftSemi to emit incrementally (Consider optimizingLeftSemito emit output incrementally #22929)The current optimizer rules don't reflect this:
LeftSemiandRightSemiare considered symmetrically; whichever semijoin input is predicted to be smaller is placed on the build sideLeftSemiis chosenI think revising these rules as follows would make more sense:
RightSemioverLeftSemi, unless the non-preserved input is k times larger than the preserved input. Choosingkis a bit arbitrary, but a value in the range of 2-4 seems reasonable.RightSemiDescribe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response