[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT)#12299
[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT)#12299hhr293 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR extends the type-aware compression (TAC) pipeline to support Velox HUGEINT / DECIMAL128 by introducing a 128-bit FFOR-based codec (implemented as two 64-bit sub-streams).
Changes:
- Add TAC type mapping and codec support for 128-bit values (kUInt128 / Velox HUGEINT).
- Refactor 64-bit FFOR block encode/decode into shared helpers and implement a new 128-bit codec on top.
- Add unit tests covering 128-bit round-trips, tails, misalignment, and truncated-input handling.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/velox/shuffle/VeloxTypeAwareCompress.h | Map Velox HUGEINT to TAC kUInt128. |
| cpp/core/utils/tac/ffor.hpp | Add shared encode/decode helpers and implement 128-bit FFOR codec. |
| cpp/core/utils/tac/TypeAwareCompressCodec.h | Add kUInt128 enum and document 128-bit support. |
| cpp/core/utils/tac/TypeAwareCompressCodec.cc | Extend TAC codec dispatch for kUInt128 in compress/decompress and maxCompressedLen. |
| cpp/core/utils/tac/FForCodec.h | Add 128-bit compress/decompress API surface. |
| cpp/core/utils/tac/FForCodec.cc | Implement 128-bit compress/decompress wrappers and bounds. |
| cpp/core/tests/FForCodecTest.cc | Add tests for TAC kUInt128 behavior and edge cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto* in = input; | ||
| auto codecId = static_cast<CodecId>(*in++); | ||
| [[maybe_unused]] auto tacType = *in++; | ||
| auto tacType = (*in++); |
| if (outputSize < maxLen) { | ||
| return arrow::Status::Invalid("FForCodec: output buffer too small for 128-bit compression."); | ||
| } |
| FForCodec::compress128(const uint8_t* input, int64_t inputSize, uint8_t* output, int64_t outputSize) { | ||
| if (inputSize == 0) { | ||
| return 0; | ||
| } | ||
| if (inputSize % sizeof(__int128_t) != 0) { | ||
| return arrow::Status::Invalid("FForCodec: input size ", inputSize, " is not a multiple of ", sizeof(__int128_t), "."); | ||
| } | ||
|
|
||
| size_t numValues = inputSize / sizeof(__int128_t); | ||
| auto maxLen = static_cast<int64_t>(ffor::compress128Bound(numValues)); | ||
| if (outputSize < maxLen) { | ||
| return arrow::Status::Invalid( | ||
| "FForCodec: output buffer too small for 128-bit compression (need ", | ||
| maxLen, " bytes, got ", outputSize, ")."); | ||
| } | ||
|
|
||
| auto written = ffor::compress128(input, numValues, output); | ||
| return static_cast<int64_t>(written); | ||
| } |
There was a problem hiding this comment.
The inputSize == 0 early return is correct. When inputSize is 0,
TypeAwareCompressCodec::compress() never calls FForCodec::compress128() in the
first place — the caller checks inputLen before invoking the codec. This guard
is purely defensive (same pattern as the 64-bit compress()).
Even if it were reached, returning 0 bytes written from the codec layer signals
"nothing compressed" to the caller, which is the correct semantic for empty
input. No malformed stream is produced because no stream is produced at all.
| FForCodec::decompress128(const uint8_t* input, int64_t inputSize, uint8_t* output, int64_t outputSize) { | ||
| if (outputSize == 0) { | ||
| return 0; | ||
| } | ||
| if (outputSize % sizeof(__int128_t) != 0) { | ||
| return arrow::Status::Invalid("FForCodec: output size ", outputSize, " is not a multiple of ", sizeof(__int128_t), "."); | ||
| } | ||
|
|
||
| auto nDecoded = ffor::decompress128(input, inputSize, output, static_cast<size_t>(outputSize)); | ||
| return static_cast<int64_t>(nDecoded); | ||
| } |
There was a problem hiding this comment.
When outputSize == 0, the caller is saying "I expect zero values." Returning 0
decoded values is the only correct answer regardless of input content.
Validating the input stream format in this case would be wasted work — even if
the stream is corrupt, producing 0 values into a 0-sized buffer is harmless and
matches caller expectations.
The caller already knows the expected output size from shuffle metadata. If the
stream is actually corrupt, the mismatch will be caught at a higher level when
the actual data (non-zero) fails to decode correctly in subsequent calls.
The TAC codec from apache#11894 only covered INT64. DECIMAL(p>18) is stored as 128-bit HugeInt and previously fell through to LZ4, missing a structural compression opportunity: in OLAP workloads (TPC-H prices, taxes, ...) DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the low 64 bits are narrow -- exactly the pattern FFOR exploits. Implementation: split each 16B value into lo/hi uint64 sub-streams via stride-2 gather into stack-allocated scratch buffers, then run the existing 64-bit FFOR encoder on each. Wire format reuses the 64-bit per-block (bw, count, base) header twice -- the stream is self- describing, no hi/lo length prefix needed. hi sub-streams that are all equal to base degenerate to just the 16B header (bw=0). Velox HUGEINT is mapped to tac::kUInt128; shuffle writer and frame format unchanged. Results vs LZ4 on the same int128 input: compression ratio 0.116 (TAC) vs 0.193 (LZ4) -- 40% smaller End-to-end on TPC-H SF=6000, the wins concentrate on the queries with heavy decimal-keyed shuffles: q15: shuffle size -15%, latency -8% q17: shuffle size -8%, latency -3% q18: shuffle size -8%, latency -3% Generated-by: Claude claude-opus-4-7 Co-authored-by: Guo Wangyang <wangyang.guo@intel.com> Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com> Signed-off-by: huhengrui <hengrui.hu@intel.com>
|
@marin-ma @FelixYBW Could you please help review this patch? This is a follow-up evolution to PR . While the previous PR successfully introduced TypeAwareCompress (TAC) support for int64, this patch extends the implementation to fully support int128. |
| ARROW_ASSIGN_OR_RAISE(auto nDecoded, FForCodec::decompress(in, dataLen, output, outputLen)); | ||
| (void)nDecoded; | ||
| return inputLen; | ||
| if (codecId != CodecId::kFFor) { |
There was a problem hiding this comment.
Why narrowing this condition? Is kFFor the only tac type that will be supported?
What changes are proposed in this pull request?
[VL] Extend TypeAwareCompress (TAC) to INT128 (DECIMAL HUGEINT).
The TAC codec from #11894 only covered INT64. DECIMAL(p>18) is stored as
128-bit HugeInt and previously fell through to LZ4, missing a structural
compression opportunity: in OLAP workloads (TPC-H prices, taxes, etc.)
DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the
low 64 bits are narrow — exactly the pattern FFOR exploits.
Implementation: Split each 16B value into lo/hi uint64 sub-streams via
stride-2 gather into stack-allocated scratch buffers, then run the existing
64-bit FFOR encoder on each. Wire format reuses the 64-bit per-block
(bw, count, base) header twice — the stream is self-describing, no hi/lo
length prefix needed. hi sub-streams that are all equal to base degenerate
to just the 16B header (bw=0). Velox HUGEINT is mapped to
tac::kUInt128;shuffle writer and frame format unchanged.
How was this patch tested?
FForCodecTest.cc: 128-bit compress/decompress round-trips,alignment combinations, corrupted-input rejection, and tail handling.
Performance
Results vs LZ4 on the same int128 input:
End-to-end on TPC-H SF=6000, the wins concentrate on queries with heavy
decimal-keyed shuffles:
Was this patch authored or co-authored using generative AI tooling?
Reviewed-by: Claude claude-opus-4-7