[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT) by hhr293 · Pull Request #12299 · apache/gluten

hhr293 · 2026-06-15T08:57:36Z

What changes are proposed in this pull request?

[VL] Extend TypeAwareCompress (TAC) to INT128 (DECIMAL HUGEINT).

The TAC codec from #11894 only covered INT64. DECIMAL(p>18) is stored as
128-bit HugeInt and previously fell through to LZ4, missing a structural
compression opportunity: in OLAP workloads (TPC-H prices, taxes, etc.)
DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the
low 64 bits are narrow — exactly the pattern FFOR exploits.

Implementation: Split each 16B value into lo/hi uint64 sub-streams via
stride-2 gather into stack-allocated scratch buffers, then run the existing
64-bit FFOR encoder on each. Wire format reuses the 64-bit per-block
(bw, count, base) header twice — the stream is self-describing, no hi/lo
length prefix needed. hi sub-streams that are all equal to base degenerate
to just the 16B header (bw=0). Velox HUGEINT is mapped to tac::kUInt128;
shuffle writer and frame format unchanged.

How was this patch tested?

New unit tests in FForCodecTest.cc: 128-bit compress/decompress round-trips,
alignment combinations, corrupted-input rejection, and tail handling.
Existing 64-bit tests continue to pass.
End-to-end validation on TPC-H SF=6000 (no regression on other queries).

Performance

Results vs LZ4 on the same int128 input:

Compression ratio: 0.116 (TAC) vs 0.193 (LZ4) — 40% smaller

End-to-end on TPC-H SF=6000, the wins concentrate on queries with heavy
decimal-keyed shuffles:

q15: shuffle size −15%, latency −8%
q17: shuffle size −8%, latency −3%
q18: shuffle size −8%, latency −3%

Was this patch authored or co-authored using generative AI tooling?

Reviewed-by: Claude claude-opus-4-7

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR extends the type-aware compression (TAC) pipeline to support Velox HUGEINT / DECIMAL128 by introducing a 128-bit FFOR-based codec (implemented as two 64-bit sub-streams).

Changes:

Add TAC type mapping and codec support for 128-bit values (kUInt128 / Velox HUGEINT).
Refactor 64-bit FFOR block encode/decode into shared helpers and implement a new 128-bit codec on top.
Add unit tests covering 128-bit round-trips, tails, misalignment, and truncated-input handling.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
cpp/velox/shuffle/VeloxTypeAwareCompress.h	Map Velox HUGEINT to TAC kUInt128.
cpp/core/utils/tac/ffor.hpp	Add shared encode/decode helpers and implement 128-bit FFOR codec.
cpp/core/utils/tac/TypeAwareCompressCodec.h	Add kUInt128 enum and document 128-bit support.
cpp/core/utils/tac/TypeAwareCompressCodec.cc	Extend TAC codec dispatch for kUInt128 in compress/decompress and maxCompressedLen.
cpp/core/utils/tac/FForCodec.h	Add 128-bit compress/decompress API surface.
cpp/core/utils/tac/FForCodec.cc	Implement 128-bit compress/decompress wrappers and bounds.
cpp/core/tests/FForCodecTest.cc	Add tests for TAC kUInt128 behavior and edge cases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  auto* in = input;
  auto codecId = static_cast<CodecId>(*in++);
-  [[maybe_unused]] auto tacType = *in++;
+  auto tacType = (*in++);


+  if (outputSize < maxLen) {
+    return arrow::Status::Invalid("FForCodec: output buffer too small for 128-bit compression.");
+  }


Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

hhr293 · 2026-06-15T09:26:15Z

+FForCodec::compress128(const uint8_t* input, int64_t inputSize, uint8_t* output, int64_t outputSize) {
+  if (inputSize == 0) {
+    return 0;
+  }
+  if (inputSize % sizeof(__int128_t) != 0) {
+    return arrow::Status::Invalid("FForCodec: input size ", inputSize, " is not a multiple of ", sizeof(__int128_t), ".");
+  }
+
+  size_t numValues = inputSize / sizeof(__int128_t);
+  auto maxLen = static_cast<int64_t>(ffor::compress128Bound(numValues));
+  if (outputSize < maxLen) {
+    return arrow::Status::Invalid(
+        "FForCodec: output buffer too small for 128-bit compression (need ",
+        maxLen, " bytes, got ", outputSize, ").");
+  }
+
+  auto written = ffor::compress128(input, numValues, output);
+  return static_cast<int64_t>(written);
+}


The inputSize == 0 early return is correct. When inputSize is 0,
TypeAwareCompressCodec::compress() never calls FForCodec::compress128() in the
first place — the caller checks inputLen before invoking the codec. This guard
is purely defensive (same pattern as the 64-bit compress()).

Even if it were reached, returning 0 bytes written from the codec layer signals
"nothing compressed" to the caller, which is the correct semantic for empty
input. No malformed stream is produced because no stream is produced at all.

hhr293 · 2026-06-15T09:25:56Z

+FForCodec::decompress128(const uint8_t* input, int64_t inputSize, uint8_t* output, int64_t outputSize) {
+  if (outputSize == 0) {
+    return 0;
+  }
+  if (outputSize % sizeof(__int128_t) != 0) {
+    return arrow::Status::Invalid("FForCodec: output size ", outputSize, " is not a multiple of ", sizeof(__int128_t), ".");
+  }
+
+  auto nDecoded = ffor::decompress128(input, inputSize, output, static_cast<size_t>(outputSize));
+  return static_cast<int64_t>(nDecoded);
+}


When outputSize == 0, the caller is saying "I expect zero values." Returning 0
decoded values is the only correct answer regardless of input content.
Validating the input stream format in this case would be wasted work — even if
the stream is corrupt, producing 0 values into a 0-sized buffer is harmless and
matches caller expectations.

The caller already knows the expected output size from shuffle metadata. If the
stream is actually corrupt, the mismatch will be caught at a higher level when
the actual data (non-zero) fails to decode correctly in subsequent calls.

The TAC codec from apache#11894 only covered INT64. DECIMAL(p>18) is stored as 128-bit HugeInt and previously fell through to LZ4, missing a structural compression opportunity: in OLAP workloads (TPC-H prices, taxes, ...) DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the low 64 bits are narrow -- exactly the pattern FFOR exploits. Implementation: split each 16B value into lo/hi uint64 sub-streams via stride-2 gather into stack-allocated scratch buffers, then run the existing 64-bit FFOR encoder on each. Wire format reuses the 64-bit per-block (bw, count, base) header twice -- the stream is self- describing, no hi/lo length prefix needed. hi sub-streams that are all equal to base degenerate to just the 16B header (bw=0). Velox HUGEINT is mapped to tac::kUInt128; shuffle writer and frame format unchanged. Results vs LZ4 on the same int128 input: compression ratio 0.116 (TAC) vs 0.193 (LZ4) -- 40% smaller End-to-end on TPC-H SF=6000, the wins concentrate on the queries with heavy decimal-keyed shuffles: q15: shuffle size -15%, latency -8% q17: shuffle size -8%, latency -3% q18: shuffle size -8%, latency -3% Generated-by: Claude claude-opus-4-7 Co-authored-by: Guo Wangyang <wangyang.guo@intel.com> Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com> Signed-off-by: huhengrui <hengrui.hu@intel.com>

hhr293 · 2026-06-16T09:32:32Z

@marin-ma @FelixYBW Could you please help review this patch? This is a follow-up evolution to PR . While the previous PR successfully introduced TypeAwareCompress (TAC) support for int64, this patch extends the implementation to fully support int128.
Please let me know if there are any concerns or suggestions. Thanks!

marin-ma · 2026-06-16T12:43:12Z

-      ARROW_ASSIGN_OR_RAISE(auto nDecoded, FForCodec::decompress(in, dataLen, output, outputLen));
-      (void)nDecoded;
-      return inputLen;
+  if (codecId != CodecId::kFFor) {


Why narrowing this condition? Is kFFor the only tac type that will be supported?

Copilot AI review requested due to automatic review settings June 15, 2026 08:57

Copilot AI reviewed Jun 15, 2026

View reviewed changes

github-actions Bot added the VELOX label Jun 15, 2026

hhr293 force-pushed the hhr/tac-int128 branch from e24474a to 7d06365 Compare June 15, 2026 09:11

hhr293 requested a review from Copilot June 15, 2026 09:15

Copilot AI reviewed Jun 15, 2026

View reviewed changes

marin-ma reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT)#12299

[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT)#12299
hhr293 wants to merge 1 commit into
apache:mainfrom
hhr293:hhr/tac-int128

hhr293 commented Jun 15, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

hhr293 Jun 15, 2026

Uh oh!

hhr293 Jun 15, 2026

Uh oh!

Uh oh!

Uh oh!

hhr293 commented Jun 16, 2026

Uh oh!

marin-ma Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

hhr293 commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes are proposed in this pull request?

How was this patch tested?

Performance

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

hhr293 Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

hhr293 Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

hhr293 commented Jun 16, 2026

Uh oh!

marin-ma Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hhr293 commented Jun 15, 2026 •

edited

Loading