fix(layout/dict): probe with the configured compressor instead of a hardcoded default by tomsanbear · Pull Request #8406 · vortex-data/vortex

tomsanbear · 2026-06-14T17:26:11Z

Summary

Closes: #8405

DictStrategy's dict-fit probe was hardcoded to BtrBlocksCompressor::default(), ignoring the caller's configured compressor. The probe now takes stats_compressor as a parameter to DictStrategy::new, so caller scheme exclusions are honoured.

The probe uses stats_compressor rather than data_compressor because it needs every dictionary scheme to detect eligibility, and data_compressor excludes IntDictScheme.
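The choice between the two compressors can be illustrated with a toy model. This is a minimal sketch, not the Vortex API: the `Scheme` enum, the `Cascade` type, `probe_picks_int_dict`, and the "at most half distinct" threshold are all invented for illustration. Only the scheme names echo the PR.

```rust
use std::collections::HashSet;

/// Hypothetical scheme identifiers; the names echo the PR, the enum is invented.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Scheme {
    IntDict,
    StringDict,
    BitPacking,
}

/// Toy cascade: a set of enabled schemes, not the real BtrBlocksCompressor.
#[derive(Debug, Clone)]
struct Cascade {
    schemes: HashSet<Scheme>,
}

impl Cascade {
    /// A cascade with every toy scheme enabled.
    fn full() -> Self {
        Self {
            schemes: HashSet::from([Scheme::IntDict, Scheme::StringDict, Scheme::BitPacking]),
        }
    }

    /// Returns the cascade with one scheme removed.
    fn excluding(mut self, scheme: Scheme) -> Self {
        self.schemes.remove(&scheme);
        self
    }

    /// Toy probe: report Dict for an integer chunk when IntDict is enabled and
    /// at most half of the values are distinct (an arbitrary threshold).
    fn probe_picks_int_dict(&self, chunk: &[i64]) -> bool {
        let distinct: HashSet<i64> = chunk.iter().copied().collect();
        self.schemes.contains(&Scheme::IntDict)
            && !chunk.is_empty()
            && distinct.len() * 2 <= chunk.len()
    }
}
```

In this model, `Cascade::full()` plays the role of stats_compressor and `Cascade::full().excluding(Scheme::IntDict)` plays the role of data_compressor. Probing a low-cardinality integer chunk with the second one can never report Dict, which is the eligibility miss the PR description is avoiding.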

Testing

New test dict_probe_honours_configured_compressor: asserts dict layout with the default builder, asserts no dict layout with StringDictScheme excluded.

AI use disclosure: fix authored with assistance from Claude Code.

…ardcoded default

DictStrategy decides whether to apply a dictionary layout by probe-compressing the first chunk and checking whether the cascade chose Dict. The probe was hardcoded to BtrBlocksCompressor::default(), ignoring the compressor configured through WriteStrategyBuilder, so a caller's scheme choices did not influence the dict-fit decision.

Add a probe_compressor field to DictStrategy (defaulting to BtrBlocksCompressor::default(), leaving existing callers unchanged) with a with_probe_compressor setter, and have WriteStrategyBuilder::build pass the stats_compressor.

stats_compressor is used rather than data_compressor because data_compressor excludes IntDictScheme to avoid re-encoding the dict codes; the probe needs every dict scheme to detect eligibility. For the default builder stats_compressor equals BtrBlocksCompressor::default(), so the default path is unchanged.

Signed-off-by: Thomas Santerre <thomas@santerre.xyz>

tomsanbear · 2026-06-14T17:27:12Z

Open design question:

With this change, the probe runs the caller's opaque compressor instead of the stock default. This is empirically a no-op for every current opaque caller (all pass btrblocks-based compressors that emit Dict, or non-dict-favourable vector data). The only behaviour change would be for a hypothetical opaque compressor that never emits a top-level Dict while being fed low-cardinality data: it would stop triggering the dict layout. I chose this uniform behaviour because it follows the principle that "the probe reflects the caller's compressor". I'm open to alternatives, as I'm still new to the codebase.
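The hypothetical case in this comment can be made concrete with a small model. It is a sketch under stated assumptions: the `OpaqueCompressor` trait, the `TopLevel` enum, both implementations, and `dict_layout_triggered` are invented here and do not correspond to Vortex types.

```rust
/// Hypothetical top-level encoding reported by a probe compression.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TopLevel {
    Dict,
    Other,
}

/// Hypothetical opaque compressor interface; the real trait differs.
trait OpaqueCompressor {
    fn top_level_encoding(&self, chunk: &[u32]) -> TopLevel;
}

/// Stands in for a btrblocks-style compressor that emits Dict on low-cardinality data.
struct DictFriendly;

impl OpaqueCompressor for DictFriendly {
    fn top_level_encoding(&self, chunk: &[u32]) -> TopLevel {
        let mut distinct = chunk.to_vec();
        distinct.sort_unstable();
        distinct.dedup();
        // Arbitrary toy rule: Dict when at most half of the values are distinct.
        if !chunk.is_empty() && distinct.len() * 2 <= chunk.len() {
            TopLevel::Dict
        } else {
            TopLevel::Other
        }
    }
}

/// Stands in for an opaque compressor that never emits a top-level Dict.
struct NeverDict;

impl OpaqueCompressor for NeverDict {
    fn top_level_encoding(&self, _chunk: &[u32]) -> TopLevel {
        TopLevel::Other
    }
}

/// The probe only inspects the chosen encoding; the compressed output is not used.
fn dict_layout_triggered(probe: &dyn OpaqueCompressor, first_chunk: &[u32]) -> bool {
    probe.top_level_encoding(first_chunk) == TopLevel::Dict
}
```

With the same low-cardinality chunk, `DictFriendly` triggers the dict layout and `NeverDict` does not. Under the old hardcoded default, both callers would have received the dict layout.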

codspeed-hq · 2026-06-15T11:00:46Z

Merging this PR will degrade performance by 19.8%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 104 regressed benchmarks
✅ 1436 untouched benchmarks
⏩ 10 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`compare[48]`	213 µs	300.6 µs	-29.15%
❌	Simulation	`compare[50]`	227.7 µs	319.2 µs	-28.65%
❌	Simulation	`compare[49]`	228.2 µs	317.7 µs	-28.18%
❌	Simulation	`compare[44]`	207.5 µs	287.7 µs	-27.88%
❌	Simulation	`compare[46]`	218.5 µs	302.5 µs	-27.76%
❌	Simulation	`compare[47]`	223.5 µs	309.3 µs	-27.74%
❌	Simulation	`compare[40]`	190.7 µs	263.4 µs	-27.62%
❌	Simulation	`compare[44]`	212.2 µs	292.4 µs	-27.43%
❌	Simulation	`compare[45]`	218.9 µs	300.9 µs	-27.26%
❌	Simulation	`compare[43]`	209.2 µs	287.7 µs	-27.26%
❌	Simulation	`compare[42]`	204.6 µs	281 µs	-27.21%
❌	Simulation	`compare[40]`	195.6 µs	268.4 µs	-27.1%
❌	Simulation	`compare[43]`	214.2 µs	292.5 µs	-26.77%
❌	Simulation	`compare[42]`	209.4 µs	285.9 µs	-26.76%
❌	Simulation	`compare[41]`	204.5 µs	279.2 µs	-26.74%
❌	Simulation	`compare[41]`	209.3 µs	284 µs	-26.27%
❌	Simulation	`compare[31]`	157.7 µs	213.7 µs	-26.2%
❌	Simulation	`compare[39]`	199.9 µs	270.8 µs	-26.19%
❌	Simulation	`compare[38]`	195.5 µs	264.6 µs	-26.1%
❌	Simulation	`compare[36]`	187.3 µs	252.6 µs	-25.85%
...	...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
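The Efficiency column in the rows above appears consistent with `BASE / HEAD - 1`. That is an inference from the displayed numbers, not a formula taken from CodSpeed's documentation, and the function name below is made up for illustration.

```rust
/// Relative efficiency change in percent, assuming efficiency = base / head - 1.
/// The formula is inferred from the table rows, not from CodSpeed documentation.
fn efficiency_change_pct(base_us: f64, head_us: f64) -> f64 {
    (base_us / head_us - 1.0) * 100.0
}
```

For the first row, `efficiency_change_pct(213.0, 300.6)` lands within a few hundredths of a percentage point of the reported -29.15%. The small gap is plausibly due to the table showing rounded timings.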

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing tomsanbear:fix/dict-strategy-probe-compressor (0581c1f) with develop (9444d20)

¹ 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

joseph-isaacs · 2026-06-15T11:01:12Z

+        let compress_then_flat = CompressingStrategy::new(flat, Arc::clone(&stats_compressor));

-        // 3. apply dict encoding or fallback
+        // 3. apply dict encoding or fallback.
+        // The dict-fit probe shares `stats_compressor` (the full configured cascade), not
+        // `data_compressor`: `data_compressor` drops `IntDictScheme` to avoid re-encoding the
+        // codes produced in step 5, but the probe needs every dict scheme to detect eligibility.
+        // The full cascade also honours caller scheme exclusions, unlike a hardcoded default.
        let dict = DictStrategy::new(
            coalescing.clone(),
            compress_then_flat.clone(),
            coalescing,
            Default::default(),
-        );
+        )
+        .with_probe_compressor(stats_compressor);


Why use this compressor?

The probe compresses the first chunk and only checks whether the result is Dict; the compressed output is discarded. stats_compressor seems like the right fit since it has all dictionary schemes; the data compressor excludes IntDictScheme to prevent double dict-encoding the codes that DictStrategy produces downstream.

Maybe we can have a new compressor argument that defaults to btrblocks instead of reusing the stats one. I agree that using the data compressor doesn't make sense, but maybe callers want a separate compressor for dict selection from the one used for stats.

I guess the tradeoff is that callers have to explicitly choose which compressor probes dict eligibility, but that seems better than a default that silently ignores configured scheme exclusions. What do you think?

Yeah, if users want to customise the probe compressor, they should be able to change only that while constructing their write strategies. For the default case, using the stats compressor works, but there would probably be a need to change the probe compressor separately from the stats compressor.
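One possible shape for "change only the probe compressor" is an optional override that falls back to the stats compressor. This is a hedged sketch of the idea discussed in this thread, not code from the PR: `Compressor`, `StrategyBuilder`, `override_probe_compressor`, and `resolve_probe_compressor` are hypothetical names, and the fallback choice is an assumption.

```rust
use std::sync::Arc;

/// Placeholder for a configured compressor; only its label matters in this sketch.
#[derive(Debug, PartialEq, Eq)]
struct Compressor {
    label: &'static str,
}

/// Hypothetical builder, not the real WriteStrategyBuilder.
struct StrategyBuilder {
    stats_compressor: Arc<Compressor>,
    /// Optional override used only for the dict-fit probe.
    probe_compressor: Option<Arc<Compressor>>,
}

impl StrategyBuilder {
    fn new(stats_compressor: Arc<Compressor>) -> Self {
        Self {
            stats_compressor,
            probe_compressor: None,
        }
    }

    /// Set a probe-only compressor without touching the stats compressor.
    fn override_probe_compressor(mut self, probe: Arc<Compressor>) -> Self {
        self.probe_compressor = Some(probe);
        self
    }

    /// Resolve the compressor the probe would run: the override if present,
    /// otherwise the stats compressor (an assumed fallback).
    fn resolve_probe_compressor(&self) -> Arc<Compressor> {
        self.probe_compressor
            .clone()
            .unwrap_or_else(|| Arc::clone(&self.stats_compressor))
    }
}
```

Callers that never set the override keep the behaviour of probing with the stats compressor, while callers with special needs change the probe alone.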

onursatici

this makes sense to me

onursatici · 2026-06-15T10:48:34Z

+        // The dict-fit probe shares `stats_compressor` (the full configured cascade), not
+        // `data_compressor`: `data_compressor` drops `IntDictScheme` to avoid re-encoding the
+        // codes produced in step 5, but the probe needs every dict scheme to detect eligibility.
+        // The full cascade also honours caller scheme exclusions, unlike a hardcoded default.


nit: remove this comment; I think it is a bit noisy, given that the declarations of the stats and data compressors are self-explanatory

onursatici · 2026-06-15T11:04:58Z

            Default::default(),
-        );
+        )
+        .with_probe_compressor(stats_compressor);


Shall we take this as an argument to DictStrategy::new, with the default being btrblocks?

Makes sense. I updated the branch with this change and removed the with_... method.


tomsanbear requested a review from a team June 14, 2026 17:26

tomsanbear changed the title ~~fix(layout/dict): probe with the configured compressor instead of a h…~~ fix(layout/dict): probe with the configured compressor instead of a hardcoded default Jun 14, 2026

onursatici added the action/benchmark Trigger full benchmarks to run on this PR label Jun 15, 2026

joseph-isaacs requested a review from onursatici June 15, 2026 10:53

onursatici added the changelog/fix A bug fix label Jun 15, 2026

joseph-isaacs reviewed Jun 15, 2026


onursatici reviewed Jun 15, 2026


tomsanbear added 2 commits June 15, 2026 09:16

address review: move probe_compressor into DictStrategy::new, drop co…

edbda38

…mment Signed-off-by: Thomas Santerre <thomas@santerre.xyz>

Merge branch 'develop' into fix/dict-strategy-probe-compressor

1cda360

tomsanbear wants to merge 3 commits into vortex-data:develop from tomsanbear:fix/dict-strategy-probe-compressor


3 participants
