fix(layout/dict): probe with the configured compressor instead of a hardcoded default#8406
fix(layout/dict): probe with the configured compressor instead of a hardcoded default#8406tomsanbear wants to merge 3 commits into
Conversation
…ardcoded default DictStrategy decides whether to apply a dictionary layout by probe-compressing the first chunk and checking whether the cascade chose Dict. The probe was hardcoded to BtrBlocksCompressor::default(), ignoring the compressor configured through WriteStrategyBuilder, so a caller's scheme choices did not influence the dict-fit decision. Add a probe_compressor field to DictStrategy (defaulting to BtrBlocksCompressor::default(), leaving existing callers unchanged) with a with_probe_compressor setter, and have WriteStrategyBuilder::build pass the stats_compressor. stats_compressor is used rather than data_compressor because data_compressor excludes IntDictScheme to avoid re-encoding the dict codes; the probe needs every dict scheme to detect eligibility. For the default builder stats_compressor equals BtrBlocksCompressor::default(), so the default path is unchanged. Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
|
Open design question: With this change, the probe runs the caller's opaque compressor instead of the stock default. This is empirically a no-op for every current opaque caller (all pass btrblocks-based compressors that emit Dict, or non-dict-favourable vector data). The only behaviour change would be for a hypothetical opaque compressor that never emits a top-level |
Merging this PR will degrade performance by 19.8%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | compare[48] |
213 µs | 300.6 µs | -29.15% |
| ❌ | Simulation | compare[50] |
227.7 µs | 319.2 µs | -28.65% |
| ❌ | Simulation | compare[49] |
228.2 µs | 317.7 µs | -28.18% |
| ❌ | Simulation | compare[44] |
207.5 µs | 287.7 µs | -27.88% |
| ❌ | Simulation | compare[46] |
218.5 µs | 302.5 µs | -27.76% |
| ❌ | Simulation | compare[47] |
223.5 µs | 309.3 µs | -27.74% |
| ❌ | Simulation | compare[40] |
190.7 µs | 263.4 µs | -27.62% |
| ❌ | Simulation | compare[44] |
212.2 µs | 292.4 µs | -27.43% |
| ❌ | Simulation | compare[45] |
218.9 µs | 300.9 µs | -27.26% |
| ❌ | Simulation | compare[43] |
209.2 µs | 287.7 µs | -27.26% |
| ❌ | Simulation | compare[42] |
204.6 µs | 281 µs | -27.21% |
| ❌ | Simulation | compare[40] |
195.6 µs | 268.4 µs | -27.1% |
| ❌ | Simulation | compare[43] |
214.2 µs | 292.5 µs | -26.77% |
| ❌ | Simulation | compare[42] |
209.4 µs | 285.9 µs | -26.76% |
| ❌ | Simulation | compare[41] |
204.5 µs | 279.2 µs | -26.74% |
| ❌ | Simulation | compare[41] |
209.3 µs | 284 µs | -26.27% |
| ❌ | Simulation | compare[31] |
157.7 µs | 213.7 µs | -26.2% |
| ❌ | Simulation | compare[39] |
199.9 µs | 270.8 µs | -26.19% |
| ❌ | Simulation | compare[38] |
195.5 µs | 264.6 µs | -26.1% |
| ❌ | Simulation | compare[36] |
187.3 µs | 252.6 µs | -25.85% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing tomsanbear:fix/dict-strategy-probe-compressor (0581c1f) with develop (9444d20)
Footnotes
-
10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
| let compress_then_flat = CompressingStrategy::new(flat, Arc::clone(&stats_compressor)); | ||
|
|
||
| // 3. apply dict encoding or fallback | ||
| // 3. apply dict encoding or fallback. | ||
| // The dict-fit probe shares `stats_compressor` (the full configured cascade), not | ||
| // `data_compressor`: `data_compressor` drops `IntDictScheme` to avoid re-encoding the | ||
| // codes produced in step 5, but the probe needs every dict scheme to detect eligibility. | ||
| // The full cascade also honours caller scheme exclusions, unlike a hardcoded default. | ||
| let dict = DictStrategy::new( | ||
| coalescing.clone(), | ||
| compress_then_flat.clone(), | ||
| coalescing, | ||
| Default::default(), | ||
| ); | ||
| ) | ||
| .with_probe_compressor(stats_compressor); |
There was a problem hiding this comment.
Why use this compressor?
There was a problem hiding this comment.
The probe compresses the first chunk and only checks is dict, the result is discarded. stats_compressor seems like the right fit since it has all dictionary schemes; the data compressor excludes intdictscheme to prevent double dict-encoding the codes that DictStrategy produces downstream
There was a problem hiding this comment.
Maybe we can have a new compressor argument thar defaults to btrblocks instead of reusing the stats ine. I agree using data compressor doesn't make sense but maybe callers want a separate compressor for dict selection than the one used for stats
There was a problem hiding this comment.
I guess it's the tradeoff that callers have to explicitly choose which compressor probes dict eligibility, but that seems better than a default that silently ignores configured scheme exclusions, what do you think?
There was a problem hiding this comment.
Yea if users want to customise the probe compressor they should be able to change only that while constructing their write strategies. For the default case having the stats compressor works but there would probably be a need to change the probe compressor separately from the stats compressor
onursatici
left a comment
There was a problem hiding this comment.
this makes sense to me
| // The dict-fit probe shares `stats_compressor` (the full configured cascade), not | ||
| // `data_compressor`: `data_compressor` drops `IntDictScheme` to avoid re-encoding the | ||
| // codes produced in step 5, but the probe needs every dict scheme to detect eligibility. | ||
| // The full cascade also honours caller scheme exclusions, unlike a hardcoded default. |
There was a problem hiding this comment.
nit: remove this comment, I think it is a bit noisy given the declarations of stats and data compressors are self explanatory
| Default::default(), | ||
| ); | ||
| ) | ||
| .with_probe_compressor(stats_compressor); |
There was a problem hiding this comment.
shall we get this as an argument to new with the default being btrblocks?
There was a problem hiding this comment.
makes sense, updated the branch with this change and removed the with_... method
…mment Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
Summary
Closes: #8405
DictStrategy's dict-fit probe was hardcoded toBtrBlocksCompressor::default(), ignoring the caller's configured compressor. The probe now takesstats_compressoras a parameter toDictStrategy::new, so caller scheme exclusions are honoured.stats_compressorrather thandata_compressorbecause the probe needs all dictionary schemes to detect eligibility.data_compressorexcludesIntDictScheme.Testing
New test
dict_probe_honours_configured_compressor: asserts dict layout with the default builder, asserts no dict layout withStringDictSchemeexcluded.AI use disclosure: fix authored with assistance from Claude Code.