branch-4.0: [improvement](rowset) Aggregate non-MOW segment key bounds #64305
liaoxin01 wants to merge 1 commit into
### What problem does this PR solve?

Issue Number: None

Related PR: apache#62604

Problem Summary: Non-MOW duplicate and aggregate rowsets do not consume per-segment key bounds on the read path, but persisting all per-segment bounds can make cloud rowset metadata too large for FDB values. This change adds an aggregated key-bounds layout for non-MOW rowsets, preserves the layout flag through rowset meta conversion, snapshot restore, compaction, and index rebuild paths, and keeps MOW rowsets on per-segment bounds for delete bitmap lookup correctness. The feature is controlled by enable_aggregate_non_mow_key_bounds and is disabled by default for upgrade and downgrade safety.

### Release note

Add an opt-in BE config enable_aggregate_non_mow_key_bounds to aggregate non-MOW segment key bounds and reduce rowset metadata size.

### Check List (For Author)

- Test: Static Check
    - Static check: git diff --check
- Behavior changed: No. The new aggregation behavior is disabled by default and only takes effect when enable_aggregate_non_mow_key_bounds is set to true.
- Does this need documentation: No
run buildall
Pull request overview
This PR introduces an opt-in mechanism to aggregate non-MOW rowset per-segment key bounds into a single [rowset_min, rowset_max] entry, reducing rowset metadata size (notably for cloud/FDB), while ensuring MOW rowsets continue to preserve per-segment bounds for correctness. It adds a new proto field to persist the layout choice, wires the behavior through rowset writing/compaction/snapshot/index-rewrite paths, and adds regression + unit tests.
Changes:
- Add segments_key_bounds_aggregated to rowset meta protos and propagate it in cloud PB conversion/copy paths.
- Implement aggregated key-bounds storage in RowsetMeta and gate it behind enable_aggregate_non_mow_key_bounds for non-MOW writers (plus preserve/propagate the layout through index-rewrite and snapshot restore).
- Add regression and BE unit tests validating the aggregated behavior and guarding against MOW regressions.
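To illustrate what collapsing per-segment bounds into a single [rowset_min, rowset_max] entry means, here is a minimal sketch. It is not the code from this PR: `KeyBounds` and `aggregate_key_bounds` are hypothetical names, and the sketch assumes keys are encoded strings that compare correctly byte by byte.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one key-bounds entry; not the actual Doris proto type.
struct KeyBounds {
    std::string min_key;
    std::string max_key;
};

// Collapse N per-segment entries into a single [rowset_min, rowset_max] entry.
// Assumes encoded keys order correctly under byte-wise string comparison.
// Returns an empty vector when there are no segments.
std::vector<KeyBounds> aggregate_key_bounds(const std::vector<KeyBounds>& segments) {
    if (segments.empty()) {
        return {};
    }
    KeyBounds agg = segments.front();
    for (const auto& seg : segments) {
        assert(seg.min_key <= seg.max_key);  // each segment's bounds must be ordered
        agg.min_key = std::min(agg.min_key, seg.min_key);
        agg.max_key = std::max(agg.max_key, seg.max_key);
    }
    return {agg};
}
```

The stored metadata then grows with key length only, not with the number of segments, which is the size reduction the PR is after. The per-segment detail is lost, which is why this layout is only suitable where per-segment bounds are not consumed.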
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| regression-test/suites/data_model_p0/duplicate/test_non_mow_key_bounds_aggregation.groovy | Regression coverage for aggregated non-MOW bounds, MOW non-aggregation, config on/off behavior, and index-rewrite preservation. |
| gensrc/proto/olap_file.proto | Adds segments_key_bounds_aggregated field to rowset meta protos. |
| be/test/olap/segments_key_bounds_truncation_test.cpp | Forces aggregation feature off to keep truncation test expectations stable. |
| be/test/olap/rowset/rowset_meta_test.cpp | Unit tests for aggregation behavior, flag semantics, and truncation interaction. |
| be/src/olap/task/index_builder.cpp | Preserves aggregated layout/flag when rebuilding rowset meta during index rewrite. |
| be/src/olap/snapshot_manager.h | Extends _rename_rowset_id signature to pass MOW flag into rowset writer context. |
| be/src/olap/snapshot_manager.cpp | Propagates MOW flag so aggregation isn’t incorrectly applied when restoring MOW tablets’ rowsets. |
| be/src/olap/rowset/rowset.h | Exposes is_segments_key_bounds_aggregated() on Rowset. |
| be/src/olap/rowset/rowset_meta.h | Adds aggregated-flag accessors and extends set_segments_key_bounds API. |
| be/src/olap/rowset/rowset_meta.cpp | Implements aggregation into a single min/max entry and asserts aggregation isn’t used in MOW merge path. |
| be/src/olap/rowset/beta_rowset_writer.cpp | Applies aggregation for non-MOW rowset meta build (config-gated) and preserves source layout when cloning meta. |
| be/src/olap/compaction.cpp | Ensures output layout remains aggregated if any input is aggregated; otherwise follows config for non-MOW. |
| be/src/olap/base_tablet.cpp | Adds runtime guard to prevent MOW lookup from using aggregated/inconsistent bounds. |
| be/src/common/config.h | Declares enable_aggregate_non_mow_key_bounds. |
| be/src/common/config.cpp | Defines enable_aggregate_non_mow_key_bounds (default false). |
| be/src/cloud/pb_convert.cpp | Copies truncated/aggregated flags only when present to avoid forcing default fields into cloud PB. |
| be/src/cloud/cloud_snapshot_mgr.cpp | Preserves truncated/aggregated flags when creating rowset meta for cloud snapshots. |
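The compaction row above describes a small decision rule for the output rowset's layout. A sketch of that rule as described in the table, not taken from the diff, might look like the following; `InputRowset` and `output_should_aggregate` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical summary of a compaction input; only the layout flag matters here.
struct InputRowset {
    bool bounds_aggregated;
};

// Decide whether the compaction output should use the aggregated layout:
//  - if any input is already aggregated, the output stays aggregated,
//    since per-segment detail cannot be recovered from such inputs;
//  - otherwise, non-MOW tablets follow the config switch and MOW tablets
//    keep per-segment bounds.
bool output_should_aggregate(const std::vector<InputRowset>& inputs,
                             bool is_mow,
                             bool config_enabled) {
    const bool any_aggregated =
            std::any_of(inputs.begin(), inputs.end(),
                        [](const InputRowset& r) { return r.bounds_aggregated; });
    if (any_aggregated) {
        return true;
    }
    return !is_mow && config_enabled;
}
```

One consequence of this rule is that turning enable_aggregate_non_mow_key_bounds back off does not restore per-segment bounds for rowsets compacted from already-aggregated inputs; only rowsets built entirely from non-aggregated inputs follow the new setting.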
Pick #62604

Note: keep enable_aggregate_non_mow_key_bounds disabled by default for upgrade/downgrade safety.