branch-4.0: [improvement](rowset) Aggregate non-MOW segment key bounds#64305

Open
liaoxin01 wants to merge 1 commit into apache:branch-4.0 from liaoxin01:pick-pr-62604-to-branch-4.0

Conversation

@liaoxin01

Contributor

Pick #62604

Note: keep enable_aggregate_non_mow_key_bounds disabled by default for upgrade/downgrade safety.

### What problem does this PR solve?

Issue Number: None

Related PR: apache#62604

Problem Summary: Non-MOW duplicate and aggregate rowsets do not consume per-segment key bounds on the read path, but persisting all per-segment bounds can make cloud rowset metadata too large for FDB values. This change adds an aggregated key-bounds layout for non-MOW rowsets, preserves the layout flag through rowset meta conversion, snapshot restore, compaction, and index rebuild paths, and keeps MOW rowsets on per-segment bounds for delete bitmap lookup correctness. The feature is controlled by enable_aggregate_non_mow_key_bounds and is disabled by default for upgrade and downgrade safety.
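
To make the metadata-size concern concrete, here is a rough Python sketch (not Doris code). It assumes key bounds can be modeled as `(min_key, max_key)` byte-string pairs; the segment count, the 36-byte key width, and the helper names are hypothetical values chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical model: one (min_key, max_key) pair of encoded keys.
KeyBounds = Tuple[bytes, bytes]


def make_key(i: int) -> bytes:
    """Build a fixed-width 36-byte key (illustrative width only)."""
    return f"{i:036d}".encode()


def bounds_payload_size(bounds: List[KeyBounds]) -> int:
    """Total number of key bytes stored for a list of bounds entries."""
    return sum(len(lo) + len(hi) for lo, hi in bounds)


# Per-segment layout: one entry per segment, here 1000 segments.
per_segment = [(make_key(2 * i), make_key(2 * i + 1)) for i in range(1000)]

# Aggregated layout: a single entry spanning the whole rowset.
aggregated = [(per_segment[0][0], per_segment[-1][1])]

per_segment_bytes = bounds_payload_size(per_segment)  # grows with segment count
aggregated_bytes = bounds_payload_size(aggregated)    # independent of segment count
```

In this model the per-segment payload grows linearly with the number of segments, while the aggregated payload stays at one pair of keys regardless of segment count.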

### Release note

Add an opt-in BE config enable_aggregate_non_mow_key_bounds to aggregate non-MOW segment key bounds and reduce rowset metadata size.

### Check List (For Author)

- Test: Static Check
    - Static check: git diff --check
- Behavior changed: No. The new aggregation behavior is disabled by default and only takes effect when enable_aggregate_non_mow_key_bounds is set to true
- Does this need documentation: No
Copilot AI review requested due to automatic review settings June 9, 2026 09:51
@liaoxin01 liaoxin01 requested a review from morningman as a code owner June 9, 2026 09:51
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01

Contributor Author

run buildall

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces an opt-in mechanism to aggregate non-MOW rowset per-segment key bounds into a single [rowset_min, rowset_max] entry, reducing rowset metadata size (notably for cloud/FDB), while ensuring MOW rowsets continue to preserve per-segment bounds for correctness. It adds a new proto field to persist the layout choice, wires the behavior through rowset writing/compaction/snapshot/index-rewrite paths, and adds regression + unit tests.

Changes:

  • Add segments_key_bounds_aggregated to rowset meta protos and propagate it in cloud PB conversion/copy paths.
  • Implement aggregated key-bounds storage in RowsetMeta and gate it behind enable_aggregate_non_mow_key_bounds for non-MOW writers (plus preserve/propagate the layout through index-rewrite and snapshot restore).
  • Add regression and BE unit tests validating the aggregated behavior and guarding against MOW regressions.
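
The aggregation and its gating, as described in the overview, can be sketched roughly as follows. This is a Python illustration rather than the C++ in `RowsetMeta`; it assumes encoded keys compare bytewise, and the function names `aggregate_key_bounds` and `build_key_bounds` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical model: one (min_key, max_key) pair of encoded keys.
KeyBounds = Tuple[bytes, bytes]


def aggregate_key_bounds(per_segment: List[KeyBounds]) -> List[KeyBounds]:
    """Collapse per-segment bounds into a single [rowset_min, rowset_max] entry."""
    if not per_segment:
        return []
    rowset_min = min(lo for lo, _ in per_segment)
    rowset_max = max(hi for _, hi in per_segment)
    return [(rowset_min, rowset_max)]


def build_key_bounds(
    per_segment: List[KeyBounds],
    is_mow: bool,
    aggregation_enabled: bool,
) -> Tuple[List[KeyBounds], bool]:
    """Return (bounds to persist, aggregated flag).

    MOW rowsets always keep per-segment bounds; non-MOW rowsets are
    aggregated only when the opt-in config is enabled.
    """
    if is_mow or not aggregation_enabled:
        return list(per_segment), False
    return aggregate_key_bounds(per_segment), True
```

The returned flag stands in for `segments_key_bounds_aggregated`: a reader that sees it set should treat the single entry as rowset-wide bounds, not as the bounds of segment 0.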

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| regression-test/suites/data_model_p0/duplicate/test_non_mow_key_bounds_aggregation.groovy | Regression coverage for aggregated non-MOW bounds, MOW non-aggregation, config on/off behavior, and index-rewrite preservation. |
| gensrc/proto/olap_file.proto | Adds segments_key_bounds_aggregated field to rowset meta protos. |
| be/test/olap/segments_key_bounds_truncation_test.cpp | Forces aggregation feature off to keep truncation test expectations stable. |
| be/test/olap/rowset/rowset_meta_test.cpp | Unit tests for aggregation behavior, flag semantics, and truncation interaction. |
| be/src/olap/task/index_builder.cpp | Preserves aggregated layout/flag when rebuilding rowset meta during index rewrite. |
| be/src/olap/snapshot_manager.h | Extends _rename_rowset_id signature to pass MOW flag into rowset writer context. |
| be/src/olap/snapshot_manager.cpp | Propagates MOW flag so aggregation isn’t incorrectly applied when restoring MOW tablets’ rowsets. |
| be/src/olap/rowset/rowset.h | Exposes is_segments_key_bounds_aggregated() on Rowset. |
| be/src/olap/rowset/rowset_meta.h | Adds aggregated-flag accessors and extends set_segments_key_bounds API. |
| be/src/olap/rowset/rowset_meta.cpp | Implements aggregation into a single min/max entry and asserts aggregation isn’t used in MOW merge path. |
| be/src/olap/rowset/beta_rowset_writer.cpp | Applies aggregation for non-MOW rowset meta build (config-gated) and preserves source layout when cloning meta. |
| be/src/olap/compaction.cpp | Ensures output layout remains aggregated if any input is aggregated; otherwise follows config for non-MOW. |
| be/src/olap/base_tablet.cpp | Adds runtime guard to prevent MOW lookup from using aggregated/inconsistent bounds. |
| be/src/common/config.h | Declares enable_aggregate_non_mow_key_bounds. |
| be/src/common/config.cpp | Defines enable_aggregate_non_mow_key_bounds (default false). |
| be/src/cloud/pb_convert.cpp | Copies truncated/aggregated flags only when present to avoid forcing default fields into cloud PB. |
| be/src/cloud/cloud_snapshot_mgr.cpp | Preserves truncated/aggregated flags when creating rowset meta for cloud snapshots. |
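
The compaction rule described for be/src/olap/compaction.cpp can be read as a small decision function. The Python sketch below follows that one-line description literally; the function name and parameters are hypothetical, and the actual C++ may handle more cases than shown here.

```python
from typing import Iterable


def output_layout_is_aggregated(
    input_aggregated_flags: Iterable[bool],
    is_mow: bool,
    aggregation_enabled: bool,
) -> bool:
    """Decide whether a compaction output rowset uses the aggregated layout.

    If any input rowset is already aggregated, the output stays aggregated.
    Otherwise the output follows the opt-in config, and only for non-MOW
    tablets.
    """
    if any(input_aggregated_flags):
        return True
    return aggregation_enabled and not is_mow
```

Under this reading, turning the config off does not switch rowsets that are already aggregated back to the per-segment layout during compaction; it only stops rowsets built from non-aggregated inputs from being aggregated.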

Comment thread be/src/olap/rowset/rowset_meta.h
@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 79.38% (77/97) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.29% (19439/36475) |
| Line Coverage | 36.44% (181744/498702) |
| Region Coverage | 33.06% (141280/427375) |
| Branch Coverage | 33.95% (61179/180221) |

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 80.41% (78/97) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 70.10% (25031/35707) |
| Line Coverage | 52.70% (262370/497818) |
| Region Coverage | 50.39% (217513/431700) |
| Branch Coverage | 51.60% (93321/180854) |

3 participants