Skip to content

[refactor](be) Avoid rebinding storage expression column ids#64010

Closed
BiteTheDDDDt wants to merge 11 commits into
apache:masterfrom
BiteTheDDDDt:codex/avoid-expr-rebind
Closed

[refactor](be) Avoid rebinding storage expression column ids#64010
BiteTheDDDDt wants to merge 11 commits into
apache:masterfrom
BiteTheDDDDt:codex/avoid-expr-rebind

Conversation

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: Storage expression pushdown previously rebound slot refs to the expanded reader schema by mutating expression column ids inside segment iterators. That violates the assumption that pushed-down expressions are immutable and can produce surprising behavior when the same expression objects are reused. This change keeps the full scan schema unchanged, adds a project schema aligned with the final projected columns, and evaluates pushed-down expressions against a temporary project block. Slot ordinal to tablet column id mapping now uses the project schema, so no expression mutation is required.

Release note

None

Check List (For Author)

  • Test: Manual test
    • ./build.sh --be
  • Behavior changed: No
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings June 2, 2026 07:52
@BiteTheDDDDt BiteTheDDDDt requested a review from gavinchou as a code owner June 2, 2026 07:52
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: Storage expression pushdown previously rebound slot refs to the expanded reader schema by mutating expression column ids inside segment iterators. That violates the assumption that pushed-down expressions are immutable and can produce surprising behavior when the same expression objects are reused. This change keeps the full scan schema unchanged, adds a project schema aligned with the final projected columns, and evaluates pushed-down expressions against a temporary project block. Slot ordinal to tablet column id mapping now uses the project schema, so no expression mutation is required.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - ./build.sh --be
- Behavior changed: No
- Does this need documentation: No
@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/avoid-expr-rebind branch from 710a4c5 to 2e9588c Compare June 2, 2026 08:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors BE storage expression pushdown to avoid mutating (rebinding) VSlotRef/VirtualSlotRef column ids inside segment iterators. Instead, it introduces a “project schema” representing the final scan columns (pre-expansion) and evaluates pushed-down expressions against a temporary project block whose column layout matches the original slot ordinals—so expressions can remain immutable even when the reader schema is expanded for AGG/merge readers.

Changes:

  • Add project_columns to StorageReadOptions and propagate it from RowsetReaderContext in BetaRowsetReader.
  • Introduce SegmentIterator::_project_schema plus helper methods to map column ids to block positions and to build a per-batch project block.
  • Remove per-segment expression tree rebinding logic and switch common-expr / ANN / score paths to use the project schema.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
be/src/storage/segment/segment_iterator.h Adds project-schema helpers and stores _project_schema in SegmentIterator.
be/src/storage/segment/segment_iterator.cpp Implements project schema/block building and switches expr/ANN/virtual-column flows away from mutating expressions.
be/src/storage/rowset/beta_rowset_reader.cpp Passes origin_return_columns/return_columns through to storage as project_columns.
be/src/storage/iterators.h Extends StorageReadOptions with project_columns (pre-expansion projected scan columns).
Comments suppressed due to low confidence (1)

be/src/storage/segment/segment_iterator.cpp:3557

  • _materialization_of_virtual_column() rebuilds a full project_block for every virtual column expression. If there can be multiple virtual columns, this adds repeated block construction and column insertion overhead per batch. Consider building the project_block once per call (or caching/reusing it) and then executing each virtual expr against the same project_block.
                LOG_WARNING("Result of expr column {} is empty. cid {}, idx_in_block {}",
                            column_expr->root()->debug_string(), cid, idx_in_block);
            }
        }
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 512 to 524
if (column->size() != selected_size) {
return Status::InternalError("project column {} has {} rows, expected {}", cid,
column->size(), selected_size);
}
}
project_block->insert({std::move(column), type, _schema->column(cid)->name()});
}
return Status::OK();
}

void SegmentIterator::_initialize_predicate_results() {
// Initialize from _col_predicates
for (auto pred : _col_predicates) {
@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary:

I did not find additional distinct issues beyond the existing inline thread already raised for _build_project_block() and lazy materialization placeholders. I did not submit duplicate inline comments.

Critical checkpoint conclusions:

  • Goal and tests: The PR appears intended to evaluate storage-side common/virtual expressions against the final scan projection layout while preserving expanded storage read schemas. The code generally follows that direction. I did not see new/modified tests in the PR, so coverage for non-direct AGG/UNIQUE paths with common expr pushdown, virtual columns, and ANN/score columns remains a residual risk.
  • Scope and clarity: The change is focused on projection/schema mapping inside rowset and segment reading. The replacement of expression rebinding with project-block construction is understandable, though it increases reliance on correct project column coverage.
  • Concurrency: The touched paths are per-reader/per-segment iterator execution paths; I did not identify new shared mutable state, lock ordering, or thread-safety issues.
  • Lifecycle: No new static/global lifecycle or ownership issues were identified. The project column pointer is propagated from reader context vectors that outlive segment iterator initialization/use in the reviewed paths.
  • Configuration and compatibility: No new configuration, storage format, EditLog, or wire-protocol compatibility changes were introduced.
  • Parallel paths: Both direct and non-direct rowset reader paths were considered through origin_return_columns/return_columns; I did not find an additional missing propagation path.
  • Error handling: New Status returns are checked by callers. The existing already-known concern about non-materialized projected columns is not repeated here.
  • Transaction/data correctness: The change does not modify transaction visibility, delete bitmap versioning, rowset lifecycle, or persistence paths.
  • Performance and observability: The project block adds extra block construction during common expr and virtual column materialization; this is localized. I did not find an obvious pathological regression, but test/benchmark coverage would be useful for wide projected schemas.

User focus: No additional user-provided review focus was specified, and I found no extra focus-specific issue.

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 71.74% (66/92) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.49% (24260/38211)
Line Coverage 46.77% (248464/531235)
Region Coverage 44.06% (207844/471690)
Branch Coverage 44.68% (88948/199087)

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 23735 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2e9588c403dbba88a37f960ccec3c0e948734dc5, data reload: false

------ Round 1 ----------------------------------
lineitem	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:38:59	NULL	utf-8	NULL	NULL	
supplier	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:25	NULL	utf-8	NULL	NULL	
region	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:24	NULL	utf-8	NULL	NULL	
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17839	4080	4059	4059
q2	q3	10098	1373	845	845
q4	4699	488	357	357
q5	7556	886	603	603
q6	193	173	139	139
q7	794	851	659	659
q8	9774	1672	1679	1672
q9	7348	4620	4548	4548
q10	6801	1831	1524	1524
q11	443	274	262	262
q12	q13	q14	334	264	244	244
q15	q16	803	743	716	716
q17	948	1062	775	775
q18	6869	5717	5608	5608
q19	1567	1315	1157	1157
q20	517	388	263	263
q21	q22	460	362	304	304
Total cold run time: 77043 ms
Total hot run time: 23735 ms

----- Round 2, with runtime_filter_mode=off -----
customer	Doris	NULL	NULL	15000000	92	1381653732	NULL	4374759	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:43	NULL	utf-8	NULL	NULL	
lineitem	Doris	NULL	NULL	600037902	33	19843441616	NULL	61784740	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:38:59	NULL	utf-8	NULL	NULL	
supplier	Doris	NULL	NULL	1000000	87	87519212	NULL	194931	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:25	NULL	utf-8	NULL	NULL	
region	Doris	NULL	NULL	5	240	1201	NULL	147	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:24	NULL	utf-8	NULL	NULL	
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4315	4226	4194	4194
q2	q3	4632	4821	5639	4821
q4	2417	2304	1521	1521
q5	5146	4904	4918	4904
q6	251	255	162	162
q7	1909	1775	1586	1586
q8	2523	2190	2099	2099
q9	8003	8085	7950	7950
q10	4815	4803	4492	4492
q11	586	432	377	377
q12	q13	q14	295	293	266	266
q15	q16	677	680	651	651
q17	1287	1421	1062	1062
q18	7632	7603	6882	6882
q19	1257	1112	1116	1112
q20	2243	2242	1986	1986
q21	q22	Total cold run time: 47988 ms
Total hot run time: 44065 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169276 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2e9588c403dbba88a37f960ccec3c0e948734dc5, data reload: false

query5	4317	639	498	498
query6	439	211	181	181
query7	4814	576	321	321
query8	373	223	216	216
query9	8755	3983	4061	3983
query10	470	316	256	256
query11	5935	2382	2233	2233
query12	158	111	103	103
query13	1281	575	437	437
query14	6416	5479	5124	5124
query14_1	4460	4453	4430	4430
query15	209	202	182	182
query16	1005	507	448	448
query17	1185	733	607	607
query18	2620	483	360	360
query19	224	213	141	141
query20	112	107	106	106
query21	225	142	112	112
query22	13671	13605	13455	13455
query23	17161	16401	16288	16288
query23_1	16323	16261	16259	16259
query24	7611	1770	1298	1298
query24_1	1323	1321	1321	1321
query25	548	452	405	405
query26	1158	328	169	169
query27	2633	568	340	340
query28	4502	2065	2052	2052
query29	1068	607	488	488
query30	305	234	190	190
query31	1106	1086	969	969
query32	110	64	58	58
query33	522	324	257	257
query34	1210	1128	670	670
query35	753	788	670	670
query36	1409	1401	1259	1259
query37	156	108	93	93
query38	3221	3149	3047	3047
query39	936	928	905	905
query39_1	894	883	867	867
query40	220	122	105	105
query41	
query42	96	92	94	92
query43	323	326	293	293
query44	
query45	199	193	184	184
query46	1114	1198	744	744
query47	2341	2388	2231	2231
query48	374	408	309	309
query49	645	478	365	365
query50	1064	358	255	255
query51	4356	4339	4200	4200
query52	90	93	78	78
query53	
query54	272	218	231	218
query55	85	82	68	68
query56	232	223	233	223
query57	1481	1404	1306	1306
query58	259	226	219	219
query59	1649	1724	1434	1434
query60	285	253	236	236
query61	163	160	160	160
query62	716	634	589	589
query63	
query64	2402	798	629	629
query65	
query66	1715	458	344	344
query67	29860	29737	29785	29737
query68	
query69	412	315	266	266
query70	975	978	909	909
query71	300	224	214	214
query72	3031	2712	2612	2612
query73	863	756	458	458
query74	5120	5032	4762	4762
query75	2694	2620	2265	2265
query76	2352	1156	832	832
query77	370	391	293	293
query78	12507	12510	11784	11784
query79	1423	1099	770	770
query80	624	492	416	416
query81	466	289	287	287
query82	577	159	126	126
query83	359	273	249	249
query84	263	143	115	115
query85	897	555	448	448
query86	390	296	289	289
query87	3376	3371	3185	3185
query88	3626	2738	2677	2677
query89	
query90	1960	187	180	180
query91	176	166	142	142
query92	65	66	57	57
query93	1681	1523	851	851
query94	558	354	335	335
query95	682	382	354	354
query96	1146	808	335	335
query97	2703	2717	2570	2570
query98	214	217	204	204
query99	1172	1149	1041	1041
Total cold run time: 250954 ms
Total hot run time: 169276 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 23569 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 79cbf948f03b32c5c62698727ae0a631d245d882, data reload: false

------ Round 1 ----------------------------------
lineitem	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:38:59	NULL	utf-8	NULL	NULL	
supplier	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:25	NULL	utf-8	NULL	NULL	
region	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:24	NULL	utf-8	NULL	NULL	
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17894	4205	4111	4111
q2	q3	10036	1406	832	832
q4	4680	531	354	354
q5	7610	898	599	599
q6	190	172	141	141
q7	793	848	633	633
q8	9638	1660	1517	1517
q9	6525	4500	4500	4500
q10	6771	1822	1551	1551
q11	449	270	254	254
q12	q13	q14	289	261	237	237
q15	q16	791	728	712	712
q17	931	995	908	908
q18	7057	5912	5644	5644
q19	1554	1380	996	996
q20	527	388	267	267
q21	q22	451	370	313	313
Total cold run time: 76186 ms
Total hot run time: 23569 ms

----- Round 2, with runtime_filter_mode=off -----
customer	Doris	NULL	NULL	15000000	92	1381653732	NULL	4374759	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:43	NULL	utf-8	NULL	NULL	
lineitem	Doris	NULL	NULL	600037902	33	19843441616	NULL	61784740	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:38:59	NULL	utf-8	NULL	NULL	
supplier	Doris	NULL	NULL	1000000	87	87519212	NULL	194931	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:25	NULL	utf-8	NULL	NULL	
region	Doris	NULL	NULL	5	240	1201	NULL	147	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:27:24	NULL	utf-8	NULL	NULL	
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4279	4220	4212	4212
q2	q3	4632	4885	5506	4885
q4	2444	2346	1530	1530
q5	5070	4970	4942	4942
q6	238	172	130	130
q7	2068	1765	1599	1599
q8	2493	2076	2118	2076
q9	7957	7950	7886	7886
q10	4815	4725	4280	4280
q11	557	403	521	403
q12	q13	q14	321	292	265	265
q15	q16	682	666	641	641
q17	1275	1393	1060	1060
q18	7749	7517	7050	7050
q19	1133	1091	1055	1055
q20	2214	2215	1949	1949
q21	q22	Total cold run time: 47927 ms
Total hot run time: 43963 ms

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 79cbf948f03b32c5c62698727ae0a631d245d882, data reload: false

query5	4322	626	483	483
query6	451	196	189	189
query7	4875	551	315	315
query8	371	219	213	213
query9	8755	4140	4116	4116
query10	459	321	267	267
query11	5932	2421	2144	2144
query12	152	105	99	99
query13	1256	597	444	444
query14	6417	5407	5107	5107
query14_1	4442	4460	4445	4445
query15	210	204	186	186
query16	1070	470	466	466
query17	1141	733	602	602
query18	2751	492	379	379
query19	206	175	144	144
query20	111	110	103	103
query21	220	137	118	118
query22	13797	13618	13414	13414
query23	17285	16482	16168	16168
query23_1	16291	16322	16348	16322
query24	7585	1772	1312	1312
query24_1	1296	1298	1275	1275
query25	540	462	382	382
query26	1039	336	161	161
query27	2660	522	341	341
query28	4516	2046	2046	2046
query29	1040	583	477	477
query30	311	247	195	195
query31	1124	1057	966	966
query32	113	64	59	59
query33	511	324	248	248
query34	1202	1167	654	654
query35	759	777	681	681
query36	1429	1397	1240	1240
query37	157	100	92	92
query38	3215	3167	3028	3028
query39	929	912	920	912
query39_1	879	900	870	870
query40	216	122	110	110
query41	
query42	98	92	95	92
query43	324	328	282	282
query44	
query45	192	190	181	181
query46	1113	1169	738	738
query47	2363	2344	2246	2246
query48	414	403	292	292
query49	607	469	362	362
query50	971	351	258	258
query51	4316	4363	4278	4278
query52	89	117	77	77
query53	
query54	267	225	193	193
query55	83	82	69	69
query56	231	226	207	207
query57	1448	1393	1311	1311
query58	269	222	202	202
query59	1636	1708	1472	1472
query60	283	253	232	232
query61	167	156	152	152
query62	703	663	594	594
query63	
query64	2283	787	637	637
query65	
query66	1677	468	342	342
query67	29755	29674	29603	29603
query68	
query69	439	296	263	263
query70	991	962	914	914
query71	297	228	213	213
query72	2932	2798	2617	2617
query73	848	735	449	449
query74	5151	4952	4775	4775
query75	2688	2576	2260	2260
query76	2326	1157	787	787
query77	357	374	297	297
query78	12452	12524	11880	11880
query79	1431	1006	723	723
query80	786	476	379	379
query81	497	282	239	239
query82	701	156	121	121
query83	363	282	263	263
query84	260	141	110	110
query85	930	543	456	456
query86	420	294	286	286
query87	3397	3389	3180	3180
query88	3628	2724	2714	2714
query89	
query90	1767	175	175	175
query91	171	167	138	138
query92	65	65	59	59
query93	1598	1489	873	873
query94	637	348	327	327
query95	686	372	352	352
query96	1034	802	369	369
query97	2701	2737	2568	2568
query98	217	216	203	203
query99	1198	1144	998	998
Total cold run time: 250864 ms
Total hot run time: 169035 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29593 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9e25d868a01c7e53211b37170287e6423bfa799f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17595	4148	4108	4108
q2	q3	10775	1443	810	810
q4	4685	486	349	349
q5	7541	894	599	599
q6	184	177	139	139
q7	807	832	646	646
q8	9366	1615	1569	1569
q9	5823	4491	4522	4491
q10	6766	1807	1561	1561
q11	440	273	251	251
q12	642	422	300	300
q13	18139	3535	2932	2932
q14	265	260	247	247
q15	q16	829	782	716	716
q17	964	947	859	859
q18	7014	5918	5540	5540
q19	1349	1288	1105	1105
q20	521	426	275	275
q21	6218	2785	2859	2785
q22	461	383	311	311
Total cold run time: 100384 ms
Total hot run time: 29593 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5227	4849	4952	4849
q2	q3	4911	5329	4740	4740
q4	2138	2198	1417	1417
q5	4813	4841	4782	4782
q6	235	181	131	131
q7	1853	1772	1627	1627
q8	2432	2163	2117	2117
q9	8028	7777	7431	7431
q10	4772	4700	4226	4226
q11	543	394	358	358
q12	733	741	532	532
q13	3179	3680	2869	2869
q14	273	287	260	260
q15	q16	687	708	624	624
q17	1290	1266	1268	1266
q18	7179	6832	6655	6655
q19	1111	1065	1096	1065
q20	2241	2224	1945	1945
q21	5349	4690	4476	4476
q22	526	484	413	413
Total cold run time: 57520 ms
Total hot run time: 51783 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169887 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9e25d868a01c7e53211b37170287e6423bfa799f, data reload: false

query5	4333	666	487	487
query6	459	202	180	180
query7	4819	562	305	305
query8	371	222	215	215
query9	8768	4085	4084	4084
query10	433	312	270	270
query11	5731	2365	2190	2190
query12	154	101	96	96
query13	1258	627	451	451
query14	6385	5459	5078	5078
query14_1	4411	4422	4422	4422
query15	206	196	176	176
query16	979	459	411	411
query17	939	696	581	581
query18	2427	470	335	335
query19	198	183	139	139
query20	109	108	105	105
query21	220	136	118	118
query22	13672	13540	13383	13383
query23	17358	16558	16145	16145
query23_1	16377	16283	16206	16206
query24	7547	1760	1344	1344
query24_1	1338	1316	1325	1316
query25	578	478	410	410
query26	1317	329	176	176
query27	2706	580	348	348
query28	4537	2103	2097	2097
query29	1106	627	500	500
query30	308	239	203	203
query31	1138	1080	976	976
query32	105	66	63	63
query33	538	330	257	257
query34	1211	1133	643	643
query35	764	801	687	687
query36	1385	1391	1272	1272
query37	157	109	94	94
query38	3256	3159	3038	3038
query39	932	908	895	895
query39_1	890	879	866	866
query40	223	131	108	108
query41	71	69	67	67
query42	98	99	97	97
query43	319	339	294	294
query44	
query45	200	191	182	182
query46	1115	1222	752	752
query47	2327	2390	2244	2244
query48	416	438	311	311
query49	656	486	370	370
query50	972	361	269	269
query51	4359	4291	4233	4233
query52	96	94	87	87
query53	249	271	195	195
query54	288	236	214	214
query55	82	77	73	73
query56	251	242	239	239
query57	1446	1407	1344	1344
query58	259	236	229	229
query59	1612	1654	1427	1427
query60	295	260	243	243
query61	188	201	152	152
query62	699	646	585	585
query63	235	187	186	186
query64	2529	782	636	636
query65	
query66	1790	464	345	345
query67	29842	29800	29615	29615
query68	
query69	419	303	276	276
query70	968	973	967	967
query71	291	223	210	210
query72	2986	2698	2439	2439
query73	886	765	432	432
query74	5199	4959	4772	4772
query75	2667	2601	2239	2239
query76	2384	1177	802	802
query77	359	374	292	292
query78	12436	12446	11890	11890
query79	1202	1126	789	789
query80	532	499	404	404
query81	449	280	241	241
query82	230	160	123	123
query83	280	294	248	248
query84	292	141	116	116
query85	845	552	450	450
query86	327	311	298	298
query87	3349	3346	3214	3214
query88	3697	2765	2765	2765
query89	407	384	330	330
query90	2151	181	183	181
query91	179	170	138	138
query92	67	62	62	62
query93	1386	1467	881	881
query94	538	303	330	303
query95	707	480	345	345
query96	999	796	359	359
query97	2698	2715	2550	2550
query98	214	204	209	204
query99	1144	1170	1056	1056
Total cold run time: 250614 ms
Total hot run time: 169887 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (88/88) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.84% (28216/38211)
Line Coverage 57.80% (307013/531198)
Region Coverage 54.58% (257418/471649)
Branch Coverage 56.03% (111535/199069)

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: request changes. I found one blocking issue in the virtual-column materialization fallback.

Critical checkpoint conclusions:

  • Goal/test: The PR appears intended to evaluate storage pushdown and virtual column expressions against the final projected scan schema while avoiding per-expression rebinding. The changed flow mostly follows that goal, but the virtual expression fallback loses an exception boundary. I did not find new tests in the PR that cover this fallback/error path.
  • Scope: The change is focused on segment iterator projection/schema mapping.
  • Concurrency/lifecycle: No new shared mutable state, lock ordering, or special lifecycle concerns found in the changed paths.
  • Config/compatibility/protocol: No new configs, storage formats, or FE-BE protocol fields found.
  • Parallel paths: I checked common expr pushdown, ANN/score virtual materialization, and virtual projection materialization. The reported issue is specific to the virtual projection fallback replacing VExprContext::execute().
  • Error handling: Blocking issue found: expression exceptions can now escape instead of being converted to Status in _materialization_of_virtual_column().
  • Data correctness/storage invariants: No transaction, visible-version, delete-bitmap, or rowset lifecycle changes found.
  • Performance/observability: No blocking performance or observability issue found; the additional projected block construction is localized to expression evaluation paths.
  • Test coverage: The PR should add or update coverage for virtual column materialization through the projected schema, including an expression that can throw and should return query failure Status rather than escape.

Existing review context: I read the existing inline thread about _build_project_block() and non-materialized projected columns and did not duplicate it.
User focus: No additional user-provided review focus was specified.

Comment thread be/src/storage/segment/segment_iterator.cpp Outdated
return Status::OK();
}

Status SegmentIterator::_init_project_schema() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_schema_block_id_map.assign(_schema->columns().size(), -1);
    for (int i = 0; i < _schema->num_column_ids(); i++) {
        auto cid = _schema->column_id(i);
        _schema_block_id_map[cid] = i;
    }

这个初始化的代码在_vec_init_lazy_materialization也会被调用,需要统一一下

ColumnPtr result_column;
RETURN_IF_ERROR(column_expr->execute(block, result_column));
Block project_block;
RETURN_IF_ERROR(_build_project_block(block, &project_block));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是不是应该放到循环外面,如果有多个虚拟列表达式这个要搞好几遍?

}

Status SegmentIterator::_build_project_block(Block* block, Block* project_block) {
project_block->clear();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个现在普通的dup key表,MOW表也会走到这里,这里应该加一个快速路径的判断,就是如果发现这个project schema和普通schema一样的话,就直接返回block吧

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64010

Problem Summary: Address review comments for avoiding storage expression column id rebinding. The project-schema column-id to block-position map was initialized in multiple places, and project-block construction still ran on ordinary scan paths where the project schema matched the reader schema. Virtual column materialization also rebuilt the project block for each expression. This change centralizes schema block mapping initialization, reuses the original block when the project schema is identical to the reader schema, reuses the projected block during virtual column materialization, and keeps expression exceptions converted to Status at the virtual-column evaluation boundary.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./build.sh --be
    - build-support/run-clang-tidy.sh attempted; it reported pre-existing/analysis diagnostics in segment_iterator.cpp, segment_iterator.h, and be/src/util/jni-util.h unrelated to the changed logic.
- Behavior changed: No
- Does this need documentation: No
Comment thread be/src/storage/rowset/beta_rowset_reader.cpp Outdated
@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29531 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b450ca16e2db563c6a7246f04808336448807b47, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17980	4251	4256	4251
q2	q3	10744	1454	848	848
q4	4685	483	354	354
q5	7541	891	596	596
q6	196	176	138	138
q7	797	842	660	660
q8	9466	1771	1527	1527
q9	5957	4577	4566	4566
q10	6775	1804	1551	1551
q11	436	284	259	259
q12	644	431	296	296
q13	18107	3484	2733	2733
q14	274	264	251	251
q15	q16	839	781	718	718
q17	1018	909	892	892
q18	6902	5700	5514	5514
q19	1312	1269	1120	1120
q20	535	396	267	267
q21	6196	2935	2653	2653
q22	476	388	337	337
Total cold run time: 100880 ms
Total hot run time: 29531 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5325	5048	4998	4998
q2	q3	5036	5237	4691	4691
q4	2123	2234	1412	1412
q5	4836	5014	4798	4798
q6	227	174	129	129
q7	1933	1843	1612	1612
q8	2541	2212	2219	2212
q9	8044	7752	7487	7487
q10	4771	4695	4220	4220
q11	552	412	383	383
q12	737	743	543	543
q13	3118	3468	2801	2801
q14	280	277	270	270
q15	q16	696	709	662	662
q17	1327	1283	1276	1276
q18	7408	6891	7008	6891
q19	1132	1101	1123	1101
q20	2253	2228	1960	1960
q21	5428	4688	4648	4648
q22	527	463	406	406
Total cold run time: 58294 ms
Total hot run time: 52500 ms

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: request changes.

I did not open duplicate new inline threads for the already-known origin_return_columns / project_columns nullability concern. Instead, I replied to the existing threads with concrete production paths that still fail on the current PR head:

  • Direct Segment::new_iterator() callers such as be/src/storage/task/index_builder.cpp:563-570 and be/src/storage/rowset/segcompaction.cpp:93-115 construct StorageReadOptions without project_columns, then SegmentIterator::_init_project_schema() hits DORIS_CHECK(_opts.project_columns != nullptr).
  • Direct RowsetReaderContext callers such as be/src/storage/schema_change/schema_change.cpp:1057-1076 and be/src/cloud/cloud_schema_change_job.cpp:258-280 set return_columns but not origin_return_columns, then BetaRowsetReader::get_segment_iterators() hits DORIS_CHECK(_read_context->origin_return_columns != nullptr).

Critical checkpoint conclusions:

  • Goal and test proof: The PR goal, avoiding storage expression column-id rebinding by evaluating against project-schema layout, is reasonable. The current implementation is not complete because project column metadata is not propagated to all production iterator entry points. No PR tests cover these direct iterator/schema-change paths.
  • Scope and clarity: The changed code is focused, but it changes a core storage read invariant now required by all segment iterator users.
  • Concurrency: No new shared mutable state, lock ordering, or thread-safety issue found.
  • Lifecycle: No new static/global lifecycle issue found. Pointer lifetimes are acceptable where set, but several direct paths never set the pointer.
  • Configuration and compatibility: No new configuration, storage format, EditLog, or FE-BE protocol compatibility issue found.
  • Parallel paths: This is the blocking area. Query TabletReader paths mostly propagate origin_return_columns, but direct Segment::new_iterator() and direct RowsetReaderContext paths are not updated.
  • Conditional checks/error handling: The new DORIS_CHECK assertions are valid only after all callers are updated. As written, normal schema change, cloud schema change, index build, and segment compaction paths can crash/fail before doing useful work.
  • Data correctness: No transaction visibility, delete bitmap versioning, or persistence change was found. The blocker is availability/correct execution of read paths.
  • Memory/performance/observability: No additional blocking memory-tracking or observability issue found. Project-block overhead is localized and secondary to the propagation bug.
  • Tests/results: git diff --check passed for the reviewed files. GitHub macOS BE UT failed before tests due runner JDK 25 vs required JDK 17, so it does not prove this PR behavior. Linux/TeamCity checks were still pending when reviewed.

User focus: .code-review.0DYI4r/review_focus.txt contained no additional user-provided focus points.

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169508 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b450ca16e2db563c6a7246f04808336448807b47, data reload: false

query5	4320	637	489	489
query6	431	201	182	182
query7	4887	588	316	316
query8	374	253	200	200
query9	8778	4113	4155	4113
query10	467	322	272	272
query11	5931	2348	2174	2174
query12	155	107	103	103
query13	1293	612	442	442
query14	6396	5425	5107	5107
query14_1	4449	4427	4390	4390
query15	208	202	167	167
query16	985	435	453	435
query17	930	694	557	557
query18	2428	462	347	347
query19	201	195	139	139
query20	107	112	104	104
query21	218	142	121	121
query22	13754	13623	13607	13607
query23	17337	16506	16249	16249
query23_1	16273	16267	16349	16267
query24	7562	1766	1302	1302
query24_1	1319	1322	1323	1322
query25	564	448	405	405
query26	1302	317	160	160
query27	2712	511	329	329
query28	4476	2030	2034	2030
query29	1091	644	477	477
query30	307	224	198	198
query31	1133	1089	956	956
query32	99	61	59	59
query33	519	321	257	257
query34	1185	1153	629	629
query35	766	776	680	680
query36	1388	1407	1300	1300
query37	155	100	92	92
query38	3247	3172	3047	3047
query39	935	915	910	910
query39_1	873	867	878	867
query40	222	121	102	102
query41	68	63	62	62
query42	94	95	97	95
query43	329	329	284	284
query44	
query45	201	188	177	177
query46	1108	1223	752	752
query47	2336	2427	2265	2265
query48	407	411	286	286
query49	633	470	348	348
query50	978	362	255	255
query51	4372	4287	4213	4213
query52	89	90	76	76
query53	249	290	196	196
query54	278	215	217	215
query55	79	79	78	78
query56	252	235	220	220
query57	1454	1394	1323	1323
query58	244	219	212	212
query59	1601	1697	1485	1485
query60	284	253	234	234
query61	163	157	153	153
query62	726	650	599	599
query63	235	190	193	190
query64	2600	861	664	664
query65	
query66	1795	470	371	371
query67	29236	29177	29536	29177
query68	
query69	447	324	283	283
query70	967	928	968	928
query71	312	232	216	216
query72	3212	2690	2397	2397
query73	858	797	437	437
query74	5154	4962	4802	4802
query75	2677	2600	2249	2249
query76	2317	1148	788	788
query77	356	401	296	296
query78	12399	12294	11918	11918
query79	1335	1050	751	751
query80	546	473	416	416
query81	457	284	250	250
query82	244	167	123	123
query83	357	284	252	252
query84	
query85	887	507	436	436
query86	360	306	286	286
query87	3408	3395	3266	3266
query88	3629	2842	2772	2772
query89	416	390	330	330
query90	2043	174	172	172
query91	175	166	143	143
query92	65	61	58	58
query93	1537	1430	874	874
query94	549	345	307	307
query95	710	475	345	345
query96	1006	829	362	362
query97	2708	2710	2581	2581
query98	216	207	206	206
query99	1153	1187	1032	1032
Total cold run time: 250510 ms
Total hot run time: 169508 ms

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#64010

Problem Summary: Review feedback pointed out that the segment iterator should work from an explicit project schema, and that the project block helper should fill caller-owned storage instead of returning a selected block pointer. Follow-up coverage showed that direct RowsetReader and Segment::new_iterator callers do not pass the TabletReader-expanded project schema fields. The reader now keeps the SegmentIterator project_columns invariant, while BetaRowsetReader maps direct callers to return_columns and Segment::new_iterator maps direct callers to schema column IDs. The project block helper remains caller-owned and the ANN source column ordinal translation is documented.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=TestDeltaWriterClusterKey.vec_sequence_col
- Behavior changed: No
- Does this need documentation: No
@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/avoid-expr-rebind branch from b450ca1 to edcdc78 Compare June 10, 2026 16:06
@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

/review

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29113 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit edcdc7828e2a7a8e31ee9083a4e55e28d24c0379, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17768	4142	4063	4063
q2	q3	10756	1409	834	834
q4	4698	474	346	346
q5	7552	872	600	600
q6	213	196	152	152
q7	818	864	631	631
q8	10080	1721	1653	1653
q9	7285	4517	4527	4517
q10	6736	1797	1517	1517
q11	434	282	250	250
q12	636	431	306	306
q13	18189	3327	2859	2859
q14	275	259	239	239
q15	q16	821	766	711	711
q17	1010	1064	917	917
q18	7038	5704	5455	5455
q19	1337	1242	1160	1160
q20	520	414	256	256
q21	6016	2539	2331	2331
q22	441	359	316	316
Total cold run time: 102623 ms
Total hot run time: 29113 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4462	4379	4388	4379
q2	q3	4540	4934	4299	4299
q4	2144	2225	1394	1394
q5	4472	4325	4367	4325
q6	242	186	135	135
q7	2143	1947	1731	1731
q8	2719	2295	2255	2255
q9	8108	8009	8020	8009
q10	4807	4739	4304	4304
q11	618	639	412	412
q12	752	737	550	550
q13	3275	3627	3168	3168
q14	305	317	264	264
q15	q16	732	752	640	640
q17	1397	1340	1369	1340
q18	8217	7665	7263	7263
q19	1164	1075	1115	1075
q20	2225	2264	1949	1949
q21	5363	4686	4623	4623
q22	522	464	421	421
Total cold run time: 58207 ms
Total hot run time: 52536 ms

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

I reviewed the current five-file BE storage refactor around project schema plumbing, common expression evaluation, and virtual column materialization. I also read the existing inline review threads and did not find a new non-duplicate blocking issue to raise inline; the remaining suspicious virtual/output block-layout discussion is already covered by the existing thread, so I did not repeat it.

Critical checkpoint conclusions:

  • Goal and tests: The PR goal is to avoid rebinding storage expression column ids by evaluating pushed-down expressions against a projected schema. The current patch wires project_columns through normal TabletReader paths, direct RowsetReaderContext paths, and Segment::new_iterator fallbacks. I did not run local tests in this review pass; the visible BE UT macOS CI failure appears to be an environment issue (JAVA version is 25, required JDK 17), not this code path.
  • Scope and simplicity: The change is scoped to rowset/segment iterator plumbing and does not introduce a new broad abstraction beyond the projected-schema helper path.
  • Data correctness: I checked the non-direct aggregation path, direct schema-change/index/segcompaction paths, and virtual column materialization path. The current _build_project_block maps projected column ids back through _schema_block_id_map, which addresses the ordinal-mismatch class already discussed in earlier comments.
  • Concurrency and lifecycle: No new shared mutable state or locking behavior was introduced. The project_columns pointer is consumed during iterator initialization to build _project_schema, and the Segment::new_iterator fallback points at the schema owned by the iterator.
  • Error handling and memory: Status/exception boundaries are preserved in the pushed-down expression path, including the virtual execute_column call. I did not identify a new allocation or ownership issue.
  • Compatibility and persistence: No storage format, RPC, FE/BE protocol, config, or persistence compatibility change was found.
  • Performance: The fast path avoids building a projected block when the projected schema matches the reader schema; otherwise the extra block is limited to expression evaluation/materialization paths.
  • Observability: No additional logging or metrics seem required for this refactor.
  • User focus: The focus file contained no additional user-provided review focus beyond the full PR review.

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169905 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit edcdc7828e2a7a8e31ee9083a4e55e28d24c0379, data reload: false

query5	4326	621	477	477
query6	487	205	184	184
query7	4869	566	308	308
query8	371	217	205	205
query9	8786	4086	4069	4069
query10	451	315	254	254
query11	5948	2333	2221	2221
query12	168	110	100	100
query13	1358	616	422	422
query14	6435	5366	5077	5077
query14_1	4397	4383	4379	4379
query15	208	194	180	180
query16	1086	450	432	432
query17	1133	690	584	584
query18	2445	492	348	348
query19	211	180	140	140
query20	112	109	104	104
query21	223	147	122	122
query22	13687	13582	13443	13443
query23	17318	16508	16464	16464
query23_1	16599	16448	16434	16434
query24	7483	1777	1329	1329
query24_1	1295	1350	1302	1302
query25	600	477	415	415
query26	1317	324	178	178
query27	2653	556	346	346
query28	4511	2045	2039	2039
query29	1104	626	506	506
query30	315	238	198	198
query31	1126	1081	958	958
query32	114	63	63	63
query33	547	339	263	263
query34	1184	1175	649	649
query35	767	808	694	694
query36	1391	1379	1239	1239
query37	159	112	96	96
query38	3248	3156	3063	3063
query39	927	920	889	889
query39_1	891	879	877	877
query40	227	128	106	106
query41	71	69	70	69
query42	101	99	97	97
query43	322	331	291	291
query44	
query45	195	188	187	187
query46	1090	1184	774	774
query47	2342	2368	2231	2231
query48	404	421	308	308
query49	667	487	387	387
query50	987	362	261	261
query51	4347	4309	4292	4292
query52	90	93	79	79
query53	256	275	195	195
query54	313	255	211	211
query55	83	79	73	73
query56	255	241	230	230
query57	1425	1444	1298	1298
query58	261	229	223	223
query59	1634	1688	1488	1488
query60	299	272	239	239
query61	217	154	160	154
query62	717	654	600	600
query63	226	188	189	188
query64	2609	793	643	643
query65	
query66	1822	478	343	343
query67	29734	29726	29558	29558
query68	
query69	429	313	265	265
query70	951	984	910	910
query71	310	223	209	209
query72	3003	2683	2413	2413
query73	852	793	434	434
query74	5136	4957	4802	4802
query75	2650	2570	2246	2246
query76	2358	1142	789	789
query77	360	387	283	283
query78	12293	12512	11824	11824
query79	1478	1083	774	774
query80	604	473	414	414
query81	457	286	265	265
query82	612	162	122	122
query83	374	293	254	254
query84	
query85	936	543	439	439
query86	403	287	294	287
query87	3391	3344	3172	3172
query88	3664	2751	2754	2751
query89	418	390	331	331
query90	1976	185	179	179
query91	174	162	148	148
query92	63	63	59	59
query93	1484	1526	905	905
query94	596	324	313	313
query95	684	475	354	354
query96	1077	830	349	349
query97	2776	2716	2567	2567
query98	217	205	202	202
query99	1160	1180	1021	1021
Total cold run time: 252276 ms
Total hot run time: 169905 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 89.80% (132/147) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.86% (28285/38295)
Line Coverage 57.94% (308403/532270)
Region Coverage 54.86% (258670/471511)
Branch Coverage 56.22% (112218/199608)

HappenLee
HappenLee previously approved these changes Jun 11, 2026

@HappenLee HappenLee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 11, 2026
Comment thread be/src/storage/segment/segment.cpp Outdated
@@ -253,19 +253,23 @@ Status Segment::new_iterator(SchemaSPtr schema, const StorageReadOptions& read_o
if (read_options.runtime_state != nullptr) {
_be_exec_version = read_options.runtime_state->be_exec_version();
}
RETURN_IF_ERROR(_create_column_meta_once(read_options.stats));
auto iterator_read_options = read_options;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为什么要copy 一遍?

Comment thread be/src/storage/segment/segment.cpp Outdated
@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Jun 11, 2026
@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29344 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5612ce0234a728223105330f5ccee3ff4d5543e0, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17675	4051	4035	4035
q2	q3	10777	1424	797	797
q4	4683	488	365	365
q5	7538	897	602	602
q6	180	171	133	133
q7	761	857	616	616
q8	9324	1649	1678	1649
q9	5850	4561	4497	4497
q10	6704	1784	1512	1512
q11	453	269	251	251
q12	625	428	284	284
q13	18130	3575	2817	2817
q14	267	264	239	239
q15	q16	816	766	707	707
q17	1009	933	1036	933
q18	7140	5776	5548	5548
q19	1382	1347	1136	1136
q20	551	398	270	270
q21	6376	2847	2630	2630
q22	473	377	323	323
Total cold run time: 100714 ms
Total hot run time: 29344 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5015	4816	4754	4754
q2	q3	4859	5280	4636	4636
q4	2122	2189	1405	1405
q5	4797	5085	4688	4688
q6	231	178	130	130
q7	1824	1778	1626	1626
q8	2463	2119	2074	2074
q9	7939	7792	7429	7429
q10	4714	4640	4228	4228
q11	532	393	354	354
q12	734	734	523	523
q13	2967	3316	2789	2789
q14	269	280	260	260
q15	q16	674	700	601	601
q17	1287	1266	1251	1251
q18	7237	6645	6806	6645
q19	1121	1127	1091	1091
q20	2206	2201	1942	1942
q21	5280	4592	4402	4402
q22	530	455	409	409
Total cold run time: 56801 ms
Total hot run time: 51237 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 168921 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5612ce0234a728223105330f5ccee3ff4d5543e0, data reload: false

query5	4336	603	465	465
query6	429	196	168	168
query7	4811	549	298	298
query8	355	214	198	198
query9	8756	4010	4055	4010
query10	451	309	260	260
query11	5921	2339	2140	2140
query12	159	101	95	95
query13	1301	600	407	407
query14	6382	5811	5016	5016
query14_1	4335	4358	4332	4332
query15	202	195	176	176
query16	996	454	439	439
query17	1022	671	559	559
query18	2426	477	334	334
query19	193	181	137	137
query20	112	109	102	102
query21	215	134	116	116
query22	13736	13612	13396	13396
query23	17351	16444	16109	16109
query23_1	16257	16245	16320	16245
query24	7502	1764	1278	1278
query24_1	1313	1293	1318	1293
query25	566	457	396	396
query26	1296	334	162	162
query27	2652	536	333	333
query28	4478	2016	1996	1996
query29	1074	614	496	496
query30	308	234	196	196
query31	1123	1075	955	955
query32	101	61	58	58
query33	525	316	284	284
query34	1175	1159	649	649
query35	742	775	719	719
query36	1422	1385	1295	1295
query37	146	98	87	87
query38	3186	3160	3009	3009
query39	936	919	896	896
query39_1	880	853	878	853
query40	215	121	99	99
query41	63	62	60	60
query42	94	91	93	91
query43	333	317	278	278
query44	
query45	193	179	177	177
query46	1081	1229	761	761
query47	2369	2442	2235	2235
query48	399	406	288	288
query49	600	467	350	350
query50	1008	340	255	255
query51	4305	4284	4241	4241
query52	88	85	75	75
query53	239	269	192	192
query54	275	211	203	203
query55	76	75	70	70
query56	237	218	214	214
query57	1454	1405	1323	1323
query58	238	213	210	210
query59	1545	1624	1429	1429
query60	279	243	234	234
query61	154	156	154	154
query62	706	640	586	586
query63	240	185	185	185
query64	2517	773	603	603
query65	
query66	1810	466	332	332
query67	29675	29554	29496	29496
query68	
query69	419	291	261	261
query70	958	992	955	955
query71	296	232	213	213
query72	2944	2663	2339	2339
query73	869	790	425	425
query74	5111	4923	4788	4788
query75	2652	2562	2247	2247
query76	2308	1171	773	773
query77	346	402	271	271
query78	12409	12343	11930	11930
query79	1249	1078	731	731
query80	518	465	375	375
query81	448	275	242	242
query82	243	155	118	118
query83	263	266	241	241
query84	
query85	836	491	422	422
query86	353	300	270	270
query87	3380	3310	3221	3221
query88	3577	2740	2716	2716
query89	408	382	322	322
query90	2195	178	175	175
query91	175	155	130	130
query92	64	58	52	52
query93	1547	1416	891	891
query94	532	356	319	319
query95	660	393	441	393
query96	1123	792	355	355
query97	2686	2684	2554	2554
query98	214	211	204	204
query99	1135	1175	1033	1033
Total cold run time: 249572 ms
Total hot run time: 168921 ms

BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jun 11, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64010

Problem Summary: Address review comments for avoiding storage expression column id rebinding. The project-schema column-id to block-position map was initialized in multiple places, and project-block construction still ran on ordinary scan paths where the project schema matched the reader schema. Virtual column materialization also rebuilt the project block for each expression. This change centralizes schema block mapping initialization, reuses the original block when the project schema is identical to the reader schema, reuses the projected block during virtual column materialization, and keeps expression exceptions converted to Status at the virtual-column evaluation boundary.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./build.sh --be
    - build-support/run-clang-tidy.sh attempted; it reported pre-existing/analysis diagnostics in segment_iterator.cpp, segment_iterator.h, and be/src/util/jni-util.h unrelated to the changed logic.
- Behavior changed: No
- Does this need documentation: No
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jun 11, 2026
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#64010

Problem Summary: Review feedback pointed out that the segment iterator should work from an explicit project schema, and that the project block helper should fill caller-owned storage instead of returning a selected block pointer. Follow-up coverage showed that direct RowsetReader and Segment::new_iterator callers do not pass the TabletReader-expanded project schema fields. The reader now keeps the SegmentIterator project_columns invariant, while BetaRowsetReader maps direct callers to return_columns and Segment::new_iterator maps direct callers to schema column IDs. The project block helper remains caller-owned and the ANN source column ordinal translation is documented.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=TestDeltaWriterClusterKey.vec_sequence_col
- Behavior changed: No
- Does this need documentation: No
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jun 12, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64010

Problem Summary: Address review comments for avoiding storage expression column id rebinding. The project-schema column-id to block-position map was initialized in multiple places, and project-block construction still ran on ordinary scan paths where the project schema matched the reader schema. Virtual column materialization also rebuilt the project block for each expression. This change centralizes schema block mapping initialization, reuses the original block when the project schema is identical to the reader schema, reuses the projected block during virtual column materialization, and keeps expression exceptions converted to Status at the virtual-column evaluation boundary.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./build.sh --be
    - build-support/run-clang-tidy.sh attempted; it reported pre-existing/analysis diagnostics in segment_iterator.cpp, segment_iterator.h, and be/src/util/jni-util.h unrelated to the changed logic.
- Behavior changed: No
- Does this need documentation: No
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jun 12, 2026
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#64010

Problem Summary: Review feedback pointed out that the segment iterator should work from an explicit project schema, and that the project block helper should fill caller-owned storage instead of returning a selected block pointer. Follow-up coverage showed that direct RowsetReader and Segment::new_iterator callers do not pass the TabletReader-expanded project schema fields. The reader now keeps the SegmentIterator project_columns invariant, while BetaRowsetReader maps direct callers to return_columns and Segment::new_iterator maps direct callers to schema column IDs. The project block helper remains caller-owned and the ANN source column ordinal translation is documented.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=TestDeltaWriterClusterKey.vec_sequence_col
- Behavior changed: No
- Does this need documentation: No
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jun 13, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64010

Problem Summary: Address review comments for avoiding storage expression column id rebinding. The project-schema column-id to block-position map was initialized in multiple places, and project-block construction still ran on ordinary scan paths where the project schema matched the reader schema. Virtual column materialization also rebuilt the project block for each expression. This change centralizes schema block mapping initialization, reuses the original block when the project schema is identical to the reader schema, reuses the projected block during virtual column materialization, and keeps expression exceptions converted to Status at the virtual-column evaluation boundary.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./build.sh --be
    - build-support/run-clang-tidy.sh attempted; it reported pre-existing/analysis diagnostics in segment_iterator.cpp, segment_iterator.h, and be/src/util/jni-util.h unrelated to the changed logic.
- Behavior changed: No
- Does this need documentation: No
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jun 13, 2026
Issue Number: N/A

Related PR: apache#64010

Problem Summary: Review feedback pointed out that the segment iterator should work from an explicit project schema, and that the project block helper should fill caller-owned storage instead of returning a selected block pointer. Follow-up coverage showed that direct RowsetReader and Segment::new_iterator callers do not pass the TabletReader-expanded project schema fields. The reader now keeps the SegmentIterator project_columns invariant, while BetaRowsetReader maps direct callers to return_columns and Segment::new_iterator maps direct callers to schema column IDs. The project block helper remains caller-owned and the ANN source column ordinal translation is documented.

None

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=TestDeltaWriterClusterKey.vec_sequence_col
- Behavior changed: No
- Does this need documentation: No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants