[improvement](fe) Bootstrap table stats after insert into select #64332
wenzhenghu wants to merge 15 commits into
TPC-H: Total hot run time: 28998 ms
TPC-DS: Total hot run time: 169188 ms
TPC-H: Total hot run time: 28913 ms
TPC-DS: Total hot run time: 168991 ms
TPC-H: Total hot run time: 29257 ms
TPC-DS: Total hot run time: 169853 ms
The pipeline failed the nocurrent test case because of a memory leak in unrelated C++ code; all other test cases passed.
TPC-H: Total hot run time: 29105 ms
TPC-DS: Total hot run time: 170763 ms
The pipeline failed the cloud_p0 case because of unrelated node liveness problems; all other cases succeeded.
TPC-H: Total hot run time: 28846 ms
TPC-DS: Total hot run time: 170350 ms
All test cases passed successfully.
TPC-H: Total hot run time: 29070 ms
TPC-DS: Total hot run time: 170391 ms
Passed gpt 5.5 review.
TPC-H: Total hot run time: 28758 ms
TPC-DS: Total hot run time: 168714 ms
TPC-H: Total hot run time: 29368 ms
TPC-DS: Total hot run time: 169945 ms
What problem does this PR solve?
In ETL scenarios, a large table (e.g. 262K rows) is created via CTAS or INSERT INTO SELECT and then immediately joined with a known small table (e.g. 10 rows). Because auto-analyze has not yet completed, the FE optimizer cannot obtain the row count of the new table and falls back to 1, so the large table is incorrectly chosen as the broadcast (replicated) side. This leads to excessive memory usage and query cancellation.
The root cause chain: after CTAS/INSERT INTO SELECT becomes VISIBLE, the new table has no `TableStatsMeta`. `StatsCalculator.getOlapTableRowCount()` receives `-1`, which is clamped by `Math.max(1, -1)` to `1`. If the small table has been analyzed and has a known row count (e.g. 10), the broadcast cost model considers `1 < 10` and broadcasts the large table.
Solution
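As a rough illustration of why a real row count matters here, the sketch below reproduces the clamp and a deliberately simplified "smaller side is broadcast" rule. The class name, the method names, and the comparison rule are assumptions made for illustration; the actual Doris cost model is more involved.

```java
class BroadcastSideSketch {
    // Mirrors the clamp described in the root cause chain:
    // an unknown row count (-1) is turned into 1.
    static long effectiveRowCount(long reportedRowCount) {
        return Math.max(1, reportedRowCount);
    }

    // Hypothetical simplification: the side that looks smaller is broadcast.
    static String broadcastSide(long largeTableRows, long smallTableRows) {
        long large = effectiveRowCount(largeTableRows);
        long small = effectiveRowCount(smallTableRows);
        return large < small ? "large" : "small";
    }

    public static void main(String[] args) {
        // No stats yet: -1 is clamped to 1, and 1 < 10 picks the large table.
        System.out.println(broadcastSide(-1, 10));
        // With a bootstrapped row count, the small table is picked instead.
        System.out.println(broadcastSide(262_144, 10));
    }
}
```

Under these assumptions the first call selects the large table and the second selects the small one, which is the behavior change this PR is after.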
After the CTAS/INSERT INTO SELECT transaction becomes VISIBLE, bootstrap a minimal `TableStatsMeta` that contains only the table-level and base-index row counts, without any column statistics. This allows the optimizer to consume the row count for correct broadcast-side selection.
Core changes:
- `TableStatsMeta.newBootstrapStats()`: creates a `TableStatsMeta` with only `rowCount`, `updatedRows`, and base index `indexesRowCount`. Does not set `userInjected` and does not interfere with subsequent auto-analyze scheduling.
- `AnalysisManager.bootstrapTableStatsIfAbsent()`: double-checked locking, only writes when no `TableStatsMeta` exists and `loadedRows > 0`.
- `OlapInsertExecutor`: invokes bootstrap after the transaction reaches VISIBLE status.
- `ShowTableStatsCommand`: adds a null guard for `jobType`, as bootstrap stats have no associated analyze job.
New Session Variable
`enable_insert_select_table_stats_bootstrap` (default `false`, EXPERIMENTAL)
Usage:
Check List
- `TableStatsMetaTest.testNewBootstrapStatsSeedsBaseIndexRowCount`: verifies bootstrap metadata field correctness
- `OlapInsertExecutorTest.testExecuteSingleInsertVisibleBootstrapsTableStatsWhenAbsent`: verifies bootstrap takes effect when enabled
- `OlapInsertExecutorTest.testExecuteSingleInsertVisibleDoesNotBootstrapTableStatsWhenDisabled`: verifies no bootstrap when disabled (default)
- `ShowTableStatsCommandTest.testConstructTableResultSetForBootstrapStats`: verifies `SHOW TABLE STATS` renders bootstrap metadata without NPE
- `insert_select_table_stats_bootstrap.groovy`: two-phase assertions: when disabled, `stats=1` and the large table is broadcast; when enabled, `stats=262,144` and the small table is broadcast. Ran 10 consecutive times on a remote Doris instance, all passed.
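
The guarded write described under Core changes (write only when no stats exist and `loadedRows > 0`, with double-checked locking) might look roughly like the sketch below. This is not the PR's code: the class, the `Map<Long, Long>` standing in for `TableStatsMeta` storage, and the method names are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BootstrapIfAbsentSketch {
    // Hypothetical stand-in for stats storage: table id -> bootstrapped row count.
    private final Map<Long, Long> rowCountByTableId = new ConcurrentHashMap<>();

    // Returns true only if this call wrote the bootstrap entry.
    boolean bootstrapIfAbsent(long tableId, long loadedRows) {
        // First check, without the lock: skip empty loads and tables that already have stats.
        if (loadedRows <= 0 || rowCountByTableId.containsKey(tableId)) {
            return false;
        }
        synchronized (this) {
            // Second check, under the lock: another thread may have written in between.
            if (rowCountByTableId.containsKey(tableId)) {
                return false;
            }
            rowCountByTableId.put(tableId, loadedRows);
            return true;
        }
    }

    // Returns -1 when no stats exist, matching the "unknown" value in the description.
    long rowCount(long tableId) {
        Long rows = rowCountByTableId.get(tableId);
        return rows == null ? -1 : rows;
    }

    public static void main(String[] args) {
        BootstrapIfAbsentSketch manager = new BootstrapIfAbsentSketch();
        System.out.println(manager.bootstrapIfAbsent(1L, 0));        // nothing loaded, no write
        System.out.println(manager.bootstrapIfAbsent(1L, 262_144));  // first write wins
        System.out.println(manager.bootstrapIfAbsent(1L, 10));       // existing stats are kept
        System.out.println(manager.rowCount(1L));
    }
}
```

In this toy version `ConcurrentHashMap.putIfAbsent` alone would be enough; the explicit lock is kept only to mirror the double-checked pattern the description names.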