Skip to content

[fix](filecache) Disable sync file cache clear in http api#64321

Merged
freemandealer merged 1 commit into
apache:masterfrom
freemandealer:task-master-disable-sync-clear-file-cache-in-htt
Jun 10, 2026
Merged

[fix](filecache) Disable sync file cache clear in http api#64321
freemandealer merged 1 commit into
apache:masterfrom
freemandealer:task-master-disable-sync-clear-file-cache-in-htt

Conversation

@freemandealer

Copy link
Copy Markdown
Member

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: The file cache HTTP clear endpoint allowed callers to request synchronous clear with sync=true. The synchronous clear path is kept for unit tests and internal callers, but it is not safe to expose through the HTTP API. This change makes the HTTP op=clear endpoint always use async clear. The default behavior remains async, and explicit sync=true returns a warning message while still running async clear. Segment-specific HTTP clear also uses async remove instead of the synchronous remove path.

Release note

The BE /api/file_cache?op=clear HTTP API no longer supports synchronous clearing. Requests with sync=true now run async clear and return a warning message.

Check List (For Author)

  • Test: Unit Test
    • DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh run-be-ut.sh --run --filter=FileCacheActionTest.*
  • Behavior changed: Yes. HTTP file cache clear no longer exposes synchronous clear.
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: The file cache HTTP clear endpoint allowed callers to request synchronous clear with sync=true. The synchronous clear path is kept for unit tests and internal callers, but it is not safe to expose through the HTTP API. This change makes the HTTP op=clear endpoint always use async clear. The default behavior remains async, and explicit sync=true returns a warning message while still running async clear. Segment-specific HTTP clear also uses async remove instead of the synchronous remove path.

### Release note

The BE /api/file_cache?op=clear HTTP API no longer supports synchronous clearing. Requests with sync=true now run async clear and return a warning message.

### Check List (For Author)

- Test: Unit Test
    - DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh run-be-ut.sh --run --filter=FileCacheActionTest.*
- Behavior changed: Yes. HTTP file cache clear no longer exposes synchronous clear.
- Does this need documentation: No
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer

Copy link
Copy Markdown
Member Author

run buildall

@freemandealer

Copy link
Copy Markdown
Member Author

/review

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29593 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 29d3a6f6ec810b104553722068e36946ae8f01e6, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17636	4011	4057	4011
q2	q3	10754	1433	842	842
q4	4686	474	345	345
q5	7667	908	592	592
q6	188	179	139	139
q7	804	889	638	638
q8	9373	1599	1587	1587
q9	5871	4551	4527	4527
q10	6825	1818	1546	1546
q11	436	277	253	253
q12	626	427	285	285
q13	18185	3397	2818	2818
q14	275	267	238	238
q15	q16	832	776	716	716
q17	932	964	947	947
q18	7149	5900	5679	5679
q19	1829	1280	1103	1103
q20	556	414	265	265
q21	6289	2868	2746	2746
q22	467	398	316	316
Total cold run time: 101380 ms
Total hot run time: 29593 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5155	4792	4922	4792
q2	q3	4973	5273	4755	4755
q4	2137	2243	1377	1377
q5	4789	4887	4765	4765
q6	233	177	128	128
q7	1861	1805	1625	1625
q8	2446	2174	2191	2174
q9	8034	7856	7428	7428
q10	4759	4692	4233	4233
q11	532	388	367	367
q12	731	736	526	526
q13	3087	3462	2826	2826
q14	271	280	267	267
q15	q16	683	702	603	603
q17	1306	1278	1255	1255
q18	7453	6779	6774	6774
q19	1105	1079	1071	1071
q20	2237	2237	1951	1951
q21	5369	4661	4490	4490
q22	536	460	412	412
Total cold run time: 57697 ms
Total hot run time: 51819 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 168933 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 29d3a6f6ec810b104553722068e36946ae8f01e6, data reload: false

query5	4332	658	490	490
query6	468	217	207	207
query7	4934	566	319	319
query8	386	238	210	210
query9	8783	4121	4123	4121
query10	465	331	267	267
query11	5950	2349	2204	2204
query12	162	108	100	100
query13	1275	622	432	432
query14	6429	5416	5081	5081
query14_1	4422	4403	4396	4396
query15	211	197	181	181
query16	993	466	441	441
query17	949	717	589	589
query18	2474	475	355	355
query19	205	198	151	151
query20	114	112	109	109
query21	216	146	127	127
query22	13656	13603	13386	13386
query23	17407	16559	16137	16137
query23_1	16285	16396	16300	16300
query24	7559	1771	1301	1301
query24_1	1335	1346	1334	1334
query25	615	479	412	412
query26	1302	321	178	178
query27	2700	535	346	346
query28	4495	2054	2060	2054
query29	1146	642	532	532
query30	316	249	203	203
query31	1116	1089	956	956
query32	123	64	64	64
query33	537	334	280	280
query34	1187	1164	645	645
query35	750	803	659	659
query36	1407	1391	1234	1234
query37	153	100	91	91
query38	3222	3130	3042	3042
query39	929	910	883	883
query39_1	888	873	889	873
query40	227	122	100	100
query41	64	62	60	60
query42	95	95	93	93
query43	319	329	279	279
query44	
query45	192	183	178	178
query46	1096	1190	745	745
query47	2386	2414	2244	2244
query48	396	408	285	285
query49	619	454	359	359
query50	1015	337	251	251
query51	4417	4317	4306	4306
query52	89	89	77	77
query53	242	259	187	187
query54	273	214	194	194
query55	80	76	74	74
query56	236	229	228	228
query57	1435	1437	1310	1310
query58	245	221	211	211
query59	1591	1668	1457	1457
query60	290	258	237	237
query61	166	162	161	161
query62	712	650	590	590
query63	233	188	190	188
query64	2574	782	641	641
query65	
query66	1839	468	341	341
query67	29742	29657	28883	28883
query68	
query69	429	300	272	272
query70	963	961	958	958
query71	301	224	221	221
query72	3021	2732	2415	2415
query73	870	744	447	447
query74	5178	4926	4770	4770
query75	2673	2586	2249	2249
query76	2335	1153	777	777
query77	350	389	286	286
query78	12393	12439	11844	11844
query79	1437	1030	754	754
query80	586	475	390	390
query81	455	277	243	243
query82	593	160	129	129
query83	360	278	253	253
query84	
query85	892	563	448	448
query86	379	292	292	292
query87	3396	3325	3210	3210
query88	3596	2751	2709	2709
query89	429	383	331	331
query90	1959	186	190	186
query91	183	164	146	146
query92	61	62	56	56
query93	1549	1373	855	855
query94	559	363	307	307
query95	698	483	349	349
query96	985	787	342	342
query97	2705	2737	2545	2545
query98	215	207	207	207
query99	1150	1176	1001	1001
Total cold run time: 251529 ms
Total hot run time: 168933 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (6/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.83% (28286/38314)
Line Coverage 57.85% (307930/532265)
Region Coverage 54.65% (257692/471549)
Branch Coverage 56.04% (111850/199593)

@zclllyybb

Copy link
Copy Markdown
Contributor

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval opinion: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR makes /api/file_cache?op=clear always use the async file-cache clear/remove paths, including sync=true, and adds BE unit coverage for default clear, sync=true warning response, and segment-specific clear.
  • Scope/focus: The code change is small and focused on the HTTP action plus targeted unit tests.
  • Concurrency/lifecycle: The modified endpoint delegates to existing clear_file_caches(false) and remove_if_cached_async() paths. Those paths take the cache lock and synchronously remove releasable metadata while deferring storage deletion, matching the stated goal of avoiding exposed synchronous deletion. No new lock ordering, static initialization, or lifetime concern was found.
  • Config/compatibility: No new config or storage/protocol format is added. This is an intentional HTTP API behavior change; callers using sync=true now receive a warning JSON body while the clear still succeeds.
  • Parallel paths: Full-cache clear and segment-specific clear are both switched away from the synchronous path. Internal/unit-test callers can still use clear_file_caches(true) / remove_if_cached().
  • Error handling and observability: No ignored Status was introduced. The HTTP response remains success JSON for sync=true, with an explicit message; existing logs in the underlying clear paths remain available.
  • Data/transaction correctness: No transaction, visible-version, persistence, or FE/BE protocol path is modified.
  • Security/threat model: I read SECURITY.md and threat-model.md. BE webserver 8040 is cluster-internal under the model; the endpoint remains on the existing ADMIN-authenticated handler path when global HTTP auth is enabled, and this PR reduces the exposed operation surface.
  • Tests: I did not run local tests in this shallow review checkout. CI shows Linux BE UT, compile, format, and cloud/regression checks passing. The macOS BE UT failure is unrelated to this PR code; its log exits during setup with ERROR: The JAVA version is 25, it must be JDK-17.

Existing review context: no prior inline review threads were present. User focus: no additional review focus was provided.

@zclllyybb

Copy link
Copy Markdown
Contributor

skip buildall

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval opinion: no blocking issues found in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The PR disables the HTTP-exposed synchronous file-cache clear path by routing /api/file_cache?op=clear to clear_file_caches(false) and segment-specific clears to remove_if_cached_async(). The added BE unit tests cover default clear, sync=true warning behavior, and segment-specific clear.
  • Scope/focus: The implementation is small and focused on the HTTP action plus targeted tests. The unrelated local CODEOWNERS two-dot diff is not part of the GitHub PR file set and was not treated as review scope.
  • Concurrency/lifecycle: The changed endpoint delegates to existing file-cache clear/remove implementations. These paths take the cache lock, remove releasable metadata synchronously, and defer storage deletion through the established recycle queue. I found no new lock-order, static-initialization, lifetime, or ownership issue.
  • Config/dynamic behavior: No new config item is added. The behavior change is intentionally limited to HTTP callers; internal and unit-test callers can still use the synchronous file-cache APIs.
  • Compatibility/API behavior: This intentionally changes the HTTP API semantics for sync=true. Existing Groovy callers I checked either ignore the body or only require json.status == "OK"; the new warning response preserves that success status.
  • Parallel paths: Both full-cache clear and segment-specific HTTP clear are switched away from synchronous removal. The separate microbench/internal paths are not HTTP API exposure and do not need the same restriction.
  • Error handling: No new ignored Status or exception boundary issue was introduced. The modified APIs return summary strings rather than failure Status, matching existing call-site behavior.
  • Data/transaction correctness: No transaction, visible-version, delete-bitmap, storage-format, or FE/BE protocol path is modified.
  • Observability: Existing logs and metrics in the underlying clear/remove paths remain in use; the new sync=true response also makes the degraded behavior explicit to HTTP callers.
  • Performance: The change removes the HTTP ability to run the direct synchronous storage clear path. The remaining async clear still scans cache metadata under the existing lock, which is pre-existing behavior for sync=false and is aligned with this PR's goal.
  • Tests/results: I did not run BE UT locally in this shallow review checkout. I checked the PR diff and ran git diff --check on the two PR files. Current CI shows Linux BE UT, compile, style, regression, cloud, and coverage checks passing. The macOS BE UT failure exits during setup with ERROR: The JAVA version is 25, it must be JDK-17, so I did not attribute it to this PR.
  • Security/threat model: I read SECURITY.md and threat-model.md. BE webserver 8040 is cluster-internal under the model; this PR reduces the exposed operation surface on that internal HTTP endpoint and does not introduce a new in-model security issue.

Existing inline review context: .opencode-review.EbFr2V/pr_review_threads.md reported no existing inline review comments or replies, and the raw inline comments JSON was empty.
User focus response: .opencode-review.EbFr2V/review_focus.txt provided no additional focus points, so there was no extra focused area beyond the full PR review.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 9, 2026
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@freemandealer freemandealer merged commit 118f9e6 into apache:master Jun 10, 2026
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants