[fix](filecache) Disable sync file cache clear in http api#64321
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The file cache HTTP clear endpoint allowed callers to request synchronous clear with sync=true. The synchronous clear path is kept for unit tests and internal callers, but it is not safe to expose through the HTTP API. This change makes the HTTP op=clear endpoint always use async clear. The default behavior remains async, and explicit sync=true returns a warning message while still running async clear. Segment-specific HTTP clear also uses async remove instead of the synchronous remove path.
### Release note
The BE /api/file_cache?op=clear HTTP API no longer supports synchronous clearing. Requests with sync=true now run async clear and return a warning message.
### Check List (For Author)
- Test: Unit Test
- DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh run-be-ut.sh --run --filter=FileCacheActionTest.*
- Behavior changed: Yes. HTTP file cache clear no longer exposes synchronous clear.
- Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
TPC-H: Total hot run time: 29593 ms |
TPC-DS: Total hot run time: 168933 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Approval opinion: no blocking issues found.
Critical checkpoint conclusions:
- Goal and proof: The PR makes
/api/file_cache?op=clearalways use the async file-cache clear/remove paths, includingsync=true, and adds BE unit coverage for default clear,sync=truewarning response, and segment-specific clear. - Scope/focus: The code change is small and focused on the HTTP action plus targeted unit tests.
- Concurrency/lifecycle: The modified endpoint delegates to existing
clear_file_caches(false)andremove_if_cached_async()paths. Those paths take the cache lock and synchronously remove releasable metadata while deferring storage deletion, matching the stated goal of avoiding exposed synchronous deletion. No new lock ordering, static initialization, or lifetime concern was found. - Config/compatibility: No new config or storage/protocol format is added. This is an intentional HTTP API behavior change; callers using
sync=truenow receive a warning JSON body while the clear still succeeds. - Parallel paths: Full-cache clear and segment-specific clear are both switched away from the synchronous path. Internal/unit-test callers can still use
clear_file_caches(true)/remove_if_cached(). - Error handling and observability: No ignored
Statuswas introduced. The HTTP response remains success JSON forsync=true, with an explicit message; existing logs in the underlying clear paths remain available. - Data/transaction correctness: No transaction, visible-version, persistence, or FE/BE protocol path is modified.
- Security/threat model: I read
SECURITY.mdandthreat-model.md. BE webserver 8040 is cluster-internal under the model; the endpoint remains on the existing ADMIN-authenticated handler path when global HTTP auth is enabled, and this PR reduces the exposed operation surface. - Tests: I did not run local tests in this shallow review checkout. CI shows Linux BE UT, compile, format, and cloud/regression checks passing. The macOS BE UT failure is unrelated to this PR code; its log exits during setup with
ERROR: The JAVA version is 25, it must be JDK-17.
Existing review context: no prior inline review threads were present. User focus: no additional review focus was provided.
|
skip buildall |
There was a problem hiding this comment.
Approval opinion: no blocking issues found in this PR.
Critical checkpoint conclusions:
- Goal and proof: The PR disables the HTTP-exposed synchronous file-cache clear path by routing
/api/file_cache?op=cleartoclear_file_caches(false)and segment-specific clears toremove_if_cached_async(). The added BE unit tests cover default clear,sync=truewarning behavior, and segment-specific clear. - Scope/focus: The implementation is small and focused on the HTTP action plus targeted tests. The unrelated local
CODEOWNERStwo-dot diff is not part of the GitHub PR file set and was not treated as review scope. - Concurrency/lifecycle: The changed endpoint delegates to existing file-cache clear/remove implementations. These paths take the cache lock, remove releasable metadata synchronously, and defer storage deletion through the established recycle queue. I found no new lock-order, static-initialization, lifetime, or ownership issue.
- Config/dynamic behavior: No new config item is added. The behavior change is intentionally limited to HTTP callers; internal and unit-test callers can still use the synchronous file-cache APIs.
- Compatibility/API behavior: This intentionally changes the HTTP API semantics for
sync=true. Existing Groovy callers I checked either ignore the body or only requirejson.status == "OK"; the new warning response preserves that success status. - Parallel paths: Both full-cache clear and segment-specific HTTP clear are switched away from synchronous removal. The separate microbench/internal paths are not HTTP API exposure and do not need the same restriction.
- Error handling: No new ignored
Statusor exception boundary issue was introduced. The modified APIs return summary strings rather than failureStatus, matching existing call-site behavior. - Data/transaction correctness: No transaction, visible-version, delete-bitmap, storage-format, or FE/BE protocol path is modified.
- Observability: Existing logs and metrics in the underlying clear/remove paths remain in use; the new
sync=trueresponse also makes the degraded behavior explicit to HTTP callers. - Performance: The change removes the HTTP ability to run the direct synchronous storage clear path. The remaining async clear still scans cache metadata under the existing lock, which is pre-existing behavior for
sync=falseand is aligned with this PR's goal. - Tests/results: I did not run BE UT locally in this shallow review checkout. I checked the PR diff and ran
git diff --checkon the two PR files. Current CI shows Linux BE UT, compile, style, regression, cloud, and coverage checks passing. The macOS BE UT failure exits during setup withERROR: The JAVA version is 25, it must be JDK-17, so I did not attribute it to this PR. - Security/threat model: I read
SECURITY.mdandthreat-model.md. BE webserver 8040 is cluster-internal under the model; this PR reduces the exposed operation surface on that internal HTTP endpoint and does not introduce a new in-model security issue.
Existing inline review context: .opencode-review.EbFr2V/pr_review_threads.md reported no existing inline review comments or replies, and the raw inline comments JSON was empty.
User focus response: .opencode-review.EbFr2V/review_focus.txt provided no additional focus points, so there was no extra focused area beyond the full PR review.
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The file cache HTTP clear endpoint allowed callers to request synchronous clear with sync=true. The synchronous clear path is kept for unit tests and internal callers, but it is not safe to expose through the HTTP API. This change makes the HTTP op=clear endpoint always use async clear. The default behavior remains async, and explicit sync=true returns a warning message while still running async clear. Segment-specific HTTP clear also uses async remove instead of the synchronous remove path.
Release note
The BE /api/file_cache?op=clear HTTP API no longer supports synchronous clearing. Requests with sync=true now run async clear and return a warning message.
Check List (For Author)
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)