Add peer download fallback to PredownloadScheduler#18641
Conversation
When a segment download from deep storage fails in the predownload container, fall back to downloading from peer servers if peer download is enabled. This improves predownload reliability when deep storage is temporarily unavailable or experiencing issues. Changes: - Accept peerDownloadEnabled flag in PredownloadScheduler constructor - Extract deep store download logic into downloadFromDeepStore() - Add downloadFromPeers() method that discovers ONLINE peer servers via ExternalView and downloads the segment from a shuffled peer list - Add getPeerServerURIs() to PredownloadZKClient to discover peers without requiring HelixManager - Add tests for peer download fallback and ZK peer discovery Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
28637ac to
1aa6c64
Compare
| LOGGER.info("Created thread pool with num of threads: {}", predownloadParallelism); | ||
|
|
||
| _peerDownloadEnabled = peerDownloadEnabled; | ||
| _peerDownloadScheme = _instanceDataManagerConfig.getSegmentPeerDownloadScheme(); |
There was a problem hiding this comment.
nit: Let's sanitize this similar to:
| } | ||
|
|
||
| /** | ||
| * Returns URIs of ONLINE peer servers hosting the given segment, excluding the current instance. |
There was a problem hiding this comment.
nit: "excluding the current instance" This will always be excluded as predownload runs even before server process starts so EV will not have it.
There was a problem hiding this comment.
The comment is wrong; in the current code, we are taking all the nodes that have a segment hosted and in an online state. This can include the Old node from which replacement is triggered.
The idea is that wherever the segment is healthy, download from there, for some cases where segment is not committed to peer node it will be available in the old node only.
…eter The upstream StartServer now sets pinot.server.peer.download.enabled in properties, so the constructor no longer needs the boolean parameter. This simplifies the API and keeps configuration in one place. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Sanitize _peerDownloadScheme similar to BaseTableDataManager: lowercase the value and validate it's http or https. 2. Remove "excluding the current instance" from getPeerServerURIs Javadoc — predownload runs before server starts, so the current instance is never in ExternalView. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ae3bcf8 to
5175511
Compare
- PREDOWNLOAD_DEEPSTORE_DOWNLOAD_COUNT: deep store success counter - PREDOWNLOAD_PEER_SEGMENT_DOWNLOAD_COUNT: peer download success counter - PREDOWNLOAD_PEER_SEGMENT_DOWNLOAD_FAILURE_COUNT: peer download failure counter with segment name as key for per-segment tracking - PEER_DOWNLOAD_SPEED: gauge for peer download speed (MB/s) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- testDeepStoreSuccessEmitsDeepStoreMetric: verifies deepStoreSegmentDownloaded() is called on deep store success, peer metrics not called - testPeerDownloadSuccessEmitsPeerMetrics: verifies peerSegmentDownloaded(true) and segmentDownloaded(true) both called on peer fallback success - testPeerDownloadFailureEmitsSegmentLevelMetric: verifies peerSegmentDownloaded(false, segmentName) called with segment name on failure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| } | ||
| } | ||
|
|
||
| private void downloadFromPeers(PredownloadSegmentInfo predownloadSegmentInfo) |
There was a problem hiding this comment.
there are two modes of segment download in deepstoreDownload:
- download -> untar -> move
- streaming untar
This method only supports former. Is it possible to handle the second mechanism as well.
| } | ||
| } | ||
|
|
||
| private void downloadFromPeers(PredownloadSegmentInfo predownloadSegmentInfo) |
There was a problem hiding this comment.
let's add metrics to seggregate calls between deepstore and fallback(peer) download
|
please address the commet, overall LGTM |
Problem
The
PredownloadSchedulerruns in a predownload init-container before the main Pinot server starts. It downloads segments from deep storage (e.g., S3, HDFS, GCS) to local disk so the server can start faster.Currently, if a deep storage download fails (e.g., due to transient network issues, throttling, or deep store unavailability), the segment is marked as failed and the predownload container reports partial failure. There is no fallback mechanism.
In production, deep storage failures can be transient while peer servers holding the same segment are available and healthy. The main Pinot server already supports peer download as a fallback, but the predownload container does not.
Changes
PredownloadScheduler.java
boolean peerDownloadEnabledparameterInstanceDataManagerConfigdownloadFromDeepStore()for claritydownloadFromPeers()method that:SegmentFetcherFactory.fetchAndDecryptSegmentToLocal()PredownloadZKClient.java
getPeerServerURIs()method that discovers ONLINE peer servers hosting a given segmentPeerServerSegmentFinderbut works without requiring aHelixManagerinstance (which is not available in the predownload container)Tests
PredownloadSchedulerTest.java: Added tests for deep store fallback to peer download, peer-only download failure, and constructor with peer download flagPredownloadZKClientTest.java: Added tests forgetPeerServerURIs()covering no external view, no online peers, and successful peer discoveryTest Plan