Skip to content

Add peer download fallback to PredownloadScheduler#18641

Open
pradeeee wants to merge 5 commits into
apache:masterfrom
pradeeee:predownload-peer-download
Open

Add peer download fallback to PredownloadScheduler#18641
pradeeee wants to merge 5 commits into
apache:masterfrom
pradeeee:predownload-peer-download

Conversation

@pradeeee

@pradeeee pradeeee commented May 31, 2026

Copy link
Copy Markdown

Problem

The PredownloadScheduler runs in a predownload init-container before the main Pinot server starts. It downloads segments from deep storage (e.g., S3, HDFS, GCS) to local disk so the server can start faster.

Currently, if a deep storage download fails (e.g., due to transient network issues, throttling, or deep store unavailability), the segment is marked as failed and the predownload container reports partial failure. There is no fallback mechanism.

In production, deep storage failures can be transient while peer servers holding the same segment are available and healthy. The main Pinot server already supports peer download as a fallback, but the predownload container does not.

Changes

PredownloadScheduler.java

  1. Constructor now accepts a boolean peerDownloadEnabled parameter
  2. Reads the peer download scheme from InstanceDataManagerConfig
  3. Extracted existing deep store download logic into downloadFromDeepStore() for clarity
  4. Added downloadFromPeers() method that:
    • Discovers ONLINE peer servers via ExternalView from ZooKeeper
    • Shuffles the peer list for load distribution
    • Downloads the segment tar file from a peer using SegmentFetcherFactory.fetchAndDecryptSegmentToLocal()
    • Untars and moves the segment to the final data directory
  5. When deep store download fails and peer download is enabled, catches the exception and falls back to peer download

PredownloadZKClient.java

  1. Added getPeerServerURIs() method that discovers ONLINE peer servers hosting a given segment
  2. Uses ExternalView (not IdealState) to find servers that actually have the segment ready
  3. Builds download URIs using the peer's hostname and admin port
  4. Mirrors the logic of PeerServerSegmentFinder but works without requiring a HelixManager instance (which is not available in the predownload container)

Tests

  • PredownloadSchedulerTest.java: Added tests for deep store fallback to peer download, peer-only download failure, and constructor with peer download flag
  • PredownloadZKClientTest.java: Added tests for getPeerServerURIs() covering no external view, no online peers, and successful peer discovery

Test Plan

When a segment download from deep storage fails in the predownload
container, fall back to downloading from peer servers if peer download
is enabled. This improves predownload reliability when deep storage is
temporarily unavailable or experiencing issues.

Changes:
- Accept peerDownloadEnabled flag in PredownloadScheduler constructor
- Extract deep store download logic into downloadFromDeepStore()
- Add downloadFromPeers() method that discovers ONLINE peer servers
  via ExternalView and downloads the segment from a shuffled peer list
- Add getPeerServerURIs() to PredownloadZKClient to discover peers
  without requiring HelixManager
- Add tests for peer download fallback and ZK peer discovery

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pradeeee pradeeee force-pushed the predownload-peer-download branch from 28637ac to 1aa6c64 Compare May 31, 2026 15:43
LOGGER.info("Created thread pool with num of threads: {}", predownloadParallelism);

_peerDownloadEnabled = peerDownloadEnabled;
_peerDownloadScheme = _instanceDataManagerConfig.getSegmentPeerDownloadScheme();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Let's sanitize this similar to:

_peerDownloadScheme = _peerDownloadScheme.toLowerCase();
Preconditions.checkState(
CommonConstants.HTTP_PROTOCOL.equals(_peerDownloadScheme) || CommonConstants.HTTPS_PROTOCOL.equals(
_peerDownloadScheme), "Unsupported peer download scheme: %s for table: %s", _peerDownloadScheme,
_tableNameWithType);
}

}

/**
* Returns URIs of ONLINE peer servers hosting the given segment, excluding the current instance.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "excluding the current instance" This will always be excluded as predownload runs even before server process starts so EV will not have it.

@pradeeee pradeeee Jun 21, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is wrong; in the current code, we are taking all the nodes that have a segment hosted and in an online state. This can include the Old node from which replacement is triggered.

The idea is that wherever the segment is healthy, download from there, for some cases where segment is not committed to peer node it will be available in the old node only.

psinghnegi and others added 2 commits June 21, 2026 11:44
…eter

The upstream StartServer now sets pinot.server.peer.download.enabled
in properties, so the constructor no longer needs the boolean parameter.
This simplifies the API and keeps configuration in one place.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Sanitize _peerDownloadScheme similar to BaseTableDataManager:
   lowercase the value and validate it's http or https.
2. Remove "excluding the current instance" from getPeerServerURIs
   Javadoc — predownload runs before server starts, so the current
   instance is never in ExternalView.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pradeeee pradeeee force-pushed the predownload-peer-download branch from ae3bcf8 to 5175511 Compare June 21, 2026 12:51
psinghnegi and others added 2 commits June 22, 2026 03:03
- PREDOWNLOAD_DEEPSTORE_DOWNLOAD_COUNT: deep store success counter
- PREDOWNLOAD_PEER_SEGMENT_DOWNLOAD_COUNT: peer download success counter
- PREDOWNLOAD_PEER_SEGMENT_DOWNLOAD_FAILURE_COUNT: peer download failure
  counter with segment name as key for per-segment tracking
- PEER_DOWNLOAD_SPEED: gauge for peer download speed (MB/s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- testDeepStoreSuccessEmitsDeepStoreMetric: verifies deepStoreSegmentDownloaded()
  is called on deep store success, peer metrics not called
- testPeerDownloadSuccessEmitsPeerMetrics: verifies peerSegmentDownloaded(true)
  and segmentDownloaded(true) both called on peer fallback success
- testPeerDownloadFailureEmitsSegmentLevelMetric: verifies
  peerSegmentDownloaded(false, segmentName) called with segment name on failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
}
}

private void downloadFromPeers(PredownloadSegmentInfo predownloadSegmentInfo)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are two modes of segment download in deepstoreDownload:

  1. download -> untar -> move
  2. streaming untar

This method only supports former. Is it possible to handle the second mechanism as well.

}
}

private void downloadFromPeers(PredownloadSegmentInfo predownloadSegmentInfo)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add metrics to seggregate calls between deepstore and fallback(peer) download

@rohityadav1993

Copy link
Copy Markdown
Contributor

please address the commet, overall LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants