Fixes #XXXXX - Reuse persistent Foreman connections#950

Draft

pablomh wants to merge 1 commit into

theforeman:developfrom

pablomh:net-http-persistent-review-stack

pablomh commented Jun 25, 2026 •

edited

Loading

Summary

reuse a memoized Net::HTTP::Persistent client for outbound smart-proxy -> Foreman requests instead of creating a fresh Net::HTTP client for each request
keep ForemanRequest as the existing entry point so the request-building flow used by registration and template callers stays unchanged
preserve the existing SSL configuration behavior and update the focused tests for the shared client and the keep-alive headers emitted by the persistent client

Test plan

bundle3.2 exec ruby3.2 -Itest test/request_test.rb
bundle3.2 exec ruby3.2 -Itest test/templates/template_proxy_request_test.rb
bundle3.2 exec ruby3.2 -Itest test/registration/registration_api_test.rb
bundle3.2 exec rubocop lib/proxy/request.rb test/request_test.rb test/templates/template_proxy_request_test.rb test/registration/registration_api_test.rb smart_proxy.gemspec
bundle3.2 exec rake test (rubyipmi is missing in this environment, so the suite stops in test/bmc/... before reaching completion)

pr-processor Bot added Not yet reviewed Waiting on contributor labels

pablomh commented Jun 25, 2026

Author

This would need theforeman/foreman-packaging#13670.

ekohl commented Jun 25, 2026

Member

Alternative smaller implementation: #951.


          Fixes #XXXXX - Reuse persistent Foreman connections

61c1519

Keep ForemanRequest as the existing entry point for outbound Foreman calls, but
back it with a memoized Net::HTTP::Persistent client instead of creating a new
Net::HTTP instance for each request.

This keeps the change narrowly focused on connection reuse for proxy-to-Foreman
traffic while preserving the current request-building flow used by registration
and template callers. The updated tests cover shared-client reuse, SSL
compatibility, and the keep-alive headers now emitted by the persistent client.

Co-authored-by: Cursor <cursoragent@cursor.com>

pablomh force-pushed the net-http-persistent-review-stack branch from 204d9c4 to 61c1519 Compare

June 25, 2026 16:03

pablomh commented Jun 25, 2026

Author

Reduced version of the code.

pablomh commented Jun 29, 2026

Author

We deployed PR #950 on 6 capsules under incremental registration load from 76 to 1140 concurrent hosts. Satellite was configured with --tuning large (24 Puma workers × 5 threads).

CLOSE-WAIT accumulation: All capsules accumulated 80-103 CLOSE-WAIT sockets to Satellite:443 with 0 ESTAB, visible even at rest between batches. Satellite closes connections after MaxKeepAliveRequests 100 or KeepAliveTimeout 15, but the closed sockets sit in CLOSE-WAIT between pool checkouts. WEBrick's thread-per-request model creates higher connection churn than the stable thread pools connection_pool was designed for, amplifying the accumulation.

ConnectionPool::TimeoutError under load: Starting at ~300 concurrent hosts, requests failed with ConnectionPool::TimeoutError: Waited 0.5 sec. net-http-persistent's Pool checkout inherits a 0.5-second pop timeout from TimedStack. When pooled connections are busy serving Satellite requests that take 10-27 seconds, new threads cannot acquire a connection and fail immediately.

POST /register failures: ~2-3% of POSTs failed with EOFError or ECONNRESET. Net::HTTP does not retry non-idempotent methods, so these propagated as 500s with no recovery.

Connection reuse worked correctly under sequential load (20 requests, 1 socket). The issues appeared under concurrent WEBrick load.

Why `net-http-persistent` is a poor fit for WEBrick

net-http-persistent v4 delegates to connection_pool, which was designed for servers with stable, long-lived thread pools (Puma, Sidekiq). WEBrick creates short-lived threads per request, producing higher checkout/checkin churn and making pool sizing difficult: too small triggers TimeoutError under load; too large (default ~256) causes excessive CLOSE-WAIT accumulation.

The 0.5-second checkout timeout — inherited from TimedStack#pop and not configurable through net-http-persistent's API — is adequate for fast backends but too aggressive when Satellite response times reach 10+ seconds under registration load.

Similar issues have been reported by users of Stripe, Elasticsearch, Faraday, and Mechanize with net-http-persistent under high concurrency, with workarounds typically being idle_timeout tuning or switching to a different HTTP library (net-http-persistent#37, #19).

pablomh mentioned this pull request

Fixes #39359 - Reuse persistent Foreman connections and retry stale POST /register pablomh/smart-proxy#4

Open

4 tasks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Not yet reviewed Waiting on contributor