Skip to content

Fixes #XXXXX - Reuse persistent Foreman connections#950

Draft
pablomh wants to merge 1 commit into
theforeman:developfrom
pablomh:net-http-persistent-review-stack
Draft

Fixes #XXXXX - Reuse persistent Foreman connections#950
pablomh wants to merge 1 commit into
theforeman:developfrom
pablomh:net-http-persistent-review-stack

Conversation

@pablomh

@pablomh pablomh commented Jun 25, 2026

Copy link
Copy Markdown

Summary

  • reuse a memoized Net::HTTP::Persistent client for outbound smart-proxy -> Foreman requests instead of creating a fresh Net::HTTP client for each request
  • keep ForemanRequest as the existing entry point so the request-building flow used by registration and template callers stays unchanged
  • preserve the existing SSL configuration behavior and update the focused tests for the shared client and the keep-alive headers emitted by the persistent client

Test plan

  • bundle3.2 exec ruby3.2 -Itest test/request_test.rb
  • bundle3.2 exec ruby3.2 -Itest test/templates/template_proxy_request_test.rb
  • bundle3.2 exec ruby3.2 -Itest test/registration/registration_api_test.rb
  • bundle3.2 exec rubocop lib/proxy/request.rb test/request_test.rb test/templates/template_proxy_request_test.rb test/registration/registration_api_test.rb smart_proxy.gemspec
  • bundle3.2 exec rake test (rubyipmi is missing in this environment, so the suite stops in test/bmc/... before reaching completion)

@pablomh

pablomh commented Jun 25, 2026

Copy link
Copy Markdown
Author

This would need theforeman/foreman-packaging#13670.

@ekohl

ekohl commented Jun 25, 2026

Copy link
Copy Markdown
Member

Alternative smaller implementation: #951.

Keep ForemanRequest as the existing entry point for outbound Foreman calls, but
back it with a memoized Net::HTTP::Persistent client instead of creating a new
Net::HTTP instance for each request.

This keeps the change narrowly focused on connection reuse for proxy-to-Foreman
traffic while preserving the current request-building flow used by registration
and template callers. The updated tests cover shared-client reuse, SSL
compatibility, and the keep-alive headers now emitted by the persistent client.

Co-authored-by: Cursor <cursoragent@cursor.com>
@pablomh pablomh force-pushed the net-http-persistent-review-stack branch from 204d9c4 to 61c1519 Compare June 25, 2026 16:03
@pablomh

pablomh commented Jun 25, 2026

Copy link
Copy Markdown
Author

Reduced version of the code.

@pablomh

pablomh commented Jun 29, 2026

Copy link
Copy Markdown
Author

We deployed PR #950 on 6 capsules under incremental registration load from 76 to 1140 concurrent hosts. Satellite was configured with --tuning large (24 Puma workers × 5 threads).

CLOSE-WAIT accumulation: All capsules accumulated 80-103 CLOSE-WAIT sockets to Satellite:443 with 0 ESTAB, visible even at rest between batches. Satellite closes connections after MaxKeepAliveRequests 100 or KeepAliveTimeout 15, but the closed sockets sit in CLOSE-WAIT between pool checkouts. WEBrick's thread-per-request model creates higher connection churn than the stable thread pools connection_pool was designed for, amplifying the accumulation.

ConnectionPool::TimeoutError under load: Starting at ~300 concurrent hosts, requests failed with ConnectionPool::TimeoutError: Waited 0.5 sec. net-http-persistent's Pool checkout inherits a 0.5-second pop timeout from TimedStack. When pooled connections are busy serving Satellite requests that take 10-27 seconds, new threads cannot acquire a connection and fail immediately.

POST /register failures: ~2-3% of POSTs failed with EOFError or ECONNRESET. Net::HTTP does not retry non-idempotent methods, so these propagated as 500s with no recovery.

Connection reuse worked correctly under sequential load (20 requests, 1 socket). The issues appeared under concurrent WEBrick load.

Why net-http-persistent is a poor fit for WEBrick

net-http-persistent v4 delegates to connection_pool, which was designed for servers with stable, long-lived thread pools (Puma, Sidekiq). WEBrick creates short-lived threads per request, producing higher checkout/checkin churn and making pool sizing difficult: too small triggers TimeoutError under load; too large (default ~256) causes excessive CLOSE-WAIT accumulation.

The 0.5-second checkout timeout — inherited from TimedStack#pop and not configurable through net-http-persistent's API — is adequate for fast backends but too aggressive when Satellite response times reach 10+ seconds under registration load.

Similar issues have been reported by users of Stripe, Elasticsearch, Faraday, and Mechanize with net-http-persistent under high concurrency, with workarounds typically being idle_timeout tuning or switching to a different HTTP library (net-http-persistent#37, #19).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants