Implement a make jobserver (continuation of #9210)#9625
Merged
Conversation
- Fix #9170 - In #9131, we added running `make` in parallel with the `-j` flag. It's problematic because since Bundler installs gems in parallel, we could end up installing `N gem x M processors` which would exhaust the machine resources and causes issues like #9170. In this patch, I have implemented a make [jobserver](https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html), so that each thread installing a gem can take available job slots from the pool. I also want to highlight that with this patch, we can still end up running 2 more jobs than what's available. The reason being that if a gem is waiting for slots to be available, then the whole `bundle install` takes much longer which defeats the whole purpose of the `make -j` optimization. So if there is no more slots available, we compile the extension without parallelization (we still use one cpu though). Example ---------- If `bundle install -j 6` is running, then we have 6 slots available. Each gem with native extension that gets installed may take up to *3* slots (see my reasoning on this number below). In the event where 3 gems with native extensions get installed, 2 gems will consume all 6 slots, leaving the third gem without any. Ultimately, this means that we could use *at maximum 2 more slots than what's available* (e.g. `bundle install -j 3` for 3 gems will use 5 slots). I chose `3` slots per `make` process because after installing multiple gems, I didn't see any benefit to adding more jobs. It's possible that some gems have many `make` recipes that could run more than 3 in parallel, but I don't think this is very frequent.
IO.new defaults to read mode when no mode is given. On POSIX, Ruby inspects the descriptor's access mode and opens the write end of the jobserver pipe for writing anyway, but on Windows that inspection is not available, so releasing slots raised "IOError: not opened for writing". Pass explicit modes so the write end is usable on every platform. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The guard that avoids appending a duplicate -j used /-j\d?(\s|\Z)/, but \d? matches at most one digit, so an existing -j10 (or any two-digit job count) failed to match and a second -j was appended anyway. Use \d* so a job count of any width is recognized. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
connect_to_jobserver matched the first --jobserver-auth in MAKEFLAGS, but the value is appended after any pre-existing flags. When bundle install runs under a parent make jobserver, MAKEFLAGS already carries the parent's --jobserver-auth, so the leftmost match returned the parent's descriptors instead of the pool ParallelInstaller just created. Scan and take the last match so we always connect to our own pipe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
If the read end raised while closing, the original ensure skipped both the write-end close and the MAKEFLAGS restore, leaving --jobserver-auth with closed descriptors in the process environment. Restore MAKEFLAGS first and guard the closes so the environment is always cleaned up. Building the value with compact.join also drops the stray leading space when MAKEFLAGS was previously unset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
redefine_build_jobs coordinated the two worker threads with `sleep(0.1) while flag` loops. If a thread died before clearing its flag the other spun forever, and the polling added latency to every run. Use blocking Thread::Queue handoffs so the ordering is deterministic and a failure surfaces instead of hanging on a flag that never flips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Name the per-gem slot cap MAX_JOBS_PER_GEM with a comment explaining the limit, rename the local to acquired_jobs since it counts grabbed tokens rather than free ones, and rescue EOFError alongside IO::WaitReadable so a closed jobserver pipe falls back to a single job instead of raising. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bundler's quality spec rejects weak modifiers in lib source, and the connect_to_jobserver comment used "the pool we just created". Name ParallelInstaller instead so the meaning stays clear without "just". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Member
|
Thank you soo much for picking this again @hsbt ❤️ ! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resumes #9210. @Edouard-chin I'd like to get this solved, so I'm picking your work back up here. This rebases your jobserver implementation onto the current master, resolving the conflicts from the spec directory move and the priority queue change.
Fixes #9170.
Make sure the following tasks are checked