Make scheduled outerloop builds succeed when only Helix tests fail#129049
Open
mmitche wants to merge 4 commits into
The libraries outerloop pipeline runs on a daily schedule with always:false, meaning AzDO only re-queues a commit if there were changes since the last successful scheduled run. Because flaky outerloop tests cause the 'Send to Helix' task to fail on essentially every scheduled run, the build never succeeds, so AzDO re-queues the same commit every day and submits ever more Helix work for an unchanged sha. Set shouldContinueOnError on the Send to Helix step for scheduled builds only (Build.Reason == 'Schedule'), so Helix work item failures no longer fail the build. Compile/build breaks still fail the build, and PR/CI/manual runs are unaffected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
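For readers unfamiliar with the trigger, an Azure Pipelines scheduled trigger with `always: false` looks roughly like the sketch below. The cron time, display name, and branch filter are placeholders, not the values in the real `outerloop.yml`. Only the `always: false` setting comes from the description above.

```yaml
# Hypothetical sketch of a daily scheduled trigger. The cron value, displayName,
# and branch list are placeholders; only "always: false" reflects the pipeline.
schedules:
- cron: "0 8 * * *"            # once a day (UTC); placeholder time
  displayName: Daily outerloop run
  branches:
    include:
    - main                      # placeholder branch filter
  always: false                 # run only if the source changed since the last successful scheduled run
```

Because a failed run never counts as the "last successful scheduled run", a build that fails every day keeps making the scheduler pick the same unchanged commit again.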
Contributor
Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
Contributor
Pull request overview
This PR updates the libraries outerloop Azure DevOps pipeline to avoid failing scheduled runs due to Helix work item/test failures, with the intent of preventing always: false schedules from repeatedly re-queuing the same commit and submitting duplicate Helix work.
Changes:
- Pass `shouldContinueOnError: ${{ eq(variables['Build.Reason'], 'Schedule') }}` into the three `platform-matrix.yml` invocations in `outerloop.yml`.
- Add inline YAML comments explaining the rationale (avoid same-SHA daily re-queues and wasted Helix capacity).
Comment on lines +26 to +29
```yaml
# Don't fail scheduled builds on Helix work item failures. Otherwise a perpetually
# failing scheduled build (flaky outerloop tests) causes AzDO to re-queue the same
# commit every day, wasting Helix resources. See always:false on the schedule above.
shouldContinueOnError: ${{ eq(variables['Build.Reason'], 'Schedule') }}
```
Member
Author
Bleh, it's right. `partiallySucceeded` won't stop AzDO from scheduling the same commit again.
continueOnError only marks the build partiallySucceeded, which AzDO's always:false scheduler still treats as not successful, so the same commit keeps getting re-queued daily. Instead, for scheduled builds, tell the Helix SDK not to fail the build on work item / test failures by passing FailOnWorkItemFailure=false and FailOnTestFailure=false. The Send to Helix step then fully succeeds, so a perpetually flaky scheduled run no longer causes AzDO to re-queue the same SHA.

- helix.yml: add failOnTestFailures parameter (default true = current behavior) wired to the FailOnWorkItemFailure/FailOnTestFailure Helix SDK properties.
- outerloop.yml: pass failOnTestFailures=false only for scheduled builds (Build.Reason == 'Schedule'); replaces the earlier shouldContinueOnError approach.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
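The `helix.yml` side of this commit could be sketched roughly as follows. The parameter name, its default, and the two MSBuild properties come from the PR; the step shape, the `dotnet msbuild` command, and the project name `sendtohelix.proj` are placeholders, not the real template contents.

```yaml
# Hypothetical sketch of the helix.yml change. failOnTestFailures and the two
# /p: properties mirror the PR; the script line and project name are placeholders.
parameters:
- name: failOnTestFailures
  type: boolean
  default: true                 # default keeps today's behavior for every existing caller

steps:
- script: >-
    dotnet msbuild sendtohelix.proj
    /p:FailOnWorkItemFailure=${{ parameters.failOnTestFailures }}
    /p:FailOnTestFailure=${{ parameters.failOnTestFailures }}
  displayName: Send to Helix
```

A boolean template parameter expands to `True` or `False` here. MSBuild condition comparisons are case-insensitive, so either casing should be honored, but it is worth confirming how the Helix SDK targets read these two properties before relying on that.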
…will revert) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
If this looks reasonable, we should backport it to 9.0 and 10.0 for outerloop.
This was referenced Jun 6, 2026
Open
Member
/azp list
Note
This pull request was authored with the assistance of GitHub Copilot.
Problem
Several scheduled outerloop pipelines (the `outerloop.yml` family: `runtime-libraries-coreclr outerloop` and its `-windows`/`-linux`/`-osx` variants) use an `always: false` scheduled trigger. With `always: false`, AzDO only starts a new scheduled run if the source changed since the last successful scheduled run.

Because the repo has many flaky outerloop tests, the Helix test work items virtually always have at least one failure, which fails the "Send to Helix" step and therefore the whole build. The build never reaches a `succeeded` state, so AzDO re-queues the same, unchanged commit day after day, submitting more and more Helix work for no benefit. (Empirically confirmed: a single commit was re-run and failed for 19 consecutive days; once a sibling definition produced a genuinely successful run, the same-SHA re-queue stopped.)

Why `continueOnError` is not enough

`continueOnError: true` only downgrades the build to `partiallySucceeded`, which AzDO's `always: false` scheduler still does not treat as successful, so the same commit keeps getting re-queued. The Helix step must end fully successful (exit code 0).

Fix
Make the "Send to Helix" step actually succeed on scheduled runs by disabling the two Arcade `Microsoft.DotNet.Helix.Sdk` properties that fail the build (both default to `true`):

- `FailOnWorkItemFailure`: `CheckHelixJobStatus` errors when a work item exits non-zero.
- `FailOnTestFailure`: `CheckAzurePipelinesTestResults` errors when any published test failed.

Setting both to `false` lets the msbuild step exit 0, producing a fully `succeeded` build. Failed tests are still published and visible in the test results tab; AzDO does not auto-degrade a build to `partiallySucceeded` just because a published test run contains failures. Only a failing task would.

Changes
- `eng/pipelines/libraries/helix.yml`: Added a `failOnTestFailures` parameter (default `true`, preserving today's behavior) wired to `/p:FailOnWorkItemFailure` and `/p:FailOnTestFailure` on the Send to Helix msbuild invocation.
- `eng/pipelines/libraries/outerloop.yml`: Passes `failOnTestFailures: false` only on scheduled runs (`Build.Reason == 'Schedule'`) for all three matrix legs (Release, Debug, NET48).

Behavior preservation
The new parameter defaults to `true`, so all other `helix.yml` callers are unaffected (none set `WaitForWorkItemCompletion` or these properties on this path, so they already resolve to `true`). Only scheduled outerloop runs change behavior. PR / rolling / manual outerloop runs continue to fail on Helix failures exactly as before. Build/compile breaks still fail scheduled runs (this only affects the Helix step).

Tradeoff
On scheduled runs, `FailOnWorkItemFailure=false` also masks work-item crashes, timeouts, and infrastructure failures, not just test-assertion failures. This is an accepted tradeoff for the goal of stopping the wasteful daily re-queue of unchanged commits; results remain visible in the Helix/test reporting.
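The scheduled-only switch in `outerloop.yml` could be expressed roughly as follows. This is a sketch: the `platform-matrix.yml` path and the `jobParameters` plumbing that carries the value down to `helix.yml` are assumptions about the surrounding templates, not something shown in this PR. It uses template conditional insertion so the parameter is passed only on scheduled runs.

```yaml
# Hypothetical sketch of one of the three matrix legs in outerloop.yml. The
# template path and the jobParameters forwarding are assumptions; only the
# Build.Reason == 'Schedule' check mirrors the PR.
jobs:
- template: /eng/pipelines/common/platform-matrix.yml
  parameters:
    # ...existing matrix parameters for this leg (platforms, configuration, etc.)...
    jobParameters:
      # Conditional insertion: this key is emitted only for scheduled runs, so PR,
      # rolling, and manual runs keep the parameter's declared default.
      ${{ if eq(variables['Build.Reason'], 'Schedule') }}:
        failOnTestFailures: false
```

An equivalent alternative is to always pass `failOnTestFailures: ${{ ne(variables['Build.Reason'], 'Schedule') }}`; the conditional form simply leaves non-scheduled runs on whatever default the templates declare.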