Add copilot-pr-autopilot skill#1944
Open
yeelam-gordon wants to merge 135 commits into
Open
Conversation
Contributor
🔍 Skill Validator Results
Summary
Full validator output |
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds the new copilot-pr-autopilot skill, including PowerShell automation scripts and reference docs to drive a “request review → wait → triage → fix → reply/resolve → converge” loop using the gh CLI + GitHub GraphQL.
Changes:
- Introduces shared PowerShell helpers (
_lib.ps1) and several workflow scripts (request review, snapshot status, list threads, reply/resolve, cleanup outdated). - Adds detailed operational documentation (workflow, API quirks, triage rubric, reply templates) for consistent agent behavior.
- Registers the new skill in
docs/README.skills.md.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/copilot-pr-autopilot/scripts/_lib.ps1 | Adds shared gh prerequisite checks and wrappers (Invoke-Gh, Invoke-GhGraphQL, Resolve-RepoCoords) used by all scripts. |
| skills/copilot-pr-autopilot/scripts/01-request-review.ps1 | Requests Copilot review via GraphQL and verifies via copilot_work_started event polling. |
| skills/copilot-pr-autopilot/scripts/02-check-review-status.ps1 | Produces a JSON snapshot of PR review state + convergence signals for polling logic. |
| skills/copilot-pr-autopilot/scripts/03-list-open-threads.ps1 | Lists unresolved review threads (all reviewers) for downstream triage. |
| skills/copilot-pr-autopilot/scripts/08-reply-and-resolve.ps1 | Posts a reply to a review thread and optionally resolves it. |
| skills/copilot-pr-autopilot/scripts/10-cleanup-outdated.ps1 | Bulk-resolves outdated Copilot threads as a post-convergence safety net. |
| skills/copilot-pr-autopilot/references/workflow.md | Documents the loop, delegation map, budgets, and operational contracts. |
| skills/copilot-pr-autopilot/references/api-quirks.md | Captures verified GitHub API/gh quirks that drive implementation choices. |
| skills/copilot-pr-autopilot/references/03-triage-criteria.md | Defines fix/decline/escalate rubric and reviewer-type policy. |
| skills/copilot-pr-autopilot/references/06-reply-templates.md | Provides reply templates and anti-patterns for thread responses. |
| skills/copilot-pr-autopilot/SKILL.md | Defines the skill, prerequisites, workflow, and troubleshooting guidance. |
| docs/README.skills.md | Adds the skill entry and links to its references/scripts. |
c7b2a60 to
b73d67f
Compare
3d3642a to
04067ae
Compare
…rker Loop engineering round 119 (4 threads, all legit). _lib.ps1 IsPathRooted (thread PRRT_kwDOO6BQUc6IlaCM): Replaced `[IO.Path]::IsPathRooted()` with `IsPathFullyQualified()` (PS 7+) and a regex fallback (PS 5.1) for `C:\\...`, `\\\\...`, or `/...`. IsPathRooted returns $true for drive-relative paths like `C:temp` (no separator after the colon), which Windows resolves against the current directory on drive C: — NOT fully qualified. Without this check, a typo'd $env:COPILOT_PR_AUTOPILOT_TEMP_ROOT would silently leak PR content into the caller's cwd. Error message now explicitly names the drive-relative case. 10-cleanup-outdated.ps1 Get-AllThreadAuthors inner loop (thread PRRT_kwDOO6BQUc6IlaC4): Moved the Assert-CopilotPrAutopilotPageInfo call BEFORE the hasNextPage/endCursor read in the page-1+ branch of Get-AllThreadAuthors (round 116 fixed only the first-page assert). Now consistent with the assert-then-dereference pattern in 01/02/03. 08 + 10 escalation marker (threads PRRT_kwDOO6BQUc6IlaDI + IlaDX): Added `$CopilotPrAutopilot_EscalationMarker` constant (`<!-- copilot-pr-autopilot:escalated -->`) in _lib.ps1. 08-reply-and-resolve.ps1: when invoked with -NoResolve, automatically appends the marker to the reply body (idempotent — re-runs with the same body don't accumulate markers). 10-cleanup-outdated.ps1: GraphQL lastComment query now also fetches body; threads whose last comment carries the marker are skipped with a new $skippedEscalatedToUser counter, even when the last commenter is $me and -Force is set. The marker is an explicit operator decision — -Force does NOT override it. This preserves the workflow contract that escalated-to-user threads stay open for the human merge owner, even when the underlying file is later deleted and the thread becomes outdated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Loop engineering round 120 (2 threads, both legit).
08-reply-and-resolve.ps1 (thread PRRT_kwDOO6BQUc6IlkmG):
Changed `$Body.TrimEnd()` to `$Body.TrimEnd("`r","`n")` when
appending the escalation marker. Bare TrimEnd() also strips spaces
and tabs, which would silently mutate the user's Markdown body —
two trailing spaces are a Markdown hard line break. Now only CR/LF
are stripped to make room for the marker prologue; all other
trailing whitespace round-trips verbatim.
_lib.ps1 (thread PRRT_kwDOO6BQUc6Ilkmv):
Re-indented lines 308-324 (the IsPathFullyQualified validation block
added in round 119) by 4 spaces so they nest visibly under the
`if (-not [string]::IsNullOrWhiteSpace($env:COPILOT_PR_AUTOPILOT_TEMP_ROOT)) {`
block. PowerShell ignored the misindent semantically but it read as
"validation may not be scoped to the env-var override" and would
invite a future scoping regression.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…scape
Loop engineering round 121 (3 threads, all legit).
_lib.ps1 timeout-kill path (thread PRRT_kwDOO6BQUc6IlpyM):
Restructured the PS 5.1 / .NET Framework fallback to try
`taskkill /T /F /PID $proc.Id` BEFORE `Process.Kill()` (parent-only).
Previously: Kill($true) catch → Kill() → later if-not-killedTree
block tried taskkill. Problem: once Kill() terminated the parent
PID, the subsequent taskkill /PID raced the parent's death and
could miss children (credential helpers etc.), leaving orphans
after a timeout. New: in the catch path, try taskkill first while
the parent PID is still alive; only fall back to Kill() if taskkill
is unavailable or itself fails. The post-kill belt-and-suspenders
taskkill block remains for the case where Kill($true) succeeded but
a stubborn child survived.
_lib.ps1 Invoke-CopilotPrAutopilotGhProcess comment (thread PRRT_kwDOO6BQUc6IlpzI):
Updated the doc-comment return-type signature from
`@{ ExitCode; Stdout; Stderr }` (hashtable syntax) to
`[pscustomobject]@{ ... }` to match the actual `return
[pscustomobject]@{ ExitCode = $proc.ExitCode; ... }` at the end of
the function. Avoids callers wrongly assuming hashtable ordering /
ContainsKey / [hashtable] casting semantics.
_lib.ps1 Resolve-RepoCoords error message (thread PRRT_kwDOO6BQUc6Ilpzi):
Escape embedded single quotes in $Owner / $Repo before interpolating
into the partial-override throw (doubled per the PowerShell single-
quoted-string convention). Matches the same escape pattern in the
earlier GH_HOST invalid-host error. Without this, a value containing
`'` produces an ambiguous diagnostic that looks truncated or
mid-quote.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Loop engineering round 122 (3 threads, all legit). 10-cleanup-outdated.ps1 help header (thread PRRT_kwDOO6BQUc6IlzDz): Added the escalate-to-user-marker exclusion bullet to the "Only acts on threads where" list in the SYNOPSIS/DESCRIPTION block, including the explicit note that -Force does NOT override it. Now the help text matches the actual resolution behaviour introduced in round 119 (the marker is an explicit operator decision so -Force can't override it). 02-check-review-status.ps1 Copilot review selection (thread PRRT_kwDOO6BQUc6IlzFQ): Added GraphQL `id` to the reviews(last:100) selection and changed the Copilot-review tie-break to `Sort-Object submittedAt,id -Descending` (both the at-HEAD and fallback branches). Stop relying on Sort-Object stability — stability is a current implementation detail, not a documented contract, so a future PowerShell release could silently flip which same-submittedAt Copilot review wins and with it the ReviewAtHead / NoNewComments flags. Descending `id` (GitHub-issued monotonic node ID) is a reasonable proxy for "newest" among ties, and even if occasionally wrong it stays stable across runs. _lib.ps1 typo (thread PRRT_kwDOO6BQUc6IlzF0): `mis-spelled` → `misspelled`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Loop engineering round 123 (3 threads: 1 fixed, 2 pushed back). 10-cleanup-outdated.ps1 summary IDs (thread PRRT_kwDOO6BQUc6Il53T) FIXED: Extracted a `Format-SkippedIdsTrailer` helper that caps the inline "Thread IDs:" list at 20 entries by default. When more than 20 IDs are skipped, the summary line shows the first 20 with `... and N more` and the full list is emitted via Write-Verbose (one ID per line) so CI logs stay parseable on the default Write-Output stream while operators running with -Verbose still get the complete list. Applied to both $skippedHumanInThread and $skippedUnknownAuthorInThread trailers (the only two carrying ID lists; the other counters don't). The other 2 threads (function-name namespacing + Windows TEMP ACL 14th repeat) are pushed back with rationale in the per-thread replies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r] cast
Loop engineering round 124 (3 threads, all legit).
_lib.ps1 Resolve-RepoCoords gh repo view failure (thread PRRT_kwDOO6BQUc6ImAuX):
Failure throw now surfaces both stderr AND stdout (with the same
`(empty)` sentinel used elsewhere). gh repo view failure modes
(auth handshake errors, repo-not-found responses) frequently land
the actionable detail on stdout while stderr only carries the exit
envelope. Mirrors the Invoke-GhGraphQL stderr+stdout pattern.
10-cleanup-outdated.ps1 Format-SkippedIdsTrailer (thread PRRT_kwDOO6BQUc6ImAvO):
Added `[ValidateRange(1, [int]::MaxValue)]` to `$MaxInline`. The
range expression `0..($MaxInline - 1)` would be invalid for
$MaxInline <= 0 (produces a descending range or throws). Fail at
parameter binding with the canonical PSArgumentTransformation
error message rather than at use-site.
08-reply-and-resolve.ps1 TrimEnd cast (thread PRRT_kwDOO6BQUc6ImAvs):
Switched `$Body.TrimEnd("`r","`n")` to
`$Body.TrimEnd([char]13, [char]10)` to match the explicit-[char]
pattern already used in _lib.ps1 (lines 661/678/679). Both forms
call the same `string.TrimEnd(params char[])` overload via implicit
string→char conversion, but the explicit cast makes the intent
obvious for future maintainers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cuit Get-AllThreadAuthors
Round 125 review fixes:
1. _lib.ps1 (Invoke-CopilotPrAutopilotGhProcess timeout branch):
Best-effort credential redaction on the combined partial stdout/stderr
tail before embedding in the timeout throw. Args were already redacted
for -F/-h/-H/--header, but `gh --verbose` and certain GitHub API error
bodies can echo Authorization headers, gh[ps]_* tokens, github_pat_*,
or JSON {"access_token":"..."} fragments. Patterns target the formats
gh and GitHub APIs actually emit; truncation still bounds the throw.
2. 10-cleanup-outdated.ps1 (Get-AllThreadAuthors):
New optional -StopPredicate scriptblock lets callers short-circuit
pagination on a 1000+ comment thread once a disqualifying author
is observed. Caller now passes a predicate that returns true for
null/ghost authors and non-Copilot, non-$me logins — the existing
downstream classification still runs on the returned (partial) list
in O(1). Avoids sequential GraphQL calls per remaining 100-comment
page on pathological threads.
Pushed back twice more on the Windows TEMP ACL suggestion (15th and
16th repeats): the COPILOT_PR_AUTOPILOT_TEMP_ROOT escape hatch ships
in r119, takes precedence over $env:TEMP, and validates IsPathFullyQualified.
Operators on locked-down hosts use it; default ACLs are correct for
the documented use case (per-process subdir under a 700-equivalent root).
codespell + skill:validate + fix-line-endings: all clean.
02-check-review-status.ps1 smoke at HEAD: valid JSON, Converged=false (expected).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hrow; doc consistency Round 126 review fixes: 1. _lib.ps1: extracted Protect-CopilotPrAutopilotCredentials helper from the inline redactor block introduced in r125 (timeout path). Now applied symmetrically in Invoke-GhGraphQL's failure throw — `gh api graphql` failures with verbose envelopes or any future GH_DEBUG=api enablement can echo Authorization headers / token-like values into stderr/stdout, and the throw embedded them raw before this change. Truncation still runs after redaction so the throw stays bounded. Throw label now reads "(credentials redacted)" so log readers know redaction was applied. 2. 10-cleanup-outdated.ps1: docstring bullet updated to reflect actual regex behavior — accepts both `copilot-pull-request-reviewer` and `copilot-pull-request-reviewer[bot]` via the canonical $CopilotPrAutopilot_CopilotReviewerLoginRegex. The GraphQL author.login and REST APIs differ on which form they return; the regex matches both. Docs now match implementation. Pushed back on the orchestration.md "||" double-pipe table reviewer note — verified via `grep -P '^\|\|' orchestration.md` returns no matches. All table rows in that file already use single leading `|` per row. Reviewer appears to have misread the table. codespell + skill:validate + fix-line-endings: clean. 02 smoke at HEAD: valid JSON, OpenThreadCount=3 (expected, pre-fix snapshot). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… and Resolve-RepoCoords throws
Round 127 review fixes — Copilot reviewer pattern-matched the redaction
extracted in r126 to the two remaining unredacted throw sites:
1. _lib.ps1 ConvertFrom-GhJson:
- Empty-stdout path: redact stderr preview before truncation.
- Non-JSON path: redact BOTH stdout and stderr previews inside the
shared $previewOf closure before substring truncation.
- Both throw labels now read "(credentials redacted)".
2. _lib.ps1 Resolve-RepoCoords (gh repo view failure branch):
- Redact + cap both streams at $previewChars=500 (mirrors the
Invoke-GhGraphQL throw formatting). Previously the stdout/stderr
were Trim()'d but unbounded, so a verbose payload would produce
a multi-MB throw. Throw label now reads "(credentials redacted)".
Same patterns as r125/r126 (Authorization Bearer/token/basic, gh[ps]_*,
github_pat_*, JSON access_token/token) via the central helper.
codespell + skill:validate + fix-line-endings: clean.
02 smoke at HEAD: valid JSON, OpenThreadCount=2 (pre-fix snapshot).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r; apply to all step-script throws Round 128 review fixes — reviewer flagged 3 step-script throw sites that surfaced raw $r.Stderr without the redaction posture established in r125-r127. Applied across all 4 sites (the reviewer found 3; the 4th was the symmetric per-page variant in 01). 1. _lib.ps1: new Format-CopilotPrAutopilotErrorPreview helper. Wraps the common "redact credentials → trim → bound to MaxChars → return '(empty)' for null/whitespace" pattern. Default MaxChars=500 matches Invoke-GhGraphQL / Resolve-RepoCoords. Hoisted into _lib.ps1 so step scripts don't each reinvent the same formatting (and so future tweaks stay single-sourced). 2. 01-request-review.ps1: BOTH events-query throws (first-page and per-page in the pagination walk) now use the helper. 3. 02-check-review-status.ps1: `gh api user` throw now uses the helper. 4. 10-cleanup-outdated.ps1: `gh api user` throw now uses the helper. All four throws are now labeled "(credentials redacted)" and bounded. Step scripts already dot-source _lib.ps1, so the helper is in scope in every fresh process per the documented invocation model. codespell + skill:validate + fix-line-endings: clean. 02 smoke at HEAD: valid JSON, OpenThreadCount=3 (pre-fix snapshot). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… throw to use Format helper
Round 129 review fixes:
1. _lib.ps1 Protect-CopilotPrAutopilotCredentials: added [AllowNull()]
to the $Text param. Several callers (including ConvertFrom-GhJson's
$Stderr which is declared [AllowNull()][AllowEmptyString()]) could
pass $null, which would fail parameter binding BEFORE the
IsNullOrEmpty guard runs. Now binds cleanly on $null and returns
$null unchanged via the existing guard.
2. _lib.ps1 Invoke-GhGraphQL failure throw: collapsed the duplicated
redact → trim → truncate → empty-sentinel block (which also had the
redundant `| ForEach-Object { $_ }` no-op pipeline) to two calls to
Format-CopilotPrAutopilotErrorPreview. The helper centralizes the
exact same formatting and drops the avoidable pipeline overhead.
Functionally identical output; less duplication; the no-op pipeline
is gone.
codespell + skill:validate + fix-line-endings: clean.
02 smoke at HEAD: valid JSON, OpenThreadCount=3 (pre-fix snapshot).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
skills/copilot-pr-autopilot/scripts/01-request-review.ps1:1
- The thrown/printed error text includes raw
ghstderr/body strings ($rawMsg,$r.Stderr,$rb.Stderr) without passing through the credential redactor. If an operator runs withgh --verbose/GH_DEBUG=apior a GitHub error response echoes authorization-like values, this can leak secrets into CI logs. Fix by running these fields throughProtect-CopilotPrAutopilotCredentials(or the sharedFormat-CopilotPrAutopilotErrorPreview) before embedding them inthrow/Write-Warningmessages.
<#
…se Resolve-RepoCoords throw to helper Round 130 review fixes: 1. 01-request-review.ps1: applied credential redaction to both raw $rb.Stderr (events back-walk Write-Warning) and $rawMsg (built from $r.Stderr or aggregated GraphQL body errors, embedded in two permission-error throws). Back-walk warning uses Format helper (capped). $rawMsg uses Protect helper directly — no length cap because the throws are operator-facing diagnostics and capping would hide the FORBIDDEN message text. 2. _lib.ps1 Resolve-RepoCoords: collapsed the expanded redact → trim → truncate → empty-sentinel block (from r127) to two calls to Format-CopilotPrAutopilotErrorPreview. Same formatting, less duplication. Matches the Invoke-GhGraphQL collapse from r129. codespell + skill:validate + fix-line-endings: clean. 02 smoke at HEAD: valid JSON, OpenThreadCount=2 (pre-fix snapshot). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ure throws Round 131 review fix — completed the credential-redaction sweep across 01-request-review.ps1. _lib.ps1: no changes. 01-request-review.ps1: - Non-FORBIDDEN GraphQL failure throw (exit-code path): $r.Stderr now flows through Protect-CopilotPrAutopilotCredentials. Throw label reads "(credentials redacted)". - bodyErrors throw (zero-exit but errors[] populated): joined message string flows through Protect-CopilotPrAutopilotCredentials. Throw label reads "(credentials redacted)". No length cap on either — these are operator-facing diagnostics where capping would hide the actual error text. Redaction alone is the right tradeoff (consistent with the $rawMsg handling from r130). Verified via grep: all remaining throw/Write-Warning sites that embed gh stderr or error.message in this skill now go through either Format-CopilotPrAutopilotErrorPreview (capped) or Protect-CopilotPrAutopilotCredentials (uncapped diagnostic). codespell + skill:validate + fix-line-endings: clean. 02 smoke at HEAD: valid JSON, OpenThreadCount=1 (pre-fix snapshot). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…leanup failure summary Round 132 review fixes: 1. _lib.ps1 Protect-CopilotPrAutopilotCredentials: broadened GitHub token regex from `gh[ps]_` (matched only ghp_ Personal Access and ghs_ Server-to-Server) to `gh[a-z]_` (now also catches gho_ OAuth, ghu_ User-to-Server, ghr_ refresh, and any future single-letter variants GitHub introduces). The 36+ character length requirement still keeps the regex precise — false positives would need to be a stand-alone `gh<letter>_` followed by 36+ alphanumerics, which is the documented token shape (https://github.blog/security/application-security/behind-githubs-new-authentication-token-formats/). 2. 10-cleanup-outdated.ps1 failure summary: each $failed[].Error now passes through Format-CopilotPrAutopilotErrorPreview (redact + cap at 200 chars/error) AND has CR/LF runs collapsed to single spaces BEFORE concatenation. Without this, a multi-line GraphQL FORBIDDEN payload echoed across N failed threads would break line-delimited CI log collectors and flood output. Summary label now reads "Failed threads (credentials redacted)". 200 chars/error keeps the joined line readable when N>5. codespell + skill:validate + fix-line-endings: clean. 02 smoke at HEAD: valid JSON, OpenThreadCount=2 (pre-fix snapshot). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hGraphQL errors[] aggregator
Round 133 review fixes — completed the redaction sweep on the two
remaining throw sites in _lib.ps1:
1. Assert-GhReady prereq-missing throw: $errTrimmed (built from
`gh auth status` stderr/stdout) now flows through
Protect-CopilotPrAutopilotCredentials before {ERR} substitution.
This is arguably the most important throw site to cover because
the whole `gh auth status` command is about auth — token-like
strings showing up in its output is the expected case, not the
rare one.
2. Invoke-GhGraphQL errors[] aggregator throw: $msgs (aggregated
type/path/code/message from the GraphQL errors array) now flows
through the redactor before throwing. Throw label reads
"(credentials redacted)". GraphQL error messages don't normally
carry credentials, but the central library posture is "redact
before throwing" and this site was the last hold-out.
codespell + skill:validate + fix-line-endings: clean.
02 smoke at HEAD: valid JSON, OpenThreadCount=2 (pre-fix snapshot).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds
skills/copilot-pr-autopilot/— runs the Copilot Code Review → fix → reply+resolve → re-trigger loop on a PR until HEAD is reviewed with zero open Copilot threads.Repo-agnostic (discovers build/test/lint from
CONTRIBUTING/AGENTS/package.json/Makefile). Bundles 10 PowerShell scripts (one per step) + per-step reference docs.Permissions: the full multi-round autopilot needs repo Triage or Write (GitHub's
requestReviewsByLoginis the only public API for bot reviewers and is gated on that). External PR authors get a documented-SingleIterationmode plus manual re-trigger between iterations (UI 🔄 button orsynchronize-event push). SeeSKILL.md → Prerequisites → Permissionsfor the full matrix.Validation:
npm run skill:validate✅,npm run build✅,bash eng/fix-line-endings.sh✅.