
Improve agentic state-machine generator prompt #19901

Open: T-Gro wants to merge 9 commits into main from agentics/state-machine-cleanup

Conversation

T-Gro (Member) commented Jun 5, 2026

The generator now extracts a structured intermediate representation (IR) per workflow before rendering diagrams, with two self-verification passes (structural + safeguard). This fixes incorrect guard expressions, wrong lifecycle ordering, and missing safeguards in the generated docs.

T-Gro and others added 3 commits May 28, 2026 18:41
…docs/ output

The agentic-state-machine workflow has been failing since PR #19721 moved
the output to .github/docs/state-machine.md. Files under .github/ are
treated as protected by gh-aw (agent instruction files, security config).

The `allowed-files` config controls WHICH files can be modified, but it
does not override the built-in protected-files blocking. Adding
`protected-files: allowed` explicitly opts in, which is safe because
`allowed-files` already restricts writes to .github/docs/** only.

Fixes #19739

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Evolve the 74-line freeform prompt to a 412-line structured extraction
pipeline with multi-phase self-verification. The agent now extracts
structured IR per workflow before rendering diagrams, with two
self-verification passes (structural + safeguard).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
T-Gro requested a review from a team as a code owner June 5, 2026 19:24
T-Gro added the NO_RELEASE_NOTES and automation labels Jun 5, 2026
github-actions (Bot, Contributor) commented Jun 5, 2026

⚠️ Release notes required, but author opted out

Warning

Author opted out of release notes, check is disabled for this pull request.
cc @dotnet/fsharp-team-msft

github-actions Bot added the AI-Tooling-Check-Bypassed label (tooling check: non-fork PR, not diff-analyzed) Jun 5, 2026
T-Gro and others added 6 commits June 9, 2026 12:55
Run the new generator prompt end-to-end and verify the output converges
under five parallel adversarial verifiers (triggers, diagram wiring,
behavior/safe-outputs, labels/citations, counts/consistency) to
0 CRIT / 0 HIGH / 0 MED / 0 technically-incorrect findings.

The doc now enumerates all 15 workflows (including copilot-setup-steps),
adds dedup choice gates per Rule 20, exhaustively enumerates every
gh-aw safe-outputs leaf key per Rule 39, and uses correct actor prefixes
on every edge.
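The convergence criterion above (five verifiers, 0 CRIT / 0 HIGH / 0 MED / 0 technically-incorrect) can be read as a simple gate over the merged findings. This is a hypothetical sketch, not the generator's code: the `Finding` record, its field names, and the non-blocking `LOW` level are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical finding record; field names are illustrative assumptions.
@dataclass(frozen=True)
class Finding:
    verifier: str                      # e.g. "triggers", "diagram wiring"
    severity: str                      # "CRIT", "HIGH", "MED", or "LOW" (LOW is assumed)
    technically_incorrect: bool = False


BLOCKING = ("CRIT", "HIGH", "MED")


def has_converged(findings: list[Finding]) -> bool:
    """True when no verifier reports a blocking or technically-incorrect finding."""
    counts = Counter(f.severity for f in findings)
    if any(counts[sev] for sev in BLOCKING):
        return False
    return not any(f.technically_incorrect for f in findings)


if __name__ == "__main__":
    findings = [Finding("labels/citations", "LOW")]
    print(has_converged(findings))
    findings.append(Finding("diagram wiring", "HIGH"))
    print(has_converged(findings))
```

Under this reading, a run only counts as converged when every verifier's output is merged first, so one HIGH from any of the five blocks the whole doc.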

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mermaid stateDiagram-v2's lexer treats `;` as a statement separator
inside `state X { ... }` composite blocks. When followed by a token
containing a hyphen (e.g., `allowed-files`, `fetch-depth`,
`AI-thinks-issue-fixed`), the lexer aborts with "Lexical error:
Unrecognized text", which prevents the diagram from rendering in GitHub
or any browser viewport.

5 of 6 diagrams in the generated doc were failing to render. None of
the existing 40 generator rules covered Mermaid syntactic safety, and
none of the 15 Phase 3.5 verifier checks parsed the output.

Generator changes:
- Add Rule 41 (Mermaid edge-label sanitization): forbid `;` and HTML
  control chars in labels; require balanced delimiters; explain the
  lexer interaction with hyphenated identifiers.
- Add Phase 3.5 verifier check (p): parse every Mermaid block with
  `mermaid.parse()` via jsdom; any parse failure is CRIT.
- Add Phase 4 deterministic sanitization post-process (Python) that
  rewrites `;` to `,` in every edge label before emit; runs as
  belt-and-suspenders even if the model regresses.
- Add a one-line edge-label safety summary to <diagram-guidelines>.
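A Phase 4 style post-process could look roughly like the sketch below: it only touches lines inside mermaid fences that look like labelled stateDiagram-v2 transitions (`A --> B : label`) and rewrites `;` to `,` in the label. The regex, the one-transition-per-line assumption, and the function name are illustrative, not the PR's actual script.

```python
import re

# Labelled stateDiagram-v2 transition: "A --> B : label".
# Assumption: one transition per line; the label follows the first ":" after the target.
EDGE_RE = re.compile(r"^(\s*[^\s:]+\s*-->\s*[^\s:]+\s*:)(.*)$")


def sanitize_edge_labels(markdown: str) -> str:
    """Rewrite ';' to ',' in edge labels, but only inside ```mermaid fences."""
    out = []
    in_fence = False
    is_mermaid = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence:
                # Opening fence: remember whether this block is mermaid.
                in_fence = True
                is_mermaid = stripped[3:].strip() == "mermaid"
            else:
                # Closing fence of whatever block we are in.
                in_fence = False
                is_mermaid = False
            out.append(line)
            continue
        if is_mermaid:
            m = EDGE_RE.match(line)
            if m:
                line = m.group(1) + m.group(2).replace(";", ",")
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if markdown.endswith("\n") else result
```

Tracking every fence (not just mermaid ones) keeps the closing fence of, say, a YAML block from being mistaken for the start of a diagram.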

Doc changes:
- Apply the Phase 4 sanitization to the existing state-machine.md
  (40 line touches, no semantic change beyond `;` -> `,`).
- Verified: all 6 Mermaid blocks now parse cleanly under mermaid 10.x.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous generator optimized for verifier completeness (every leaf
key documented) and produced an unreadable wall of tables: 90 rows of
safe-output keys (mostly defaults like "target: '*'" repeated 13
times), 24-row label dictionary with 9 near-identical rows for the
"Affects-*" family, 7-column overview that scrolled horizontally,
and diagram edge labels dumping full config inline.

Doc changes:
- Overview: 7 cols -> 5 cols (drop Type and Concurrency; inline
  serialization in Inputs cell when present).
- Safe-outputs: 3 tables totaling 90 rows -> 9 per-workflow signature
  paragraphs. Universal defaults (target '*', noop.report-as-issue
  false, draft false) stated once at top, suppressed below.
- Label Dictionary: 24-row 5-col table -> 5 semantic groups
  (always-applied, agent-chosen add, agent-chosen remove, trigger
  filters, imperative). "Affects-*" family collapsed to one bullet.
- Diagram edge labels: shortened from full-config dumps to behavior
  verbs + brief object hints (all now <80 chars).
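Stating universal defaults once and suppressing them below amounts to filtering each workflow's safe-outputs config before rendering its signature. A minimal sketch, assuming the config is a nested dict per action; the default keys come from this commit message, while the data shape, the suffix-matching rule, and the example workflow config are hypothetical.

```python
# Universal defaults named in the commit message, as dotted keys (matching rule is assumed).
UNIVERSAL_DEFAULTS = {
    "target": "*",
    "noop.report-as-issue": False,
    "draft": False,
}


def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys: {"noop": {"x": 1}} -> {"noop.x": 1}."""
    flat = {}
    for key, value in cfg.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def is_default(key: str, value) -> bool:
    """A setting is a default if its dotted key ends with a default key and the value matches."""
    return any(
        (key == d or key.endswith("." + d)) and value == dv
        for d, dv in UNIVERSAL_DEFAULTS.items()
    )


def signature(workflow: str, safe_outputs: dict) -> str:
    """One line per workflow: every action verb, plus only its non-default settings."""
    parts = []
    for action, cfg in safe_outputs.items():
        flat = flatten(cfg, action + ".") if isinstance(cfg, dict) else {}
        overrides = [f"{k.split('.', 1)[1]}={v!r}" for k, v in flat.items() if not is_default(k, v)]
        parts.append(f"{action} ({', '.join(overrides)})" if overrides else action)
    return f"{workflow}: " + ", ".join(parts)
```

For a hypothetical `repo-assist` config, `target: '*'` and `draft: false` disappear from the output while `max` and `labels` survive, which is the effect the 90-row to 9-paragraph change describes.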

Generator changes:
- Rule 39 rewritten: signature-level documentation, not exhaustive
  enumeration. Sig form: one paragraph per workflow listing action
  verbs with override config; defaults suppressed.
- Rule 42 added (Compaction): hard limits on doc lines (<=600), pipe
  rows (<=80), per-table rows (<=25). Mandate semantic grouping for
  label dictionaries and per-workflow signature blocks.
- Rule 43 added (Edge-label brevity): <=80 chars per label, behavior
  verb + brief object only; full config goes to sig blocks. Post-draft
  grep verifies 0 lines exceed.
- Phase 3.5 verifier check (q) added: deterministic bash readability
  metrics (LINES/PIPES/LONGEDGES/MAXTABLE) with explicit thresholds.
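Check (q) is described as bash; the same four metrics can be sketched in Python to make the definitions explicit. The thresholds come from Rules 42 and 43. How each metric is counted is an assumption here: PIPES counts lines starting with `|`, MAXTABLE is the longest run of consecutive pipe lines (header and separator included, so it is slightly stricter than "rows"), and LONGEDGES scans every line for a labelled `-->` edge rather than only mermaid fences.

```python
import re

# Limits from Rules 42/43; the counting rules below are illustrative assumptions.
LIMITS = {"LINES": 600, "PIPES": 80, "LONGEDGES": 0, "MAXTABLE": 25}
EDGE_LABEL_RE = re.compile(r"-->\s*[^\s:]+\s*:(.*)$")


def readability_metrics(doc: str) -> dict:
    lines = doc.splitlines()
    is_pipe = [ln.lstrip().startswith("|") for ln in lines]

    # Longest run of consecutive table lines approximates the largest table.
    max_table = run = 0
    for pipe in is_pipe:
        run = run + 1 if pipe else 0
        max_table = max(max_table, run)

    # Edge labels longer than 80 characters violate Rule 43.
    long_edges = 0
    for ln in lines:
        m = EDGE_LABEL_RE.search(ln)
        if m and len(m.group(1).strip()) > 80:
            long_edges += 1

    return {"LINES": len(lines), "PIPES": sum(is_pipe), "LONGEDGES": long_edges, "MAXTABLE": max_table}


def violations(metrics: dict) -> list[str]:
    """Human-readable list of every metric above its threshold."""
    return [f"{k}={metrics[k]} > {limit}" for k, limit in LIMITS.items() if metrics[k] > limit]
```

An empty `violations(...)` result corresponds to the "0 lines exceed" condition the post-draft grep is meant to enforce.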

Metrics:
- Doc: 603 -> 501 lines (-17%); 143 -> 27 pipe rows (-81%);
  max table 30 -> 17 rows; longest edge label was 250+ chars -> all <80.
- All 6 Mermaid blocks still render (zero regression on Rule 41 fix).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three independent rubber-duck reviewers (Sonnet, GPT-5.4, Gemini)
read the doc fresh and averaged 2.3/5 on readability. Convergent
complaint across all three: domain jargon used without definition
(gh-aw, safe-outputs, CCA, flaky-test-detector, Cat A/B/C, B0-B4),
no orientation paragraph, and one diagram edge that pointed at
source-file line numbers in place of actual content.

Doc changes:
- Add 'What this doc is' intro paragraph above the Overview.
- Add a Glossary section defining the 8 domain terms that all three
  reviewers flagged as undefined.
- Add a Legend table for the actor-prefix emoji convention
  (clock/person/gear/robot) and the choice/fork/join pseudo-states.
- Inline the 6 repo-assist Task 2 skip conditions where the diagram
  previously linked to repo-assist.md L296-306. Source-pointer was
  not documentation.
- Add a 'task ordering' callout immediately after the repo-assist
  diagram explaining why Task 1 -> Task 3 -> Task 2 -> Task FINAL
  is non-sequential by design.
- Convert the repo-assist safe-output signature from a 9-item
  single-line comma soup into a 10-line bulleted action list.
- Make the commands.yml two-job artifact boundary visible by
  inserting a CMD_JobBoundary intermediate state.
- Shorten one pre-existing 83-char edge label to satisfy Rule 43.

Generator changes (so this stays fixed in future regenerations):
- Rule 44 (Glossary mandatory): every domain term used must be
  defined at first use or in a top-of-doc glossary. Enumerates
  the term classes (project-specific tools, custom frameworks,
  acronyms, taxonomies, diagram convention key).
- Rule 45 (Self-contained): source-file pointers like
  '(see file.md L100-110)' are documentation failures; inline
  the content and use citations only as provenance markers.
- Rule 46 (Orientation paragraph mandatory): doc must open with
  1-3 sentences answering what/who/how before any table or
  diagram. Generator-version stamps are metadata, not orientation.
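Rules 45 and 46 are both mechanically checkable. The sketch below flags source-file pointers and tests whether the first content line is prose. The regex is modelled only on the two pointer shapes quoted in this PR ('(see file.md L100-110)' and '(src Lnn)'), and treating HTML comments as the place where generator-version stamps live is an assumption.

```python
import re

# Assumed patterns, modelled on "(see file.md L100-110)" and "(src Lnn)".
SOURCE_POINTER_RE = re.compile(
    r"\((?:see\s+)?\S+\.\w+\s+L\d+(?:-\d+)?\)"  # (see file.md L100-110)
    r"|\(src\s+L\d+(?:-\d+)?\)"                 # (src L42)
)


def source_pointers(doc: str) -> list[str]:
    """Rule 45: every match is content that should have been inlined."""
    return SOURCE_POINTER_RE.findall(doc)


def has_orientation(doc: str) -> bool:
    """Rule 46: the first content line must be prose, not a table or diagram."""
    for line in doc.splitlines():
        s = line.strip()
        # Skip blanks, headings, and (assumed) HTML-comment version stamps.
        if not s or s.startswith("#") or s.startswith("<!--"):
            continue
        return not s.startswith(("|", "```"))
    return False
```

A doc that opens with a version stamp and then a mermaid fence fails the orientation check, matching the rule that stamps are metadata rather than orientation.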

Metrics: 501 -> 547 lines (intro+glossary cost ~46 lines); 6/6
Mermaid blocks still render; 0 edge labels over 80 chars.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nd glossary

Second 3-reviewer rubber-duck cycle: 0 source-file consults (glossary
works), but average plateaued at 2.67/5 with a fresh layer of gaps
the prior pass didn't address. This commit closes those:

Doc:
- Safe-output sections converted from run-on prose (Sonnet's #1
  readability failure: 'YAML serialized into prose') to per-group
  mini-tables with columns: Workflow | Output | Max | Key Constraints.
- New '/run commands' callout table after Group C diagram (4 rows:
  fantomas, ilverify, xlf, test-baseline). Sonnet and GPT both flagged
  these as the user-facing value of commands.yml but undefined.
- Glossary expanded from 9 to 15 entries: .lock.yml, dotnet/skills,
  BSL (baseline), FCS (F# Compiler Service), the LPM internal flags
  (12h stuck guard / ci_blocked / has_ci / has_conflicts), and the
  two repo-specific magic constants (milestone 29, 2026-05-12 cutoff).
- Removed 4 residual '(src Lnn)' provenance markers from edge labels
  (footer SHAs already pin source).
- Replaced 'etc.' in RA_CmdOutputs edge with explicit '9 safe-outputs
  (see Safe-outputs below)'.

Generator:
- Rule 39 amended: prefer per-workflow mini-tables for safe-outputs;
  paragraph form acceptable only for trivial workflows (<=2 actions,
  <=1 constraint each). Tables won this round of reviewer feedback.
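The amended Rule 39 boils down to a threshold on each workflow's safe-outputs. A minimal sketch of that choice, rendering the mini-table with the Workflow | Output | Max | Key Constraints columns named above; the `SafeOutput` shape and the paragraph wording are hypothetical.

```python
from typing import NamedTuple, Optional


class SafeOutput(NamedTuple):
    output: str                 # action verb, e.g. "add-comment" (illustrative)
    max_count: Optional[int]    # per-run cap; None when unspecified
    constraints: list[str]      # non-default settings worth surfacing


def is_trivial(outputs: list[SafeOutput]) -> bool:
    """Paragraph form is allowed only for <=2 actions with <=1 constraint each."""
    return len(outputs) <= 2 and all(len(o.constraints) <= 1 for o in outputs)


def render_safe_outputs(workflow: str, outputs: list[SafeOutput]) -> str:
    if is_trivial(outputs):
        parts = [o.output + (f" ({o.constraints[0]})" if o.constraints else "") for o in outputs]
        return f"{workflow} outputs: " + ", ".join(parts) + "."
    rows = ["| Workflow | Output | Max | Key Constraints |", "|---|---|---|---|"]
    for o in outputs:
        cap = "-" if o.max_count is None else str(o.max_count)
        rows.append(f"| {workflow} | {o.output} | {cap} | {', '.join(o.constraints) or '-'} |")
    return "\n".join(rows)
```

Keeping the threshold in one predicate makes the "tables by default" preference explicit: any workflow that grows a second constraint on one action flips to table form.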

Metrics: 547 -> 580 lines (<600 limit); 35 -> 69 pipe rows (<80
limit); all edge labels still <=80 chars; 6/6 Mermaid blocks render.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
User taste-calls based on three rubber-duck reviewer cycles:
1. Flatten the 5-group Labels section into ONE table (GPT preference,
   beats Sonnet's grouping preference) — makes cross-workflow label
   flows visible on a single row.
2. Keep mermaid workflow groupings (overrules Gemini's split-per-
   workflow recommendation) — the visual proximity communicates the
   hidden dependencies (shared labels, dispatch handovers, indirect
   signals).
3. Trim, don't grow.

Doc:
- Labels: 5 bulleted groups -> single 14-row table with columns
  Label | Type | Added by | Removed by | Read by | Notes. Producer/
  consumer flows now on one row (AI-Issue-Regression-PR added by RA,
  read by RPS; AI-Auto-Resolve-* read by LPM; AI-thinks-issue-fixed
  bidirectional RA<->RPS/RA).
- Overview: drop unused # column; unify Inputs sentinel.
- Handover Map: drop the spurious intra-workflow row (RA task
  signaling is in the task-ordering blockquote, not a cross-
  workflow handover).
- Glossary: split the 4-concept overloaded bullet (has_ci,
  has_conflicts, ci_blocked, 12h stuck guard) into 4 short bullets.
- Group intros dropped where they just restated the diagram;
  noop rows collapsed to a per-group preamble.

Generator:
- Rule 42 flipped: prefer flat Labels table over semantic groups
  (with rationale and column shape). The earlier 5-group rule was
  rejected by 2 of 3 readability reviewers.
- Rule 47 added: mermaid workflow groupings stay intact for any
  group whose workflows share cross-dependencies (labels, state
  branches, dispatch). Splitting them erases the visible
  dependency graph. Per-workflow split that loses cross-edges =
  MAJOR.
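Rule 47's MAJOR condition can be stated as a check over a proposed diagram grouping: any cross-workflow dependency whose two endpoints land in different diagrams is a lost edge. A minimal sketch; representing dependencies as (workflow, workflow, reason) triples and groupings as a dict are assumptions.

```python
def lost_cross_edges(dependencies, grouping):
    """Dependencies whose two workflows are rendered in different diagrams."""
    return [(a, b, why) for a, b, why in dependencies if grouping[a] != grouping[b]]


def rule47_findings(dependencies, grouping):
    """One MAJOR finding per dependency the proposed split would erase."""
    return [
        f"MAJOR: {a} -> {b} ({why}) split across diagrams"
        for a, b, why in lost_cross_edges(dependencies, grouping)
    ]


if __name__ == "__main__":
    deps = [("RA", "RPS", "AI-Issue-Regression-PR label")]
    print(rule47_findings(deps, {"RA": "group-B", "RPS": "group-B"}))
    print(rule47_findings(deps, {"RA": "RA", "RPS": "RPS"}))
```

A per-workflow split fails exactly when a shared label or dispatch handover connects two workflows, which is the case the user taste-call kept groupings for.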

Metrics: 580 -> 560 lines (-3.4%); 69 -> 78 pipe rows (still
under 80 limit); 0 edge labels over 80 chars; 6/6 Mermaid blocks
render; max table 17 rows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels: AI-Tooling-Check-Bypassed, automation, NO_RELEASE_NOTES

Projects: Status: New

1 participant