Skip to content

feat(inspect): add IR-based pipeline reasoning tools (inspect, graph, whatif, lint, catalog, trace, mcp-author)#998

Open
jamesadevine wants to merge 12 commits into
mainfrom
feat/agent-ir-tools
Open

feat(inspect): add IR-based pipeline reasoning tools (inspect, graph, whatif, lint, catalog, trace, mcp-author)#998
jamesadevine wants to merge 12 commits into
mainfrom
feat/agent-ir-tools

Conversation

@jamesadevine

@jamesadevine jamesadevine commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds an agent-facing read-only surface over the typed pipeline IR so agents (in IDE Copilot Chat, sub-agents, or downstream tooling) can reason about pipeline structure and failures programmatically instead of grep-walking compiled .lock.yml files.

Layered design — each layer reusable by the next:

Layer Surface
0 — JSON schema PipelineSummary (schema_version: 1) + GraphSummary etc. in src/compile/ir/summary.rs; Serialize+Deserialize, stable contract.
1a — Public IR entry compile::build_pipeline_ir(path) -> (FrontMatter, Pipeline) — read-only, no YAML written.
1b — CLI commands inspect, graph {dump,deps,outputs}, whatif, lint, catalog, trace.
2 — MCP server New ado-aw mcp-author stdio server exposing every L1 command as a read-only MCP tool for IDE / Copilot Chat.
3 — Audit integration AuditData.pipeline_graph, JobData.{upstream,downstream}_jobs, new "downstream-skipped" finding.

Follow-up fixes from review feedback

  • Fixed whatif condition classification to correctly handle negated predicates (e.g. not(failed()), not(always())) and added targeted regression tests.
  • Reduced audit/trace drift risk by centralizing duplicated job result logic into shared JobData helpers.
  • Updated mcp-author audit_build to write under a temp directory ($TMP/ado-aw/audit) instead of ./logs.
  • Clarified schema stability docs: additive optional fields are non-breaking, but new enum variants currently require a schema-version bump with the present enum encoding.
  • Renamed an internal graph traversal variable in whatif for readability (reversedownstream).

What agents can now do

  • Debug: whatif --fail Detection answers "what skips" without manual graph walking; graph deps SafeOutputs --direction upstream returns the typed transitive dependency set; trace <build> joins ADO timeline with the IR so a failed timeline node surfaces its derived skip-set.
  • Author: lint flags unused outputs / anonymous producers; catalog --kind safe-outputs lists every available tool (24 found); inspect --json returns a stable schema-versioned summary for scripting.
  • IDE: All of the above exposed over MCP via ado-aw mcp-author for VS Code / Claude Desktop.

Trust-boundary preserved

The new mcp-author server is separate from the existing SafeOutputs MCP that ships inside compiled pipelines. Stage 1 sandboxed agents do NOT get these tools — they're author/debug-only surface, intentional.

Docs

  • New docs/mcp-author.md (tool surface + VS Code / Claude Desktop config snippets).
  • New "Public JSON summary (ir::summary)" section in docs/ir.md.
  • New commands in docs/cli.md; pipeline_graph + upstream_jobs / downstream_jobs in docs/audit.md.
  • prompts/debug-ado-agentic-workflow.md and prompts/update-ado-agentic-workflow.md updated to surface the new commands.
  • Follow-up clarification to schema stability wording in docs/ir.md.

Test plan

  • cargo build
  • cargo test --bin ado-aw ✅ (full binary test suite passing after follow-up fixes)
  • cargo test --test inspect_integration4 passed (end-to-end CLI tests on the create-pull-request.md fixture)
  • Targeted regression coverage:
    • cargo test --bin ado-aw inspect::whatif::tests::
    • cargo test --bin ado-aw inspect::trace::tests::
    • cargo test --bin ado-aw audit::findings::tests::
    • cargo test --bin ado-aw mcp_author::tests::
  • cargo clippy --all-targets --all-features0 warnings
  • Manual smoke against tests/safe-outputs/create-pull-request.md:
    • ado-aw inspect ... → text + --json (schema_version: 1)
    • ado-aw graph dump ... --format dot → valid DOT digraph
    • ado-aw graph deps ... SafeOutputs --direction upstream → upstream jobs Agent, Detection
    • ado-aw whatif ... --fail AgentDetection, SafeOutputs skipped
    • ado-aw lint ... → 0 errors
    • ado-aw catalog --kind safe-outputs --json → 24 entries
    • ado-aw mcp-author --help → wired

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — layered design is clean, trust boundary is maintained, error handling is solid. A few correctness and maintenance issues worth addressing.


Findings

🐛 Bugs / Logic Issues

  • [src/inspect/whatif.rs:classify_condition] The substring check normalized.contains("failed()") yields false positives for not(failed()) (which means "only run when NOT failed" — same semantics as the default succeeded(), so the job would be skipped after an upstream failure). whatif would incorrectly report such a job as runs_anyway. Same problem for not(always()). The fix is to match against the full lowered string or, better, classify against the typed Condition AST before lowering rather than the rendered string.

    // Today — wrong for not(failed()):
    if normalized.contains("always()")
        || normalized.contains("failed()")  // ← also matches not(failed())
  • [src/audit/findings.rs] + [src/inspect/trace.rs] job_failed is copy-pasted verbatim into both files. job_classification (findings.rs) and job_status (trace.rs) are also identical in body. If either drifts (e.g., adding handling for the British-spelling "cancelled"), the other goes stale silently. Extract to a shared helper on JobData or in audit/model.rs.

⚠️ Suggestions

  • [src/mcp_author/mod.rs:audit_build] The audit_build MCP tool hardcodes output: Path::new("./logs"). This writes artifact files relative to whatever CWD the MCP server process has, with no way for the caller to override it. Since mcp-author runs in arbitrary directories, this can scatter cache files unexpectedly. Consider std::env::temp_dir(), a platform cache dir, or exposing an output_dir parameter.

  • [src/inspect/whatif.rs:reachable_edges] The local variable is named reverse but actually holds a forward (producer→consumers) adjacency map. The traversal logic is correct, but the name will trip up future readers. Rename to downstream or forward.

  • [src/compile/ir/summary.rs] The stability-contract doc states: "new enum variants in unknown-tolerant contexts do not require a bump." However StepKind, PoolSummary, and PipelineBodySummary have no #[serde(other)] catch-all or Unknown variant, so any new variant is a silent Deserialize failure for old consumers. Either add #[serde(other)] / Unknown variants to honour the contract, or tighten the doc to say new variants always require a schema-version bump.

✅ What Looks Good

  • Error handling is thorough — anyhow::Result everywhere, no silent unwrap()s on user-facing paths.
  • Trust boundary correctly enforced: mcp-author is fully separate from the in-pipeline SafeOutputs MCP; Stage 1 sandboxed agents do not get these tools.
  • build_pipeline_ir correctly uses skip_integrity: true, consistent with check_pipeline semantics.
  • WhatIfError with typed errors and Levenshtein-based suggestion is a nice DX touch.
  • 21 new unit tests including the cross-fixture round-trip against create-pull-request.md.

Generated by Rust PR Reviewer for issue #998 · sonnet46 4.6M ·

@jamesadevine

Copy link
Copy Markdown
Collaborator Author

@copilot address feedback and suggestions

Copilot AI commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

@copilot address feedback and suggestions

Addressed the feedback in commit c4063bd (plus prior follow-up f5c2c65): fixed whatif negated-condition handling with added tests, deduplicated job status/failure logic via JobData helpers, moved mcp-author audit_build output to a temp dir, clarified IR schema-version docs, and applied the maintainability nits.

@jamesadevine

Copy link
Copy Markdown
Collaborator Author

/rust-review

@github-actions

github-actions Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Rust PR Reviewer failed. Please review the logs for details.

@jamesadevine

Copy link
Copy Markdown
Collaborator Author

/rust-review

@github-actions

github-actions Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Rust PR Reviewer completed successfully!

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Mostly solid — the layered IR inspection design is clean and well-tested, but there are two security issues and one logic bug worth addressing before merge.


Findings

🔒 Security Concerns

  • src/mcp_author/mod.rs (lines 109–113, 147–150) — PAT leaked via Debug
    TraceFailureParams and AuditBuildParams both #[derive(Debug)] while carrying pat: Option<String>. Any {:?} log or dbg!() call on these structs (including error-path traces in the rmcp framework) prints the PAT in plaintext. Use a newtype wrapper with a redacted Debug impl, or manually impl Debug to mask the pat field.

  • src/audit/pipeline_graph.rs (resolve_source_path, line ~145) — arbitrary file read via crafted build artifact
    The source value is read from aw_info.json inside the downloaded artifact. If path.is_absolute() the function returns it unconditionally — there is no canonicalization or bounds check. Via the audit_build MCP tool an IDE agent that's been prompt-injected by a malicious repo could supply a build whose artifact carries "source": "/home/user/.ssh/id_rsa" (or any other readable file). build_pipeline_ir will fail to parse it as markdown and surface the error, but the file is still opened and read. A canonicalize + prefix check against std::env::current_dir() (or a user-supplied root) would close this.

🐛 Bugs / Logic Issues

  • src/audit/findings.rs (job_name_matches) vs src/inspect/trace.rs (find_runtime_job) — duplicated matching logic
    Both functions implement the same name == id || rsplit('.').next() == id heuristic independently. The PR description says it "centralized duplicated job result logic into shared JobData helpers" but this particular matcher wasn't merged. A future typo fix in one copy won't reach the other. Extract to a shared crate::audit::pipeline_graph::job_name_matches (the module already has timeline_name_matches_job — one of these could absorb the other).

⚠️ Suggestions

  • src/inspect/lint.rs (rule_step_id_collisions) — collision reports point to second duplicate, not first
    The BTreeMap::insert sentinel means the first occurrence is silently accepted and only subsequent ones emit findings. In a two-step collision the output says "step id 'X' used more than once" but identifies the consumer job, not the producer. Swap to BTreeSet<String> to detect on first repeat; store the first-seen &JobSummary in a separate map to include the original location in the message. (Note: the compiler's graph pass already rejects this at compile time, so this is a "belt-and-suspenders" lint — low urgency.)

  • src/inspect/whatif.rs (classify_condition) — undocumented coverage gap
    The classifier only recognizes always(), failed(), succeededOrFailed(). ADO also accepts variable-based conditions such as eq(variables['Agent.JobStatus'], 'Failed'). The safe fallback to Skipped is correct, but users may be surprised when whatif classifies a variable-guarded job as "skipped". A comment (or a lint rule) documenting this limitation would help.

  • src/audit/findings.rs (add_downstream_impact_findings) — _recommendations is unused
    The _recommendations parameter is required by the function signature convention but there's an obvious recommendation to add: "Check the {job} job logs for root cause." Even a one-liner here would make the finding more actionable in the console report.

✅ What Looks Good

  • The is_negated_call even/odd NOT-nesting logic correctly handles double negation (not(not(failed()))RunsAnyway), which was the regression the PR explicitly targeted.
  • Graceful degradation when aw_info.json is absent or the source file isn't locally available is exactly right for an audit-time utility.
  • McpAuthor is correctly excluded from the update-check and pipeline-internal guards.
  • Test coverage is solid — targeted whatif::tests, trace::tests, and findings::tests all cover the logic branches that were hardest to get right.

Generated by Rust PR Reviewer for issue #998 · 247.8 AIC · ⌖ 12.6 AIC · ⊞ 32.5K ·
Comment /rust-review to run again

jamesadevine added a commit that referenced this pull request Jun 14, 2026
…dback

Security:

- mcp_author: manual Debug impls for TraceFailureParams and AuditBuildParams so the optional pat field is no longer leaked via {:?} / dbg!() / rmcp error traces.

- audit::pipeline_graph::resolve_source_path: require .md extension and reject parent-dir / tilde components to close the /home/user/.ssh/id_rsa exfiltration vector flagged by the reviewer. populate_pipeline_graph downgrades a rejection to a warning so legitimate audits still complete.

Maintainability:

- audit::model::JobData: add matches_ir_id helper. findings::add_downstream_impact_findings and inspect::trace::find_runtime_job now both delegate to it instead of carrying duplicate Stage.Job suffix-match heuristics.

- audit::findings::add_downstream_impact_findings: emit a Recommendation pointing operators at the failed-job logs (the _recommendations parameter is no longer unused).

- inspect::lint::rule_step_id_collisions: collision message now names BOTH the first-seen producer location and the colliding consumer (was only pointing at the second occurrence).

- inspect::whatif::classify_condition: doc-comment now records the coverage gap for variable-based conditions (e.g. Agent.JobStatus comparisons) which we deliberately classify as Skipped; new test pins that behaviour.

New tests:

- audit::pipeline_graph: 4 resolver tests covering the rejected attack patterns and the accepted legitimate absolute-path case.

- audit::pipeline_graph::populate_pipeline_graph_records_warning_on_malicious_source — end-to-end test that a malicious source string is recorded as a warning rather than read from disk.

- inspect::whatif::variable_based_condition_is_conservatively_skipped — pins the documented limitation.

Validated locally with cargo build, cargo test (1880 unit tests + integration suites pass), and cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid layered design with good test coverage — one substantive logic bug in the condition classifier, a security suggestion for the MCP source path, and a lint rule that's effectively dead code.


Findings

🐛 Bugs / Logic Issues

src/inspect/whatif.rsnot(succeededOrFailed()) misclassified as RunsAnyway

"failed()" is a substring of "succeededorfailed()" (it starts at offset 11). The is_negated_call check looks for "not(" at idx - 4, but when "failed()" is matched inside "succeededorfailed()", the four characters before it are "edor", not "not(". So not(succeededOrFailed()) — which should be Skipped — is incorrectly classified as RunsAnyway.

Minimal repro that would fail:

detection.condition = Some("not(succeededOrFailed())".to_string());
// classify_condition returns RunsAnyway, should return Skipped

Fix: add a word-boundary guard before accepting a substring match, e.g. verify that the character immediately before the match is (, a comma, or start-of-string and not a lowercase letter:

fn is_word_boundary_before(s: &str, idx: usize) -> bool {
    idx == 0 || s.as_bytes().get(idx - 1).map_or(true, |&b| !b.is_ascii_alphanumeric())
}

Then in contains_unnegated_call, skip any hit where !is_word_boundary_before(normalized_condition, idx).

The PR already tests not(failed()), not(always()), and not(not(failed())), so a test for not(succeededOrFailed())Skipped should be added alongside the fix.


🔒 Security Concerns

src/mcp_author/mod.rssource_path() accepts arbitrary file paths

fn source_path(path: &str) -> PathBuf {
    PathBuf::from(path)  // no validation
}

All nine MCP tools that accept source_path reach build_pipeline_ir (which reads the file), and none validate the path first. A crafted MCP request could supply ../../other-project/secrets.md and the IR summary — including env-var names, step names, and any secrets embedded in bash step bodies — would be returned in the tool result.

audit/pipeline_graph.rs's resolve_source_path already models a minimal policy (.md extension, no .., no leading ~). Applying the same guard in source_path() would be consistent and adds defense-in-depth for IDE integrations that proxy LLM-generated requests.


⚠️ Suggestions

src/inspect/lint.rsrule_missing_is_output is effectively unreachable

// TODO: This should remain quiet while PipelineSummary patches
// auto_is_output from the graph.
if !decl.auto_is_output {
    findings.push(...)  // never fires
}

PipelineSummary::from_pipeline (in summary.rs) already patches every affected output's auto_is_output = true before lint() is ever called. The guard means this rule will never emit a finding in normal usage. The TODO is noted but it leaves rule_missing_is_output as silent dead code. Consider either removing the body and leaving a doc comment explaining the invariant, or making the rule unconditional (relying on the summary having been built correctly) so it can catch future drift.

src/inspect/mod.rs — stale module doc comment

The doc comment lists trace.rs, whatif.rs, lint.rs, and catalog.rs under "Future siblings (not yet landed)" — but all four are present in this PR. The comment should be updated to reflect what's actually in the module.


✅ What Looks Good

  • Security in resolve_source_path: the .md-extension requirement + ..-component rejection + ~-prefix guard (with inline threat-model commentary) is exactly the right approach for an untrusted audit artifact field.
  • PAT redaction in Debug impls: masking pat in TraceFailureParams and AuditBuildParams prevents accidental token exposure in log lines or MCP error traces.
  • Condition double-negation handling: the not(not(failed()))RunsAnyway test (and the toggle logic in is_negated_call) is handled correctly and regression-tested.
  • JobData centralization: pulling failed() and matches_ir_id() into audit::model::JobData instead of duplicating across findings and trace is a good DRY call.
  • Graceful degradation in populate_pipeline_graph: recording a warning rather than erroring when source metadata isn't locally available is the right UX for auditing cross-machine builds.

Generated by Rust PR Reviewer for issue #998 · 642.6 AIC · ⌖ 13 AIC · ⊞ 32.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid layered design with good security discipline in most places; a few logic issues worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

src/audit/pipeline_graph.rs:timeline_name_matches_job — overly-broad suffix fallback

// Branch 3 (the fallback):
timeline_name
    .rsplit('.')
    .next()
    .is_some_and(|suffix| suffix == job_id)

rsplit takes the last dot-separated token, so a timeline name like Stage1.SubStage.Agent would match job_id = "Agent". Branch 2 (format!("{stage}.{job_id}")) already handles the canonical Stage.Job form. The fallback is effectively dead code for all known pipeline shapes but could attach wrong IR edges to timeline entries in unusual configurations. Consider replacing branch 3 with strict Stage.Job parsing or removing it entirely:

// More precise: only accept "anything.job_id" (single-level prefix)
if let Some((_, suffix)) = timeline_name.rsplit_once('.') {
    suffix == job_id
} else {
    false
}

src/audit/cli.rs:fetch_audit_data_innerderive_findings called in cache path but not before save

When the result comes from a fresh download, findings::derive_findings is called (correct). When the result comes from cache, derive_findings is also called after populate_pipeline_graph. But the saved run-summary.json snapshot was written before this run's pipeline-graph pass, so the downstream-skipped finding and the new upstream_jobs/downstream_jobs job fields are never persisted. On the next cached load they're recomputed in memory but not saved back. That means audit --json from cache could silently produce a different AuditData shape than the canonical saved summary. This is unlikely to cause a hard failure but may confuse tooling that compares successive outputs.


🔒 Security Concerns

src/mcp_author/mod.rs:source_path — no path sanitization for an externally-routed MCP server

fn source_path(path: &str) -> PathBuf {
    PathBuf::from(path)
}

audit/pipeline_graph.rs:resolve_source_path validates the source value from untrusted build artifacts with .. rejection and a mandatory .md extension guard. The same value arriving over the mcp-author stdio channel gets no equivalent treatment. The module comment notes "no workspace bounding directory: callers run it locally as the invoking user" — that's an intentional design choice, but it means a compromised or misconfigured IDE extension (or a prompt-injected model call) could feed arbitrary paths to build_pipeline_ir, which calls tokio::fs::read_to_string on the raw path before any parsing. Because build_pipeline_ir then sanitizes and validates front matter, the content leak is bounded to whatever can be read (parsing will fail for non-markdown files), but the file is read. Given that the SafeOutputs MCP server has thorough input sanitization, applying at minimum a .. / tilde guard and an .md extension check here would be consistent with project conventions and cheap to add.


⚠️ Suggestions

src/mcp_author/mod.rs:audit_build — shared temp directory for concurrent calls

let output = std::env::temp_dir().join("ado-aw").join("audit");

All audit_build invocations write artifacts under the same base directory. Two concurrent calls for different builds are safe (cache subdirectories are keyed by build ID). Two concurrent calls for the same build with no_cache: true will both write to the same path, with the second write potentially overwriting partially-written artifacts from the first. Including the build ID in the output path — or using a per-invocation tmp directory for the no_cache: true case — would close this race.


src/compile/mod.rs:build_pipeline_irskip_integrity = true for all targets without a caller note

The function hardcodes skip_integrity = true for every compile target. That's correct for a read-only IR view, but the parameter name at each call site carries no documentation (positional true, false in a 8-arg function call). A named parameter, a brief comment, or at minimum a // skip_integrity inline comment at each call would prevent a future caller from cargo-culting a copy with the wrong value. This is purely ergonomic.


✅ What Looks Good

  • PAT redaction in TraceFailureParams and AuditBuildParams: manual Debug impls that emit <redacted> are exactly right — framework traces and logs never see the token.
  • resolve_source_path security analysis: the inline doc explicitly calls out the threat model, the mitigations (.md extension, .. rejection, ~ rejection), and the residual risk. This is well-reasoned and the code matches the doc.
  • is_negated_call + word-boundary guard: the combination correctly handles not(failed()) as Skipped, not(not(failed())) as RunsAnyway, and avoids the false match of failed() inside succeededorfailed(). The coverage-limitations comment is honest about what the static classifier can't see.
  • Job(redacted) / classification()/matches_ir_id()`: centralising these helpers avoids the dual-definition drift that prompted the follow-up fix mentioned in the PR description.
  • Trust boundary: mcp-author and the SafeOutputs MCP server share zero write-capable code paths. Stage 1 agents get no IR-query tools. This is exactly the right separation.
  • Schema stability contract (SCHEMA_VERSION, additive-vs-breaking guidance, enum-variant caveat): clear and actionable for downstream tooling authors.

Generated by Rust PR Reviewer for issue #998 · 441.3 AIC · ⌖ 12.3 AIC · ⊞ 32.9K ·

jamesadevine added a commit that referenced this pull request Jun 14, 2026
…dback

Bugs:
- audit::pipeline_graph::timeline_name_matches_job and
  audit::model::JobData::matches_ir_id: tighten the dotted-name suffix
  fallback so multi-level names like `Stage1.SubStage.Agent` no longer
  spuriously match against the bare id `Agent`. The fallback now only
  accepts a single-level `Stage.Job` qualifier (rsplit_once + prefix
  must contain no dot). New regression tests pin the behaviour.
- audit::cli::fetch_audit_data_inner: extract the post-fetch enrichment
  (pipeline_graph + metrics + derive_findings) into a single helper so
  the fresh-download and cache-load paths both produce structurally
  identical AuditData. The cache-load path now persists the recomputed
  data back to run-summary.json when it changed, so subsequent runs see
  the canonical shape and tooling diffing successive outputs is no
  longer confused by drift between the saved file and in-memory result.

Suggestions:
- mcp_author::audit_build: route `no_cache: true` invocations through a
  per-invocation `tempfile::tempdir()` so two concurrent calls for the
  same build can no longer race on partially-written artifacts in the
  shared `${TEMP}/ado-aw/audit/build-<id>` directory. Cached calls
  continue to use the shared cache root so warm-cache reads still
  benefit from cross-invocation reuse.
- compile::build_pipeline_ir: add `/* skip_integrity */` and
  `/* debug_pipeline */` inline-name comments to every dispatched
  target-builder call (only Standalone had them) so future callers
  cannot cargo-cult the wrong positional bool.

Validated locally: cargo build, cargo test (1884 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — well-structured layered design with solid security defaults. A few items worth addressing before merge.

Findings

⚠️ Suggestions

  • src/inspect/whatif.rsclassify_condition false positives on string literals
    The condition classifier normalizes and text-searches the lowered ADO condition string. A condition like eq(variables['Agent.JobStatus'], 'Failed') (as already tested in the regression suite) does contain the token failed but failed() (with the ()) won't match, so that specific case is handled correctly. However, a condition string like eq(variables['result'], 'failed()') — i.e., the literal string "failed()" in a comparison — would produce a false RunsAnyway classification. Since the PR documents this as a "static classification" best-effort tool and the conditions are compiler-generated rather than raw user input, this is acceptable, but it's worth a short doc comment on classify_condition noting the limitation.

  • src/audit/cli.rs — cache write race on concurrent audit runs
    The new derive_post_processing path re-persists the updated AuditData back to the cache when post-processing changes it (to avoid drift between the in-memory result and the stored file). Two concurrent ado-aw audit invocations for the same build ID will race on save_run_summary. The failure path is already handled gracefully (downgraded to a warning), so this is low severity, but a quick advisory comment at the site would prevent future confusion.

  • src/audit/pipeline_graph.rs — absolute paths pass resolve_source_path with .md extension
    The security comment above the function is truncated in the diff ("...could carry "source": "/home/user/.ssh/id). The effective guard is the .md extension check: a crafted aw_info.json with "source": "/home/user/something.md" reaches build_pipeline_ir. That function is read-only and fails gracefully on invalid markdown, so the practical blast radius is an unexpected parse error, not code execution. Still, it's worth completing the comment to document the residual risk so future maintainers don't weaken the extension check without understanding it's the primary gate for absolute paths.

  • src/inspect/cli.rsdispatch_catalog returns anyhow::Result but is called with ? in a main that exits 0 on success
    This is fine as-is, but the lint command uses std::process::exit(1) to signal lint failures — a pattern that bypasses any Drop implementations. Since build_pipeline_ir uses tokio async I/O, any open resources are runtime-managed and won't leak, but it's worth a comment that exit(1) is intentional (it mirrors how tsc --noEmit / eslint signal failure to CI).

✅ What Looks Good

  • Path-traversal defense in resolve_source_path: Rejecting .. components, ~ prefix, and anything without .md extension from build-artifact aw_info.json is exactly right. The source path is untrusted data from a downloaded artifact.
  • Custom Debug impls for PAT fields: TraceFailureParams and AuditBuildParams both have hand-written Debug that emits <redacted> for the pat field — the right call since rmcp uses {:?} in error traces. (The Commands enum uses #[derive(Debug)] too, but that's a pre-existing pattern shared with Audit, Run, Enable, etc.)
  • is_negated_call correctness: Double-negation (not(not(failed()))) correctly yields RunsAnyway, and the word-boundary check prevents failed() from matching inside succeededorfailed(). The regression tests cover all four cases.
  • reachable_edges BFS: The seen set correctly breaks cycles. Correct.
  • unwrap()/expect() usage: All instances are inside #[cfg(test)] modules — none on production paths.
  • Trust boundary: mcp-author stdio server is intentionally separate from the SafeOutputs MCP used by sandboxed Stage 1 agents. The PR description calls this out explicitly, and the implementation honours it.
  • to_mcp_error full error chain: Exposing {error:#} to the MCP client is fine here — the IDE/Copilot Chat consumer is the same user who owns the local process. Would be a concern in a multi-tenant context but not here.

Generated by Rust PR Reviewer for issue #998 · 687.1 AIC · ⌖ 13.2 AIC · ⊞ 32.9K ·

jamesadevine added a commit that referenced this pull request Jun 14, 2026
…998 round 4

All four items in the latest review are documentation-only — the
runtime behaviour is already correct and the reviewer asked for
explicit comments so future maintainers do not weaken the guards or
re-introduce the fixed bugs by accident.

- inspect::whatif::classify_condition: doc-comment now explicitly
  flags the string-literal false-positive (e.g. a condition like
  eq(variables['result'], 'failed()') would match the literal
  substring) alongside the existing variable-condition limitation.
  This is acceptable because ADO conditions are compiler-generated,
  but the residual gap is now documented.

- audit::cli::fetch_audit_data_inner cache write path: add an inline
  comment recording the deliberate lack of filesystem locking on
  save_run_summary. Two concurrent ado-aw audit runs for the same
  build can race on this write; the failure is recorded as a warning
  rather than aborting the audit, and both writers derive from the
  same on-disk artifacts so the resulting summary stays internally
  consistent.

- audit::pipeline_graph::resolve_source_path: complete the previously
  truncated security comment so the residual risk is unambiguous —
  the .md extension check is the primary gate for absolute paths, and
  weakening or removing it without adding a containment check would
  silently re-open the arbitrary-file-read vector flagged earlier.

- main: comment on the std::process::exit(1) in the Lint dispatch arm
  explaining the intentional Drop-bypass (mirrors tsc --noEmit /
  eslint) and that the only resources in scope are runtime-managed
  async I/O.

No behavioural changes. Validated locally with cargo build, cargo
test --bin ado-aw (1884 unit tests pass), and cargo clippy
--all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid layered design — the IR summary shape is clean, trust boundary is well-maintained, PAT redaction is correct, and the path-validation work in pipeline_graph.rs is thorough. Two real issues worth addressing before merge.


Findings

🐛 Bugs / Logic Issues

src/inspect/cli.rs:build_trace — hardcodes Path::new("./logs")

// src/inspect/cli.rs ~line 205
let audit = crate::audit::fetch_audit_data(crate::audit::AuditOptions {
    output: Path::new("./logs"),   // ← still hardcoded
    ...
})

The audit_build MCP tool was correctly fixed to use $TMP/ado-aw/audit, but build_trace was not. The MCP trace_failure tool calls inspect::build_trace directly:

// mcp_author/mod.rs
let (_audit, report) = inspect::build_trace(&opts).await.map_err(to_mcp_error)?;

This means every trace_failure call from an IDE silently creates ./logs/ in the server's working directory — the same unwanted side-effect that was fixed in audit_build. TraceOptions also lacks an output field, so callers have no way to override it.

Simplest fix: default to the same std::env::temp_dir().join("ado-aw").join("audit") root inside build_trace, or add an output: Option<&'a Path> field to TraceOptions.


src/audit/findings.rs:add_downstream_impact_findings — misleading "skipped" title

title: format!("Downstream jobs skipped due to {} failure", job.name),

job.downstream_jobs contains all IR-derived graph successors, not just jobs that were actually skipped at runtime. When a downstream job has always() (e.g. a cleanup job), it runs and its classification() returns "succeeded" — but the finding title still says "skipped". A finding reading "Downstream jobs skipped due to Agent failure: CleanupJob: succeeded" is contradictory.

The description body is fine (it shows the real runtime classification). The title should be more neutral, e.g. "Downstream jobs impacted by {} failure", or the finding should be suppressed for downstream jobs whose result is not "skipped"/"cancelled".


⚠️ Suggestions

src/inspect/graph_query.rs:render_dot — visual direction vs. execution order

out.push_str("  rankdir=LR;\n");  // left-to-right
// ...
out.push_str(&format!("  \"{}\" -> \"{}\";\n", cons, prod)); // consumer → producer

The combination of rankdir=LR and consumer→producer edges means the DOT graph lays out SafeOutputs → Detection → Agent from left to right — execution order runs right-to-left. Most users piping to dot -Tsvg expect to see Agent first on the left. Consider using rankdir=RL (which would put Agent on the left), or flipping to producer→consumer edges with rankdir=LR. The inline comment documents the current convention, which is enough for awareness, but the rendered output surprises most viewers.


✅ What Looks Good

  • PAT redaction via manual Debug impls for TraceFailureParams and AuditBuildParams is correct and tested.
  • Path validation in resolve_source_path (audit/pipeline_graph.rs): the .md-extension gate, ParentDir rejection, and tilde check are all present, documented, and covered by targeted tests — including the malicious_source test that verifies the warning path.
  • is_negated_call fix: the updated contains_unnegated_call with word-boundary guard correctly handles not(failed()), not(not(failed())), and the failed() inside succeededorfailed() substring case. All regression tests pass.
  • build_pipeline_ir public API: codemods in-memory only, no disk writes — correct parity with check_pipeline.
  • **Job(redacted) centralization**: single-level Stage.Jobguard is correct and tested; the oldrsplit('.').next()` multi-level bug is explicitly covered.

Generated by Rust PR Reviewer for issue #998 · 501.1 AIC · ⌖ 12.8 AIC · ⊞ 32.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed layered IR tooling with good test coverage — two issues worth addressing before merge: an unvalidated path in the new MCP server and a very noisy lint rule.

Findings

🔒 Security Concerns

  • src/mcp_author/mod.rs:377-379source_path() accepts arbitrary filesystem paths from MCP clients without validation

    fn source_path(path: &str) -> PathBuf {
        PathBuf::from(path)  // no checks at all
    }

    Every inspect_*, graph_*, whatif, and lint_workflow tool passes the caller-supplied source_path string directly to build_pipeline_ir(), which opens and reads the file. Compare this to audit/pipeline_graph.rs, which carefully validates the same source field from untrusted aw_info.json (.md extension check, .. component check, ~ prefix check). The mcp-author server is intentionally local-only, but it runs as an MCP server available to any connected AI agent — a prompt-injected agent could instruct an IDE session to call inspect_workflow with source_path = "/home/user/notes.md" or source_path = "../../secrets.md".

    The fix is straightforward: apply the same .md-extension guard and ..-component rejection that resolve_source_path() already implements. The doc-comment in pipeline_graph.rs explicitly says "Do not weaken or remove the .md extension check" — the MCP server should honour the same invariant. An absolute-path allowlist or cwd-containment check would be even better for an author-local server.

🐛 Bugs / Logic Issues

  • src/inspect/lint.rs:231-246rule_no_condition_references fires for every normal job with depends_on

    The rule emits an Info for every job that has depends_on but no explicit condition:. In the canonical Setup → Agent → Detection → SafeOutputs pipeline, this produces three findings every time (Agent, Detection, SafeOutputs all depend on their predecessor with no explicit condition). This is the correct and expected ADO behaviour — succeeded() is intentionally the default. Running ado-aw lint on any real pipeline will always emit these findings, making them indistinguishable from genuine issues.

    The rule seems to be intended to flag jobs that should declare a condition (e.g. a SafeOutputs-style job that wants to run after failure). As written, it documents normal ADO semantics rather than finding problems. Consider restricting it to jobs that appear to be in a failure-handling pattern (e.g. reference always() in their stage description, or sit after a job that explicitly has always()/failed()), or simply remove it — it adds noise without signal.

⚠️ Suggestions

  • src/inspect/whatif.rs:357-364, src/inspect/lint.rs:315-322, src/audit/pipeline_graph.rs:139-146all_jobs() defined three times identically

    Each module carries its own copy:

    fn all_jobs(summary: &PipelineSummary) -> Vec<&JobSummary> {
        match &summary.body { ... }
    }

    These could drift independently. The PipelineSummary type lives in src/compile/ir/summary.rs; a pub fn all_jobs() method or free function there would keep a single source of truth.

  • src/main.rsado-aw audit --output default changed from ./logs to $TEMP/ado-aw/audit (breaking)

    Silently relocating the default output directory is a breaking change for any scripts that relied on ./logs being populated. The PR description notes this as intentional, but it's worth a CHANGELOG entry or a v0.x.0 bump note so operators aren't surprised.

✅ What Looks Good

  • The resolve_source_path security story in audit/pipeline_graph.rs is thorough — documented residual risk, tested for non-.md absolutes, .. traversal, and ~ prefix. The tests for the malicious-aw_info.json path are especially useful.
  • classify_condition / contains_unnegated_call in whatif.rs correctly handles not(failed()), not(not(failed())), and the failed() substring inside succeededOrFailed(). The regression tests for each of those cases are exactly what's needed.
  • PAT redaction in TraceFailureParams / AuditBuildParams Debug impls is a good safety habit.
  • The build_pipeline_ir public entry point propagates errors cleanly with anyhow::Context throughout; all unwrap()s in the diff are confined to #[cfg(test)] modules or are panic-safe constants (regex compilation, StepId::new on known literals).

Generated by Rust PR Reviewer for issue #998 · 371.2 AIC · ⌖ 12.6 AIC · ⊞ 32.9K ·

jamesadevine and others added 9 commits June 14, 2026 21:59
… whatif, lint, catalog, trace, mcp-author)

Adds an agent-facing read-only surface over the typed pipeline IR so
agents can reason about pipeline structure and failures programmatically.

- New public summary types in `src/compile/ir/summary.rs` (PipelineSummary,
  GraphSummary, etc.) with `schema_version` pinned at 1.
- New `compile::build_pipeline_ir(path) -> (FrontMatter, Pipeline)`
  read-only entry point that builds the typed IR without writing YAML.
- New `src/inspect/` module + CLI subcommands:
    * `inspect <source>`             — pipeline summary (text/json)
    * `graph dump|deps|outputs`      — resolved dependency graph
    * `whatif <source> --fail <id>`  — static reachability over conditions
    * `lint <source>`                — structural checks
    * `catalog --kind <k>`           — safe-outputs/runtimes/tools/engines/models
    * `trace <build-id-or-url>`      — joins ADO timeline with the IR graph
- New `src/mcp_author/` stdio MCP server exposing every command above as
  a read-only MCP tool for IDE / Copilot Chat integration.
- Audit integration: `AuditData.pipeline_graph`, `JobData.upstream_jobs` /
  `downstream_jobs`, and a new finding kind for downstream-skip impact.
- Docs: `docs/mcp-author.md`, public-JSON-summary section in `docs/ir.md`,
  new commands in `docs/cli.md`, audit fields in `docs/audit.md`, prompt
  updates for debug + update workflows.

Build: `cargo build`, `cargo test` (1871 + 4 passed), `cargo clippy` (0 warnings).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Windows-style hardcoded path failed on Linux CI. Resolve from

CARGO_MANIFEST_DIR so the test works on both platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
…dback

Security:

- mcp_author: manual Debug impls for TraceFailureParams and AuditBuildParams so the optional pat field is no longer leaked via {:?} / dbg!() / rmcp error traces.

- audit::pipeline_graph::resolve_source_path: require .md extension and reject parent-dir / tilde components to close the /home/user/.ssh/id_rsa exfiltration vector flagged by the reviewer. populate_pipeline_graph downgrades a rejection to a warning so legitimate audits still complete.

Maintainability:

- audit::model::JobData: add matches_ir_id helper. findings::add_downstream_impact_findings and inspect::trace::find_runtime_job now both delegate to it instead of carrying duplicate Stage.Job suffix-match heuristics.

- audit::findings::add_downstream_impact_findings: emit a Recommendation pointing operators at the failed-job logs (the _recommendations parameter is no longer unused).

- inspect::lint::rule_step_id_collisions: collision message now names BOTH the first-seen producer location and the colliding consumer (was only pointing at the second occurrence).

- inspect::whatif::classify_condition: doc-comment now records the coverage gap for variable-based conditions (e.g. Agent.JobStatus comparisons) which we deliberately classify as Skipped; new test pins that behaviour.

New tests:

- audit::pipeline_graph: 4 resolver tests covering the rejected attack patterns and the accepted legitimate absolute-path case.

- audit::pipeline_graph::populate_pipeline_graph_records_warning_on_malicious_source — end-to-end test that a malicious source string is recorded as a warning rather than read from disk.

- inspect::whatif::variable_based_condition_is_conservatively_skipped — pins the documented limitation.

Validated locally with cargo build, cargo test (1880 unit tests + integration suites pass), and cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…is-output lint + refresh inspect mod doc

Bugs:
- inspect::whatif::contains_unnegated_call: add a word-boundary guard
  so the canonical-marker search no longer matches `failed()` inside
  `succeededorfailed()`. Previously `not(succeededOrFailed())` was
  wrongly classified as RunsAnyway because the inner `failed()` match
  started at offset 11 and the four chars before it were `edor`, never
  `not(`. New is_word_boundary_before helper + regression test pinning
  the corrected behaviour.
- inspect::lint::rule_missing_is_output: drop the TODO-guarded
  auto_is_output check that made the rule unreachable in normal usage.
  The rule now consistently fires when a cross-step consumer lacks
  isOutput, catching future drift between summary patching and graph
  codegen. Doc-comment explains the invariant.

Docs:
- inspect::mod doc-comment: drop the stale "Future siblings (not yet
  landed)" section -- trace/whatif/lint/catalog/graph_deps/graph_outputs
  are all present in this PR. The refreshed layout list documents every
  module currently in the inspect tree.

Validated locally: cargo build, cargo test --bin ado-aw (1881 unit
tests pass), cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dback

Bugs:
- audit::pipeline_graph::timeline_name_matches_job and
  audit::model::JobData::matches_ir_id: tighten the dotted-name suffix
  fallback so multi-level names like `Stage1.SubStage.Agent` no longer
  spuriously match against the bare id `Agent`. The fallback now only
  accepts a single-level `Stage.Job` qualifier (rsplit_once + prefix
  must contain no dot). New regression tests pin the behaviour.
- audit::cli::fetch_audit_data_inner: extract the post-fetch enrichment
  (pipeline_graph + metrics + derive_findings) into a single helper so
  the fresh-download and cache-load paths both produce structurally
  identical AuditData. The cache-load path now persists the recomputed
  data back to run-summary.json when it changed, so subsequent runs see
  the canonical shape and tooling diffing successive outputs is no
  longer confused by drift between the saved file and in-memory result.

Suggestions:
- mcp_author::audit_build: route `no_cache: true` invocations through a
  per-invocation `tempfile::tempdir()` so two concurrent calls for the
  same build can no longer race on partially-written artifacts in the
  shared `${TEMP}/ado-aw/audit/build-<id>` directory. Cached calls
  continue to use the shared cache root so warm-cache reads still
  benefit from cross-invocation reuse.
- compile::build_pipeline_ir: add `/* skip_integrity */` and
  `/* debug_pipeline */` inline-name comments to every dispatched
  target-builder call (only Standalone had them) so future callers
  cannot cargo-cult the wrong positional bool.

Validated locally: cargo build, cargo test (1884 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…998 round 4

All four items in the latest review are documentation-only — the
runtime behaviour is already correct and the reviewer asked for
explicit comments so future maintainers do not weaken the guards or
re-introduce the fixed bugs by accident.

- inspect::whatif::classify_condition: doc-comment now explicitly
  flags the string-literal false-positive (e.g. a condition like
  eq(variables['result'], 'failed()') would match the literal
  substring) alongside the existing variable-condition limitation.
  This is acceptable because ADO conditions are compiler-generated,
  but the residual gap is now documented.

- audit::cli::fetch_audit_data_inner cache write path: add an inline
  comment recording the deliberate lack of filesystem locking on
  save_run_summary. Two concurrent ado-aw audit runs for the same
  build can race on this write; the failure is recorded as a warning
  rather than aborting the audit, and both writers derive from the
  same on-disk artifacts so the resulting summary stays internally
  consistent.

- audit::pipeline_graph::resolve_source_path: complete the previously
  truncated security comment so the residual risk is unambiguous —
  the .md extension check is the primary gate for absolute paths, and
  weakening or removing it without adding a containment check would
  silently re-open the arbitrary-file-read vector flagged earlier.

- main: comment on the std::process::exit(1) in the Lint dispatch arm
  explaining the intentional Drop-bypass (mirrors tsc --noEmit /
  eslint) and that the only resources in scope are runtime-managed
  async I/O.

No behavioural changes. Validated locally with cargo build, cargo
test --bin ado-aw (1884 unit tests pass), and cargo clippy
--all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crate::audit::default_cache_root

Previously the `./logs` path was hardcoded in three places and the
default cache directory disagreed between entry points:

- The CLI `audit` command defaulted `--output` to `./logs`.
- The CLI `trace` command and `inspect::build_trace` opened
  `Path::new("./logs")` directly.
- The mcp-author `audit_build` tool used
  `${TEMP}/ado-aw/audit` while the mcp-author `trace_failure` tool
  inherited the `./logs` default — silently scattering log
  directories under arbitrary IDE working directories.

Introduce `crate::audit::default_cache_root()` returning
`${TEMP}/ado-aw/audit` on every platform, re-exported from
`crate::audit`. Every entry point now resolves to that helper:

- CLI `audit`: `--output` is `Option<PathBuf>`, unset → helper.
- CLI `trace`: passes `output: None` so `inspect::build_trace`
  resolves to the helper.
- `inspect::build_trace`: default for `TraceOptions::output: None`
  is the helper.
- mcp-author `audit_build`: uses the helper for normal calls and
  layers a per-invocation `tempfile::tempdir()` on top when
  `no_cache: true`.
- mcp-author `trace_failure`: passes `output: None`.

Docs (`docs/audit.md`, `docs/cli.md`,
`site/src/content/docs/setup/cli.mdx`,
`site/src/content/docs/reference/audit.mdx`,
`prompts/debug-ado-agentic-workflow.md`) updated to reflect the
new default and to note that the cache is shared across every
audit entry point.

Validated locally: cargo build, cargo test (1884 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine force-pushed the feat/agent-ir-tools branch from f775a20 to 7660b06 Compare June 14, 2026 21:01
… noisy lint rule + restore ./logs CLI default

Addresses PR #998 round 6:

- Drop `inspect::lint::rule_no_condition_references`. The rule fires
  an Info finding for every job that has `depends_on` with no
  explicit `condition:` — i.e. every non-Setup job in the canonical
  Setup → Agent → Detection → SafeOutputs shape. ADO's default
  `succeeded()` semantics are intentional, not a bug, so the rule
  documented normal behaviour rather than finding problems and made
  genuine findings harder to spot. Removed entirely; the
  serialisation roundtrip test that referenced the rule's code
  string still passes because the literal value is independent of
  rule registration.

- Centralise `all_jobs()` as `PipelineSummary::all_jobs`. Three
  copies (in `audit::pipeline_graph`, `inspect::lint`, and
  `inspect::whatif`) collapsed into a single method on the public
  IR summary type. Call sites in those three modules plus
  `inspect::trace::stage_for_job` now go through the method. Future
  body-shape additions only need updating in one place.

- Revert the CLI `ado-aw audit --output` default from
  `${TEMP}/ado-aw/audit` back to `./logs` to preserve backward
  compatibility for scripts that consume the documented `./logs`
  location. Non-CLI callers — `inspect::build_trace` (used by the
  CLI `trace` command and the mcp-author `trace_failure` tool) and
  the mcp-author `audit_build` tool — continue to anchor under
  `crate::audit::default_cache_root()` so they do not scatter
  `./logs/` directories under arbitrary IDE working directories.
  Docs (`docs/audit.md`, `docs/cli.md`, both site mdx files, and the
  debug prompt) updated to reflect the per-entry-point split.

Validated locally: cargo build, cargo test (1887 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — well-layered design, strong test coverage, and commendable security hygiene throughout. Two issues worth addressing before merge.


Findings

🐛 Bugs / Logic Issues

  • src/inspect/whatif.rsis_negated_call panics on non-ASCII char boundaries

    while idx >= NOT_PREFIX_LEN
        && normalized_condition[idx - NOT_PREFIX_LEN..idx] == *NOT_PREFIX

    normalized_condition[idx - 4..idx] is a str byte-range slice. Rust panics if either bound isn't on a UTF-8 char boundary. call_idx is guaranteed to be on a char boundary (it comes from .find()), but idx - 4 is only safe when the four preceding bytes are all ASCII. If a condition string ever contains a multi-byte UTF-8 sequence immediately before not( — unlikely but possible if a job display name leaks in — this panics at runtime.

    Fix: compare byte slices instead:

    && normalized_condition.as_bytes().get(idx - NOT_PREFIX_LEN..idx)
        == Some(NOT_PREFIX.as_bytes())

    This is always bounds-safe, no char-boundary requirement.

  • src/audit/pipeline_graph.rs:resolve_source_path — symlink bypasses the .md extension gate

    For absolute paths, the only check is the file extension:

    if path.is_absolute() {
        return Ok(path);  // path ends in .md → accepted
    }

    A symlink at /tmp/foo.md → /etc/passwd passes this check and proceeds to build_pipeline_ir which will attempt to read it. The blast radius is just a parse error (as your security comment correctly notes), but the existing comment says "Do not weaken or remove the .md extension check without also adding a containment check" — the symlink vector goes unmentioned and is a silently open gap against any "add a safeguard" intent. Consider adding tokio::fs::symlink_metadata or noting symlinks explicitly in the security comment so future maintainers don't miss it.


🔒 Security Concerns

  • src/audit/cli.rs:default_cache_root — world-readable path on multi-user Linux

    pub fn default_cache_root() -> PathBuf {
        std::env::temp_dir().join("ado-aw").join("audit")
    }

    On Linux without pam_tmpdir, std::env::temp_dir() returns /tmp, which is world-readable. Build artifacts and pipeline summaries (job names, ADO org/project, finding descriptions) written under /tmp/ado-aw/audit are visible to any local user. The old ./logs default was CWD-relative and under caller control. The MCP-server motivation for a shared root is sound; consider at minimum creating the directory with mode 0700 on first use, or using dirs::cache_dir() which gives ~/.cache/ado-aw on XDG Linux (always user-private).


⚠️ Suggestions

  • src/inspect/lint.rs:consumed_outputs — considers only cross-step refs, not within-step condition refs

    consumed_outputs is built from graph.outputs_needing_is_output, which captures only outputs consumed by other steps (cross-step refs that need isOutput=true). If a step declares an output and that output is only referenced in the same step's own condition: (an edge case, but representable in the IR), the output won't appear in outputs_needing_is_output, so rule_unused_output will fire a false-positive warning. Low practical impact — just worth a // NOTE: comment in consumed_outputs so future contributors know the scope.

  • src/audit/pipeline_graph.rs:find_matching_job_summary — no stage disambiguation

    find_matching_job_summary finds the first JobSummary whose id matches; in a staged pipeline, two stages could theoretically have jobs with the same id (though the IR graph pass rejects this). A defensive assert or a stage-aware match would make the intent explicit and protect against future changes to uniqueness guarantees.


✅ What Looks Good

  • PAT redaction in Debug impls — Both TraceFailureParams and AuditBuildParams implement custom Debug that calls redacted_pat. This is exactly right; it means {:?} and any framework error trace never leaks credentials.
  • timeline_name_matches_job — correct single-level guard — Rejecting Stage1.SubStage.Agent via !prefix.contains('.') is the right call; the regression test for multi-level qualified names confirms the intent.
  • Security comment in resolve_source_path — The ## Residual risk block is exemplary documentation; the blast-radius analysis and the "do not remove without adding containment" warning are exactly what security reviewers need.
  • derive_post_processing refactor — centralising findings::derive_findings + pipeline_graph::populate_pipeline_graph so both fresh-download and cache-hit paths produce structurally identical AuditData eliminates a whole class of cache-vs-live drift bugs.
  • whatif word-boundary guard — the is_word_boundary_before check preventing failed() from matching inside succeededorfailed() is subtle and correct; the unit tests for negated predicates cover the tricky cases well.

Generated by Rust PR Reviewer for issue #998 · 342.7 AIC · ⌖ 12.8 AIC · ⊞ 34.5K ·

…in resolve_source_path

Two bugs from PR #998 round 7:

- inspect::whatif::is_negated_call: switch from `&str[range]`
  indexing to byte-slice comparison via
  `bytes.get(idx - 4..idx) == Some(b"not(")`. The previous form
  panicked if the four bytes immediately before the matched call
  straddled a UTF-8 char boundary — possible if a non-ASCII display
  name leaked into the condition string. New regression test
  prepends `é` (a two-byte UTF-8 sequence) before `failed()` and
  asserts the classifier returns RunsAnyway instead of panicking.

- audit::pipeline_graph::resolve_source_path: a symlink at
  `/tmp/foo.md` → `/etc/passwd` lexically satisfied the `.md`
  extension check and was passed through unguarded. We now
  canonicalize absolute paths after the extension check and reject
  them if the resolved target does not also end in `.md`, closing
  the symlink-bypass vector. Legitimate `current.md` → `v1.md` style
  symlinks remain accepted because the resolved target still ends
  in `.md`. A new `has_md_extension` helper factors out the shared
  extension test. Two new unix-only regression tests cover both
  the rejected-symlink and the accepted-symlink cases. The
  security doc-comment now explicitly calls out the symlink-target
  re-check so future maintainers do not weaken it.

Validated locally: cargo build, cargo test --bin ado-aw (1888
unit tests pass on Windows; +2 unix-only symlink tests compile
behind cfg(unix)), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: looks good overall — solid architecture, strong security posture on the resolve_source_path path; one genuine bug in the unused-output lint rule worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

src/inspect/lint.rs:consumed_outputs — false positives for same-job consumers

rule_unused_output determines whether an output is "consumed" by checking summary.graph.outputs_needing_is_output (the set of outputs that require isOutput=true because they're consumed across jobs). An output referenced by a peer step within the same job — via env_refs or condition_refs — won't appear in outputs_needing_is_output and will be incorrectly flagged as unused-output.

This is inconsistent with graph_deps::step_refs, which correctly walks both env_refs and condition_refs regardless of job boundary. The fix is to union in those same-job refs:

fn consumed_outputs(summary: &PipelineSummary) -> BTreeSet<(String, String)> {
    let mut consumed: BTreeSet<_> = summary
        .graph
        .outputs_needing_is_output
        .iter()
        .flat_map(|entry| entry.outputs.iter().map(|o| (entry.step.clone(), o.clone())))
        .collect();
    // Also count same-job consumers (env_refs + condition_refs);
    // these do not need isOutput=true but the output is still used.
    for (_, step) in all_steps(summary) {
        for r in step.env_refs.iter().chain(step.condition_refs.iter()) {
            consumed.insert((r.step.clone(), r.name.clone()));
        }
    }
    consumed
}

The existing test create_pull_request_fixture_has_no_unused_output_inspect_lint won't catch this because that fixture's outputs are all consumed cross-job; a regression test with a same-job consumer would pin the fix.


⚠️ Suggestions

src/mcp_author/mod.rs:parse_graph_dump_format"json" format silently unsupported

The CLI's GraphCmd::Dump accepts text, json, dot, but the MCP graph_dump tool only accepts text and dot — the error message explicitly says "expected 'text' or 'dot'". Either expose json in the MCP tool (trivial — just add it to the match) or document the intentional omission in the GraphDumpParams docstring and the MCP tool description.

src/audit/cli.rs:derive_post_processing on cache hits — pipeline graph from current source

When loading a cached run-summary.json, derive_post_processing re-runs populate_pipeline_graph against the current local source file. If the workflow has been updated since the build was cached, the pipeline_graph section will reflect today's code, not the build's actual code. For ado-aw audit this is the expected behaviour (always correlate against local IR), but it's worth a comment in derive_post_processing calling this out so a future maintainer doesn't "fix" it into using the cached graph.

src/compile/ir/summary.rs:PipelineSummary::all_jobsVec allocation on every call

all_jobs() returns Vec<&JobSummary> and is called on the hot paths for populate_job_edges, find_matching_job_summary, and the inspect traversals. For typical pipeline sizes this is fine, but returning impl Iterator<Item = &JobSummary> (or an inline Either-style enum) would avoid the per-call heap allocation without any API sacrifice since all callers only iterate.


✅ What Looks Good

  • whatif negation logicis_negated_call + is_word_boundary_before + contains_unnegated_call handle the not(failed()) / not(not(failed())) / not(succeededOrFailed()) / multibyte-char cases correctly and are thoroughly covered by targeted regression tests. The byte-slice approach in is_negated_call (avoiding &str[idx-4..idx] to prevent UTF-8 boundary panics) is the right fix.

  • resolve_source_path security model — the .md extension gate, .. component rejection, tilde prefix rejection, and post-canonicalize symlink-target re-check together close the key arbitrary-file-read vectors documented in the threat analysis. The residual-risk section is honest and actionable.

  • PAT redaction in Debug implsTraceFailureParams and AuditBuildParams both implement Debug manually to emit <redacted> for the pat field. This prevents accidental credential exposure in rmcp's internal error traces.

  • **Job(redacted) single-level qualifier** — correctly rejects Stage1.SubStage.Agent(two-dot paths) while acceptingPipeline.Agent`, with regression tests pinning both behaviours.

  • build_pipeline_ir passes skip_integrity: true — the read-only IR entry point correctly bypasses every write side-effect across all four compile targets.

  • Trust boundary preservedmcp-author is a new, separate server; Stage 1 sandboxed agents do not gain access to inspect/graph/whatif/audit tools.

Generated by Rust PR Reviewer for issue #998 · 380.9 AIC · ⌖ 12.2 AIC · ⊞ 32.9K ·

…positive + zero-alloc all_jobs + JSON graph_dump + cache-hit doc

Bugs:
- inspect::lint::consumed_outputs: union the set of outputs flagged
  by `graph.outputs_needing_is_output` with every step's `env_refs`
  and `condition_refs`. Same-job consumers do not appear in
  `outputs_needing_is_output` (ADO does not require isOutput=true for
  them), and the previous logic mis-classified those outputs as
  unused. Two new regression tests pin the fix for both env_ref and
  condition_ref same-job consumers; existing cross-job and fixture
  tests remain green.

Suggestions:
- mcp-author `graph_dump`: now accepts `"json"` alongside `"text"`
  and `"dot"`, matching the CLI's `graph` subcommand. GraphDumpParams
  docstring + tool description updated; the error message lists all
  three formats.
- audit::cli::derive_post_processing: doc-comment now explicitly
  records that cache-hit invocations correlate against the **current
  local source markdown** by design, so a future maintainer does not
  "fix" it into using a stale cached graph snapshot (the downstream
  findings rules depend on the fresh correlation).
- compile::ir::summary::PipelineSummary::all_jobs: switch the return
  type from `Vec<&JobSummary>` to `impl Iterator<Item = &JobSummary>`
  via a small `AllJobsIter` either-enum, removing the per-call heap
  allocation on the hot paths (`populate_job_edges`,
  `find_matching_job_summary`, the inspect traversals). All callers
  already chained `.into_iter()`; the redundant calls were dropped
  via `cargo clippy --fix`.

Validated locally: cargo build, cargo test (1890 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Clean layered design with solid security discipline in most places — after several rounds of fixes the major issues (PAT redaction, symlink bypass, byte-safe is_negated_call, same-job consumed_outputs false positive) are all resolved. One security concern remains open across every prior review round; one design issue with the new JSON dump format is worth a quick fix before merge.


Findings

🔒 Security Concerns

src/mcp_author/mod.rs:371-373source_path() accepts arbitrary paths without validation (still open)

fn source_path(path: &str) -> PathBuf {
    PathBuf::from(path)  // no checks at all
}

Every tool that takes a source_path parameter (inspect_workflow, graph_summary, graph_dump, step_dependencies, step_outputs, whatif, lint_workflow) feeds the caller-supplied string directly to build_pipeline_ir(), which opens and reads the file. This has been flagged in every previous round of review and is still unaddressed.

The mcp-author server runs as the invoking user and is explicitly intended for IDE/Copilot Chat use — an environment where the agent processes untrusted content (PR descriptions, issue comments, code). A prompt-injected request like inspect_workflow(source_path="../../.ssh/authorized_keys.md") would reach build_pipeline_ir, read the file, and return a parse error whose message includes the path.

The fix already exists in this PR: pipeline_graph.rs::resolve_source_path() implements exactly the right guards (.md extension check, .. component rejection, ~ prefix rejection, symlink re-check). Reuse it or extract it to compile::mod and call it from both sites. The doc-comment on resolve_source_path even warns "Do not weaken or remove the .md extension check... without also adding a containment check" — the MCP server should honour the same invariant.


🐛 Bugs / Logic Issues

src/mcp_author/mod.rs:190-199graph_dump with format="json" returns double-encoded JSON

// build_graph_dump(GraphFormat::Json) returns serde_json::to_string_pretty(&summary.graph)
// — a String containing JSON text.
let text_or_dot = inspect::build_graph_dump(&source, format).await?;
structured_result(GraphDumpResult { text_or_dot })   // serialised again → {"text_or_dot": "{...}"}

When format is "json", build_graph_dump returns a JSON-serialized GraphSummary string. mcp_author then re-serializes it into GraphDumpResult { text_or_dot: String }, so the MCP caller receives {"text_or_dot": "{\"step_locations\": [...]}"} — the inner JSON is a string literal that must be parsed twice.

graph_summary already returns a proper GraphSummary object for the same data. Either: (a) route json format through graph_summary instead, or (b) emit the pre-parsed value directly:

let format = parse_graph_dump_format(params.0.format.as_deref())?;
if format == GraphFormat::Json {
    let graph = inspect::build_graph_summary(&source).await.map_err(to_mcp_error)?;
    return structured_result(graph);
}
let text_or_dot = inspect::build_graph_dump(&source, format).await.map_err(to_mcp_error)?;
structured_result(GraphDumpResult { text_or_dot })

⚠️ Suggestions

  • src/inspect/whatif.rs:394,403b.chars().count() is called twice (once to allocate prev, once as the final index). A let b_len = b.chars().count(); at the top of levenshtein avoids the redundant O(n) scan.

  • src/audit/pipeline_graph.rs:101-102summary.all_jobs().into_iter()all_jobs() already returns impl Iterator; .into_iter() on an iterator is a no-op. Just .all_jobs().find(...).


✅ What Looks Good

  • resolve_source_path() — extension check + symlink canonicalize re-check + .. component rejection + ~ guard is thorough; the residual-risk doc-comment is accurate and actionable.
  • is_negated_call() — byte-slice comparison (bytes.get(idx-4..idx) == Some(b"not(")) is correctly crash-safe against non-ASCII boundaries.
  • **Job(redacted) — single-level dot restriction is well-documented and the regression test for Stage1.SubStage.Agent` pins the right boundary.
  • PAT redaction — manual Debug impls on TraceFailureParams and AuditBuildParams prevent PAT exposure in framework error traces; redacted_pat helper is clean.
  • AllJobsIter — zero-allocation iterator enum avoids the per-call Vec without reaching for Box(dyn Iterator).
  • consumed_outputs — correctly unions outputs_needing_is_output with same-job env_refs / condition_refs; the regression test consumed_outputs_do_not_emit_unused_output_inspect_lint pins the fix.
  • Test coverage — end-to-end CLI tests on the create-pull-request.md fixture plus targeted unit tests for every new rule and helper are well-chosen.

Generated by Rust PR Reviewer for issue #998 · 1.2K AIC · ⌖ 13 AIC · ⊞ 32.9K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants