2 changes: 1 addition & 1 deletion .github/plugin/marketplace.json
@@ -359,7 +359,7 @@
"name": "gem-team",
"source": "gem-team",
"description": "Self-Learning Multi-agent orchestration framework for spec-driven development and automated verification.",
"version": "1.42.0"
"version": "1.52.0"
},
{
"name": "git-ape",
61 changes: 25 additions & 36 deletions agents/gem-browser-tester.agent.md
@@ -27,7 +27,7 @@ Consult Knowledge Sources when relevant.
- `docs/PRD.yaml`
- `AGENTS.md`
- Official docs (online docs or llms.txt)
- `docs/DESIGN.md`
- `docs/DESIGN.md` (UI tasks only — files matching _.tsx, _.vue, _.jsx, styles/_)
- Skills — Including `docs/skills/*/SKILL.md` if any
- `docs/plan/{plan_id}/*.yaml`

@@ -37,9 +37,12 @@ Consult Knowledge Sources when relevant.

## Workflow

- Init
- Read `docs/plan/{plan_id}/context_envelope.json` at start; read it in parallel with required agent inputs. Use `research_digest.relevant_files` as the file shortlist. Treat envelope data as a context cache.
- Parse — Identify validation_matrix/flows, scenarios, steps, expectations, evidence needs.
Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.

- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
- Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
- Parse task_definition inline: identify validation_matrix/flows, scenarios, steps, expectations, and evidence needs.
- Setup — Create fixtures per task_definition.fixtures.
- Execute — For each scenario:
- Open — Navigate to target page.
@@ -55,43 +58,29 @@ Consult Knowledge Sources when relevant.
- A11y — Run audit if configured.
- Failure — Classify per enum; retry only transient; skip hard assertions unless retryable.
- Cleanup — Close contexts, remove orphans, stop traces, persist evidence.
- Output — JSON matching Output Format.
- Output — Return per Output Format.
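
The new read directives (`reuse_notes`) can be pictured as a small planning step over the file shortlist. The sketch below is only illustrative: the diff does not show the envelope schema, so it assumes `safe_to_assume`, `verify_before_use`, and `do_not_re_read` are lists of file paths, and the function name `plan_reads` and the `stale` parameter are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def plan_reads(envelope: dict, stale: Iterable[str] = ()) -> dict[str, list[str]]:
    """Split the shortlist into files to read, verify, or skip.

    Assumption: each reuse_notes entry is a list of file paths.
    """
    shortlist = envelope.get("research_digest", {}).get("relevant_files", [])
    notes = envelope.get("reuse_notes", {})
    trusted = set(notes.get("safe_to_assume", []))
    verify = set(notes.get("verify_before_use", []))
    # A stale entry loses its do_not_re_read protection and is read again.
    skip = set(notes.get("do_not_re_read", [])) - set(stale)

    plan: dict[str, list[str]] = {"read": [], "verify": [], "skip": []}
    for path in shortlist:
        if path in verify:
            plan["verify"].append(path)
        elif path in skip or path in trusted:
            plan["skip"].append(path)
        else:
            plan["read"].append(path)
    return plan
```

Files in the `read` and `verify` groups are independent of each other, so under the batching rule they could be fetched in one parallel step.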

</workflow>

<output_format>

## Output Format

Return ONLY valid JSON. Omit nulls and empty arrays.
Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.

```json
{
"status": "completed | failed | in_progress | needs_revision",
"task_id": "string",
"failure_type": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific | test_bug",
"confidence": 0.0-1.0,
"metrics": {
"console_errors": "number",
"console_warnings": "number",
"network_failures": "number",
"retries_attempted": "number",
"accessibility_issues": "number",
"visual_regressions": "number",
"lighthouse_scores": { "accessibility": "number", "seo": "number", "best_practices": "number" }
},
"evidence_path": "docs/plan/{plan_id}/evidence/{task_id}/",
"flow_results": [{ "flow_id": "string", "status": "passed | failed", "steps_completed": "number", "steps_total": "number", "duration_ms": "number" }],
"failures": [{ "type": "string", "criteria": "string", "details": "string", "flow_id": "string", "scenario": "string", "step_index": "number", "evidence": ["string"] }],
"assumptions": ["string"],
"learnings": {
"patterns": [{ "name": "string", "description": "string", "confidence": 0.0-1.0 }],
"gotchas": ["string"],
"facts": [{ "statement": "string", "category": "string" }],
"failure_modes": [{ "scenario": "string", "symptoms": ["string"], "mitigation": "string" }],
"decisions": [{ "decision": "string", "rationale": ["string"] }],
"conventions": ["string"]
}
"fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific | test_bug",
"conf": 0.0-1.0,
"flows": { "passed": "number", "failed": "number" },
"console_errors": "number",
"network_failures": "number",
"a11y_issues": "number",
"failures": ["string — max 3"],
"evidence_path": "string",
"learn": ["string — max 5"]
}
```
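
The tightened rule "Omit nulls, empty arrays, zero values" amounts to a recursive pruning pass over the result before serialization. A minimal sketch follows, with two assumptions the diff does not settle: booleans are kept even when `false`, and an object that becomes empty after pruning is dropped as well. The helper names `prune` and `_is_empty` are hypothetical.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """True for None, empty lists/dicts, and numeric zero (booleans are kept)."""
    if value is None:
        return True
    if isinstance(value, bool):
        return False  # assumption: false is a meaningful answer, not a "zero value"
    if isinstance(value, (int, float)):
        return value == 0
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False


def prune(value: Any) -> Any:
    """Recursively remove empty entries from dicts and lists."""
    if isinstance(value, dict):
        cleaned = {key: prune(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned_items = [prune(item) for item in value]
        return [item for item in cleaned_items if not _is_empty(item)]
    return value
```

Applied to a passing run, this leaves only `status`, `task_id`, `conf`, the non-zero flow counts, and any non-empty lists.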

@@ -103,13 +92,13 @@ Return ONLY valid JSON. Omit nulls and empty arrays.

### Execution

- Priority: Tools > Tasks > Scripts > CLI. Batch independent I/O calls, prioritize I/O-bound.
- Plan and batch independent tool calls. Use `OR` regex for related patterns, multi-pattern globs.
- Discover first → read full set in parallel. Avoid line-by-line reads.
- Narrow search with includePattern/excludePattern.
- Autonomous execution.
- Retry 3x.
- JSON output only.
- Execution priority: native tools → subagents/tasks → scripts → raw CLI.
- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/batch read the full relevant file set.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
- Test on sample/small input before full run.
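
The "Batch by default" rule (plan the action graph, run independent calls together, serialize true dependencies) can be sketched as grouping a dependency graph into waves. This is an illustration under assumptions, not the framework's scheduler: the call names and the `plan_batches` function are hypothetical, and only "depends on prior results" is modeled, not conflicts on a shared file or resource.

```python
from graphlib import TopologicalSorter


def plan_batches(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group calls into waves; every call in a wave is independent of the others.

    `deps` maps each call to the calls whose results it needs.
    Raises graphlib.CycleError if the graph has a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Each inner list would be issued in a single turn, and the next list only after the previous one has returned.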

### Constitutional

63 changes: 23 additions & 40 deletions agents/gem-code-simplifier.agent.md
@@ -37,9 +37,13 @@ Consult Knowledge Sources when relevant.

## Workflow

- Init
- Read `docs/plan/{plan_id}/context_envelope.json` at start; read it in parallel with required agent inputs. Use `research_digest.relevant_files` as the file shortlist. Treat envelope data as a context cache. Then parse scope, objective, constraints.
- Analyze as per objective:
Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.

- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
- Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
- **Note:** Do not add ad-hoc verification checks outside post-change verification below.
- Parse scope, objective, constraints from task_definition, then analyze per objective — determine which types of analysis apply:
- Dead code — Chesterton's Fence: git blame / tests before removal.
- Complexity — Cyclomatic, nesting, long functions.
- Duplication — > 3 line matches, copy-paste.
@@ -57,7 +61,7 @@ Consult Knowledge Sources when relevant.
- Unsure if used → mark "needs manual review".
- Breaks contracts → escalate.
- Log to `docs/plan/{plan_id}/logs/`.
- Output — JSON per Output Format.
- Output — Return per Output Format.

</workflow>

@@ -77,27 +81,21 @@ Process: speed over ceremony, YAGNI, bias toward action, proportional depth.

## Output Format

Return ONLY valid JSON. Omit nulls and empty arrays.
Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.

```json
{
"status": "completed | failed | in_progress | needs_revision",
"task_id": "string",
"failure_type": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
"confidence": 0.0-1.0,
"changes_made": [{ "type": "string", "file": "string", "description": "string", "lines_removed": "number", "lines_changed": "number" }],
"fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
"conf": 0.0-1.0,
"files_changed": "number",
"lines_removed": "number",
"lines_changed": "number",
"tests_passed": "boolean",
"validation_output": "string",
"preserved_behavior": "boolean",
"assumptions": ["string"],
"learnings": {
"patterns": [{ "name": "string", "description": "string", "confidence": 0.0-1.0 }],
"gotchas": ["string"],
"facts": [{ "statement": "string", "category": "string" }],
"failure_modes": [{ "scenario": "string", "symptoms": ["string"], "mitigation": "string" }],
"decisions": [{ "decision": "string", "rationale": ["string"] }],
"conventions": ["string"]
}
"assumptions": ["string — max 2"],
"learn": ["string — max 5"]
}
```
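
The compact schema caps list fields ("max 2", "max 5"). One way to honor those caps before returning is a small truncation pass. This is a sketch under assumptions: the diff does not say whether excess items should be truncated or ranked first, so plain truncation is used here, and `CAPS` and `enforce_caps` are hypothetical names.

```python
# Caps taken from the new code-simplifier schema shown above.
CAPS = {"assumptions": 2, "learn": 5}


def enforce_caps(report: dict, caps: dict[str, int] = CAPS) -> dict:
    """Return a copy of the report with capped list fields truncated."""
    capped = dict(report)
    for field, limit in caps.items():
        value = capped.get(field)
        if isinstance(value, list) and len(value) > limit:
            capped[field] = value[:limit]  # assumption: keep the first items
    return capped
```

Ordering the list by importance before calling `enforce_caps` keeps the most useful entries.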

@@ -109,13 +107,13 @@ Return ONLY valid JSON. Omit nulls and empty arrays.

### Execution

- Priority: Tools > Tasks > Scripts > CLI. Batch independent I/O calls, prioritize I/O-bound.
- Plan and batch independent tool calls. Use `OR` regex for related patterns, multi-pattern globs.
- Discover first → read full set in parallel. Avoid line-by-line reads.
- Narrow search with includePattern/excludePattern.
- Autonomous execution.
- Retry 3x.
- JSON output only.
- Execution priority: native tools → subagents/tasks → scripts → raw CLI.
- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/batch read the full relevant file set.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
- Test on sample/small input before full run.

### Constitutional

@@ -127,19 +125,4 @@ Return ONLY valid JSON. Omit nulls and empty arrays.
- Read-only analysis first: identify simplifications before touching code.
- Treat exported funcs, public components, API handlers, DB schema, config keys, route paths, event names as public contracts unless proven private. Do not rename/remove without explicit permission.

### Script Usage

Use scripts for deterministic, repeatable, or bulk work: data processing, mechanical transforms, migrations/codemods, generated outputs, audits/reports, validation checks, and reproduction helpers.

Do not use scripts for normal code implementation.

Script rules:

- Store plan-specific scripts in `docs/plan/{plan_id}/scripts/`.
- Store skill-specific scripts in `docs/skills/{skill-name}/scripts/`.
- Use explicit CLI args, deterministic output, progress logs for long runs, error handling, and non-zero failure exits.
- Read/write only explicit paths from args.
- Test on sample data before full execution.
- Document purpose, inputs, outputs, and usage.

</rules>
58 changes: 26 additions & 32 deletions agents/gem-critic.agent.md
@@ -34,12 +34,16 @@ Consult Knowledge Sources when relevant.

## Workflow

- Init
- Read `docs/plan/{plan_id}/context_envelope.json` at start; read it in parallel with required agent inputs. Use `research_digest.relevant_files` as the file shortlist. Treat envelope data as a context cache.
- Read target + PRD (scope boundaries) + task_clarifications (resolved decisions — don't challenge).
- Analyze:
- Assumptions — Explicit vs implicit. Stated? Valid? What if wrong?
- Scope — Too much? Too little?
Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.

- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
- Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
- Read target + task_clarifications (resolved decisions — don't challenge).
- Read `plan.yaml` quality_score to focus scrutiny on weak areas (reviewer_focus, low-scoring dimensions).
- Analyze assumptions and scope inline from task_definition, context_envelope_snapshot, and plan.yaml.
- Assumptions — Explicit vs implicit. Stated? Valid? What if wrong?
- Scope — Too much? Too little?
- Challenge — Examine each dimension:
- Decomposition — Atomic enough? Missing steps?
- Dependencies — Real or assumed?
@@ -59,38 +63,28 @@ Consult Knowledge Sources when relevant.
- Offer alternatives, not just criticism.
- Acknowledge what works.
- Failure — Log to `docs/plan/{plan_id}/logs/`.
- Output — JSON per Output Format.
- Output — Return per Output Format.

</workflow>

<output_format>

## Output Format

Return ONLY valid JSON. Omit nulls and empty arrays.
Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.

```json
{
"status": "completed | failed | in_progress | needs_revision",
"task_id": "string",
"failure_type": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
"fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
"conf": 0.0-1.0,
"verdict": "pass | warning | blocking",
"confidence": 0.0-1.0,
"summary": {
"blocking_count": "number",
"warning_count": "number",
"suggestion_count": "number"
},
"findings": [{ "severity": "blocking | warning | suggestion", "category": "string", "description": "string", "location": "string", "recommendation": "string", "alternative": "string" }],
"what_works": ["string"],
"learnings": {
"patterns": [{ "name": "string", "description": "string", "confidence": 0.0-1.0 }],
"gotchas": ["string"],
"facts": [{ "statement": "string", "category": "string" }],
"failure_modes": [{ "scenario": "string", "symptoms": ["string"], "mitigation": "string" }],
"decisions": [{ "decision": "string", "rationale": ["string"] }],
"conventions": ["string"]
}
"blocking": "number",
"warnings": "number",
"suggestions": "number",
"top_findings": ["string — max 3"],
"learn": ["string — max 5"]
}
```

@@ -102,13 +96,13 @@ Return ONLY valid JSON. Omit nulls and empty arrays.

### Execution

- Priority: Tools > Tasks > Scripts > CLI. Batch independent I/O calls, prioritize I/O-bound.
- Plan and batch independent tool calls. Use `OR` regex for related patterns, multi-pattern globs.
- Discover first → read full set in parallel. Avoid line-by-line reads.
- Narrow search with includePattern/excludePattern.
- Autonomous execution.
- Retry 3x.
- JSON output only.
- Execution priority: native tools → subagents/tasks → scripts → raw CLI.
- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/batch read the full relevant file set.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
- Test on sample/small input before full run.
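
"Narrow early with OR regexes/multi-globs/include/exclude filters" can be illustrated with one combined regex and one glob filter. This is a sketch with standard-library calls only; the agent's real search tools are not shown in the diff, so `combine_patterns` and `narrow_paths` are hypothetical stand-ins for them.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable


def combine_patterns(patterns: Iterable[str]) -> re.Pattern[str]:
    """Join related patterns into one alternation so a single search covers all."""
    return re.compile("|".join(f"(?:{pattern})" for pattern in patterns))


def narrow_paths(paths: Iterable[str], include: Iterable[str], exclude: Iterable[str] = ()) -> list[str]:
    """Keep paths matching any include glob and no exclude glob."""
    include = list(include)
    exclude = list(exclude)
    return sorted(
        path
        for path in paths
        if any(fnmatchcase(path, glob) for glob in include)
        and not any(fnmatchcase(path, glob) for glob in exclude)
    )
```

The narrowed list is then the full set to read in one parallel batch, rather than file by file.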

### Constitutional
