github · mubaidr · May 29, 2026 · May 30, 2026 · Jun 1, 2026 · Jun 1, 2026
@@ -359,7 +359,7 @@
       "name": "gem-team",
       "source": "gem-team",
       "description": "Self-Learning Multi-agent orchestration framework for spec-driven development and automated verification.",
-      "version": "1.42.0"
+      "version": "1.52.0"
     },
     {
       "name": "git-ape",

@@ -27,7 +27,7 @@ Consult Knowledge Sources when relevant.
 - `docs/PRD.yaml`
 - `AGENTS.md`
 - Official docs (online docs or llms.txt)
-- `docs/DESIGN.md`
+- `docs/DESIGN.md` (UI tasks only — files matching _.tsx, _.vue, _.jsx, styles/_)
 - Skills — Including `docs/skills/*/SKILL.md` if any
 - `docs/plan/{plan_id}/*.yaml`
 
@@ -37,9 +37,12 @@ Consult Knowledge Sources when relevant.
 
 ## Workflow
 
-- Init
-  - Read `docs/plan/{plan_id}/context_envelope.json` at start; read it in parallel with required agent inputs. Use `research_digest.relevant_files` as the file shortlist. Treat envelope data as a context cache.
-- Parse — Identify validation_matrix/flows, scenarios, steps, expectations, evidence needs.
+Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+
+- Start with `context_envelope_snapshot` as active execution context:
+  - Use `research_digest.relevant_files` as the initial file shortlist.
+  - Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
+  - Parse task_definition inline: identify validation_matrix/flows, scenarios, steps, expectations, and evidence needs.
 - Setup — Create fixtures per task_definition.fixtures.
 - Execute — For each scenario:
   - Open — Navigate to target page.
@@ -55,43 +58,29 @@ Consult Knowledge Sources when relevant.
   - A11y — Run audit if configured.
 - Failure — Classify per enum; retry only transient; skip hard assertions unless retryable.
 - Cleanup — Close contexts, remove orphans, stop traces, persist evidence.
-- Output — JSON matching Output Format.
+- Output — Return per Output Format.
 
 </workflow>
 
 <output_format>
 
 ## Output Format
 
-Return ONLY valid JSON. Omit nulls and empty arrays.
+Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
 
 ```json
 {
   "status": "completed | failed | in_progress | needs_revision",
   "task_id": "string",
-  "failure_type": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific | test_bug",
-  "confidence": 0.0-1.0,
-  "metrics": {
-    "console_errors": "number",
-    "console_warnings": "number",
-    "network_failures": "number",
-    "retries_attempted": "number",
-    "accessibility_issues": "number",
-    "visual_regressions": "number",
-    "lighthouse_scores": { "accessibility": "number", "seo": "number", "best_practices": "number" }
-  },
-  "evidence_path": "docs/plan/{plan_id}/evidence/{task_id}/",
-  "flow_results": [{ "flow_id": "string", "status": "passed | failed", "steps_completed": "number", "steps_total": "number", "duration_ms": "number" }],
-  "failures": [{ "type": "string", "criteria": "string", "details": "string", "flow_id": "string", "scenario": "string", "step_index": "number", "evidence": ["string"] }],
-  "assumptions": ["string"],
-  "learnings": {
-    "patterns": [{ "name": "string", "description": "string", "confidence": 0.0-1.0 }],
-    "gotchas": ["string"],
-    "facts": [{ "statement": "string", "category": "string" }],
-    "failure_modes": [{ "scenario": "string", "symptoms": ["string"], "mitigation": "string" }],
-    "decisions": [{ "decision": "string", "rationale": ["string"] }],
-    "conventions": ["string"]
-  }
+  "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific | test_bug",
+  "conf": 0.0-1.0,
+  "flows": { "passed": "number", "failed": "number" },
+  "console_errors": "number",
+  "network_failures": "number",
+  "a11y_issues": "number",
+  "failures": ["string — max 3"],
+  "evidence_path": "string",
+  "learn": ["string — max 5"]
 }
 ```
 
@@ -103,13 +92,13 @@ Return ONLY valid JSON. Omit nulls and empty arrays.
 
 ### Execution
 
-- Priority: Tools > Tasks > Scripts > CLI. Batch independent I/O calls, prioritize I/O-bound.
-- Plan and batch independent tool calls. Use `OR` regex for related patterns, multi-pattern globs.
-- Discover first → read full set in parallel. Avoid line-by-line reads.
-- Narrow search with includePattern/excludePattern.
-- Autonomous execution.
-- Retry 3x.
-- JSON output only.
+- Execution priority: native tools → subagents/tasks → scripts → raw CLI.
+- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Execute autonomously; ask only for true blockers.
+- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
+  - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
+  - Test on sample/small input before full run.
 
 ### Constitutional
 

@@ -37,9 +37,13 @@ Consult Knowledge Sources when relevant.
 
 ## Workflow
 
-- Init
-  - Read `docs/plan/{plan_id}/context_envelope.json` at start; read it in parallel with required agent inputs. Use `research_digest.relevant_files` as the file shortlist. Treat envelope data as a context cache. Then parse scope, objective, constraints.
-- Analyze as per objective:
+Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+
+- Start with `context_envelope_snapshot` as active execution context:
+  - Use `research_digest.relevant_files` as the initial file shortlist.
+  - Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
+  - **Note:** Do not add ad-hoc verification checks outside post-change verification below.
+- Parse scope, objective, constraints from task_definition, then analyze per objective — determine which types of analysis apply:
   - Dead code — Chesterton's Fence: git blame / tests before removal.
   - Complexity — Cyclomatic, nesting, long functions.
   - Duplication — > 3 line matches, copy-paste.
@@ -57,7 +61,7 @@ Consult Knowledge Sources when relevant.
   - Unsure if used → mark "needs manual review".
   - Breaks contracts → escalate.
   - Log to `docs/plan/{plan_id}/logs/`.
-- Output — JSON per Output Format.
+- Output — Return per Output Format.
 
 </workflow>
 
@@ -77,27 +81,21 @@ Process: speed over ceremony, YAGNI, bias toward action, proportional depth.
 
 ## Output Format
 
-Return ONLY valid JSON. Omit nulls and empty arrays.
+Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
 
 ```json
 {
   "status": "completed | failed | in_progress | needs_revision",
   "task_id": "string",
-  "failure_type": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
-  "confidence": 0.0-1.0,
-  "changes_made": [{ "type": "string", "file": "string", "description": "string", "lines_removed": "number", "lines_changed": "number" }],
+  "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
+  "conf": 0.0-1.0,
+  "files_changed": "number",
+  "lines_removed": "number",
+  "lines_changed": "number",
   "tests_passed": "boolean",
-  "validation_output": "string",
   "preserved_behavior": "boolean",
-  "assumptions": ["string"],
-  "learnings": {
-    "patterns": [{ "name": "string", "description": "string", "confidence": 0.0-1.0 }],
-    "gotchas": ["string"],
-    "facts": [{ "statement": "string", "category": "string" }],
-    "failure_modes": [{ "scenario": "string", "symptoms": ["string"], "mitigation": "string" }],
-    "decisions": [{ "decision": "string", "rationale": ["string"] }],
-    "conventions": ["string"]
-  }
+  "assumptions": ["string — max 2"],
+  "learn": ["string — max 5"]
 }
 ```
 
@@ -109,13 +107,13 @@ Return ONLY valid JSON. Omit nulls and empty arrays.
 
 ### Execution
 
-- Priority: Tools > Tasks > Scripts > CLI. Batch independent I/O calls, prioritize I/O-bound.
-- Plan and batch independent tool calls. Use `OR` regex for related patterns, multi-pattern globs.
-- Discover first → read full set in parallel. Avoid line-by-line reads.
-- Narrow search with includePattern/excludePattern.
-- Autonomous execution.
-- Retry 3x.
-- JSON output only.
+- Execution priority: native tools → subagents/tasks → scripts → raw CLI.
+- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Execute autonomously; ask only for true blockers.
+- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
+  - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
+  - Test on sample/small input before full run.
 
 ### Constitutional
 
@@ -127,19 +125,4 @@ Return ONLY valid JSON. Omit nulls and empty arrays.
 - Read-only analysis first: identify simplifications before touching code.
 - Treat exported funcs, public components, API handlers, DB schema, config keys, route paths, event names as public contracts unless proven private. Do not rename/remove without explicit permission.
 
-### Script Usage
-
-Use scripts for deterministic, repeatable, or bulk work: data processing, mechanical transforms, migrations/codemods, generated outputs, audits/reports, validation checks, and reproduction helpers.
-
-Do not use scripts for normal code implementation.
-
-Script rules:
-
-- Store plan-specific scripts in `docs/plan/{plan_id}/scripts/`.
-- Store skill-specific scripts in `docs/skills/{skill-name}/scripts/`.
-- Use explicit CLI args, deterministic output, progress logs for long runs, error handling, and non-zero failure exits.
-- Read/write only explicit paths from args.
-- Test on sample data before full execution.
-- Document purpose, inputs, outputs, and usage.
-
 </rules>
@@ -34,12 +34,16 @@ Consult Knowledge Sources when relevant.
 
 ## Workflow
 
-- Init
-  - Read `docs/plan/{plan_id}/context_envelope.json` at start; read it in parallel with required agent inputs. Use `research_digest.relevant_files` as the file shortlist. Treat envelope data as a context cache.
-  - Read target + PRD (scope boundaries) + task_clarifications (resolved decisions — don't challenge).
-- Analyze:
-  - Assumptions — Explicit vs implicit. Stated? Valid? What if wrong?
-  - Scope — Too much? Too little?
+Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+
+- Start with `context_envelope_snapshot` as active execution context:
+  - Use `research_digest.relevant_files` as the initial file shortlist.
+  - Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
+  - Read target + task_clarifications (resolved decisions — don't challenge).
+  - Read `plan.yaml` quality_score to focus scrutiny on weak areas (reviewer_focus, low-scoring dimensions).
+  - Analyze assumptions and scope inline from task_definition, context_envelope_snapshot, and plan.yaml.
+    - Assumptions — Explicit vs implicit. Stated? Valid? What if wrong?
+    - Scope — Too much? Too little?
 - Challenge — Examine each dimension:
   - Decomposition — Atomic enough? Missing steps?
   - Dependencies — Real or assumed?
@@ -59,38 +63,28 @@ Consult Knowledge Sources when relevant.
   - Offer alternatives, not just criticism.
   - Acknowledge what works.
 - Failure — Log to `docs/plan/{plan_id}/logs/`.
-- Output — JSON per Output Format.
+- Output — Return per Output Format.
 
 </workflow>
 
 <output_format>
 
 ## Output Format
 
-Return ONLY valid JSON. Omit nulls and empty arrays.
+Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
 
 ```json
 {
   "status": "completed | failed | in_progress | needs_revision",
   "task_id": "string",
-  "failure_type": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
+  "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
+  "conf": 0.0-1.0,
   "verdict": "pass | warning | blocking",
-  "confidence": 0.0-1.0,
-  "summary": {
-    "blocking_count": "number",
-    "warning_count": "number",
-    "suggestion_count": "number"
-  },
-  "findings": [{ "severity": "blocking | warning | suggestion", "category": "string", "description": "string", "location": "string", "recommendation": "string", "alternative": "string" }],
-  "what_works": ["string"],
-  "learnings": {
-    "patterns": [{ "name": "string", "description": "string", "confidence": 0.0-1.0 }],
-    "gotchas": ["string"],
-    "facts": [{ "statement": "string", "category": "string" }],
-    "failure_modes": [{ "scenario": "string", "symptoms": ["string"], "mitigation": "string" }],
-    "decisions": [{ "decision": "string", "rationale": ["string"] }],
-    "conventions": ["string"]
-  }
+  "blocking": "number",
+  "warnings": "number",
+  "suggestions": "number",
+  "top_findings": ["string — max 3"],
+  "learn": ["string — max 5"]
 }
 ```
 
@@ -102,13 +96,13 @@ Return ONLY valid JSON. Omit nulls and empty arrays.
 
 ### Execution
 
-- Priority: Tools > Tasks > Scripts > CLI. Batch independent I/O calls, prioritize I/O-bound.
-- Plan and batch independent tool calls. Use `OR` regex for related patterns, multi-pattern globs.
-- Discover first → read full set in parallel. Avoid line-by-line reads.
-- Narrow search with includePattern/excludePattern.
-- Autonomous execution.
-- Retry 3x.
-- JSON output only.
+- Execution priority: native tools → subagents/tasks → scripts → raw CLI.
+- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Execute autonomously; ask only for true blockers.
+- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
+  - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
+  - Test on sample/small input before full run.
 
 ### Constitutional