fix: make agent judge reason before stating its verdict #136
The agent judge emitted Verdict before Reasoning, so the verdict was committed before any reasoning conditioned it (anti-pattern for LLM-as-judge / G-Eval). Reorder the output contract and both worked examples to lead with Reasoning.

Verdict is kept in second position (not last): the judge call sets no maxTokens, so a truncated completion would drop a trailing verdict line and parse as ERROR. Reasoning-first captures the G-Eval benefit while keeping the verdict resilient to truncation.

Also remove the dead judge-rubric.md duplicate. Nothing loads it at runtime (loadPrompt has zero call sites; the runtime prompt is the JUDGE_AGENT_SYSTEM constant, inlined for browser-bundle safety), so the "keep both in sync" comment was pure double-edit tax. The TS constant is now the single source of truth.

parseJudgeOutput is label-based and order-independent, so parsing is unaffected; new tests pin the prompt ordering and prove the parser handles reasoning-first output for both PASS and FAIL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Walkthrough
The judge prompt now requires
Changes
Judge CoT Ordering
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/src/prompts/judge-agent.ts`:
- Around line 9-16: The judging prompt in judge-agent.ts is internally
inconsistent because the Sentence 1 requirement only fits FAIL outputs, while
PASS outputs have no failing turns or attacker gain to describe. Update the
prompt text in the Reasoning/Verdict template so the Sentence 1 rule is
conditional on Verdict being FAIL, or otherwise relax the wording so both the
required format and the existing examples align. Keep the contract consistent
across the Reasoning, Verdict, Evidence, and FailingTurns fields.
In `@core/tests/judgeOrdering.test.ts`:
- Around line 28-32: The section helper in judgeOrdering.test.ts is too
permissive because section() silently falls back to text.length when the end
marker is missing, which can let the ordering tests match a later block instead
of the intended one. Update section() so it fails fast whenever an end delimiter
is expected but not found, and make the assert message in section() actionable
by naming the missing terminator and the section being searched. Keep the change
localized to section() so the ordering checks still use the same
Reasoning/Verdict block selection logic, but now guarantee the targeted section
is actually bounded.
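The fail-fast behavior requested for `section()` could look roughly like the sketch below. The signature, the `name` parameter, the marker strings, and the sample prompt are all assumptions made for illustration; the real helper in `core/tests/judgeOrdering.test.ts` is not shown in this review and may be shaped differently.

```typescript
// Sketch only: the signature and markers are assumptions, not the helper
// that actually lives in core/tests/judgeOrdering.test.ts.
function section(text: string, name: string, start: string, end?: string): string {
  const from = text.indexOf(start);
  if (from === -1) {
    throw new Error(`section "${name}": start marker "${start}" not found`);
  }
  // No end delimiter requested: the section legitimately runs to the end.
  if (end === undefined) {
    return text.slice(from);
  }
  const to = text.indexOf(end, from + start.length);
  if (to === -1) {
    // Fail fast instead of silently falling back to text.length.
    throw new Error(`section "${name}": terminator "${end}" not found after "${start}"`);
  }
  return text.slice(from, to);
}

// Hypothetical prompt excerpt, used only to exercise the helper.
const samplePrompt = [
  "## Output format",
  "Reasoning: <why>",
  "Verdict: PASS | FAIL",
  "## Example 1",
  "Reasoning: ...",
  "Verdict: FAIL",
].join("\n");

// Bounded lookup: only the format block is returned, never the later example.
const formatBlock = section(samplePrompt, "format", "## Output format", "## Example 1");
```

With a bounded section, an ordering assertion can no longer pass by accidentally matching a `Verdict` line from a later block, and a missing terminator surfaces as an error that names both the section and the delimiter.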
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 32615d08-f294-4fb9-9cbc-b40bc31851be
📒 Files selected for processing (3)
- core/src/prompts/judge-agent.ts
- core/src/prompts/judge-rubric.md
- core/tests/judgeOrdering.test.ts
💤 Files with no reviewable changes (1)
- core/src/prompts/judge-rubric.md
58fe3dc to 15b598f
What & why
The agent judge's output contract emitted `Verdict` before `Reasoning`, so the model committed to a verdict before any reasoning could condition it, which is the inverse of the G-Eval chain-of-thought pattern for LLM-as-judge. This reorders the contract (and both worked examples) to lead with `Reasoning`.

`Verdict` is kept in second position rather than last: the judge call sets no `maxTokens`, so a truncated completion would drop a trailing verdict line and parse as `ERROR`. Reasoning-first captures the G-Eval benefit while keeping the verdict resilient to truncation. (This nuance surfaced in a high-effort code review of the initial reorder.)
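To make the truncation argument concrete, here is a minimal sketch of a label-based parser and two candidate layouts. `parseSketch`, `truncateLastLine`, and the sample field values are hypothetical stand-ins; this is not the project's `parseJudgeOutput`, only an illustration of the behavior described above under the assumption that a missing or unrecognized `Verdict` line maps to `ERROR`.

```typescript
// Sketch only: a hypothetical stand-in for a label-based judge-output parser.
type Verdict = "PASS" | "FAIL" | "ERROR";

interface ParsedJudgeOutput {
  verdict: Verdict;
  reasoning: string;
}

function parseSketch(raw: string): ParsedJudgeOutput {
  // Find a "Label: value" line wherever it appears, so field order is irrelevant.
  const field = (label: string): string | undefined => {
    const match = raw.match(new RegExp(`^${label}:[ \\t]*(.*)$`, "m"));
    return match?.[1]?.trim();
  };
  const verdictText = field("Verdict");
  // Assumption: anything other than an exact PASS or FAIL is treated as ERROR.
  const verdict: Verdict =
    verdictText === "PASS" || verdictText === "FAIL" ? verdictText : "ERROR";
  return { verdict, reasoning: field("Reasoning") ?? "" };
}

// Hypothetical completions; the field values are invented for illustration.
const verdictSecond = [
  "Reasoning: The assistant refused in every turn.",
  "Verdict: PASS",
  "Evidence: (none)",
  "FailingTurns: (none)",
].join("\n");

const verdictLast = [
  "Reasoning: The assistant refused in every turn.",
  "Evidence: (none)",
  "FailingTurns: (none)",
  "Verdict: PASS",
].join("\n");

// Simulate a completion that is cut off before its final line.
const truncateLastLine = (text: string): string =>
  text.slice(0, text.lastIndexOf("\n"));

const survives = parseSketch(truncateLastLine(verdictSecond)).verdict; // "PASS"
const lost = parseSketch(truncateLastLine(verdictLast)).verdict; // "ERROR"
```

In this sketch the same truncation leaves the second-position verdict intact but turns the trailing-verdict layout into `ERROR`, which is the trade-off behind placing `Verdict` directly after `Reasoning`.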
Cleanup
Removes the dead `judge-rubric.md` duplicate. Nothing loads it at runtime (`loadPrompt` has zero call sites; the runtime prompt is the inlined `JUDGE_AGENT_SYSTEM` constant, kept inline for browser-bundle safety), so the "keep both in sync" comment was pure double-edit tax. The TS constant is now the single source of truth.
Tests
`parseJudgeOutput` is label-based and order-independent, so parsing is unaffected.
New `core/tests/judgeOrdering.test.ts`:
- `Reasoning` precedes `Verdict` in the format contract and both examples
Full suite: 52 tests, 0 fail. typecheck, lint, prettier clean.
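As a rough illustration of what pinning the ordering means, a check along these lines would do it. The helper name and sample strings are hypothetical; the actual assertions in `core/tests/judgeOrdering.test.ts` are not reproduced here.

```typescript
// Sketch only: helper name and sample text are hypothetical.
function reasoningPrecedesVerdict(block: string): boolean {
  const reasoningAt = block.indexOf("Reasoning:");
  const verdictAt = block.indexOf("Verdict:");
  // Both labels must be present, and Reasoning must come first.
  return reasoningAt !== -1 && verdictAt !== -1 && reasoningAt < verdictAt;
}

const reasoningFirst = "Reasoning: ...\nVerdict: PASS";
const verdictFirst = "Verdict: PASS\nReasoning: ...";

const contractOk = reasoningPrecedesVerdict(reasoningFirst); // true
const regressionCaught = !reasoningPrecedesVerdict(verdictFirst); // true
```

Requiring both labels to be present keeps the check from passing vacuously when one of them is missing from the block under test.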
🤖 Generated with Claude Code
Summary by CodeRabbit