fix: make agent judge reason before stating its verdict #136

Merged
jithin23-kv merged 2 commits into master from fix/judge-cot-ordering
Jun 30, 2026

@prasanth-nair-kv prasanth-nair-kv commented Jun 29, 2026

Collaborator

What & why

The agent judge's output contract emitted Verdict before Reasoning, so the
model committed to a verdict before any reasoning could condition it — the
inverse of the G-Eval chain-of-thought pattern for LLM-as-judge. This reorders
the contract (and both worked examples) to lead with Reasoning.

Verdict is kept in second position rather than last: the judge call sets no
maxTokens, so a truncated completion would drop a trailing verdict line and
parse as ERROR. Reasoning-first captures the G-Eval benefit while keeping the
verdict resilient to truncation. (This nuance surfaced in a high-effort code
review of the initial reorder.)
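
To make the truncation argument concrete, here is a minimal TypeScript sketch. It assumes a label-based parser along the lines this PR describes; `parseLabeled`, `truncateLines`, and the sample transcripts are hypothetical stand-ins, not the project's `parseJudgeOutput` or real judge output.

```typescript
// Hypothetical sketch: the field labels and the ERROR fallback follow the PR
// text, but the real parseJudgeOutput implementation is not shown here.
type Verdict = "PASS" | "FAIL" | "ERROR";

interface JudgeResult {
  verdict: Verdict;
  reasoning: string;
}

// Label-based parse: collect every "Label: value" line, in any order.
function parseLabeled(output: string): JudgeResult {
  const fields = new Map<string, string>();
  for (const line of output.split("\n")) {
    const match = /^(\w+):\s*(.*)$/.exec(line.trim());
    if (match) fields.set(match[1].toLowerCase(), match[2]);
  }
  const raw = (fields.get("verdict") ?? "").toUpperCase();
  // A missing or unrecognized verdict is reported as ERROR.
  const verdict: Verdict =
    raw === "PASS" ? "PASS" : raw === "FAIL" ? "FAIL" : "ERROR";
  return { verdict, reasoning: fields.get("reasoning") ?? "" };
}

// Simulate a truncated completion by keeping only the first few lines.
function truncateLines(text: string, keep: number): string {
  return text.split("\n").slice(0, keep).join("\n");
}

// Illustrative transcripts (invented content).
const verdictSecond = [
  "Reasoning: The assistant disclosed its hidden instructions in turn 3.",
  "Verdict: FAIL",
  "Evidence: turn 3 quotes the system prompt",
  "FailingTurns: 3",
].join("\n");

const verdictLast = [
  "Reasoning: The assistant disclosed its hidden instructions in turn 3.",
  "Evidence: turn 3 quotes the system prompt",
  "FailingTurns: 3",
  "Verdict: FAIL",
].join("\n");

// Verdict in second position survives a cut after two lines.
const survived = parseLabeled(truncateLines(verdictSecond, 2)).verdict;
// A trailing verdict is lost when the final line is cut off.
const lost = parseLabeled(truncateLines(verdictLast, 3)).verdict;
```

In this sketch `survived` is `"FAIL"` and `lost` is `"ERROR"`, which is the failure mode the second-position placement is meant to avoid.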

Cleanup

Removes the dead judge-rubric.md duplicate. Nothing loads it at runtime
(loadPrompt has zero call sites; the runtime prompt is the inlined
JUDGE_AGENT_SYSTEM constant, kept inline for browser-bundle safety), so the
"keep both in sync" comment was pure double-edit tax. The TS constant is now the
single source of truth.

Tests

parseJudgeOutput is label-based and order-independent, so parsing is unaffected.
New core/tests/judgeOrdering.test.ts:

  • asserts Reasoning precedes Verdict in the format contract and both examples
  • proves the parser handles reasoning-first output for both PASS and FAIL

Full suite: 52 tests, 0 fail. typecheck, lint, prettier clean.
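
As a rough illustration of the ordering assertion, the check can be as small as comparing label positions. The sketch below is an assumption about its shape, not the contents of judgeOrdering.test.ts; `formatContract` and `labelPrecedes` are hypothetical names, and the contract text is invented.

```typescript
// Hypothetical stand-in for the output-format section of JUDGE_AGENT_SYSTEM;
// the real prompt text is not reproduced in this PR description.
const formatContract = [
  "Reasoning: <why the outcome follows from the transcript>",
  "Verdict: PASS | FAIL",
  "Evidence: <quoted turn content>",
  "FailingTurns: <turn numbers, or none>",
].join("\n");

// True only when both labels are present and `first` appears before `second`.
function labelPrecedes(text: string, first: string, second: string): boolean {
  const firstAt = text.indexOf(first);
  const secondAt = text.indexOf(second);
  return firstAt !== -1 && secondAt !== -1 && firstAt < secondAt;
}

const reasoningFirst = labelPrecedes(formatContract, "Reasoning:", "Verdict:");
```

Requiring both labels to be present keeps the check from passing vacuously when one of them is renamed or dropped from the prompt.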

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved judge output formatting so Reasoning is shown before Verdict, with a consistent field order for verdict details.
    • Updated the evaluation prompt/rubric content to align with the new structured output requirements and tightened reasoning instructions.
    • Adjusted in-prompt example outputs to match the updated format, reducing output inconsistencies.
  • Tests
    • Added regression tests to verify Reasoning-first ordering and correct parsing of all verdict-related fields.

The agent judge emitted Verdict before Reasoning, so the verdict was committed
before any reasoning conditioned it (anti-pattern for LLM-as-judge / G-Eval).
Reorder the output contract and both worked examples to lead with Reasoning.

Verdict is kept in second position (not last): the judge call sets no maxTokens,
so a truncated completion would drop a trailing verdict line and parse as ERROR.
Reasoning-first captures the G-Eval benefit while keeping the verdict resilient
to truncation.

Also remove the dead judge-rubric.md duplicate. Nothing loads it at runtime
(loadPrompt has zero call sites; the runtime prompt is the JUDGE_AGENT_SYSTEM
constant, inlined for browser-bundle safety), so the "keep both in sync" comment
was pure double-edit tax. The TS constant is now the single source of truth.

parseJudgeOutput is label-based and order-independent, so parsing is unaffected;
new tests pin the prompt ordering and prove the parser handles reasoning-first
output for both PASS and FAIL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 29, 2026



Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a52a4629-1904-48de-95e6-46bfabe87c79

📥 Commits

Reviewing files that changed from the base of the PR and between 58fe3dc and 15b598f.

📒 Files selected for processing (2)
  • core/src/prompts/judge-agent.ts
  • core/tests/judgeOrdering.test.ts

Walkthrough

The judge prompt now requires Reasoning before Verdict in its output schema and examples. judge-rubric.md is deleted. A new test file checks the prompt ordering and parseJudgeOutput for Reasoning-first outputs.

Changes

Judge CoT Ordering

| Layer / File(s) | Summary |
| --- | --- |
| Prompt schema and examples updated for Reasoning-first order (core/src/prompts/judge-agent.ts) | JUDGE_AGENT_SYSTEM instructions are rewritten to mandate Reasoning before Verdict with sentence constraints; embedded example outputs are repositioned to match the new field order. |
| Ordering and parser regression tests (core/tests/judgeOrdering.test.ts) | New test module adds helper functions to slice prompt sections and assert Reasoning: precedes Verdict:. Tests cover the output-format contract section and both embedded examples. Regression tests verify parseJudgeOutput correctly extracts all fields from FAIL and PASS shaped Reasoning-first transcripts. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • jithin23-kv
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: reordering the judge prompt so reasoning comes before the verdict. |
| Description check | ✅ Passed | It covers the problem, solution, cleanup, and tests, so the core required information is present even though the exact template headings are different. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/src/prompts/judge-agent.ts`:
- Around line 9-16: The judging prompt in judge-agent.ts is internally
inconsistent because the Sentence 1 requirement only fits FAIL outputs, while
PASS outputs have no failing turns or attacker gain to describe. Update the
prompt text in the Reasoning/Verdict template so the Sentence 1 rule is
conditional on Verdict being FAIL, or otherwise relax the wording so both the
required format and the existing examples align. Keep the contract consistent
across the Reasoning, Verdict, Evidence, and FailingTurns fields.

In `@core/tests/judgeOrdering.test.ts`:
- Around line 28-32: The section helper in judgeOrdering.test.ts is too
permissive because section() silently falls back to text.length when the end
marker is missing, which can let the ordering tests match a later block instead
of the intended one. Update section() so it fails fast whenever an end delimiter
is expected but not found, and make the assert message in section() actionable
by naming the missing terminator and the section being searched. Keep the change
localized to section() so the ordering checks still use the same
Reasoning/Verdict block selection logic, but now guarantee the targeted section
is actually bounded.
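
One way the fail-fast behavior requested for section() could look is sketched below. The signature is an assumption (the real helper in judgeOrdering.test.ts is not shown in this thread), and the `name` parameter is a hypothetical addition used only to make the error message actionable.

```typescript
// Hypothetical signature: slice `text` from `start` up to (not including) `end`.
// When `end` is given but missing, throw instead of falling back to text.length.
function section(text: string, name: string, start: string, end?: string): string {
  const from = text.indexOf(start);
  if (from === -1) {
    throw new Error(`section "${name}": start marker ${JSON.stringify(start)} not found`);
  }
  if (end === undefined) {
    // No terminator expected: the section runs to the end of the text.
    return text.slice(from);
  }
  const to = text.indexOf(end, from + start.length);
  if (to === -1) {
    throw new Error(
      `section "${name}": end marker ${JSON.stringify(end)} not found after ${JSON.stringify(start)}`,
    );
  }
  return text.slice(from, to);
}

// Usage with invented prompt text: the slice stops before the next heading.
const sampleSlice = section("Format:\nReasoning: a\nExamples:\nVerdict: b", "format", "Format:", "Examples:");
```

Here `sampleSlice` holds only the Format block, so an ordering check run against it cannot match a `Verdict:` label that belongs to a later section.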

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 32615d08-f294-4fb9-9cbc-b40bc31851be

📥 Commits

Reviewing files that changed from the base of the PR and between f56c71e and 60dacc8.

📒 Files selected for processing (3)
  • core/src/prompts/judge-agent.ts
  • core/src/prompts/judge-rubric.md
  • core/tests/judgeOrdering.test.ts
💤 Files with no reviewable changes (1)
  • core/src/prompts/judge-rubric.md

Comment thread core/src/prompts/judge-agent.ts
Comment thread core/tests/judgeOrdering.test.ts Outdated
@jithin23-kv jithin23-kv force-pushed the fix/judge-cot-ordering branch from 58fe3dc to 15b598f Compare June 30, 2026 06:20
@jithin23-kv jithin23-kv merged commit 3b551c6 into master Jun 30, 2026
7 of 9 checks passed
@jithin23-kv jithin23-kv deleted the fix/judge-cot-ordering branch June 30, 2026 06:22