fix: allow separate input/output budgets for T5 in context check by Chessing234 · Pull Request #3885 · lm-sys/FastChat

Chessing234 · 2026-06-10T07:08:04Z

Summary

FastChat-T5 supports up to 2K encoder tokens plus 2K decoder tokens, but the OpenAI API server treated context as a single shared budget (context_len - prompt_tokens).

Root cause

check_length always used the causal-LM formula, so a 1790-token prompt with max_tokens=512 was rejected as 2302 > 2048 even though T5 can encode 1790 tokens and still generate 512 more.

Fix

For T5 models, validate prompt length against context_len and cap completion tokens independently, matching encoder-decoder behavior in inference.py.

Test plan

Call the API with fastchat-t5-3b-v1.0, ~1790 prompt tokens, and max_tokens=512; request should succeed
Confirm causal models still reject prompts that leave no room for completion

Made with Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: allow separate input/output budgets for T5 in context check

26abe8a

Co-authored-by: Cursor <cursoragent@cursor.com>

Chessing234 mentioned this pull request Jun 10, 2026

FastChat-T5 4K context #1711

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: allow separate input/output budgets for T5 in context check#3885

fix: allow separate input/output budgets for T5 in context check#3885
Chessing234 wants to merge 1 commit into
lm-sys:mainfrom
Chessing234:fix/t5-encoder-decoder-context-check

Chessing234 commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Chessing234 commented Jun 10, 2026

Summary

Root cause

Fix

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant