fix: allow separate input/output budgets for T5 in context check#3885

Open
Chessing234 wants to merge 1 commit into lm-sys:main from Chessing234:fix/t5-encoder-decoder-context-check

@Chessing234

Summary

Fixes #1711.

FastChat-T5 is an encoder-decoder model that supports up to 2K encoder (input) tokens plus 2K decoder (output) tokens, but the OpenAI API server treated the context as a single shared budget, leaving only context_len - prompt_tokens for the completion.

Root cause

check_length always applied the causal-LM formula, which requires prompt_tokens + max_tokens <= context_len. A 1790-token prompt with max_tokens=512 was therefore rejected (1790 + 512 = 2302 > 2048), even though T5 can encode the 1790 prompt tokens and still generate 512 more.
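
The arithmetic behind the rejection can be reproduced with a minimal sketch of the shared-budget rule. The function name `shared_budget_ok` is hypothetical and is not taken from the FastChat code; it only mirrors the formula described above.

```python
def shared_budget_ok(prompt_tokens: int, max_tokens: int, context_len: int) -> bool:
    """Causal-LM rule: prompt and completion share one context window."""
    return prompt_tokens + max_tokens <= context_len


# Numbers from the report: 1790 prompt tokens, max_tokens=512, context_len=2048.
requested = 1790 + 512
print(requested, shared_budget_ok(1790, 512, 2048))  # 2302 False
```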

Fix

For T5 models, validate the prompt length against context_len on its own and cap the completion tokens independently, matching the encoder-decoder behavior in inference.py.
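
As a rough illustration of the split budget, the sketch below checks the prompt against the encoder limit and caps the completion separately. It is not the code from this PR: the function name `t5_budget`, the error text, and the choice to cap the completion at context_len are all assumptions made for the example.

```python
from typing import Optional, Tuple


def t5_budget(
    prompt_tokens: int, max_tokens: int, context_len: int
) -> Tuple[Optional[int], Optional[str]]:
    """Hypothetical encoder-decoder check: returns (completion_tokens, error)."""
    # The encoder budget is checked on its own; the completion does not count against it.
    if prompt_tokens > context_len:
        return None, (
            f"prompt has {prompt_tokens} tokens, "
            f"but the encoder accepts at most {context_len}"
        )
    # Assumption: the decoder budget is also context_len, so cap max_tokens there.
    return min(max_tokens, context_len), None


# The request from the report now passes with its full completion budget.
print(t5_budget(1790, 512, 2048))  # (512, None)
```

Under these assumptions an over-long prompt is still rejected, and an oversized max_tokens is reduced to the decoder limit instead of failing the request.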

Test plan

  • Call the API with fastchat-t5-3b-v1.0, a prompt of ~1790 tokens, and max_tokens=512; the request should succeed
  • Confirm that causal models still reject prompts that leave no room for a completion

Made with Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

FastChat-T5 4K context

1 participant