fix(mcp-bridge): terminate backend process on client disconnect and bound concurrent sessions#13482
Conversation
223720d to
18cab31
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves the mcp-bridge plugin’s SSE session lifecycle by making backend teardown deterministic on client disconnect and by introducing a per-worker cap on concurrent sessions to prevent unbounded backend process spawning.
Changes:
- Add client-abort handling (
ngx.on_abort) + aserver:stop()flag so sessions exit promptly on disconnect. - Add a per-worker session ceiling via new
max_sessionsconfig (default 100) backed by a smallsession_limitmodule. - Enable
lua_check_client_abort on;globally in the Nginxhttpblock template.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
t/plugin/mcp-bridge.t |
Adds schema validation coverage for max_sessions and unit-tests the session limit bookkeeping. |
apisix/plugins/mcp/session_limit.lua |
New per-worker counter for tracking concurrent MCP SSE sessions. |
apisix/plugins/mcp/server.lua |
Makes the ping loop wake early when asked to stop; adds stop() to request exit. |
apisix/plugins/mcp/server_wrapper.lua |
Registers ngx.on_abort, enforces max_sessions, and centralizes teardown/release logic. |
apisix/plugins/mcp-bridge.lua |
Extends plugin schema with max_sessions (min 1, default 100). |
apisix/cli/ngx_tpl.lua |
Enables lua_check_client_abort to allow ngx.on_abort usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
18cab31 to
ab9f040
Compare
…ound concurrent sessions The SSE session loop only ended when a keepalive write failed, which can take two 30s ping cycles or never happen, so a backend process spawned for a disconnected client could stay alive until worker reload. There was also no ceiling on concurrent sessions, so a route could keep spawning backend processes for as many connections as were opened. - register an ngx.on_abort handler so a client disconnect stops the session promptly instead of waiting for the next keepalive write to fail - make the ping loop wake early once the session is asked to stop - always run teardown (process + broker) and free the session slot via a guarded path, regardless of how the loop ended - add a per-worker concurrent-session ceiling (max_sessions, default 100); excess SSE connections get 429 - enable lua_check_client_abort in the main http block so on_abort works Behaviour change: lua_check_client_abort is now on globally, so long-running Lua handler phases are terminated when the client disconnects (proxied requests were already aborted by nginx). New config field max_sessions.
ab9f040 to
8dafd1b
Compare
membphis
left a comment
There was a problem hiding this comment.
LGTM. The session teardown path is now deterministic, the per-worker session limit is bounded and released through a guarded cleanup path, and CI is passing. No merge blockers found.
bzp2010
left a comment
There was a problem hiding this comment.
In addition to the comments above, I also suggest that we begin deprecate this plugin, as it no longer serves a practical purpose. We have also been plagued by false-positive vulnerability reports related to it.
| local pcall = pcall | ||
| local core = require("apisix.core") | ||
| local mcp_server = require("apisix.plugins.mcp.server") | ||
| local session_limit = require("apisix.plugins.mcp.session_limit") |
There was a problem hiding this comment.
This applies globally, right? That means no matter how many mcp-bridge routes are configured, the total number of sessions cannot exceed the limit?
It doesn't seem to be at the routes level.
| local ok, err = ngx_on_abort(function() | ||
| server:stop() | ||
| end) |
There was a problem hiding this comment.
This is a bit risky, as ngx.on_abort can only be called once per request. If a plugin calls this hook, it effectively hijacks the call to that hook for that single plugin. If other plugins or the core need to use this hook, it will fail.
I suggest adding some comments to clarify this point.
There was a problem hiding this comment.
Good call, added a comment clarifying this. The single ngx.on_abort slot is fine to consume here because the SSE endpoint is a dedicated long-lived request whose lifecycle isn't shared with other plugins needing the hook, and if the registration ever fails we fall back to detecting the disconnect on the next keepalive write (already handled via the ok, err check). Pushed in d0c8262.
Description
The
mcp-bridgeplugin spawns a backend process per SSE connection and relies on the SSE session loop returning to tear that process down. Today the loop only ends when a keepalive write fails, which can take up to two 30s ping cycles or, if writes keep being absorbed by the connection, may not happen at all. As a result a backend process started for a client that has already gone away can stay alive until the worker reloads. Separately, there is no ceiling on how many sessions a worker will keep open, so a route can keep spawning backend processes for as many connections as are opened.This PR makes session teardown deterministic and bounds concurrency:
ngx.on_aborthandler so a client disconnect stops the session promptly instead of waiting for the next keepalive write to fail.max_sessionsconfig field (default100); connections beyond the ceiling get429.lua_check_client_abortin the mainhttpblock soon_abortis usable.The concurrent-session bookkeeping is factored into a small
apisix/plugins/mcp/session_limit.luamodule so it can be unit-tested.Behaviour changes
lua_check_client_abortis now enabled globally in the mainhttpblock. This means long-running Lua handler phases are terminated when the client disconnects. Proxied requests were already aborted by nginx in this case, so the practical effect is limited to Lua streaming handlers, which now stop work when the client goes away.max_sessions(integer, default100). Existing configs keep working unchanged; the default applies when the field is omitted.Which issue(s) this PR fixes:
Fixes #
Checklist