
[Serve] Add RAY_SERVE_HAPROXY_TCP_NODELAY env var #61468

Open
eicherseiji wants to merge 5 commits into ray-project:master from eicherseiji:haproxy-disable-nagle


eicherseiji commented Mar 3, 2026

Summary

Adds the RAY_SERVE_HAPROXY_TCP_NODELAY environment variable (default 0). When it is set to 1, option http-no-delay is added to the defaults section of the generated HAProxy config. This sets TCP_NODELAY on both client and server connections, disabling Nagle's algorithm.

Replaces #61455, which was accidentally corrupted by a bad force push.

Why this matters

HAProxy's default behavior leaves Nagle's algorithm enabled, which buffers small writes before flushing them to the wire. For LLM serving workloads, where responses are streamed token by token as small SSE frames, this introduces artificial latency: the kernel holds back each token while waiting for more data to coalesce, directly inflating time to first token (TTFT) and tail inter-token latency (ITL).
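At the socket level, TCP_NODELAY is a per-socket option. The Python sketch below only illustrates that option (HAProxy manages its own sockets, so this is not how the directive is applied inside HAProxy): it creates a TCP socket and turns off Nagle's algorithm on it. The helper name make_nodelay_socket is hypothetical.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 asks the kernel to send small writes immediately instead
    # of holding them back while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    try:
        # getsockopt returns a nonzero int when the option is enabled.
        enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
        print("TCP_NODELAY enabled:", enabled)
    finally:
        s.close()
```

With the option set, each small SSE frame a server writes is eligible to go out as its own segment rather than waiting to be coalesced.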

Benchmark results

Setup: Qwen2.5-0.5B-Instruct, 256 requests, max concurrency 32, prefix-heavy prompt (640 input tokens/req)

Metric               Nagle ON (default)   Nagle OFF (TCP_NODELAY)   Delta (OFF vs ON)
Mean TTFT            356.8 ms             120.9 ms                  -66%
Median TTFT          274.1 ms             69.0 ms                   -75%
P99 TTFT             939.0 ms             715.6 ms                  -24%
P99 ITL              255.9 ms             53.4 ms                   -79%
Output throughput    2,481 tok/s          2,521 tok/s               +1.6%
Request throughput   19.4 req/s           19.8 req/s                +2.1%

With Nagle enabled, mean TTFT is ~3x higher and P99 ITL is ~5x higher than with TCP_NODELAY, with no meaningful throughput benefit. The effect is expected to be more pronounced at lower concurrency and in interactive use cases.
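The deltas and the ~3x / ~5x ratios follow directly from the benchmark table. This short Python check is arithmetic only and introduces no new data; the values are copied from the table as (Nagle ON, Nagle OFF) pairs.

```python
# (Nagle ON, Nagle OFF) values copied from the benchmark table.
RESULTS = {
    "Mean TTFT (ms)": (356.8, 120.9),
    "Median TTFT (ms)": (274.1, 69.0),
    "P99 TTFT (ms)": (939.0, 715.6),
    "P99 ITL (ms)": (255.9, 53.4),
    "Output throughput (tok/s)": (2481.0, 2521.0),
    "Request throughput (req/s)": (19.4, 19.8),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` (Nagle ON) to `after` (Nagle OFF), in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for metric, (nagle_on, nagle_off) in RESULTS.items():
        print(f"{metric:<28} {pct_change(nagle_on, nagle_off):+6.1f}%  "
              f"ON/OFF ratio {nagle_on / nagle_off:.2f}x")
```

For the latency rows, the ON/OFF ratio is the "x times higher" figure (about 2.95 for mean TTFT and 4.79 for P99 ITL); for the throughput rows, a ratio near 1.0 means no meaningful difference.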

Code change

The implementation is a one-liner in the Jinja template, gated behind an env var:

# python/ray/serve/_private/haproxy_templates.py
    {%- if config.tcp_nodelay %}
    # Set TCP_NODELAY on all connections
    option http-no-delay
    {%- endif %}

# python/ray/serve/_private/constants.py
import os

# True only when the env var is exactly "1"; the option is off by default.
RAY_SERVE_HAPROXY_TCP_NODELAY = (
    os.environ.get("RAY_SERVE_HAPROXY_TCP_NODELAY", "0") == "1"
)
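To make the gating concrete without depending on Jinja, here is a stdlib-only Python sketch that mirrors the template's conditional. HAProxyConfigSketch, tcp_nodelay_from_env, and render_defaults_section are hypothetical names, and the "mode http" line is a placeholder for whatever else the real defaults section contains.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class HAProxyConfigSketch:
    """Hypothetical stand-in for the `config` object the Jinja template reads."""
    tcp_nodelay: bool = False


def tcp_nodelay_from_env(environ: Optional[Mapping[str, str]] = None) -> bool:
    # Same rule as the constant above: only the exact string "1" enables it.
    env = os.environ if environ is None else environ
    return env.get("RAY_SERVE_HAPROXY_TCP_NODELAY", "0") == "1"


def render_defaults_section(config: HAProxyConfigSketch) -> str:
    # "mode http" is a placeholder for the rest of the real defaults section.
    lines = ["defaults", "    mode http"]
    if config.tcp_nodelay:
        # Equivalent of the {%- if config.tcp_nodelay %} block.
        lines.append("    # Set TCP_NODELAY on all connections")
        lines.append("    option http-no-delay")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cfg = HAProxyConfigSketch(
        tcp_nodelay=tcp_nodelay_from_env({"RAY_SERVE_HAPROXY_TCP_NODELAY": "1"})
    )
    print(render_defaults_section(cfg), end="")
```

Keeping the check in one place (the constant) and passing a plain boolean into the template keeps the template free of environment lookups.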

Test plan

  • Verify generated haproxy.cfg contains option http-no-delay when env var is set
  • Existing HAProxy unit tests pass (test_controller_haproxy.py)
  • New test: test_start_with_tcp_nodelay, which verifies that HAProxy starts successfully with tcp_nodelay=True and that the generated config contains the directive
  • Benchmarked with/without on Qwen2.5-0.5B-Instruct (results above)
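A check for the first test-plan item could confirm that the directive lands in the defaults section specifically, not merely somewhere in the file. The helper below, directives_by_section, is hypothetical and is not part of Ray's test suite; it uses a simplified parser that assumes section keywords start in column 0, directives are indented, and everything after '#' is a comment.

```python
from __future__ import annotations


def directives_by_section(cfg_text: str) -> dict[str, list[str]]:
    """Group the directives of an haproxy.cfg-style text by section header.

    Simplified parsing: lines starting in column 0 are section headers,
    indented lines are directives, and anything after '#' is a comment.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0]  # drop trailing comments
        if not line.strip():
            continue  # skip blank and comment-only lines
        if not raw[0].isspace():
            current = line.strip()  # e.g. "defaults" or "frontend fe"
            sections.setdefault(current, [])
        elif current is not None:
            # Normalize internal whitespace so "option  http-no-delay" matches.
            sections[current].append(" ".join(line.split()))
    return sections


SAMPLE_CFG = """\
global
    maxconn 1000

defaults
    mode http
    # Set TCP_NODELAY on all connections
    option http-no-delay

frontend fe
    bind :8000
"""

if __name__ == "__main__":
    sections = directives_by_section(SAMPLE_CFG)
    assert "option http-no-delay" in sections["defaults"]
    print(sections)
```

SAMPLE_CFG is an illustrative config, not one generated by Ray Serve.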

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
eicherseiji requested a review from a team as a code owner March 3, 2026 22:59
'http-request set-nodelay' / 'http-response set-nodelay' are not valid
actions in HAProxy 2.8.x. Replace them with 'option http-no-delay', which
has been available since HAProxy 1.5 and forces TCP_NODELAY on every
outgoing segment for both client and server connections.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
gemini-code-assist bot left a comment

Code Review

This pull request introduces the RAY_SERVE_HAPROXY_DISABLE_NAGLE environment variable to disable Nagle's algorithm in HAProxy, which can improve latency for certain workloads. The implementation correctly adds the necessary HAProxy directives. My feedback includes a couple of suggestions to enhance code consistency and simplify the HAProxy configuration.

Comment on lines +674 to +676
RAY_SERVE_HAPROXY_DISABLE_NAGLE = (
    os.environ.get("RAY_SERVE_HAPROXY_DISABLE_NAGLE", "0") == "1"
)

medium

For consistency with other boolean environment variables in this file (e.g., RAY_SERVE_LOG_TO_STDERR), it would be better to use the get_env_bool utility function. This also makes the check more robust by handling values like "true" in addition to "1".

RAY_SERVE_HAPROXY_DISABLE_NAGLE = get_env_bool("RAY_SERVE_HAPROXY_DISABLE_NAGLE", "0")
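The robustness point can be sketched with a hypothetical helper, env_flag, that treats "1" or "true" (case-insensitive, surrounding whitespace ignored) as enabled. This illustrates the suggested behavior only; it is not Ray's get_env_bool, whose exact set of accepted values is not shown in this review.

```python
import os
from typing import Mapping, Optional

# Hypothetical set of strings treated as "on"; Ray's get_env_bool may differ.
_TRUTHY = frozenset({"1", "true"})


def env_flag(name: str, default: str = "0",
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the env var (or the default) is "1" or "true"."""
    env = os.environ if environ is None else environ
    return env.get(name, default).strip().lower() in _TRUTHY


if __name__ == "__main__":
    # Compare with the strict == "1" check used by the constant in this PR.
    for value in ["1", "true", "TRUE", "0", "yes"]:
        env = {"RAY_SERVE_HAPROXY_TCP_NODELAY": value}
        strict = env["RAY_SERVE_HAPROXY_TCP_NODELAY"] == "1"
        lenient = env_flag("RAY_SERVE_HAPROXY_TCP_NODELAY", environ=env)
        print(f"{value!r:8} strict={strict!s:5} env_flag={lenient}")
```

Both forms keep the option off by default; they differ only in which spellings turn it on.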

ray-gardener bot added the serve (Ray Serve Related Issue) and llm labels Mar 4, 2026
Positive framing using the canonical socket option name. More grep-friendly
and avoids the double-negative of "disable Nagle = 1".

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
eicherseiji force-pushed the haproxy-disable-nagle branch from 76ff873 to 9587bc5 March 4, 2026 01:38
eicherseiji changed the title from [Serve] Add RAY_SERVE_HAPROXY_DISABLE_NAGLE env var to [Serve] Add RAY_SERVE_HAPROXY_TCP_NODELAY env var Mar 4, 2026
Verifies that when tcp_nodelay=True, the generated config contains
'option http-no-delay' and HAProxy starts successfully.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
eicherseiji force-pushed the haproxy-disable-nagle branch from a8ad9b5 to 9ac2709 March 4, 2026 01:41
abrarsheikh added the go label (add ONLY when ready to merge, run all tests) Mar 4, 2026
Signed-off-by: Seiji Eicher <seiji@anyscale.com>

Labels

go (add ONLY when ready to merge, run all tests), llm, serve (Ray Serve Related Issue)
