
[Flow Control] Roadmap #2152

@LukeAVanDrie


Roadmap

This tracking issue outlines the roadmap for transitioning the Flow Control layer from experimental to a state where we can comfortably enable it by default and support core use cases.

Note on Scope & Priority: This roadmap represents the current initial backlog. Priorities are flexible and subject to change based on community feedback, user requirements, and contributor bandwidth. We welcome discussion on re-ordering these workstreams to better fit adoption needs.

Workstream 1: Core Architecture & Concurrency

Objective: Replace the brittle just-in-time (JIT) provisioning and concurrency model with a robust, controller-reconciled architecture, and improve the maintainability and extensibility of the Flow Control layer.

PoC: @LukeAVanDrie

| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1982 | Race Condition: Premature Flow GC causes Orphaned Queues | @LukeAVanDrie | 🟡 In Review | CRITICAL. Fixes orphaned queues that arise when Flow GC races with long inter-arrival times in bursty or idle workloads. Active PRs: #2143 (hack for the v1.3 RC), #2127 (concurrency model), #2131 (fix). |
| P0 | #1792 | Feature: Support dynamic priority provisioning | @LukeAVanDrie | 🟢 Done | Completed. Enabled JIT provisioning for arbitrary integer priorities (merged via #2001 and #2006). |
| P1 | #2012 | Garbage Collection for Priority Bands | @evacchi | 🟡 In Progress | Memory Safety. Prevents unused priority bands from leaking memory. Status: patterns established in PR #2127; draft in #2097. |
| P2 | #2011 | Refactor: Drive Priority Band lifecycle from Controller | Unassigned | 🔵 Design Needed | Architecture Change. Moves provisioning out of the hot path (removes JIT). Requires reconciling InferenceObjective resources into the Flow Registry. |
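For context on the GC work above (#1982, #2012), here is a minimal Python sketch of one general pattern for keeping a collector from reclaiming a flow that a request is about to use. It assumes a single registry lock: a request takes a lease on its flow in the same critical section as the lookup-or-create, and the collector only reclaims flows that hold no leases, have empty queues, and have been idle past a timeout. `FlowRegistry`, `lease`, `collect`, and `idle_timeout_s` are hypothetical names; this is not the design in #2127 or #2131.

```python
import threading
import time
from collections import deque
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Iterator, List


@dataclass
class _Flow:
    queue: Deque[str] = field(default_factory=deque)
    leases: int = 0           # requests currently holding this flow
    last_active: float = 0.0  # time of creation or of the last lease release


class FlowRegistry:
    """Hypothetical flow registry whose GC cannot reclaim a leased flow."""

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._flows: Dict[str, _Flow] = {}
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock

    @contextmanager
    def lease(self, flow_key: str) -> Iterator[Deque[str]]:
        # Lookup-or-create and lease acquisition happen in one critical section,
        # so collect() can never run between "found the flow" and "holds it".
        with self._lock:
            flow = self._flows.get(flow_key)
            if flow is None:
                flow = _Flow(last_active=self._clock())
                self._flows[flow_key] = flow
            flow.leases += 1
        try:
            # deque appends/pops are thread-safe in CPython, so callers may
            # enqueue while holding the lease without taking the registry lock.
            yield flow.queue
        finally:
            with self._lock:
                flow.leases -= 1
                flow.last_active = self._clock()

    def collect(self) -> List[str]:
        """Delete flows with no leases, an empty queue, and idle >= timeout."""
        now = self._clock()
        with self._lock:
            doomed = [key for key, f in self._flows.items()
                      if f.leases == 0 and not f.queue
                      and now - f.last_active >= self._idle_timeout_s]
            for key in doomed:
                del self._flows[key]
        return doomed
```

Because a flow with a live lease or a non-empty queue is never eligible, a long gap between arrivals only makes a flow collectable after its last request has been both released and dispatched.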

Workstream 2: Configuration & Extensibility

Objective: Expose internal logic (Policies, Saturation) via the EPP Plugin system.

PoC: seeking PoC

| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1794 | Integrate Policy Configuration into EPP Plugin Model | Unassigned | 🔴 Open | Help Wanted. The policy logic exists (PR #2031), but it still needs to be wired into the YAML config loader. Blocker: users cannot select policies without this. |
| P1 | #1715 | Align Flow Control config/extensibility with other layers | @LukeAVanDrie | 🟡 In Progress | Parent Issue. Tracks formalization of Flow Control extension points and the config surface (text-based config and env vars). |
| P2 | #1405 | Saturation check should become an extension point | @LukeAVanDrie | 🟡 In Progress | Extensibility. PR #1976 (merged) prepared the directory structure. Next: define the SaturationDetector extension point interface. |
| P2 | #1861 | Inter-flow policies to support batch inference (Starvation) | Unassigned | 🔴 Open | Extension Point. Add a new extension point for controlling dispatch between priority bands, as an alternative to strict priority. |
| P3 | #1863 | Intra-flow sorting based on user defined stored metadata | Unassigned | 🔴 Open | Feature. Allow ordering queues by custom headers/metadata (requires making this information available at the OrderingPolicy call site). |
| P3 | #2013 | Cleanup: Remove PriorityName from PriorityBandConfig | @majiayu000 | 🟡 In Progress | Cleanup. Removes confusing string aliases for integer priorities. Active PR: #2042. |
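To make the #1861 extension point concrete, a minimal sketch of two interchangeable inter-band policies: strict priority, the behavior #1861 wants to make replaceable, and smooth weighted round-robin, which gives lower bands a guaranteed share of dispatches instead of starving them. It assumes higher integers mean higher priority and that a policy sees only per-band backlog counts; `strict_priority`, `WeightedBandPolicy`, and the weights are hypothetical, not an existing EPP interface.

```python
from typing import Dict, Optional


def strict_priority(backlog: Dict[int, int]) -> Optional[int]:
    """Always pick the highest non-empty band (assumes higher int = higher priority)."""
    eligible = [p for p, n in backlog.items() if n > 0]
    return max(eligible) if eligible else None


class WeightedBandPolicy:
    """Smooth weighted round-robin across non-empty priority bands."""

    def __init__(self, weights: Dict[int, int]) -> None:
        self._weights = dict(weights)
        self._current = {p: 0 for p in weights}  # running credit per band

    def pick(self, backlog: Dict[int, int]) -> Optional[int]:
        eligible = [p for p, n in backlog.items() if n > 0 and p in self._weights]
        if not eligible:
            return None
        total = 0
        for p in eligible:
            self._current[p] += self._weights[p]
            total += self._weights[p]
        # Highest running credit wins; ties go to the higher priority.
        chosen = max(eligible, key=lambda p: (self._current[p], p))
        self._current[chosen] -= total
        return chosen
```

With weights 3:1 and both bands backlogged, the low band receives one of every four dispatches rather than none.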

Workstream 3: Autoscaling & Saturation Signals

Objective: Provide "Golden Signals" for HPA and robust Scale-from-Zero support.

PoC: @aishukamal (autoscaling), @LukeAVanDrie (saturation)

| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1798 | Expose Backpressure Metrics for Autoscaling | @aishukamal | 🔵 Design | Critical for Ops. Exploring queue_depth, concurrency, etc. as signals. |
| P0 | #1800 | Validate and Harden Scale-from-Zero Behavior | @aishukamal | 🟡 Validation | Enablement. PR #1952 (merged) enabled the mechanics. Needs validation against customer use cases, including stress-testing Envoy<->EPP interactions. |
| P1 | #1793 | Enhance Saturation Detector for Adaptive Control | @LukeAVanDrie | 🟡 In Review | Stability. Moves from brittle heuristics to atomic concurrency tracking. Active PR: #2062 (ConcurrencyDetector). |
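A minimal sketch of concurrency-based saturation detection in the spirit of #1793, assuming a fixed per-endpoint in-flight limit: count requests as they are dispatched and completed, and report the pool as saturated only when no endpoint has headroom left. The Python lock stands in for the atomic counters a Go implementation would typically use; the class name, `max_in_flight`, and the every-endpoint-full rule are assumptions, not the ConcurrencyDetector in #2062.

```python
import threading
from typing import Dict, Iterable


class ConcurrencySaturationDetector:
    """Hypothetical detector: saturated when every endpoint is at its in-flight limit."""

    def __init__(self, endpoints: Iterable[str], max_in_flight: int) -> None:
        self._lock = threading.Lock()  # stand-in for per-endpoint atomic counters
        self._in_flight: Dict[str, int] = {e: 0 for e in endpoints}
        self._max_in_flight = max_in_flight

    def on_dispatch(self, endpoint: str) -> None:
        with self._lock:
            self._in_flight[endpoint] += 1

    def on_complete(self, endpoint: str) -> None:
        with self._lock:
            self._in_flight[endpoint] -= 1

    def is_saturated(self) -> bool:
        # An empty pool has no headroom, so all() over no endpoints reports saturated.
        with self._lock:
            return all(n >= self._max_in_flight for n in self._in_flight.values())
```

Unlike a heuristic over scraped metrics, the counter changes exactly when requests are dispatched and completed, so the signal cannot lag behind the dispatcher's own decisions.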

Workstream 4: Observability, Testing & Hardening

Objective: Ensure operators can debug the system and verify performance overhead.

PoC: @LukeAVanDrie (general hardening and milestone tracking), seeking PoC for benchmarking

| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1795 | Create User Guide for the Flow Control Layer | @LukeAVanDrie | 🟡 Drafting | Documentation. A draft exists; it needs to be split into several artifacts and published to the dev site. |
| P0 | #1708 | Observability (Prom Metrics, not Tracing) | @RyanRosario | 🟡 In Progress | Visibility. Adding dispatch-cycle latency histograms, plugin execution times, queue size in bytes, etc. |
| P0 | #1920 | Prometheus Metrics labels missing 'InferencePool' label | @LukeAVanDrie | 🟢 Done | Completed (merged via #2010). |
| P1 | #2171 | Grafana Dashboard for Flow Control | Unassigned | 🔴 Open | Visibility. Makes the metrics from #1708 operable. |
| P1 | #2087 | Benchmark: Scale testing for Flow Control | @LukeAVanDrie | 🟡 In Progress | Help Wanted. Empirically map the system's operational envelope by finding the breaking point of the single-threaded dispatcher under massive tenancy, establishing practical non-functional limits for the multi-tenancy story. |
| P2 | #1799 | Tracking Production Readiness and Hardening | Unassigned | 🔴 Open | Audit. Distributed-tracing context propagation check, log-level audit, performance profiling, etc. |
| P2 | #1801 | Create Benchmarking Guide for the Flow Control Layer | Unassigned | 🔴 Open | Documentation. Develop a standardized benchmarking guide defining scenarios that validate latency overhead, multi-tenant isolation, and operational limits. |

Workstream 5: Future Research

PoC: @wseaton (extension points & disaggregated serving), @LukeAVanDrie (Flow Control & Scheduling interactions)

| Prio | ID | Issue Title | Status | Context |
|---|---|---|---|---|
| P2 | #1802 | Define Flow Control Support for Disaggregated Serving | 🔴 Open | Architectural Design. What role should Flow Control play in disaggregated serving? Needs a design doc. |
| P3 | #1797 | Research and Implement Advanced Fairness/Scheduling Policies | 🔴 Open | Future Feature. Candidates include EDF (Earliest Deadline First) and VTC (Virtual Token Counter). Blocked by plugin wiring (#1794). |
| P3 | #1860 | Reconciliation of the flow control logic with the scheduling logic | 🟡 PoC | Optimization. Improve bin-packing decisions using saturation signals. |
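EDF, one of the candidate policies for #1797, always dispatches the request with the earliest absolute deadline. A minimal sketch using a binary heap, assuming each request already carries a deadline (for example, its arrival time plus its TTL); `EDFQueue` is a hypothetical name, and the sequence counter keeps FIFO order among equal deadlines.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class EDFQueue:
    """Earliest-Deadline-First ordering: pop the request whose deadline is soonest."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-break for equal deadlines

    def push(self, request_id: str, deadline: float) -> None:
        heapq.heappush(self._heap, (deadline, next(self._seq), request_id))

    def pop(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Push and pop are both O(log n), which matters for a single-threaded dispatcher that pops on every dispatch cycle.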

Backlog / To Be Ticketed (Seeking PoCs)

The following items have been identified as requirements but still need formal issues.

Testing Infrastructure:

  • Hermetic testing for Flow Control logic.
  • End-to-End (E2E) tests for Flow Control logic.

Architecture & Design:

  • Determine whether sharding is required and, if so, whether the shard count needs autotuning.
  • Formalize interactions with Latency Predictor.
  • Formalize interactions with Workload Variant Autoscaler (WVA).
  • Formalize interactions with other well-lit paths.
  • Formalize admission control and load shedding interactions.
  • Define support for Online vs. Offline Batching constraints.
  • Hold out a minimum quantum of throughput per flow. Reserve the top N dispatch slots for highest-priority traffic only, so that a burst of P0 traffic is never blocked by a pool fully saturated with P1 traffic. Open question: should this be handled in Saturation Detection, in Scheduling (reserving pods for high-priority traffic), or in admission control?
  • Flow Control support for offline batch (via Flow Control primitives ported as a library)
    • No throttling in batch components
    • Priority levels for offline and interactive requests such that interactive requests are strictly prioritized
    • Pass-through design from offline batch → online batch → Flow Control (in EPP)
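The reserved-slot idea from the hold-out bullet can be sketched as an admission gate: lower priorities may fill only `capacity - reserved` slots, so the last `reserved` slots stay available to the top priority even when the pool is otherwise saturated. This is a minimal sketch assuming a single counter of dispatch slots and that higher integers mean higher priority; it does not cover the per-flow throughput quantum or decide which layer should own the gate. `ReservedSlotGate` and its parameters are hypothetical.

```python
import threading


class ReservedSlotGate:
    """Hypothetical gate: the top `reserved` of `capacity` slots admit only the top priority."""

    def __init__(self, capacity: int, reserved: int, top_priority: int) -> None:
        if not 0 <= reserved <= capacity:
            raise ValueError("reserved must be between 0 and capacity")
        self._capacity = capacity
        self._reserved = reserved
        self._top_priority = top_priority  # assumes higher int = higher priority
        self._in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self, priority: int) -> bool:
        with self._lock:
            # Lower priorities see a smaller ceiling, leaving headroom for bursts.
            if priority >= self._top_priority:
                limit = self._capacity
            else:
                limit = self._capacity - self._reserved
            if self._in_use < limit:
                self._in_use += 1
                return True
            return False

    def release(self) -> None:
        with self._lock:
            if self._in_use == 0:
                raise RuntimeError("release without matching acquire")
            self._in_use -= 1
```

The trade-off is that reserved slots sit idle when no top-priority traffic is present, which is part of why the bullet leaves open whether this belongs in saturation detection, scheduling, or admission control.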

API Definitions:

  • Finalize CRD integration (e.g., Priority derivation from InferenceObjective).
  • Standardize FlowKey sources (Headers vs. CRD).
  • Standardize TTL sources (Headers vs. Spec).
