Roadmap
This tracking issue outlines the roadmap for transitioning the Flow Control layer from experimental to a state where we can comfortably enable it by default and support core use cases.
Note on Scope & Priority: This roadmap represents the current initial backlog. Priorities are flexible and subject to change based on community feedback, user requirements, and contributor bandwidth. We welcome discussion on re-ordering these workstreams to better fit adoption needs.
Workstream 1: Core Architecture & Concurrency
Objective: Replace the brittle JIT and concurrency model with a robust, controller-reconciled architecture. Improve maintainability and extensibility of the Flow Control layer.
PoC: @LukeAVanDrie
| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1982 | Race Condition: Premature Flow GC causes Orphaned Queues | @LukeAVanDrie | 🟡 In Review | CRITICAL. Fixes orphaned queues where Flow GC races with long inter-arrival times in bursty/idle workloads. • Active PRs: #2143 (stopgap for the v1.3 RC), #2127 (Concurrency Model), #2131 (Fix). |
| P0 | #1792 | Feature: Support dynamic priority provisioning | @LukeAVanDrie | 🟢 Done | Completed. Enabled JIT support for arbitrary int priorities. (Merged via #2001 and #2006). |
| P1 | #2012 | Garbage Collection for Priority Bands | @evacchi | 🟡 In Progress | Memory Safety. Prevents unused priority bands from leaking memory. • Status: Patterns established in PR #2127. Draft in #2097. |
| P2 | #2011 | Refactor: Drive Priority Band lifecycle from Controller | Unassigned | 🔵 Design Needed | Architecture Change. Moves provisioning out of the hot path (removes JIT). Requires reconciling InferenceObjective to Flow Registry. |
Workstream 2: Configuration & Extensibility
Objective: Expose internal logic (Policies, Saturation) via the EPP Plugin system.
PoC: seeking PoC
| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1794 | Integrate Policy Configuration into EPP Plugin Model | Unassigned | 🔴 Open | Help Wanted. We have the logic (PR #2031) but need the YAML config loader wiring. • Blocker: Users cannot select policies without this. |
| P1 | #1715 | Align Flow Control config/extensibility with other layers | @LukeAVanDrie | 🟡 In Progress | Parent Issue. Tracking formalization of Flow Control extension points and config surface (text-based and env vars). |
| P2 | #1405 | Saturation check should become an extension point | @LukeAVanDrie | 🟡 In Progress | Extensibility. PR #1976 (Merged) prepared the directory structure. Next: Define SaturationDetector extension point interface. |
| P2 | #1861 | Inter-flow policies to support batch inference (Starvation) | Unassigned | 🔴 Open | Extension Point. Add a new extension to allow controlling dispatch behavior between priority bands (instead of strict priority). |
| P3 | #1863 | Intra-flow sorting based on user defined stored metadata | Unassigned | 🔴 Open | Feature. Allow ordering queues with custom headers/metadata (need to ensure this info is available to the OrderingPolicy call site). |
| P3 | #2013 | Cleanup: Remove PriorityName from PriorityBandConfig | @majiayu000 | 🟡 In Progress | Cleanup. Removing confusing string aliases for integer priorities. • Active PR: #2042 |
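
#1861 proposes making the choice between priority bands pluggable instead of hard-coded strict priority. The following is a minimal sketch of what such an extension point could look like, assuming a hypothetical `InterFlowPolicy` interface and `Band` type (neither is the EPP plugin API). `AgingPriority` applies a standard aging technique, chosen only to illustrate a starvation-avoiding policy; it is not a proposal from the issue.

```go
package main

// Band is a hypothetical, read-only view of one priority band.
// Higher Priority values are assumed to be more important.
type Band struct {
	Priority int // assumed unique per band
	Queued   int // requests currently waiting in this band
}

// InterFlowPolicy is a hypothetical extension point that picks the band to
// dispatch from next. It returns an index into bands, or -1 if all are empty.
type InterFlowPolicy interface {
	SelectBand(bands []Band) int
}

// StrictPriority always serves the highest non-empty band, so lower bands
// can starve while higher bands stay busy.
type StrictPriority struct{}

func (StrictPriority) SelectBand(bands []Band) int {
	best := -1
	for i, b := range bands {
		if b.Queued > 0 && (best == -1 || b.Priority > bands[best].Priority) {
			best = i
		}
	}
	return best
}

// AgingPriority adds 1 to a band's effective priority for every Boost
// cycles it has waited with work queued, so every non-empty band is
// eventually served. Boost must be > 0.
type AgingPriority struct {
	Boost   int
	waiting map[int]int // band priority -> cycles waited since last served
}

func (p *AgingPriority) SelectBand(bands []Band) int {
	if p.waiting == nil {
		p.waiting = make(map[int]int)
	}
	best, bestScore := -1, 0
	for i, b := range bands {
		if b.Queued == 0 {
			continue
		}
		score := b.Priority + p.waiting[b.Priority]/p.Boost
		if best == -1 || score > bestScore {
			best, bestScore = i, score
		}
	}
	// Reset the winner's wait counter and age every other non-empty band.
	for i, b := range bands {
		if b.Queued == 0 {
			continue
		}
		if i == best {
			p.waiting[b.Priority] = 0
		} else {
			p.waiting[b.Priority]++
		}
	}
	return best
}

// Compile-time checks that both policies satisfy the interface.
var (
	_ InterFlowPolicy = StrictPriority{}
	_ InterFlowPolicy = (*AgingPriority)(nil)
)
```

With `Boost: 1`, two bands at priorities 10 and 0, and a constant backlog in both, `StrictPriority` never serves the low band, while `AgingPriority` serves it on every 12th cycle.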
Workstream 3: Autoscaling & Saturation Signals
Objective: Provide "Golden Signals" for HPA and robust Scale-from-Zero support.
PoC: @aishukamal (autoscaling), @LukeAVanDrie (saturation)
| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1798 | Expose Backpressure Metrics for Autoscaling | @aishukamal | 🔵 Design | Critical for Ops. Exploring queue_depth, concurrency, etc. as signals. |
| P0 | #1800 | Validate and Harden Scale-from-Zero Behavior | @aishukamal | 🟡 Validation | Enablement. PR #1952 (Merged) enabled the mechanics. Needs validation against customer use cases, including stress testing of Envoy<->EPP interactions. |
| P1 | #1793 | Enhance Saturation Detector for Adaptive Control | @LukeAVanDrie | 🟡 In Review | Stability. Moving from brittle heuristics to atomic concurrency tracking. • Active PR: #2062 (ConcurrencyDetector). |
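
#1793 (PR #2062) moves saturation detection toward atomic concurrency tracking, and #1405 would make the detector pluggable. The following is a minimal sketch under stated assumptions: a hypothetical `SaturationDetector` interface with a single `IsSaturated` method, a fixed pod set, and one uniform per-pod limit. The real ConcurrencyDetector in #2062 may differ. The sketch requires Go 1.19+ for `atomic.Int64`.

```go
package main

import "sync/atomic"

// SaturationDetector is a hypothetical shape for the extension point in #1405.
type SaturationDetector interface {
	IsSaturated() bool
}

// ConcurrencyDetector counts in-flight requests per pod with atomic counters.
// The pod set is fixed at construction, so the map is only read afterward
// and can be shared across goroutines without a lock.
type ConcurrencyDetector struct {
	limit    int64 // max in-flight requests per pod (assumed uniform)
	inFlight map[string]*atomic.Int64
}

func NewConcurrencyDetector(pods []string, limit int64) *ConcurrencyDetector {
	m := make(map[string]*atomic.Int64, len(pods))
	for _, p := range pods {
		m[p] = new(atomic.Int64)
	}
	return &ConcurrencyDetector{limit: limit, inFlight: m}
}

// OnDispatch and OnComplete bracket each request sent to a pod.
// In this sketch an unknown pod name panics (nil counter).
func (d *ConcurrencyDetector) OnDispatch(pod string) { d.inFlight[pod].Add(1) }
func (d *ConcurrencyDetector) OnComplete(pod string) { d.inFlight[pod].Add(-1) }

// IsSaturated reports true only when no pod has spare concurrency.
func (d *ConcurrencyDetector) IsSaturated() bool {
	for _, c := range d.inFlight {
		if c.Load() < d.limit {
			return false
		}
	}
	return true
}

// TotalInFlight sums all counters (not an atomic snapshot across pods);
// a value like this is one candidate backpressure signal for #1798.
func (d *ConcurrencyDetector) TotalInFlight() int64 {
	var total int64
	for _, c := range d.inFlight {
		total += c.Load()
	}
	return total
}

var _ SaturationDetector = (*ConcurrencyDetector)(nil)
```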
Workstream 4: Observability, Testing & Hardening
Objective: Ensure operators can debug the system and verify performance overhead.
PoC: @LukeAVanDrie (general hardening and milestone tracking), seeking PoC for benchmarking
| Prio | ID | Issue Title | Assignee | Status | Detailed Context |
|---|---|---|---|---|---|
| P0 | #1795 | Create User Guide for the Flow Control Layer | @LukeAVanDrie | 🟡 Drafting | Documentation. Draft exists. Needs to be split into a few different artifacts and published to the dev site. |
| P0 | #1708 | Observability (Prom Metrics, not Tracing) | @RyanRosario | 🟡 In Progress | Visibility. Adding dispatch cycle latency histograms, plugin execution times, queue length in bytes, etc. |
| P0 | #1920 | Prometheus Metrics labels missing 'InferencePool' label | @LukeAVanDrie | 🟢 Done | Completed. (Merged via #2010). |
| P1 | #2171 | Grafana Dashboard for Flow Control | Unassigned | 🔴 Open | Visibility. Makes the metrics in #1708 operable. |
| P1 | #2087 | Benchmark: Scale testing for Flow Control | @LukeAVanDrie | 🟡 In Progress | Help Wanted. Empirically map the system's operational envelope: find the breaking point of the single-threaded dispatcher under massive tenancy to establish the practical non-functional limits of our multi-tenancy story. |
| P2 | #1799 | Tracking Production Readiness and Hardening | Unassigned | 🔴 Open | Audit. Distributed Tracing context propagation check, log level audit, performance profiling, etc. |
| P2 | #1801 | Create Benchmarking Guide for the Flow Control Layer | Unassigned | 🔴 Open | Documentation. Develop a standardized benchmarking guide defining scenarios to validate latency overhead, multi-tenant isolation, and operational limits. |
Workstream 5: Future Research
PoC: @wseaton (extension points & Disaggregated Serving), @LukeAVanDrie (Flow Control & Scheduling interactions)
| Prio | ID | Issue Title | Status | Context |
|---|---|---|---|---|
| P2 | #1802 | Define Flow Control Support for Disaggregated Serving | 🔴 Open | Architectural Design. What role does Flow Control have in Disaggregated Serving? Needs a design doc. |
| P3 | #1797 | Research and Implement Advanced Fairness/Scheduling Policies | 🔴 Open | Future Feature. (EDF, VTC). Blocked by Plugin wiring (#1794). |
| P3 | #1860 | Reconciliation of the flow control logic with the scheduling logic | 🟡 PoC | Optimization. Improving bin-packing decisions based on saturation signals. |
Backlog / To Be Ticketed (Seeking PoCs)
The following items are identified as requirements but need formal issues created.
Testing Infrastructure:
- Hermetic testing for Flow Control logic.
- End-to-End (E2E) tests for Flow Control logic.
Architecture & Design:
- Determine whether sharding is required (and, if so, whether shard-count autotuning is needed).
- Formalize interactions with Latency Predictor.
- Formalize interactions with Workload Variant Autoscaler (WVA).
- Formalize interactions with other well-lit paths.
- Formalize admission control and load shedding interactions.
- Define support for Online vs. Offline Batching constraints.
- Hold out a minimum quantum of throughput per flow. Reserve the top N dispatch slots exclusively for the highest-priority traffic, ensuring that a burst of P0 traffic is never blocked by a pool fully saturated with P1 traffic. (Open question: should this be handled in Saturation Detection, in Scheduling by reserving pods for high-priority traffic, or in admission control?)
- Flow Control support for offline batch (via Flow Control primitives ported as a library):
  - No throttling in batch components.
  - Priority levels for offline and interactive requests such that interactive traffic is strictly prioritized.
  - Pass-through design from offline batch → online batch → Flow Control (in EPP).
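
The slot-reservation idea in the list above (reserve the top N dispatch slots for the highest-priority traffic) can be sketched as a simple admission gate. This models only the admission-control option. `ReservedSlots`, `TryDispatch`, and `Done` are hypothetical names, and treating a higher integer as more important is an assumption.

```go
package main

// ReservedSlots is a hypothetical admission gate: of `capacity` dispatch
// slots, the top `reserved` are usable only by traffic at `topPriority`
// or above (higher integer = more important, by assumption).
type ReservedSlots struct {
	capacity    int
	reserved    int
	topPriority int
	inFlight    int
}

// TryDispatch takes a slot and returns true if a request at this priority
// may dispatch now. Not goroutine-safe; a real dispatcher would guard it
// with its own synchronization.
func (s *ReservedSlots) TryDispatch(priority int) bool {
	limit := s.capacity - s.reserved // ceiling for lower-priority traffic
	if priority >= s.topPriority {
		limit = s.capacity
	}
	if s.inFlight >= limit {
		return false
	}
	s.inFlight++
	return true
}

// Done releases a slot when a request finishes.
func (s *ReservedSlots) Done() {
	if s.inFlight > 0 {
		s.inFlight--
	}
}
```

With `capacity: 4` and `reserved: 1`, lower-priority traffic can occupy at most three slots, so one slot is always available to a top-priority burst.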
API Definitions:
- Finalize CRD integration (e.g., Priority derivation from InferenceObjective).
- Standardize FlowKey sources (Headers vs. CRD).
- Standardize TTL sources (Headers vs. Spec).