feat: Implement LLM call caching #243
base: main
Conversation
…rrency control for message classification.
📝 Walkthrough

Adds a new asynchronous LLM caching layer (TTL LRU + in-flight dedupe), message normalization and regex fast-path classification, and integrates these into ClassificationRouter with concurrency limiting, message length guards, metrics, and fallback triage behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Message Request
    participant Router as ClassificationRouter
    participant Helper as cache_helpers
    participant Cache as TTL Cache
    participant Inflight as In-flight Registry
    participant LLM as LLM Service
    Client->>Router: should_process_message(message, context)
    Router->>Helper: normalize_message(message)
    Helper-->>Router: normalized
    Router->>Helper: is_simple_message(normalized)
    Helper-->>Router: fast-path result or None
    alt fast-path
        Router-->>Client: return classification
    else
        Router->>Helper: key_for_normalized(normalized, ...)
        Router->>Helper: get_cached_by_normalized(...)
        Helper->>Cache: lookup
        Cache-->>Helper: cached payload or miss
        Helper-->>Router: payload or None
        alt cache hit
            Router-->>Client: return cached classification
        else cache miss
            Router->>Inflight: check/register future for key
            alt another inflight exists
                Inflight-->>Router: await shared future -> result
            else owner
                Router->>LLM: ainvoke(triage prompt), guarded by semaphore
                LLM-->>Router: response
                Router->>Helper: set_cached_by_normalized(..., payload)
                Router->>Inflight: resolve future + cleanup
            end
            Router-->>Client: return classification
        end
    end
```
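To make the flow concrete, here is a minimal, self-contained sketch of the lookup order the diagram shows: regex fast path, then TTL cache, then in-flight dedupe, then a semaphore-guarded LLM call. The greeting pattern, the key function, and the injected `classify_llm` callable are illustrative assumptions, not the PR's actual code.

```python
import asyncio
import hashlib
import re

from cachetools import TTLCache

_cache: TTLCache = TTLCache(maxsize=4096, ttl=3600)  # bounded, expiring result cache
_inflight: dict[str, asyncio.Future] = {}            # key -> future shared by concurrent callers
_llm_semaphore = asyncio.Semaphore(10)               # cap concurrent LLM calls

_GREETING = re.compile(r"^(hi|hello|hey)[\s!.]*$")   # illustrative fast-path pattern


def _cache_key(normalized: str) -> str:
    # Deterministic key over the normalized message (hashlib fallback path).
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


async def classify(message: str, classify_llm) -> dict:
    # 1) Normalize: truncate, collapse whitespace, lowercase.
    normalized = " ".join(message[:10_000].split()).lower()
    # 2) Regex fast path: trivial messages never reach the LLM.
    if _GREETING.match(normalized):
        return {"intent": "greeting", "original_message": normalized}
    key = _cache_key(normalized)
    # 3) TTL cache hit: reuse a recent classification.
    if key in _cache:
        return _cache[key]
    # 4) In-flight dedupe: piggyback on an identical request already running.
    if key in _inflight:
        return await _inflight[key]
    future: asyncio.Future = asyncio.get_running_loop().create_future()
    _inflight[key] = future
    try:
        # 5) This caller owns the fetch; the semaphore bounds concurrency.
        async with _llm_semaphore:
            result = await classify_llm(normalized)
        _cache[key] = result
        future.set_result(result)
        return result
    except Exception as exc:
        future.set_exception(exc)  # wake followers with the same error
        raise
    finally:
        _inflight.pop(key, None)
```

The review threads below flag the harder parts of this pattern: owner-task tracking and cancellation propagation.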
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review details

Configuration used: defaults
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
backend/app/classification/cache_helpers.py
🧰 Additional context used

🪛 Ruff (0.14.11)
backend/app/classification/cache_helpers.py
17-17: Do not catch blind exception: Exception (BLE001)
103-103: Do not catch blind exception: Exception (BLE001)
188-188: Consider moving this statement to an else block (TRY300)

🔇 Additional comments (5)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
@smokeyScraper @chandansgowda Can you review this? I think I have implemented it the best I could; any insights from you guys?
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@backend/app/classification/cache_helpers.py`:
- Around line 62-80: The _PATTERN_CLASSIFICATION mapping contains orphan keys
"integration" and "architecture" that have no corresponding regex entries in
_PATTERNS, so is_simple_message() can never match them; fix by either removing
those two keys from _PATTERN_CLASSIFICATION or adding corresponding regex
patterns to _PATTERNS (e.g., patterns that match
oauth/discord/github/webhook/api/integration for "integration" and
queue/database/redis/llm/infra/architecture for "architecture") so the keys and
patterns are consistent with the existing classification logic.
- Around line 9-19: The xxhash availability check is broken because import
xxhash is unconditional; move the import into the try/except and set the flag
based on whether the import succeeded: wrap "import xxhash" in a try block, on
success set _HAS_XXHASH = True, on ImportError set xxhash = None and _HAS_XXHASH
= False (and keep existing hashlib/json imports outside). Update any existing
references to _HAS_XXHASH and xxhash to rely on this flag so code falls back to
hashlib when xxhash is not installed.
- Around line 177-199: The _owner_fetch coroutine computes an unused variable
elapsed, and the created background task is neither tracked nor
cancellation-propagated. Modify cached_llm_call to: remove or use elapsed
(e.g., include it in logs); assign the created task to a variable (owner_task =
loop.create_task(_owner_fetch())); add a done callback on owner_task that calls
set_exception on the future if the task failed and the future is not done; and
wrap the await future in a try/except asyncio.CancelledError that cancels
owner_task (owner_task.cancel()), awaits owner_task to avoid orphaned
execution, then re-raises the cancellation. Keep the existing finally that pops
_inflight, and continue setting _cache and the future as currently implemented.
A sketch of this pattern follows below.
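A minimal sketch of what that fix could look like, assuming a simplified `cached_llm_call(key, fetch, _cache, _inflight)` signature; `fetch` stands in for the guarded LLM call, and the real code differs:

```python
import asyncio


async def cached_llm_call(key, fetch, _cache, _inflight):
    # Owner runs fetch(); followers await a shared future; failures and
    # cancellation propagate instead of being silently dropped.
    if key in _cache:
        return _cache[key]
    if key in _inflight:
        return await _inflight[key]  # follower path: share the owner's result

    loop = asyncio.get_running_loop()
    future: asyncio.Future = loop.create_future()
    _inflight[key] = future

    async def _owner_fetch():
        try:
            result = await fetch()
            _cache[key] = result
            if not future.done():
                future.set_result(result)
        finally:
            _inflight.pop(key, None)  # keep the existing cleanup behavior

    owner_task = loop.create_task(_owner_fetch())  # store the reference (RUF006)

    def _propagate_failure(task: asyncio.Task) -> None:
        # Surface owner failure/cancellation to waiters instead of dropping it.
        if future.done():
            return
        if task.cancelled():
            future.cancel()
        elif task.exception() is not None:
            future.set_exception(task.exception())

    owner_task.add_done_callback(_propagate_failure)

    try:
        return await future
    except asyncio.CancelledError:
        owner_task.cancel()  # don't leave an orphaned fetch running
        try:
            await owner_task
        except BaseException:
            pass  # the owner's own CancelledError is expected here
        raise  # re-raise the caller's cancellation
```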
🧹 Nitpick comments (3)
backend/app/classification/cache_helpers.py (1)
100-104: Consider logging the fallback for non-serializable params.

The broad exception catch on line 103 silently falls back to `str(params)`. While this defensive approach is appropriate, adding a debug log would help identify cases where params contain unexpected non-serializable types.

♻️ Suggested improvement

```diff
 try:
     params_blob = json.dumps(params or {}, sort_keys=True, separators=(",", ":"), default=str).encode("utf-8")
-except Exception:
+except (TypeError, ValueError) as e:
+    logger.debug(f"Params serialization fallback: {type(e).__name__}")
     params_blob = str(params).encode("utf-8")
```

backend/app/classification/classification_router.py (2)
53-57: Simplify context ID extraction.

The current logic is a bit convoluted: `or ""` followed by `if not ctx_id: ctx_id = None` is redundant, since an empty string is already falsy.

♻️ Simplified version

```diff
 ctx_id = None
 if context:
-    ctx_id = context.get("channel_id") or context.get("thread_id") or ""
-    if not ctx_id:
-        ctx_id = None
+    ctx_id = context.get("channel_id") or context.get("thread_id") or None
```
59-59: Consider extracting the duplicated params dict.

The params dict `{"temperature": 0.1}` is hardcoded both here and on line 88. Consider extracting it to a constant to ensure consistency.

♻️ Extract to constant

```diff
+_LLM_PARAMS = {"temperature": 0.1}
+
 class ClassificationRouter:
     ...
-    cached = get_cached_by_normalized(normalized, ctx_id, settings.classification_agent_model, {"temperature": 0.1})
+    cached = get_cached_by_normalized(normalized, ctx_id, settings.classification_agent_model, _LLM_PARAMS)
     ...
-    set_cached_by_normalized(normalized, ctx_id, settings.classification_agent_model, {"temperature": 0.1}, payload)
+    set_cached_by_normalized(normalized, ctx_id, settings.classification_agent_model, _LLM_PARAMS, payload)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/classification/cache_helpers.py
backend/app/classification/classification_router.py
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/classification/classification_router.py (1)
backend/app/classification/cache_helpers.py (4)
- normalize_message (201-205)
- is_simple_message (207-211)
- get_cached_by_normalized (129-132)
- set_cached_by_normalized (135-138)
backend/app/classification/cache_helpers.py (2)
backend/app/services/embedding_service/service.py (2)
- model (40-51)
- llm (54-67)
backend/app/core/orchestration/queue_manager.py (1)
- start (47-56)
🪛 Ruff (0.14.11)
backend/app/classification/cache_helpers.py
17-17: Do not catch blind exception: Exception
(BLE001)
103-103: Do not catch blind exception: Exception
(BLE001)
181-181: Local variable elapsed is assigned to but never used
Remove assignment to unused variable elapsed
(F841)
186-186: Consider moving this statement to an else block
(TRY300)
198-198: Store a reference to the return value of loop.create_task
(RUF006)
🔇 Additional comments (9)
backend/app/classification/cache_helpers.py (4)
21-24: LGTM! The cache configuration constants are reasonable: 4096 entries, 1-hour TTL, and a 10KB message limit for DoS protection.
82-87: LGTM! Good use of bounded `TTLCache` for both the result cache and in-flight deduplication. The 2-minute TTL on `_inflight` prevents memory leaks from orphaned entries.
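For reference, bounded caches like the ones this comment describes can be declared with `cachetools` along these lines (sizes taken from the constants noted in this review; the variable names are assumptions):

```python
from cachetools import TTLCache

# Result cache: bounded with per-entry expiry (4096 entries, 1-hour TTL).
_cache = TTLCache(maxsize=4096, ttl=3600)

# In-flight dedupe registry: short 2-minute TTL so orphaned futures cannot leak.
_inflight = TTLCache(maxsize=1000, ttl=120)
```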
201-211: LGTM! The normalization logic is solid: truncation for DoS protection, lowercasing and whitespace collapsing for cache hit improvement. The `is_simple_message` function correctly iterates patterns and returns a copy of the classification dict with the original message attached.

213-222: LGTM! Clean cache access wrappers with proper exception handling for missing keys.
backend/app/classification/classification_router.py (5)
1-16: LGTM! Appropriate imports added for the caching integration. The selective imports from `cache_helpers` keep the namespace clean.

20-21: LGTM! The semaphore with a limit of 10 concurrent LLM calls is a sensible default for rate limiting and cost control.

37-40: LGTM! Good early return for oversized messages. This prevents DoS via large payloads and aligns with the `MAX_MESSAGE_LENGTH` constant from `cache_helpers`.

71-74: LGTM! The semaphore correctly guards the LLM invocation to limit concurrent calls.

82-89: LGTM! The payload construction and caching logic is correct. The structure aligns with the `_fallback_triage` output format.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/classification/cache_helpers.py`:
- Around line 207-211: is_simple_message currently returns the first match from
_PATTERNS, which lets low-priority patterns like "greeting" shadow
higher-priority intents such as "action_request". Fix by prioritizing
actionable intents: either reorder _PATTERNS so "action_request" (and other
actionable patterns) appear before "greeting", or implement priority-based
matching by reading a priority field from _PATTERN_CLASSIFICATION (or a
separate priority map), iterating _PATTERNS to collect all matches, and
returning the match with the highest priority. Update is_simple_message to use
the chosen approach and ensure it still returns
dict(_PATTERN_CLASSIFICATION[name], original_message=normalized) for the
selected top-priority name. A sketch of the priority-based option follows
below.
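One way to implement the priority-based option, as a hedged sketch; the pattern regexes, priority values, and classification payloads below are illustrative, not the module's actual tables:

```python
import re

# Illustrative tables; the real module defines richer patterns.
_PATTERNS = {
    "greeting": re.compile(r"^(hi|hello|hey)\b"),
    "action_request": re.compile(r"\b(please|can you|create|fix|deploy)\b"),
}
_PRIORITY = {"action_request": 10, "greeting": 1}  # higher value wins

_PATTERN_CLASSIFICATION = {
    "greeting": {"intent": "greeting", "needs_llm": False},
    "action_request": {"intent": "action_request", "needs_llm": False},
}


def is_simple_message(normalized: str) -> dict | None:
    # Collect every matching pattern, then keep the highest-priority intent,
    # so "hi, can you fix the deploy?" classifies as action_request, not greeting.
    matches = [name for name, pattern in _PATTERNS.items() if pattern.search(normalized)]
    if not matches:
        return None
    top = max(matches, key=lambda name: _PRIORITY.get(name, 0))
    return dict(_PATTERN_CLASSIFICATION[top], original_message=normalized)
```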
♻️ Duplicate comments (2)
backend/app/classification/cache_helpers.py (2)
62-80: Orphan classification entries remain unaddressed. The `_PATTERN_CLASSIFICATION` dict still contains "integration" (line 68) and "architecture" (line 69) entries with no corresponding patterns in `_PATTERNS`. These classifications will never be matched by `is_simple_message()`.

177-199: Task management issues remain unaddressed. The previously flagged concerns still apply:
- `elapsed` (line 181) is computed but never used
- The task created on line 198 is not stored, risking unhandled exceptions being silently dropped
- Cancellation of the caller is not propagated to the owner task
🧹 Nitpick comments (1)
backend/app/classification/cache_helpers.py (1)
14-19: Narrow the exception type to `ImportError`.

The import structure is now correct, but catching bare `Exception` is overly broad and can mask unexpected errors. Use `ImportError` for import failures.

```diff
 try:
     import xxhash
     _HAS_XXHASH = True
-except Exception:
+except ImportError:
     xxhash = None
     _HAS_XXHASH = False
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/classification/cache_helpers.py
🧰 Additional context used
🪛 Ruff (0.14.11)
backend/app/classification/cache_helpers.py
17-17: Do not catch blind exception: Exception
(BLE001)
103-103: Do not catch blind exception: Exception
(BLE001)
181-181: Local variable elapsed is assigned to but never used
Remove assignment to unused variable elapsed
(F841)
186-186: Consider moving this statement to an else block
(TRY300)
198-198: Store a reference to the return value of loop.create_task
(RUF006)
🔇 Additional comments (7)
backend/app/classification/cache_helpers.py (7)
21-24: LGTM! Configuration constants are well-chosen: reasonable cache size, appropriate TTL, and a sensible message length limit for DoS protection.

26-58: LGTM! The pre-compiled regex patterns are well-structured for fast-path classification. The combination of anchored patterns for greetings/acknowledgments and flexible patterns for content detection is appropriate.

82-87: LGTM! Good use of `TTLCache` for both the main cache and in-flight deduplication. The TTL on `_inflight` prevents memory leaks from orphaned requests. The metrics counter is appropriate for observability.

91-112: LGTM! The cache key generation is well-designed with proper normalization, deterministic serialization, and appropriate fallbacks. The broad exception catch for the JSON serialization fallback is acceptable here since we want to handle any edge case gracefully.

115-138: LGTM! The key composition and cache access helper functions are well-structured with clear separation of concerns.

201-205: LGTM! Normalization is well-implemented with proper truncation, whitespace handling, and case normalization for consistent cache key generation.

213-222: LGTM! Cache access functions are simple and correct. Minor note: `cache_get` could use `_cache.get(key)` for the same behavior, but the current implementation is clear.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
… background tasks
Closes #241
📝 Description
This PR eliminates unnecessary LLM API calls for user messages in the Discord and GitHub channels by implementing a multi-layer caching and optimization system for message classification.
🔧 Changes Made
cache_helpers.py
- `MAX_MESSAGE_LENGTH` (10KB): truncates oversized messages before processing to prevent DoS attacks and excessive memory usage (patches security issues)
- Replaced the `_inflight` dict with `TTLCache(maxsize=1000, ttl=120)` to prevent memory leaks from orphaned requests
- Added `asyncio.CancelledError` handling to prevent request hangs when tasks are cancelled
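A rough sketch of the length guard and normalization described above (the constant matches this PR; the function body approximates the behavior the reviewers confirmed, not the exact diff):

```python
MAX_MESSAGE_LENGTH = 10_000  # ~10KB guard against oversized payloads


def normalize_message(message: str) -> str:
    # Truncate first (DoS protection), then collapse whitespace and lowercase
    # so near-identical messages map to the same cache key.
    truncated = message[:MAX_MESSAGE_LENGTH]
    return " ".join(truncated.split()).lower()
```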
classification_router.py
- Added `asyncio.Semaphore(10)` to limit concurrent LLM API calls, preventing rate limit errors and cost explosions during traffic spikes
- Early return for messages exceeding `MAX_MESSAGE_LENGTH` with fallback classification

Impact
Optimization Summary
✅ Checklist
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.