feat(examples): add say-server - streaming Pocket TTS with karaoke highlighting #304
Merged
- Streaming TTS MCP App with real-time word highlighting
- React widget with pause/resume controls
- Debounced model context updates
- Self-contained server.py for standalone uv run
Co-authored-by: Claude <[email protected]>

- Pin Zod to v3.25.1 via esm.sh deps param (fixes .custom() error)
- Add external=react,react-dom to prevent duplicate React instances
- Update React from 19.1.0 to 19.2.0
- Import React for Babel's classic JSX transform
- Consolidate imports: only use @modelcontextprotocol/ext-apps/react
- Use Babel from unpkg.com instead of esm.sh
- Simplify resource decorator with meta= parameter
- Update CSP to allow both esm.sh and unpkg.com

- Add fullscreen toggle button (appears on hover when available)
- Remove max-height/scroll in inline mode for cleaner display
- Enable scrolling only in fullscreen mode
- Track displayMode and fullscreenAvailable state
- Handle onHostContextChanged to detect fullscreen support

…e audio streams
- Check initial host context for fullscreen availability (not just updates)
- Add audioOperationInProgressRef guard to prevent concurrent audio operations
- Prevents race conditions from rapid clicks/double-clicks

Simplified CSP metadata handling to match say-server's approach:
- Use meta parameter on @mcp.resource decorator directly
- Removed custom _read_resource_with_meta handler and handler replacement

- Remove onClick from text display (was conflicting with double-click)
- Always show overlay button (with pause icon when playing)
- Lower opacity (0.3) when playing, full on hover
- Button shows: ▶️ (play), ⏸️ (pause), 🔄 (restart)
- Double-click text to restart still works
- Eliminates click/dblclick race condition

- Play/restart buttons visible normally
- Pause button hidden until hover (doesn't block karaoke text)

- Add promise lock to initTTSQueue to prevent concurrent initialization
- Reset queueIdRef and lastTextRef in ontoolresult for next tool call
- Close existing AudioContext when starting new session
- Reset all playback state (chunks, timings, position) on new session
- Fixes: text deltas not sent when new tool call has shorter text

…ef reset
- Don't reset queueIdRef in ontoolresult - audio should keep playing
- Detect new sessions in ontoolinputpartial/ontoolinput by checking if new text starts with lastTextRef (continuation) or not (new session)
- Only reset and create new queue when genuinely new session detected
- Fixes: pause/play was creating new streams because queueIdRef was reset

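The prefix check described in this commit can be sketched roughly as follows. This is a minimal illustration in Python, not the widget's actual code (which lives in the React widget); `classify_update` is a hypothetical helper name, and `last_text` stands in for the value held in lastTextRef.

```python
from typing import Tuple


def classify_update(last_text: str, new_text: str) -> Tuple[bool, str]:
    """Return (is_new_session, text_to_send) for an incoming tool-input update.

    Hypothetical helper: if new_text extends last_text, treat it as a
    continuation and send only the delta; otherwise treat it as a new session
    and send the whole text.
    """
    if new_text.startswith(last_text):
        # Continuation: only the newly generated suffix needs to be queued.
        return False, new_text[len(last_text):]
    # The text diverged (for example a new, shorter tool call): start over.
    return True, new_text


print(classify_update("Hello", "Hello world"))
print(classify_update("Hello world", "Hi"))
```

Under this sketch, a shorter text from a new tool call is classified as a new session rather than producing an empty delta, which is the failure mode the previous commit mentions.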
Theming:
- Use CSS variables (--font-sans, --color-text-primary/secondary) with fallbacks
- Apply host fonts via applyHostFonts() on connect and context changes
- Apply host style variables via applyHostStyleVariables()
- Apply document theme via applyDocumentTheme()
Floating button:
- Button follows cursor position (below current line being spoken)
- Uses invisible cursor marker element for accurate positioning
- Shows ⏸️ when playing, ▶️ when paused, 🔄 when finished
- Low opacity when playing, full on hover
Keyboard shortcuts:
- Space: toggle play/pause
- Enter: toggle fullscreen (when available)
UI changes:
- Large overlay only shown for initial play (idle state)
- Floating button used during playback for less intrusive controls

- Remove floating cursor-following button (was too complex)
- Fixed play button at top-right corner
- Low opacity when playing, full on hover
- Keep theming and keyboard shortcuts (Space/Enter)
- Add say-server to E2E tests

Say-server requires the Pocket TTS model (~500MB) and GPU support, neither of which is available in the Docker CI environment. Skip it for now.
- Remove say-server from SKIP_SERVERS (widget works without TTS model)
- Add say-server masks for play buttons
- Add both say-server and qr-server to grid screenshot generator

… keyboard shortcuts
Reverted to working commit 413beca and carefully reapplied:
- CSS variables for theming (--font-sans, --color-text-primary/secondary)
- Dark mode support via @media (prefers-color-scheme)
- Host theming via applyHostStyleVariables/applyHostFonts/applyDocumentTheme
- Keyboard shortcuts: Space = play/pause, Enter = fullscreen
Preserved working behavior:
- Play overlay visible when not playing, hover to show when playing
- Pause on click works correctly
- Karaoke highlighting works

… version
Only CSS changes from working commit 413beca:
- Added CSS variables for theming foundation
- No JavaScript changes (reverted theming calls and keyboard shortcuts)
This isolates whether the issue is CSS or JS related.

- Add autoPlay parameter (default: true) with Pydantic Field description
- Description visible in MCP tool schema: 'browsers may block autoplay'
- Add Reset button next to Fullscreen button (bottom right)
- Both buttons share controlBtn base styles, shown on hover
- Widget reads autoPlay from tool input params

Python SDK refactored transport params from constructor to run():
- Remove port, stateless_http from FastMCP() constructor
- Pass stateless_http=True to streamable_http_app() instead

Logging added to:
- togglePlayPause (all branches)
- initTTSQueue (entry and AudioContext creation)
- restartPlayback (entry)
- ontoolinputpartial/ontoolinput (entry and new session detection)
Open the browser DevTools console and reproduce the issue; the logs will show what's being called.

… corners
- Top left: playPauseTopBtn for quick access
- Bottom right: playPauseBtn alongside reset and fullscreen
- Both show on hover with same styling as other control buttons
- Icons: ▶️ (play), ⏸️ (pause), 🔄 (restart when finished)

- Remove central play overlay, use single toolbar in top-right
- Fix audio duplication on pause/resume (only defer to pendingChunks for initial autoplay)
- Remove double-click to restart (use toolbar restart button)
- Sync displayMode when host changes it externally
- Click on text to play/pause

- Add list_voices tool to show available voices
- Add voice parameter to say tool (predefined names, HuggingFace URLs, or local paths)
- Predefined voices: alba, marius, javert, jean, cosette, eponine, azelma, fantine
- Update README with voice documentation

- Add English-only constraint prominently in first line and note
- List common trigger patterns (say, speak, read aloud, etc.)
- Update text parameter description to mention English
- Keep description concise for quick LLM scanning

When multiple TTS widgets exist, only one plays at a time:
- Widget announces to localStorage when starting playback
- Playing widgets poll localStorage every 200ms
- If another widget starts, previous one pauses automatically
- Lock is cleared on pause/finish/teardown
Uses _meta.widgetUUID in tool result for widget identification.

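As a rough illustration of this "announce, then poll" lock, here is a small Python simulation. It is a sketch under stated assumptions, not the widget's code: a plain dict stands in for the browser's localStorage, the `Widget` class and its method names are hypothetical, and the 200ms timer is replaced by explicit `check_lock()` calls. The key name `mcp-tts-playing` and the `{uuid, timestamp}` payload come from the PR description.

```python
import json
import time
from typing import Dict, Optional

LOCK_KEY = "mcp-tts-playing"  # key name taken from the PR description


class Widget:
    """Hypothetical stand-in for one TTS widget sharing a storage dict."""

    def __init__(self, uuid: str, storage: Dict[str, str]) -> None:
        self.uuid = uuid
        self.storage = storage  # assumption: dict simulating localStorage
        self.playing = False

    def play(self, now: Optional[float] = None) -> None:
        # Announce ownership: the last writer wins.
        ts = time.time() if now is None else now
        self.storage[LOCK_KEY] = json.dumps({"uuid": self.uuid, "timestamp": ts})
        self.playing = True

    def check_lock(self) -> None:
        # One polling tick (the PR describes polling every 200ms while playing).
        if not self.playing:
            return
        raw = self.storage.get(LOCK_KEY)
        owner = json.loads(raw)["uuid"] if raw else None
        if owner is not None and owner != self.uuid:
            # Another widget took over: pause without touching its lock.
            self.playing = False

    def pause(self) -> None:
        # Clear the lock only if this widget still owns it.
        self.playing = False
        raw = self.storage.get(LOCK_KEY)
        if raw and json.loads(raw)["uuid"] == self.uuid:
            del self.storage[LOCK_KEY]


storage: Dict[str, str] = {}
a, b = Widget("a", storage), Widget("b", storage)
a.play(now=1.0)
b.play(now=2.0)   # b overwrites the lock
a.check_lock()    # a notices on its next tick and pauses
print(a.playing, b.playing)
```

The design choice worth noting is that a paused widget never deletes a lock it does not own, so a late teardown cannot interrupt the widget that took over.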
- Add say-server to demo gallery with 300x300 grid-cell.png
- Fix qr-server grid-cell.png dimensions (was 672x668, now 300x300)
- Add golden snapshot for say-server E2E tests
- Add full screenshot and grid-cell thumbnail for say-server
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

- Add blank lines in README for better readability
- Add trailing newline to package.json
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

- Add onKeyDown handler to main element for Escape key
- Update README with full list of MCP App features:
  - Model context updates
  - Native theming
  - Fullscreen mode
  - Multi-widget speak lock
- Update server.py docstring with feature documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

The TTS queue could hang forever if:
- The LLM stops generating mid-stream (no end signal)
- User cancels before tool completes
- Network interruption prevents end_tts_queue call
Changes:
- Add last_activity timestamp to track queue activity
- Add 30-second timeout (QUEUE_TIMEOUT_SECONDS)
- Poll with 5-second timeout, checking for stale queues
- Mark stale queues as "error" so widget stops polling
- Return error message in poll_tts_audio response
- Add error logging in widget polling loop
This prevents tool calls from running forever when queues are abandoned.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

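The stale-queue check described in this commit might look roughly like the sketch below. Only `QUEUE_TIMEOUT_SECONDS`, `last_activity`, and the "error" status come from the commit message; the `TTSQueue` dataclass, `mark_if_stale`, `poll_response`, and the response shape are assumptions made for illustration, not the actual server.py code.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

QUEUE_TIMEOUT_SECONDS = 30.0  # name and value from the commit message


@dataclass
class TTSQueue:
    """Hypothetical queue record; the real server.py may differ."""
    status: str = "active"
    last_activity: float = field(default_factory=time.monotonic)
    error: Optional[str] = None


def mark_if_stale(queue: TTSQueue, now: Optional[float] = None) -> bool:
    """Flip an inactive queue to "error" so pollers can stop. Returns True if flipped."""
    now = time.monotonic() if now is None else now
    if queue.status == "active" and now - queue.last_activity > QUEUE_TIMEOUT_SECONDS:
        queue.status = "error"
        queue.error = f"Queue timed out after {QUEUE_TIMEOUT_SECONDS:.0f}s of inactivity"
        return True
    return False


def poll_response(queue: TTSQueue, now: Optional[float] = None) -> Dict[str, str]:
    """Assumed shape of a poll reply: status, plus an error message when stale."""
    mark_if_stale(queue, now)
    response = {"status": queue.status}
    if queue.error is not None:
        response["error"] = queue.error
    return response


q = TTSQueue(last_activity=100.0)
print(poll_response(q, now=120.0))  # still within the timeout
print(poll_response(q, now=131.0))  # more than 30s idle, reported as an error
```

Passing `now` explicitly keeps the check deterministic in tests; a real server would presumably refresh `last_activity` whenever text is appended or audio is produced.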
Add console logging to debug why end_tts_queue might not be called:
Widget-side:
- Log queueId in ontoolinputpartial, ontoolinput, ontoolresult
- Log when new session is detected (queue reset)
- Log when initTTSQueue fails
- Log when end_tts_queue is called/not called
Server-side:
- Log when end_tts_queue is called
- Log warnings for unknown queues
- Log info for already-ended queues
This will help identify if the issue is:
- User interruption (ontoolresult never called)
- Session reset (queueId becomes null)
- Host bug (callback dropped)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

Add 150ms delay before first poll to give TTS model time to generate the first audio chunk. This reduces unnecessary polling and server load.
Current backoff after that:
- 30ms if chunks received (keep streaming fast)
- 80ms if no chunks (wait longer for generation)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

Previous approach: Always wait 150ms before first poll
Problem: Even when chunks are ready, we wait unnecessarily
New approach: Adaptive polling
- Start polling immediately (no initial delay)
- 20ms backoff when receiving chunks (fast streaming)
- Exponential backoff when no chunks: 50ms → 100ms → 150ms max
- Reset backoff when chunks start flowing
This reduces latency when TTS is fast while being polite when it's slow.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

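The delay schedule in this commit can be expressed as a small pure function. The sketch below is in Python for illustration only (the polling loop itself runs in the widget); the constant and function names are hypothetical, and capped doubling is one assumed reading of "50ms → 100ms → 150ms max".

```python
from typing import List, Tuple

FAST_DELAY_MS = 20       # delay after a poll that returned chunks
BASE_IDLE_DELAY_MS = 50  # first delay after an empty poll
MAX_IDLE_DELAY_MS = 150  # cap for the idle backoff


def next_delay_ms(received_chunks: bool, empty_polls: int) -> Tuple[int, int]:
    """Return (delay_ms, updated_empty_polls) after one poll.

    empty_polls counts consecutive polls that returned no chunks.
    """
    if received_chunks:
        # Chunks are flowing: poll fast and reset the backoff.
        return FAST_DELAY_MS, 0
    # No chunks: double the base delay per empty poll, capped at the maximum.
    delay = min(BASE_IDLE_DELAY_MS * 2 ** empty_polls, MAX_IDLE_DELAY_MS)
    return delay, empty_polls + 1


def schedule(poll_results: List[bool]) -> List[int]:
    """Delays chosen for a sequence of poll outcomes (True = chunks received)."""
    delays, empty = [], 0
    for got_chunks in poll_results:
        delay, empty = next_delay_ms(got_chunks, empty)
        delays.append(delay)
    return delays


print(schedule([False, False, False, False, True, False]))
# -> [50, 100, 150, 150, 20, 50]
```

The first poll happens with no delay at all; the function only decides how long to wait before the next one.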
Uses an SVG data URI for a speaker-with-sound-waves icon.
- Expand License section in README with component attribution table
- Add voice collection license matrix with commercial use indicators
- Add warning about non-commercial expresso/ears voice collections
- Clarify NON-COMMERCIAL restriction in list_voices() output
- Use consistent CC-BY formatting (hyphenated)

…ext-apps into ochafik/say-server
- Replace emoji play/pause buttons with neutral white SVG icons
- Add (i) info button in bottom-right with attribution popup
- Use app.openLink() API for external links in popup
- Add dynamic padding to fit popup when shown
- Make create_tts_queue async to fix AnyIO event loop error
- Add better error logging for TTS queue initialization
- Update README with openLink feature documentation
- Remove unnecessary HuggingFace login prerequisites

antonpk1
previously approved these changes
Jan 19, 2026
Summary
Adds a new example MCP App: say-server - a real-time text-to-speech server with karaoke-style text highlighting, powered by Kyutai's Pocket TTS 🐐 🔥.
MCP App Features Demonstrated
- … (ontoolinputpartial): Widget receives streaming text as it's being generated
- … requestDisplayMode() API
- … (visibility: ["app"]): Private tools only accessible to the widget, not the model
- … (openLink): Attribution popup uses app.openLink() to open external URLs
- … (esm.sh) for in-browser transpilation

Multi-Widget Speak Lock
When multiple TTS widgets exist in the same browser (e.g., multiple chat messages each with their own say widget), they coordinate via localStorage to ensure only one plays at a time:
- … toolResult._meta.widgetUUID
- … {uuid, timestamp} to localStorage["mcp-tts-playing"]

This "last writer wins" protocol ensures clicking play on any widget immediately pauses others, without requiring cross-iframe postMessage coordination.

Features

Testing
- E2E tests added in tests/e2e/servers.spec.ts
- Tested locally via stdio transport

Host config:

    {
      "mcpServers": {
        "say": {
          "command": "uv",
          "args": [
            "run",
            "--index",
            "https://pypi.org/simple",
            "https://raw.githubusercontent.com/modelcontextprotocol/ext-apps/ochafik/say-server/examples/say-server/server.py",
            "--stdio"
          ]
        }
      }
    }

Less trusting host config (the Docker container can still scan your host's local ports, mind you: security is always relative):

    {
      "mcpServers": {
        "say": {
          "command": "docker",
          "args": [
            "run",
            "--quiet",
            "--rm",
            "-i",
            "ghcr.io/astral-sh/uv:debian",
            "uv",
            "run",
            "--index",
            "https://pypi.org/simple",
            "https://raw.githubusercontent.com/modelcontextprotocol/ext-apps/ochafik/say-server/examples/say-server/server.py",
            "--stdio"
          ]
        }
      }
    }

Dependencies
Uses Python with uv or docker for zero-config execution:
- mcp - MCP Python SDK
- pocket-tts - Kyutai's TTS library
- uvicorn, starlette - HTTP server

Third-party Licenses
All third-party code / model is hopefully properly attributed (see LICENSE comments in server.py and in the info bubble):