Merged
42 commits
ac5444a
Add say-server: streaming TTS MCP App with karaoke UI
ochafik Jan 16, 2026
31609f4
Update server.py
ochafik Jan 16, 2026
3f982c1
fix(say-server): fix React/Zod version conflicts in widget
ochafik Jan 16, 2026
50d8091
fix(say-server): show play button when paused, restart when finished
ochafik Jan 16, 2026
5ede733
feat(say-server): add fullscreen button, remove inline scroll
ochafik Jan 16, 2026
d19404e
fix(say-server): fix fullscreen button visibility and prevent multipl…
ochafik Jan 16, 2026
fe668e7
fix(qr-server): use meta parameter instead of manual handler patching
ochafik Jan 16, 2026
99cd1a5
fix(say-server): simplify controls to prevent multiple audio streams
ochafik Jan 16, 2026
0d16e28
fix(say-server): only show pause button on hover
ochafik Jan 16, 2026
b3334b2
fix(say-server): fix partial input handling and session management
ochafik Jan 16, 2026
413beca
fix(say-server): detect new sessions by text comparison, not queueIdR…
ochafik Jan 16, 2026
390bb7f
feat(say-server): add theming, floating button, and keyboard shortcuts
ochafik Jan 16, 2026
6cbe30d
docs(say-server): add TODO items for future improvements
ochafik Jan 16, 2026
8bb83d1
fix(say-server): simplify to fixed play button at top right
ochafik Jan 16, 2026
7f915b4
test(e2e): add say-server to tests but skip in CI
ochafik Jan 16, 2026
8bb7a98
test(e2e): add say-server and qr-server to screenshot generation
ochafik Jan 16, 2026
ddf41b9
fix(say-server): restore working play/pause overlay, keep theming and…
ochafik Jan 16, 2026
0c99943
fix(say-server): minimal CSS-only theming, no JS changes from working…
ochafik Jan 16, 2026
6167c9a
feat(say-server): add autoPlay param and Reset button
ochafik Jan 16, 2026
b3f9059
fix(say-server): update FastMCP API for new SDK version
ochafik Jan 16, 2026
a2573f4
debug(say-server): add console logs to trace audio stream issue
ochafik Jan 16, 2026
5bbf146
feat(say-server): add play/pause buttons in top-left and bottom-right…
ochafik Jan 16, 2026
b75e2b1
fix(say-server): simplify UI and fix pause/resume audio duplication
ochafik Jan 16, 2026
a8e5029
feat(say-server): add voice selection support
ochafik Jan 16, 2026
095df25
docs(say-server): improve tool description for better LLM triggering
ochafik Jan 16, 2026
c21da49
feat(say-server): add speak lock for multi-widget coordination
ochafik Jan 16, 2026
2db2946
update package.json of say
ochafik Jan 17, 2026
d50aaf0
feat(say-server): add screenshots and update README gallery
ochafik Jan 17, 2026
8be6749
style(say-server): apply prettier formatting
ochafik Jan 17, 2026
bf613fa
feat(say-server): add Escape key to exit fullscreen and update docs
ochafik Jan 17, 2026
90fe8f5
fix(say-server): add queue timeout to prevent infinite polling
ochafik Jan 17, 2026
5e6ab93
debug(say-server): add logging to track queue lifecycle
ochafik Jan 17, 2026
0064873
perf(say-server): add initial delay before polling
ochafik Jan 17, 2026
536391f
fix(say-server): use adaptive polling instead of fixed initial delay
ochafik Jan 17, 2026
11e3d9c
feat(say-server): add speaker icon to server info
ochafik Jan 17, 2026
e9cbcea
docs(say-server): add comprehensive third-party licensing information
ochafik Jan 17, 2026
b339a00
Update last activity to prevent timeout during active polling
ochafik Jan 19, 2026
e06afdc
Merge branch 'ochafik/say-server' of github.com:modelcontextprotocol/…
ochafik Jan 19, 2026
88f76d0
feat(say-server): add info button, SVG icons, and UI fixes
ochafik Jan 19, 2026
ffd6035
docs(say-server): document multi-widget speak lock mechanism
ochafik Jan 19, 2026
d0b62a1
Merge branch 'main' into ochafik/say-server
ochafik Jan 20, 2026
335ba34
prettier:fix
ochafik Jan 20, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,3 +11,5 @@ intermediate-findings/
# Playwright
playwright-report/
test-results/
__pycache__/
*.pyc
4 changes: 2 additions & 2 deletions README.md
@@ -67,8 +67,8 @@ Or edit your `package.json` manually:
| [**Scenario Modeler**](examples/scenario-modeler-server) | [**Budget Allocator**](examples/budget-allocator-server) | [**Customer Segmentation**](examples/customer-segmentation-server) |
| [![System Monitor](examples/system-monitor-server/grid-cell.png "Real-time OS metrics")](examples/system-monitor-server) | [![Transcript](examples/transcript-server/grid-cell.png "Live speech transcription")](examples/transcript-server) | [![Video Resource](examples/video-resource-server/grid-cell.png "Binary video via MCP resources")](examples/video-resource-server) |
| [**System Monitor**](examples/system-monitor-server) | [**Transcript**](examples/transcript-server) | [**Video Resource**](examples/video-resource-server) |
| [![PDF Server](examples/pdf-server/grid-cell.png "Interactive PDF viewer with chunked loading")](examples/pdf-server) | [![QR Code](examples/qr-server/grid-cell.png "QR code generator")](examples/qr-server) | |
| [**PDF Server**](examples/pdf-server) | [**QR Code (Python)**](examples/qr-server) | |
| [![PDF Server](examples/pdf-server/grid-cell.png "Interactive PDF viewer with chunked loading")](examples/pdf-server) | [![QR Code](examples/qr-server/grid-cell.png "QR code generator")](examples/qr-server) | [![Say Demo](examples/say-server/grid-cell.png "Text-to-speech demo")](examples/say-server) |
| [**PDF Server**](examples/pdf-server) | [**QR Code (Python)**](examples/qr-server) | [**Say Demo**](examples/say-server) |

### Starter Templates

2 changes: 2 additions & 0 deletions examples/say-server/.gitignore
@@ -0,0 +1,2 @@
node_modules/
dist/
175 changes: 175 additions & 0 deletions examples/say-server/README.md
@@ -0,0 +1,175 @@
# Say Server - Streaming TTS MCP App

A real-time text-to-speech MCP App with karaoke-style text highlighting, powered by [Kyutai's Pocket TTS](https://github.com/kyutai-labs/pocket-tts).

## MCP App Features Demonstrated

This example showcases several MCP App capabilities:

- **Single-file executable**: Python server with embedded React UI - no build step required
- **Partial tool inputs** (`ontoolinputpartial`): Widget receives streaming text as it's being generated
- **Queue-based streaming**: The widget streams text to the server and audio back through polling tools (it adds text to an input queue and retrieves audio chunks from an output queue)
- **Model context updates**: Widget updates the LLM with playback progress ("Playing: ...snippet...")
- **Native theming**: Uses CSS variables for automatic dark/light mode adaptation
- **Fullscreen mode**: Toggle fullscreen via `requestDisplayMode()` API, press Escape to exit
- **Multi-widget speak lock**: Coordinates multiple TTS widgets via localStorage so only one plays at a time
- **Hidden tools** (`visibility: ["app"]`): Private tools only accessible to the widget, not the model
- **External links** (`openLink`): Attribution popup uses `app.openLink()` to open external URLs
- **CSP metadata**: Resource declares required domains (`esm.sh`) for in-browser transpilation

## Features

- **Streaming TTS**: Audio starts playing as text is being generated
- **Karaoke highlighting**: Words are highlighted in sync with speech
- **Interactive controls**: Click to pause/resume, double-click to restart
- **Low latency**: Uses a polling-based queue for minimal delay
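
One way to picture karaoke highlighting is as a lookup from playback time to word index. The sketch below is illustrative only: it assumes each word takes time proportional to its character count, which may not be how the widget actually derives timings, and the function names `word_start_times` and `active_word_index` are hypothetical.

```python
from bisect import bisect_right


def word_start_times(text: str, duration_s: float) -> list[float]:
    """Estimate when each word starts, assuming time is proportional to characters."""
    words = text.split()
    total_chars = sum(len(w) for w in words)
    starts: list[float] = []
    elapsed_chars = 0
    for word in words:
        starts.append(duration_s * elapsed_chars / total_chars)
        elapsed_chars += len(word)
    return starts


def active_word_index(starts: list[float], t: float) -> int:
    """Return the index of the word being spoken at playback time t (seconds)."""
    # bisect_right counts how many words have already started at time t.
    return max(0, bisect_right(starts, t) - 1)


starts = word_start_times("Hello streaming world", duration_s=1.9)
for t in (0.0, 1.0, 1.5):
    print(t, active_word_index(starts, t))
```

A UI would call `active_word_index` on every animation frame with the current audio time and highlight the returned word.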

## Prerequisites

- [uv](https://docs.astral.sh/uv/getting-started/installation/) - fast Python package manager
- A CUDA GPU (recommended) or CPU with sufficient RAM (~2GB for model)

## Quick Start

The server is a single self-contained Python file that can be run directly with `uv`:

```bash
# Run directly (uv auto-installs dependencies)
uv run examples/say-server/server.py
```

The server will be available at `http://localhost:3109/mcp`.

## Running with Docker

Run directly from GitHub using the official `uv` Docker image. Mount your HuggingFace cache to avoid re-downloading the model:

```bash
docker run --rm -it \
  -p 3109:3109 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_HOME=/root/.cache/huggingface \
  ghcr.io/astral-sh/uv:debian \
  uv run https://raw.githubusercontent.com/modelcontextprotocol/ext-apps/main/examples/say-server/server.py
```

For GPU support, add `--gpus all` (requires [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)).

## Usage

### With Claude Desktop

Add to your Claude Desktop config (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "say": {
      "command": "uv",
      "args": ["run", "server.py", "--stdio"],
      "cwd": "/path/to/examples/say-server"
    }
  }
}
```

### With MCP Clients

Connect to `http://localhost:3109/mcp` and call the `say` tool:

```json
{
  "name": "say",
  "arguments": {
    "text": "Hello, world! This is a streaming TTS demo."
  }
}
```
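
On the wire, MCP clients wrap a tool call like this in a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message; `build_tool_call` is a hypothetical helper, and a real client would normally use an MCP SDK, which also performs the `initialize` handshake before any tool call.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


payload = build_tool_call(1, "say", {"text": "Hello, world!", "voice": "cosette"})
print(payload)
```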

## Available Voices

The default voice is `cosette`. Use the `list_voices` tool or pass a `voice` parameter to `say`:

### Predefined Voices

- `alba`, `marius`, `javert`, `jean` - from [alba-mackenna](https://huggingface.co/kyutai/tts-voices/tree/main/alba-mackenna) (CC BY 4.0)
- `cosette`, `eponine`, `azelma`, `fantine` - from [VCTK dataset](https://huggingface.co/kyutai/tts-voices/tree/main/vctk) (CC BY 4.0)

### Custom Voices

You can also use HuggingFace URLs or local file paths:

```json
{"text": "Hello!", "voice": "hf://kyutai/tts-voices/voice-donations/alice.wav"}
{"text": "Hello!", "voice": "/path/to/my-voice.wav"}
```

See the [kyutai/tts-voices](https://huggingface.co/kyutai/tts-voices) repository for more voice collections.
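
The three accepted forms of the `voice` value (predefined name, `hf://` URL, local path) can be told apart with a small check. This is a sketch, not the server's actual resolution logic; `classify_voice` and `PREDEFINED_VOICES` are hypothetical names, and the set simply mirrors the predefined voices listed above.

```python
PREDEFINED_VOICES = {
    "alba", "marius", "javert", "jean",
    "cosette", "eponine", "azelma", "fantine",
}


def classify_voice(voice: str) -> str:
    """Return 'predefined', 'huggingface', or 'local' for a voice value."""
    if voice in PREDEFINED_VOICES:
        return "predefined"
    if voice.startswith("hf://"):
        return "huggingface"
    # Anything else is assumed to be a path on the local filesystem.
    return "local"


for value in ("cosette", "hf://kyutai/tts-voices/voice-donations/alice.wav", "/path/to/my-voice.wav"):
    print(value, "->", classify_voice(value))
```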

## Architecture

The entire server is contained in a single `server.py` file:

1. **`say` tool**: Public tool that triggers the widget with text to speak
2. **Private tools** (`create_tts_queue`, `add_tts_text`, `poll_tts_audio`, etc.): Hidden from the model, only callable by the widget
3. **Embedded React widget**: Uses [Babel standalone](https://babeljs.io/docs/babel-standalone) for in-browser JSX transpilation - no build step needed
4. **TTS backend**: Manages per-request audio queues using Pocket TTS
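
The per-request queue idea behind the private tools can be sketched in plain Python. The tool names `create_tts_queue`, `add_tts_text`, and `poll_tts_audio` come from the list above, but their signatures and return shapes here are assumptions, `end_tts_queue` is a hypothetical name, and `fake_tts` is a placeholder for the real model.

```python
import queue
import threading
import time
import uuid

_queues: dict[str, dict] = {}


def fake_tts(text: str) -> bytes:
    """Placeholder for the TTS model: returns bytes instead of audio."""
    return text.encode("utf-8")


def _worker(state: dict) -> None:
    # Consume text until the None sentinel arrives; a real model would emit
    # many audio chunks per piece of text rather than exactly one.
    while (text := state["text_in"].get()) is not None:
        state["audio_out"].put(fake_tts(text))
    state["done"].set()


def create_tts_queue() -> str:
    queue_id = str(uuid.uuid4())
    state = {"text_in": queue.Queue(), "audio_out": queue.Queue(), "done": threading.Event()}
    _queues[queue_id] = state
    threading.Thread(target=_worker, args=(state,), daemon=True).start()
    return queue_id


def add_tts_text(queue_id: str, text: str) -> None:
    _queues[queue_id]["text_in"].put(text)


def end_tts_queue(queue_id: str) -> None:
    _queues[queue_id]["text_in"].put(None)


def poll_tts_audio(queue_id: str) -> dict:
    """Drain whatever audio is ready without blocking."""
    state = _queues[queue_id]
    # Read the flag before draining so chunks queued just before completion are not missed.
    finished = state["done"].is_set()
    chunks = []
    while True:
        try:
            chunks.append(state["audio_out"].get_nowait())
        except queue.Empty:
            break
    return {"chunks": chunks, "done": finished}


# Usage: feed text incrementally, then poll until the worker reports completion.
qid = create_tts_queue()
add_tts_text(qid, "Hello, ")
add_tts_text(qid, "world!")
end_tts_queue(qid)
received = []
while True:
    result = poll_tts_audio(qid)
    received.extend(result["chunks"])
    if result["done"]:
        break
    time.sleep(0.01)
print(b"".join(received))
```

Keeping generation on a background thread lets text arrive while earlier audio is still being produced, which is the property the polling design relies on.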

The widget communicates with the server via MCP tool calls:

- Receives streaming text via `ontoolinputpartial` callback
- Incrementally sends new text to the server as it arrives (via `add_tts_text`)
- Polls for generated audio chunks while TTS runs in parallel
- Plays audio via Web Audio API with synchronized text highlighting
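
The incremental step in the list above amounts to sending only the text that was not sent before. A minimal sketch, assuming each partial input is the full text so far; `new_suffix` is a hypothetical helper, and treating a non-matching prefix as a new session is an assumption based on the commit message about detecting new sessions by text comparison.

```python
def new_suffix(previous: str, current: str) -> tuple[str, bool]:
    """Return (text_to_send, is_new_session) for a partial-input update."""
    if current.startswith(previous):
        return current[len(previous):], False
    # The text no longer extends what was sent before, so treat it as a fresh session.
    return current, True


sent = ""
for partial in ["Hel", "Hello, wor", "Hello, world!"]:
    delta, restarted = new_suffix(sent, partial)
    print(repr(delta), restarted)
    sent = partial
```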

## Multi-Widget Speak Lock

When multiple TTS widgets exist in the same browser (e.g., multiple chat messages each with their own say widget), they coordinate via localStorage to ensure only one plays at a time:

1. **Unique Widget IDs**: Each widget receives a UUID via `toolResult._meta.widgetUUID`
2. **Announce on Play**: When starting, a widget writes `{uuid, timestamp}` to `localStorage["mcp-tts-playing"]`
3. **Poll for Conflicts**: Every 200ms, playing widgets check if another widget took the lock
4. **Yield Gracefully**: If another widget started playing, pause and yield
5. **Clean Up**: On pause/finish, clear the lock (only if owned)

This "last writer wins" protocol ensures a seamless experience: clicking play on any widget immediately pauses others, without requiring cross-iframe postMessage coordination.
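
The five steps can be simulated outside the browser. In this sketch a plain dict stands in for the shared `localStorage`, and the `Widget` class and its method names are hypothetical; the real widget runs in an iframe and polls on a timer.

```python
import json
import time

LOCK_KEY = "mcp-tts-playing"


class Widget:
    def __init__(self, uuid: str, storage: dict):
        self.uuid = uuid
        self.storage = storage  # stands in for the browser's shared localStorage
        self.playing = False

    def _owner(self):
        raw = self.storage.get(LOCK_KEY)
        return json.loads(raw)["uuid"] if raw else None

    def play(self) -> None:
        # Announce on play: overwrite the lock regardless of who held it.
        self.storage[LOCK_KEY] = json.dumps({"uuid": self.uuid, "timestamp": time.time()})
        self.playing = True

    def check_lock(self) -> None:
        # Called periodically (every 200 ms in the widget) while playing.
        owner = self._owner()
        if self.playing and owner is not None and owner != self.uuid:
            self.playing = False  # yield without touching the other widget's lock

    def pause(self) -> None:
        self.playing = False
        if self._owner() == self.uuid:
            del self.storage[LOCK_KEY]  # clean up only if owned


storage: dict = {}
a, b = Widget("a", storage), Widget("b", storage)
a.play()
b.play()        # last writer wins: b now owns the lock
a.check_lock()  # a notices on its next poll and yields
print(a.playing, b.playing)
```

Because a yielding widget never clears a lock it does not own, the newest player keeps the lock until it pauses or finishes.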

## TODO

- Persist caret position in localStorage (resume from where you left off)
- Click anywhere in text to move the cursor/playback position

## Credits

This project uses [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) by [Kyutai](https://kyutai.org/) - a fantastic open-source text-to-speech model. Thank you to the Kyutai team for making this technology available!

The server includes modified Pocket TTS code to support streaming text input (text can be fed incrementally while audio generation runs in parallel). A PR contributing this functionality back to the original repo is planned.

## License

This example is MIT licensed.

### Third-Party Licenses

This project uses the following open-source components:

| Component | License | Description |
| --------------------------------------------------------------------- | ----------------- | ---------------------------- |
| [pocket-tts](https://github.com/kyutai-labs/pocket-tts) | MIT | Python TTS library |
| [Kyutai TTS model](https://huggingface.co/kyutai/tts-0.75b-en-public) | CC-BY 4.0 | Text-to-speech model weights |
| [kyutai/tts-voices](https://huggingface.co/kyutai/tts-voices) | Mixed (see below) | Voice prompt files |

### Voice Collection Licenses

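The predefined voices in this example come from **CC-BY 4.0** licensed collections. The [kyutai/tts-voices](https://huggingface.co/kyutai/tts-voices) collections carry the following licenses: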

| Collection | License | Commercial Use |
| --------------- | ------------------- | ------------------------- |
| alba-mackenna | CC-BY 4.0 | ✅ Yes (with attribution) |
| vctk | CC-BY 4.0 | ✅ Yes (with attribution) |
| cml-tts/fr | CC-BY 4.0 | ✅ Yes (with attribution) |
| voice-donations | CC0 (Public Domain) | ✅ Yes |
| **expresso** | CC-BY-NC 4.0 | ❌ Non-commercial only |
| **ears** | CC-BY-NC 4.0 | ❌ Non-commercial only |

⚠️ **Note**: If you use voices from the `expresso/` or `ears/` collections, your use is restricted to non-commercial purposes.
Binary file added examples/say-server/grid-cell.png
14 changes: 14 additions & 0 deletions examples/say-server/mcp-app.html
@@ -0,0 +1,14 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>Say Widget</title>
<link rel="stylesheet" href="/src/global.css">
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/mcp-app.tsx"></script>
</body>
</html>
17 changes: 17 additions & 0 deletions examples/say-server/package.json
@@ -0,0 +1,17 @@
{
  "name": "@modelcontextprotocol/server-say",
  "version": "0.4.1",
  "private": true,
  "description": "Streaming TTS MCP App Server with karaoke-style text highlighting",
  "repository": {
    "type": "git",
    "url": "https://github.com/modelcontextprotocol/ext-apps",
    "directory": "examples/say-server"
  },
  "license": "MIT",
  "scripts": {
    "start": "uv run server.py",
    "dev": "uv run server.py",
    "build": "echo 'No build step needed for Python server'"
  }
}
Binary file added examples/say-server/screenshot.png