Problem
When using adk web with native-audio models (e.g., gemini-live-2.5-flash-native-audio), requesting TEXT modality causes an error. Native-audio models only support AUDIO modality.
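For context, a minimal sketch of the failing combination, assuming the RunConfig import path from the ADK source:

```python
from google.adk.agents.run_config import RunConfig

# Explicitly requesting TEXT, as adk web does when the modalities query
# parameter is set. Paired with gemini-live-2.5-flash-native-audio, the
# live connection fails because native-audio models only emit AUDIO.
run_config = RunConfig(response_modalities=["TEXT"])
```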
Root Cause
In src/google/adk/cli/adk_web_server.py, the /run_live WebSocket endpoint accepts modalities from query parameters without validating them against the model's capabilities:
```python
# Lines 1636-1638
modalities: List[Literal["TEXT", "AUDIO"]] = Query(
    default=["AUDIO"]
),  # Only allows "TEXT" or "AUDIO"

# Line 1655
run_config = RunConfig(response_modalities=modalities)
```

The code at runners.py:985-988 only sets a default when response_modalities is None, but doesn't validate an explicit TEXT request against native-audio models.
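A paraphrased sketch of that default handling, not the verbatim runners.py source:

```python
# Paraphrase of runners.py:985-988 as described above; the real code may
# differ in detail. A default is applied only when nothing was set:
if run_config.response_modalities is None:
  run_config.response_modalities = ["AUDIO"]
# An explicit ["TEXT"] flows through untouched, even when the target
# model is native-audio and cannot honor it.
```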
Proposed Solution
- Detect native-audio models by checking if the model name contains "native-audio"
- Force AUDIO modality for these models instead of returning an error
- Ensure output_audio_transcription is enabled so users can see text transcription of the audio response
Implementation
In adk_web_server.py, around lines 1653-1655:
```python
async def forward_events():
  runner = await self.get_runner_async(app_name)

  # Check whether the agent uses a native-audio model
  agent_or_app = self.agent_loader.load_agent(app_name)
  root_agent = self._get_root_agent(agent_or_app)
  model_name = root_agent.model if isinstance(root_agent.model, str) else ""

  # Native-audio models only support AUDIO modality
  if "native-audio" in model_name:
    effective_modalities = ["AUDIO"]
  else:
    effective_modalities = modalities

  run_config = RunConfig(response_modalities=effective_modalities)
```
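The third step of the proposed solution (keeping transcription on) could be handled in the same spot; a minimal sketch, assuming the RunConfig field name and the google.genai types module:

```python
from google.genai import types

# Defensive: make sure audio responses still yield text for the UI.
# output_audio_transcription is the RunConfig field; AudioTranscriptionConfig
# comes from google.genai.types.
if run_config.output_audio_transcription is None:
  run_config.output_audio_transcription = types.AudioTranscriptionConfig()
```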
Additional Issue: Transcription not displayed in UI
When using AUDIO modality, output_audio_transcription is enabled by default in RunConfig, and transcription events are created by TranscriptionManager. However, the frontend does not currently render outputTranscription.text; it only displays content.parts[].text.
This means users won't see any text when using native-audio models, even though transcription data is available in the events.
Options to address this:
- Update the frontend to display outputTranscription.text when present
- Convert transcription to content in the backend so the existing UI can display it (a sketch follows)
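A hypothetical sketch of the backend option, assuming the ADK event exposes output_transcription (as LlmResponse does) and using google.genai content types; the helper name is invented:

```python
from google.genai import types

def transcription_to_content(event):
  # If the event carries only transcription text, copy it into
  # content.parts so the existing UI rendering path picks it up.
  transcription = getattr(event, "output_transcription", None)
  if transcription and transcription.text and not event.content:
    event.content = types.Content(
        role="model",
        parts=[types.Part(text=transcription.text)],
    )
  return event
```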
Affected Files
- src/google/adk/cli/adk_web_server.py - WebSocket endpoint for live sessions
- src/google/adk/runners.py - Default modality handling
- src/google/adk/flows/llm_flows/transcription_manager.py - Transcription event creation