Skip to content

Conversation

@1234-ad
Copy link

@1234-ad 1234-ad commented Jan 26, 2026

Description

This PR adds a Raycast extension that allows users to control Cap screen recordings directly from Raycast using the existing deep link protocol.

Changes

Backend Updates (apps/desktop/src-tauri/src/deeplink_actions.rs)

  • Updated DeepLinkAction enum to include recording control actions:
    • StopRecording - Stop the current recording
    • PauseRecording - Pause the current recording
    • ResumeRecording - Resume a paused recording
    • TogglePauseRecording - Toggle between pause/resume states
    • SwitchCamera - Switch to a different camera
    • SwitchMicrophone - Switch to a different microphone
  • Implemented action handlers that call existing Tauri commands
  • All actions use the existing cap://action deep link protocol

Raycast Extension (extensions/raycast/)

Created a complete Raycast extension with the following commands:

  1. Start Recording (start-recording.tsx)

    • Interactive UI to choose capture mode (screen/window)
    • Select recording mode (Studio/Instant)
    • Option to enable system audio capture
  2. Stop Recording (stop-recording.tsx)

    • Instantly stops the current recording
    • No-view command for quick execution
  3. Pause Recording (pause-recording.tsx)

    • Pauses the current recording
    • No-view command for quick execution
  4. Resume Recording (resume-recording.tsx)

    • Resumes a paused recording
    • No-view command for quick execution
  5. Toggle Pause (toggle-pause.tsx)

    • Toggles between pause and resume states
    • No-view command for quick execution
  6. Switch Camera (switch-camera.tsx)

    • Interactive UI to select available cameras
    • Switches camera during recording
  7. Switch Microphone (switch-microphone.tsx)

    • Interactive UI to select available microphones
    • Switches microphone during recording

Supporting Files

  • utils.ts - TypeScript types and deep link execution helper
  • package.json - Extension metadata and dependencies
  • tsconfig.json - TypeScript configuration
  • README.md - Documentation for installation and usage

Technical Details

  • Uses Cap's existing cap://action deep link protocol
  • All backend functionality already exists in the Tauri commands
  • Extension communicates via URL schemes, no API changes needed
  • TypeScript-based with full type safety
  • Follows Raycast extension best practices

Testing

The extension can be tested by:

  1. Installing dependencies: cd extensions/raycast && npm install
  2. Running in dev mode: npm run dev
  3. Testing each command from Raycast

Benefits

  • Quick Access: Control recordings without switching to Cap UI
  • Keyboard-Driven: Leverage Raycast's keyboard-first workflow
  • Lightweight: Uses existing deep link protocol, no new APIs
  • Extensible: Easy to add more commands in the future

Future Enhancements

  • Fetch actual available cameras/microphones from Cap
  • Add recording status indicator
  • Support for opening recent recordings
  • Quick access to Cap settings

Related Issues

Closes #[issue-number] (if applicable)


Note: This extension requires Cap desktop application to be installed and running. It's designed for macOS users who use Raycast as their launcher.

Greptile Overview

Greptile Summary

Adds a Raycast extension for controlling Cap recordings via the existing cap://action deep link protocol, enabling keyboard-driven recording control without opening the Cap UI.

Major changes:

  • Extended DeepLinkAction enum in Rust backend with 6 new recording control actions (pause, resume, toggle pause, camera/mic switching)
  • Created 7 Raycast commands with TypeScript UI components and deep link execution logic
  • All backend handlers call existing Tauri commands (pause_recording, resume_recording, set_camera_input, set_mic_input)

Issues found:

  • Missing cap-icon.png file will cause extension installation to fail
  • Hardcoded placeholder values in start-recording, switch-camera, and switch-microphone commands will not match actual device names, causing runtime failures
  • The backend correctly validates screen/window names and will return errors like "No screen with name 'Default Screen'" when placeholders are used

Confidence Score: 2/5

  • This PR has critical issues that will prevent core functionality from working
  • The Rust backend implementation is solid and integrates cleanly with existing commands, but the Raycast extension has multiple blocking issues: missing icon file will prevent installation/publication, and hardcoded placeholder device names in start-recording, switch-camera, and switch-microphone will cause these commands to fail at runtime with validation errors from the backend
  • Pay close attention to extensions/raycast/package.json (missing icon), extensions/raycast/src/start-recording.tsx, extensions/raycast/src/switch-camera.tsx, and extensions/raycast/src/switch-microphone.tsx (hardcoded placeholders)

Important Files Changed

Filename Overview
apps/desktop/src-tauri/src/deeplink_actions.rs Added new recording control actions (pause, resume, toggle, camera/mic switching) to deep link enum with proper handler implementations
extensions/raycast/package.json Defines Raycast extension metadata with 7 commands; missing icon file referenced as cap-icon.png
extensions/raycast/src/start-recording.tsx UI for starting recordings with capture/mode selection; uses placeholder screen/window names instead of actual values
extensions/raycast/src/switch-camera.tsx Camera selection UI with hardcoded placeholder camera list instead of fetching actual devices
extensions/raycast/src/switch-microphone.tsx Microphone selection UI with hardcoded placeholder microphone list instead of fetching actual devices

Sequence Diagram

sequenceDiagram
    participant User
    participant Raycast
    participant Extension
    participant DeepLink
    participant CapApp
    participant Recording

    User->>Raycast: Trigger command
    Raycast->>Extension: Execute command
    Extension->>Extension: Build action JSON
    Extension->>DeepLink: Open cap://action?value={encoded_json}
    DeepLink->>CapApp: Parse deep link URL
    CapApp->>CapApp: Deserialize action enum
    
    alt Start Recording
        CapApp->>CapApp: Set camera/mic inputs
        CapApp->>CapApp: Resolve screen/window name
        CapApp->>Recording: start_recording()
    else Stop/Pause/Resume
        CapApp->>Recording: stop/pause/resume_recording()
    else Switch Camera/Mic
        CapApp->>CapApp: set_camera_input/set_mic_input()
    end
    
    CapApp-->>User: Action executed
Loading

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

"name": "cap",
"title": "Cap",
"description": "Control Cap screen recording from Raycast",
"icon": "cap-icon.png",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Icon file cap-icon.png is referenced here but isn’t included in this PR; Raycast will fail to load the extension icon unless it exists. Consider adding it or switching to a built-in icon.

mic_label: string;
}

export type DeepLinkAction =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit enum variants in the Rust DeepLinkAction deserialize from a JSON string (e.g. "stop_recording"), not an object like { stop_recording: {} }. JSON.stringify() will do the right thing if you pass a string here.

Suggested change
export type DeepLinkAction =
export type DeepLinkAction =
| { start_recording: StartRecordingOptions }
| "stop_recording"
| "pause_recording"
| "resume_recording"
| "toggle_pause_recording"
| { switch_camera: SwitchCameraOptions }
| { switch_microphone: SwitchMicrophoneOptions };

title: "Stopping recording...",
});

await executeCapAction({ stop_recording: {} });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the enum serialization, this should be a JSON string action.

Suggested change
await executeCapAction({ stop_recording: {} });
await executeCapAction("stop_recording");

title: "Pausing recording...",
});

await executeCapAction({ pause_recording: {} });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the enum serialization, this should be a JSON string action.

Suggested change
await executeCapAction({ pause_recording: {} });
await executeCapAction("pause_recording");

title: "Resuming recording...",
});

await executeCapAction({ resume_recording: {} });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the enum serialization, this should be a JSON string action.

Suggested change
await executeCapAction({ resume_recording: {} });
await executeCapAction("resume_recording");

title: "Toggling pause...",
});

await executeCapAction({ toggle_pause_recording: {} });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the enum serialization, this should be a JSON string action.

Suggested change
await executeCapAction({ toggle_pause_recording: {} });
await executeCapAction("toggle_pause_recording");

title: "Starting recording...",
});

// For simplicity, using default screen/window
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this repo avoids code comments; you can drop these without losing clarity.

Suggested change
// For simplicity, using default screen/window
const capture_mode =

// In a real implementation, you'd want to list available screens/windows
const capture_mode =
captureType === "screen"
? { screen: "Default Screen" }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capture_mode uses a display/window name and the backend matches it exactly. Default Screen / Default Window won’t exist, so this command will consistently fail unless you wire it up to real screen/window names (or adjust the deep link/backend to accept a default/ID).


await executeCapAction({
switch_camera: {
camera: cameraId,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend expects DeviceOrModelID for camera (serde enum like { "DeviceID": "..." } or { "ModelID": "vid:pid" }), so a raw string here won’t deserialize. Probably want to update the payload/type accordingly (even if you still keep placeholder IDs for now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant