@sirtimid (Contributor) commented Jan 29, 2026

Closes #689

Summary

Implements a handshake protocol for kernel incarnation detection. When kernels connect, they exchange incarnation IDs to detect when a peer has lost its state (e.g., storage cleared) but retained the same peer ID.

Key Changes

Incarnation ID Management

  • Incarnation ID is persisted to storage via kernelStore.getOrCreateIncarnationId()
  • Survives kernel restarts but regenerates when storage is cleared (actual state loss)
  • Passed through RemoteManager → initRemoteComms → initTransport
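
A minimal sketch of the get-or-create behaviour listed above. The `KeyValueStore` interface, the `incarnationId` key name and the injected ID generator are assumptions made for illustration; the actual `kernelStore` API may differ.

```typescript
// Hypothetical minimal key-value interface; not the real KernelStore type.
type KeyValueStore = {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
};

// Assumed key name, chosen only for this sketch.
const INCARNATION_KEY = 'incarnationId';

// Returns the persisted ID if one exists, otherwise generates and stores one.
function getOrCreateIncarnationId(
  kv: KeyValueStore,
  generateId: () => string,
): string {
  const existing = kv.get(INCARNATION_KEY);
  if (existing !== undefined) {
    return existing; // survives a restart because storage is intact
  }
  const fresh = generateId();
  kv.set(INCARNATION_KEY, fresh);
  return fresh;
}

// In-memory stand-in for persistent storage, with a clear() to model state loss.
function makeMemoryStore(): KeyValueStore & { clear(): void } {
  const map = new Map<string, string>();
  return {
    get: (key) => map.get(key),
    set: (key, value) => {
      map.set(key, value);
    },
    clear: () => map.clear(),
  };
}
```

Calling the function twice against the same store returns the same ID; calling it again after `clear()` yields a new one, which is the signal a peer uses to detect state loss.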

Handshake Protocol (handshake.ts)

  • New module handles connection-level handshake exchange
  • Outbound connections: send handshake message, wait for handshakeAck
  • Inbound connections: wait for handshake, reply with handshakeAck
  • 10-second timeout for handshake completion
  • Type guard validates message structure including params.incarnationId
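
The exchange described above might look roughly like the following. This is a sketch under stated assumptions: the `Channel` type, the `encode`/`decode` helpers and `makeChannelPair` are hypothetical, only the function names `performOutboundHandshake` and `performInboundHandshake` come from this PR, and the 10-second timeout is omitted for brevity.

```typescript
type HandshakeMethod = 'handshake' | 'handshakeAck';

// Hypothetical string-message channel; the real transport channel differs.
type Channel = {
  write(message: string): Promise<void>;
  read(): Promise<string>;
};

function encode(method: HandshakeMethod, incarnationId: string): string {
  return JSON.stringify({ method, params: { incarnationId } });
}

// Parses a message and returns the peer's incarnation ID, or throws.
function decode(raw: string, expected: HandshakeMethod): string {
  const parsed = JSON.parse(raw) as {
    method?: unknown;
    params?: { incarnationId?: unknown } | null;
  } | null;
  const incarnationId = parsed?.params?.incarnationId;
  if (parsed?.method !== expected || typeof incarnationId !== 'string') {
    throw new Error(`expected a valid ${expected} message`);
  }
  return incarnationId;
}

// Outbound side: send handshake, then wait for handshakeAck.
async function performOutboundHandshake(
  channel: Channel,
  localIncarnationId: string,
): Promise<string> {
  await channel.write(encode('handshake', localIncarnationId));
  return decode(await channel.read(), 'handshakeAck');
}

// Inbound side: wait for handshake, then reply with handshakeAck.
async function performInboundHandshake(
  channel: Channel,
  localIncarnationId: string,
): Promise<string> {
  const remoteIncarnationId = decode(await channel.read(), 'handshake');
  await channel.write(encode('handshakeAck', localIncarnationId));
  return remoteIncarnationId;
}

// In-memory duplex pair, used only to exercise the sketch.
function makeChannelPair(): [Channel, Channel] {
  const makeQueue = () => {
    const items: string[] = [];
    const waiters: ((value: string) => void)[] = [];
    return {
      push(item: string): void {
        const waiter = waiters.shift();
        if (waiter) {
          waiter(item);
        } else {
          items.push(item);
        }
      },
      pop(): Promise<string> {
        const item = items.shift();
        if (item !== undefined) {
          return Promise.resolve(item);
        }
        return new Promise<string>((resolve) => {
          waiters.push(resolve);
        });
      },
    };
  };
  const aToB = makeQueue();
  const bToA = makeQueue();
  return [
    { write: async (m) => aToB.push(m), read: () => bToA.pop() },
    { write: async (m) => bToA.push(m), read: () => aToB.pop() },
  ];
}
```

Each side resolves with the other side's incarnation ID, which the caller can then hand to whatever tracks per-peer state.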

Incarnation Change Detection

  • PeerStateManager tracks remote incarnation IDs per peer
  • setRemoteIncarnation() returns true if incarnation changed from a known previous value
  • On change detection, onRemoteGiveUp callback triggers promise rejection for pending work
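
To illustrate the return-value rule for `setRemoteIncarnation()`, here is a minimal sketch. The `IncarnationTracker` class is a hypothetical stand-in; the real `PeerStateManager` stores this alongside other peer state.

```typescript
// Hypothetical stand-in for the incarnation-tracking part of PeerStateManager.
class IncarnationTracker {
  private readonly remoteIncarnations = new Map<string, string>();

  // Records the peer's incarnation ID. Returns true only when a previously
  // known ID is replaced by a different one; a first sighting returns false.
  setRemoteIncarnation(peerId: string, incarnationId: string): boolean {
    const previous = this.remoteIncarnations.get(peerId);
    this.remoteIncarnations.set(peerId, incarnationId);
    return previous !== undefined && previous !== incarnationId;
  }
}
```

Returning `false` on first contact matters: a peer seen for the first time has no pending work to reject, so only a change from a known value should trigger `onRemoteGiveUp`.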

Promise Rejection Flow

  • onRemoteGiveUp(peerId) → RemoteManager.#handleRemoteGiveUp()
  • Rejects all kernel promises where the remote is the decider
  • Calls RemoteHandle.rejectPendingRedemptions() for pending URL redemptions
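
The rejection flow above can be sketched as follows. The `KernelPromise` and `GiveUpDeps` shapes are hypothetical and exist only for this illustration; the real `RemoteManager` works against the kernel store and `RemoteHandle`.

```typescript
// Hypothetical promise record; the kernel's real promise table differs.
type KernelPromise = {
  id: string;
  decider: string; // ID of the party expected to resolve the promise
  state: 'unresolved' | 'rejected';
  rejection?: string;
};

type GiveUpDeps = {
  promises: KernelPromise[];
  // Stand-in for RemoteHandle.rejectPendingRedemptions()
  rejectPendingRedemptions: (peerId: string, reason: string) => void;
};

// Rejects every unresolved promise decided by the given peer, then rejects
// that peer's pending URL redemptions. Returns the IDs that were rejected.
function handleRemoteGiveUp(peerId: string, deps: GiveUpDeps): string[] {
  const reason = `remote ${peerId} gave up`;
  const rejected: string[] = [];
  for (const promise of deps.promises) {
    if (promise.state === 'unresolved' && promise.decider === peerId) {
      promise.state = 'rejected';
      promise.rejection = reason;
      rejected.push(promise.id);
    }
  }
  deps.rejectPendingRedemptions(peerId, reason);
  return rejected;
}
```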

Files Changed

| File | Change |
| --- | --- |
| handshake.ts | New handshake protocol implementation |
| transport.ts | Integration of handshake into connection flow |
| peer-state-manager.ts | Incarnation tracking methods |
| reconnection-lifecycle.ts | Handshake during reconnection |
| store/index.ts | getOrCreateIncarnationId() method |
| Kernel.ts | Incarnation ID initialization and reset handling |
| RemoteManager.ts | Pass incarnation ID to transport |
| RemoteHandle.ts | rejectPendingRedemptions() method |

Testing

  • Unit tests for handshake protocol, type guards, and incarnation tracking
  • E2e test: detects incarnation change when peer restarts with fresh state
  • Previously disabled transport tests now implemented

🤖 Generated with Claude Code


Note

Medium Risk
Introduces a new connection-level handshake and propagates a new incarnationId parameter through the remote-comms stack; mistakes could break connectivity or cause unexpected remote give-ups/retries.

Overview
Adds kernel incarnation detection to remote comms by generating/persisting an incarnationId in KernelStore and threading it through RemoteManager → initRemoteComms/RPC → PlatformServices → initTransport.

transport now performs an explicit handshake (handshake/handshakeAck, with timeout) on both outbound connects and inbound accepts (including during reconnection), tracks remoteIncarnationId per peer, and triggers onRemoteGiveUp when a known peer reconnects with a different incarnation.

Updates reset semantics to optionally clear identity (including incarnationId), extends RPC validation for initializeRemoteComms, and adds/adjusts unit + e2e coverage for handshake behavior, reconnection ordering, and incarnation-change failure handling.

Written by Cursor Bugbot for commit 516cd65. This will update automatically on new commits.

github-actions bot (Contributor) commented Jan 29, 2026

Coverage Report

| Status | Category | Percentage | Change | Covered / Total |
| --- | --- | --- | --- | --- |
| 🔵 | Lines | 88.49% | ⬆️ +0.08% | 5985 / 6763 |
| 🔵 | Statements | 88.38% | ⬆️ +0.08% | 6082 / 6881 |
| 🔵 | Functions | 87.06% | ⬇️ -0.06% | 1541 / 1770 |
| 🔵 | Branches | 84.66% | ⬆️ +0.07% | 2176 / 2570 |

File Coverage

Changed Files

| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
| --- | --- | --- | --- | --- | --- |
| packages/kernel-browser-runtime/src/PlatformServicesClient.ts | 96.66% (🟰 ±0%) | 81.48% (⬇️ -2.52%) | 89.47% (🟰 ±0%) | 96.66% (🟰 ±0%) | 106, 128 |
| packages/kernel-browser-runtime/src/PlatformServicesServer.ts | 95.23% (🟰 ±0%) | 88.88% (🟰 ±0%) | 80.95% (🟰 ±0%) | 95.23% (🟰 ±0%) | 140, 163, 194, 404 |
| packages/ocap-kernel/src/Kernel.ts | 93.5% (⬆️ +2.37%) | 85% (⬆️ +1.67%) | 92.3% (🟰 ±0%) | 93.5% (⬆️ +2.37%) | 116, 225-228, 410, 480, 524 |
| packages/ocap-kernel/src/types.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts | 98.3% (⬆️ +0.03%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 98.3% (⬆️ +0.03%) | 133 |
| packages/ocap-kernel/src/remotes/kernel/remote-comms.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/remotes/platform/handshake.ts | 96.42% | 90.9% | 75% | 96.42% | 76, 82 |
| packages/ocap-kernel/src/remotes/platform/peer-state-manager.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/remotes/platform/reconnection-lifecycle.ts | 89.18% (⬆️ +0.61%) | 88.57% (⬆️ +0.70%) | 80% (🟰 ±0%) | 89.18% (⬆️ +0.61%) | 111-115, 135-136, 241-242 |
| packages/ocap-kernel/src/remotes/platform/transport.ts | 85.37% (⬇️ -1.22%) | 81.92% (⬆️ +0.39%) | 75.86% (⬆️ +0.86%) | 85.37% (⬇️ -1.22%) | 110, 146-149, 153-154, 191-200, 233, 267-285, 309, 395, 455, 476-477, 518, 548, 567, 576 |
| packages/ocap-kernel/src/rpc/platform-services/initializeRemoteComms.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/store/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |

Generated in workflow #3474 for commit 516cd65 by the Vitest Coverage Report Action

@sirtimid sirtimid marked this pull request as ready for review January 29, 2026 13:58
@sirtimid sirtimid requested a review from a team as a code owner January 29, 2026 13:58
sirtimid and others added 19 commits January 29, 2026 19:51
Add Handshake and HandshakeAck message types to RemoteMessageBase
for kernel incarnation detection protocol. These types will be used
to exchange incarnation IDs during connection establishment.

Part of #689

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Generate unique incarnation ID (UUID) in Kernel constructor
- Store incarnation ID in memory only (not persisted)
- Pass incarnation ID through RemoteManager to remote-comms
- Update PlatformServices type and RPC handlers to accept incarnationId

The incarnation ID is used to detect when a peer has lost its state
and reconnected with the same peer ID.

Part of #689

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add remoteIncarnationId field to PeerState type
- Add setRemoteIncarnation() method that returns true on incarnation change
- Add getRemoteIncarnation() method for retrieving stored incarnation
- Add unit tests for incarnation tracking
- Update existing tests to include remoteIncarnationId field

Part of #689

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add handshake message types and detection logic
- Send handshake on outbound connections when incarnationId is provided
- Handle incoming handshake and reply with handshakeAck
- Handle incoming handshakeAck messages
- Track remote incarnation IDs in PeerStateManager
- Trigger onRemoteGiveUp callback when remote incarnation changes
- Update reconnection-lifecycle to pass isOutbound flag for handshake
- Add comprehensive unit tests for handshake protocol

The handshake protocol allows kernels to detect when a peer has lost
its state (incarnation changed) and trigger promise rejection for
any pending work with that peer.

Part of #689

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add e2e tests for the incarnation detection feature:
- Test that handshake is exchanged on connection establishment
- Test that incarnation change is detected when peer restarts with fresh state
- Mark test for message delivery after restart as todo (needs investigation)

Part of #689

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace crypto.randomUUID() with a manual UUID v4 generation using
crypto.getRandomValues(), which is supported in older browsers:
- randomUUID(): Chrome 92+, Firefox 95+, Safari 15.4+
- getRandomValues(): Chrome 11+, Firefox 21+, Safari 6.1+

Co-Authored-By: Claude Opus 4.5 <[email protected]>
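
For reference, a manual UUID v4 built on `crypto.getRandomValues()` can be sketched as below. The function names `formatUuidV4` and `generateUuidV4` are hypothetical, and this is not necessarily how the commit implements it; the bit manipulation follows the standard version-4 layout (version nibble `4`, variant bits `10`).

```typescript
// Formats 16 bytes as a UUID string, forcing the version and variant bits.
function formatUuidV4(bytes: Uint8Array): string {
  if (bytes.length !== 16) {
    throw new Error('a UUID needs exactly 16 bytes');
  }
  const adjusted = Array.from(bytes, (byte, index) => {
    if (index === 6) {
      return (byte & 0x0f) | 0x40; // high nibble of byte 6 = version 4
    }
    if (index === 8) {
      return (byte & 0x3f) | 0x80; // top two bits of byte 8 = variant 10
    }
    return byte;
  });
  const hex = adjusted
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

// Draws 16 random bytes from the Web Crypto API and formats them.
function generateUuidV4(): string {
  return formatUuidV4(globalThis.crypto.getRandomValues(new Uint8Array(16)));
}
```

Splitting formatting from randomness keeps the bit-twiddling testable with fixed input bytes.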
- Move handshake to connection level in new handshake.ts module
- Persist incarnation ID to store via getOrCreateIncarnationId()
- Remove incarnationId parameter from RemoteManager (uses store)
- Consolidate identity reset logic in Kernel.resetKernelState()
- Remove Handshake/HandshakeAck from RemoteMessageBase (handled at transport)

The incarnation ID now persists across kernel restarts but resets when
storage is cleared, correctly detecting actual state loss vs. restarts.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove reconnectionHolder indirection pattern in transport.ts
  Use definite assignment assertion instead of holder object
- Remove duplicate writeWithTimeout from handshake.ts, import from channel-utils
- Fix PlatformServicesServer to accept incarnationId parameter
- Remove redundant e2e tests for handshake exchange and todo test

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The isHandshakeMessage type guard was only checking for the method
property but not validating that params.incarnationId exists and is
a string. This could cause runtime errors when a malformed message
with the correct method but missing params or incarnationId was
received.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
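
A type guard with the stricter validation described in this commit might look like the sketch below. The `HandshakeMessage` type and the decision to accept both `handshake` and `handshakeAck` in one guard are assumptions; only the name `isHandshakeMessage` and the `params.incarnationId` check come from the PR.

```typescript
// Assumed message shape for illustration.
type HandshakeMessage = {
  method: 'handshake' | 'handshakeAck';
  params: { incarnationId: string };
};

// Validates the method AND that params.incarnationId is a string.
function isHandshakeMessage(value: unknown): value is HandshakeMessage {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as { method?: unknown; params?: unknown };
  if (candidate.method !== 'handshake' && candidate.method !== 'handshakeAck') {
    return false;
  }
  if (typeof candidate.params !== 'object' || candidate.params === null) {
    return false;
  }
  const params = candidate.params as { incarnationId?: unknown };
  return typeof params.incarnationId === 'string';
}
```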
The previous implementation used AbortSignal.timeout() which creates
an internal timer that continues running even after the read succeeds.
This could cause minor memory leaks as the timer and event listener
weren't properly cleaned up.

Now using AbortController with a manual setTimeout/clearTimeout to
ensure the timer is always cleaned up in the finally block.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
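
The cleanup pattern described in this commit can be sketched as follows. `withTimeout` is a hypothetical helper name and the error text is invented for the example; the point is that both the timer and the abort listener are released in `finally`, whichever side of the race wins.

```typescript
// Races an operation against a manually managed timeout.
async function withTimeout<T>(
  operation: Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  // Capture reject so the abort handler can be a named, removable function.
  let rejectOnAbort: (reason: Error) => void = () => undefined;
  const aborted = new Promise<never>((_resolve, reject) => {
    rejectOnAbort = reject;
  });
  const onAbort = (): void => {
    rejectOnAbort(new Error(`timed out after ${timeoutMs}ms`));
  };
  controller.signal.addEventListener('abort', onAbort);

  try {
    return await Promise.race([operation, aborted]);
  } finally {
    // Runs on success, failure and timeout alike.
    clearTimeout(timer);
    controller.signal.removeEventListener('abort', onAbort);
  }
}
```

With `AbortSignal.timeout()` the timer is owned by the signal and cannot be cancelled by the caller, which is why a manual `setTimeout`/`clearTimeout` pair is used here.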
Replace the definite assignment assertion (!) with a safer holder
pattern that throws a clear error if handleConnectionLoss is called
before initialization. This provides runtime safety if code is
refactored incorrectly in the future, while keeping TypeScript happy.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
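
A holder of the kind this commit describes could be sketched like this. The `makeHandlerHolder` name and the handler signature are assumptions for illustration, not the code in transport.ts.

```typescript
type ConnectionLossHandler = (peerId: string) => void;

type HandlerHolder = {
  set: (handler: ConnectionLossHandler) => void;
  call: ConnectionLossHandler;
};

// Holds a late-bound handler and fails loudly if it is used too early,
// instead of relying on a definite assignment assertion (!).
function makeHandlerHolder(): HandlerHolder {
  let handler: ConnectionLossHandler | undefined;
  return {
    set: (value) => {
      handler = value;
    },
    call: (peerId) => {
      if (handler === undefined) {
        throw new Error('handleConnectionLoss called before initialization');
      }
      handler(peerId);
    },
  };
}
```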
- Remove unnecessary delay(3000) in incarnation detection test since
  await will naturally wait for the promise to settle
- Add cleanup for the fresh database file created during the test
  to avoid accumulating test artifacts on disk

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implement the 5 tests that were marked as .todo() due to async
handshake flow complexity:

- times out after 10 seconds when write hangs
- handles timeout errors and triggers connection loss handling
- error message includes correct timeout duration
- handles multiple concurrent writes with timeout
- reuses existing channel when inbound connection arrives during
  reconnection dial

All tests now use proper mocking with makeAbortSignalMock to
simulate timeout behavior in a controllable way.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Extract the abort handler to a named function so it can be removed
in the finally block, preventing a minor memory leak where the
event listener would remain registered after the read completes.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
RemoteManager independently gets the incarnation ID from kernelStore,
making the Kernel field redundant.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The HandshakeDeps interface defined getRemoteIncarnation but neither
performOutboundHandshake nor performInboundHandshake used it.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ient

The PlatformServicesClient.initializeRemoteComms method was missing the
incarnationId parameter, causing the handshake feature to silently fail
in the browser runtime.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@sirtimid sirtimid force-pushed the sirtimid/issue-689-incarnation-detection branch from 62c07ec to 606f03f Compare January 29, 2026 18:52
sirtimid and others added 2 commits January 29, 2026 21:01
…ages

If JSON.parse returns null, accessing parsed.method throws a TypeError
before the ?? 'unknown' fallback can evaluate. Use optional chaining to
safely handle null values.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
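
The failure mode and fix described in this commit can be reproduced in a few lines. `describeMethod` is a hypothetical helper name used only for this sketch.

```typescript
// JSON.parse('null') returns null, so parsed.method would throw a TypeError
// before any ?? fallback runs. Optional chaining short-circuits instead.
function describeMethod(raw: string): string {
  const parsed = JSON.parse(raw) as { method?: string } | null;
  return parsed?.method ?? 'unknown';
}
```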
Merged connection limit try-catch handling from main with handshake
logic from this branch. Also integrated closeRejectedChannel helper
for cleaner inbound connection handling.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

- Remove unused getRemoteIncarnation method from PeerStateManager
- Change Logger import to type-only import in handshake.ts
- Update tests to use getState() instead of removed method

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Development

Successfully merging this pull request may close these issues.

Remote comms: Kernel incarnation detection

2 participants