input-capture

Use when you need to intercept keyboard input from the human temporarily. Examples: "ask the user for approval before running a command", "build a selection menu in the terminal", "capture text input from the user".

IMPORTANT: EXECUTION CONTEXT

This skill describes what to do: domain patterns and decision-making. It does NOT describe how to call the API.

  1. If you have wsh_* tools (check your toolkit for wsh_send_input, wsh_get_screen, etc.): use them directly. Operation names in this skill generally map to tool names (e.g., "send input" → wsh_send_input). When in doubt, list your available wsh_* tools.
  2. If you do NOT have wsh_* tools: you are in HTTP/curl fallback mode. DO NOT GUESS endpoints or CLI subcommands. Load the full API reference first: search your workspace for skills/core/ and read SKILL.md. It contains every endpoint with working curl examples and a bootstrap sequence.
  3. Quick bootstrap: curl -sf --unix-socket ${XDG_RUNTIME_DIR:-/tmp}/wsh/default.http.sock http://localhost/health — if that fails: wsh server -L agent-$$ --ephemeral & and retry.

wsh:input-capture — Intercepting Keyboard Input

Input capture lets you temporarily take over the keyboard. While active, keystrokes from the human go to you instead of the shell. The terminal is frozen — nothing the human types reaches the PTY. You decide what to do with each keystroke.

The Mechanism

capture input       # grab the keyboard
# Keystrokes now go to subscribers, not the PTY
# Do your thing — build a menu, ask a question, etc.
release input       # give it back

The human can press Ctrl+\ at any time to toggle input capture. If input is in passthrough mode, Ctrl+\ enters capture. If input is already captured, Ctrl+\ releases it. This is a physical toggle — never disable it, never tell the human to avoid it. It's their escape hatch.

When the human presses Ctrl+\ to enter capture mode manually, this is a signal that they want to interact with you. If you don't already have overlays or panels visible, this is your cue to create UI elements and engage. Subscribe to mode change events to detect this.

Reading Captured Input

Captured keystrokes arrive via WebSocket event subscription (see the core skill for connection mechanics). Subscribe to input events. Each event includes:

  • raw — the byte sequence
  • parsed — structured key information (key name, modifiers like ctrl, alt, shift)

Use parsed when you want to understand what key was pressed. Use raw when you need to forward the exact bytes somewhere.
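
As a rough illustration of the raw/parsed split, the Python sketch below turns one input event into a readable key label and recovers the bytes to forward. The event layout used here (raw as a list of byte values, parsed as an object with a key name and a list of modifiers) is an assumption made for the example, not the documented wire format; the core skill is the place to confirm the real shape.

```python
from typing import Any, Dict

def describe_key(event: Dict[str, Any]) -> str:
    """Build a label such as 'ctrl+c' from the parsed part of an event."""
    parsed = event.get("parsed") or {}
    key = parsed.get("key", "")
    if not key:
        return ""
    modifiers = sorted(parsed.get("modifiers", []))
    return "+".join(modifiers + [key])

def raw_bytes(event: Dict[str, Any]) -> bytes:
    """Return the exact byte sequence, for forwarding unchanged."""
    return bytes(event.get("raw", []))

# Hypothetical event for Ctrl+C (byte 0x03); field names are assumed.
event = {"raw": [3], "parsed": {"key": "c", "modifiers": ["ctrl"]}}
print(describe_key(event))
print(raw_bytes(event))
```

Decision logic reads only the parsed part, and forwarding reads only the raw part, so neither has to reinterpret the other.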

Check the Current Mode

get input mode → "passthrough" or "capture"

Always check before capturing. If input is already captured (by another agent or process), don't capture again without understanding why.
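
One way to make the check-before-capture habit hard to forget is to wrap both operations in a single guard. The sketch below assumes two hypothetical callables, get_mode and capture, that stand in for whatever tool or HTTP call you actually use; neither name comes from the wsh API.

```python
from typing import Callable

def capture_if_free(get_mode: Callable[[], str], capture: Callable[[], None]) -> bool:
    """Capture only when input is in passthrough; return True if we captured."""
    if get_mode() != "passthrough":
        # Someone else already holds capture: stop and find out why.
        return False
    capture()
    return True

# Stand-ins for the real operations, for demonstration only.
state = {"mode": "passthrough"}

def fake_get_mode() -> str:
    return state["mode"]

def fake_capture() -> None:
    state["mode"] = "capture"

print(capture_if_free(fake_get_mode, fake_capture))  # first call captures
print(capture_if_free(fake_get_mode, fake_capture))  # second call backs off
```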

Focus Routing

When input is captured, you can direct it to a specific overlay or panel by setting focus. The element must be created with focusable: true. At most one element has focus at a time.

Focus is a logical association — it tells the system (and any listening clients) which UI element the captured input belongs to. This is useful when you have multiple overlays or panels visible and want to clarify which one is "active."

create overlay (focusable: true) → get id
capture input
set focus to overlay id

# Input events are now associated with this overlay.
# The element may receive visual focus indicators
# (e.g., highlighted border) depending on the client.

# Switch focus to a different element:
set focus to another-element-id

# Clear focus:
unfocus

Focus is automatically cleared when:

  • Input is released back to passthrough
  • The focused element is deleted

Don't overcomplicate focus management. For a single dialog or menu, you often don't need explicit focus — you're the only consumer of captured input, and you know which overlay you're updating. Focus becomes valuable when multiple elements are visible and you want to signal which one is "live."
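
If you want to mirror the focus rules in your own bookkeeping, a toy model like the following may help. It is a sketch of the rules as stated above (focusable elements only, at most one focused element, focus cleared on release or deletion), not the wsh implementation. The class and method names are hypothetical, and rejecting a non-focusable element by returning False is an assumption about how such a request is handled.

```python
from typing import Optional, Set

class FocusModel:
    """Toy model of the focus rules described above; not the wsh implementation."""

    def __init__(self) -> None:
        self.focusable: Set[str] = set()
        self.focused: Optional[str] = None

    def create(self, element_id: str, focusable: bool = False) -> None:
        if focusable:
            self.focusable.add(element_id)

    def set_focus(self, element_id: str) -> bool:
        # Only elements created as focusable can take focus (assumed rejection).
        if element_id not in self.focusable:
            return False
        self.focused = element_id  # replaces any previous focus
        return True

    def delete(self, element_id: str) -> None:
        self.focusable.discard(element_id)
        if self.focused == element_id:
            self.focused = None  # deleting the focused element clears focus

    def release_input(self) -> None:
        self.focused = None  # returning to passthrough clears focus

model = FocusModel()
model.create("menu", focusable=True)
model.create("banner")
model.set_focus("menu")
print(model.focused)
```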

Approval Workflows

The most common use of input capture: ask the human a yes-or-no question and wait for their answer.

The Pattern

1. Show the question (overlay or panel)
2. Capture input
3. Wait for a keystroke
4. Interpret the keystroke
5. Release input
6. Remove the visual prompt
7. Act on the answer

Example: Confirm a Dangerous Command

# Show the prompt
create overlay:
  "┌─ Confirm ──────────────────────┐"
  "│ Delete 47 files from /build ?  │"
  "│         [Y]es    [N]o          │"
  "└────────────────────────────────┘"

# Capture input
capture input

# Read keystroke via WebSocket
receive input event
if key == "y" or key == "Y":
    proceed with deletion
else:
    cancel

# Release and clean up
release input
delete overlay
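
The same flow can be sketched in Python with the wsh operations passed in as plain callables. All five parameter names are hypothetical placeholders for your real tool calls. The try/finally is a design choice of this sketch, so that input is released and the prompt removed even if reading the keystroke fails.

```python
from typing import Callable, List, Optional

def confirm(
    show_prompt: Callable[[], object],
    capture: Callable[[], object],
    read_key: Callable[[], Optional[str]],
    release: Callable[[], object],
    remove_prompt: Callable[[], object],
) -> bool:
    """Show, capture, read one key, then always release and clean up."""
    show_prompt()
    capture()
    try:
        approved = read_key() in ("y", "Y")  # anything else counts as "no"
    finally:
        release()
        remove_prompt()
    return approved

# Demonstration with stand-ins that record the order of operations.
log: List[str] = []

def step(name: str, result: Optional[str] = None) -> Callable[[], Optional[str]]:
    def run() -> Optional[str]:
        log.append(name)
        return result
    return run

approved = confirm(step("show"), step("capture"), step("read", "n"),
                   step("release"), step("remove"))
print(approved, log)
```

The caller acts on the returned boolean only after the function returns, which keeps the destructive action outside the captured window.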

Always Provide a Way Out

Every prompt must accept a "no" or "cancel" keystroke. Never build a prompt where the only option is "yes." Show the available keys clearly in the prompt so the human isn't guessing.

Selection Menus

Let the human choose from a list of options using arrow keys and Enter.

The Pattern

# Show the menu with one item highlighted
create overlay:
  "┌─ Select environment ──────┐"
  "│   development             │"
  "│ ▸ staging                 │"
  "│   production              │"
  "└───────────────────────────┘"

# Capture input
capture input

# Handle navigation
receive input events in a loop:
    Arrow Up / k   → move highlight up
    Arrow Down / j → move highlight down
    Enter          → confirm selection
    Escape / q     → cancel

# After each navigation keystroke, update the overlay
# to reflect the new highlight position

# Release and clean up
release input
delete overlay

Track the selected index yourself. On each arrow key, update the index, rebuild the spans with the highlight on the new item, and update the overlay.
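
A minimal Python sketch of that index tracking follows. The key names ("ArrowUp", "ArrowDown") are assumptions about how parsed keys are labelled, and the choice to clamp at the ends rather than wrap around is arbitrary.

```python
from typing import List

def move(index: int, key: str, count: int) -> int:
    """Return the new highlight index; unknown keys leave it unchanged."""
    if key in ("ArrowUp", "k"):
        return max(0, index - 1)
    if key in ("ArrowDown", "j"):
        return min(count - 1, index + 1)
    return index

def render(options: List[str], index: int) -> List[str]:
    """Rebuild the menu rows with the marker on the selected item."""
    return [("▸ " if i == index else "  ") + option
            for i, option in enumerate(options)]

options = ["development", "staging", "production"]
index = 1
for key in ["j", "j", "ArrowUp"]:
    index = move(index, key, len(options))
print("\n".join(render(options, index)))
```

Keeping move and render as pure functions means the only stateful piece is the single index, which makes the overlay update after each keystroke a straight re-render.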

Text Input

Capture free-form text from the human — a filename, a commit message, a search query.

The Pattern

create overlay:
  "┌─ Session name ────────────┐"
  "│ > _                       │"
  "└───────────────────────────┘"

capture input

buffer = ""
receive input events in a loop:
    printable character → append to buffer
    Backspace          → remove last character
    Enter              → confirm
    Escape             → cancel

# After each keystroke, update the overlay to show
# the current buffer:
"│ > my-session_               │"

release input
delete overlay

You're building a tiny text editor. Handle at least: character input, backspace, enter to confirm, escape to cancel. Don't try to build a full readline — keep it simple.
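
The four cases can be handled by one small function that takes the current buffer and a key and returns the new buffer plus a status. This is a sketch under assumptions: the key names "Enter", "Escape" and "Backspace" are guesses at how parsed keys are labelled, and the status strings are invented for the example.

```python
from typing import Tuple

def apply_key(buffer: str, key: str) -> Tuple[str, str]:
    """Return (new_buffer, status); status is 'editing', 'confirmed' or 'cancelled'."""
    if key == "Enter":
        return buffer, "confirmed"
    if key == "Escape":
        return buffer, "cancelled"
    if key == "Backspace":
        return buffer[:-1], "editing"  # safe on an empty buffer
    if len(key) == 1 and key.isprintable():
        return buffer + key, "editing"
    return buffer, "editing"  # ignore everything else

buffer, status = "", "editing"
for key in ["m", "y", "x", "Backspace", "-", "s", "Enter"]:
    buffer, status = apply_key(buffer, key)
    if status != "editing":
        break
print(buffer, status)
```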

Multi-Step Dialogs

Chain prompts together for workflows that need several pieces of information:

Step 1: Select environment  (menu)
Step 2: Enter version tag   (text input)
Step 3: Confirm deployment  (yes/no)

Keep input captured across all steps. Show a progress indicator so the human knows where they are:

"Step 2 of 3 — Enter version tag"

Use focus routing to track which dialog step currently has input. As you advance through steps, move focus to the overlay or panel representing the current step. This signals to the system (and the human) which element is active.

If the human presses Escape at any step, cancel the entire flow and release input. Don't trap them in a multi-step dialog they can't exit.
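
One possible shape for such a flow is a list of step functions run in order, where any step can return a cancel sentinel to abort the whole flow. In the sketch below, CANCEL, run_steps and the step signature are all hypothetical. Capturing once before the call and releasing once after it is left to the caller.

```python
from typing import Callable, List, Optional

CANCEL = object()  # sentinel a step returns when the human presses Escape

def run_steps(steps: List[Callable[[str], object]]) -> Optional[List[object]]:
    """Run steps in order; return all answers, or None if any step cancels."""
    answers: List[object] = []
    for number, step in enumerate(steps, start=1):
        # Each step receives a progress header to show in its prompt.
        answer = step(f"Step {number} of {len(steps)}")
        if answer is CANCEL:
            return None
        answers.append(answer)
    return answers

# Stand-in steps that record the header they were given.
headers: List[str] = []

def answering(value: object) -> Callable[[str], object]:
    def step(header: str) -> object:
        headers.append(header)
        return value
    return step

print(run_steps([answering("staging"), answering("v1.2.0"), answering(True)]))
print(run_steps([answering("staging"), answering(CANCEL), answering(True)]))
```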

Pitfalls

Minimize Capture Duration

Every moment input is captured, the human cannot use their terminal. This is disruptive. Capture as late as possible, release as early as possible:

Bad:  capture → build UI → show prompt → wait
Good: build UI → show prompt → capture → wait

Prepare everything before you grab the keyboard. The human should never see a captured terminal with nothing on screen explaining why.
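
A context manager is one way to keep the captured window small and guarantee the release. The sketch below assumes capture and release are callables you supply; the name captured is made up for the example.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

@contextmanager
def captured(capture: Callable[[], object], release: Callable[[], object]) -> Iterator[None]:
    """Hold input capture only for the duration of the with-block."""
    capture()
    try:
        yield
    finally:
        release()  # runs even if the body raises

log: List[str] = []
log.append("build UI")      # done before grabbing the keyboard
log.append("show prompt")
with captured(lambda: log.append("capture"), lambda: log.append("release")):
    log.append("wait for key")
print(log)
```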

Always Show What's Happening

A captured terminal with no visual explanation is terrifying. The human types and nothing happens. They don't know if the terminal is frozen, crashed, or waiting. Before you capture input, or at the latest at the same moment, always display an overlay or panel that explains what you're asking and which keys to press.

Handle Unexpected Input

The human may press keys you didn't anticipate. Don't crash or behave erratically. Ignore keys you don't handle:

if key in expected_keys:
    handle it
else:
    ignore, do nothing

Don't beep, flash, or scold. Just do nothing for unrecognized keys.
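
In Python, a dictionary of handlers gives this behavior almost for free, because a missing key simply finds no handler. The function and handler names below are illustrative only.

```python
from typing import Callable, Dict, List

def dispatch(key: str, handlers: Dict[str, Callable[[], object]]) -> bool:
    """Run the handler for a key if one exists; silently ignore the rest."""
    handler = handlers.get(key)
    if handler is None:
        return False  # unrecognized key: do nothing
    handler()
    return True

seen: List[str] = []
handlers = {
    "y": lambda: seen.append("yes"),
    "n": lambda: seen.append("no"),
}
for key in ["F5", "y", "Tab", "n"]:
    dispatch(key, handlers)
print(seen)
```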

Don't Nest Captures

Input is either captured or it isn't — there's no nesting. If you capture while already captured, you're still in the same capture session. Design your flows to be flat: capture once, do your multi-step dialog, release once.

Remember Ctrl+\ Toggles

The human can toggle capture at any time with Ctrl+\. Your code must handle this gracefully. If you're mid-dialog and input is suddenly released:

  • Your WebSocket subscription stops receiving captured input events
  • Your overlay is still showing a stale prompt
  • Clean up: remove the overlay, abandon the flow
  • Don't re-capture without the human's consent

Conversely, if the human presses Ctrl+\ to enter capture mode and you have no UI, consider this an invitation to engage — create appropriate overlays or panels and start interacting.

Check the input mode if you're unsure whether you still have capture.
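
The two toggle cases can be folded into one decision function that you call whenever a mode change arrives. This is only a sketch: the function name, its parameters and the returned action strings are invented here, and how you receive mode change events depends on the core skill.

```python
def react_to_mode_change(new_mode: str, dialog_active: bool, ui_visible: bool) -> str:
    """Decide what to do when the input mode changes underneath you."""
    if new_mode == "passthrough" and dialog_active:
        # The human released capture mid-dialog: drop the prompt, abandon
        # the flow, and do not re-capture on your own.
        return "cleanup"
    if new_mode == "capture" and not ui_visible:
        # The human entered capture with nothing on screen: an invitation.
        return "engage"
    return "none"

print(react_to_mode_change("passthrough", dialog_active=True, ui_visible=True))
print(react_to_mode_change("capture", dialog_active=False, ui_visible=False))
```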

Don't Capture for Information You Could Ask Differently

Input capture is the right tool for real-time keystroke interaction — menus, approvals, text input that needs character-by-character handling. If you just need an answer to a question and latency doesn't matter, consider using the conversation instead. It's less disruptive and gives the human more room to think.

Repository
deepgram/wsh