Use when you need to intercept keyboard input from the human temporarily. Examples: "ask the user for approval before running a command", "build a selection menu in the terminal", "capture text input from the user".
IMPORTANT: EXECUTION CONTEXT

This skill describes what to do: domain patterns and decision-making. It does NOT describe how to call the API.

- If you have `wsh_*` tools (check your toolkit for `wsh_send_input`, `wsh_get_screen`, etc.): use them directly. Operation names in this skill generally map to tool names (e.g., "send input" → `wsh_send_input`). When in doubt, list your available `wsh_*` tools.
- If you do NOT have `wsh_*` tools: you are in HTTP/curl fallback mode. DO NOT GUESS endpoints or CLI subcommands. Load the full API reference first: search your workspace for `skills/core/` and read `SKILL.md`. It contains every endpoint with working curl examples and a bootstrap sequence.
- Quick bootstrap: `curl -sf --unix-socket ${XDG_RUNTIME_DIR:-/tmp}/wsh/default.http.sock http://localhost/health`. If that fails, run `wsh server -L agent-$$ --ephemeral &` and retry.
Input capture lets you temporarily take over the keyboard. While active, keystrokes from the human go to you instead of the shell. The terminal is frozen — nothing the human types reaches the PTY. You decide what to do with each keystroke.
capture input # grab the keyboard
# Keystrokes now go to subscribers, not the PTY
# Do your thing — build a menu, ask a question, etc.
release input # give it back

The human can press Ctrl+\ at any time to toggle input capture. If input is in passthrough mode, Ctrl+\ enters capture. If input is already captured, Ctrl+\ releases it. This is a physical toggle: never disable it, never tell the human to avoid it. It's their escape hatch.
When the human presses Ctrl+\ to enter capture mode manually, this is a signal that they want to interact with you. If you don't already have overlays or panels visible, this is your cue to create UI elements and engage. Subscribe to mode change events to detect this.
Captured keystrokes arrive via WebSocket event subscription (see the core skill for connection mechanics). Subscribe to input events. Each event includes:

- raw: the byte sequence
- parsed: structured key information (key name, modifiers like ctrl, alt, shift)

Use parsed when you want to understand what key was pressed. Use raw when you need to forward the exact bytes somewhere.
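As a rough sketch of consuming these events, the Python below decodes one event into a small structure. The JSON field names (`raw`, `parsed`, `key`, `modifiers`) and the encoding of `raw` as a list of integers are assumptions made for illustration. Check the core skill for the actual wire format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    raw: bytes            # exact bytes, suitable for forwarding
    key: str              # parsed key name, e.g. "y" or "Enter"
    modifiers: frozenset  # e.g. frozenset({"ctrl"})


def decode_input_event(message: str) -> KeyEvent:
    """Decode one input event (assumed JSON shape, not the confirmed format)."""
    data = json.loads(message)
    parsed = data.get("parsed") or {}
    return KeyEvent(
        raw=bytes(data.get("raw", [])),
        key=parsed.get("key", ""),
        modifiers=frozenset(parsed.get("modifiers", [])),
    )


# Hypothetical Ctrl+C event: read `key`/`modifiers` to understand it,
# or forward `raw` unchanged.
event = decode_input_event(
    '{"raw": [3], "parsed": {"key": "c", "modifiers": ["ctrl"]}}'
)
```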
get input mode → "passthrough" or "capture"

Always check before capturing. If input is already captured (by another agent or process), don't capture again without understanding why.
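One way to encode the check-before-capture rule is a small guard. The `InputClient` interface and its method names are hypothetical placeholders for whatever `wsh_*` tools or HTTP calls you actually have. `FakeClient` exists only to exercise the sketch.

```python
from typing import Protocol


class InputClient(Protocol):
    # Hypothetical interface; map these to your real tools or endpoints.
    def get_input_mode(self) -> str: ...
    def capture_input(self) -> None: ...


def capture_if_free(client: InputClient) -> bool:
    """Capture only when input is in passthrough mode.

    Returns True if this call captured input, False if input was
    already captured and the caller should find out why first.
    """
    if client.get_input_mode() != "passthrough":
        return False
    client.capture_input()
    return True


class FakeClient:
    """In-memory stand-in, not a real wsh client."""

    def __init__(self, mode: str = "passthrough") -> None:
        self.mode = mode

    def get_input_mode(self) -> str:
        return self.mode

    def capture_input(self) -> None:
        self.mode = "capture"
```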
When input is captured, you can direct it to a specific
overlay or panel by setting focus. The element must be
created with focusable: true. At most one element has
focus at a time.
Focus is a logical association — it tells the system (and any listening clients) which UI element the captured input belongs to. This is useful when you have multiple overlays or panels visible and want to clarify which one is "active."
create overlay (focusable: true) → get id
capture input
set focus to overlay id
# Input events are now associated with this overlay.
# The element may receive visual focus indicators
# (e.g., highlighted border) depending on the client.
# Switch focus to a different element:
set focus to another-element-id
# Clear focus:
unfocus

Focus is automatically cleared when:
Don't overcomplicate focus management. For a single dialog or menu, you often don't need explicit focus — you're the only consumer of captured input, and you know which overlay you're updating. Focus becomes valuable when multiple elements are visible and you want to signal which one is "live."
The most common use of input capture: ask the human a yes-or-no question and wait for their answer.
1. Show the question (overlay or panel)
2. Capture input
3. Wait for a keystroke
4. Interpret the keystroke
5. Release input
6. Remove the visual prompt
7. Act on the answer

# Show the prompt
create overlay:
"┌─ Confirm ──────────────────────┐"
"│ Delete 47 files from /build ? │"
"│ [Y]es [N]o │"
"└────────────────────────────────┘"
# Capture input
capture input
# Read keystroke via WebSocket
receive input event
if key == "y" or key == "Y":
    proceed with deletion
else:
    cancel
# Release and clean up
release input
delete overlay

Every prompt must accept a "no" or "cancel" keystroke. Never build a prompt where the only option is "yes." Show the available keys clearly in the prompt so the human isn't guessing.
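The keystroke interpretation step could be sketched as a pure function. The key names (including `"Escape"`) are assumptions about what the parsed event reports. Unlike the pseudocode above, which cancels on any other key, this variant returns `None` for unrecognized keys so the caller can keep waiting.

```python
from typing import Optional

YES_KEYS = {"y", "Y"}
NO_KEYS = {"n", "N", "Escape"}  # there is always a way to say no


def interpret_approval(key: str) -> Optional[bool]:
    """Map a key name to True (yes), False (no), or None (keep waiting)."""
    if key in YES_KEYS:
        return True
    if key in NO_KEYS:
        return False
    return None


# Feed keys until the human gives a definite answer.
answer = None
for key in ["x", "Tab", "n"]:
    answer = interpret_approval(key)
    if answer is not None:
        break
```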
Let the human choose from a list of options using arrow keys and Enter.
# Show the menu with one item highlighted
create overlay:
"┌─ Select environment ──────┐"
"│ development │"
"│ ▸ staging │"
"│ production │"
"└───────────────────────────┘"
# Capture input
capture input
# Handle navigation
receive input events in a loop:
    Arrow Up / k → move highlight up
    Arrow Down / j → move highlight down
    Enter → confirm selection
    Escape / q → cancel
# After each navigation keystroke, update the overlay
# to reflect the new highlight position
# Release and clean up
release input
delete overlay

Track the selected index yourself. On each arrow key, update the index, rebuild the spans with the highlight on the new item, and update the overlay.
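The index tracking might look like the sketch below. The key names (`"ArrowUp"`, `"ArrowDown"`, `"Enter"`, `"Escape"`) are assumed, and the functions only compute state and text. Pushing the lines to the overlay is left to whatever tool you use. The highlight clamps at the ends rather than wrapping, which is a design choice, not a requirement.

```python
from typing import List, Optional, Tuple


def render_menu(items: List[str], selected: int) -> List[str]:
    """Rebuild the menu lines with the marker on the selected item."""
    return [("▸ " if i == selected else "  ") + item
            for i, item in enumerate(items)]


def handle_menu_key(key: str, selected: int,
                    count: int) -> Tuple[int, Optional[str]]:
    """Return (new_index, outcome); outcome is "confirm", "cancel", or None."""
    if key in ("ArrowUp", "k"):
        return max(selected - 1, 0), None
    if key in ("ArrowDown", "j"):
        return min(selected + 1, count - 1), None
    if key == "Enter":
        return selected, "confirm"
    if key in ("Escape", "q"):
        return selected, "cancel"
    return selected, None  # unrecognized key: ignore


items = ["development", "staging", "production"]
index = 1  # "staging" starts highlighted
for key in ["j", "x", "ArrowUp", "ArrowUp"]:
    index, outcome = handle_menu_key(key, index, len(items))
lines = render_menu(items, index)
```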
Capture free-form text from the human — a filename, a commit message, a search query.
create overlay:
"┌─ Session name ────────────┐"
"│ > _ │"
"└───────────────────────────┘"
capture input
buffer = ""
receive input events in a loop:
    printable character → append to buffer
    Backspace → remove last character
    Enter → confirm
    Escape → cancel
# After each keystroke, update the overlay to show
# the current buffer:
"│ > my-session_ │"
release input
delete overlay

You're building a tiny text editor. Handle at least: character input, backspace, enter to confirm, escape to cancel. Don't try to build a full readline; keep it simple.
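A minimal buffer handler along these lines is sketched below. It assumes printable keys arrive as single-character key names and that special keys are named `"Enter"`, `"Escape"`, and `"Backspace"`. Adjust this to the actual parsed event format.

```python
from typing import Optional, Tuple


def handle_text_key(key: str, buffer: str) -> Tuple[str, Optional[str]]:
    """Return (new_buffer, outcome); outcome is "confirm", "cancel", or None."""
    if key == "Enter":
        return buffer, "confirm"
    if key == "Escape":
        return buffer, "cancel"
    if key == "Backspace":
        return buffer[:-1], None  # no-op on an empty buffer
    if len(key) == 1 and key.isprintable():
        return buffer + key, None
    return buffer, None  # anything else is ignored


def render_input(buffer: str, width: int = 25) -> str:
    """Format the input row with a trailing cursor marker."""
    return "│ " + ("> " + buffer + "_").ljust(width) + " │"


buffer = ""
for key in ["m", "y", "x", "Backspace", "-", "s", "Tab"]:
    buffer, outcome = handle_text_key(key, buffer)
row = render_input(buffer)
```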
Chain prompts together for workflows that need several pieces of information:
Step 1: Select environment (menu)
Step 2: Enter version tag (text input)
Step 3: Confirm deployment (yes/no)

Keep input captured across all steps. Show a progress indicator so the human knows where they are:

"Step 2 of 3 — Enter version tag"

Use focus routing to track which dialog step currently has input. As you advance through steps, move focus to the overlay or panel representing the current step. This signals to the system (and the human) which element is active.
If the human presses Escape at any step, cancel the entire flow and release input. Don't trap them in a multi-step dialog they can't exit.
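A flat multi-step flow could be driven by one loop over a single key stream, as in this sketch. The step-handler signature, the key names, and the `print` call standing in for the progress indicator are all illustrative assumptions. The caller is expected to capture once before `run_flow` and release once after it returns.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A step handler takes (key, state) and returns (state, outcome), where
# outcome is "confirm", "cancel", or None to keep waiting.
StepHandler = Callable[[str, object], Tuple[object, Optional[str]]]


def run_flow(steps: List[Tuple[str, StepHandler, object]],
             keys: Iterable[str]) -> Optional[Dict[str, object]]:
    """Run steps in order over one key stream; None means cancelled."""
    results: Dict[str, object] = {}
    key_iter = iter(keys)
    for number, (name, handler, state) in enumerate(steps, start=1):
        print(f"Step {number} of {len(steps)}: {name}")  # progress indicator
        for key in key_iter:
            state, outcome = handler(key, state)
            if outcome == "cancel":
                return None  # cancelling any step cancels the whole flow
            if outcome == "confirm":
                results[name] = state
                break
        else:
            return None  # key stream ended before the step finished
    return results


def text_step(key, buffer):
    if key == "Enter":
        return buffer, "confirm"
    if key == "Escape":
        return buffer, "cancel"
    if len(key) == 1 and key.isprintable():
        return buffer + key, None
    return buffer, None


def confirm_step(key, state):
    if key in ("y", "Y"):
        return True, "confirm"
    if key in ("n", "N", "Escape"):
        return False, "cancel"
    return state, None


steps = [("Enter version tag", text_step, ""),
         ("Confirm deployment", confirm_step, False)]
done = run_flow(steps, ["v", "1", "Enter", "y"])
cancelled = run_flow(steps, ["v", "Escape", "y"])
```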
Every moment input is captured, the human cannot use their terminal. This is disruptive. Capture as late as possible, release as early as possible:
Bad: capture → build UI → show prompt → wait
Good: build UI → show prompt → capture → wait

Prepare everything before you grab the keyboard. The human should never see a captured terminal with nothing on screen explaining why.
A captured terminal with no visual explanation is terrifying. The human types and nothing happens. They don't know if the terminal is frozen, crashed, or waiting. Before or simultaneously with capturing input, always display an overlay or panel explaining what you're asking and what keys to press.
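One way to enforce this ordering is a context manager that shows the prompt first, captures second, and always releases and cleans up, even on error. The client methods used here (`create_overlay`, `capture_input`, `release_input`, `delete_overlay`) are hypothetical names, not confirmed API calls. `RecordingClient` only records the call order.

```python
from contextlib import contextmanager


@contextmanager
def prompt_session(client, lines):
    """Show the prompt, then capture; release and remove it on exit."""
    overlay_id = client.create_overlay(lines)  # UI is visible first
    try:
        client.capture_input()                 # grab the keyboard last
        try:
            yield overlay_id
        finally:
            client.release_input()             # give it back first
    finally:
        client.delete_overlay(overlay_id)      # then remove the prompt


class RecordingClient:
    """Stand-in that records call order; not a real wsh client."""

    def __init__(self):
        self.calls = []

    def create_overlay(self, lines):
        self.calls.append("create_overlay")
        return "overlay-1"

    def capture_input(self):
        self.calls.append("capture_input")

    def release_input(self):
        self.calls.append("release_input")

    def delete_overlay(self, overlay_id):
        self.calls.append("delete_overlay")


client = RecordingClient()
with prompt_session(client, ["Delete 47 files from /build ? [Y]es [N]o"]):
    pass  # wait for and handle keystrokes here
```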
The human may press keys you didn't anticipate. Don't crash or behave erratically. Ignore keys you don't handle:
if key in expected_keys:
    handle it
else:
    ignore, do nothing

Don't beep, flash, or scold. Just do nothing for unrecognized keys.
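A dispatch table makes "ignore what you don't handle" the default behavior. The sketch below is illustrative, and the key names are assumptions.

```python
from typing import Callable, Dict


def dispatch_key(key: str, handlers: Dict[str, Callable[[], None]]) -> bool:
    """Run the handler for `key` if one exists; otherwise do nothing."""
    handler = handlers.get(key)
    if handler is None:
        return False  # unrecognized: no beep, no flash, no error
    handler()
    return True


log = []
handlers = {
    "y": lambda: log.append("approved"),
    "n": lambda: log.append("denied"),
}
handled = [dispatch_key(k, handlers) for k in ["F5", "Tab", "y"]]
```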
Input is either captured or it isn't — there's no nesting. If you capture while already captured, you're still in the same capture session. Design your flows to be flat: capture once, do your multi-step dialog, release once.
The human can toggle capture at any time with Ctrl+\. Your code must handle this gracefully. If you're mid-dialog and input is suddenly released:
Conversely, if the human presses Ctrl+\ to enter capture mode and you have no UI, consider this an invitation to engage — create appropriate overlays or panels and start interacting.
Check the input mode if you're unsure whether you still have capture.
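As one possible way to handle a mid-dialog release, the sketch below watches a mixed stream of input and mode-change events and stops waiting when the mode returns to passthrough. The event shape (`type`, `mode`, `key` fields) is an assumption, and treating a release as a cancel is this sketch's choice, not something the skill prescribes.

```python
from typing import Iterable, Optional


def wait_for_answer(events: Iterable[dict]) -> Optional[bool]:
    """Return True/False for a y/n answer, or None if capture ended first."""
    for event in events:
        if event.get("type") == "mode" and event.get("mode") == "passthrough":
            return None  # the human released input: stop waiting
        if event.get("type") == "input":
            key = event.get("key")
            if key in ("y", "Y"):
                return True
            if key in ("n", "N", "Escape"):
                return False
            # any other key is ignored
    return None  # stream ended without an answer


released = wait_for_answer([
    {"type": "input", "key": "x"},
    {"type": "mode", "mode": "passthrough"},
    {"type": "input", "key": "y"},  # never reached
])
```

When the function returns `None`, remove the prompt overlay and do not act on the pending question.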
Input capture is the right tool for real-time keystroke interaction — menus, approvals, text input that needs character-by-character handling. If you just need an answer to a question and latency doesn't matter, consider using the conversation instead. It's less disruptive and gives the human more room to think.