core

REQUIRED before any wsh terminal operation when you do NOT have wsh_* MCP tools. Contains the complete HTTP API reference with working curl examples, bootstrap sequence, and authentication guide. wsh has no CLI subcommands for programmatic use — do NOT run 'wsh <verb>' commands or guess endpoints. Load this skill first.

wsh: Terminal as a Service

You have access to wsh, an API that gives you direct control over terminal sessions. You can see exactly what's on screen, send keystrokes, wait for commands to finish, and create visual elements — all programmatically.

Think of it this way: wsh gives you eyes (read the screen), hands (send input), patience (wait for output to settle), and a voice (overlays and panels to communicate with the human).

How It Works

wsh manages terminal sessions via a server daemon and exposes everything over an HTTP API served on a Unix domain socket. The human sees their normal terminal. You see a programmatic interface to the same session. Everything is synchronized — input you send appears on their screen, output they generate appears in your API calls. All endpoints are scoped to a session via /sessions/:name/ prefix (e.g., /sessions/default/input).

By default, wsh serves HTTP over a Unix domain socket at ${XDG_RUNTIME_DIR}/wsh/<name>.http.sock (default name: "default"). This is local-only and requires no authentication. TCP access is opt-in via --bind for remote scenarios.

MCP also available: wsh can be used as an MCP server (14 tools, 3 resources, 9 prompts) via Streamable HTTP at /mcp on the same socket, or via the wsh mcp stdio bridge. See the wsh:core-mcp prompt for MCP-specific guidance.

Getting Started

Before using the API, make sure a wsh server is reachable.

Step 1: Check for an existing server. A wsh server may already be running — try the default Unix socket:

WSH_SOCK=${XDG_RUNTIME_DIR:-/tmp}/wsh/default.http.sock
curl -sf --unix-socket $WSH_SOCK http://localhost/health

If curl exits 0 (the -f flag makes it fail on HTTP errors, so success means the server answered 200 OK), the server is reachable. Skip to step 3.

Step 2: Start a server (only if needed). Use -L with a unique name to avoid colliding with any existing wsh instance:

wsh server -L agent-$$ --ephemeral &
sleep 1
WSH_SOCK=${XDG_RUNTIME_DIR:-/tmp}/wsh/agent-$$.http.sock
# Wait for socket to appear
while [ ! -S "$WSH_SOCK" ]; do sleep 0.1; done

The socket path is deterministic from the instance name — no port discovery needed.
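
The wait loop in step 2 never gives up if the server fails to start. A minimal sketch of a bounded variant follows. The function names wsh_sock_path and wait_for_socket and the 5-second default are illustrative assumptions, not part of wsh; the path pattern is the one used in the snippets above.

```shell
# Derive the socket path from an instance name, following the
# ${XDG_RUNTIME_DIR:-/tmp}/wsh/<name>.http.sock pattern used above.
wsh_sock_path() {
  printf '%s/wsh/%s.http.sock\n' "${XDG_RUNTIME_DIR:-/tmp}" "${1:-default}"
}

# wait_for_socket <path> [seconds]
# Poll every 0.1s until the socket exists; give up after the timeout
# (hypothetical default: 5 seconds). Returns 0 on success, 1 on timeout.
# The body runs in a subshell so the counter does not leak to the caller.
wait_for_socket() (
  tries=$(( ${2:-5} * 10 ))
  while [ ! -S "$1" ]; do
    [ "$tries" -le 0 ] && return 1
    tries=$(( tries - 1 ))
    sleep 0.1
  done
  return 0
)
```

With these in place, step 2 could end with WSH_SOCK=$(wsh_sock_path agent-$$) followed by wait_for_socket "$WSH_SOCK" 5 || echo "wsh server did not start" >&2.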

Step 3: Create a session. Sessions are where commands run. Create one via the API:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions \
  -H "Content-Type: application/json" \
  -d '{"name": "work"}'

Returns {"name": "work", ...} on success.

Step 4: Use the send/wait/read loop. Now interact with your session using the API primitives described below. The fundamental loop:

# Send a command
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/work/input -d $'ls -la\n'
# Wait for idle
curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/work/idle?timeout_ms=2000'
# Read the screen
curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/work/screen?format=plain'

Authentication

When wsh serves over a Unix domain socket (default), no authentication is needed — all connections are local and trusted.

When binding to a TCP address with --bind (e.g., --bind 0.0.0.0:7368), every request requires a Bearer token:

curl -H "Authorization: Bearer <token>" http://host:7368/sessions/default/screen

The token is either auto-generated on startup (printed to stderr) or set via --token / WSH_TOKEN. Retrieve it later with wsh token.

WebSocket connections from browsers use a ticket exchange: first acquire a short-lived ticket via POST /auth/ws-ticket with your Bearer token, then pass ?ticket=<nonce> on the WebSocket URL. Non-browser clients can use the Authorization header directly on the upgrade request.

The Fundamental Loop

Almost everything you do with wsh follows this pattern:

  1. Send — inject input into the terminal
  2. Wait — let the command run until output settles
  3. Read — see what's on screen now
  4. Decide — based on what you see, choose what to do next

This is your heartbeat. Learn it. A drive-process interaction is just this loop repeated until the task is done.
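
As a sketch, the first three steps can be wrapped in one shell function so that each call is a single turn of the loop. The names wsh_url and wsh_run are hypothetical helpers, not wsh commands. The endpoints and query parameters are the ones shown in this document, and WSH_SOCK is assumed to be set as in Getting Started.

```shell
# Build a session-scoped URL: wsh_url <session> <endpoint-with-query>
wsh_url() {
  printf 'http://localhost/sessions/%s/%s\n' "$1" "$2"
}

# One turn of the loop: wsh_run <session> <command line>
# Sends the command plus Enter, waits for 2s of quiet, prints the screen.
wsh_run() {
  # Send: printf appends the newline that submits the command, and
  # --data-binary @- forwards stdin to curl byte for byte.
  printf '%s\n' "$2" |
    curl -s -X POST --unix-socket "$WSH_SOCK" --data-binary @- \
      "$(wsh_url "$1" input)" || return 1
  # Wait: -f turns an HTTP error such as 408 into a non-zero exit status.
  curl -sf -o /dev/null --unix-socket "$WSH_SOCK" \
    "$(wsh_url "$1" 'idle?timeout_ms=2000')" || return 1
  # Read: the Decide step is left to the caller.
  curl -s --unix-socket "$WSH_SOCK" "$(wsh_url "$1" 'screen?format=plain')"
}
```

For example, wsh_run work 'ls -la' would print the plain-text screen once the listing has settled, and the caller decides what to send next.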

API Primitives

These are the building blocks. Every specialized skill builds on these.

Send Input

Inject keystrokes into the terminal. Supports raw bytes — use bash $'...' quoting for control characters.

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input -d 'ls -la'        # type without pressing Enter
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input -d $'ls -la\n'     # type and press Enter
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input -d $'\x03'        # Ctrl+C
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input -d $'\x1b'        # Escape
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input -d $'\x1b[A'      # Arrow Up
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input -d $'\t'          # Tab

Returns 204 (no content) on success.
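
Control sequences are easy to mistype inline. One option is a small lookup that emits the bytes for a named key, using printf octal escapes so it does not depend on bash $'...' quoting. The function name key_bytes and the key names are illustrative assumptions; the byte values are the ones listed above, plus Enter as a newline.

```shell
# Print the raw bytes for a named key. No trailing newline is added.
key_bytes() {
  case "$1" in
    ctrl-c) printf '\003' ;;      # 0x03, interrupt
    escape) printf '\033' ;;      # 0x1b
    up)     printf '\033[A' ;;    # ESC [ A, arrow up
    tab)    printf '\t' ;;
    enter)  printf '\n' ;;
    *)      echo "unknown key: $1" >&2; return 1 ;;
  esac
}

# Usage (needs a running server and WSH_SOCK set):
#   key_bytes ctrl-c | curl -s -X POST --unix-socket "$WSH_SOCK" \
#     --data-binary @- http://localhost/sessions/default/input
```

Piping into --data-binary @- matters here: command substitution would strip a trailing newline, while the pipe passes the bytes through unchanged.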

Wait for Idle

Block until the terminal has been idle for timeout_ms milliseconds. This is a hint that the program may be idle — it could also just be working without producing output.

curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/default/idle?timeout_ms=2000'

Returns the current screen snapshot plus a generation counter once idle. Returns 408 if the terminal doesn't settle within 30 seconds (configurable via max_wait_ms).

When polling repeatedly, pass back the generation from the previous response as last_generation to avoid busy-loop storms:

curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/default/idle?timeout_ms=2000&last_generation=42'

Or use fresh=true to always observe real silence (simpler, but always waits at least timeout_ms):

curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/default/idle?timeout_ms=2000&fresh=true'
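
A polling loop needs the generation from each response to feed into the next request. The sketch below shows that hand-off with sed. It assumes the response is JSON carrying a top-level numeric "generation" field, as in the server-wide idle example later in this document. extract_generation and idle_url are hypothetical helpers, and a real JSON parser such as jq is more robust where available.

```shell
# Read a JSON response on stdin and print the integer after "generation":
extract_generation() {
  sed -n 's/.*"generation"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
    head -n 1
}

# idle_url <session> [last_generation]
# Without a generation (first poll), last_generation is omitted.
idle_url() {
  if [ -n "${2:-}" ]; then
    printf 'http://localhost/sessions/%s/idle?timeout_ms=2000&last_generation=%s\n' \
      "$1" "$2"
  else
    printf 'http://localhost/sessions/%s/idle?timeout_ms=2000\n' "$1"
  fi
}

# Usage (needs a running server and WSH_SOCK set). The loop ends when
# curl -f fails, for example on a 408:
#   gen=""
#   while resp=$(curl -sf --unix-socket "$WSH_SOCK" "$(idle_url default "$gen")"); do
#     gen=$(printf '%s\n' "$resp" | extract_generation)
#   done
```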

Read the Screen

Get the current visible screen contents.

curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/default/screen?format=plain'
curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/default/screen?format=styled'

plain returns simple text lines. styled returns spans with color and formatting attributes.

Read Scrollback

Get historical output that has scrolled off screen.

curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/default/scrollback?format=plain&offset=0&limit=100'

Use offset and limit to page through history.
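
Paging can be sketched as a generator of URLs, one per page. This assumes the total number of scrollback lines is already known, for instance from a scrollback_lines value like the one in the idle response shown later in this document. scrollback_pages is a hypothetical helper name.

```shell
# scrollback_pages <session> <total_lines> <limit>
# Print one scrollback URL per page, covering offsets 0 .. total_lines-1.
# The body runs in a subshell so the offset variable does not leak.
scrollback_pages() (
  [ "$3" -gt 0 ] || return 1   # a zero limit would never advance
  offset=0
  while [ "$offset" -lt "$2" ]; do
    printf 'http://localhost/sessions/%s/scrollback?format=plain&offset=%s&limit=%s\n' \
      "$1" "$offset" "$3"
    offset=$(( offset + $3 ))
  done
)

# Usage (needs a running server and WSH_SOCK set):
#   scrollback_pages default 250 100 | while read -r url; do
#     curl -s --unix-socket "$WSH_SOCK" "$url"
#   done
```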

Health Check

Verify wsh is running.

curl -s --unix-socket $WSH_SOCK http://localhost/health

Real-Time Events (WebSocket)

For monitoring and input capture, you need real-time event streaming. Connect to the JSON WebSocket:

websocat --ws-c-uri=ws://localhost/sessions/default/ws/json - ws-c:unix:$WSH_SOCK

After connecting, subscribe to the events you care about:

{"id": 1, "method": "subscribe", "params": {
  "events": ["lines", "input"],
  "format": "plain",
  "idle_timeout_ms": 1000
}}

Available event types:

  • lines — new lines of output
  • cursor — cursor movement
  • mode — alternate screen toggled
  • diffs — batched screen changes
  • input — keyboard input (essential for input capture)

The server pushes events as they happen. It also sends periodic sync snapshots when the terminal goes idle (controlled by idle_timeout_ms).

Under high output, the server coalesces events automatically: instead of individual updates, you get periodic sync snapshots at interval_ms intervals (default 100ms). This is transparent — handle sync events the same way you handle the initial sync after subscribing.
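
Writing the subscribe message by hand gets tedious when the event list changes. The sketch below assembles it from arguments. subscribe_msg is a hypothetical helper; the field names are the ones from the subscribe example above. Event names are inserted without escaping, so it only suits the fixed names in the list.

```shell
# subscribe_msg <id> <idle_timeout_ms> <event>...
# Print a single-line JSON subscribe request with "format": "plain".
# The body runs in a subshell so the temporaries do not leak.
subscribe_msg() (
  id=$1
  idle=$2
  shift 2
  events=""
  for e in "$@"; do
    # Append each name as a quoted string, comma-separated after the first
    events="${events:+$events, }\"$e\""
  done
  printf '{"id": %s, "method": "subscribe", "params": {"events": [%s], "format": "plain", "idle_timeout_ms": %s}}\n' \
    "$id" "$events" "$idle"
)
```

For example, subscribe_msg 1 1000 lines input prints the same request as the example above on one line, ready to send over the WebSocket connection.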

For a different session, replace default with the session name:

websocat --ws-c-uri=ws://localhost/sessions/build/ws/json - ws-c:unix:$WSH_SOCK

You can also send requests over the WebSocket instead of HTTP — get_screen, send_input, resize, capture_input, release_input, focus, unfocus, get_focus, get_screen_mode, enter_alt_screen, exit_alt_screen, etc. Same capabilities, persistent connection.

Visual Elements

Overlays

Floating text positioned on top of terminal content. They don't affect the terminal — they're a layer on top.

# Create an overlay at position (0, 0) with explicit size
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay \
  -H "Content-Type: application/json" \
  -d '{"x": 0, "y": 0, "width": 20, "height": 1,
       "spans": [{"text": "Hello!", "bold": true}]}'

# Returns {"id": "uuid"} — use this to update or delete it
curl -s -X DELETE --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay/{id}
curl -s -X DELETE --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay          # clear all

Opaque overlays: Add background to fill the rectangle with a solid color, making it a window-like element:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay \
  -H "Content-Type: application/json" \
  -d '{"x": 10, "y": 5, "width": 40, "height": 10,
       "background": {"bg": "black"},
       "spans": [{"text": "Window content"}]}'

Background accepts named colors ("bg": "blue") or RGB ("bg": {"r": 30, "g": 30, "b": 30}).

Named spans: Give spans an id for targeted updates:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay \
  -H "Content-Type: application/json" \
  -d '{"x": 0, "y": 0, "width": 30, "height": 1,
       "spans": [
        {"id": "label", "text": "Status: ", "bold": true},
        {"id": "value", "text": "running", "fg": "green"}
      ]}'

# Update named spans by id (POST with array of span updates)
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay/{id}/spans \
  -H "Content-Type: application/json" \
  -d '{"spans": [{"id": "value", "text": "stopped", "fg": "red"}]}'

Region writes: Place styled text at specific (row, col) offsets:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/overlay/{id}/write \
  -H "Content-Type: application/json" \
  -d '{"writes": [{"row": 2, "col": 5, "text": "Hello", "bold": true}]}'

Focusable: Add focusable: true to allow focus routing during input capture (see Input Capture below).

Use overlays for: tooltips, status indicators, annotations, notifications — anything that should appear on top of the terminal without disrupting it. With explicit dimensions: windows, dialogs, cards.
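
Overlay text often comes from a variable, and a stray double quote in it breaks the JSON body. A minimal sketch of building the payload safely follows. json_escape and overlay_json are hypothetical helpers; the field names match the create request above, and only backslashes and double quotes are escaped.

```shell
# Escape backslashes and double quotes so text is safe inside a JSON string.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# overlay_json <x> <y> <width> <height> <text>
# Print the JSON body for a single-span overlay.
overlay_json() {
  printf '{"x": %s, "y": %s, "width": %s, "height": %s, "spans": [{"text": "%s"}]}\n' \
    "$1" "$2" "$3" "$4" "$(json_escape "$5")"
}

# Usage (needs a running server and WSH_SOCK set):
#   overlay_json 0 0 20 1 "$msg" |
#     curl -s -X POST --unix-socket "$WSH_SOCK" \
#       -H "Content-Type: application/json" --data-binary @- \
#       http://localhost/sessions/default/overlay
```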

Panels

Agent-owned screen regions at the top or bottom of the terminal. Unlike overlays, panels shrink the PTY — they carve out dedicated space.

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/panel \
  -H "Content-Type: application/json" \
  -d '{"position": "bottom", "height": 3, "spans": [{"text": "Status: running"}]}'

Background: Add background to fill the panel with a solid color:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/panel \
  -H "Content-Type: application/json" \
  -d '{"position": "bottom", "height": 2,
       "background": {"bg": "blue"},
       "spans": [{"text": "Status: ok"}]}'

Named spans: Same as overlays — give spans an id for targeted updates via POST with an array of span updates:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/panel/{id}/spans \
  -H "Content-Type: application/json" \
  -d '{"spans": [{"id": "status", "text": "3 errors", "fg": "red"}]}'

Region writes: Place text at specific (row, col) offsets:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/panel/{id}/write \
  -H "Content-Type: application/json" \
  -d '{"writes": [{"row": 0, "col": 10, "text": "updated", "bold": true}]}'

Focusable: Add focusable: true to allow focus routing during input capture.

Use panels for: persistent status bars, progress displays, context summaries — anything that deserves its own screen real estate.

Input Capture

Intercept keyboard input so it comes to you instead of the shell.

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input/capture    # grab input
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input/release    # release back

While captured, keystrokes are available via WebSocket subscription instead of going to the PTY. The human can press Ctrl+\ to toggle capture mode (it switches between passthrough and capture).

Focus routing: Direct captured input to a specific focusable overlay or panel. At most one element has focus at a time.

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input/focus \
  -H "Content-Type: application/json" \
  -d '{"id": "overlay-uuid"}'

curl -s --unix-socket $WSH_SOCK http://localhost/sessions/default/input/focus               # get current focus
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/input/unfocus     # clear focus

Focus is automatically cleared when input is released or when the focused element is deleted.

Use input capture for: approval prompts, custom menus, interactive dialogs between you and the human.

Alternate Screen Mode

Enter a separate screen mode where you can create a completely independent set of overlays and panels. Exiting cleans up everything automatically.

curl -s --unix-socket $WSH_SOCK http://localhost/sessions/default/screen_mode                  # get current mode
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/screen_mode/enter_alt  # enter alt screen
curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/default/screen_mode/exit_alt   # exit alt screen

Overlays and panels are automatically tagged with the screen mode active at the time of creation. List endpoints return only elements belonging to the current mode. When you exit alt screen, all elements created in alt mode are deleted and the original screen's elements are restored.

Use alt screen mode for: temporary full-screen agent UIs, setup wizards, immersive dashboards — anything that needs a clean canvas and should leave no trace when done.

Session Management

wsh always runs as a server daemon managing sessions. The sessions endpoint is always available:

curl -s --unix-socket $WSH_SOCK http://localhost/sessions

Creating Sessions

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions \
  -H "Content-Type: application/json" \
  -d '{"name": "build", "command": "cargo build", "tags": ["build", "ci"]}'

Returns {"name": "build", "tags": ["build", "ci"]} on success.

Session name rules: Names must be 1-64 characters and contain only letters, digits, dots, hyphens, and underscores ([a-zA-Z0-9._-]). Invalid names return 400 invalid_session_name. If omitted, the server auto-generates a valid name.
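
Checking a name client-side avoids a round trip that ends in 400 invalid_session_name. A minimal sketch of the stated rule in POSIX shell follows. valid_session_name is a hypothetical helper, and the server remains the authority on what it accepts.

```shell
# Return 0 if $1 is 1-64 characters drawn from [a-zA-Z0-9._-], else 1.
# The body runs in a subshell, so exit only leaves the function and the
# LC_ALL change does not affect the caller.
valid_session_name() (
  LC_ALL=C   # make a-z and A-Z mean the ASCII ranges only
  case "$1" in
    "" | *[!a-zA-Z0-9._-]*) exit 1 ;;   # empty, or contains a disallowed character
  esac
  [ "${#1}" -le 64 ]
)
```

For example, valid_session_name build-v2 succeeds, while valid_session_name 'my session' fails because of the space.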

Terminal dimensions: rows and cols are clamped to 1-1000.

Tags are optional string labels (1-64 chars, alphanumeric plus hyphens, underscores, and dots). Use them to group and filter sessions by purpose.

Interacting with a Specific Session

All the primitives work per-session by adding /sessions/:name/ as a prefix:

curl -s -X POST --unix-socket $WSH_SOCK http://localhost/sessions/build/input -d $'cargo test\n'
curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/build/idle?timeout_ms=2000'
curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions/build/screen?format=plain'

Overlays, panels, and input capture are also per-session.

Filtering Sessions by Tag

curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions?tag=build,test'

Returns only sessions that have at least one of the specified tags (union/OR semantics).

Updating Tags

curl -s -X PATCH --unix-socket $WSH_SOCK http://localhost/sessions/build \
  -H "Content-Type: application/json" \
  -d '{"add_tags": ["production"], "remove_tags": ["draft"]}'

Tags can be added and removed alongside a rename in a single PATCH.

Wait for Idle on Any Session

You can race idle detection across all sessions (or a tag-filtered subset):

curl -s --unix-socket $WSH_SOCK 'http://localhost/idle?timeout_ms=2000&format=plain'

Returns the first session to become idle, including its name:

{"session": "build", "screen": {...}, "scrollback_lines": 42, "generation": 7}

To avoid re-returning the same session, pass last_session and last_generation from the previous response:

curl -s --unix-socket $WSH_SOCK 'http://localhost/idle?timeout_ms=2000&last_session=build&last_generation=7'

To scope idle detection to specific tags:

curl -s --unix-socket $WSH_SOCK 'http://localhost/idle?timeout_ms=2000&tag=build'

Returns 404 (no_sessions) if no sessions exist. Returns 408 if no session settles within max_wait_ms.
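
Because curl -s alone hides the status code, a caller has to capture it to tell these cases apart. The sketch below maps a status to a short action word. idle_action is a hypothetical helper, and treating 200 as the success status is an assumption; the 404 and 408 meanings are the ones stated above.

```shell
# Map an HTTP status from the idle endpoints to a short action word.
idle_action() {
  case "$1" in
    200) echo ready ;;        # assumed success status; body holds the snapshot
    404) echo no_sessions ;;  # nothing to wait on
    408) echo timeout ;;      # nothing settled within max_wait_ms
    *)   echo error ;;
  esac
}

# Usage (needs a running server and WSH_SOCK set): write the body to a
# file, capture the status with -w, then branch.
#   status=$(curl -s -o /tmp/idle.json -w '%{http_code}' \
#     --unix-socket "$WSH_SOCK" 'http://localhost/idle?timeout_ms=2000')
#   case "$(idle_action "$status")" in
#     ready)   cat /tmp/idle.json ;;
#     timeout) echo "still busy" ;;
#   esac
```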

Session Lifecycle

curl -s --unix-socket $WSH_SOCK http://localhost/sessions              # list all
curl -s --unix-socket $WSH_SOCK 'http://localhost/sessions?tag=build'  # list by tag
curl -s --unix-socket $WSH_SOCK http://localhost/sessions/build         # get info
curl -s -X PATCH --unix-socket $WSH_SOCK http://localhost/sessions/build \
  -H "Content-Type: application/json" \
  -d '{"name": "build-v2"}'                         # rename (same name rules apply; later requests must use the new name)
curl -s -X PATCH --unix-socket $WSH_SOCK http://localhost/sessions/build \
  -H "Content-Type: application/json" \
  -d '{"add_tags": ["ci"]}'                         # add tags
curl -s -X DELETE --unix-socket $WSH_SOCK http://localhost/sessions/build  # kill

Default Session

When wsh is started with wsh (no arguments), it auto-spawns an ephemeral server daemon (UDS-only, no TCP) and creates a session named default. Use /sessions/default/ prefix for all endpoints. If started with --name, the session has that name instead. Tags can be set at startup with --tag.

Federation

wsh supports federation — a cluster of wsh servers where one hub orchestrates multiple backends. Session operations accept an optional server parameter to target a specific server in the cluster by hostname.

When federation is configured:

  • Session creation can target a specific backend server
  • Session listing aggregates across all healthy backends
  • All session operations (input, screen, idle, overlays, etc.) are transparently routed to the server that owns the session
  • Server management operations let you list servers, check health, add/remove backends at runtime

Without federation, everything works as a single server — the server parameter is simply absent. Federation is opt-in and does not change the behavior of single-server deployments.

For detailed patterns on distributed session management, health monitoring, failure handling, and cross-server workflows, see the wsh:cluster-orchestration skill.

Specialized Skills

When your task matches one of these patterns, invoke the corresponding skill for detailed guidance.

wsh:drive-process — You need to run a CLI command and interact with it. Sending input, reading output, handling prompts, navigating sequential command-and-response workflows.

wsh:tui — You need to operate a full-screen terminal application like vim, htop, lazygit, or k9s. Reading a 2D grid, sending navigation keys, understanding menus and panes.

wsh:multi-session — You need to run multiple things in parallel. Spawning sessions, monitoring them, collecting results across sessions.

wsh:agent-orchestration — You need to drive another AI agent (Claude Code, Aider, etc.) through its terminal interface. Feeding tasks, handling approval prompts, reviewing agent output.

wsh:monitor — You need to watch what a human is doing and react. Subscribing to terminal events, detecting patterns, providing contextual assistance or auditing.

wsh:visual-feedback — You need to communicate with the human visually. Building overlay notifications, status panels, progress displays, contextual annotations.

wsh:input-capture — You need to take over keyboard input temporarily. Building approval workflows, custom menus, interactive dialogs.

wsh:generative-ui — You need to build a dynamic interactive experience in the terminal. Combining overlays, panels, input capture, direct drawing, and alternate screen mode to create bespoke interfaces on the fly.

wsh:cluster-orchestration — You need to manage sessions across multiple wsh servers. Distributing work across machines, monitoring backend health, handling server failures, coordinating cross-server workflows.

Repository
deepgram/wsh