core-mcp

REQUIRED before any wsh terminal operation. Contains the complete MCP tool reference and bootstrap sequence for wsh_create_session, wsh_send_input, wsh_get_screen, wsh_send_and_read, wsh_send_keys, and all wsh_* tools. Do NOT guess wsh CLI commands or HTTP endpoints — use MCP tools or load this skill first.

wsh: Terminal as a Service (MCP)

You have access to wsh via MCP tools that give you direct control over terminal sessions. You can see exactly what's on screen, send keystrokes, wait for commands to finish, and create visual elements — all through MCP tool calls.

Think of it this way: wsh gives you eyes (read the screen), hands (send input), patience (wait for output to settle), and a voice (overlays and panels to communicate with the human).

How It Works

wsh manages terminal sessions via a server daemon that listens on a Unix domain socket (by default at ${XDG_RUNTIME_DIR}/wsh/default.http.sock). MCP tools connect to this socket automatically. The human sees their normal terminal, and you interact with the same session through tool calls. Everything is synchronized: input you send appears on their screen, and output they generate appears in your tool responses. Tools that act on a session take a session parameter naming the session to operate on (e.g., "default").
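If you ever need to confirm where that socket would live, the default path can be derived from the environment. The following is a small Python sketch; the function name is illustrative, and since the behaviour when XDG_RUNTIME_DIR is unset is not described here, the sketch simply returns None in that case.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional


def default_socket_path(env: Mapping[str, str]) -> Optional[str]:
    """Build ${XDG_RUNTIME_DIR}/wsh/default.http.sock from an environment mapping."""
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if not runtime_dir:
        # The fallback location is not documented here, so report "unknown".
        return None
    return str(PurePosixPath(runtime_dir) / "wsh" / "default.http.sock")
```

Pass os.environ as env to check the real environment. This is only a diagnostic aid: MCP tools find the socket on their own.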

Getting Started

Step 1: Verify you have MCP tools. Check your toolkit for tools starting with wsh_ (e.g., wsh_send_input, wsh_get_screen, wsh_create_session). If you have them, proceed to step 2.

If you do NOT have wsh_* tools, the MCP bridge is not running. Do NOT try to start a server manually — that won't give you MCP tools. Instead, fall back to the HTTP API: search your workspace for skills/core/ and read SKILL.md. It has every endpoint with working curl examples and a bootstrap sequence.

Step 2: Create a session. Sessions are where commands run:

wsh_create_session(name="work")

Returns the session name and terminal dimensions on success.

Step 3: Use the send/wait/read loop. The primary tool:

wsh_send_and_read(session="work", keys=[{"text": "ls -la"}, {"key": "enter"}], format="plain")

This sends keystrokes, waits for idle, and returns the screen in one call. For more control, use wsh_send_input, wsh_await_idle, and wsh_get_screen separately.

Authentication

MCP tools connect locally via Unix domain socket, so no authentication is needed for local use. Authentication only matters when wsh is configured with TCP HTTP (via --bind) for remote access across the network. For remote TCP access, a Bearer token is required on all requests. The token is auto-generated on startup or set via --token / WSH_TOKEN.

The Fundamental Loop

Almost everything you do with wsh follows this pattern:

Send — inject input into the terminal
Wait — let the command run until output settles
Read — see what's on screen now
Decide — based on what you see, choose what to do next

This is your heartbeat. Learn it. A drive-process interaction is just this loop repeated until the task is done.
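The loop can be sketched in Python as follows. This is an illustration under assumptions, not wsh's API: call_tool is a hypothetical stand-in for however your agent runtime invokes an MCP tool, and the "screen" field name in the result is assumed, so adjust both to what your runtime actually returns.

```python
from typing import Callable, Optional

# Hypothetical signature: call_tool(tool_name, **params) -> result dict.
ToolCaller = Callable[..., dict]
# decide(screen) returns the next keys to send, or None when the task is done.
Decider = Callable[[str], Optional[list]]


def run_loop(call_tool: ToolCaller, session: str, first_keys: list,
             decide: Decider, max_rounds: int = 10) -> str:
    """Repeat send/wait/read/decide until decide() returns None."""
    keys = first_keys
    screen = ""
    for _ in range(max_rounds):
        # Send + Wait + Read happen in a single wsh_send_and_read call.
        result = call_tool("wsh_send_and_read", session=session,
                           keys=keys, format="plain")
        screen = result.get("screen", "")  # assumed field name
        # Decide: pick the next input based on what is on screen now.
        keys = decide(screen)
        if keys is None:
            break
    return screen
```

The max_rounds cap is a design choice in this sketch so that a prompt you misread cannot keep the loop running forever.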

MCP Tools

These are the building blocks. Every specialized skill builds on these.

Send and Read (Send + Wait + Read)

The primary tool for the send/wait/read loop. Sends keystrokes, waits for idle, then returns the screen contents.

Use wsh_send_and_read with:

session — target session name (e.g., "default")
keys — array of key actions (see Send Keys below)
timeout_ms — idle timeout (default 2000)
max_wait_ms — maximum wall-clock wait (default 30000)
format — "plain" or "styled" (default "styled")

Example: run ls -la and read the result:

wsh_send_and_read(session="default", keys=[{"text": "ls -la"}, {"key": "enter"}], format="plain")

Returns the screen contents plus a generation counter. If the terminal doesn't settle within max_wait_ms, the screen is still returned but flagged as an error.

Send Keys

Inject keystrokes into the terminal using named keys. No encoding to get wrong — use key names instead of escape sequences.

Use wsh_send_keys with:

session — target session name
keys — array of key actions

Each element in keys is either:

{"text": "..."} — literal characters to type
{"key": "..."} — a named special key

Named keys: enter, tab, escape, backspace, delete, up, down, left, right, home, end, pageup, pagedown, ctrl+a through ctrl+z, f1-f12. Case-insensitive.

Examples:

wsh_send_keys(session="default", keys=[{"text": "ls -la"}, {"key": "enter"}])
wsh_send_keys(session="default", keys=[{"key": "ctrl+c"}])
wsh_send_keys(session="default", keys=[{"key": "escape"}, {"text": ":wq"}, {"key": "enter"}])

Returns {"status": "sent", "bytes": N, "generation": G} on success. The generation counter is the activity state before input was submitted — pass it to wsh_await_idle for correct sequencing.
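If you build keys arrays programmatically, a small helper can catch misspelled key names before they reach the tool. The sketch below is in Python; the helper names (key, text, command) are illustrative and not part of wsh. The set of names comes from the list above.

```python
import string

# Named keys listed above: 13 basic keys, ctrl+a..ctrl+z, and f1..f12.
NAMED_KEYS = (
    {"enter", "tab", "escape", "backspace", "delete", "up", "down", "left",
     "right", "home", "end", "pageup", "pagedown"}
    | {f"ctrl+{letter}" for letter in string.ascii_lowercase}
    | {f"f{n}" for n in range(1, 13)}
)


def key(name: str) -> dict:
    """Return a named-key action, normalizing case (names are case-insensitive)."""
    normalized = name.lower()
    if normalized not in NAMED_KEYS:
        raise ValueError(f"unknown key name: {name!r}")
    return {"key": normalized}


def text(chars: str) -> dict:
    """Return a literal-text action."""
    return {"text": chars}


def command(line: str) -> list:
    """Type a shell command and press Enter."""
    return [text(line), key("enter")]
```

For example, command("ls -la") produces the same keys array as the first example above, and [key("escape"), text(":wq"), key("enter")] matches the third.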

Send Input (Low-Level)

Raw byte injection for advanced use. Prefer wsh_send_keys for most input — it handles encoding automatically.

Use wsh_send_input with:

session — target session name
input — the text or data to send (JSON string encoding)
encoding — "utf8" (default) or "base64"

Returns {"status": "sent", "bytes": N, "preview": "...", "generation": G}. The generation counter is the activity state before input was submitted — pass it to wsh_await_idle for correct sequencing. Includes a warning field if the input looks empty or double-escaped.

Base64 encoding bypasses any MCP transport issues with control characters:

wsh_send_input(session="default", input="Aw==", encoding="base64") — Ctrl+C
wsh_send_input(session="default", input="Cg==", encoding="base64") — Enter

<details> <summary>JSON escape reference (for utf8 encoding)</summary>

| Key | JSON escape | Example |
|-----|-------------|---------|
| Enter | `\n` | `input="ls -la\n"` |
| Tab | `\t` | `input="\t"` |
| Ctrl+C | `\u0003` | `input="\u0003"` |
| Ctrl+D | `\u0004` | `input="\u0004"` |
| Escape | `\u001b` | `input="\u001b"` |

Any Ctrl+key = \u00XX where XX is the control code in hexadecimal, i.e. the letter's position in the alphabet (A=01, B=02, ..., Z=1a).

</details>
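Both encodings can be derived rather than memorized. The Python sketch below computes the control byte for Ctrl+letter and renders it either as base64 or as a JSON string literal; the function names are illustrative and not part of wsh.

```python
import base64
import json


def ctrl_byte(letter: str) -> bytes:
    """Return the control byte for Ctrl+<letter>: A -> 0x01, ..., Z -> 0x1a."""
    if len(letter) != 1 or not letter.isascii() or not letter.isalpha():
        raise ValueError("expected a single ASCII letter")
    return bytes([ord(letter.upper()) - ord("A") + 1])


def as_base64(raw: bytes) -> str:
    """Encode raw bytes for use with encoding="base64"."""
    return base64.b64encode(raw).decode("ascii")


def as_json_string(raw: bytes) -> str:
    """Render raw bytes as a JSON string literal for encoding="utf8"."""
    return json.dumps(raw.decode("utf-8"))
```

as_base64(ctrl_byte("c")) gives "Aw==" and as_base64(b"\n") gives "Cg==", matching the two base64 examples above.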

Wait for Idle

Block until the terminal has been idle for timeout_ms milliseconds. This is a hint that the program may be idle — it could also just be working without producing output.

Use wsh_await_idle with:

session — target session name
timeout_ms — idle duration to wait for (default 2000)
max_wait_ms — maximum wall-clock wait (default 30000)

Returns {"status": "idle", "generation": N} once idle. Returns an error result if the terminal doesn't settle within max_wait_ms.
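To see how timeout_ms and max_wait_ms interact, here is a toy model in Python. It is not wsh's implementation: it assumes the quiet period is measured from the start of the wait or from the most recent output, whichever is later, and it takes output timestamps (milliseconds since the wait began) as input.

```python
def simulate_await_idle(output_times_ms, timeout_ms=2000, max_wait_ms=30000):
    """Return ("idle", t) when a quiet gap of timeout_ms ends at time t,
    or ("timeout", max_wait_ms) if that never happens before the deadline."""
    last_activity = 0
    for t in sorted(output_times_ms):
        if t - last_activity >= timeout_ms:
            # The terminal was already quiet long enough before this output.
            break
        last_activity = t
    settled_at = last_activity + timeout_ms
    if settled_at <= max_wait_ms:
        return ("idle", settled_at)
    return ("timeout", max_wait_ms)
```

In this model, output at 500, 1000, and 1500 ms settles at 3500 ms, while a program that prints once per second never settles and hits the 30000 ms cap. A silent but busy program looks idle after 2000 ms, which is why idle is only a hint.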

Read the Screen

Get the current visible screen contents.

Use wsh_get_screen with:

session — target session name
format — "plain" for simple text or "styled" for spans with color/formatting (default "styled")

Read Scrollback

Get historical output that has scrolled off screen.

Use wsh_get_scrollback with:

session — target session name
offset — line offset into scrollback (default 0)
limit — max lines to return (default 100)
format — "plain" or "styled" (default "styled")
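With offset and limit you can page through long scrollback. The Python sketch below assumes a hypothetical fetch_page(offset, limit) callable that wraps wsh_get_scrollback and returns a list of lines; the real response shape and the ordering of lines are not specified here, so treat both as assumptions.

```python
def read_all_scrollback(fetch_page, page_size=100, max_lines=10_000):
    """Collect scrollback page by page until a short page signals the end."""
    lines = []
    offset = 0
    while len(lines) < max_lines:
        page = fetch_page(offset=offset, limit=page_size)
        lines.extend(page)
        if len(page) < page_size:
            # Fewer lines than requested: nothing more to fetch.
            break
        offset += len(page)
    return lines[:max_lines]
```

The max_lines cap is a safety choice in this sketch so that a very long history does not flood your context.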

Visual Elements

Overlays

Floating text positioned on top of terminal content. They don't affect the terminal — they're a layer on top.

Use wsh_overlay to create, update, or list overlays:

Create an overlay:

wsh_overlay(
  session="default",
  x=0, y=0, width=20, height=1,
  spans=[{"text": "Hello!", "bold": true}]
)

Returns {"status": "created", "id": "uuid"} — use this ID to update or delete.

Update an overlay (provide id):

wsh_overlay(
  session="default",
  id="<overlay-id>",
  spans=[{"text": "Updated!", "fg": "green"}]
)

List overlays:

wsh_overlay(session="default", list=true)

Opaque overlays: Add background to fill the rectangle with a solid color, making it a window-like element:

wsh_overlay(
  session="default",
  x=10, y=5, width=40, height=10,
  background={"bg": "black"},
  spans=[{"text": "Window content"}]
)

The bg value accepts a named color ("blue") or an RGB object ({"r": 30, "g": 30, "b": 30}).
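A window-like overlay usually looks best centered. The Python sketch below computes wsh_overlay parameters for a centered opaque overlay from the terminal dimensions. It assumes x and y are zero-based cell offsets from the top-left corner, which the x=0, y=0 example suggests but does not state; the function name is illustrative.

```python
def centered_overlay(rows, cols, width, height, text, bg="black"):
    """Return overlay parameters for an opaque box centered in a rows x cols terminal."""
    # Never ask for a box larger than the terminal itself.
    width = min(width, cols)
    height = min(height, rows)
    return {
        "x": (cols - width) // 2,
        "y": (rows - height) // 2,
        "width": width,
        "height": height,
        "background": {"bg": bg},
        "spans": [{"text": text}],
    }
```

On a 24x80 terminal, a 40x10 box lands at x=20, y=7. Add session (and any other fields) before passing the result to wsh_overlay.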

Focusable: Add focusable=true to allow focus routing during input capture (see Input Capture below).

Use wsh_remove_overlay to remove overlays:

With id — remove a specific overlay
Without id — clear all overlays

Use overlays for: tooltips, status indicators, annotations, notifications — anything that should appear on top of the terminal without disrupting it. With explicit dimensions: windows, dialogs, cards.

Panels

Agent-owned screen regions at the top or bottom of the terminal. Unlike overlays, panels shrink the PTY — they carve out dedicated space.

Use wsh_panel to create, update, or list panels:

Create a panel:

wsh_panel(
  session="default",
  position="bottom", height=3,
  spans=[{"text": "Status: running"}]
)

Update a panel (provide id):

wsh_panel(
  session="default",
  id="<panel-id>",
  spans=[{"text": "Status: done", "fg": "green"}]
)

List panels:

wsh_panel(session="default", list=true)

Background: Add background to fill the panel with a solid color:

wsh_panel(
  session="default",
  position="bottom", height=2,
  background={"bg": "blue"},
  spans=[{"text": "Status: ok"}]
)

Focusable: Add focusable=true to allow focus routing during input capture.

Use wsh_remove_panel to remove panels:

With id — remove a specific panel
Without id — clear all panels

Use panels for: persistent status bars, progress displays, context summaries — anything that deserves its own screen real estate.

Input Capture

Intercept keyboard input so it comes to you instead of the shell.

Use wsh_input_mode to query or change input mode and focus:

mode="capture" — grab input (keystrokes go to API only)
mode="release" — release back (keystrokes go to PTY)
focus="<element-id>" — direct captured input to a specific focusable overlay or panel
unfocus=true — clear focus
No mode/focus params — query current state

The human can press Ctrl+\ to toggle capture mode (it switches between passthrough and capture).

Focus is automatically cleared when input is released or when the focused element is deleted.

Use input capture for: approval prompts, custom menus, interactive dialogs between you and the human.

Alternate Screen Mode

Enter a separate screen mode where you can create a completely independent set of overlays and panels. Exiting cleans up everything automatically.

Use wsh_screen_mode to query or change screen mode:

action="enter_alt" — enter alternate screen mode
action="exit_alt" — exit alternate screen mode
No action — query current mode ("normal" or "alt")

Overlays and panels are automatically tagged with the screen mode active at the time of creation. When you exit alt screen, all elements created in alt mode are deleted and the original screen's elements are restored.
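The tagging rule can be pictured with a toy model in Python. This is a mental model only, not wsh's implementation: element IDs are plain strings here, and the class and method names are made up for the illustration.

```python
class ScreenModel:
    """Toy model: each element is tagged with the mode active when it was created."""

    def __init__(self):
        self.mode = "normal"
        self.elements = {}  # element id -> mode tag

    def create(self, element_id):
        self.elements[element_id] = self.mode

    def visible(self):
        # Only elements tagged with the current mode are shown.
        return sorted(i for i, tag in self.elements.items() if tag == self.mode)

    def enter_alt(self):
        self.mode = "alt"

    def exit_alt(self):
        # Elements created in alt mode are deleted; normal-mode ones reappear.
        self.elements = {i: tag for i, tag in self.elements.items() if tag != "alt"}
        self.mode = "normal"
```

Creating "status" in normal mode, entering alt, creating "wizard", and exiting again leaves only "status" behind.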

Use alt screen mode for: temporary full-screen agent UIs, setup wizards, immersive dashboards — anything that needs a clean canvas and should leave no trace when done.

Session Management

wsh always runs as a server daemon managing sessions. Use these tools to manage session lifecycle:

List Sessions

wsh_list_sessions()                          # list all
wsh_list_sessions(session="build")           # get details for one
wsh_list_sessions(tag=["build", "ci"])        # filter by tags

Create Sessions

wsh_create_session(name="build", command="cargo build", tags=["build", "ci"])

Optional parameters: name, command, rows, cols, cwd, env, tags. Returns {"name": "build", "rows": 24, "cols": 80, "tags": ["build", "ci"]}.

Session name rules: Names must be 1-64 characters and contain only letters, digits, dots, hyphens, and underscores ([a-zA-Z0-9._-]). Invalid names return an invalid_session_name error. If omitted, the server auto-generates a valid name.

Terminal dimensions: rows and cols are clamped to 1-1000.

Tags are optional string labels (1-64 chars, alphanumeric plus hyphens, underscores, and dots). Use them to group and filter sessions by purpose.
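You can check names, tags, and dimensions locally before calling wsh_create_session. The Python sketch below encodes the rules stated above; the function names are illustrative, and the server remains the authority on what it accepts.

```python
import re

# 1-64 characters from letters, digits, dots, hyphens, and underscores.
NAME_RE = re.compile(r"[A-Za-z0-9._-]{1,64}")


def is_valid_name(name: str) -> bool:
    """Check a session name against the stated rules."""
    return NAME_RE.fullmatch(name) is not None


def is_valid_tag(tag: str) -> bool:
    """Tags use the same length limit and character set as names."""
    return NAME_RE.fullmatch(tag) is not None


def clamp_dimension(value: int) -> int:
    """Mirror the 1-1000 clamping applied to rows and cols."""
    return max(1, min(1000, value))
```

For instance, "build-v2" passes, while "my session" (contains a space) and a 65-character name do not.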

Manage Sessions

wsh_manage_session(session="build", action="kill")            # destroy
wsh_manage_session(session="build", action="rename", new_name="build-v2")  # rename (same name rules apply)
wsh_manage_session(session="build", action="detach")          # disconnect clients
wsh_manage_session(session="build", action="add_tags", tags=["production"])  # add tags
wsh_manage_session(session="build", action="remove_tags", tags=["draft"])    # remove tags

Default Session

When wsh is started with no arguments (just wsh), it auto-spawns an ephemeral server daemon that listens only on a Unix domain socket and creates a session named default. In that case, use session="default" for all tool calls. If it was started with --name, the session has that name instead.

Federation (Multi-Server)

When wsh is configured with federated backends, all session tools accept an optional server parameter to target a specific backend by hostname. When omitted, operations target the local server (or aggregate across all servers for listings).

Server Parameter on Session Tools

Add server="<hostname>" to any session tool to route it to a specific backend:

wsh_create_session(name="build", server="prod-1", command="cargo build")
wsh_get_screen(session="build", server="prod-1")
wsh_list_sessions(server="prod-1")       # list sessions on one backend
wsh_list_sessions()                       # aggregate across all servers

Session responses include a server field indicating which server owns the session. Once a session exists, all operations are automatically routed to the correct server.

Server Management Tools

Use these tools to discover and manage the cluster:

List all servers:

wsh_list_servers()

Returns all servers (hub + backends) with hostname, address, health (healthy, connecting, unavailable), and role.

Add a backend:

wsh_add_server(address="http://10.0.1.10:8080")
wsh_add_server(address="https://10.0.1.11:8443", token="secret")

Addresses require http:// or https:// scheme. The backend starts in connecting state and transitions to healthy once reachable.
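A quick local check can catch a missing scheme before you call wsh_add_server. The Python sketch below uses the standard library's urlsplit; the function name is illustrative, and the host check is an extra sanity test added by this sketch rather than a documented wsh rule.

```python
from urllib.parse import urlsplit


def check_backend_address(address: str) -> str:
    """Return the address unchanged if it has an http:// or https:// scheme and a host."""
    parts = urlsplit(address)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"address must start with http:// or https://: {address!r}")
    if not parts.hostname:
        # Not a stated wsh rule; a scheme with no host cannot be reached anyway.
        raise ValueError(f"address has no host: {address!r}")
    return address
```

A bare "10.0.1.10:8080" is rejected, while "http://10.0.1.10:8080" passes through unchanged.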

Check a specific server:

wsh_server_status(hostname="prod-1")

Remove a backend:

wsh_remove_server(hostname="prod-1")

For detailed federation patterns (cross-server workflows, failure handling, distributed quiescence), invoke wsh:cluster-orchestration.

Specialized Skills

When your task matches one of these patterns, invoke the corresponding skill for detailed guidance.

wsh:drive-process — You need to run a CLI command and interact with it. Sending input, reading output, handling prompts, navigating sequential command-and-response workflows.

wsh:tui — You need to operate a full-screen terminal application like vim, htop, lazygit, or k9s. Reading a 2D grid, sending navigation keys, understanding menus and panes.

wsh:multi-session — You need to run multiple things in parallel. Spawning sessions, monitoring them, collecting results across sessions.

wsh:agent-orchestration — You need to drive another AI agent (Claude Code, Aider, etc.) through its terminal interface. Feeding tasks, handling approval prompts, reviewing agent output.

wsh:monitor — You need to watch what a human is doing and react. Subscribing to terminal events, detecting patterns, providing contextual assistance or auditing.

wsh:visual-feedback — You need to communicate with the human visually. Building overlay notifications, status panels, progress displays, contextual annotations.

wsh:input-capture — You need to take over keyboard input temporarily. Building approval workflows, custom menus, interactive dialogs.

wsh:generative-ui — You need to build a dynamic interactive experience in the terminal. Combining overlays, panels, input capture, direct drawing, and alternate screen mode to create bespoke interfaces on the fly.

Repository: deepgram/wsh
Commit: 4863aaf

Last updated: 10 days ago
Created: 10 days ago
