
qzcli

Manage GPU compute jobs on the Qizhi (启智) platform using qzcli, a kubectl-style CLI tool. Use when the user says "qzcli", "启智平台" (Qizhi platform), "submit job", "stop job", "查计算组" (check compute groups), "avail", "list jobs", "batch submit", or needs to manage distributed training jobs on a Qizhi instance.

84

Quality: 82% (Does it follow best practices?)

Impact: Pending (no eval scenarios have been run)

Security (by Snyk): Risky. Do not use without reviewing.


qzcli: 启智平台任务管理 (Qizhi Platform Job Management)

A kubectl/docker-style CLI for managing GPU compute jobs on the Qizhi (启智) platform.

GitHub: tianyilt/qzcli_tool

Installation

pip install rich requests prompt_toolkit mcp
git clone https://github.com/tianyilt/qzcli_tool
cd qzcli_tool && pip install -e .

MCP Integration (optional)

To use qzcli as an MCP tool directly from Claude Code or Codex:

# Claude Code
claude mcp add qzcli -- qzcli-mcp

# Codex
codex mcp add qzcli -- qzcli-mcp
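Before registering the MCP server, it can help to confirm that both entry points installed by `pip install -e .` are on PATH (the "qzcli-mcp not found" row under Troubleshooting covers the failure case). A minimal sketch using only the shell's `command -v`; the function name `check_qzcli_install` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qzcli): check that the qzcli and qzcli-mcp
# commands used in this guide are on PATH. Returns non-zero if either is missing.
check_qzcli_install() {
  local cmd missing=0
  for cmd in qzcli qzcli-mcp; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found $cmd: $(command -v "$cmd")"
    else
      echo "missing $cmd (try: cd qzcli_tool && pip install -e .)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_qzcli_install && claude mcp add qzcli -- qzcli-mcp` to register the server only when both commands resolve.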

Configuration

Credentials are resolved in this priority order, highest first: CLI arguments > --password-stdin > environment variables > QZCLI_ENV_FILE (.env file) > ~/.qzcli/config.json > interactive prompt.

# Option A: env file (recommended)
mkdir -p ~/.qzcli
cat > ~/.qzcli/.env <<'EOF'
QZCLI_USERNAME="your_username"
QZCLI_PASSWORD="your_password"
EOF

# Option B: environment variables
export QZCLI_USERNAME="your_username"
export QZCLI_PASSWORD="your_password"
export QZCLI_API_URL="https://qz.yourorg.edu.cn"
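Because the env file holds a plaintext password, restricting its permissions is a sensible precaution. A minimal sketch, assuming the ~/.qzcli/.env path from Option A; the helper `write_qzcli_env` and its optional directory argument are hypothetical (the directory argument lets you try it on a scratch location):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qzcli): write the Option A env file with
# owner-only permissions. The password is read from stdin so it is never
# typed as a literal argument.
# Usage: printf '%s\n' "$PW" | write_qzcli_env USERNAME [CONFIG_DIR]
# Limitation: values containing double quotes are not escaped.
write_qzcli_env() {
  local user="$1" dir="${2:-$HOME/.qzcli}" pass
  IFS= read -r pass || return 1
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # A restrictive umask means the file is never group- or world-readable,
  # not even before the chmod below runs.
  (
    umask 077
    printf 'QZCLI_USERNAME="%s"\nQZCLI_PASSWORD="%s"\n' "$user" "$pass" > "$dir/.env"
  ) || return 1
  chmod 600 "$dir/.env"
}
```

Example: `read -rs -p 'Qizhi password: ' pw && printf '%s\n' "$pw" | write_qzcli_env your_username`.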

Config files are stored in ~/.qzcli/: config.json, .cookie, resources.json, jobs.json.
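To see which of the non-interactive credential sources is in effect on a given machine, a small diagnostic can walk the documented order. This is a sketch, not qzcli's own resolution logic. It assumes ~/.qzcli/.env is the env file used when QZCLI_ENV_FILE is unset, treats environment variables as in effect only when both are set, and cannot see CLI arguments or --password-stdin, which override everything it checks:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of qzcli): print the first credential
# source found, following env vars > QZCLI_ENV_FILE (.env) > config.json.
# ASSUMPTION: ~/.qzcli/.env is the env file when QZCLI_ENV_FILE is unset.
qzcli_cred_source() {
  local env_file="${QZCLI_ENV_FILE:-$HOME/.qzcli/.env}"
  if [ -n "${QZCLI_USERNAME:-}" ] && [ -n "${QZCLI_PASSWORD:-}" ]; then
    echo "env-vars"
  elif [ -f "$env_file" ]; then
    echo "env-file:$env_file"
  elif [ -f "$HOME/.qzcli/config.json" ]; then
    echo "config-json"
  else
    echo "interactive"   # nothing found: qzcli would prompt
  fi
}
```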


Quick Start

# 1. Login
qzcli login

# 2. Discover and cache workspaces/compute groups (run once, re-run after joining new workspaces)
qzcli res -u

# 3. Check available nodes
qzcli avail

# 4. List running jobs
qzcli ls -c -r

Authentication

# Interactive login
qzcli login

# With credentials
qzcli login -u YOUR_USERNAME -p 'YOUR_PASSWORD'

# Read password from stdin (for scripts)
echo 'YOUR_PASSWORD' | qzcli login -u YOUR_USERNAME --password-stdin
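For scripts that already have QZCLI_USERNAME and QZCLI_PASSWORD in the environment (Option B above), the --password-stdin form can be wrapped so the password is never written as a literal on the command line. A minimal sketch; the function name is hypothetical and only the documented `login -u ... --password-stdin` flags are used:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of qzcli): log in non-interactively from
# environment variables. printf is a shell builtin, so the password is passed
# through a pipe rather than appearing as a process argument.
qzcli_login_from_env() {
  if [ -z "${QZCLI_USERNAME:-}" ] || [ -z "${QZCLI_PASSWORD:-}" ]; then
    echo "QZCLI_USERNAME and QZCLI_PASSWORD must be set" >&2
    return 1
  fi
  printf '%s\n' "$QZCLI_PASSWORD" | qzcli login -u "$QZCLI_USERNAME" --password-stdin
}
```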

# Check current cookie
qzcli cookie --show

# Clear cookie
qzcli cookie --clear

Note: qzcli avail auto-refreshes the cookie if it expires and credentials are configured.


Resource Discovery

# List cached workspaces
qzcli res --list

# Refresh all workspace resource cache (run this first!)
qzcli res -u

# Refresh a specific workspace
qzcli res -w MY_WORKSPACE -u

# Set a human-readable alias for a workspace
qzcli res -w ws-xxxxxxxx --name "My Workspace"

Check Available Nodes

# All workspaces
qzcli avail

# Including low-priority task nodes (slower but more accurate)
qzcli avail --lp

# Specific workspace
qzcli avail -w MY_WORKSPACE

# Find compute groups with N free nodes
qzcli avail -n 4

# Export IDs for scripting
qzcli avail -n 4 -e

# Show idle node names
qzcli avail -w MY_WORKSPACE -v

Job Submission

Interactive (recommended for first-time use)

# Full interactive selection: workspace → project → compute group → spec
qzcli create -i

# Interactive for a specific workspace only
qzcli create -i -w "My Workspace"

The TUI shows GPU type, availability, and spec status at each level. Press Enter or → to go one level deeper, and the back key to return to the previous level.

Non-interactive

# Using names (resolved from qzcli res cache)
qzcli create \
  --name "my-training-job" \
  --command "bash /path/to/train.sh" \
  --workspace "My Workspace" \
  --compute-group "My Compute Group" \
  --image YOUR_REGISTRY/team/image:tag \
  --instances 4 \
  --priority 10

# Using IDs directly
qzcli create \
  --name "my-job" \
  --command "bash /path/to/train.sh" \
  --workspace ws-YOUR_WORKSPACE_ID \
  --compute-group lcg-YOUR_LCG_ID \
  --spec YOUR_SPEC_ID \
  --image YOUR_REGISTRY/team/image:tag \
  --instances 4

Key parameters:

| Parameter | Default | Description |
|---|---|---|
| --name / -n | required | Job name |
| --command / -c | required | Command to run |
| --workspace / -w | | Workspace name or ID (ws-...) |
| --compute-group / -g | auto | Compute group name or ID (lcg-...) |
| --spec / -s | auto | Resource spec ID |
| --image / -m | | Docker image |
| --instances | 1 | Number of instances |
| --shm | 1200 | Shared memory (GiB) |
| --priority | 10 | Priority (1–10) |
| --dry-run | | Preview only; don't submit |
| --json | | JSON output for scripting |

# Preview before submitting
qzcli create --name test --command "echo hi" --workspace "My Workspace" \
  --image YOUR_IMAGE --dry-run
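The --dry-run flag makes a natural pre-flight check. The sketch below previews a job and submits it only if the preview succeeds. The wrapper name is hypothetical, and it assumes (unverified) that `qzcli create` returns a non-zero exit status when the dry run fails:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of qzcli): run `qzcli create ... --dry-run`
# first and submit for real only if the preview succeeds.
# ASSUMPTION: `qzcli create --dry-run` exits non-zero for an invalid job spec.
qzcli_create_checked() {
  if ! qzcli create "$@" --dry-run; then
    echo "dry-run failed; job not submitted" >&2
    return 1
  fi
  qzcli create "$@"
}
```

Usage: `qzcli_create_checked --name test --command "echo hi" --workspace "My Workspace" --image YOUR_IMAGE` (same flags as `qzcli create`, without --dry-run).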

Env-var passthrough (for existing submission scripts)

# Pass vars directly — do NOT use "export VAR; bash script.sh"
WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" \
LCG_ID="lcg-YOUR_LCG_ID" \
SPEC_ID="YOUR_SPEC_ID" \
CHECKPOINT_DIR="/path/to/checkpoint" \
bash YOUR_SUBMIT_SCRIPT.sh
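The prefix form puts the variables into the environment of that single `bash` process only; they do not persist in your interactive shell, whereas `export` keeps them for every later command. That scoping is likely the point of the advice. A self-contained demonstration, using a hypothetical `fake_submit.sh` in place of YOUR_SUBMIT_SCRIPT.sh that only prints what it receives:

```shell
#!/usr/bin/env bash
# Demonstrates that VAR=value prefixes apply to one command only.
# fake_submit.sh is a hypothetical stand-in for a real submission script.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cat > "$demo_dir/fake_submit.sh" <<'EOF'
#!/usr/bin/env bash
echo "WORKSPACE_ID=${WORKSPACE_ID:-<unset>} LCG_ID=${LCG_ID:-<unset>}"
EOF

unset WORKSPACE_ID LCG_ID
# Prefix form: the child bash process sees both values.
# prints: WORKSPACE_ID=ws-YOUR_WORKSPACE_ID LCG_ID=lcg-YOUR_LCG_ID
WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" LCG_ID="lcg-YOUR_LCG_ID" \
  bash "$demo_dir/fake_submit.sh"
# The current shell does not keep them afterwards.
# prints: after: WORKSPACE_ID=<unset>
echo "after: WORKSPACE_ID=${WORKSPACE_ID:-<unset>}"
```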

HPC / CPU jobs (Slurm)

qzcli hpc \
  --name "my-cpu-job" \
  --workspace ws-YOUR_WORKSPACE_ID \
  --compute-group lcg-YOUR_LCG_ID \
  --predef-quota-id YOUR_QUOTA_ID \
  --cpu 55 --mem-gi 300 --instances 30 \
  --image YOUR_REGISTRY/team/cpu-image:tag \
  --entrypoint "cd /path/to/dir && bash run.sh"

Batch Submission

# Submit from config file
qzcli batch batch_config.json --delay 3

# Preview all jobs
qzcli batch batch_config.json --dry-run

# Continue on error
qzcli batch batch_config.json --continue-on-error

Config format (batch_config.json):

{
  "defaults": {
    "workspace": "ws-YOUR_WORKSPACE_ID",
    "compute_group": "lcg-YOUR_LCG_ID",
    "spec": "YOUR_SPEC_ID",
    "image": "YOUR_REGISTRY/team/image:tag",
    "instances": 4,
    "priority": 10
  },
  "matrix": {
    "checkpoint": ["/path/to/ckpt1", "/path/to/ckpt2"],
    "step": [50000, 100000]
  },
  "name_template": "eval-{checkpoint_basename}-step{step}",
  "command_template": "bash eval.sh --checkpoint {checkpoint} --step {step}"
}

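One job is generated for every combination of matrix values (their Cartesian product), so the example above yields 2 × 2 = 4 jobs. Append _basename to a key, e.g. {checkpoint_basename}, to substitute only the final path component of that value.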
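To sanity-check what a matrix expands to before running `qzcli batch batch_config.json --dry-run`, the example config can be unrolled by hand. This is an illustration in plain Bash, not qzcli's implementation: it assumes {checkpoint_basename} is what `basename` returns, and qzcli may enumerate the combinations in a different order.

```shell
#!/usr/bin/env bash
# Illustration only (not qzcli's own code): unroll the example matrix into the
# job names and commands it should produce.
# ASSUMPTION: {checkpoint_basename} is the last path component (what `basename`
# prints); qzcli may enumerate the combinations in a different order.
checkpoints=("/path/to/ckpt1" "/path/to/ckpt2")
steps=(50000 100000)

expand_example_matrix() {
  local checkpoint step checkpoint_basename
  # Nested loops form the Cartesian product: every checkpoint with every step.
  for checkpoint in "${checkpoints[@]}"; do
    checkpoint_basename=$(basename "$checkpoint")
    for step in "${steps[@]}"; do
      # Fill name_template and command_template for this combination.
      printf '%s\t%s\n' \
        "eval-${checkpoint_basename}-step${step}" \
        "bash eval.sh --checkpoint ${checkpoint} --step ${step}"
    done
  done
}

expand_example_matrix
```

For the config above this prints four tab-separated name/command pairs, from eval-ckpt1-step50000 to eval-ckpt2-step100000.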

Shell loop (alternative)

for step in 040000 050000 060000; do
  qzcli create \
    --name "eval-step${step}" \
    --command "bash eval.sh --step $step" \
    --workspace "My Workspace" \
    --compute-group "My Compute Group" \
    --instances 4
  sleep 3
done
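The loop above stops reporting nothing if a submission fails partway. A variant that keeps going and lists the failed steps at the end, similar in spirit to `qzcli batch --continue-on-error`, is sketched below. It assumes (unverified) that `qzcli create` exits non-zero on a failed submission; the function name and the SUBMIT_DELAY variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical variant of the shell loop above: keep submitting when one step
# fails and report the failed steps at the end.
# ASSUMPTION: `qzcli create` exits non-zero when a submission fails.
# SUBMIT_DELAY is a hypothetical variable; it defaults to the 3 s used above.
submit_eval_steps() {
  local step
  local failed=()
  for step in "$@"; do
    if ! qzcli create \
        --name "eval-step${step}" \
        --command "bash eval.sh --step ${step}" \
        --workspace "My Workspace" \
        --compute-group "My Compute Group" \
        --instances 4; then
      failed+=("$step")
    fi
    sleep "${SUBMIT_DELAY:-3}"   # space out submissions
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "failed steps: ${failed[*]}" >&2
    return 1
  fi
}
```

Usage: `submit_eval_steps 040000 050000 060000`.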

Job Management

# List jobs
qzcli ls -c -w MY_WORKSPACE          # specific workspace
qzcli ls -c --all-ws                 # all workspaces
qzcli ls -c -w MY_WORKSPACE -r       # running only
qzcli ls -c -w MY_WORKSPACE -n 50    # show 50

# Stop a job
qzcli stop JOB_ID

# Job status / details
qzcli status JOB_ID

# Watch all running jobs (refresh every 10s)
qzcli watch -i 10

# Workspace view with GPU utilization
qzcli ws
qzcli ws -a           # all projects
qzcli ws -p "My Project"
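`qzcli stop` takes a single JOB_ID. To stop several jobs at once, for example every job from an abandoned sweep, a loop is enough. This sketch assumes (unverified) that `qzcli stop` exits non-zero when it cannot stop a job; the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qzcli): call `qzcli stop` once per job ID
# and report any IDs that could not be stopped.
# ASSUMPTION: `qzcli stop` exits non-zero on failure.
qzcli_stop_many() {
  local id rc=0
  for id in "$@"; do
    if ! qzcli stop "$id"; then
      echo "could not stop $id" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `qzcli_stop_many JOB_ID_1 JOB_ID_2 JOB_ID_3`.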

Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| Cookie expired | Session gap | Re-run qzcli login |
| 未找到名称为 'xxx' 的工作空间 ("No workspace named 'xxx' found") | Stale cache | Run qzcli res -u |
| No resources in create -i | Cache empty | Run qzcli login && qzcli res -u |
| qzcli-mcp not found | Not installed | cd qzcli_tool && pip install -e . |
| Spec not in workspace | ID mismatch | Match the spec ID to the correct workspace |
| Silent job failure | Script calls sys.exit(0) | Check the job logs directly |
| zsh glob errors | Remote shell is zsh | Wrap commands in bash -c or use Python |

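For the "Stale cache" row above, the fix can be automated: run a qzcli command and, if it fails, refresh the resource cache with `qzcli res -u` and try once more. A minimal sketch; the helper name is hypothetical, and it assumes (unverified) that qzcli exits non-zero on errors such as an unknown workspace name:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qzcli): retry a qzcli command once after
# refreshing the workspace/compute-group cache.
# ASSUMPTION: qzcli exits non-zero when a name cannot be resolved.
qzcli_retry_with_refresh() {
  if qzcli "$@"; then
    return 0
  fi
  echo "command failed; refreshing resource cache and retrying" >&2
  qzcli res -u || return 1
  qzcli "$@"
}
```

Usage: `qzcli_retry_with_refresh avail -w "My Workspace"`.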
Repository: wanshuiyin/Auto-claude-code-research-in-sleep
