pantheon-ai/bash-script-toolkit

Complete bash-script toolkit with generation and validation capabilities

Quality: 97% (Does it follow best practices?)

Impact: Pending (no eval scenarios have been run)

Security (by Snyk): Risky. Do not use without reviewing.


validator/evals/scenario-3/criteria.json

{
  "context": "Evaluate the agent's ability to identify shell security vulnerabilities and produce a hardened rewrite",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "eval $PLUGIN_CMD identified as command injection",
      "description": "Agent identifies that eval with an unvalidated variable allows arbitrary command execution; describes a concrete example attack (e.g., PLUGIN_CMD='rm -rf /')",
      "max_score": 25
    },
    {
      "name": "rm -rf with unquoted variable identified",
      "description": "Agent identifies that rm -rf /tmp/deploy_$DEPLOY_ENV is dangerous when DEPLOY_ENV is empty or contains spaces, potentially deleting /tmp/deploy_ or unintended paths; explains the data loss risk",
      "max_score": 20
    },
    {
      "name": "|| true error suppression anti-pattern identified",
      "description": "Agent identifies that check_prereqs || true in the validate function always returns success, hiding failures, and that return 0 compounds this by preventing callers from detecting errors",
      "max_score": 15
    },
    {
      "name": "cd without error handling identified",
      "description": "Agent identifies that cd $WORK_DIR without || exit (SC2164) means subsequent commands run in the wrong directory if cd fails",
      "max_score": 10
    },
    {
      "name": "Hardened script produced",
      "description": "Agent produces a corrected script that: removes eval and replaces it with a safe alternative (or a strict allowlist), quotes the rm -rf variable, removes the || true suppression, and adds cd || exit or cd || { ...; exit 1; }",
      "max_score": 20
    },
    {
      "name": "Attack vectors explained",
      "description": "For each security issue, agent describes a concrete attack scenario showing how the vulnerability could be exploited",
      "max_score": 10
    }
  ]
}
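
For orientation, the four fixes that the "Hardened script produced" item asks for might look roughly like the sketch below. The script under evaluation is not shown on this page, so everything beyond the names quoted in the checklist (PLUGIN_CMD, DEPLOY_ENV, WORK_DIR, check_prereqs, validate) is an assumption: the function names run_plugin, clean_deploy_dir and enter_work_dir, the DEPLOY_ROOT override, and the allowlisted plugin names are all hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only. Helper names, DEPLOY_ROOT, and the plugin
# allowlist are hypothetical; they do not come from the evaluated script.
set -euo pipefail

# Instead of `eval $PLUGIN_CMD`: accept only names from a strict allowlist.
run_plugin() {
  local name="${1:-}"
  case "$name" in
    lint|build|test) printf 'running plugin: %s\n' "$name" ;;
    *) printf 'refusing unknown plugin: %s\n' "$name" >&2; return 1 ;;
  esac
}

# Instead of `rm -rf /tmp/deploy_$DEPLOY_ENV`: reject empty or odd values,
# then quote the expansion so it is always a single, predictable path.
clean_deploy_dir() {
  local env="${1:-}"
  local root="${DEPLOY_ROOT:-/tmp}"   # hypothetical override, defaults to /tmp
  local re='^[A-Za-z0-9_-]+$'
  if [[ ! "$env" =~ $re ]]; then
    printf 'invalid DEPLOY_ENV: %q\n' "$env" >&2
    return 1
  fi
  rm -rf -- "${root}/deploy_${env}"
}

# Instead of a bare `cd $WORK_DIR` (SC2164): fail loudly if cd fails.
enter_work_dir() {
  local dir="${1:-}"
  cd -- "$dir" || { printf 'cannot cd to %s\n' "$dir" >&2; return 1; }
}

# Stand-in prerequisite check; the real one is not shown in the source.
check_prereqs() {
  command -v rm >/dev/null
}

# Instead of `check_prereqs || true` followed by `return 0`:
# let the real exit status reach the caller.
validate() {
  check_prereqs
}
```

A caller would then chain these so that any failure stops the run, for example `validate && enter_work_dir "$WORK_DIR" && clean_deploy_dir "$DEPLOY_ENV" || exit 1`. An allowlist is only one of the "safe alternative" options the checklist allows; dispatching to named functions or invoking a fixed command with arguments passed as an array would satisfy the same item.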
