Name: honeybadge/harbor
Rating: 79.2 (1 review)
Author: honeybadge


Framework for AI agent evaluation in containerized environments. Use when: (1) Running agent evaluations with `harbor run` against benchmarks (SWE-Bench, Terminal-Bench, Aider Polyglot, etc.), (2) Creating custom benchmark tasks with Dockerfile, instruction.md, solution, and tests, (3) Building adapters to convert existing benchmarks to Harbor format, (4) Implementing custom agents extending BaseAgent or BaseInstalledAgent, (5) Scaling evaluations to cloud providers (Daytona, Modal, E2B), (6) Exporting traces for RL/SFT training, (7) Debugging Harbor runs or inspecting package internals.
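Use case (2) names the components a custom task needs: a Dockerfile, instruction.md, a solution, and tests. The following minimal Python sketch, which is not part of Harbor, checks a task directory for those four names before you run an evaluation. `REQUIRED` and `missing_components` are hypothetical names. The sketch searches the whole directory tree because the exact layout is an assumption here (for example, the Dockerfile may sit in a subdirectory rather than at the task root).

```python
from pathlib import Path
from typing import List

# Hypothetical check list, taken only from the component names in the
# skill description. Harbor's real task layout may nest these differently.
REQUIRED = ("Dockerfile", "instruction.md", "solution", "tests")


def missing_components(task_dir: str) -> List[str]:
    """Return the required component names not found anywhere under task_dir."""
    root = Path(task_dir)
    # Collect the base name of every file and directory in the tree.
    found = {p.name for p in root.rglob("*")}
    return [name for name in REQUIRED if name not in found]
```

Calling `missing_components("path/to/my-task")` returns the names it could not find. An empty list means all four exist somewhere in the tree. Because the check matches names at any depth, it is deliberately loose and does not confirm the files are in the places Harbor expects.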

Quality: 99% (does it follow best practices?)

Impact: n/a (no eval scenarios have been run)

Security: Advisory. Suggest reviewing before use.

Security

1 medium-severity finding. This skill can be installed, but you should review this finding before use.

Medium

W011: Third-party content exposure detected (indirect prompt injection risk)

What this means

The skill exposes the agent to untrusted, user-generated content from public third-party sources, creating a risk of indirect prompt injection. This includes browsing arbitrary URLs, reading social media posts or forum comments, and analyzing content from unknown websites.

Why it was flagged

Third-party content exposure detected (high risk: 0.90). The skill's core workflow loads external benchmark datasets and task instructions from a registry or from user-provided sources: references/commands.md documents `harbor run -d <dataset@version>`, along with `--registry-url` and `harbor datasets list`. Adapters explicitly accept `--source-url` and load benchmark data from external repositories (references/adapters.md). Each task also includes an instruction.md (references/tasks.md) that the agent reads and acts on. As a result, untrusted, user-generated third-party content can be ingested and can materially influence the agent's actions.
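The finding centers on datasets selected with a `dataset@version` spec. One way to act on "review before use" is to make sure each spec names the exact version you reviewed. The following is a minimal Python sketch, not Harbor code: `parse_dataset_spec` and `is_pinned` are hypothetical helpers, and treating an unversioned spec as "not pinned" is an assumed local policy, not documented Harbor behavior.

```python
from typing import Optional, Tuple


def parse_dataset_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name@version' into (name, version); version is None if absent."""
    # partition splits on the first '@' only.
    name, sep, version = spec.partition("@")
    if not name or (sep and not version):
        raise ValueError(f"malformed dataset spec: {spec!r}")
    return name, (version if sep else None)


def is_pinned(spec: str) -> bool:
    """Hypothetical review policy: accept only specs with an explicit version."""
    return parse_dataset_spec(spec)[1] is not None
```

For an illustrative spec such as `my-bench@1.0`, `parse_dataset_spec` returns `("my-bench", "1.0")`, and `is_pinned` is true. For a bare `my-bench`, the version is `None`, so the sketch would flag it for review before the dataset is loaded.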


Audited: 3 months ago