simon/cli

Private

Auto-generated tile from GitHub (107 skills)

Eval Details

Eval Run Status: Completed
Version: 1ff5bc820f52cb485a7f88fce01c16d2b3e48847
Agent: Claude
Model: Claude Sonnet 4.6

Score: 81% (agent success rate when using this plugin)
Improvement: 1.8x (agent success rate improvement when using this plugin compared to baseline)
Baseline: 45% (agent success rate without this plugin)
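
The page does not say how the headline figures are derived, but they are consistent with simple arithmetic. The sketch below assumes the improvement is the with-plugin score divided by the baseline, and that the overall score is the unweighted mean of the per-scenario scores listed under "Evaluation results". Both are assumptions, and the function and variable names are illustrative only.

```python
# Headline figures as reported on this page (percent).
score_with_plugin = 81
baseline = 45


def improvement_ratio(with_plugin: float, without_plugin: float) -> float:
    """Return how many times higher the with-plugin rate is than the baseline.

    Assumption: the page's "Improvement" figure is this simple ratio.
    """
    if without_plugin <= 0:
        raise ValueError("baseline must be positive")
    return with_plugin / without_plugin


ratio = improvement_ratio(score_with_plugin, baseline)
print(f"Improvement: {ratio:.1f}x")

# Per-scenario scores listed under "Evaluation results".
# Assumption: the overall score is their unweighted mean.
scenario_scores = [100, 82, 80, 82, 61]
mean_score = sum(scenario_scores) / len(scenario_scores)
print(f"Mean scenario score: {mean_score:.0f}%")
```

Under these assumptions the ratio rounds to the reported 1.8x and the mean matches the reported 81%.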

Evaluation results

Document Content Retrieval Script

Schema-first API inspection

Score: 100% (improvement: 70%)

| Criteria | Without context | With context |
| --- | --- | --- |
| Schema inspection step | 15% | 100% |
| Help command reference | 100% | 100% |
| Correct CLI resource syntax | 0% | 100% |
| Params flag used | 0% | 100% |
| Schema-driven flag construction | 13% | 100% |
| No secret output | 100% | 100% |

Automated Meeting Notes Updater

Plain text vs rich text appending

Score: 82%

| Criteria | Without context | With context |
| --- | --- | --- |
| +write for plain text | 100% | 100% |
| batchUpdate for table | 100% | 100% |
| Correct +write flags | 100% | 100% |
| batchUpdate uses --json | 100% | 100% |
| Reason for split documented | 100% | 100% |
| Confirmation before write | 0% | 0% |
| No hardcoded credentials | 100% | 100% |

Policy Document Revision Rollout

Batch update safety with dry-run

Score: 80% (improvement: 40%)

| Criteria | Without context | With context |
| --- | --- | --- |
| Dry-run preview pass | 22% | 100% |
| Dry-run is separate from live run | 100% | 100% |
| User confirmation before live run | 0% | 0% |
| Correct batchUpdate syntax | 0% | 100% |
| Loops over all IDs | 100% | 100% |
| Runbook dry-run explanation | 27% | 100% |
| No secrets exposed | 100% | 100% |

Document Inventory Export Tool

Output formatting and pagination

Score: 82% (improvement: 67%)

| Criteria | Without context | With context |
| --- | --- | --- |
| --format flag used | 0% | 100% |
| --page-all flag present | 0% | 100% |
| --output flag for file saving | 0% | 0% |
| FORMAT argument controls --format | 0% | 100% |
| Pagination flags documented | 0% | 100% |
| Output filename matches format | 100% | 100% |

Legal Document Processing Pipeline

Service account auth and PII screening

Score: 61% (improvement: 1%)

| Criteria | Without context | With context |
| --- | --- | --- |
| GOOGLE_APPLICATION_CREDENTIALS env var | 72% | 22% |
| No credential values output | 90% | 100% |
| --sanitize flag on get | 0% | 0% |
| Credential path as variable | 100% | 100% |
| Setup guide covers no-secret rule | 100% | 100% |
| Sanitize explained in setup guide | 16% | 100% |