tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)

Impact: 88% (1.07x), average score across 24 eval scenarios

Security (by Snyk): Passed, no known issues

evals/scenario-23/task.md

Commit Selection for Eval Scenario Generation

Problem Description

You are setting up an evaluation pipeline for a Tessl tile. You need to select commits from recent git history that will produce challenging, useful eval scenarios. Trivially simple commits produce tasks that agents already solve with a 100% baseline score, which makes them worthless as evaluation datapoints.

Below are 7 recent commits from the acme/platform-api repository. Each includes the commit message, diff stat, and a summary of the actual changes. Evaluate each commit against the selection criteria below for generating eval scenarios.
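Each diff stat follows the per-file format of `git diff --stat`: a path, a `|`, and the number of changed lines for that file (insertions plus deletions), followed by a +/- bar that git may scale down for large changes, so the bar is not reliable for counting. If you want to apply the line-count gates mechanically, a small parser like the one below is one option. It is a sketch, not part of any Tessl tooling: `FileStat`, `parse_diffstat`, and `_resolve_rename` are hypothetical names, binary-file lines are skipped, and renamed files keep only their new path.

```python
import re
from dataclasses import dataclass

# Matches a per-file stat line such as "src/routes/payments.ts | 68 ++++++".
# The summary line ("2 files changed, ...") has no "|" and is skipped, as are
# binary lines ("img.png | Bin 0 -> 12 bytes") because no digit follows "|".
_STAT_LINE = re.compile(r"^\s*(?P<path>.+?)\s*\|\s*(?P<changed>\d+)")
_BRACE_RENAME = re.compile(r"^(.*)\{(.*) => (.*)\}(.*)$")


@dataclass
class FileStat:
    path: str     # destination path (for renames, the new name)
    changed: int  # insertions + deletions for this file


def _resolve_rename(path: str) -> str:
    """Keep only the new name from 'old => new' or 'dir/{old => new}' forms."""
    brace = _BRACE_RENAME.match(path)
    if brace:
        prefix, _old, new, suffix = brace.groups()
        return (prefix + new + suffix).replace("//", "/")
    if " => " in path:
        return path.split(" => ", 1)[1]
    return path


def parse_diffstat(text: str) -> list[FileStat]:
    """Parse the per-file lines of `git diff --stat` output."""
    stats = []
    for line in text.splitlines():
        m = _STAT_LINE.match(line)
        if not m:
            continue
        path = _resolve_rename(m.group("path")).strip()
        stats.append(FileStat(path=path, changed=int(m.group("changed"))))
    return stats
```

A pure rename reports `0` changed lines, so it contributes nothing to the source-line total.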

Selection Criteria

Hard-skip gates (reject immediately if ANY apply):

  • Fewer than 3 source files changed
  • Fewer than 50 lines of source code changed (exclude test files, fixtures, and generated code)
  • Only documentation files changed (README, docs/, etc.)
  • Only configuration/build files changed (package.json, tsconfig, CI configs)
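The four gates can be checked mechanically once each changed file is bucketed as source, test/fixture/generated, documentation, or configuration/build. The sketch below assumes simple path-based heuristics: the `CONFIG_NAMES` and `TEST_DIRS` sets, the `.test.`/`.spec.` naming convention, and treating lock files as configuration are assumptions, not rules stated in the task. `FileChange`, `classify`, and `hard_skip_reasons` are hypothetical names. It collects every gate a commit trips rather than stopping at the first, which makes the rejection write-up easier.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileChange:
    path: str
    changed: int  # insertions + deletions for this file


# Assumed path heuristics; adjust them to the repository's layout.
DOC_SUFFIXES = {".md", ".rst", ".adoc"}
CONFIG_NAMES = {"package.json", "package-lock.json", "yarn.lock",
                "pnpm-lock.yaml", ".gitlab-ci.yml", "Dockerfile"}
TEST_DIRS = {"test", "tests", "__tests__", "fixtures", "generated"}


def classify(path: str) -> str:
    """Bucket a path as 'doc', 'config', 'test_or_generated', or 'source'."""
    p = PurePosixPath(path)
    parts, name = set(p.parts), p.name
    if "docs" in parts or p.suffix.lower() in DOC_SUFFIXES:
        return "doc"
    if name in CONFIG_NAMES or name.startswith("tsconfig") or ".github" in parts:
        return "config"
    if parts & TEST_DIRS or ".test." in name or ".spec." in name:
        return "test_or_generated"
    return "source"


def hard_skip_reasons(files: list[FileChange]) -> list[str]:
    """Return every hard-skip gate the commit trips; an empty list means it passes."""
    kinds = [classify(f.path) for f in files]
    source = [f for f, kind in zip(files, kinds) if kind == "source"]
    source_lines = sum(f.changed for f in source)  # tests/fixtures/generated excluded
    reasons = []
    if len(source) < 3:
        reasons.append(f"only {len(source)} source file(s) changed (< 3)")
    if source_lines < 50:
        reasons.append(f"only {source_lines} source line(s) changed (< 50)")
    if files and all(kind == "doc" for kind in kinds):
        reasons.append("documentation-only change")
    if files and all(kind == "config" for kind in kinds):
        reasons.append("configuration/build-only change")
    return reasons
```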

Complexity signals (score 1 point each for surviving commits):

  1. New abstractions — introduces new interfaces, base classes, or patterns
  2. Cross-cutting scope — changes span multiple architectural layers
  3. Wiring and registration — connects components, configures dependency injection
  4. Non-obvious control flow — async flows, error propagation, state machines
  5. Domain-specific logic — business rules, validation, calculations
  6. Multiple interdependent changes — changes that only work together
  7. No single-point solution — requires understanding broader context to solve

Recommend commits scoring 5+/7 as good eval candidates.
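Deciding whether a signal applies is a judgment call made from the diff and the change summary; only the tally and the 5+/7 cutoff are mechanical. A minimal sketch of that bookkeeping, assuming the shorthand signal names below (invented here) and a dict keyed by commit hash that holds only commits which already passed the hard-skip gates:

```python
# Shorthand for the seven complexity signals, in the order the task lists them.
SIGNALS = (
    "new_abstractions",
    "cross_cutting_scope",
    "wiring_and_registration",
    "non_obvious_control_flow",
    "domain_specific_logic",
    "interdependent_changes",
    "no_single_point_solution",
)
THRESHOLD = 5  # recommend commits scoring 5+/7


def score(signals: set[str]) -> int:
    """One point per signal judged present; reject misspelled signal names."""
    unknown = signals - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return len(signals)


def recommend(candidates: dict[str, set[str]],
              threshold: int = THRESHOLD) -> list[tuple[str, int]]:
    """Return (commit, score) pairs at or above the threshold, highest first."""
    scored = [(commit, score(signals)) for commit, signals in candidates.items()]
    picks = [(commit, n) for commit, n in scored if n >= threshold]
    return sorted(picks, key=lambda pair: pair[1], reverse=True)
```

The sort only orders candidates by score; if more than three clear the threshold, you still have to choose among them to stay within the 2-3 target.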

What to produce

Write a commit-analysis.md file that:

  1. Applies the hard-skip gates above to filter out commits that should be immediately rejected
  2. For commits that pass the gates, scores them on the 7 complexity signals above (1 point each)
  3. Recommends which commits should be selected for scenario generation (aim for 2-3)
  4. Explains why each recommended commit would produce a good eval scenario
  5. Explains why each rejected commit would NOT produce a good eval scenario

Commits

=============== FILE: commits/commit-1.txt ===============
commit a1f3e7c
Author: dev1
Date: Mon Mar 2

Rename utils.py to helpers.py

src/utils.py => src/helpers.py | 0
1 file changed, 0 insertions(+), 0 deletions(-)
=============== END FILE ===============

=============== FILE: commits/commit-2.txt ===============
commit b4d8a2e
Author: dev2
Date: Tue Mar 3

Update README with new API examples and fix typos in CONTRIBUTING.md

README.md | 45 +++++++++++++++++++++++++++++++-
CONTRIBUTING.md | 12 ++++-----
2 files changed, 50 insertions(+), 7 deletions(-)
=============== END FILE ===============

=============== FILE: commits/commit-3.txt ===============
commit c7e9f1a
Author: dev3
Date: Wed Mar 4

Bump dependencies to latest versions

package.json | 8 ++++----
package-lock.json | 312 ++++++++++++++++++++++++++++++++++++++---------
2 files changed, 256 insertions(+), 64 deletions(-)

Summary of changes:

  • Updated express from 4.18.2 to 4.19.0
  • Updated typescript from 5.3.2 to 5.4.0
  • Updated jest from 29.6.0 to 29.7.0
  • Regenerated lock file
=============== END FILE ===============

=============== FILE: commits/commit-4.txt ===============
commit d2a6b3f
Author: dev4
Date: Thu Mar 5

Add date formatting utility and unit tests

src/utils/date-format.ts | 32 ++++++++++++++++++++++++++++++++
src/utils/date-format.test.ts | 8 ++++++++
2 files changed, 40 insertions(+), 0 deletions(-)

Summary of changes:

  • New file: date-format.ts with formatDate(), parseISO(), and relativeDuration() functions
  • New file: date-format.test.ts with basic test cases for each function
=============== END FILE ===============

=============== FILE: commits/commit-5.txt ===============
commit e5c1d9b
Author: dev5
Date: Fri Mar 6

Add payment processing endpoint with Stripe integration, validation middleware, and webhook handler

src/routes/payments.ts | 68 ++++++++++++++++++++++++++
src/middleware/validate-payment.ts | 42 ++++++++++++++++
src/services/stripe-client.ts | 35 ++++++++++++++
src/webhooks/stripe-events.ts | 28 +++++++++++
src/types/payment.ts | 19 ++++++++
tests/payments.test.ts | 47 ++++++++++++++++++
6 files changed, 239 insertions(+), 0 deletions(-)

Summary of changes:

  • New route handler for POST /api/payments with input validation
  • New validation middleware that checks payment amount, currency, and idempotency key
  • New Stripe client wrapper with createPaymentIntent, confirmPayment, and refund methods
  • New webhook handler for Stripe events (payment_intent.succeeded, payment_intent.failed, charge.disputed)
  • TypeScript types for PaymentRequest, PaymentResponse, and StripeWebhookEvent
  • Integration tests covering the happy path, validation failures, and Stripe error handling
=============== END FILE ===============

=============== FILE: commits/commit-6.txt ===============
commit f8b2e4a
Author: dev6
Date: Mon Mar 9

Refactor authentication system: extract token service, add refresh token rotation, migrate session store to Redis

src/auth/token-service.ts | 89 ++++++++++++++++++++++++++++++
src/auth/refresh-rotation.ts | 54 ++++++++++++++++++
src/auth/session-store.ts | 67 ++++++++++++++---------
src/auth/middleware.ts | 43 ++++++++-------
src/config/redis.ts | 22 ++++++++
src/routes/auth-routes.ts | 31 ++++++-----
src/types/auth.ts | 18 +++++++
tests/auth/token-service.test.ts | 72 +++++++++++++++++++++++++
8 files changed, 348 insertions(+), 46 deletions(-)

Summary of changes:

  • Extracted token generation/validation into dedicated TokenService class (was inline in middleware)
  • Added refresh token rotation: each refresh invalidates the old token and issues a new pair
  • Migrated session store from in-memory Map to Redis with configurable TTL
  • Updated auth middleware to use new TokenService instead of direct JWT calls
  • Updated auth routes to handle refresh token rotation flow
  • New Redis configuration module with connection pooling
  • New TypeScript types for TokenPair, RefreshTokenRecord, SessionData
  • Tests for token service covering generation, validation, expiry, and rotation
=============== END FILE ===============

=============== FILE: commits/commit-7.txt ===============
commit a9d4c6e
Author: dev7
Date: Tue Mar 10

Add database migration for new analytics tables

migrations/20240310_analytics_tables.sql | 198 +++++++++++++++++++++++++++++++
1 file changed, 198 insertions(+), 0 deletions(-)

Summary of changes:

  • CREATE TABLE analytics_events (id, event_type, user_id, payload, created_at, ...)
  • CREATE TABLE analytics_sessions (id, user_id, started_at, ended_at, device_info, ...)
  • CREATE TABLE analytics_aggregates (id, metric_name, dimension, value, period_start, period_end, ...)
  • Various indexes on event_type, user_id, created_at
  • Foreign key constraints between tables
=============== END FILE ===============
