
jbaruch/face-recognition-calibration

Production-grade dlib face_recognition toolkit: piecewise confidence formula, enrollment quality diagnostics, and producer-side persistence for flicker suppression.

Score: 96
Quality: 93% (Does it follow best practices?)
Impact: 100%, 2.70x (average score across 6 eval scenarios)
Security by Snyk: Passed (no known issues)


evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent implements the piecewise confidence mapping (not the naive formula) and validates it with the canonical sanity check. The retail scenario motivates why the formula matters but does not name the correct formula.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Piecewise clamp at 0.30",
      "description": "confidence_module.py returns exactly 1.0 for any distance <= 0.30 (verified in results.txt or source)",
      "max_score": 15
    },
    {
      "name": "Piecewise clamp at 0.60",
      "description": "confidence_module.py returns exactly 0.0 for any distance >= 0.60 (verified in results.txt or source)",
      "max_score": 15
    },
    {
      "name": "Linear mid-range formula",
      "description": "The formula used in the 0.30–0.60 range is (0.60 - distance) / 0.30, NOT 1 - distance/tolerance or any other expression",
      "max_score": 20
    },
    {
      "name": "No naive formula",
      "description": "Does NOT use `1 - distance / tolerance` (or equivalent `1 - d / 0.6`) as the confidence formula",
      "max_score": 20
    },
    {
      "name": "Sanity check value at 0.38",
      "description": "results.txt shows confidence at distance 0.38 producing a value of approximately 0.73 (between 0.72 and 0.74)",
      "max_score": 20
    },
    {
      "name": "Threshold constants correct",
      "description": "The lower clamp threshold is 0.30 and upper clamp threshold is 0.60 — not shifted to 0.40/0.45 or any other values",
      "max_score": 10
    }
  ]
}
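The checklist above fully pins down the expected confidence mapping. Here is a minimal Python sketch of a `confidence_module.py` that would satisfy it, assuming the input is a face_recognition embedding distance. The names `confidence`, `naive_confidence`, `LOWER` and `UPPER` are hypothetical, and this is not the task's actual solution file. The sketch clamps at 0.30 and 0.60, uses `(0.60 - d) / 0.30` in between, and keeps the rejected `1 - d / tolerance` formula only for comparison.

```python
# Hypothetical sketch of confidence_module.py, written to match the checklist.
LOWER = 0.30  # distances at or below this map to full confidence (1.0)
UPPER = 0.60  # distances at or above this map to zero confidence (0.0)


def confidence(distance: float) -> float:
    """Piecewise mapping from embedding distance to a confidence in [0, 1]."""
    if distance <= LOWER:
        return 1.0
    if distance >= UPPER:
        return 0.0
    # Linear ramp between the clamps: (0.60 - distance) / 0.30
    return (UPPER - distance) / (UPPER - LOWER)


def naive_confidence(distance: float, tolerance: float = 0.6) -> float:
    """The formula the checklist rejects: 1 - distance / tolerance (comparison only)."""
    return 1 - distance / tolerance


if __name__ == "__main__":
    # Print both mappings side by side over a few representative distances.
    for d in (0.25, 0.30, 0.38, 0.45, 0.60, 0.70):
        print(f"d={d:.2f}  piecewise={confidence(d):.3f}  naive={naive_confidence(d):.3f}")
```

For the canonical sanity check, `confidence(0.38)` evaluates to (0.60 - 0.38) / 0.30 ≈ 0.733. That falls inside the 0.72 to 0.74 window the checklist expects. The naive formula gives 1 - 0.38 / 0.6 ≈ 0.367 for the same distance, and it gives only 0.5 at d = 0.30, where the piecewise mapping is already clamped to 1.0.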
