sharaf/llm-learning-system-auditor

Use when the user wants to review, audit, or check safety for an AI memory system, agent learning pipeline, prompt-tuning workflow, skill builder, trace-mining tool, or eval/feedback loop. Produces an evidence-led audit report with learning-loop map, evidence inventory, maturity scorecard, severity-ranked findings, privacy/provenance gaps, counterfactual/eval coverage, and Stabilize/Standardize/Scale roadmap.
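As a rough illustration only, the Python sketch below shows one way an auditor might hold two of the report components named in the description. The first is a 0-4 maturity scorecard that reports each dimension separately instead of collapsing them into an average. The second is a set of severity-ranked findings grouped into Stabilize/Standardize/Scale roadmap buckets. The class names, the `record` method, the four-level `Severity` scale, and the dimension and finding names are all hypothetical. They are not taken from the skill itself.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical severity scale; a lower value means more urgent.
class Severity(Enum):
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3

# Roadmap buckets named in the skill description.
class Bucket(Enum):
    STABILIZE = "Stabilize"
    STANDARDIZE = "Standardize"
    SCALE = "Scale"

@dataclass
class Finding:
    title: str
    severity: Severity
    bucket: Bucket
    evidence: list[str] = field(default_factory=list)  # e.g. file paths or trace IDs

@dataclass
class Scorecard:
    scores: dict[str, int] = field(default_factory=dict)

    def record(self, dimension: str, score: int) -> None:
        # Enforce the 0-4 maturity scale.
        if not 0 <= score <= 4:
            raise ValueError(f"{dimension}: score {score} is outside 0-4")
        self.scores[dimension] = score

    def render(self) -> list[str]:
        # One line per dimension and deliberately no overall average,
        # so a weak dimension cannot be hidden behind strong ones.
        return [f"{dim}: {score}/4" for dim, score in self.scores.items()]

def rank_findings(findings: list[Finding]) -> list[Finding]:
    # Most severe first; sorted() is stable, so ties keep their input order.
    return sorted(findings, key=lambda f: f.severity.value)

def roadmap(findings: list[Finding]) -> dict[Bucket, list[str]]:
    # Group ranked finding titles into Stabilize/Standardize/Scale buckets.
    plan: dict[Bucket, list[str]] = {b: [] for b in Bucket}
    for f in rank_findings(findings):
        plan[f.bucket].append(f.title)
    return plan

if __name__ == "__main__":
    card = Scorecard()
    card.record("Trace provenance", 1)
    card.record("Review and rollback", 3)
    print("\n".join(card.render()))
    findings = [
        Finding("Add rollback for promoted rules", Severity.MEDIUM, Bucket.STANDARDIZE),
        Finding("Redact PII before memory writes", Severity.CRITICAL, Bucket.STABILIZE),
    ]
    for bucket, titles in roadmap(findings).items():
        print(bucket.value, titles)
```

The sketch keeps scores as a plain per-dimension mapping because an averaged number would hide the very gaps an audit is meant to surface.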

Score: 100

Quality: 100% (Does it follow best practices?)

Impact: 100%, 1.28x (average score across 3 eval scenarios)

Security by Snyk: Passed (no known issues)


Evaluation results

100%

26%

Audit Report: MemoryHarvester Rule and Memory Learning Loop

Memory extraction, rule induction, provenance, and rollout audit for session-history learning

| Criteria | Without context | With context |
|---|---|---|
| Learning-loop map | 90% | 100% |
| 0-4 maturity scorecard | 100% | 100% |
| Trace provenance and clean-evidence gap | 50% | 100% |
| Memory governance finding | 83% | 100% |
| Rule induction guardrails | 58% | 100% |
| Review, promotion, and rollback | 75% | 100% |
| Privacy and retention risk | 80% | 100% |
| Counterfactual eval coverage | 50% | 100% |
| Severity-ranked findings | 62% | 100% |
| Sequenced roadmap | 100% | 100% |

100%

17%

Audit Report: SkillForge Generated Skill Pipeline

Generated executable skill verification, provenance, sandboxing, and registry promotion audit

| Criteria | Without context | With context |
|---|---|---|
| Lifecycle and maturity map | 100% | 100% |
| Executable sandbox risk | 93% | 100% |
| Provenance metadata | 83% | 100% |
| Registry is not verification | 90% | 100% |
| Eval gate coverage | 76% | 100% |
| Human review and rollback | 83% | 100% |
| Trigger and activation safety | 37% | 100% |
| Deployment controls | 87% | 100% |
| Severity-ranked findings | 66% | 100% |
| Stabilize before scale roadmap | 100% | 100% |

100%

22%

Audit Report: PromptTuner Automated Optimization System

Maturity scoring, report structure, and LLM-judge guardrails in a prompt optimization audit

| Criteria | Without context | With context |
|---|---|---|
| 0-4 maturity scores | 16% | 100% |
| No collapsed average | 100% | 100% |
| Required report headings | 58% | 100% |
| LLM-as-judge calibration flag | 91% | 100% |
| Post-hoc redaction guardrail | 100% | 100% |
| Transcript evidence gap | 70% | 100% |
| Validation split and optimize-on-gate finding | 88% | 100% |
| Roadmap with buckets | 90% | 100% |
| No safe-learning claim | 100% | 100% |

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
