Systematic diary exploration: discover tags, entry distribution, coverage gaps, agent mistakes, and compile recipes
86
90%
Does it follow best practices?
Impact
81%
1.06x
Average score across 5 eval scenarios
Advisory
Suggest reviewing before use
Full exploration report and recovery protocol
All report sections                 83%   83%
Entry type reflection               0%    0%
Exploration tags                    50%   50%
Importance 6                        0%    0%
Recovery detection                  66%   100%
Phase resumption                    100%  90%
Incident-antipattern relations      37%   87%
Decision-commit relations           37%   25%
Repeated incident relations         12%   62%
Header fields                       62%   50%
Severity grouping in report         33%   16%
Ordered phases                      100%  66%
Diary inventory and tag namespace discovery
Pagination function                 100%  100%
Batch size 50                       0%    0%
Entry type counts                   100%  100%
Tag frequency                       100%  100%
Tag namespace grouping              100%  100%
Namespace discovery not hardcoded   100%  100%
Importance histogram                100%  100%
Temporal range                      100%  100%
Sample report generated             100%  100%
Tag tree format                     100%  100%
Total entry count                   100%  100%
Incident triage and episodic analysis
Tag-first search                    100%  100%
Fallback search                     100%  100%
Structured extraction               100%  100%
Severity classification             100%  100%
Subsystem inference                 100%  100%
Subsystem grouping                  100%  100%
Training candidates                 100%  100%
Non-episodic filtering              50%   100%
Untagged incident detection         100%  100%
Preventive context field            100%  100%
Severity in report                  50%   62%
Commit pattern analysis and anti-pattern detection
Procedural tag search               0%    0%
Fallback search                     0%    0%
Tag frequency within procedural     100%  100%
Branch grouping                     100%  100%
Double-prefix detection             100%  100%
Missing scope detection             0%    100%
Missing branch detection            100%  100%
Catch-all tag detection             75%   75%
Broad entry detection               62%   0%
Non-procedural filtering            100%  100%
Report has anti-patterns section    100%  100%
Coverage gap detection and compile recipe design
Recipe YAML fields                  58%   58%
Recipes use existing tags           100%  100%
Coverage gap detection              100%  100%
Learn trace analysis                60%   80%
Noise source identification         80%   100%
Noise in exclude_tags               50%   75%
Multiple recipes                    100%  100%
Rationale per recipe                37%   100%
Weight parameters                   87%   100%
Token budget specified              100%  100%
Gap evidence                        100%  100%