bapfernandez/article-creator

Content creator for tessl.io — generates publish-ready blog articles with SEO metadata, Tessl house style, and technical authority.

Task: Comparison Article on Agent Evaluation Frameworks

Name: bapfernandez/article-creator
Rating: 90.7 (1 review)
Author: bapfernandez

Background

The Tessl content team wants to publish a comparison article that helps developers choose between the leading agent evaluation frameworks currently available. The audience is engineers who are actively building AI agents and have reached the point where they need a structured approach to measuring agent accuracy, but haven't yet committed to a framework.

The content lead's brief:

"This needs to be genuinely useful, not a feature matrix that tells people nothing. Pick the dimensions that actually matter for this decision and give us your honest take on when to use each. Don't just describe features — tell us who each framework is right for. Be specific. Opinionated is fine; wishy-washy is not. Publish-ready with full metadata."

Frameworks to Compare

Compare the following three agent evaluation frameworks (invented but realistic):

EvalKit

Open source, MIT license
Focuses on deterministic scoring: regex match, JSON schema validation, tool call verification
Supports 12 built-in scorers; custom scorers via Python plugins
Runs locally or in CI; no hosted service
Average setup time reported by users: ~45 minutes to first eval run
Best-known for its speed and CI integration
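To make the three deterministic scoring styles concrete, here is a minimal sketch in plain Python. EvalKit is an invented framework, so none of these function names or signatures come from it; they are hypothetical stand-ins. The JSON check is simplified to a required-keys test rather than full JSON Schema validation.

```python
import json
import re


def regex_match_scorer(output: str, pattern: str) -> float:
    """Return 1.0 if the pattern is found anywhere in the output, else 0.0."""
    return 1.0 if re.search(pattern, output) else 0.0


def json_keys_scorer(output: str, required_keys: set[str]) -> float:
    """Return 1.0 if the output parses as a JSON object with every required key.

    Assumption: a key check stands in for real JSON Schema validation.
    """
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if required_keys.issubset(parsed.keys()) else 0.0


def tool_call_scorer(calls: list[dict], expected_tool: str, expected_args: dict) -> float:
    """Return 1.0 if any recorded call matches the expected tool name and arguments.

    Assumption: each call is a dict shaped like {"name": ..., "args": {...}}.
    """
    for call in calls:
        if call.get("name") == expected_tool and call.get("args") == expected_args:
            return 1.0
    return 0.0
```

Each scorer returns the same result for the same input, which is the property that makes this style of scoring fast and well suited to CI.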

Orion Evals

Commercial SaaS, free tier available (500 eval runs/month)
LLM-as-judge scoring with a configurable rubric builder
Hosted dashboard with trend graphs and regression alerts
Integrates with GitHub Actions, CircleCI, and GitLab CI out of the box
Pricing: $49/month for 5,000 runs; $199/month for 25,000 runs
Best-known for its rubric builder and visual reporting
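The pricing figures above lend themselves to a quick cost-per-run comparison, which may be a useful dimension for the article. The sketch below uses only the listed numbers. It assumes each tier is a hard monthly cap with no overage billing, which the brief does not state; the tier labels are made up for illustration.

```python
# Paid tiers as listed above: (monthly price in USD, included runs per month).
# The labels "starter" and "team" are hypothetical, not Orion Evals plan names.
PAID_TIERS = {"starter": (49, 5_000), "team": (199, 25_000)}
FREE_TIER_RUNS = 500


def cost_per_run(price_usd: float, included_runs: int) -> float:
    """Effective cost per run if the full monthly allowance is used."""
    return price_usd / included_runs


def cheapest_tier(runs_per_month: int):
    """Return the lowest-priced tier that covers the volume, or None if none does.

    Assumption: tiers are hard caps with no overage pricing.
    """
    if runs_per_month <= FREE_TIER_RUNS:
        return "free"
    for name, (price, included) in sorted(PAID_TIERS.items(), key=lambda t: t[1][0]):
        if runs_per_month <= included:
            return name
    return None


for name, (price, included) in PAID_TIERS.items():
    print(f"{name}: ${cost_per_run(price, included):.4f} per run")
```

At full utilization the $49 tier works out to $0.0098 per run and the $199 tier to about $0.0080 per run.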

spec-eval

Open source, Apache 2.0 license
Eval scenarios defined in YAML; scoring criteria derived from the spec
Designed to align evals with a written specification document
No hosted service; runs locally or in any CI environment
Smaller community; roughly 1,200 GitHub stars as of the comparison date
Best-known for its spec-driven workflow
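One way to picture the spec-driven workflow is a scenario that points at numbered requirements in the spec, with scoring criteria looked up from those requirements. The sketch below is an illustration only: spec-eval is invented, the requirement IDs and field names are hypothetical, and a Python dict stands in for the YAML scenario file.

```python
# Hypothetical spec: requirement IDs mapped to statements.
# This is not spec-eval's actual format.
SPEC = {
    "REQ-1": "The agent must return the order status as JSON.",
    "REQ-2": "The agent must call the lookup_order tool before answering.",
}

# Hypothetical scenario, written as a dict where spec-eval would use YAML.
SCENARIO = {
    "name": "order-status-happy-path",
    "prompt": "Where is order 1234?",
    "covers": ["REQ-1", "REQ-2"],
}


def derive_criteria(scenario: dict, spec: dict) -> list[str]:
    """Look up each requirement the scenario covers and return it as a criterion."""
    missing = [req for req in scenario["covers"] if req not in spec]
    if missing:
        raise KeyError(f"scenario references unknown requirements: {missing}")
    return [f"{req}: {spec[req]}" for req in scenario["covers"]]


for criterion in derive_criteria(SCENARIO, SPEC):
    print(criterion)
```

Under this reading, editing a requirement in the spec changes the criteria for every scenario that covers it, so the evals stay aligned with the written document.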

What to Write

Write a publish-ready comparison article for the tessl.io blog. The article should:

Briefly introduce each framework so a reader unfamiliar with them can follow along
Compare them across 4-6 meaningful dimensions relevant to choosing an eval framework (you decide which dimensions matter most)
Include a comparison table summarizing the dimensions
Follow the table with expanded analysis for each dimension
End with specific, opinionated guidance on when to choose each framework ("If you're X doing Y, choose Z")
Close with something that gives the reader a clear next action or leaves them with a useful provocation

Use suggestive rather than definitive language (for example, "may help" or "is designed to") when describing benefits claimed by any of the three frameworks.

Output Specification

Save the completed article as article.md in the current working directory.

The file must include a metadata block at the top (title, type, primary keyword, meta description, URL slug, internal links, estimated read time) followed by the full article body in markdown.
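A quick pre-publish check for the metadata block could look like the sketch below. The brief does not specify the block's syntax, so this assumes one "Field: value" line per field at the very top of the file, ending at the first blank line. The function name is hypothetical.

```python
REQUIRED_FIELDS = [
    "title", "type", "primary keyword", "meta description",
    "url slug", "internal links", "estimated read time",
]


def missing_metadata_fields(article_text: str) -> list[str]:
    """Return required fields that have no non-empty 'Field: value' line.

    Assumption: the metadata block starts on the first line and ends at the
    first blank line. Lines without a colon (such as '---') are skipped.
    """
    found = set()
    for line in article_text.splitlines():
        if not line.strip():
            break  # first blank line closes the assumed metadata block
        key, sep, value = line.partition(":")
        if sep and value.strip():
            found.add(key.strip().lower())
    return [field for field in REQUIRED_FIELDS if field not in found]


# Example usage against the saved file:
# from pathlib import Path
# print(missing_metadata_fields(Path("article.md").read_text(encoding="utf-8")))
```

An empty list means every required field is present with a value; it does not check that the values themselves are sensible.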