Benchmark Suite

Content Agent Suite

Five tasks measuring whether content agents produce work an editor would actually publish without rewriting.

01
Turn raw notes into a blog post
Convert messy meeting or research notes into a publishable blog post that matches the voice of a provided sample.
Expected outcome
Cohesive post with the requested voice. No fabricated quotes or stats.
Failure tags watched
fabricated-quote, off-voice, filler-padding
02
Generate social post variants
Take a long article and produce platform-appropriate variants for at least three platforms.
Expected outcome
Each variant respects the platform's norms and length limits. No cross-platform copy-paste.
Failure tags watched
copy-paste-across-platforms, length-violation, missing-cta
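
Two of the tags above, length-violation and copy-paste-across-platforms, lend themselves to a mechanical pre-check before a human review. The following is a minimal sketch, not the suite's actual grader: the function names and the `limits` mapping are hypothetical, and the character limits are supplied by the caller rather than taken from this page.

```python
def find_length_violations(variants, limits):
    """Return the platforms whose variant exceeds its character limit.

    variants: dict mapping platform name -> post text
    limits:   dict mapping platform name -> max characters (caller-supplied assumption)
    """
    violations = []
    for platform, text in variants.items():
        limit = limits.get(platform)
        # Platforms with no configured limit are skipped rather than flagged.
        if limit is not None and len(text) > limit:
            violations.append(platform)
    return sorted(violations)


def find_copy_paste(variants):
    """Return sorted platform pairs whose text is identical after normalization."""
    # Normalize case and whitespace so trivial edits do not hide a copy-paste.
    normalized = {p: " ".join(t.lower().split()) for p, t in variants.items()}
    platforms = sorted(normalized)
    pairs = []
    for i, first in enumerate(platforms):
        for second in platforms[i + 1:]:
            if normalized[first] == normalized[second]:
                pairs.append((first, second))
    return pairs
```

An exact-match check only catches verbatim reuse; near-duplicates with a swapped hashtag would still need a similarity measure or a human reader.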
03
Preserve voice and tone
Given a 1,000-word writing sample, produce new content that an unbiased reader would attribute to the same author.
Expected outcome
The voice survives: a reader cannot tell the human-written sample from the generated content.
Failure tags watched
AI-tells, off-register, tonal-drift
04
Create an SEO outline
Outline a piece targeting a specific search intent without keyword stuffing or thin, spammy content.
Expected outcome
Outline maps to the intent, includes E-E-A-T-friendly sections, and avoids spammy patterns.
Failure tags watched
keyword-stuffing, thin-content, intent-mismatch
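
One rough signal for the keyword-stuffing tag is how many outline headings repeat the target phrase verbatim. The sketch below is an illustration only: the function names are hypothetical, and the 0.5 threshold is an arbitrary assumption, not a value defined by this suite.

```python
def keyword_heading_share(headings, keyword):
    """Return the fraction of headings that contain the exact keyword phrase."""
    if not headings:
        return 0.0
    kw = keyword.lower()
    hits = sum(1 for heading in headings if kw in heading.lower())
    return hits / len(headings)


def looks_stuffed(headings, keyword, max_share=0.5):
    """Flag an outline when more than max_share of headings repeat the keyword.

    max_share is an assumed threshold chosen for illustration.
    """
    return keyword_heading_share(headings, keyword) > max_share
```

A heuristic like this cannot judge intent-mismatch or thin-content; it only narrows what a reviewer needs to look at.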
05
Repurpose long-form to short-form
Turn a 2,500-word post into 5 distinct short-form pieces (thread, LinkedIn post, newsletter blurb, YouTube hook, etc.).
Expected outcome
Each piece stands alone and respects its format. No identical openers.
Failure tags watched
identical-openers, no-platform-fit, lost-key-point
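
The identical-openers tag can be screened automatically by comparing the first few words of each piece. This is a minimal sketch under stated assumptions: the function names are hypothetical, and treating the first five words as the "opener" is an arbitrary choice rather than the suite's definition.

```python
import re


def opener(text, n_words=5):
    """Return the first n_words of text, lowercased and stripped of punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(words[:n_words])


def find_identical_openers(pieces, n_words=5):
    """Group piece names that share the same opener.

    pieces: dict mapping piece name -> text
    Returns a list of sorted name groups, one per repeated opener.
    """
    by_opener = {}
    for name, text in pieces.items():
        by_opener.setdefault(opener(text, n_words), []).append(name)
    return [sorted(names) for names in by_opener.values() if len(names) > 1]
```

For example, a thread and a LinkedIn post that both begin "Most teams ship too late" would be returned as one group, while a newsletter blurb with a different first line would not appear.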