Benchmark Suite

Content Agent Suite

Five tasks measuring whether content agents produce work an editor would actually publish without rewriting.

  1. Turn raw notes into a blog post

    Convert messy meeting or research notes into a publishable blog post. The voice should match a provided sample.

    Expected outcome

    Cohesive post with the requested voice. No fabricated quotes or stats.

    Failure tags watched
    fabricated-quote, off-voice, filler-padding
  2. Generate social post variants

    Take a long article and produce platform-appropriate variants for at least three platforms.

    Expected outcome

    Each variant respects the platform's norms and length limits. No cross-platform copy-paste.

    Failure tags watched
    copy-paste-across-platforms, length-violation, missing-cta
  3. Preserve voice and tone

    Given a 1,000-word writing sample, produce new content that an unbiased reader would attribute to the same author.

    Expected outcome

    Voice survives. Reader cannot tell which is human and which is generated.

    Failure tags watched
    AI-tells, off-register, tonal-drift
  4. Create an SEO outline

    Outline a piece targeting a specific search intent without keyword stuffing or thin, spammy content.

    Expected outcome

    Outline maps to the intent, includes E-E-A-T-friendly sections, and avoids spammy patterns.

    Failure tags watched
    keyword-stuffing, thin-content, intent-mismatch
  5. Repurpose long-form to short-form

    Turn a 2,500-word post into five distinct short-form pieces (thread, LinkedIn post, newsletter blurb, YouTube hook, etc.).

    Expected outcome

    Each piece stands alone and respects its format. No identical openers.

    Failure tags watched
    identical-openers, no-platform-fit, lost-key-point
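
Two of the failure tags above, length-violation and identical-openers, are mechanical enough that a rough automated pre-check is plausible. The following is a minimal sketch under stated assumptions, not the suite's actual grader: the function names, the piece names, and the character limits in `ASSUMED_LIMITS` are all hypothetical and chosen only for illustration.

```python
import re

# Illustrative character limits. These values are assumptions for this sketch,
# not limits defined by the suite or guaranteed by any platform.
ASSUMED_LIMITS = {"thread_post": 280, "linkedin_post": 3000, "newsletter_blurb": 600}


def opener(text, n_words=5):
    """Return the first n_words of text, lowercased, with punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(words[:n_words])


def failure_tags(pieces, limits=ASSUMED_LIMITS):
    """Map each triggered tag to the sorted names of the pieces that triggered it.

    pieces is a dict of {piece_name: text}; both names and tags are hypothetical.
    """
    tags = {}

    # length-violation: a piece exceeds the assumed limit for its format.
    too_long = sorted(
        name for name, text in pieces.items()
        if name in limits and len(text) > limits[name]
    )
    if too_long:
        tags["length-violation"] = too_long

    # identical-openers: two or more pieces share the same normalized opener.
    by_opener = {}
    for name, text in pieces.items():
        by_opener.setdefault(opener(text), []).append(name)
    repeated = sorted(
        name for names in by_opener.values() if len(names) > 1 for name in names
    )
    if repeated:
        tags["identical-openers"] = repeated

    return tags


pieces = {
    "thread_post": "Most teams ship too late. " + "x" * 300,
    "linkedin_post": "Most teams ship too late! Here is why.",
    "newsletter_blurb": "This week: shipping.",
}
result = failure_tags(pieces)
```

A check like this can only flag candidates for review. Tags such as off-voice, fabricated-quote, or tonal-drift depend on judgment about meaning and style, so they would likely still need a human editor or a separate model-based judge.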