Business Automation Suite
Seven tasks measuring whether automation agents can plan and execute multi-step business workflows without breaking things upstream.
- 01
Build a Zapier/Make-style workflow plan
Given a business outcome, design a multi-step workflow. Must name specific apps, triggers, and actions.
Expected outcome: Concrete, runnable plan with named integrations and clear branching.
Failure tags watched: vague-step, fabricated-integration, missing-error-branch
- 02
Create an email follow-up automation
Design a sequence that follows up after a trigger event with proper opt-out and quiet-hours handling.
Expected outcome: Sequence respects opt-out, has quiet-hours handling, doesn't double-send.
Failure tags watched: no-opt-out, double-send, no-time-zone-handling
- 03
Clean spreadsheet data
Given a messy CSV (mixed types, dupes, encoding issues), produce a clean version with a written change log.
Expected outcome: Clean file plus a change log of every transformation. Reversible.
Failure tags watched: silent-row-drop, lossy-transform, no-change-log
- 04
Generate SOP from messy notes
Turn raw meeting notes into an operational SOP a new hire could follow.
Expected outcome: Clear, ordered, executable steps. Owner and tools named per step.
Failure tags watched: no-owner, missing-tool, ordering-error
- 05
Triage inbox-style tasks
Classify an inbox of mixed messages (sales, support, internal, spam) into the right next action.
Expected outcome: Correct classification per message and a recommended next action.
Failure tags watched: misclassification, auto-reply-to-spam, missed-urgent
- 06
Create a CRM update plan
From a recent customer interaction, propose CRM field updates without overwriting existing fields blindly.
Expected outcome: Field-level diff with rationale. Doesn't blow away existing data.
Failure tags watched: destructive-overwrite, no-rationale, wrong-stage-jump
- 07
Identify automation risk
Given a proposed workflow, identify its risk surface: data leaks, runaway loops, irreversible actions.
Expected outcome: Concrete risk list with mitigations. Flags any irreversible step explicitly.
Failure tags watched: missed-irreversible, no-rate-limit, data-residency-blind
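
To make task 07 concrete, here is a minimal sketch of what a risk pass over a proposed workflow might look like. It is not the suite's grader. The `Step` fields, the risk labels, and the sample workflow are all hypothetical, chosen only to illustrate the three risk types named in the task: irreversible actions, data leaks, and runaway loops.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One workflow step. All field names are hypothetical, for illustration."""
    name: str
    irreversible: bool = False          # e.g. hard delete, payment, outbound email
    needs_approval: bool = False        # a human confirms before the step runs
    exports_pii: bool = False           # sends personal data to an external system
    rate_limit_per_hour: Optional[int] = None
    triggers: list = field(default_factory=list)  # names of steps this one starts


def reaches_itself(start: str, graph: dict) -> bool:
    """Return True if following trigger edges from `start` leads back to it."""
    stack, seen = list(graph.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False


def find_risks(steps: list) -> list:
    """Return (step name, risk, suggested mitigation) tuples."""
    graph = {s.name: s.triggers for s in steps}
    risks = []
    for s in steps:
        # Irreversible steps are flagged unless a human approval gate exists.
        if s.irreversible and not s.needs_approval:
            risks.append((s.name, "irreversible-action",
                          "add an approval gate or a soft-delete stage"))
        # Any step exporting personal data is surfaced for review.
        if s.exports_pii:
            risks.append((s.name, "data-leak",
                          "limit exported fields and confirm the destination region"))
        # A step that can re-trigger itself with no rate limit may loop forever.
        if s.rate_limit_per_hour is None and reaches_itself(s.name, graph):
            risks.append((s.name, "runaway-loop",
                          "set a rate limit or a maximum iteration count"))
    return risks


# Hypothetical workflow: two steps trigger each other, one purges data,
# one syncs personal data to an outside vendor.
workflow = [
    Step("new_ticket", triggers=["notify_owner"]),
    Step("notify_owner", triggers=["new_ticket"], rate_limit_per_hour=60),
    Step("purge_account", irreversible=True),
    Step("sync_to_vendor", exports_pii=True),
]

for name, risk, mitigation in find_risks(workflow):
    print(f"{name}: {risk} -> {mitigation}")
```

In this sketch a loop is reported only on steps that lack a rate limit, so `notify_owner` is not flagged even though it sits on the same cycle as `new_ticket`. A stricter checker might flag every step on the cycle.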