
This skill guides A/B testing for subject lines, cadences, and journeys, helping you design, run, and learn from experiments effectively.

npx playbooks add skill gtmagents/gtm-agents --skill ab-testing

Copy the command above to add this skill to your agents.

SKILL.md
---
name: ab-testing
description: Use when designing experiments for subject lines, offers, cadences, or
  journeys.
---

# Experimentation & A/B Testing Skill

## When to Use
- Validating new subject lines or creative.
- Testing segmentation hypotheses (persona vs behavior).
- Optimizing cadence, timing, or automation triggers.

## Framework
1. **Hypothesis** – define the expected uplift and the rationale behind it.
2. **Metric selection** – one primary metric (open, click, or conversion rate) plus guardrails (unsubscribes, spam complaints).
3. **Sample sizing** – ensure adequate statistical power (at least 500 recipients per variant as a floor, or use a power calculator).
4. **Execution** – randomize assignment, keep variants isolated, and limit simultaneous tests.
5. **Analysis** – use a two-proportion z-test or Bayesian uplift estimate; document learnings.
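
The analysis step can be sketched as a two-proportion z-test; a minimal example using only the Python standard library (the 50/1000 vs 70/1000 counts are illustrative, not from this skill's templates):

```python
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.
    Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Example: control converts 50/1000, variant converts 70/1000.
z, p = two_proportion_z_test(50, 1000, 70, 1000)
```

Here p comes out just below 0.06, so at a 0.05 threshold this hypothetical test would not yet be conclusive — a reminder that a visible lift can still be underpowered.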

## Templates
- Experiment brief (hypothesis, segments, KPI, risk guardrails).
- Variant table (control vs test inputs, creative asset links, owner).
- Calculator sheet for minimum detectable effect + sample size.
- Post-test debrief doc capturing learnings + rollout plan.

## Experiment Ideas
- Subject line vs preview text combos.
- CTA placement (hero vs footer).
- Personalization depth (basic vs dynamic modules).
- Wait times between touches.

## Tips
- Run no more than two tests per journey simultaneously.
- Recycle learnings into playbooks + automation templates.
- Segment results by persona to catch hidden signals.

---

Overview

This skill helps design, run, and analyze A/B tests for subject lines, offers, cadences, and customer journeys. It provides a practical experimentation framework, templates, and execution guidance to produce reliable, actionable learnings. Use it to reduce risk and speed up optimization across marketing and revenue workflows.

How this skill works

It guides you through hypothesis definition, metric selection, sample sizing, controlled execution, and statistical analysis. The skill includes templates for experiment briefs, variant tracking, and post-test debriefs, plus a calculator approach for minimum detectable effect and sample size. It enforces guardrails like randomization, isolation of variants, and limits on simultaneous tests to preserve result integrity.

When to use it

  • Validating new subject lines or creative combinations before full rollout
  • Testing segmentation hypotheses (persona-based vs behavior-based targeting)
  • Optimizing cadence, timing, or automation trigger intervals
  • Comparing offers, CTAs, or personalization levels across journeys
  • Prioritizing experiments with measurable conversion or engagement impact

Best practices

  • Start with a clear hypothesis that states the expected uplift and rationale
  • Choose a single primary metric and relevant guardrail metrics (e.g., unsubscribes, spam complaints)
  • Ensure adequate sample size — aim for at least 500 recipients per variant or use a power calculator
  • Randomize and isolate variants; avoid running more than two tests in the same journey concurrently
  • Document results and convert winning learnings into playbooks and automation templates
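
The randomize-and-isolate guidance can be implemented with deterministic bucketing, so a recipient always lands in the same variant for a given experiment; a sketch, assuming string recipient and experiment IDs (the helper name and hash scheme are illustrative choices):

```python
import hashlib

def assign_variant(recipient_id, experiment_id, variants=("control", "test")):
    """Deterministic, stable assignment: hashing the pair of IDs means the
    same recipient always gets the same variant for a given experiment,
    and different experiments bucket independently."""
    digest = hashlib.sha256(f"{experiment_id}:{recipient_id}".encode()).hexdigest()
    return variants[int(digest[:8], 16) % len(variants)]

# The same recipient is always assigned consistently within an experiment.
v = assign_variant("user-123", "subject-line-q3")
```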

Example use cases

  • A/B test subject line vs preview text combinations to boost open rates
  • Compare basic personalization to dynamic modules to measure lift in clicks
  • Test CTA placement (hero vs footer) to improve conversion rate
  • Experiment with wait times between touches to optimize engagement across a cadence
  • Segment post-test results by persona to uncover hidden signals and tailor rollouts

FAQ

How large should my test sample be?

Use a power calculator to compute sample size based on baseline rate and minimum detectable effect; as a rule of thumb, aim for at least 500 recipients per variant.

Which statistical method should I use for analysis?

Use a z-test for simple proportions or a Bayesian uplift approach for more nuanced probability estimates; always report significance, confidence intervals, and guardrail outcomes.
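
The Bayesian uplift approach can be sketched with Beta posteriors and Monte Carlo sampling; a minimal stdlib-only example under uniform Beta(1, 1) priors (the 50/1000 vs 70/1000 counts are illustrative):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=42):
    """Monte Carlo estimate of P(variant B's true rate > A's).
    Each conversion count gets a Beta(1 + successes, 1 + failures)
    posterior; we sample both and count how often B wins."""
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Example: control 50/1000 vs variant 70/1000.
p_win = prob_b_beats_a(50, 1000, 70, 1000)
```

The output reads directly as "probability the variant is better," which is often easier to act on than a p-value, and it pairs naturally with the guardrail reporting mentioned above.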