Question 1

What is statistical significance?

Accepted Answer

Statistical significance tells you how unlikely your measured difference would be if the two variants actually performed the same. A result is significant at 95% confidence when a difference at least this large would appear less than 5% of the time through random chance alone. It is not the probability that the winner is real, but it is what separates a likely improvement from noise.
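As a rough illustration, significance for two conversion rates is often checked with a pooled two-proportion z-test. The sketch below assumes that test; the calculator on this page may use a different method. The function name and the visitor counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the "no difference" assumption both variants share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a gap at least this large if the variants were identical.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 300 of 1,000 visitors convert on A, 345 of 1,000 on B.
p = two_proportion_p_value(300, 1000, 345, 1000)
print(f"p-value = {p:.4f}, significant at 95% confidence: {p < 0.05}")
```

A p-value below 0.05 corresponds to "significant at 95% confidence".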

Question 2

What is confidence level?

Accepted Answer

Confidence level is how strict you want to be before declaring a winner. 95% (the standard default) means that when there is no real difference, you accept a 5% risk of a false positive: calling a winner that is not really better. Higher confidence (99%) is stricter and needs more visitors.
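One way to see why higher confidence is stricter: under a normal approximation (an assumption here, not necessarily what the calculator uses), the confidence level sets how many standard errors apart the variants must be before the result counts. The helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided z threshold a test statistic must exceed at this confidence."""
    alpha = 1 - confidence  # accepted false-positive risk
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> |z| must exceed {critical_z(level):.3f}")
```

The threshold rises with the confidence level, which is why the same lift needs more visitors to clear it.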

Question 3

What is statistical power?

Accepted Answer

Power is the chance your test detects a real difference when one truly exists. 80% power (the standard default) means an 80% chance of catching a genuine improvement. Higher power (90%) misses fewer real winners but requires more visitors.
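To make this concrete, power can be approximated for a given sample size. This is a minimal sketch assuming a two-sided z-test with a normal approximation; `estimated_power` is a hypothetical helper, and the rates reuse the 30% to 36% example from this page.

```python
from math import sqrt
from statistics import NormalDist

def estimated_power(p1, p2, n_per_variant, confidence=0.95):
    """Approximate chance of detecting a true change from p1 to p2."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_bar = (p1 + p2) / 2
    # Standard error if the variants were identical (sets the significance bar).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_variant)
    # Standard error when the true rates really are p1 and p2.
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    return NormalDist().cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)

for n in (250, 500, 1000, 2000):
    print(f"{n:>5} visitors per variant -> power about {estimated_power(0.30, 0.36, n):.0%}")
```

Power climbs as visitors per variant increase, so a small test will often miss a lift that a larger one would catch.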

Question 4

What is Minimum Detectable Effect (MDE)?

Accepted Answer

MDE is the smallest relative improvement you want to be able to detect. A 20% MDE on a 30% conversion rate means detecting a lift to 36%. Smaller improvements are harder to see, so they need far more traffic; larger improvements need much less.
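The arithmetic behind that example, plus a rule of thumb for the traffic cost, can be sketched as follows. Both helper names are hypothetical, and the 1 / MDE² scaling is an approximation that ignores small changes in variance between scenarios.

```python
def target_rate(baseline, relative_mde):
    """Conversion rate the variant must reach for a given relative MDE."""
    return baseline * (1 + relative_mde)

def approx_traffic_multiplier(old_mde, new_mde):
    """Rule of thumb: required traffic scales roughly with 1 / MDE squared."""
    return (old_mde / new_mde) ** 2

print(f"Target rate: {target_rate(0.30, 0.20):.0%}")
print(f"Halving the MDE needs about {approx_traffic_multiplier(0.20, 0.10):.0f}x the traffic")
```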

Question 5

How many visitors do I need for an A/B test?

Accepted Answer

It depends on your current conversion rate, your MDE, and the confidence and power you choose. Lower rates, smaller target improvements, and stricter confidence or power settings all need more visitors. Enter your numbers in the calculator above for the exact sample size per variant.
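For readers who want to see the shape of the calculation, here is a minimal sketch using a common textbook formula for comparing two proportions. This is an assumption: the calculator on this page may use a different formula, so its numbers can differ. The function name and defaults are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-sided z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# 30% baseline with a 20% relative MDE, then the same test with a 10% MDE.
print(sample_size_per_variant(0.30, 0.20))
print(sample_size_per_variant(0.30, 0.10))
```

Shrinking the MDE or lowering the baseline rate pushes the result up sharply, which matches the guidance above.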

Question 6

Can I stop an A/B test early?

Accepted Answer

Generally no. “Peeking” — checking results repeatedly and stopping the moment they look significant — massively inflates false positives. Decide on your sample size up front and wait until you reach it (and a full weekly cycle) before deciding.
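A small simulation can show the effect of peeking. The sketch below runs A/A tests, where both variants share the same true rate, so every "winner" is a false positive. It compares stopping at the first significant look with checking once at the end. The batch sizes, number of looks, and seed are arbitrary assumptions, and the exact rates will vary with them.

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(experiments=300, looks=10, batch=100, rate=0.30, seed=1):
    """Return (false-positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(experiments):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _ in range(looks):
            # Each look adds one batch of visitors to both identical variants.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            significant = p_value(conv_a, n, conv_b, n) < 0.05
            ever_significant = ever_significant or significant
        peeking_hits += ever_significant  # would have stopped at some look
        final_hits += significant         # significant at the planned end only
    return peeking_hits / experiments, final_hits / experiments

peeking, fixed = simulate_aa_tests()
print(f"False positives with peeking: {peeking:.1%}; with one final check: {fixed:.1%}")
```

Because every look is another chance for noise to cross the threshold, the peeking rate ends up well above the nominal 5%.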

Question 7

How long should I run an A/B test?

Accepted Answer

Run until you reach the required sample size AND at least one full week — ideally two — so weekday and weekend behaviour are both represented. Ending on a single strong day is one of the most common ways teams fool themselves.
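The rule above can be sketched as a small helper. It assumes every daily visitor enters the test and traffic splits evenly across variants; the function name and the rounding up to whole weeks are illustrative choices, not necessarily what the calculator does.

```python
from math import ceil

def duration_days(required_per_variant, daily_visitors, variants=2, min_days=7):
    """Days to run: enough for the sample size, at least a week, in whole weeks."""
    total_needed = required_per_variant * variants
    days_for_sample = ceil(total_needed / daily_visitors)
    days = max(days_for_sample, min_days)
    # Round up to full weeks so every weekday is represented equally.
    return ceil(days / 7) * 7

# Same hypothetical requirement (1,000 per variant) at three traffic levels.
for traffic in (5000, 150, 100):
    print(f"{traffic:>5} visitors/day -> run for {duration_days(1000, traffic)} days")
```

Even a high-traffic site that reaches its sample size in a day still runs for a full week under this rule.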

Question 8

Why do low-traffic websites require longer tests?

Accepted Answer

The required sample size is set by your conversion rate, MDE, confidence, and power, not by your traffic. So a low-traffic site needs the same number of visitors as a high-traffic one; it simply takes more days to accumulate them. That is why duration, not just sample size, matters.

Question 9

What happens if my sample size is too small?

Accepted Answer

An underpowered test is unreliable. It can miss real improvements (false negatives) and produce noisy results that swing between “significant” and “not significant.” Any winner you declare is more likely to be a false positive than it would be in a properly sized test, and the lift it reports is usually exaggerated.

A/B Test Sample Size & Duration Calculator
