
Null Hypothesis

The starting assumption in any statistical test: that nothing interesting is happening. In A/B testing, it means assuming your variant performs identically to your control, and that any observed lift is just random noise until proven otherwise.

The null hypothesis, written H0, is the default position you're trying to disprove. It's the assumption of "nothing interesting is happening here" that you hold until the evidence forces you to abandon it. For a website experiment, H0 says: Variant B converts at exactly the same rate as Variant A, and any difference you see is coincidence.

You don't try to prove your new variant is better. You try to make the null hypothesis untenable. That backwards-feeling logic is the foundation of almost every A/B test ever run.

How you reject the null

To reject H0, you need enough evidence that a gap as large as the one between A and B would be very unlikely to arise by chance alone if the two variants were truly identical. How much evidence counts as "enough" is set by your confidence level, usually 95 percent, which means accepting at most a 5 percent chance of declaring a difference when none exists. Below that bar, you don't get to claim a winner; you simply fail to reject the null and admit you don't yet know.

Note the careful wording. You never "prove the null true." A test that comes back inconclusive hasn't shown the variants are identical; it has only failed to show they're different. That distinction sounds pedantic until it saves you from a bad decision.
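
One common way to turn this decision rule into code is a two-proportion z-test. The sketch below is an illustration, not the only valid test: the function name, the visitor counts, and the choice of a pooled z-test are all assumptions made for the example, and the normal approximation it relies on is only reasonable once each variant has a decent number of conversions.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05  # a 95 percent confidence level leaves a 5 percent threshold


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: A and B share one true conversion rate.

    Hypothetical helper using a pooled two-proportion z-test
    (normal approximation). Assumes at least one conversion overall.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under H0 both variants have the same rate, so pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts: 3.0 percent vs 3.4 percent with 10,000 visitors per variant.
p = two_proportion_p_value(conv_a=300, n_a=10_000, conv_b=340, n_b=10_000)
verdict = "reject H0" if p < ALPHA else "fail to reject H0"
print(f"p-value = {p:.3f}: {verdict}")
```

With these made-up counts the p-value comes out around 0.11, so the only honest verdict is "fail to reject H0". Doubling the traffic to 20,000 visitors per variant at the same rates brings it down to roughly 0.02, which clears the 95 percent bar.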

A concrete example

You're testing a new pricing page. The old page converts at 3 percent. The new one is running at 3.4 percent. Promising, but the null hypothesis is sitting right there saying: there's no real difference, you just got lucky with who happened to land on the new page.

To overrule it, you need enough visitors that a 3.4 percent result becomes genuinely implausible under the assumption that the true rate is still 3 percent. With a few hundred visitors, a jump to 3.4 percent is well within the range of normal luck. With many thousands, the same gap becomes hard to explain as coincidence, and only then can you reasonably call the new page a winner.
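
To put rough numbers on "the range of normal luck", the sketch below computes the band in which about 95 percent of observed rates would fall if the true rate really were 3 percent. It is a simplification under stated assumptions: it treats the 3 percent baseline as known exactly, uses a normal approximation that is crude at a few hundred visitors, and the function name and sample sizes are chosen purely for illustration. A real comparison between two pages has noise on both sides and needs more traffic than this suggests.

```python
from math import sqrt

BASELINE = 0.03   # old page's conversion rate, taken from the example
OBSERVED = 0.034  # new page's observed rate, taken from the example
Z_95 = 1.96       # z-score that bounds the central 95 percent of a normal curve


def luck_band(visitors, rate=BASELINE, z=Z_95):
    """Range of observed rates that chance alone would produce about
    95 percent of the time if the true rate were `rate`."""
    std_err = sqrt(rate * (1 - rate) / visitors)
    return rate - z * std_err, rate + z * std_err


for visitors in (300, 3_000, 30_000):  # illustrative sample sizes
    low, high = luck_band(visitors)
    inside = low <= OBSERVED <= high
    label = "within normal luck" if inside else "hard to explain as chance"
    print(f"{visitors:>6} visitors: {low:.2%} to {high:.2%} -> {label}")
```

Under these assumptions, 3.4 percent sits inside the band at 300 and at 3,000 visitors and falls outside it at 30,000.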

Why it matters

Thinking in null-hypothesis terms keeps you skeptical, which is exactly what you want when money is on the line. Early results that look exciting are very often perfectly consistent with H0. The 30 percent lift you're celebrating after a day might be entirely compatible with the two variants being identical, and it can evaporate as more data arrives. The null hypothesis is the discipline that stops you from shipping noise and calling it a win.
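
You can see how easily H0 alone produces an exciting early result by simulating tests in which both variants are identical by construction. The sketch below is a toy simulation: the traffic of 500 visitors per variant, the 3 percent true rate, and the function name are assumptions picked for illustration, not measurements from any real experiment.

```python
import random


def share_of_fake_lifts(visitors_per_arm=500, true_rate=0.03,
                        lift=0.30, trials=2000, seed=0):
    """Fraction of simulated tests where B shows at least `lift` relative
    improvement over A, even though both convert at `true_rate`."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    fake_wins = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        # Both arms get equal traffic, so comparing counts compares rates.
        if conv_b > 0 and conv_b >= (1 + lift) * conv_a:
            fake_wins += 1
    return fake_wins / trials


share = share_of_fake_lifts()
print(f"{share:.0%} of identical-variant tests showed a 30%+ lift for B")
```

At this traffic level a sizable share of the simulated tests show B "winning" by 30 percent or more despite there being nothing to find. Raising `visitors_per_arm` makes those phantom lifts progressively rarer, which is the evaporation described above.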

Our take

Classical hypothesis testing is built around a single moment: you collect data, you decide whether the evidence is strong enough to reject H0, and then you stop, declare a winner, and ship. Everything is organized around that one verdict.

Dalton's multi-armed bandit isn't organized around a verdict at all. It treats every visitor as one more data point that updates how much traffic each variant deserves, right now, given everything seen so far. The question quietly changes. Instead of "do I have enough evidence to reject the null?", it becomes "given what we know at this moment, where should the next thousand visitors go?" The first question can only be answered once, at the end. The second gets answered continuously, and the answer improves as evidence accumulates.
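
To make the "where should the next thousand visitors go?" question concrete, here is a minimal sketch of Thompson sampling, one widely used bandit strategy. This is a generic illustration and an assumption on our part for this example: nothing above specifies which algorithm Dalton uses, and the function name, the uniform Beta(1, 1) prior, and the visitor counts are all hypothetical.

```python
import random


def allocate_next_batch(stats, batch_size=1000, rng=random):
    """Split the next `batch_size` visitors across variants.

    `stats` maps variant name -> (conversions, visitors) seen so far.
    For each visitor, draw a plausible conversion rate for every variant
    from its Beta posterior and send the visitor to the highest draw.
    """
    allocation = {name: 0 for name in stats}
    for _ in range(batch_size):
        draws = {
            # Beta(1 + successes, 1 + failures): uniform prior plus data so far.
            name: rng.betavariate(1 + conv, 1 + visitors - conv)
            for name, (conv, visitors) in stats.items()
        }
        allocation[max(draws, key=draws.get)] += 1
    return allocation


rng = random.Random(0)  # seeded so the run is repeatable

# Early on: 100 visitors each, wide uncertainty, so both variants keep traffic.
early = allocate_next_batch({"A": (3, 100), "B": (4, 100)}, rng=rng)

# Later: 10,000 visitors each at 3.0 vs 3.4 percent, so traffic leans hard to B.
later = allocate_next_batch({"A": (300, 10_000), "B": (340, 10_000)}, rng=rng)

print("early:", early)
print("later:", later)
```

Notice there is no reject-or-accept step anywhere in the code. While the posteriors overlap heavily, the weaker-looking variant still wins a meaningful share of the draws; as the evidence sharpens, its share shrinks without ever being formally declared the loser.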

That doesn't mean the null hypothesis is wrong or that skepticism stops mattering. The same caution applies: a variant that looks good early might just be lucky, and the bandit is built to account for exactly that, holding back from committing while uncertainty is still high. What changes is that you're no longer forced to convert all of that nuance into a single yes/no answer at a single moment. You let the allocation reflect your uncertainty directly, and you act on it the whole way through.