Type 1 & Type 2 Errors
The two ways an experiment can mislead you. Type 1 (false positive): you declare a winner that doesn't actually beat the control. Type 2 (false negative): you miss a winner that's genuinely better.
Every A/B test ends in a yes/no call: ship the variant, or keep the control. Both calls can be wrong, in opposite directions, and those two kinds of mistake have names. A Type 1 error is shipping something that doesn't actually work. A Type 2 error is killing something that does. Understanding both is the difference between a test that protects you and a test that quietly costs you money.
Type 1: the false positive
A Type 1 error is a false alarm. Your test reports "B wins by 8 percent," you ship it, and three months later the numbers look exactly as they did before. Random noise tricked you into seeing a winner that was never there.
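This is baked into the math. The standard 95 percent confidence threshold accepts a 5 percent Type 1 rate, which means that when a variant truly makes no difference, the test will still report a significant result roughly 1 time in 20. That is not the same as saying 1 in 20 of your declared "winners" is fictional: that share depends on how often your ideas really work and how much traffic your tests get, and it can be considerably higher. Either way, some false alarms are the price of being willing to act on evidence at all.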
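To make the 5 percent figure concrete, here is a minimal simulation sketch of A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive by construction. The function names, the 10 percent conversion rate, the 500 visitors per arm, and the choice of a pooled two-proportion z-test are all illustrative assumptions, not a description of any particular testing tool.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided tail probability of a standard normal, via erfc
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_tests=2000, visitors=500, rate=0.10, alpha=0.05, seed=0):
    """Run many A/A tests (identical arms) and count 'significant' results."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_tests):
        # Both arms draw from the SAME true rate, so any winner is noise
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if two_sided_p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_alarms += 1
    return false_alarms / n_tests


print(false_positive_rate())
```

The printed fraction should land near 0.05. It counts significant differences in either direction; only those that favor the variant would actually get shipped as "winners."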
Type 2: the false negative
A Type 2 error is a missed win. Variant B is genuinely 5 percent better, but you ran the test too briefly, or with too little traffic, so the result never reached significance. The data looked inconclusive, so you killed the variant and moved on. The improvement was real. You just never gave it the chance to prove itself, and you'll never know what you walked away from.
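How likely a missed win is can be estimated before the test runs. The sketch below uses the standard normal approximation for comparing two conversion rates to estimate power, the probability that a real lift reaches significance; the Type 2 rate is one minus that. The function name, the 10 percent baseline rate, and the traffic figures are hypothetical inputs chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(base_rate, relative_lift, visitors_per_arm, alpha=0.05):
    """Approximate chance that a real lift reaches two-sided significance."""
    p_a = base_rate
    p_b = base_rate * (1 + relative_lift)
    # Standard error of the difference between the two observed rates
    se = sqrt(p_a * (1 - p_a) / visitors_per_arm
              + p_b * (1 - p_b) / visitors_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of significance in the wrong direction
    return NormalDist().cdf(abs(p_b - p_a) / se - z_crit)


# A genuine 5 percent relative lift on an assumed 10 percent baseline
for n in (5_000, 100_000):
    power = approx_power(0.10, 0.05, n)
    print(f"{n} visitors per arm: power {power:.2f}, Type 2 rate {1 - power:.2f}")
```

Under these assumptions, the smaller test misses the real improvement most of the time, while the larger one almost always finds it.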
The tradeoff between them
You can't simply eliminate both, because at any given amount of traffic, tightening one loosens the other. Demand 99 percent confidence and you'll let fewer false positives through, but the higher bar also makes you blind to smaller real wins that can't clear it in a reasonable window. Loosen the bar to move faster and you let more noise through. The only way to shrink both at once is to collect more data, which costs time. Every testing program is implicitly choosing a balance between these two errors, whether the team realizes it or not.
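The tradeoff shows up directly in required traffic. As a rough sketch using the textbook normal-approximation sample size formula for two proportions, the code below compares how many visitors per arm are needed to detect the same lift at 95 versus 99 percent confidence. The function name, the 10 percent baseline, the 5 percent relative lift, and the 80 percent power target are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_arm(base_rate, relative_lift, alpha, power=0.80):
    """Approximate visitors per arm to detect a lift at the given alpha and power."""
    p_a = base_rate
    p_b = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # bar against false positives
    z_beta = NormalDist().inv_cdf(power)           # bar against false negatives
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_b - p_a) ** 2)


n_95 = visitors_per_arm(0.10, 0.05, alpha=0.05)
n_99 = visitors_per_arm(0.10, 0.05, alpha=0.01)
print(n_95, n_99, round(n_99 / n_95, 2))
```

With these inputs, holding the miss rate steady while moving from 95 to 99 percent confidence takes roughly one and a half times the traffic. Run the stricter test in the same window instead, and the extra protection is paid for in missed wins.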
Why the costly error is the invisible one
Here's the part most teams get backwards. A Type 1 error is visible: you shipped a change, you can see it live, and someone eventually notices it didn't move the needle. So teams guard heavily against it. A Type 2 error leaves no trace. The winner you killed never existed as far as anyone can tell, so nobody mourns it.
But the silent error is often the more expensive one. A false positive wastes some development cycles shipping a non-improvement. A false negative leaves real, recurring revenue on the table, indefinitely, with no alarm going off. Teams overweight Type 1 because a visible failure is embarrassing, while a missed win never shows up anywhere to be embarrassed about.
Our take
Notice that both errors come from the same source: a single, all-or-nothing decision made at one moment in time. You collect data in a fixed window, then declare a winner or you don't. Force everything through that one yes/no gate and you're guaranteed to make both kinds of mistake sometimes.
Dalton's bandit sidesteps the binary decision entirely. Instead of asking "winner or not?" once at the end, it continuously routes more traffic toward variants that are looking better and less toward those looking worse. The consequences for both errors are direct. A variant that looks weak gets its exposure throttled rather than cut off, so if early noise misjudged it in either direction, the mistake is small and self-correcting rather than a clean kill you can't undo. And a genuine winner starts earning more traffic, and compounding more revenue, while a fixed-window A/B test would still be sitting there collecting data toward a verdict.
It doesn't make the underlying uncertainty disappear; nothing can. What it changes is that you're no longer betting the whole decision on a single threshold at a single moment. You're spreading that decision across every visitor, adjusting as evidence accumulates, so neither error has a single point at which it can quietly cost you everything.
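For readers who want to see the general idea of adaptive allocation in code, here is a minimal sketch of Thompson sampling, one common bandit method. This is an assumption made for illustration: the text above does not specify which algorithm Dalton uses, and the function name, conversion rates, and visitor count are all hypothetical.

```python
import random


def thompson_sampling(true_rates, visitors, seed=0):
    """Simulate Thompson sampling; return how many visitors each variant saw."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms    # conversions observed per variant
    losses = [0] * n_arms  # non-conversions observed per variant
    shown = [0] * n_arms
    for _ in range(visitors):
        # Draw a plausible conversion rate for each variant from its
        # Beta posterior (starting from a uniform Beta(1, 1) prior)
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)]
        arm = draws.index(max(draws))  # route this visitor to the best draw
        shown[arm] += 1
        # true_rates is hidden from the algorithm; it only simulates visitors
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return shown


# Hypothetical control (10 percent) versus a genuinely better variant (15 percent)
print(thompson_sampling([0.10, 0.15], visitors=20_000))
```

No variant is ever formally declared dead here. The weaker one simply receives a shrinking share of traffic as evidence accumulates, and it can regain share if later data favors it.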