
Confidence Interval

The range your true conversion rate is most likely to sit inside, based on what your test has seen so far. Less data → wider interval → less certainty about the real number.

A confidence interval is a range of plausible values for a number you can't measure exactly, paired with a confidence level that says how reliable the method behind the range is. Sample data only gets you so close to the truth, so instead of a single point, you report a range. If a variant converts at 4.2 percent over 500 visitors (21 conversions), the true conversion rate could plausibly be anywhere from roughly 2.4 to 6.0 percent. That range is your 95 percent confidence interval.
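To make the arithmetic concrete, here is a minimal sketch of one common way to compute such a range: the normal-approximation (Wald) interval. The function name `wald_interval` is ours, and real testing tools may use a different method (Wilson or exact intervals, for instance), so their numbers can differ slightly.

```python
import math

def wald_interval(conversions, visitors, z=1.96):
    """Normal-approximation (Wald) interval for a conversion rate.

    z=1.96 corresponds to a 95 percent confidence level.
    """
    rate = conversions / visitors
    # Standard error of a proportion: sqrt(p * (1 - p) / n)
    std_err = math.sqrt(rate * (1 - rate) / visitors)
    # Clamp to [0, 1], since a rate cannot fall outside that range.
    low = max(0.0, rate - z * std_err)
    high = min(1.0, rate + z * std_err)
    return low, high

# 21 conversions out of 500 visitors is a 4.2 percent rate.
low, high = wald_interval(21, 500)
print(f"{low:.1%} to {high:.1%}")  # 2.4% to 6.0%
```

The approximation is weakest when conversions are very few, which is exactly when the interval is widest anyway.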

Why a range, not a number

A point estimate like "4.2 percent" hides how much you actually know. With 500 visitors, 4.2 percent is barely more than a guess. With 50,000 visitors at the same 4.2 percent, it's close to a fact. The point estimate is identical; the certainty is not, and only the interval shows the difference.

As data accumulates, the interval shrinks. The same 4.2 percent rate measured over 50,000 visitors carries an interval of roughly 4.02 to 4.38 percent: same center, far tighter spread. Width is the part that matters. It is the visual measure of how much you should trust the number in the middle.
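How fast does the interval shrink? Under the normal approximation, the half-width scales with one over the square root of the visitor count, so 100 times the data buys an interval only 10 times narrower. A small sketch, with `half_width` as an illustrative helper name:

```python
import math

def half_width(rate, visitors, z=1.96):
    """Half-width of a 95 percent normal-approximation interval."""
    return z * math.sqrt(rate * (1 - rate) / visitors)

# Same 4.2 percent rate, growing sample sizes.
for visitors in (500, 5_000, 50_000):
    hw = half_width(0.042, visitors)
    print(f"{visitors:>6} visitors: 4.2% ± {hw:.2%}")
```

This is why the last stretch of certainty is the expensive one: halving the width takes four times the traffic.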

Why it beats looking at a single number

This is where confidence intervals earn their keep in A/B testing. A variant advertising a "+12 percent lift" sounds like a clear win. But if the confidence interval around that lift runs from −3 to +27 percent, the test has not ruled out that the variant is actually worse than the control. The headline number is real; it just isn't reliable yet. Point estimates flatter you. Intervals keep you honest.
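As an illustration, the sketch below puts an approximate 95 percent interval around a relative lift using the log-ratio normal approximation. The counts are hypothetical, chosen to produce a +12 percent headline, and `lift_interval` is our own name; a given testing tool may compute its lift interval differently.

```python
import math

def lift_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Relative lift of B over A with an approximate 95 percent interval.

    Uses the log-ratio normal approximation; conversion counts must be nonzero.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    log_ratio = math.log(rate_b / rate_a)
    # Standard error of log(rate_b / rate_a)
    std_err = math.sqrt(1 / conv_a - 1 / n_a + 1 / conv_b - 1 / n_b)
    low = math.exp(log_ratio - z * std_err) - 1
    high = math.exp(log_ratio + z * std_err) - 1
    lift = rate_b / rate_a - 1
    return lift, low, high

# Hypothetical counts: control 500/10,000, variant 560/10,000.
lift, low, high = lift_interval(500, 10_000, 560, 10_000)
print(f"lift {lift:+.1%}, interval {low:+.1%} to {high:+.1%}")
if low < 0 < high:
    print("Interval includes zero: the variant could still be worse.")
```

With these counts the +12 percent headline comes with an interval of roughly -0.4 to +26 percent, so even 20,000 visitors have not quite ruled out a loss.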

How to read them

Most A/B testing tools show intervals as error bars or as a "± X percent" beside the result. The rule of thumb: look at overlap. If two variants' intervals overlap heavily, you don't have a winner yet, no matter how much better one rate looks. When the intervals separate cleanly, the difference is very unlikely to be noise. Overlap is a conservative check, though: two intervals can overlap slightly while the interval on the difference between the variants still excludes zero, so when your tool reports that interval, read it first. A 95 percent interval is the common default, meaning that if you repeated the experiment many times, about 95 percent of the intervals you'd compute would contain the true rate.
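The "repeated many times" reading is easy to check by simulation. The sketch below assumes a known true rate (something you never have in a real test), reruns the same experiment many times, and counts how often the normal-approximation interval captures that rate. The function name and parameters are illustrative.

```python
import math
import random

def coverage(true_rate, visitors, trials, z=1.96, seed=0):
    """Fraction of simulated experiments whose interval contains true_rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Simulate one experiment: each visitor converts with prob true_rate.
        conversions = sum(rng.random() < true_rate for _ in range(visitors))
        rate = conversions / visitors
        margin = z * math.sqrt(rate * (1 - rate) / visitors)
        if rate - margin <= true_rate <= rate + margin:
            hits += 1
    return hits / trials

share = coverage(true_rate=0.05, visitors=1_000, trials=1_000)
print(f"{share:.1%} of intervals contained the true rate")
```

Expect a figure near 95 percent rather than exactly 95: the simulation is itself noisy, and this approximation tends to run slightly under its nominal level at low conversion rates.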

Our take

Confidence intervals are built for a one-time decision: wait until the interval is tight and clearly separated, then make a single stop-or-ship call. That fits how traditional A/B testing works, and it's a sound way to read those results.

Dalton runs on a different idea. Our bandit doesn't wait for a final verdict; it acts on its uncertainty continuously. Under the hood it isn't reading off confidence intervals; it's using the full probability distribution behind each variant (the Bayesian relative of the interval) and drawing from it to decide where the next visitor goes. The effect is the same intuition you already have about interval width, but put to work in real time: when a variant's distribution is still wide, the system stays uncertain and keeps exploring it; once the evidence narrows and a variant clearly trails, traffic gets pulled away from it automatically.

It's the same instinct that makes a good analyst distrust a flattering point estimate, wired directly into how traffic is allocated for every visitor instead of saved for a single decision at the end.
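Drawing from each variant's distribution to route the next visitor is the idea behind a well-known technique called Thompson sampling. The sketch below is a textbook version, not Dalton's actual implementation: it assumes a uniform Beta(1, 1) prior, plain conversion counts, and hypothetical true rates that the algorithm never sees directly.

```python
import random

def choose_variant(stats, rng):
    """Pick a variant by drawing once from each Beta posterior.

    stats is a list of [conversions, visitors] pairs; a uniform
    Beta(1, 1) prior is assumed for every variant.
    """
    draws = [
        rng.betavariate(1 + conv, 1 + seen - conv)
        for conv, seen in stats
    ]
    # The variant with the highest sampled rate gets the next visitor.
    return draws.index(max(draws))

def simulate(true_rates, visitors, seed=0):
    """Route visitors one at a time and return the final counts."""
    rng = random.Random(seed)
    stats = [[0, 0] for _ in true_rates]
    for _ in range(visitors):
        arm = choose_variant(stats, rng)
        stats[arm][1] += 1
        if rng.random() < true_rates[arm]:
            stats[arm][0] += 1
    return stats

# Hypothetical true rates, unknown to the algorithm.
for conv, seen in simulate([0.03, 0.10], 5_000):
    print(f"{seen} visitors, {conv} conversions")
```

Early on, both distributions are wide, so either variant can win a draw and both get traffic. As counts accumulate, the weaker variant's draws rarely come out on top, and its share of visitors falls away without anyone making a stop-or-ship call.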