Introduction
In the previous chapter on Sampling Distributions, we learned that sample means fluctuate. If you take a sample of students and calculate their average height, you might get 165cm. If you take another sample, you might get 168cm.
Reporting a single number (a Point Estimate) like "165cm" is risky because it does not communicate uncertainty. It implies a precision that does not exist.
The Big Question
A Confidence Interval (CI) solves this by providing a range. Instead of saying:
"The mean height is 165cm."
"We are 95% confident the true mean is between 160cm and 170cm."
This range accounts for the natural variability of sampling. But what does "95% confident" actually mean? This is one of the most misunderstood concepts in statistics, and we will clarify it with an interactive simulation.
The Intuition: The Fishing Net Analogy
Imagine the true population parameter (e.g., the true average height of all humans) is an invisible, stationary fish in a murky lake. You cannot see the fish. You only know it is somewhere in the lake.
You throw a spear (a single number like 165cm) into the water. It is unlikely you will hit the exact center of the fish. You might be close, but you have no idea how close.
You cast a net instead. The net has a certain width. You do not know for sure if the fish is inside, but if you use a wide enough net, you can be "confident" you caught it.
The Confidence Level (e.g., 95%)
If you cast this net 100 times in different spots (repeated sampling), you expect to catch the fish approximately 95 times. The fish does not move; only your net position varies.
Interactive Demo: Cast Your Nets
Watch the fishing net analogy in action. Each horizontal bar is a confidence interval. The vertical cyan line is the true population mean. Notice how approximately 95% of intervals capture the truth over time.
Interpretation: The "95%" refers to the long-run capture rate of the method. Any individual interval either misses (Red) or hits (Green). It is strictly binary.
Anatomy of a Confidence Interval
A confidence interval is constructed from two main parts: the center (Point Estimate) and the width (Margin of Error).
1. Point Estimate ($\bar{x}$)
The mean calculated from your specific sample. This is the center of your interval - your best single guess.
2. Critical Value ($z^*$ or $t^*$)
Determined by your Confidence Level. For 95% confidence, $z^* = 1.96$. This tells you how many standard errors wide the net needs to be.
3. Standard Error ($SE$)
How much we expect the sample mean to fluctuate: $SE = \frac{\sigma}{\sqrt{n}}$ (or $\frac{s}{\sqrt{n}}$ when $\sigma$ is unknown). We covered this in detail in the Sampling Distributions chapter.
4. Margin of Error (ME)
The product of the Critical Value and the Standard Error: $ME = z^* \times SE$. This is half the width of your interval - the "reach" of your net in one direction.
Common Critical Values (Z-scores)
| Confidence Level | Alpha (tail area) | Critical Value (z*) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
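These critical values are not magic numbers; they come from the inverse CDF of the standard normal distribution. A quick check, assuming SciPy is available:

```python
# Reproducing the z* table from the standard normal inverse CDF.
from scipy import stats

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha / 2)   # two-tailed critical value
    print(f"{confidence:.0%}: alpha = {alpha:.2f}, z* = {z_star:.3f}")
# 90%: z* = 1.645   95%: z* = 1.960   99%: z* = 2.576
```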
Decision: Z-Score vs. T-Score
This is a critical decision point in real analysis. Which critical value do you use?
| Scenario | Distribution | Formula |
|---|---|---|
| Large Sample ($n \geq 30$) or known population $\sigma$ | Z-Distribution (Normal) | $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ |
| Small Sample ($n < 30$) AND unknown population $\sigma$ | Student's T-Distribution | $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ |
Why T instead of Z?
The T-distribution has "fatter tails" to account for the extra uncertainty introduced when estimating $\sigma$ with the sample standard deviation $s$ from small samples. We covered this in detail with an interactive visualization in the Sampling Distributions chapter.
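To see those fatter tails numerically, compare $t^*$ to $z^*$ at 95% confidence as the degrees of freedom grow (a small sketch assuming SciPy):

```python
# t* critical values converge toward z* = 1.96 as df increases.
from scipy import stats

z_star = stats.norm.ppf(0.975)
for df in (2, 5, 10, 30, 100):
    t_star = stats.t.ppf(0.975, df=df)
    print(f"df = {df:>3}: t* = {t_star:.3f}   (z* = {z_star:.3f})")
# df = 2 gives t* ≈ 4.303; by df = 100 it is ≈ 1.984, nearly the z value.
```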
Step-by-Step Calculation Examples
Z-Interval (Large N / Proportions)
Problem: A retailer samples 121 orders. 102 were shipped within 12 hours. Calculate the 95% CI for the proportion of orders shipped fast.
Step 1: Calculate sample proportion: $\hat{p} = \frac{102}{121} \approx 0.84$
Step 2: Find critical value: For 95%, $z^* = 1.96$
Step 3: Calculate Standard Error: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.84 \times 0.16}{121}} \approx 0.033$
Step 4: Margin of Error: $ME = 1.96 \times 0.033 \approx 0.065$
Step 5: Construct interval: $0.84 \pm 0.065 = [0.775, 0.905]$
Conclusion: We are 95% confident that 77.5% to 90.5% of all orders ship within 12 hours.
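The same five steps in code, rounding $\hat{p}$ to 0.84 as in the worked steps above so the output matches the stated interval:

```python
# Proportion CI for 102 fast shipments out of 121 orders.
import math

successes, n = 102, 121
p_hat = round(successes / n, 2)           # ~0.84, rounded as in the text
z_star = 1.96                             # 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)   # ~0.033
me = z_star * se                          # ~0.065
print(f"95% CI: [{p_hat - me:.3f}, {p_hat + me:.3f}]")  # [0.775, 0.905]
```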
T-Interval (Small N, Unknown $\sigma$)
Problem: Measuring the boiling point of a liquid. Sample size $n = 6$. Mean $\bar{x} = 101.82$. Sample Std Dev $s = 1.2$. Calculate the 95% CI.
Step 1: Degrees of Freedom: $df = n - 1 = 5$
Step 2: Look up T-table for df=5, 95% confidence: $t^* = 2.571$
Step 3: Standard Error: $SE = \frac{s}{\sqrt{n}} = \frac{1.2}{\sqrt{6}} \approx 0.49$
Step 4: Margin of Error: $ME = 2.571 \times 0.49 \approx 1.26$
Step 5: Construct interval: $101.82 \pm 1.26 = [100.56, 103.08]$
Note: If we had erroneously used $z^* = 1.96$, our interval would be [100.86, 102.78] - too narrow and overconfident!
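In code, using SciPy's t-distribution instead of a printed table, with the same figures as the worked example above:

```python
# T-interval for the boiling-point sample (n = 6, mean 101.82, s = 1.2).
import math
from scipy import stats

n, x_bar, s = 6, 101.82, 1.2
t_star = stats.t.ppf(0.975, df=n - 1)     # ~2.571 for df = 5
se = s / math.sqrt(n)                     # ~0.49
me = t_star * se                          # ~1.26
print(f"95% CI: [{x_bar - me:.2f}, {x_bar + me:.2f}]")  # ~[100.56, 103.08]
```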
The 95% Trap
STOP AND READ CAREFULLY
Once you calculate a specific interval, say $[160, 170]$, it is INCORRECT to say:
"There is a 95% probability that the true mean is between 160 and 170."
Why is this wrong? In Frequentist statistics, the true parameter is a fixed constant (the fish does not move). The interval is the variable (the net position varies). Once you catch the fish (calculate the interval), the fish is either inside (Probability = 1) or not (Probability = 0). There is no "95% inside."
Correct Interpretation: "If we repeated this sampling procedure many times, 95% of the intervals constructed would contain the true population mean."
This is exactly what the interactive simulation above demonstrates. Go back and watch - approximately 95% of the nets capture the fish, but each individual net either has the fish (100%) or does not (0%).
Factors Affecting Interval Width
A narrower interval is generally better (more precision), provided we maintain confidence. How do we achieve that?
1. Sample Size (n): Since $n$ is in the denominator (inside $\sqrt{n}$), quadrupling your sample size cuts your margin of error in half.
2. Confidence Level: Lowering confidence (99% to 90%) reduces the critical value (2.576 to 1.645), narrowing the interval. Trade-off: less certainty.
3. Population Standard Deviation ($\sigma$): Less variable populations yield more precise estimates. Usually hard to control in practice - this is a property of the population.
Interactive: Interval Width Explorer
Play with all three factors and watch how the interval width changes. This builds intuition for the trade-offs.
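If you prefer to poke at the formula directly, here is a small sketch of the calculation behind the explorer; the baseline values are arbitrary, and SciPy is assumed to be available.

```python
# Margin of error as a function of sigma, n, and confidence level.
import math
from scipy import stats

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

print(f"baseline (sigma=10, n=30, 95%): {margin_of_error(10, 30, 0.95):.2f}")
print(f"quadruple n to 120:             {margin_of_error(10, 120, 0.95):.2f}")  # half as wide
print(f"drop confidence to 90%:         {margin_of_error(10, 30, 0.90):.2f}")   # narrower
print(f"halve sigma to 5:               {margin_of_error(5, 30, 0.95):.2f}")    # narrower
```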
Applications in Machine Learning
Confidence intervals are essential for honest reporting of ML results.
1. A/B Testing
When comparing Model A (85% accuracy) vs Model B (86% accuracy), simply comparing the two numbers is insufficient. We calculate the CI for the difference in proportions. If the CI includes 0 (e.g., [-0.02, 0.04]), we cannot claim Model B is better - the difference could be noise. See the Hypothesis Testing chapter for more.
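A sketch of that difference-in-proportions interval, using hypothetical test-set sizes (1,000 examples per model) and the normal approximation:

```python
# 95% CI for the difference between two model accuracies (proportions).
import math
from scipy import stats

n_a, acc_a = 1000, 0.85    # Model A: assumed 850/1000 correct
n_b, acc_b = 1000, 0.86    # Model B: assumed 860/1000 correct

diff = acc_b - acc_a
se = math.sqrt(acc_a * (1 - acc_a) / n_a + acc_b * (1 - acc_b) / n_b)
z_star = stats.norm.ppf(0.975)
low, high = diff - z_star * se, diff + z_star * se
print(f"Difference: {diff:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
# The interval straddles 0 here, so the 1% gap could plausibly be noise.
```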
2. Cross-Validation Scores
When you run 5-fold CV and get scores [0.82, 0.85, 0.81, 0.84, 0.83], calculate the CI for the mean. Report "Accuracy: 83.0% ± 1.5% (95% CI)" instead of just "83%". This communicates uncertainty honestly; the CLT is what justifies treating the mean of the fold scores as approximately normal.
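A sketch of that calculation from the fold scores. Note that with only five folds a t-interval (df = 4) is the safer choice, and it comes out slightly wider than the z-based ±1.5% quoted above:

```python
# 95% CI for the mean cross-validation accuracy (t-interval, df = 4).
import numpy as np
from scipy import stats

scores = np.array([0.82, 0.85, 0.81, 0.84, 0.83])
mean = scores.mean()                                  # 0.83
se = scores.std(ddof=1) / np.sqrt(len(scores))        # standard error of the mean
t_star = stats.t.ppf(0.975, df=len(scores) - 1)       # ~2.776
me = t_star * se                                      # ~0.020
print(f"Accuracy: {mean:.1%} ± {me:.1%} (95% CI)")    # 83.0% ± 2.0%
```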
3. Regression Coefficients
In Linear Regression, every weight/coefficient has a CI. If the CI for a feature's weight includes 0, that feature is likely not statistically significant for predicting the target variable and might be dropped.
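A sketch with statsmodels (assumed installed) on synthetic data, where one feature truly drives the target and the other is pure noise; the noise feature's coefficient CI should straddle 0:

```python
# Inspecting 95% CIs for regression coefficients with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                        # informative feature
x2 = rng.normal(size=n)                        # pure-noise feature
y = 3.0 * x1 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
result = sm.OLS(y, X).fit()
print(result.conf_int(alpha=0.05))             # rows: const, x1, x2
# Expect the x1 row to sit near 3 and exclude 0, and the x2 row to straddle 0.
```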