
Confidence Intervals

Moving from a single point estimate to a range of plausible truths.

Introduction

In the previous chapter on Sampling Distributions, we learned that sample means fluctuate. If you take a sample of students and calculate their average height, you might get 165cm. If you take another sample, you might get 168cm.

Reporting a single number (a Point Estimate) like "165cm" is risky because it does not communicate uncertainty. It implies a precision that does not exist.

The Big Question

A Confidence Interval (CI) solves this by providing a range. Instead of saying:

Point Estimate (Risky)

"The mean height is 165cm."

Confidence Interval (Better)

"We are 95% confident the true mean is between 160cm and 170cm."

This range accounts for the natural variability of sampling. But what does "95% confident" actually mean? This is one of the most misunderstood concepts in statistics, and we will clarify it with an interactive simulation.

The Intuition: The Fishing Net Analogy

Imagine the true population parameter (e.g., the true average height of all humans) is an invisible, stationary fish in a murky lake. You cannot see the fish. You only know it is somewhere in the lake.

Point Estimate = Throwing a Spear

You throw a spear (a single number like 165cm) into the water. It is unlikely you will hit the exact center of the fish. You might be close, but you have no idea how close.

Confidence Interval = Casting a Net

You cast a net instead. The net has a certain width. You do not know for sure if the fish is inside, but if you use a wide enough net, you can be "confident" you caught it.

The Confidence Level (e.g., 95%)

If you cast this net 100 times in different spots (repeated sampling), you expect to catch the fish approximately 95 times. The fish does not move; only your net position varies.

Interactive Demo: Cast Your Nets

Watch the fishing net analogy in action. Each horizontal bar is a confidence interval. The vertical cyan line is the true population mean. Notice how approximately 95% of intervals capture the truth over time.

[Interactive demo: controls for Confidence Level and Sample Size (n = 30); readouts for Casts and Capture Rate (target: 95%); true mean μ = 50.]

Interpretation: The "95%" refers to the long-run capture rate of the method. Any individual interval either misses (Red) or hits (Green). It is strictly binary.
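For readers without access to the interactive demo, the same experiment can be sketched in a few lines of Python. This is a minimal simulation, not the demo's actual code: it assumes a normal population with the demo's true mean (μ = 50) and sample size (n = 30), plus a population standard deviation of 10, which is a made-up value chosen only for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def capture_rate(mu=50.0, sigma=10.0, n=30, confidence=0.95, casts=2000, seed=0):
    """Fraction of z-intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(casts):
        # Each cast draws a fresh sample; the true mean never moves.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1  # this net caught the fish
    return hits / casts

rate = capture_rate()
print(f"Capture rate over 2000 casts: {rate:.1%}")
```

The printed rate should hover near 95%, while every individual interval inside the loop is simply a hit or a miss.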

Anatomy of a Confidence Interval

A confidence interval is constructed from two main parts: the center (Point Estimate) and the width (Margin of Error).

$$CI = \text{Point Estimate} \pm \text{Margin of Error}$$

$$CI = \bar{x} \pm \left(z^* \times \frac{\sigma}{\sqrt{n}}\right)$$

1. Point Estimate ($\bar{x}$)

The mean calculated from your specific sample. This is the center of your interval - your best single guess.

2. Critical Value ($z^*$ or $t^*$)

Determined by your Confidence Level. For 95% confidence, $z^* = 1.96$. This tells you how many standard errors the net must reach on each side of the point estimate.

3. Standard Error ($\frac{\sigma}{\sqrt{n}}$)

How much we expect the sample mean to fluctuate. We covered this in detail in the Sampling Distributions chapter.

4. Margin of Error (ME)

The product of Critical Value and Standard Error. This is half the width of your interval - the "reach" of your net in one direction.
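The four parts above can be assembled in a short function. This is a sketch under stated assumptions: the function name `z_interval` is illustrative, and the inputs (a sample mean of 165 with a known population standard deviation of 14 and n = 30) are hypothetical values chosen to echo the height example, not data from a real study.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for a z-based confidence interval with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    standard_error = sigma / math.sqrt(n)                    # sigma / sqrt(n)
    margin = z_star * standard_error                         # half the width
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: x_bar = 165, sigma = 14, n = 30.
low, high = z_interval(x_bar=165.0, sigma=14.0, n=30)
print(f"95% CI: [{low:.1f}, {high:.1f}]")
```

With these inputs the margin of error is roughly 5, so the interval lands close to the "160 to 170" range used in the introduction.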

Common Critical Values (Z-scores)

| Confidence Level | Alpha (tail area) | Critical Value (z*) |
| --- | --- | --- |
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
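These critical values do not need to be memorized; they can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library and splits alpha evenly between the two tails. The helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: leaves alpha / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```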

Decision: Z-Score vs. T-Score

This is a critical decision point in real analysis. Which critical value do you use?

| Scenario | Distribution | Formula |
| --- | --- | --- |
| Large sample ($n \ge 30$) or known population $\sigma$ | Z-Distribution (Normal) | $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ |
| Small sample ($n < 30$) AND unknown population $\sigma$ | Student's T-Distribution | $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ |

Why T instead of Z?

The T-distribution has "fatter tails" to account for extra uncertainty when estimating $\sigma$ from small samples. We covered this in detail with an interactive visualization in the Sampling Distributions chapter.
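A quick simulation can make the cost of choosing wrongly visible. The sketch below draws many small samples (n = 6) from a normal population, estimates the standard deviation from each sample, and counts how often a z-based net and a t-based net capture the true mean. The population parameters, trial count, and seed are arbitrary assumptions; the t critical value 2.571 is the standard table value for df = 5.

```python
import math
import random
from statistics import fmean, stdev

Z_STAR = 1.960  # 95% critical value from the normal distribution
T_STAR = 2.571  # 95% critical value for df = 5 (standard t-table value)

def coverage(n=6, mu=0.0, sigma=1.0, trials=4000, seed=1):
    """Return (z_coverage, t_coverage) when sigma is estimated by s."""
    rng = random.Random(seed)
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        se = stdev(sample) / math.sqrt(n)  # uses s, not the true sigma
        error = abs(x_bar - mu)
        z_hits += int(error <= Z_STAR * se)
        t_hits += int(error <= T_STAR * se)
    return z_hits / trials, t_hits / trials

z_cov, t_cov = coverage()
print(f"z-interval coverage: {z_cov:.1%}")
print(f"t-interval coverage: {t_cov:.1%}")
```

The t-based intervals should capture the mean close to 95% of the time, while the z-based intervals fall noticeably short because they ignore the uncertainty in s.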

Step-by-Step Calculation Examples

CASE 1

Z-Interval (Large N / Proportions)

Problem: A retailer samples 121 orders. 102 were shipped within 12 hours. Calculate the 95% CI for the proportion of orders shipped fast.

Step 1: Calculate sample proportion: $\hat{p} = 102/121 \approx 0.84$

Step 2: Find critical value: For 95%, $z^* = 1.96$

Step 3: Calculate Standard Error: $SE = \sqrt{\frac{0.84 \times 0.16}{121}} \approx 0.033$

Step 4: Margin of Error: $ME = 1.96 \times 0.033 \approx 0.065$

Step 5: Construct interval: $0.84 \pm 0.065 = [0.775, 0.905]$

Conclusion: We are 95% confident that 77.5% to 90.5% of all orders ship within 12 hours.
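The same five steps can be checked in code. This sketch uses the normal-approximation interval for a proportion, matching the hand calculation; the function name `proportion_interval` is illustrative. Because the code does not round $\hat{p}$ to 0.84 before continuing, its endpoints come out at about [0.778, 0.908], slightly different from the hand-rounded result.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation CI for a proportion; returns (low, high)."""
    p_hat = successes / n                                    # Step 1
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Step 2
    se = math.sqrt(p_hat * (1 - p_hat) / n)                  # Step 3
    margin = z_star * se                                     # Step 4
    return p_hat - margin, p_hat + margin                    # Step 5

low, high = proportion_interval(successes=102, n=121)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```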

CASE 2

T-Interval (Small N, Unknown $\sigma$)

Problem: We measure the boiling point of a liquid. Sample size $n = 6$. Mean $\bar{x} = 101.82$. Sample Std Dev $s = 1.2$. Calculate the 95% CI.

Step 1: Degrees of Freedom: $df = n - 1 = 5$

Step 2: Look up T-table for df=5, 95% confidence: $t^* = 2.571$

Step 3: Standard Error: $SE = 1.2 / \sqrt{6} \approx 0.49$

Step 4: Margin of Error: $ME = 2.571 \times 0.49 \approx 1.26$

Step 5: Construct interval: $101.82 \pm 1.26 = [100.56, 103.08]$

Note: If we had erroneously used $z^* = 1.96$, our interval would be [100.86, 102.78] - too narrow and overconfident!
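The calculation, and the comparison with the mistaken z-based interval, can be reproduced as follows. The Python standard library has no t-distribution, so this sketch passes the critical value in as an argument, using the table value 2.571 from Step 2. The function name `t_interval` is illustrative.

```python
import math

def t_interval(x_bar, s, n, t_star):
    """CI for a mean using sample std dev s and a supplied critical value."""
    se = s / math.sqrt(n)
    margin = t_star * se
    return x_bar - margin, x_bar + margin

# Correct choice: t* = 2.571 for df = 5 at 95% confidence.
t_low, t_high = t_interval(101.82, 1.2, 6, t_star=2.571)
# Mistaken choice: the normal critical value 1.96.
z_low, z_high = t_interval(101.82, 1.2, 6, t_star=1.96)

print(f"t-interval: [{t_low:.2f}, {t_high:.2f}]")
print(f"z-interval: [{z_low:.2f}, {z_high:.2f}]")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, df=5)` returns the same critical value without a table lookup.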

The 95% Trap

STOP AND READ CAREFULLY

Once you calculate a specific interval, say $[160, 170]$, it is INCORRECT to say:

"There is a 95% probability that the true mean is between 160 and 170."

Why is this wrong? In Frequentist statistics, the true parameter is a fixed constant (the fish does not move). The interval is the variable (the net position varies). Once you have cast the net (calculated the interval), the fish is either inside (Probability = 1) or not (Probability = 0). There is no "95% inside."

Correct Interpretation: "If we repeated this sampling procedure many times, 95% of the intervals constructed would contain the true population mean."

This is exactly what the interactive simulation above demonstrates. Go back and watch - approximately 95% of the nets capture the fish, but each individual net either has the fish (100%) or does not (0%).

Factors Affecting Interval Width

A narrower interval is generally better (more precision), provided we maintain confidence. How do we achieve that?

Sample Size (n): Increase n

Since n is in the denominator (inside $\sqrt{n}$), quadrupling your sample size cuts your margin of error in half.

Confidence Level (CL): Decrease CL

Lowering confidence (99% to 90%) reduces the critical value (2.576 to 1.645), narrowing the interval. Trade-off: less certainty.

Standard Deviation ($\sigma$): Decrease $\sigma$

Less variable populations yield more precise estimates. Usually hard to control in practice - this is a property of the population.
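All three levers can be checked numerically. The sketch below computes the z-based margin of error and changes one factor at a time. The baseline values (standard deviation 15, n = 30, 95% confidence) are illustrative assumptions, and `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """z-based margin of error: z* times sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(sigma=15, n=30)                        # baseline
quadrupled_n = margin_of_error(sigma=15, n=120)               # 4x the sample
lower_cl = margin_of_error(sigma=15, n=30, confidence=0.90)   # less confidence
smaller_sigma = margin_of_error(sigma=7.5, n=30)              # half the spread

print(f"baseline ME:       {base:.2f}")
print(f"n x 4:             {quadrupled_n:.2f}")
print(f"90% confidence:    {lower_cl:.2f}")
print(f"sigma halved:      {smaller_sigma:.2f}")
```

Quadrupling n and halving the standard deviation both cut the margin exactly in half, but the first costs four times as much data and the second is rarely under the analyst's control.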

Interactive: Interval Width Explorer

Play with all three factors and watch how the interval width changes. This builds intuition for the trade-offs.

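[Interactive widget: sliders for sample size (n = 30, Costly to Precise), confidence level (95%, Risky to Safe), and standard deviation ($\sigma$ = 15, Consistent to Chaotic). At these settings the interval spans -5.4 to +5.4 around the estimate.]

Calculation

$$\text{Width} = 2 \times z^* \times \frac{\sigma}{\sqrt{n}} = 2 \times 1.960 \times \frac{15}{\sqrt{30}} \approx 10.74$$

Adjust the sliders to see how the mathematical "width" translates to visual uncertainty.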

Applications in Machine Learning

Confidence intervals are essential for honest reporting of ML results.

1. A/B Testing

When comparing Model A (85% accuracy) vs Model B (86% accuracy), simply comparing numbers is insufficient. We calculate the CI for the difference in proportions. If the CI includes 0 (e.g., [-0.02, 0.04]), we cannot claim Model B is better - the difference could be noise. See the Hypothesis Testing chapter for more.
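A minimal sketch of that calculation follows. The text does not say how many test examples each model was scored on, so the sample sizes here (1,000 per model) are a hypothetical choice. The sketch also assumes the two models were evaluated on independent test sets; if both were scored on the same examples, a paired method would be more appropriate.

```python
import math
from statistics import NormalDist

def diff_proportion_interval(p_a, n_a, p_b, n_b, confidence=0.95):
    """Normal-approximation CI for p_b - p_a from two independent samples."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Variances of independent estimates add.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_star * se, diff + z_star * se

# Hypothetical test-set sizes: 1,000 examples per model.
low, high = diff_proportion_interval(p_a=0.85, n_a=1000, p_b=0.86, n_b=1000)
includes_zero = low <= 0 <= high

print(f"95% CI for the difference: [{low:.3f}, {high:.3f}]")
print("Cannot claim B is better" if includes_zero else "Difference is significant")
```

Under these assumptions the interval straddles 0, so a one-point accuracy gap on 1,000 examples per model is not enough to declare a winner.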

2. Cross-Validation Scores

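When you run 5-fold CV and get scores [0.82, 0.85, 0.81, 0.84, 0.83], calculate the CI for the mean. With only five scores and an unknown population standard deviation, use the t critical value for df = 4 ($t^* = 2.776$): the sample standard deviation is about 0.016, so the margin of error is about 2.0 percentage points. Report "Accuracy: 83.0% ± 2.0% (95% CI)" instead of just "83%". This communicates uncertainty honestly. Treat the interval as approximate: five folds are too few for the CLT to apply, and fold scores are not independent because their training sets overlap.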
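The interval for these five scores can be computed directly. Following the small-sample rule from earlier in the chapter, this sketch uses the t critical value for df = 4 (2.776, a standard table value), which gives a margin near 2.0 percentage points, somewhat wider than a z-based margin would be. It treats the fold scores as independent draws, which is only an approximation for cross-validation.

```python
import math
from statistics import fmean, stdev

scores = [0.82, 0.85, 0.81, 0.84, 0.83]  # fold scores from the text
T_STAR_DF4 = 2.776                       # 95% critical value for df = 4

mean_score = fmean(scores)
se = stdev(scores) / math.sqrt(len(scores))  # sample std dev / sqrt(n)
margin = T_STAR_DF4 * se

print(f"Accuracy: {mean_score:.1%} +/- {margin:.1%} (95% CI)")
```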

3. Regression Coefficients

In Linear Regression, every weight/coefficient has a CI. If the 95% CI for a feature's weight includes 0, that weight is not statistically significant at the 5% level, and the feature is a candidate for removal.
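For a single-feature regression, the slope's interval can be worked out by hand. The sketch below is a toy illustration, not a general regression routine: the data are hypothetical (a true slope of 2 with an alternating +1/-1 disturbance), and the critical value 2.228 is the standard t-table value for df = n - 2 = 10.

```python
import math
from statistics import fmean

# Hypothetical toy data: y = 2x + 1 plus an alternating +1 / -1 disturbance.
xs = list(range(12))
ys = [2 * x + 1 + (1 if x % 2 == 0 else -1) for x in xs]

T_STAR_DF10 = 2.228  # 95% critical value for df = n - 2 = 10

def slope_interval(xs, ys, t_star):
    """Least-squares slope with its CI: returns (slope, low, high)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the standard error of the slope.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return slope, slope - t_star * se_slope, slope + t_star * se_slope

slope, low, high = slope_interval(xs, ys, T_STAR_DF10)
print(f"slope = {slope:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
print("includes 0" if low <= 0 <= high else "excludes 0")
```

Here the interval excludes 0, so this feature would be kept. In practice, libraries such as statsmodels report these intervals for every coefficient of a multi-feature model.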