Introduction
In the previous chapter on Sampling Distributions, we learned that sample means fluctuate. If you take a sample of students and calculate their average height, you might get 165cm. If you take another sample, you might get 168cm.
Reporting a single number (a Point Estimate) like "165cm" is risky because it does not communicate uncertainty. It implies a precision that does not exist.
The Big Question
A Confidence Interval (CI) solves this by providing a range. Instead of saying:
"The mean height is 165cm."
"We are 95% confident the true mean is between 160cm and 170cm."
This range accounts for the natural variability of sampling. But what does "95% confident" actually mean? This is one of the most misunderstood concepts in statistics, and we will clarify it with an interactive simulation.
The Intuition: The Fishing Net Analogy
Imagine the true population parameter (e.g., the true average height of all humans) is an invisible, stationary fish in a murky lake. You cannot see the fish. You only know it is somewhere in the lake.
You throw a spear (a single number like 165cm) into the water. It is unlikely you will hit the exact center of the fish. You might be close, but you have no idea how close.
You cast a net instead. The net has a certain width. You do not know for sure if the fish is inside, but if you use a wide enough net, you can be "confident" you caught it.
The Confidence Level (e.g., 95%)
If you cast this net 100 times in different spots (repeated sampling), you expect to catch the fish approximately 95 times. The fish does not move; only your net position varies.
Interactive Demo: Cast Your Nets
Watch the fishing net analogy in action. Each horizontal bar is a confidence interval. The vertical cyan line is the true population mean. Notice how approximately 95% of intervals capture the truth over time.
Interpretation: The "95%" refers to the long-run capture rate of the method. Any individual interval either misses (Red) or hits (Green). It is strictly binary.
Anatomy of a Confidence Interval
A confidence interval is constructed from two main parts: the center (Point Estimate) and the width (Margin of Error).
1. Point Estimate ($\bar{x}$)
The mean calculated from your specific sample. This is the center of your interval - your best single guess.
2. Critical Value ($z^*$ or $t^*$)
Determined by your Confidence Level. For 95% confidence, $z^* = 1.96$. This tells you how many standard errors wide the net needs to be.
3. Standard Error ($SE$)
How much we expect the sample mean to fluctuate: $SE = \frac{\sigma}{\sqrt{n}}$ (or $\frac{s}{\sqrt{n}}$ when $\sigma$ is unknown). We covered this in detail in the Sampling Distributions chapter.
4. Margin of Error (ME)
The product of the Critical Value and the Standard Error: $ME = z^* \times SE$. This is half the width of your interval - the "reach" of your net in one direction.
Common Critical Values (Z-scores)
| Confidence Level | Alpha (tail area) | Critical Value (z*) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
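These critical values are not magic numbers; they come from the inverse CDF of the standard normal distribution. A quick check, assuming SciPy is available:

```python
# Reproducing the z* table from the standard normal inverse CDF.
from scipy import stats

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha / 2)   # two-tailed critical value
    print(f"{confidence:.0%}: alpha = {alpha:.2f}, z* = {z_star:.3f}")
# 90%: z* = 1.645   95%: z* = 1.960   99%: z* = 2.576
```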
Decision: Z-Score vs. T-Score
This is a critical decision point in real analysis. Which critical value do you use?
| Scenario | Distribution | Formula |
|---|---|---|
| Large Sample ($n \geq 30$) or known population $\sigma$ | Z-Distribution (Normal) | $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ |
| Small Sample ($n < 30$) AND unknown population $\sigma$ | Student's T-Distribution | $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ |
Why T instead of Z?
The T-distribution has "fatter tails" to account for the extra uncertainty introduced when estimating $\sigma$ with the sample standard deviation $s$ from small samples. We covered this in detail with an interactive visualization in the Sampling Distributions chapter.
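To see those fatter tails numerically, compare $t^*$ to $z^*$ at 95% confidence as the degrees of freedom grow (a small sketch assuming SciPy):

```python
# t* critical values converge toward z* = 1.96 as df increases.
from scipy import stats

z_star = stats.norm.ppf(0.975)
for df in (2, 5, 10, 30, 100):
    t_star = stats.t.ppf(0.975, df=df)
    print(f"df = {df:>3}: t* = {t_star:.3f}   (z* = {z_star:.3f})")
# df = 2 gives t* ≈ 4.303; by df = 100 it is ≈ 1.984, nearly the z value.
```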
Step-by-Step Calculation Examples
Z-Interval (Large N / Proportions)
Problem: A retailer samples 121 orders. 102 were shipped within 12 hours. Calculate the 95% CI for the proportion of orders shipped fast.
Step 1: Calculate sample proportion: $\hat{p} = \frac{102}{121} \approx 0.84$
Step 2: Find critical value: For 95%, $z^* = 1.96$
Step 3: Calculate Standard Error: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.84 \times 0.16}{121}} \approx 0.033$
Step 4: Margin of Error: $ME = 1.96 \times 0.033 \approx 0.065$
Step 5: Construct interval: $0.84 \pm 0.065 = [0.775, 0.905]$
Conclusion: We are 95% confident that 77.5% to 90.5% of all orders ship within 12 hours.
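The same five steps in code, rounding $\hat{p}$ to 0.84 as in the worked steps above so the output matches the stated interval:

```python
# Proportion CI for 102 fast shipments out of 121 orders.
import math

successes, n = 102, 121
p_hat = round(successes / n, 2)           # ~0.84, rounded as in the text
z_star = 1.96                             # 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)   # ~0.033
me = z_star * se                          # ~0.065
print(f"95% CI: [{p_hat - me:.3f}, {p_hat + me:.3f}]")  # [0.775, 0.905]
```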
T-Interval (Small N, Unknown $\sigma$)
Problem: Measuring the boiling point of a liquid. Sample size $n = 6$. Mean $\bar{x} = 101.82$. Sample Std Dev $s = 1.2$. Calculate the 95% CI.
Step 1: Degrees of Freedom: $df = n - 1 = 5$
Step 2: Look up T-table for df=5, 95% confidence: $t^* = 2.571$
Step 3: Standard Error: $SE = \frac{s}{\sqrt{n}} = \frac{1.2}{\sqrt{6}} \approx 0.49$
Step 4: Margin of Error: $ME = 2.571 \times 0.49 \approx 1.26$
Step 5: Construct interval: $101.82 \pm 1.26 = [100.56, 103.08]$
Note: If we had erroneously used $z^* = 1.96$, our interval would be [100.86, 102.78] - too narrow and overconfident!
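In code, using SciPy's t-distribution instead of a printed table, with the same figures as the worked example above:

```python
# T-interval for the boiling-point sample (n = 6, mean 101.82, s = 1.2).
import math
from scipy import stats

n, x_bar, s = 6, 101.82, 1.2
t_star = stats.t.ppf(0.975, df=n - 1)     # ~2.571 for df = 5
se = s / math.sqrt(n)                     # ~0.49
me = t_star * se                          # ~1.26
print(f"95% CI: [{x_bar - me:.2f}, {x_bar + me:.2f}]")  # ~[100.56, 103.08]
```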
The 95% Trap
STOP AND READ CAREFULLY
Once you calculate a specific interval, say $[160, 170]$, it is INCORRECT to say:
"There is a 95% probability that the true mean is between 160 and 170."
Why is this wrong? In Frequentist statistics, the true parameter is a fixed constant (the fish does not move). The interval is the variable (the net position varies). Once you catch the fish (calculate the interval), the fish is either inside (Probability = 1) or not (Probability = 0). There is no "95% inside."
Correct Interpretation: "If we repeated this sampling procedure many times, 95% of the intervals constructed would contain the true population mean."
This is exactly what the interactive simulation above demonstrates. Go back and watch - approximately 95% of the nets capture the fish, but each individual net either has the fish (100%) or does not (0%).
Factors Affecting Interval Width
A narrower interval is generally better (more precision), provided we maintain confidence. How do we achieve that?
1. Sample Size (n): Since $n$ is in the denominator (inside $\sqrt{n}$), quadrupling your sample size cuts your margin of error in half.
2. Confidence Level: Lowering confidence (99% to 90%) reduces the critical value (2.576 to 1.645), narrowing the interval. Trade-off: less certainty.
3. Population Standard Deviation ($\sigma$): Less variable populations yield more precise estimates. Usually hard to control in practice - this is a property of the population.
Interactive: Interval Width Explorer
Play with all three factors and watch how the interval width changes. This builds intuition for the trade-offs.
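If you prefer to poke at the formula directly, here is a small sketch of the calculation behind the explorer; the baseline values are arbitrary, and SciPy is assumed to be available.

```python
# Margin of error as a function of sigma, n, and confidence level.
import math
from scipy import stats

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

print(f"baseline (sigma=10, n=30, 95%): {margin_of_error(10, 30, 0.95):.2f}")
print(f"quadruple n to 120:             {margin_of_error(10, 120, 0.95):.2f}")  # half as wide
print(f"drop confidence to 90%:         {margin_of_error(10, 30, 0.90):.2f}")   # narrower
print(f"halve sigma to 5:               {margin_of_error(5, 30, 0.95):.2f}")    # narrower
```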
Applications in Machine Learning
Confidence intervals are essential for honest reporting of ML results.
1. A/B Testing
When comparing Model A (85% accuracy) vs Model B (86% accuracy), simply comparing the two numbers is insufficient. We calculate the CI for the difference in proportions. If the CI includes 0 (e.g., [-0.02, 0.04]), we cannot claim Model B is better - the difference could be noise. See the Hypothesis Testing chapter for more.
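A sketch of that difference-in-proportions interval, using hypothetical test-set sizes (1,000 examples per model) and the normal approximation:

```python
# 95% CI for the difference between two model accuracies (proportions).
import math
from scipy import stats

n_a, acc_a = 1000, 0.85    # Model A: assumed 850/1000 correct
n_b, acc_b = 1000, 0.86    # Model B: assumed 860/1000 correct

diff = acc_b - acc_a
se = math.sqrt(acc_a * (1 - acc_a) / n_a + acc_b * (1 - acc_b) / n_b)
z_star = stats.norm.ppf(0.975)
low, high = diff - z_star * se, diff + z_star * se
print(f"Difference: {diff:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
# The interval straddles 0 here, so the 1% gap could plausibly be noise.
```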
2. Cross-Validation Scores
When you run 5-fold CV and get scores [0.82, 0.85, 0.81, 0.84, 0.83], calculate the CI for the mean. Report "Accuracy: 83.0% ± 1.5% (95% CI)" instead of just "83%". This communicates uncertainty honestly; the CLT is what justifies treating the mean of the fold scores as approximately normal.
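A sketch of that calculation from the fold scores. Note that with only five folds a t-interval (df = 4) is the safer choice, and it comes out slightly wider than the z-based ±1.5% quoted above:

```python
# 95% CI for the mean cross-validation accuracy (t-interval, df = 4).
import numpy as np
from scipy import stats

scores = np.array([0.82, 0.85, 0.81, 0.84, 0.83])
mean = scores.mean()                                  # 0.83
se = scores.std(ddof=1) / np.sqrt(len(scores))        # standard error of the mean
t_star = stats.t.ppf(0.975, df=len(scores) - 1)       # ~2.776
me = t_star * se                                      # ~0.020
print(f"Accuracy: {mean:.1%} ± {me:.1%} (95% CI)")    # 83.0% ± 2.0%
```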
3. Regression Coefficients
In Linear Regression, every weight/coefficient has a CI. If the CI for a feature's weight includes 0, that feature is likely not statistically significant for predicting the target variable and might be dropped.
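A sketch with statsmodels (assumed installed) on synthetic data, where one feature truly drives the target and the other is pure noise; the noise feature's coefficient CI should straddle 0:

```python
# Inspecting 95% CIs for regression coefficients with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                        # informative feature
x2 = rng.normal(size=n)                        # pure-noise feature
y = 3.0 * x1 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
result = sm.OLS(y, X).fit()
print(result.conf_int(alpha=0.05))             # rows: const, x1, x2
# Expect the x1 row to sit near 3 and exclude 0, and the x2 row to straddle 0.
```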