What is AUC-ROC and why is it used to evaluate classifiers?

AUC-ROC stands for Area Under the Receiver Operating Characteristic curve. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at every possible classification threshold. The AUC summarises this curve into a single scalar: a value of 1.0 means a perfect classifier while 0.5 means no discrimination power (random guessing). It is preferred over accuracy because it is threshold-independent and robust to class imbalance.

How does integration relate to the area under the ROC curve?

The AUC is literally a definite integral: AUC = ∫₀¹ TPR(FPR) d(FPR). Because ROC curves are constructed from a discrete set of threshold points rather than a continuous function, numerical integration methods such as the trapezoidal rule are used to approximate this integral by summing the areas of trapezoids formed between consecutive (FPR, TPR) points.

What is a good AUC score and how do you interpret it?

An AUC of 0.5 indicates a random classifier with no predictive power. Scores between 0.7 and 0.8 are considered acceptable, 0.8–0.9 excellent, and above 0.9 outstanding. AUC has a useful probabilistic interpretation: an AUC of 0.85 means the model correctly ranks a randomly chosen positive instance higher than a randomly chosen negative instance 85% of the time.

What is the difference between AUC-ROC and AUC-PR?

AUC-ROC plots TPR vs FPR and is well-suited to balanced datasets. AUC-PR (Precision-Recall AUC) plots Precision vs Recall and is more informative when the dataset is highly imbalanced and the majority class is negative, because it focuses on the model's performance on the minority (positive) class without being inflated by a large number of true negatives.

How is the trapezoidal rule used to compute AUC numerically?

After computing (FPR, TPR) pairs at each classification threshold, the trapezoidal rule approximates the AUC as: AUC ≈ Σᵢ (FPRᵢ₊₁ − FPRᵢ) × (TPRᵢ + TPRᵢ₊₁) / 2. Each term is the area of a trapezoid with parallel sides TPRᵢ and TPRᵢ₊₁ and width equal to the change in FPR. This is exactly what sklearn's roc_auc_score implements under the hood.

Integration & AUC-ROC: Area Under the Curve

From Sums to Areas

Integration is the reverse of differentiation. While derivatives tell us the rate of change, integrals accumulate quantities over an interval. In ML, we use integrals to compute areas under curves like the ROC curve.

\int_a^b f(x) \, dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i) \Delta x

The definite integral is the limit of Riemann sums as rectangles become infinitely thin.

Riemann Sums

The area under a curve is approximated by summing rectangles. The height of each rectangle is the function value, and width is $\Delta x = (b-a)/n$ .

Left Riemann

Use left endpoint of each interval for height. Underestimates for increasing functions.

Right Riemann

Use right endpoint. Overestimates for increasing functions.

Midpoint Rule

Use midpoint. Generally more accurate than left or right.

Interactive Simulator

Adjust the number of rectangles and watch the Riemann sum converge to the true AUC as n increases.

Riemann Sum Simulator

Approximating the Area Under Curve (Integral) with rectangles.

n4

Results

Riemann Sum (Approximation)

0.74876

\sum_{i=1}^{4} f(x_i^*)\Delta x

True Integral (Analytic)

0.74074

\int_0^1 x^{0.35} dx

Error1.08%

Why Integrate?

A single accuracy number can be misleading. Integration (AUC) summarizes performance across all specific thresholds.

As $n \to \infty$ , the Riemann sum converges exactly to the AUC.

AUC-ROC: The Area Under the ROC Curve

The ROC (Receiver Operating Characteristic) curve plots True Positive Rate vs False Positive Rate at various classification thresholds. The Area Under this Curve (AUC) is a single number summarizing classifier performance.

AUC = \int_0^1 TPR(FPR) \, d(FPR)

AUC = 1.0

Perfect classifier. The curve hugs the top left corner.

AUC = 0.5

Random classifier. The curve is the diagonal line. No discrimination power.

Probabilistic Interpretation

AUC = P(classifier ranks a random positive higher than a random negative). If AUC = 0.8, given a random positive and negative sample, the model correctly ranks them 80% of the time.

Case Study: Bulb Defect Classifier

You train a model to detect defective bulbs. It outputs a probability score from 0 to 1. Let's compute the AUC step by step.

Example model outputs:

Defective: [0.9, 0.8, 0.7, 0.6]Normal: [0.4, 0.3, 0.2, 0.1]

Step 1: Vary threshold from 0 to 1

At each threshold τ, predict "defective" if score ≥ τ. This gives different TPR/FPR pairs.

Step 2: Compute TPR and FPR at each threshold

TPR = \frac{TP}{TP + FN}

True positives / All actual positives

FPR = \frac{FP}{FP + TN}

False positives / All actual negatives

Step 3: Plot the ROC curve

Plot (FPR, TPR) for each threshold. Connect points to form curve from (0,0) to (1,1).

Step 4: Compute AUC using trapezoidal rule

AUC \approx \sum_{i} (FPR_{i+1} - FPR_i) \times \frac{TPR_i + TPR_{i+1}}{2}

For our example: AUC ≈ 0.94 (excellent discrimination)

Numerical Integration Methods

Trapezoidal Rule

Use trapezoids instead of rectangles. Error: $O(h^2)$

\int_a^b f(x)dx \approx \frac{h}{2}[f(a) + 2\sum f(x_i) + f(b)]

Simpson's Rule

Fit parabolas to three points. Error: $O(h^4)$

\int_a^b f(x)dx \approx \frac{h}{3}[f(a) + 4\sum f(x_{odd}) + 2\sum f(x_{even}) + f(b)]

Monte Carlo Integration

Sample random points and average. Scales well to high dimensions unlike grid methods.

ML Applications

AUC-ROC

Threshold independent evaluation for binary classifiers. Used by sklearn.metrics.roc_auc_score.

AUC-PR

Precision Recall AUC. Better for imbalanced datasets where negatives dominate.

Expected Calibration Error

Integrate the gap between predicted probability and actual frequency. Uses binning (discrete integration).

ELBO in VAEs

The Evidence Lower Bound involves integrals over latent variables. Approximated via Monte Carlo sampling.

Contents