Introduction
Before we can discuss derivatives (the engine of optimization) or integrals (the foundation of probability), we must understand the bedrock they stand on: limits.
Calculus is the mathematics of change. But change happens at an instant, and measuring something "at an instant" means dividing a zero change by a zero duration (the indeterminate ratio $\frac{0}{0}$). To bypass this mathematical impossibility, we use limits: asking not "what is the value at this point," but "what value do we approach as we get infinitely close?"
Why Limits Matter for ML
- Derivatives: The definition of the derivative is a limit: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ (see the sketch after this list).
- Gradient Descent: Requires continuous, differentiable loss functions; continuity and differentiability are both defined via limits.
- Activation Functions: Understanding where functions break (discontinuities) explains ReLU vs Sigmoid choices.
- Convergence: Training loops converge when loss approaches a limit.
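A minimal sketch of the first bullet: computing the derivative of f(x) = x² straight from the limit definition. The use of sympy here is my assumption, chosen purely for illustration.

```python
# Sketch: the derivative of f(x) = x**2, computed as the limit of a difference quotient.
# sympy is assumed here for illustration; any computer algebra system would do.
import sympy as sp

x, h = sp.symbols('x h')
f = x**2

# Difference quotient (f(x + h) - f(x)) / h, then take the limit h -> 0.
difference_quotient = (f.subs(x, x + h) - f) / h
derivative = sp.limit(difference_quotient, h, 0)

print(derivative)             # 2*x
print(derivative.subs(x, 3))  # 6
```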
What is a Limit?
A limit describes the value a function approaches as the input approaches some value; we write $\lim_{x \to c} f(x) = L$. The function doesn't need to actually reach that value; it just needs to get arbitrarily close.
Classic Example
Consider $f(x) = \frac{x^2 - 1}{x - 1}$. If we plug in x = 1, we get $\frac{0}{0}$, which is undefined.
But we can factor: $\frac{x^2 - 1}{x - 1} = \frac{(x - 1)(x + 1)}{x - 1} = x + 1$ for x != 1.
So as x approaches 1, f(x) approaches 2. The limit exists even though f(1) is undefined!
Left-hand Limit
Approaching c from values less than c (from the left on the number line): $\lim_{x \to c^-} f(x)$.
Right-hand Limit
Approaching c from values greater than c (from the right): $\lim_{x \to c^+} f(x)$.
Limit Exists If and Only If
The limit exists if and only if both one-sided limits exist AND are equal: $\lim_{x \to c^-} f(x) = \lim_{x \to c^+} f(x) = L$, in which case $\lim_{x \to c} f(x) = L$.
Interactive: Approaching a Limit
Watch how both sides approach the same value as we get closer to the point. The function has a "hole" at x = 1, but the limit still exists!
[Interactive widget: "Approaching the Limit" graph with a hole at x = 1, a values table, and a "Limit Exists!" indicator.]
Notice that as x gets closer to 1 from BOTH sides, f(x) gets closer to 2.
Even though f(1) is undefined (a hole), we say $\lim_{x \to 1} f(x) = 2$.
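If you don't have the interactive version handy, the same table of values is easy to reproduce numerically:

```python
# Sketch: numerically approaching x = 1 from both sides for f(x) = (x**2 - 1)/(x - 1).
def f(x):
    return (x**2 - 1) / (x - 1)

for step in [0.1, 0.01, 0.001, 0.0001]:
    left, right = 1 - step, 1 + step
    print(f"x = {left:<8} f(x) = {f(left):.6f}   |   x = {right:<8} f(x) = {f(right):.6f}")

# Both columns approach 2, even though f(1) itself raises ZeroDivisionError.
```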
The Epsilon-Delta Definition
The formal definition of a limit is often considered one of the hardest concepts in Calculus 1. But it's actually just a way of describing a guarantee of precision.
The Manufacturing Analogy
Imagine you are manufacturing a high-precision piston (input $x$) that must fit into a cylinder (output $f(x)$).
- 1. The Goal (L): The cylinder has a perfect target width, say 10 cm.
- 2. The Tolerance ($\epsilon$): The customer says, "The width must be within 0.01 cm of the target." This is your error margin.
- 3. The Input Precision ($\delta$): You ask, "How precise must my piston mold be?" Maybe if the mold is within 0.005 cm of the target size, the final piston will be within the customer's tolerance.
The Limit Exists If: No matter how strict the customer's tolerance ($\epsilon$) is, you can always find a high-enough precision for your machine ($\delta$) to satisfy it.
Formal Definition
We say $\lim_{x \to c} f(x) = L$ if:
For every $\epsilon > 0$ (the challenge), there exists a $\delta > 0$ (the response) such that:
$$0 < |x - c| < \delta \implies |f(x) - L| < \epsilon$$
Translation: If the input is within $\delta$ of $c$, then the output is guaranteed to be within $\epsilon$ of $L$.
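Here is a brute-force numerical illustration of the challenge-response game for our earlier example, where the limit is 2 at c = 1. The candidate deltas and sampling strategy are arbitrary choices for demonstration, not a proof:

```python
# Sketch: a brute-force epsilon-delta check for lim_{x->1} (x**2 - 1)/(x - 1) = 2.
# Given an epsilon "challenge", search for a delta "response" by sampling points.
def f(x):
    return (x**2 - 1) / (x - 1)

def find_delta(c, L, epsilon, candidates=(0.5, 0.1, 0.05, 0.01, 0.005, 0.001)):
    for delta in candidates:
        # Sample x values with 0 < |x - c| < delta and check |f(x) - L| < epsilon.
        offsets = [delta * t for t in (0.999, 0.5, 0.001)]
        xs = [c - d for d in offsets] + [c + d for d in offsets]
        if all(abs(f(x) - L) < epsilon for x in xs):
            return delta
    return None

for eps in [0.1, 0.01, 0.001]:
    print(f"epsilon = {eps}: delta = {find_delta(1, 2, eps)}")
```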
Limit Laws
These rules let us compute limits of complex expressions from simpler ones. If $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$, then:
- Sum: $\lim_{x \to c} [f(x) + g(x)] = L + M$
- Difference: $\lim_{x \to c} [f(x) - g(x)] = L - M$
- Product: $\lim_{x \to c} [f(x) \cdot g(x)] = L \cdot M$
- Quotient: $\lim_{x \to c} \frac{f(x)}{g(x)} = \frac{L}{M}$, provided $M \neq 0$
- Constant multiple: $\lim_{x \to c} [k \cdot f(x)] = kL$
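As a quick sanity check, the laws can be verified on concrete functions with a computer algebra system; the snippet below uses sympy, my choice for illustration.

```python
# Sketch: the limit laws in action for two simple functions (sympy assumed).
import sympy as sp

x = sp.symbols('x')
f = x**2 + 1   # limit as x -> 2 is 5
g = 3 * x      # limit as x -> 2 is 6

L = sp.limit(f, x, 2)
M = sp.limit(g, x, 2)

print(sp.limit(f + g, x, 2) == L + M)  # True  (sum law)
print(sp.limit(f * g, x, 2) == L * M)  # True  (product law)
print(sp.limit(f / g, x, 2) == L / M)  # True  (quotient law, M != 0)
```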
L'Hopital's Rule
When direct substitution gives an indeterminate form like $\frac{0}{0}$ or $\frac{\infty}{\infty}$, L'Hopital's Rule provides a way forward.
L'Hopital's Rule
If $\lim_{x \to c} \frac{f(x)}{g(x)}$ gives $\frac{0}{0}$ or $\frac{\infty}{\infty}$, then:
$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}$$
provided the limit on the right exists (or is infinity).
Example
Find $\lim_{x \to 0} \frac{\sin x}{x}$:
Direct substitution: $\frac{\sin 0}{0} = \frac{0}{0}$ (indeterminate)
Apply L'Hopital: $\lim_{x \to 0} \frac{\cos x}{1} = \cos 0 = 1$
This limit, $\lim_{x \to 0} \frac{\sin x}{x} = 1$, is fundamental in deriving the derivative of sine.
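A quick numerical and symbolic check of this limit (sympy is again assumed for the symbolic part):

```python
# Sketch: checking lim_{x->0} sin(x)/x = 1 numerically and symbolically.
import numpy as np
import sympy as sp

for x in [0.1, 0.01, 0.001]:
    print(f"x = {x}: sin(x)/x = {np.sin(x) / x:.8f}")  # approaches 1

t = sp.symbols('t')
print(sp.limit(sp.sin(t) / t, t, 0))                   # 1
```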
Caution
Only apply L'Hopital when you have an indeterminate form! Applying it to a form like $\frac{1}{0}$ (which is just undefined, not indeterminate) gives wrong answers.
Continuity
Intuitively, a function is continuous if you can draw it without lifting your pen. Formally, continuity requires three conditions at a point c:
Three Conditions for Continuity at c
$f(c)$ is defined
No holes at the point.
$\lim_{x \to c} f(x)$ exists
Left and right limits agree.
$\lim_{x \to c} f(x) = f(c)$
The limit equals the function value.
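These three conditions translate directly into a rough numerical test. The sketch below is an illustration with arbitrary step sizes and tolerances, not robust numerics:

```python
# Sketch: a rough numerical check of the three continuity conditions at a point c.
def check_continuity(f, c, h=1e-6, tol=1e-4):
    try:
        value = f(c)                       # 1. f(c) is defined
    except (ZeroDivisionError, ValueError):
        return "f(c) is not defined"
    left = f(c - h)                        # approximate left-hand limit
    right = f(c + h)                       # approximate right-hand limit
    if abs(left - right) > tol:            # 2. the limit must exist
        return "one-sided limits differ: limit does not exist"
    if abs(left - value) > tol:            # 3. the limit must equal f(c)
        return "limit exists but does not equal f(c)"
    return "continuous (numerically) at c"

print(check_continuity(lambda x: x**2, 2.0))                   # continuous
print(check_continuity(lambda x: (x**2 - 1) / (x - 1), 1.0))   # f(c) not defined
print(check_continuity(lambda x: 0.0 if x < 0 else 1.0, 0.0))  # one-sided limits differ
```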
Important Properties
- Polynomials are continuous everywhere.
- Rational functions are continuous except where the denominator equals 0.
- Exponential, logarithmic, and trig functions are continuous on their domains.
- Sums, products, and compositions of continuous functions are continuous.
Types of Discontinuity
Understanding where and how functions "break" is crucial for choosing activation functions and understanding gradient flow.
Removable Discontinuity
The limit exists (left = right), but the function is undefined at the point (or defined to a different value). It can be "fixed" by redefining f(c) = L.
Example: (x^2 - 1)/(x - 1) at x = 1
Why it matters for ML: removable discontinuities are trivial; we just "fill the hole", for example defining 0/0 or 0*log(0) as 0 in entropy calculations.
Jump Discontinuity
Left and right limits both exist but differ. The function "jumps."
Example: the Heaviside step function
Infinite Discontinuity
The function approaches infinity (vertical asymptote), so the limit does not exist.
Example: 1/x at x = 0
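These cases can be distinguished numerically by probing one-sided values near the point. A rough sketch (the thresholds are arbitrary illustration choices, not robust numerics):

```python
# Sketch: classifying a discontinuity at c by probing one-sided values numerically.
def classify(f, c, h=1e-6):
    left, right = f(c - h), f(c + h)
    if abs(left) > 1e5 or abs(right) > 1e5:
        return "infinite (vertical asymptote)"
    if abs(left - right) > 1e-3:
        return "jump (one-sided limits differ)"
    try:
        f(c)
        return "no discontinuity detected"
    except ZeroDivisionError:
        return "removable (limit exists, f(c) undefined)"

print(classify(lambda x: (x**2 - 1) / (x - 1), 1.0))   # removable
print(classify(lambda x: 0.0 if x < 0 else 1.0, 0.0))  # jump
print(classify(lambda x: 1.0 / x, 0.0))                # infinite
```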
Continuity vs Differentiability
This distinction is critical for understanding activation functions in deep learning.
The Hierarchy
Differentiable implies Continuous
If f'(c) exists, then f must be continuous at c. You can't have a derivative at a gap or jump.
Continuous does NOT imply Differentiable
A function can be continuous but have a "corner" where the derivative is undefined.
The Classic Example: |x|
The absolute value function f(x) = |x| is continuous everywhere. But at x = 0:
- Left derivative: $\lim_{h \to 0^-} \frac{|0 + h| - |0|}{h} = \frac{-h}{h} = -1$
- Right derivative: $\lim_{h \to 0^+} \frac{|0 + h| - |0|}{h} = \frac{h}{h} = +1$
Since -1 != +1, the derivative at 0 does not exist. The function has a "corner."
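You can see the corner numerically: the one-sided difference quotients settle on different values.

```python
# Sketch: one-sided difference quotients of f(x) = |x| at 0 disagree, so no derivative exists there.
def f(x):
    return abs(x)

for h in [0.1, 0.01, 0.001]:
    left = (f(0 - h) - f(0)) / (-h)   # left-hand difference quotient  -> -1
    right = (f(0 + h) - f(0)) / h     # right-hand difference quotient -> +1
    print(f"h = {h}: left = {left:.4f}, right = {right:.4f}")
```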
ML Applications
The Death of the Perceptron
Early neural networks used the step function: output 0 if input is negative, 1 otherwise.
Problem: It has a jump discontinuity at 0, where the derivative is undefined, and the derivative is 0 everywhere else, so gradients cannot flow. Backpropagation fails completely. This is why we moved to smooth activations like Sigmoid.
ReLU: The Practical Compromise
ReLU: $f(x) = \max(0, x)$. It is continuous everywhere but has a corner at x = 0 (not differentiable there).
Solution: We use subgradients. We arbitrarily define f'(0) = 0 or 1. Since the probability of x being exactly 0.0000... is negligible, this works perfectly in practice.
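A minimal NumPy sketch of this convention (picking 0 for the gradient at the corner is one common choice, not the only valid one):

```python
# Sketch: ReLU and the conventional "subgradient" choice at the corner x = 0.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 0 for x < 0 and 1 for x > 0; at x = 0 we arbitrarily pick 0.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```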
Softmax and Numerical Stability
Softmax: $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$. When $z_i$ gets large, $e^{z_i}$ can overflow to infinity. We use the trick of subtracting max(z) from all inputs, which doesn't change the output but keeps values bounded. This is a practical application of limit behavior.
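A minimal sketch of the naive vs. stabilized computation in NumPy:

```python
# Sketch: naive vs numerically stable softmax.
import numpy as np

def softmax_naive(z):
    exp_z = np.exp(z)           # overflows for large z
    return exp_z / exp_z.sum()

def softmax_stable(z):
    shifted = z - np.max(z)     # subtracting max(z) leaves the output unchanged
    exp_z = np.exp(shifted)     # now the largest exponent is exp(0) = 1
    return exp_z / exp_z.sum()

z = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(z))   # [nan nan nan] (with an overflow warning)
print(softmax_stable(z))  # [0.09003057 0.24472847 0.66524096]
```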
Loss Function Continuity
MSE Loss is continuous and differentiable everywhere, making gradient descent smooth. Cross-entropy loss is continuous on its domain but diverges to infinity as predictions approach 0 or 1, which is why we clip probabilities away from exactly 0 and 1 (see the sketch below).
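A sketch of the standard clipping fix; the eps value of 1e-7 is an arbitrary illustrative choice:

```python
# Sketch: binary cross-entropy with clipping so log() never sees exactly 0.
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Clip predictions into [eps, 1 - eps] to avoid log(0) = -inf.
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([1.0, 0.0, 0.5])           # unclipped, the log(0) terms would blow up
print(binary_cross_entropy(y_true, y_pred))  # finite, ~0.231
```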