04 Calculus: Limit, Continuity, and Derivatives

Photo: Gegharkunik - Artsvanist. Author: Hayk Barseghyan

📚 Material

YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!

Lecture

Practice Session

🏡 Homework

Note
  1. ❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
  2. Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
  3. The harder the problem is, the more 🧀cheeses🧀 it has.
  4. Problems with 🎁 are just extra bonuses. It would be good to try to solve them, but also it’s not the highest priority task.
  5. If a problem involves many boring calculations, feel free to skip them: the important part is understanding the concepts.
  6. Submit your solutions here (even if it’s unfinished)

📖 Textbook reading (Section 5)

If the embedded PDF doesn't load, click here to download the PDF file.

If the link above doesn't work, just go to this link. PDF page 76, book page 70.

The eight problems below are extracted from Section 5.8 of the Armenian course notes (PDF pages 76-78). Problem 5.9 (cardboard box volume) is solved in HW 05 as Problem 01.

Limits

01 Sequence Limits (textbook 5.1)

Does the sequence have a limit as \(n \to \infty\)? If yes, find it.

  1. \(a_n = \dfrac{3 - n}{2}\)
  2. \(a_n = \dfrac{\sqrt[4]{n}}{n^3}\)
  3. \(a_n = \dfrac{n-1}{n^2-1}\)
  4. \(a_n = 1^n\)
  5. \(a_n = 0.4^n\)
  6. \(a_n = (-4)^n\)
  7. Pick any number as \(a_1\), then \(a_{n+1} = a_n / 1.5\).
  8. \(a_1 = 1\); for \(n \ge 2\), let \(a_n\) be the first digit after the decimal point of \(a_{n-1}/7\).

Hint: write out the first few terms; for the last one, a calculator helps.

a. \(\dfrac{3 - n}{2} = \dfrac{3}{2} - \dfrac{n}{2} \to -\infty\) as \(n \to \infty\). No finite limit (the sequence diverges to \(-\infty\)).

b. \(\dfrac{n^{1/4}}{n^3} = n^{-11/4} \to 0\).

c. Factor: \(\dfrac{n-1}{n^2 - 1} = \dfrac{n-1}{(n-1)(n+1)} = \dfrac{1}{n+1} \to 0\).

d. \(1^n = 1\) for every \(n\), so the sequence is constant. Limit is \(1\). (A small “trick”: despite looking like an exponential, the base is \(1\), so nothing happens.)

e. Geometric with \(|r| = 0.4 < 1\), so \(0.4^n \to 0\).

f. \((-4)^n\) alternates: the even-indexed subsequence \(a_{2k} = 4^{2k} \to +\infty\), the odd-indexed subsequence \(a_{2k+1} = -4^{2k+1} \to -\infty\). Two subsequences with different limits ⇒ the full sequence has no limit. (This subsequence argument is the standard tool for showing a sequence diverges by oscillation — keep it in your toolbox.)

g. Unfold the recursion: \[a_2 = \frac{a_1}{1.5}, \quad a_3 = \frac{a_2}{1.5} = \frac{a_1}{1.5^2}, \quad \ldots, \quad a_n = \frac{a_1}{1.5^{n-1}}\] Since \(1.5^{n-1} \to \infty\), \(a_n \to 0\) regardless of the starting value \(a_1\).

h. With \(a_1 = 1\): \(1/7 = 0.\underline{1}42857\ldots\), so \(a_2 = 1\). By the same calculation, \(a_3 = 1\), and by induction \(a_n = 1\) for all \(n\). The sequence is constant, so the limit is \(1\).

(Aside for the curious: the map \(x \mapsto\) “first decimal digit of \(x/7\)” has fixed points at \(0\), \(1\), and \(2\). Try \(a_1 = 5\): \(5/7 = 0.7\ldots \Rightarrow a_2 = 7\); \(7/7 = 1.0\ldots \Rightarrow a_3 = 0\); \(0/7 = 0 \Rightarrow a_4 = 0\). So starting from \(5\) the sequence reaches \(0\) after two steps and stays there.)
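The hint suggests a calculator; a short plain-Python sketch (helper names are my own, not from the textbook) does the same job for parts (g) and (h):

```python
# Numeric sanity check for parts (g) and (h).
def seq_g(a1, n):
    """Part (g): a_{k+1} = a_k / 1.5, iterated from a_1."""
    a = a1
    for _ in range(n - 1):
        a = a / 1.5
    return a

def first_decimal_digit(x):
    """First digit after the decimal point of x."""
    return int(x * 10) % 10

def seq_h(a1, n):
    """Part (h): a_k = first decimal digit of a_{k-1}/7."""
    a = a1
    for _ in range(n - 1):
        a = first_decimal_digit(a / 7)
    return a

print(seq_g(100.0, 50))   # tiny: heading to 0 regardless of a_1
print(seq_h(1, 10))       # stays at 1 (fixed point)
print(seq_h(5, 10))       # 5 -> 7 -> 0, then stays at 0
```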

02 Indeterminate \(0/0\) Forms (textbook 5.2)

The limit theorem says nothing about \(\lim_{n \to \infty} \dfrac{a_n}{b_n}\) when \(\lim b_n = 0\). Construct examples of sequences with \(a_n \to 0\) and \(b_n \to 0\) such that:

  1. \(\lim \dfrac{a_n}{b_n} = 1\)
  2. \(\lim \dfrac{a_n}{b_n} = -5\)
  3. \(\lim \dfrac{a_n}{b_n} = 0\)
  4. \(\lim \dfrac{a_n}{b_n} = +\infty\)
  5. \(\lim \dfrac{a_n}{b_n}\) does not exist

The point of this problem: even when both numerator and denominator vanish, the ratio can do anything depending on relative shrinking rates. There is no “\(0/0\) rule” — you must look at how fast each sequence goes to zero.

| Want | \(a_n\) | \(b_n\) | \(\dfrac{a_n}{b_n}\) |
|---|---|---|---|
| \(\to 1\) | \(\dfrac{1}{n}\) | \(\dfrac{1}{n}\) | \(1\) |
| \(\to -5\) | \(\dfrac{-5}{n}\) | \(\dfrac{1}{n}\) | \(-5\) |
| \(\to 0\) | \(\dfrac{1}{n^2}\) | \(\dfrac{1}{n}\) | \(\dfrac{1}{n} \to 0\) |
| \(\to +\infty\) | \(\dfrac{1}{n}\) | \(\dfrac{1}{n^2}\) | \(n \to \infty\) |
| DNE | \(\dfrac{(-1)^n}{n}\) | \(\dfrac{1}{n}\) | \((-1)^n\) oscillates |

Why this matters. The same \(0/0\) phenomenon shows up for function limits (\(\lim_{x \to a} f(x)/g(x)\) with both numerator and denominator vanishing), where L’Hôpital’s rule resolves it by replacing the ratio of values with the ratio of derivatives — i.e., comparing rates of vanishing. The sequence examples above are why such a tool is needed: the answer is genuinely sensitive to relative speeds, and there is no universal “\(0/0\) rule.”
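The sensitivity to shrinking rates is easy to see numerically. A plain-Python sketch evaluating each example pair at a large \(n\):

```python
# Each pair (a_n, b_n) tends to 0, yet the ratios behave completely differently.
n = 10**6
cases = {
    "-> 1":   (1 / n,         1 / n),
    "-> -5":  (-5 / n,        1 / n),
    "-> 0":   (1 / n**2,      1 / n),
    "-> inf": (1 / n,         1 / n**2),
    "DNE":    ((-1)**n / n,   1 / n),  # ratio oscillates between +1 and -1
}
for label, (a, b) in cases.items():
    print(f"{label:>7}: a_n / b_n = {a / b}")
```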

03 Asymptotic Bank Growth (textbook 5.4)

Zeus, Prometheus, and Aramazd each deposit \(1000\) gold pieces in different banks:

  1. Olympus Bank (Zeus): balance after \(x\) years is \(f_1(x) = 100x + x^3\).
  2. Tavros Bank (Prometheus): balance is \(f_2(x) = 0.18 \, x^3 \sqrt{x}\).
  3. Ararat Bank (Aramazd): balance is \(f_3(x) = 30 x^2\).

All three are immortal. Who will be richest as \(x \to \infty\)?

Compare leading powers of \(x\):

  • \(f_1(x) = 100x + x^3\) — leading power \(x^3\)
  • \(f_2(x) = 0.18 \, x^3 \sqrt{x} = 0.18 \, x^{7/2}\) — leading power \(x^{3.5}\)
  • \(f_3(x) = 30 x^2\) — leading power \(x^2\)

Since \(\tfrac{7}{2} > 3 > 2\), Tavros grows fastest. Verifying both pairwise comparisons against Tavros:

\[\begin{aligned} \lim_{x \to \infty} \frac{f_2(x)}{f_1(x)} &= \lim_{x \to \infty} \frac{0.18 \, x^{7/2}}{x^3 + 100x} \\ &= \lim_{x \to \infty} 0.18 \, \sqrt{x} \;=\; +\infty \end{aligned}\]

\[\begin{aligned} \lim_{x \to \infty} \frac{f_2(x)}{f_3(x)} &= \lim_{x \to \infty} \frac{0.18 \, x^{7/2}}{30 x^2} \\ &= \lim_{x \to \infty} 0.006 \, x^{3/2} \;=\; +\infty \end{aligned}\]

So Prometheus (Tavros Bank) ends up richest, despite the tiny coefficient \(0.18\) — the half-power advantage in \(x\) overwhelms any constant for large enough \(x\). (Aramazd’s \(30 x^2\) comes in last, beaten by both higher-power competitors.)

Early vs. eventual — the word “eventually” is doing real work. A few snapshots:

| Year \(x\) | Olympus \(f_1\) | Tavros \(f_2\) | Ararat \(f_3\) | Leader |
|---|---|---|---|---|
| \(1\) | \(101\) | \(0.18\) | \(30\) | Zeus |
| \(10\) | \(2{,}000\) | \(569\) | \(3{,}000\) | Aramazd |
| \(50\) | \(130{,}000\) | \(159{,}000\) | \(75{,}000\) | Prometheus |
| \(100\) | \(1{,}010{,}000\) | \(1{,}800{,}000\) | \(300{,}000\) | Prometheus |

The lead changes hands three times before settling: Zeus leads at first, Aramazd takes over around \(x \approx 3.8\), Zeus retakes around \(x \approx 26.2\) (his cubic finally beats Aramazd’s quadratic), and Prometheus only overtakes everyone around \(x \approx 35.9\). Asymptotic dominance is not the same as “always ahead” — Prometheus has to be patient for a few decades before the half-power advantage actually pays off.
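The snapshots and lead changes can be reproduced with a few lines of Python, using only the formulas from the problem:

```python
# Balances for each bank and the leader at a few sample years.
def f1(x): return 100 * x + x**3     # Olympus (Zeus)
def f2(x): return 0.18 * x**3.5      # Tavros (Prometheus)
def f3(x): return 30 * x**2          # Ararat (Aramazd)

for x in (1, 10, 50, 100):
    balances = {"Zeus": f1(x), "Prometheus": f2(x), "Aramazd": f3(x)}
    leader = max(balances, key=balances.get)
    print(x, {k: round(v) for k, v in balances.items()}, "->", leader)
```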

Pedagogical takeaway. Constants don’t matter in asymptotic comparisons; powers do. This is exactly the intuition behind Big-O notation: an \(O(n^{3.5})\) algorithm with tiny constant beats an \(O(n^3)\) algorithm with huge constant only for small \(n\), but loses for large \(n\).

Continuity

04 Continuity at a Point (textbook 5.3)

Each \(f\) below is two formulas glued at \(x = 2\). Find a value of \(c\) that makes \(f\) continuous at \(x = 2\) (try by eye first; if no such \(c\) exists, say so).

  1. \(f(x) = \begin{cases} 3x - 5 & x < 2 \\ x^2 + c & x \ge 2 \end{cases}\)

  2. \(f(x) = \begin{cases} x^3 + 1 & x < 2 \\ c x^2 & x \ge 2 \end{cases}\)

  3. \(f(x) = \begin{cases} -7 & x < 2 \\ c & x = 2 \\ 4 + 3\sin(\pi x) & x > 2 \end{cases}\)

  4. \(f(x) = \begin{cases} \dfrac{x^2 - x - 2}{x - 2} & x \ge 2 \\ c & x < 2 \end{cases}\)

For continuity at \(x = 2\), we need \(\lim_{x \to 2^-} f(x) = \lim_{x \to 2^+} f(x) = f(2)\).

a. Left limit \(= 3(2) - 5 = 1\). Right limit (and \(f(2)\)) \(= 4 + c\). Set equal: \(c = -3\).

b. Left limit \(= 2^3 + 1 = 9\). Right limit (and \(f(2)\)) \(= 4c\). Set equal: \(c = 9/4\).

c. Left limit \(= -7\). Right limit \(= 4 + 3\sin(2\pi) = 4 + 0 = 4\). The two side-limits disagree (\(-7 \ne 4\)), so no value of \(c\) makes \(f\) continuous: \(f(2) = c\) can match only one side at a time.

d. A subtlety: at \(x = 2\) the formula \(\dfrac{x^2 - x - 2}{x - 2}\) gives \(0/0\), so \(f(2)\) is undefined as written. To make \(f\) continuous at \(2\) we have to do two things:

  1. Remove the singularity in the right branch. Simplify \(\dfrac{x^2 - x - 2}{x - 2} = \dfrac{(x - 2)(x + 1)}{x - 2} = x + 1\) for \(x \ne 2\). The right-hand limit is \(\lim_{x \to 2^+} (x + 1) = 3\), so the singularity is removable — redefine \(f(2) := 3\).
  2. Match the left side. With \(f(2) = 3\) now well-defined, the left-side value \(c\) must equal \(3\), so \(c = 3\).

The pattern. For piecewise functions, continuity at the join is just equating side-limits and \(f(2)\). Cases (a, b) are routine: solve one equation in \(c\). Case (c) is a jump discontinuity: the side-limits genuinely disagree, and no choice of \(f(2)\) can patch it. Case (d) is a removable singularity: the formula has a \(0/0\) that polynomial cancellation kills, and we then use the resulting clean limit as the value at the bad point.
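For case (d), the removable singularity is easy to see numerically: approach \(x = 2\) from the right with the original formula and watch the values settle at \(3\). A minimal check:

```python
# Right-hand limit of (x^2 - x - 2)/(x - 2) as x -> 2+: values approach 3,
# confirming that c = 3 (and f(2) := 3) makes the function continuous.
def right_branch(x):
    return (x**2 - x - 2) / (x - 2)

for h in (0.1, 0.01, 0.001):
    print(right_branch(2 + h))   # 3.1, 3.01, 3.001 -> heading to 3
```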

Derivatives

05 Compute Derivatives (textbook 5.5)

Find \(f'(x)\):

  1. \(f(x) = 10000\)
  2. \(f(x) = 2x^2 - 7x + 1\)
  3. \(f(x) = 2 \sin x \cdot e^x\)
  4. \(f(x) = e^{x^2}\)
  5. \(f(x) = x^2 \ln x\)
  6. \(f(x) = \dfrac{x^2}{x^3}\)
  7. \(f(x) = \sin(\cos x)\)
  8. \(f(x) = x^x\)

a. Constant rule: \(f'(x) = 0\).

b. Linearity + power rule: \(f'(x) = 4x - 7\).

c. Product rule (with \((e^x)' = e^x\)): \(f'(x) = 2(\cos x \cdot e^x + \sin x \cdot e^x) = 2 e^x (\cos x + \sin x)\).

d. Chain rule (outer \(e^u\), inner \(u = x^2\)): \(f'(x) = e^{x^2} \cdot 2x = 2x \, e^{x^2}\).

e. Product rule (with \((\ln x)' = 1/x\)): \(f'(x) = 2x \ln x + x^2 \cdot \tfrac{1}{x} = x(2 \ln x + 1)\).

f. Simplify, then power rule: \(\dfrac{x^2}{x^3} = \dfrac{1}{x} = x^{-1}\), so \(f'(x) = -x^{-2} = -\dfrac{1}{x^2}\).

(Sanity check via quotient rule: \(\dfrac{2x \cdot x^3 - x^2 \cdot 3x^2}{x^6} = \dfrac{-x^4}{x^6} = -\dfrac{1}{x^2}\). ✓ Always simplify before differentiating when you can.)

g. Chain rule (outer \(\sin\), inner \(\cos x\)): \(f'(x) = \cos(\cos x) \cdot (-\sin x) = -\sin x \cdot \cos(\cos x)\).

h. Logarithmic differentiation (the new technique here). Take logs: \(\ln f(x) = x \ln x\). Differentiate both sides: \[\frac{f'(x)}{f(x)} = \ln x + x \cdot \tfrac{1}{x} = \ln x + 1\] So \(f'(x) = x^x (\ln x + 1)\).

Why \(x^x\) is sneaky. Neither the power rule (\(\tfrac{d}{dx} x^n = n x^{n-1}\), valid only for constant exponent) nor the exponential rule (\(\tfrac{d}{dx} a^x = a^x \ln a\), valid only for constant base) applies, because both base and exponent vary. Logarithmic differentiation handles such mixtures. Equivalent trick: write \(x^x = e^{x \ln x}\) and apply the chain rule directly.
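Derivative formulas like \(f'(x) = x^x(\ln x + 1)\) are cheap to double-check against a central finite difference (a sketch; the step size \(h\) is my choice for double precision):

```python
import math

# Compare the closed-form derivative of x^x with a numeric approximation.
def f(x): return x**x
def f_prime(x): return x**x * (math.log(x) + 1)   # from logarithmic differentiation

x, h = 2.0, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)         # central difference
print(numeric, f_prime(x))                        # both about 6.77
```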

06 ML Activation Derivatives (textbook 5.6)

Two functions used heavily in machine learning:

  1. ReLU (rectified linear unit): \[f(x) = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases}\]

  2. Sigmoid: \[\sigma(x) = \frac{1}{1 + e^{-x}}\]

Compute their derivatives. Can you express the sigmoid’s derivative in terms of \(\sigma\) itself?

a. ReLU. For \(x > 0\), \(f(x) = x \Rightarrow f'(x) = 1\). For \(x < 0\), \(f(x) = 0 \Rightarrow f'(x) = 0\). At \(x = 0\), the left and right slopes (\(0\) and \(1\)) disagree, so \(f\) is not differentiable at \(0\):

\[f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \\ \text{undefined} & x = 0 \end{cases}\]

In practice ML libraries pick a convention (usually \(f'(0) := 0\)) — the choice almost never matters since exactly hitting \(x = 0\) in floating-point arithmetic is rare, and gradient descent doesn’t care about a single point.

b. Sigmoid. Write \(\sigma(x) = (1 + e^{-x})^{-1}\) and apply the chain rule:

\[\sigma'(x) = -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2}\]

Expressing \(\sigma'\) via \(\sigma\). Note that \(1 - \sigma(x) = \dfrac{e^{-x}}{1 + e^{-x}}\). So

\[\begin{aligned} \sigma(x) \cdot \big(1 - \sigma(x)\big) &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\ &= \frac{e^{-x}}{(1 + e^{-x})^2} \\ &= \sigma'(x) \end{aligned}\]

Hence the famous identity:

\[\sigma'(x) = \sigma(x) \, \big(1 - \sigma(x)\big)\]

Why this matters. During neural-network backpropagation we evaluate the derivative at every neuron, every step. The identity above means: once you’ve already computed \(\sigma(x)\) in the forward pass, you get \(\sigma'(x)\) for free — no need to evaluate \(e^{-x}\) again. That’s a huge engineering win, and one historical reason sigmoid was the dominant activation in early neural nets.

The dark side: vanishing gradients. The identity \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\) has a maximum value of \(1/4\) (at \(x = 0\), where \(\sigma = 1/2\)), and decays toward \(0\) for large \(|x|\) where sigmoid saturates. In deep networks, gradients propagate by multiplying these per-layer derivatives — so each sigmoid layer can shrink them by a factor of at most \(1/4\), and saturated neurons shrink them to nearly nothing. After 10-20 layers, gradients vanish to numerical zero and the early layers stop learning. This is exactly why ReLU eventually replaced sigmoid for hidden layers: \(\text{ReLU}'(x) = 1\) in the active region, which doesn’t shrink gradients at all. So the same identity that makes sigmoid cheap to differentiate is also what kills it in deep stacks.
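Both the identity and the \(1/4\) bound are easy to confirm numerically (plain Python, no ML library needed):

```python
import math

# Verify sigma'(x) = sigma(x)(1 - sigma(x)) at several points,
# and the maximum slope 1/4 at x = 0.
def sigma(x):
    return 1 / (1 + math.exp(-x))

def sigma_prime_direct(x):
    # Derivative computed from scratch via the chain rule.
    return math.exp(-x) / (1 + math.exp(-x))**2

for x in (-3.0, 0.0, 2.5):
    assert abs(sigma_prime_direct(x) - sigma(x) * (1 - sigma(x))) < 1e-12

print(sigma(0.0) * (1 - sigma(0.0)))   # 0.25, the maximum of sigma'
```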

07 Derivative of \(\det(A + xI)\) (textbook 5.7)

Let \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) be a linear map \(\mathbb{R}^2 \to \mathbb{R}^2\). We want to know how \(\det(A)\) changes if we perturb \(A\) by a small multiple of the identity, e.g. \(A + 0.001 \cdot I\). Define

\[f(x) = \det(A + x \cdot I)\]

  1. What is \(f(0)\)?
  2. Using the definition of the derivative, compute \(f'(0)\).
  3. Do you recognize the result?

Compute \(f\) explicitly:

\[A + xI = \begin{pmatrix} a + x & b \\ c & d + x \end{pmatrix}\]

so

\[\begin{aligned} f(x) &= (a + x)(d + x) - bc \\ &= x^2 + (a + d) x + (ad - bc) \end{aligned}\]

a. \(f(0) = ad - bc = \det(A)\).

b. Using the definition of the derivative: \[\begin{aligned} f'(0) &= \lim_{x \to 0} \frac{f(x) - f(0)}{x} \\ &= \lim_{x \to 0} \frac{x^2 + (a+d) x + (ad - bc) - (ad - bc)}{x} \\ &= \lim_{x \to 0} \big(x + (a+d)\big) \\ &= a + d \end{aligned}\]

(The textbook specifically asks for the definition — and there’s a reason. For larger \(n \times n\) matrices, expanding \(\det(A + xI)\) as a full polynomial in \(x\) gets ugly, but the limit-definition route stays structural. Same trick will return when we hit multivariable derivatives.)

c. \(a + d\) is the trace of \(A\) — the sum of its diagonal entries:

\[\left.\frac{d}{dx}\det(A + xI)\right|_{x = 0} = \operatorname{tr}(A)\]

The big picture. This says

\[\det(A + xI) \approx \det(A) + x \cdot \operatorname{tr}(A) \quad \text{for small } x\]

Adding a small \(xI\) shifts the determinant by approximately \(x \cdot \operatorname{tr}(A)\) at first order. This is the simplest case of Jacobi’s formula for the derivative of the determinant.

It almost generalizes. For our \(2 \times 2\) matrix the answer came out to \(\operatorname{tr}(A)\), but that is a low-dimensional coincidence. For a general \(n \times n\) matrix, Jacobi's formula gives \(\frac{d}{dx}\det(A + xI)\big|_{x=0} = \operatorname{tr}(\operatorname{adj}(A))\), the sum of the principal \((n-1) \times (n-1)\) minors of \(A\); when \(n = 2\), that sum is \(d + a = \operatorname{tr}(A)\), which is why the two agree here. The trace identity that does hold in every dimension is the perturbation of the identity matrix: \(\frac{d}{dx}\det(I + xA)\big|_{x=0} = \operatorname{tr}(A)\). (Sketch: expand \(\det(I + xA)\) as a polynomial in \(x\); the constant term is \(1\), and the coefficient of \(x\) is the sum of the \(1 \times 1\) principal minors of \(A\), i.e. its diagonal entries, which is \(\operatorname{tr}(A)\).)
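A related identity that holds in every dimension is \(\frac{d}{dx}\det(I + xA)\big|_{x=0} = \operatorname{tr}(A)\). A quick finite-difference check (NumPy assumed available; the matrix and step size are arbitrary choices):

```python
import numpy as np

# Verify d/dx det(I + xA) |_{x=0} = tr(A) for a random 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
I, h = np.eye(4), 1e-6

# Central finite difference of det(I + xA) at x = 0.
numeric = (np.linalg.det(I + h * A) - np.linalg.det(I - h * A)) / (2 * h)
print(numeric, np.trace(A))   # agree to several decimal places
```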

Why it’s beautiful. The determinant is a multiplicative quantity (it scales volume); the trace is an additive one (it sums diagonals). The result above is the linearization bridge between them — at first order, multiplicative becomes additive. The deeper identity \(\det(e^M) = e^{\operatorname{tr}(M)}\) reflects the same principle, though its proof needs more than just our linearization (you’d integrate the rate-of-change relation \(\frac{d}{dt}\det(e^{tM}) = \operatorname{tr}(M) \cdot \det(e^{tM})\) to get the exponential). Spirit: additive on the tangent space ⇒ exponential on the multiplicative flow.

Local Extrema

08 Find Local Extrema (textbook 5.8)

Find all local minima and maxima (if any):

  1. \(f(x) = 5x - x^2\)
  2. \(f(x) = 3x + 1\)
  3. \(f(x) = \dfrac{x^3}{e^x}\)

a. \(f'(x) = 5 - 2x = 0 \Rightarrow x = 5/2\). Since \(f''(x) = -2 < 0\), \(x = 5/2\) is a local maximum with value \(f(5/2) = \tfrac{25}{2} - \tfrac{25}{4} = \tfrac{25}{4}\). No local minimum (parabola opens downward).

b. \(f'(x) = 3 \ne 0\) everywhere. Linear functions are strictly monotone — no critical points, no local extrema.

c. Write \(f(x) = x^3 e^{-x}\). Product rule: \[f'(x) = 3x^2 e^{-x} - x^3 e^{-x} = x^2 (3 - x) e^{-x}\]

Setting \(f'(x) = 0\): critical points at \(x = 0\) (double root from \(x^2\)) and \(x = 3\).

| Region | sign of \(x^2\) | sign of \((3-x)\) | sign of \(f'\) |
|---|---|---|---|
| \(x < 0\) | \(+\) | \(+\) | \(+\) |
| \(0 < x < 3\) | \(+\) | \(+\) | \(+\) |
| \(x > 3\) | \(+\) | \(-\) | \(-\) |

(The factor \(e^{-x}\) is always positive, so it never affects the sign.)

So \(f'\) does not change sign at \(x = 0\) — that’s a critical point but not a local extremum (a horizontal-tangent inflection: locally \(f(x) \approx x^3\) near \(0\), like the cubic). At \(x = 3\), \(f'\) changes from \(+\) to \(-\), so \(x = 3\) is a local maximum with value \(f(3) = 27/e^3 \approx 1.34\).

Global behavior. Look at the boundary at infinity: as \(x \to -\infty\) both \(x^3 \to -\infty\) and \(e^{-x} \to +\infty\), so the product \(\to -\infty\). As \(x \to +\infty\), exponential decay beats polynomial growth, so \(f(x) \to 0^+\). Combined with the local max we found, this means \(x = 3\) is also the global maximum on \(\mathbb{R}\) (value \(27/e^3\)), and no global minimum exists (the function is unbounded below).
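The sign table and the maximum value for part (c) can be confirmed in a few lines of plain Python:

```python
import math

# f(x) = x^3 e^{-x}, with f'(x) = x^2 (3 - x) e^{-x}.
def f(x): return x**3 * math.exp(-x)
def f_prime(x): return x**2 * (3 - x) * math.exp(-x)

# One sample point per region of the sign table: +, +, - (no flip at x = 0).
print(f_prime(-1), f_prime(1), f_prime(4))

# Local (and global) maximum value at x = 3.
print(f(3), 27 / math.e**3)   # both about 1.344
```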

Lesson. Setting \(f'(x) = 0\) gives candidates for extrema, not extrema themselves. Always check the sign change (first-derivative test) or the second derivative — a critical point can be a max, a min, or neither. And on an unbounded domain, always sanity-check what happens at the boundary at infinity before claiming a global extremum.

🛠️ Practice ToDo

🎲 40 (03)
