04 Calculus: Limit, Continuity, and Derivatives
Photo link, Gegharkunik - Artsvanist, Author: Hayk Barseghyan
📚 Material
YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!
Lecture
Practical session
🏡 Homework
- ❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
- Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
- The harder the problem is, the more 🧀cheeses🧀 it has.
- Problems with 🎁 are just extra bonuses. It would be good to try to solve them, but also it’s not the highest priority task.
- If a problem involves many boring calculations, feel free to skip them; the important part is understanding the concepts.
- Submit your solutions here (even if they’re unfinished)
📖 Textbook reading (Section 5)
If the link above doesn’t work, just go to this link. PDF page 76, book page 70.
The nine problems below are extracted from Section 5.8 of the Armenian course notes (PDF pages 76-78). Problem 09 (cardboard box) also appears as Problem 01 of HW 05, where the geometric/isoperimetric angle is emphasized; here we focus on the calculus optimization mechanics.
Limits
01 Sequence Limits (textbook 5.1)
Does the sequence have a limit as \(n \to \infty\)? If yes, find it.
- \(a_n = \dfrac{3 - n}{2}\)
- \(a_n = \dfrac{\sqrt[4]{n}}{n^3}\)
- \(a_n = \dfrac{n-1}{n^2-1}\)
- \(a_n = 1^n\)
- \(a_n = 0.4^n\)
- \(a_n = (-4)^n\)
- Pick any number as \(a_1\), then \(a_{n+1} = a_n / 1.5\).
- \(a_1 = 1\); for \(n \ge 2\), let \(a_n\) be the first digit after the decimal point of \(a_{n-1}/7\).
Hint: write out the first few terms; for the last one, a calculator helps.
a. \(\dfrac{3 - n}{2} = \dfrac{3}{2} - \dfrac{n}{2} \to -\infty\) as \(n \to \infty\). No finite limit (the sequence diverges to \(-\infty\)).
b. \(\dfrac{n^{1/4}}{n^3} = n^{-11/4} \to 0\).
c. Factor (for \(n \ge 2\); the term \(a_1\) is \(0/0\) and undefined, which does not affect the limit): \(\dfrac{n-1}{n^2 - 1} = \dfrac{n-1}{(n-1)(n+1)} = \dfrac{1}{n+1} \to 0\).
d. \(1^n = 1\) for every \(n\), so the sequence is constant. Limit is \(1\). (A small “trick”: despite looking like an exponential, the base is \(1\), so nothing happens.)
e. Geometric with \(|r| = 0.4 < 1\), so \(0.4^n \to 0\).
f. \((-4)^n\) alternates: the even-indexed subsequence \(a_{2k} = 4^{2k} \to +\infty\), the odd-indexed subsequence \(a_{2k+1} = -4^{2k+1} \to -\infty\). Two subsequences with different limits ⇒ the full sequence has no limit. (This subsequence argument is the standard tool for showing a sequence diverges by oscillation — keep it in your toolbox.)
g. Unfold the recursion: \[a_2 = \frac{a_1}{1.5}, \quad a_3 = \frac{a_2}{1.5} = \frac{a_1}{1.5^2}, \quad \ldots, \quad a_n = \frac{a_1}{1.5^{n-1}}\] Since \(1.5^{n-1} \to \infty\), \(a_n \to 0\) regardless of the starting value \(a_1\).
h. With \(a_1 = 1\): \(1/7 = 0.\underline{1}42857\ldots\), so \(a_2 = 1\). By the same calculation, \(a_3 = 1\), and by induction \(a_n = 1\) for all \(n\). The sequence is constant, so the limit is \(1\).
(Aside for the curious: the map \(x \mapsto\) “first decimal digit of \(x/7\)” has fixed points at \(0\), \(1\), and \(2\). Try \(a_1 = 5\): \(5/7 = 0.7\ldots \Rightarrow a_2 = 7\); \(7/7 = 1.0\ldots \Rightarrow a_3 = 0\); \(0/7 = 0 \Rightarrow a_4 = 0\). So starting from \(5\) the sequence reaches \(0\) after two steps and stays there.)
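A quick numerical sanity check of parts (f), (g), and (h) is sketched below. It is only an illustration: the helper names (`first_decimal_digit`, `digit_sequence`, `geometric`) are made up for this example, and exact fractions are used so that the decimal digit is not distorted by floating-point rounding.

```python
from fractions import Fraction

def first_decimal_digit(x: Fraction) -> int:
    """First digit after the decimal point of a non-negative rational x."""
    return int(x * 10) % 10

def digit_sequence(a1: int, length: int) -> list[int]:
    """Terms a_1..a_length of a_n = first decimal digit of a_{n-1} / 7."""
    terms = [a1]
    while len(terms) < length:
        terms.append(first_decimal_digit(Fraction(terms[-1], 7)))
    return terms

def geometric(a1: float, n: int) -> float:
    """Closed form a_n = a_1 / 1.5**(n-1) from part (g)."""
    return a1 / 1.5 ** (n - 1)

print(digit_sequence(1, 6))              # part (h): [1, 1, 1, 1, 1, 1]
print(digit_sequence(5, 6))              # the aside: [5, 7, 0, 0, 0, 0]
print([(-4) ** n for n in range(1, 6)])  # part (f): signs flip, magnitudes grow
print(geometric(1000.0, 60))             # part (g): tiny even for a large a_1
```

Printing a few terms proves nothing by itself, but it is a cheap way to catch an algebra slip before writing the argument down.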
02 Indeterminate \(0/0\) Forms (textbook 5.2)
The limit theorem says nothing about \(\lim_{n \to \infty} \dfrac{a_n}{b_n}\) when \(\lim b_n = 0\). Construct examples of sequences with \(a_n \to 0\) and \(b_n \to 0\) such that:
- \(\lim \dfrac{a_n}{b_n} = 1\)
- \(\lim \dfrac{a_n}{b_n} = -5\)
- \(\lim \dfrac{a_n}{b_n} = 0\)
- \(\lim \dfrac{a_n}{b_n} = +\infty\)
- \(\lim \dfrac{a_n}{b_n}\) does not exist
The point of this problem: even when both numerator and denominator vanish, the ratio can do anything depending on relative shrinking rates. There is no “\(0/0\) rule” — you must look at how fast each sequence goes to zero.
| Want | \(a_n\) | \(b_n\) | \(\dfrac{a_n}{b_n}\) |
|---|---|---|---|
| \(\to 1\) | \(\dfrac{1}{n}\) | \(\dfrac{1}{n}\) | \(1\) |
| \(\to -5\) | \(\dfrac{-5}{n}\) | \(\dfrac{1}{n}\) | \(-5\) |
| \(\to 0\) | \(\dfrac{1}{n^2}\) | \(\dfrac{1}{n}\) | \(\dfrac{1}{n} \to 0\) |
| \(\to +\infty\) | \(\dfrac{1}{n}\) | \(\dfrac{1}{n^2}\) | \(n \to \infty\) |
| DNE | \(\dfrac{(-1)^n}{n}\) | \(\dfrac{1}{n}\) | \((-1)^n\) oscillates |
Why this matters. The same \(0/0\) phenomenon shows up for function limits (\(\lim_{x \to a} f(x)/g(x)\) with both numerator and denominator vanishing), where L’Hôpital’s rule resolves it by replacing the ratio of values with the ratio of derivatives — i.e., comparing rates of vanishing. The sequence examples above are why such a tool is needed: the answer is genuinely sensitive to relative speeds, and there is no universal “\(0/0\) rule.”
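The table can be checked mechanically. The sketch below evaluates each ratio \(a_n / b_n\) at a few values of \(n\) using exact fractions; the dictionary `PAIRS` and the function `ratio` are names invented for this illustration.

```python
from fractions import Fraction

# (a_n, b_n) pairs from the table; both sequences tend to 0 in every row.
PAIRS = {
    "limit 1":    (lambda n: Fraction(1, n),         lambda n: Fraction(1, n)),
    "limit -5":   (lambda n: Fraction(-5, n),        lambda n: Fraction(1, n)),
    "limit 0":    (lambda n: Fraction(1, n * n),     lambda n: Fraction(1, n)),
    "limit +inf": (lambda n: Fraction(1, n),         lambda n: Fraction(1, n * n)),
    "no limit":   (lambda n: Fraction((-1) ** n, n), lambda n: Fraction(1, n)),
}

def ratio(name: str, n: int) -> Fraction:
    """Exact value of a_n / b_n for the named row."""
    a, b = PAIRS[name]
    return a(n) / b(n)

for name in PAIRS:
    # Even and odd n are both sampled so the oscillating row is visible.
    print(name, [float(ratio(name, n)) for n in (10, 11, 1000, 1001)])
```

Sampling both even and odd \(n\) matters only for the last row, where the ratio jumps between \(1\) and \(-1\).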
03 Asymptotic Bank Growth (textbook 5.4)
Zeus, Prometheus, and Aramazd each deposit \(1000\) gold pieces in different banks:
- Olympus Bank (Zeus): balance after \(x\) years is \(f_1(x) = 100x + x^3\).
- Tavros Bank (Prometheus): balance is \(f_2(x) = 0.18 \, x^3 \sqrt{x}\).
- Ararat Bank (Aramazd): balance is \(f_3(x) = 30 x^2\).
All three are immortal. Who will be richest as \(x \to \infty\)?
Compare leading powers of \(x\):
- \(f_1(x) = 100x + x^3\) — leading power \(x^3\)
- \(f_2(x) = 0.18 \, x^3 \sqrt{x} = 0.18 \, x^{7/2}\) — leading power \(x^{3.5}\)
- \(f_3(x) = 30 x^2\) — leading power \(x^2\)
Since \(\tfrac{7}{2} > 3 > 2\), Tavros grows fastest. Verifying both pairwise comparisons against Tavros:
\[\begin{aligned} \lim_{x \to \infty} \frac{f_2(x)}{f_1(x)} &= \lim_{x \to \infty} \frac{0.18 \, x^{7/2}}{x^3 + 100x} \\ &= \lim_{x \to \infty} \frac{0.18 \, \sqrt{x}}{1 + 100/x^2} \;=\; +\infty \end{aligned}\]
\[\begin{aligned} \lim_{x \to \infty} \frac{f_2(x)}{f_3(x)} &= \lim_{x \to \infty} \frac{0.18 \, x^{7/2}}{30 x^2} \\ &= \lim_{x \to \infty} 0.006 \, x^{3/2} \;=\; +\infty \end{aligned}\]
So Prometheus (Tavros Bank) ends up richest, despite the tiny coefficient \(0.18\) — the half-power advantage in \(x\) overwhelms any constant for large enough \(x\). (Aramazd’s \(30 x^2\) comes in last, beaten by both higher-power competitors.)
Early vs. eventual — the word “eventually” is doing real work. A few snapshots:
| Year \(x\) | Olympus \(f_1\) | Tavros \(f_2\) | Ararat \(f_3\) | Leader |
|---|---|---|---|---|
| \(1\) | \(101\) | \(0.18\) | \(30\) | Zeus |
| \(10\) | \(2{,}000\) | \(569\) | \(3{,}000\) | Aramazd |
| \(50\) | \(130{,}000\) | \(159{,}000\) | \(75{,}000\) | Prometheus |
| \(100\) | \(1{,}010{,}000\) | \(1{,}800{,}000\) | \(300{,}000\) | Prometheus |
The lead changes hands three times before settling: Zeus leads at first, Aramazd takes over around \(x \approx 3.8\), Zeus retakes around \(x \approx 26.2\) (his cubic finally beats Aramazd’s quadratic), and Prometheus only overtakes everyone around \(x \approx 35.9\). Asymptotic dominance is not the same as “always ahead” — Prometheus has to be patient for a few decades before the half-power advantage actually pays off.
Pedagogical takeaway. Constants don’t matter in asymptotic comparisons; powers do. This is exactly the intuition behind Big-O notation: an \(O(n^{3.5})\) algorithm with a tiny constant is cheaper than an \(O(n^3)\) algorithm with a huge constant only for small \(n\), and becomes more expensive for large \(n\).
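The crossover years quoted above can be reproduced with a few lines of bisection. This is a minimal sketch, assuming each bracket passed to the hypothetical helper `crossover` contains exactly one crossing; the brackets themselves were picked by reading the snapshot table.

```python
def f1(x): return 100 * x + x ** 3   # Olympus (Zeus)
def f2(x): return 0.18 * x ** 3.5    # Tavros (Prometheus)
def f3(x): return 30 * x ** 2        # Ararat (Aramazd)

def crossover(f, g, lo, hi, tol=1e-9):
    """Bisection for a point where f(x) = g(x) on [lo, hi]."""
    diff = lambda x: f(x) - g(x)
    assert diff(lo) * diff(hi) < 0, "bracket must contain a sign change"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2

def leader(x):
    """Name of the depositor with the largest balance in year x."""
    balances = {"Zeus": f1(x), "Prometheus": f2(x), "Aramazd": f3(x)}
    return max(balances, key=balances.get)

print(crossover(f3, f1, 1, 10))    # Aramazd passes Zeus
print(crossover(f3, f1, 10, 30))   # Zeus passes Aramazd again
print(crossover(f2, f1, 30, 50))   # Prometheus passes Zeus
print([leader(x) for x in (1, 10, 30, 50, 100)])
```

The first two crossings are the roots of \(x^2 - 30x + 100 = 0\), so they can also be found by hand; the third has no neat closed form, which is where a numerical method earns its keep.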
Continuity
04 Continuity at a Point (textbook 5.3)
Each \(f\) below is two formulas glued at \(x = 2\). Find a value of \(c\) that makes \(f\) continuous at \(x = 2\) (try by eye first; if no such \(c\) exists, say so).
- \(f(x) = \begin{cases} 3x - 5 & x < 2 \\ x^2 + c & x \ge 2 \end{cases}\)
- \(f(x) = \begin{cases} x^3 + 1 & x < 2 \\ c x^2 & x \ge 2 \end{cases}\)
- \(f(x) = \begin{cases} -7 & x < 2 \\ c & x = 2 \\ 4 + 3\sin(\pi x) & x > 2 \end{cases}\)
- \(f(x) = \begin{cases} \dfrac{x^2 - x - 2}{x - 2} & x \ge 2 \\ c & x < 2 \end{cases}\)
For continuity at \(x = 2\), we need \(\lim_{x \to 2^-} f(x) = \lim_{x \to 2^+} f(x) = f(2)\).
a. Left limit \(= 3(2) - 5 = 1\). Right limit (and \(f(2)\)) \(= 4 + c\). Set equal: \(c = -3\).
b. Left limit \(= 2^3 + 1 = 9\). Right limit (and \(f(2)\)) \(= 4c\). Set equal: \(c = 9/4\).
c. Left limit \(= -7\). Right limit \(= 4 + 3\sin(2\pi) = 4 + 0 = 4\). The two side-limits disagree (\(-7 \ne 4\)), so no value of \(c\) makes \(f\) continuous — \(f(2) = c\) can match only one side at a time.
d. A subtlety: at \(x = 2\) the formula \(\dfrac{x^2 - x - 2}{x - 2}\) gives \(0/0\), so \(f(2)\) is undefined as written. To make \(f\) continuous at \(2\) we have to do two things:
- Remove the singularity in the right branch. Simplify \(\dfrac{x^2 - x - 2}{x - 2} = \dfrac{(x - 2)(x + 1)}{x - 2} = x + 1\) for \(x \ne 2\). The right-hand limit is \(\lim_{x \to 2^+} (x + 1) = 3\), so the singularity is removable — redefine \(f(2) := 3\).
- Match the left side. With \(f(2) = 3\) now well-defined, the left-side value \(c\) must equal \(3\), so \(c = 3\).
The pattern. For piecewise functions, continuity at the join is just equating side-limits and \(f(2)\). Cases (a, b) are routine: solve one equation in \(c\). Case (c) is a jump discontinuity: the side-limits genuinely disagree, and no choice of \(f(2)\) can patch it. Case (d) is a removable singularity: the formula has a \(0/0\) that polynomial cancellation kills, and we then use the resulting clean limit as the value at the bad point.
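To see the four cases numerically, one can sample each branch just to the left and right of \(x = 2\). The sketch below is a rough probe, not a proof: the offset `H` and the helper `gap_at_2` are choices made for this illustration, and the values of \(c\) plugged in are the ones found above.

```python
import math

H = 1e-6  # small offset used to probe each side of x = 2

def gap_at_2(left_branch, right_branch):
    """Approximate (right limit) - (left limit) at x = 2 by sampling near 2."""
    return right_branch(2 + H) - left_branch(2 - H)

# (a) with c = -3 and (b) with c = 9/4: the gap should be close to 0.
gap_a = gap_at_2(lambda x: 3 * x - 5, lambda x: x ** 2 - 3)
gap_b = gap_at_2(lambda x: x ** 3 + 1, lambda x: 9 / 4 * x ** 2)

# (c): neither branch involves c, so the gap of about 11 cannot be closed.
gap_c = gap_at_2(lambda x: -7.0, lambda x: 4 + 3 * math.sin(math.pi * x))

# (d): the right branch is 0/0 at x = 2 itself, but nearby values approach 3.
right_d = ((2 + H) ** 2 - (2 + H) - 2) / ((2 + H) - 2)

print(gap_a, gap_b, gap_c, right_d)
```

Evaluating the formula in (d) exactly at \(x = 2\) would raise a `ZeroDivisionError`, which is the code-level face of the removable singularity.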
Derivatives
05 Compute Derivatives (textbook 5.5)
Find \(f'(x)\):
- \(f(x) = 10000\)
- \(f(x) = 2x^2 - 7x + 1\)
- \(f(x) = 2 \sin x \cdot e^x\)
- \(f(x) = e^{x^2}\)
- \(f(x) = x^2 \ln x\)
- \(f(x) = \dfrac{x^2}{x^3}\)
- \(f(x) = \sin(\cos x)\)
- \(f(x) = x^x\)
a. Constant rule: \(f'(x) = 0\).
b. Linearity + power rule: \(f'(x) = 4x - 7\).
c. Product rule (with \((e^x)' = e^x\)): \(f'(x) = 2(\cos x \cdot e^x + \sin x \cdot e^x) = 2 e^x (\cos x + \sin x)\).
d. Chain rule (outer \(e^u\), inner \(u = x^2\)): \(f'(x) = e^{x^2} \cdot 2x = 2x \, e^{x^2}\).
e. Product rule (with \((\ln x)' = 1/x\)): \(f'(x) = 2x \ln x + x^2 \cdot \tfrac{1}{x} = x(2 \ln x + 1)\).
f. Simplify, then power rule: \(\dfrac{x^2}{x^3} = \dfrac{1}{x} = x^{-1}\), so \(f'(x) = -x^{-2} = -\dfrac{1}{x^2}\).
(Sanity check via quotient rule: \(\dfrac{2x \cdot x^3 - x^2 \cdot 3x^2}{x^6} = \dfrac{-x^4}{x^6} = -\dfrac{1}{x^2}\). ✓ Always simplify before differentiating when you can.)
g. Chain rule (outer \(\sin\), inner \(\cos x\)): \(f'(x) = \cos(\cos x) \cdot (-\sin x) = -\sin x \cdot \cos(\cos x)\).
h. Two equivalent methods — both rely on rewriting \(x^x\) to escape the trap below.
Why \(x^x\) is sneaky. The power rule \(\tfrac{d}{dx} x^n = n x^{n-1}\) only works for constant exponent \(n\). The exponential rule \(\tfrac{d}{dx} a^x = a^x \ln a\) only works for constant base \(a\). With \(x^x\), both base and exponent vary simultaneously — neither rule applies directly. We need to transform \(x^x\) into something the standard rules can handle.
Method 1: Rewrite using \(a = e^{\ln a}\) (the cleanest approach). Apply this identity to \(x^x\) itself: \[x^x \;=\; e^{\ln(x^x)} \;=\; e^{x \ln x}\]
That second equality used the log rule \(\ln(a^b) = b \ln a\). Now the function is in pure exponential form — chain rule territory. With \(u(x) = x \ln x\): \[u'(x) = 1 \cdot \ln x + x \cdot \tfrac{1}{x} = \ln x + 1\]
So by the chain rule: \[\frac{d}{dx} \, x^x \;=\; \frac{d}{dx} \, e^{x \ln x} \;=\; e^{x \ln x} \cdot (\ln x + 1) \;=\; x^x (\ln x + 1)\]
Method 2: Logarithmic differentiation (faster once you’re used to it). Take logs of both sides of \(f(x) = x^x\): \[\ln f(x) = x \ln x\] Differentiate both sides — the left side uses the chain rule with \(\frac{d}{dx} \ln f(x) = f'(x) / f(x)\): \[\frac{f'(x)}{f(x)} = \ln x + x \cdot \tfrac{1}{x} = \ln x + 1\] Multiply both sides by \(f(x) = x^x\) to isolate \(f'\): \[f'(x) = x^x (\ln x + 1)\]
Same answer. Method 1 is more transparent for first-time exposure (it uses only chain rule and the well-known derivative of \(e^u\)); Method 2 is a powerful general-purpose tool worth learning for any product/power-laden function.
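Hand-computed derivatives are easy to check against a central difference quotient. The sketch below compares several of the answers above, including \(x^x (\ln x + 1)\), with a numerical estimate; `central_diff`, `CASES`, and `max_error` are names introduced only for this example, and the test point must be positive because of \(\ln x\).

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient, an O(h^2) estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (function, claimed derivative) pairs for parts c, d, e, g, h.
CASES = {
    "c": (lambda x: 2 * math.sin(x) * math.exp(x),
          lambda x: 2 * math.exp(x) * (math.cos(x) + math.sin(x))),
    "d": (lambda x: math.exp(x ** 2),
          lambda x: 2 * x * math.exp(x ** 2)),
    "e": (lambda x: x ** 2 * math.log(x),
          lambda x: x * (2 * math.log(x) + 1)),
    "g": (lambda x: math.sin(math.cos(x)),
          lambda x: -math.sin(x) * math.cos(math.cos(x))),
    "h": (lambda x: x ** x,
          lambda x: x ** x * (math.log(x) + 1)),
}

def max_error(x):
    """Largest gap between the numerical and the claimed derivative at x > 0."""
    return max(abs(central_diff(f, x) - df(x)) for f, df in CASES.values())

print(max_error(0.7), max_error(1.3))  # both should be very small
```

A small error at a couple of points does not prove a formula, but a large one reliably signals a mistake, for example applying the power rule to \(x^x\) and getting \(x \cdot x^{x-1}\).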
06 ML Activation Derivatives (textbook 5.6)
Two functions used heavily in machine learning:
ReLU (rectified linear unit): \[f(x) = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases}\]
Sigmoid: \[\sigma(x) = \frac{1}{1 + e^{-x}}\]
Compute their derivatives. Can you express the sigmoid’s derivative in terms of \(\sigma\) itself?
a. ReLU. For \(x > 0\), \(f(x) = x \Rightarrow f'(x) = 1\). For \(x < 0\), \(f(x) = 0 \Rightarrow f'(x) = 0\). At \(x = 0\), the left and right slopes (\(0\) and \(1\)) disagree, so \(f\) is not differentiable at \(0\):
\[f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \\ \text{undefined} & x = 0 \end{cases}\]
In practice ML libraries pick a convention (usually \(f'(0) := 0\)) — the choice almost never matters since exactly hitting \(x = 0\) in floating-point arithmetic is rare, and gradient descent doesn’t care about a single point.
b. Sigmoid. Write \(\sigma(x) = (1 + e^{-x})^{-1}\) and apply the chain rule:
\[\sigma'(x) = -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2}\]
Expressing \(\sigma'\) via \(\sigma\). Note that \(1 - \sigma(x) = \dfrac{e^{-x}}{1 + e^{-x}}\). So
\[\begin{aligned} \sigma(x) \cdot \big(1 - \sigma(x)\big) &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\ &= \frac{e^{-x}}{(1 + e^{-x})^2} \\ &= \sigma'(x) \end{aligned}\]
Hence the famous identity:
\[\sigma'(x) = \sigma(x) \, \big(1 - \sigma(x)\big)\]
Sanity check with concrete values:
| \(x\) | \(\sigma(x)\) | \(\sigma'(x)\) |
|---|---|---|
| \(-5\) | \(0.0067\) | \(0.0066\) |
| \(-3\) | \(0.0474\) | \(0.0452\) |
| \(-1\) | \(0.2689\) | \(0.1966\) |
| \(\;\;0\) | \(0.5000\) | \(0.2500\) (max) |
| \(\;\;1\) | \(0.7311\) | \(0.1966\) |
| \(\;\;3\) | \(0.9526\) | \(0.0452\) |
| \(\;\;5\) | \(0.9933\) | \(0.0066\) |
Three things worth noticing:
- \(\sigma'\) peaks at \(x = 0\) with value exactly \(\tfrac{1}{4}\) (since \(\sigma(0) = \tfrac{1}{2}\), and \(\tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}\)).
- \(\sigma'\) is symmetric: \(\sigma'(-x) = \sigma'(x)\). Compare the rows for \(\pm 1\), \(\pm 3\), \(\pm 5\) — identical \(\sigma'\) values. Reason: \(\sigma(-x) = 1 - \sigma(x)\), so \(\sigma'(-x) = \sigma(-x)(1 - \sigma(-x)) = (1-\sigma(x))\sigma(x) = \sigma'(x)\). So \(\sigma'\) is an even function even though \(\sigma\) itself is neither even nor odd.
- \(\sigma'(x) > 0\) for every \(x\) (both \(\sigma\) and \(1 - \sigma\) lie strictly between \(0\) and \(1\)), which means \(\sigma\) is strictly increasing, a fact you can also see from the formula directly.
Why this matters. During neural-network backpropagation we evaluate the derivative at every neuron, every step. The identity above means: once you’ve already computed \(\sigma(x)\) in the forward pass, you get \(\sigma'(x)\) for free — no need to evaluate \(e^{-x}\) again. That’s a huge engineering win, and one historical reason sigmoid was the dominant activation in early neural nets.
The dark side: vanishing gradients. The derivative \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\) has a maximum value of \(1/4\) (at \(x = 0\), where \(\sigma = 1/2\)), and decays toward \(0\) for large \(|x|\), where the sigmoid saturates. In deep networks, gradients propagate by multiplying these per-layer derivatives (together with the layer weights), so each sigmoid layer contributes a factor of at most \(1/4\), and saturated neurons contribute a factor close to \(0\). Ignoring the weights, after 10-20 layers the product is at most between \((1/4)^{10} \approx 10^{-6}\) and \((1/4)^{20} \approx 10^{-12}\), so the gradients reaching the early layers become negligibly small and those layers practically stop learning. This is a major reason ReLU eventually replaced sigmoid for hidden layers: \(\text{ReLU}'(x) = 1\) in the active region, which doesn’t shrink gradients at all. So the same identity that makes sigmoid cheap to differentiate also exposes the \(1/4\) bound that hurts it in deep stacks.
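The identity and the \(1/4\) bound are easy to verify in code. The sketch below uses only the standard library; the function names are illustrative, `relu_prime` adopts the \(f'(0) := 0\) convention mentioned above, and `best_case_gradient_factor` deliberately ignores the weight matrices, so it is an upper bound on the activation part of the gradient only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime_direct(x):
    """Derivative from the chain rule: e^{-x} / (1 + e^{-x})^2."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2

def sigmoid_prime_from_output(s):
    """Derivative reusing the forward-pass value s = sigmoid(x)."""
    return s * (1.0 - s)

def relu_prime(x):
    """ReLU derivative with the convention f'(0) := 0."""
    return 1.0 if x > 0 else 0.0

def best_case_gradient_factor(depth):
    """Largest possible product of `depth` sigmoid derivatives (weights ignored)."""
    return 0.25 ** depth

for x in (-5, -3, -1, 0, 1, 3, 5):
    s = sigmoid(x)
    print(x, round(s, 4), round(sigmoid_prime_from_output(s), 4))

print(best_case_gradient_factor(10), best_case_gradient_factor(20))
```

The loop reproduces the table above row by row, and the last line shows how quickly even the best case shrinks with depth.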
07 Derivative of \(\det(A + xI)\) (textbook 5.7)
Let \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) be a linear map \(\mathbb{R}^2 \to \mathbb{R}^2\). We want to know how \(\det(A)\) changes if we perturb \(A\) by a small multiple of the identity, e.g. \(A + 0.001 \cdot I\). Define
\[f(x) = \det(A + x \cdot I)\]
- What is \(f(0)\)?
- Using the definition of the derivative, compute \(f'(0)\).
- Do you recognize the result?
Compute \(f\) explicitly:
\[A + xI = \begin{pmatrix} a + x & b \\ c & d + x \end{pmatrix}\]
so
\[\begin{aligned} f(x) &= (a + x)(d + x) - bc \\ &= x^2 + (a + d) x + (ad - bc) \end{aligned}\]
a. \(f(0) = ad - bc = \det(A)\).
b. Using the definition of the derivative: \[\begin{aligned} f'(0) &= \lim_{x \to 0} \frac{f(x) - f(0)}{x} \\ &= \lim_{x \to 0} \frac{x^2 + (a+d) x + (ad - bc) - (ad - bc)}{x} \\ &= \lim_{x \to 0} \big(x + (a+d)\big) \\ &= a + d \end{aligned}\]
(The textbook specifically asks for the definition, and there’s a reason. For larger \(n \times n\) matrices, expanding \(\det(A + xI)\) as a full polynomial in \(x\) gets ugly, but the limit-definition route stays structural. The same trick will return when we hit multivariable derivatives.)
c. \(a + d\) is the trace of \(A\) — the sum of its diagonal entries:
\[\left.\frac{d}{dx}\det(A + xI)\right|_{x = 0} = \operatorname{tr}(A)\]
The big picture. This says
\[\det(A + xI) \approx \det(A) + x \cdot \operatorname{tr}(A) \quad \text{for small } x\]
Adding a small \(xI\) shifts the determinant by approximately \(x \cdot \operatorname{tr}(A)\) at first order. This is the simplest case of Jacobi’s formula for the derivative of the determinant.
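How it generalizes. For an \(n \times n\) matrix with \(n > 2\), the derivative at \(x = 0\) is in general not \(\operatorname{tr}(A)\). (Sketch: expand \(\det(A + xI)\) as a polynomial in \(x\). The constant term is \(\det(A)\). The coefficient of \(x\) is the sum, over each diagonal slot \(i\), of the determinant of \(A\) with column \(i\) replaced by the \(i\)-th standard basis vector, which by cofactor expansion equals the \((i,i)\) cofactor, i.e. the determinant of \(A\) with row \(i\) and column \(i\) deleted.) So \(\frac{d}{dx}\det(A + xI)\big|_{x=0} = \operatorname{tr}(\operatorname{adj} A)\), the sum of the principal \((n-1) \times (n-1)\) minors. For \(2 \times 2\) these minors are just \(d\) and \(a\), which is why the trace appeared. Example: \(A = 2I\) in dimension \(3\) gives \(\det(A + xI) = (2 + x)^3\), with derivative \(12\) at \(0\), while \(\operatorname{tr}(A) = 6\). The statement that does give the trace in every dimension is the perturbation of the identity: \(\frac{d}{dx}\det(I + xA)\big|_{x=0} = \operatorname{tr}(A)\).
Why it’s beautiful. The determinant is a multiplicative quantity (it scales volume); the trace is an additive one (it sums diagonals). The first-order relation \(\det(I + xM) \approx 1 + x \operatorname{tr}(M)\) is the linearization bridge between them: at first order, multiplicative becomes additive. The deeper identity \(\det(e^M) = e^{\operatorname{tr}(M)}\) reflects the same principle, though its proof needs more than just our linearization (you’d integrate the rate-of-change relation \(\frac{d}{dt}\det(e^{tM}) = \operatorname{tr}(M) \cdot \det(e^{tM})\) to get the exponential). Spirit: additive on the tangent space ⇒ exponential on the multiplicative flow.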
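For the \(2 \times 2\) case of this problem, the limit in part (b) can be watched converging. The sketch below evaluates the difference quotient \(\big(f(x) - f(0)\big)/x\) for shrinking \(x\); the example matrix `A` is an arbitrary choice made for this illustration, not one taken from the textbook.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def f(m, x):
    """f(x) = det(A + x*I) for a 2x2 matrix A."""
    (a, b), (c, d) = m
    return det2(((a + x, b), (c, d + x)))

def difference_quotient(m, x):
    """(f(x) - f(0)) / x, the expression whose limit defines f'(0)."""
    return (f(m, x) - f(m, 0.0)) / x

A = ((2.0, -1.0), (4.0, 3.0))   # arbitrary example matrix (assumption)
trace = A[0][0] + A[1][1]

for x in (1e-1, 1e-2, 1e-3):
    print(x, difference_quotient(A, x), "trace =", trace)
```

Because \(f\) is the quadratic \(x^2 + (a + d)x + (ad - bc)\), the quotient equals \(x + (a + d)\) up to rounding, so the printed values sit at distance \(x\) from the trace.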
Local Extrema
08 Find Local Extrema (textbook 5.8)
Find all local minima and maxima (if any):
- \(f(x) = 5x - x^2\)
- \(f(x) = 3x + 1\)
- \(f(x) = \dfrac{x^3}{e^x}\)
a. \(f'(x) = 5 - 2x = 0 \Rightarrow x = 5/2\). Since \(f''(x) = -2 < 0\), \(x = 5/2\) is a local maximum with value \(f(5/2) = \tfrac{25}{2} - \tfrac{25}{4} = \tfrac{25}{4}\). No local minimum (parabola opens downward).
b. \(f'(x) = 3 \ne 0\) everywhere. Linear functions are strictly monotone — no critical points, no local extrema.
c. Write \(f(x) = x^3 e^{-x}\). Product rule: \[f'(x) = 3x^2 e^{-x} - x^3 e^{-x} = x^2 (3 - x) e^{-x}\]
Setting \(f'(x) = 0\): critical points at \(x = 0\) (double root from \(x^2\)) and \(x = 3\).
| Region | sign of \(x^2\) | sign of \((3-x)\) | sign of \(f'\) |
|---|---|---|---|
| \(x < 0\) | \(+\) | \(+\) | \(+\) |
| \(0 < x < 3\) | \(+\) | \(+\) | \(+\) |
| \(x > 3\) | \(+\) | \(-\) | \(-\) |
So \(f'\) does not change sign at \(x = 0\) — that’s a critical point but not a local extremum (a horizontal-tangent inflection: locally \(f(x) \approx x^3\) near \(0\), like the cubic). At \(x = 3\), \(f'\) changes from \(+\) to \(-\), so \(x = 3\) is a local maximum with value \(f(3) = 27/e^3 \approx 1.34\).
Global behavior. Look at the boundary at infinity: as \(x \to -\infty\) both \(x^3 \to -\infty\) and \(e^{-x} \to +\infty\), so the product \(\to -\infty\). As \(x \to +\infty\), exponential decay beats polynomial growth, so \(f(x) \to 0^+\). Combined with the local max we found, this means \(x = 3\) is also the global maximum on \(\mathbb{R}\) (value \(27/e^3\)), and no global minimum exists (the function is unbounded below).
Lesson. Setting \(f'(x) = 0\) gives candidates for extrema, not extrema themselves. Always check the sign change (first-derivative test) or the second derivative — a critical point can be a max, a min, or neither. And on an unbounded domain, always sanity-check what happens at the boundary at infinity before claiming a global extremum.
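The first-derivative test from part (c) translates directly into code. This is a minimal sketch: `classify` is a hypothetical helper that only looks at the sign of \(f'\) a small step `h` on each side of a critical point, which is enough here because the critical points \(0\) and \(3\) are far apart.

```python
import math

def f(x):
    return x ** 3 * math.exp(-x)

def f_prime(x):
    """f'(x) = x^2 (3 - x) e^{-x}, as derived above."""
    return x ** 2 * (3 - x) * math.exp(-x)

def classify(x0, h=1e-3):
    """First-derivative test at a critical point x0."""
    left, right = f_prime(x0 - h), f_prime(x0 + h)
    if left > 0 > right:
        return "local max"
    if left < 0 < right:
        return "local min"
    return "no extremum"

print(classify(0.0))   # f' is positive on both sides of 0
print(classify(3.0))   # f' goes from positive to negative at 3
print(f(3.0))          # the maximum value 27 / e^3
```

Note that both \(0\) and \(3\) satisfy \(f'(x) = 0\); only the sign check tells them apart.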
Optimization
09 Cardboard Box (textbook 5.9)
You have \(24 \text{ m}^2\) of cardboard (plus scissors and glue) to build a closed rectangular box whose left and right faces are squares. What’s the maximum possible volume?
Hint: take one of the non-square faces. Call its length \(x\) and width \(y\). Can you express \(y\) in terms of \(x\)? And the volume in terms of \(x\) alone?
Setup. Let the square left/right faces have side length \(x\), and call the box’s depth \(y\). So the box’s dimensions are \(x \times x \times y\).
The closed box has six faces:
- 2 squares (left and right), each of area \(x^2\)
- 4 rectangles (top, bottom, front, back), each of area \(x y\)
Setting total surface area equal to the cardboard budget: \[2 x^2 + 4 x y = 24 \;\Rightarrow\; y = \frac{12 - x^2}{2 x}\]
Valid range: \(0 < x < \sqrt{12}\) (so that \(y > 0\)).
Volume in terms of \(x\) alone: \[V(x) = x^2 \cdot y = x^2 \cdot \frac{12 - x^2}{2 x} = \frac{x (12 - x^2)}{2} = 6 x - \frac{x^3}{2}\]
Optimize. Take the derivative and set it to zero: \[V'(x) = 6 - \frac{3 x^2}{2} = 0 \;\Rightarrow\; x^2 = 4 \;\Rightarrow\; x = 2\]
(Take the positive root since \(x\) is a length.)
Verify it’s a max. Second derivative: \(V''(x) = -3 x\), so \(V''(2) = -6 < 0\) ⇒ local max. At the boundary points \(x \to 0\) and \(x \to \sqrt{12}\), \(V \to 0\), so this critical point is also the global max on the valid range.
Plug back in. \(y = \dfrac{12 - 4}{4} = 2\). Optimal dimensions: \(2 \times 2 \times 2\) — a cube — with maximum volume:
\[V_{\max} = 2 \cdot 2 \cdot 2 = 8 \text{ m}^3\]
Why the cube? The problem only forced two of the three dimensions to be equal (the square faces make height and width both \(x\)), but optimization naturally drove the third dimension to match: \(y = x = 2\). This is the recurring “maximum-symmetry shapes win” pattern in geometric optimization, and it connects to the isoperimetric inequality in higher dimensions. See HW 05 Problem 01 for more on that thread.
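As a cross-check on the calculus, a brute-force search over the valid range \(0 < x < \sqrt{12}\) should land on the same box. The sketch below is deliberately naive (a uniform grid, no derivatives); `AREA`, `depth`, `volume`, and `best_side` are names chosen for this illustration.

```python
import math

AREA = 24.0  # cardboard budget in square metres

def depth(x):
    """y from the constraint 2x^2 + 4xy = AREA, i.e. y = (12 - x^2) / (2x)."""
    return (AREA / 2 - x ** 2) / (2 * x)

def volume(x):
    """V(x) = x^2 * y for a box with square side x."""
    return x ** 2 * depth(x)

def best_side(steps=100_000):
    """Grid search for the side length maximizing the volume."""
    upper = math.sqrt(AREA / 2)   # beyond this the depth would be negative
    candidates = (upper * k / steps for k in range(1, steps))
    return max(candidates, key=volume)

x_star = best_side()
print(x_star, depth(x_star), volume(x_star))   # close to 2, 2, 8
```

The grid answer is only accurate to about the grid spacing, which is why the derivative-based solution is still the one to write up.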