03 Linear Independence, Basis, SLE, Eigen-stuff
Photo link: the Zoo (Գազանանոց), author: Elmira Gokoryan
📚 Materials
YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!
Lecture
Practical session
🏡 Homework
- ❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
- Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
- The harder the problem is, the more 🧀cheeses🧀 it has.
- Problems with 🎁 are just extra bonuses. It would be good to try to solve them, but also it’s not the highest priority task.
- If a problem involves many boring calculations, feel free to skip them - the important part is understanding the concepts.
- Submit your solutions here (even if it’s unfinished)
Systems of Linear Equations
01: GPS positioning - from 2D to 3D
Part A: 2D Positioning (Easier warm-up)
Imagine you’re lost in a 2D world and receive distance signals from cell towers: - Tower A at (0, 0): You are 5 units away - Tower B at (6, 0): You are 3 units away
- Write the system of equations for your position (x, y).
- Solve it step by step using substitution or elimination.
- Plot the circles and find their intersection point(s).
Part B: 3D GPS Challenge
Now for real GPS with 4 satellites in 3D space: - Why do you need exactly 4 satellites for 3D positioning when you only need 2 towers for 2D? - What happens if you only have 3 satellites?
Part A: 2D positioning.
Each tower-distance gives a circle equation. Tower A at \((0,0)\) at distance \(5\):
\[x^2 + y^2 = 25\]
Tower B at \((6, 0)\) at distance \(3\):
\[(x-6)^2 + y^2 = 9\]
Subtracting eq. 1 from eq. 2 to eliminate \(y^2\):
\[(x-6)^2 - x^2 = 9 - 25 \;\Rightarrow\; -12x + 36 = -16 \;\Rightarrow\; x = \tfrac{52}{12} = \tfrac{13}{3}\]
Substitute back into eq. 1:
\[y^2 = 25 - \tfrac{169}{9} = \tfrac{56}{9} \;\Rightarrow\; y = \pm \tfrac{2\sqrt{14}}{3} \approx \pm 2.49\]
So two solutions: \(\left(\tfrac{13}{3}, \pm \tfrac{2\sqrt{14}}{3}\right)\).
In 2D, two distance measurements give two circles — generically intersecting at two points. To pick the right one you’d need extra information (a third tower, or knowledge of which side of the line you’re on).
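A quick numerical sanity check in NumPy (the variable names are just for illustration) confirms that both candidate points satisfy the two distance equations:
```python
import numpy as np

# Check that both candidates (13/3, ±2*sqrt(14)/3) are 5 away from tower A
# at (0, 0) and 3 away from tower B at (6, 0).
x = 13 / 3
y = 2 * np.sqrt(14) / 3
for px, py in [(x, y), (x, -y)]:
    d_A = np.hypot(px, py)          # distance to tower A
    d_B = np.hypot(px - 6, py)      # distance to tower B
    print(d_A, d_B)                 # ≈ 5.0 and 3.0 for both points
```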
Part B: 3D GPS.
In 3D, each satellite-distance gives a sphere of possible positions \(\|\vec p - \vec p_{\text{sat}}\|^2 = d^2\). Geometrically:
- 1 sphere → 2D surface of solutions
- 2 spheres → 1D circle (where two spheres intersect)
- 3 spheres → typically 2 isolated points
- 4 spheres → 1 unique point
So mathematically, 3 satellites are almost enough — they give 2 candidate points, of which one is usually obviously wrong (deep inside Earth or far in space). But there’s a hidden 4th unknown: the receiver’s clock offset.
GPS works by measuring how long signals take to arrive. Satellites carry atomic clocks; phones do not. So a receiver doesn’t know “true” time exactly, only its time plus an unknown offset \(\Delta t\). The actual unknowns are \((x, y, z, \Delta t)\) — four unknowns. Each satellite gives one equation, so you need 4 satellites to solve uniquely.
With only 3 satellites, you can’t disentangle position from clock error and accuracy degrades. With 4+ satellites (which is what real receivers usually see), you over-determine the system and use least squares for noise-resistant positioning.
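To make the four-unknowns point concrete, here is a minimal NumPy sketch of the standard linearize-and-iterate (Gauss-Newton) approach; the satellite positions, units, and clock bias are invented for illustration, and real GPS involves many more corrections:
```python
import numpy as np

# Invented satellite positions and receiver state (arbitrary units).
sats = np.array([[20000., 0., 0.],
                 [0., 20000., 0.],
                 [0., 0., 20000.],
                 [12000., 12000., 12000.],
                 [15000., 5000., 10000.]])
p_true = np.array([100., 200., 50.])
b_true = 30.0                                    # clock offset (already times c)

# Pseudoranges: geometric distance plus the unknown clock bias.
rho = np.linalg.norm(sats - p_true, axis=1) + b_true

# Gauss-Newton on the 4 unknowns (x, y, z, b).
est = np.zeros(4)                                # start at the origin, zero bias
for _ in range(10):
    d = np.linalg.norm(sats - est[:3], axis=1)   # distances from current guess
    r = d + est[3] - rho                         # residuals
    J = np.hstack([(est[:3] - sats) / d[:, None],    # d(residual)/d(position)
                   np.ones((len(sats), 1))])         # d(residual)/d(bias)
    est += np.linalg.lstsq(J, -r, rcond=None)[0]

print(est)   # ≈ [100, 200, 50, 30]
```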
02: Linear regression with normal equations
You have the following data points for house size (x) vs price (y): - (50, 100), (100, 180), (150, 280)
- Set up the design matrix X (including the intercept column) and target vector y.
- Solve the normal equation \(\vec{\theta} = (X^T X)^{-1} X^T \vec{y}\) (set up the system of linear equations and you can use Gaussian elimination).
- Find the line equation \(y = \theta_0 + \theta_1 x\).
- Plot the data points and your fitted line.
- Predict the price for a 120 square meter house.
Hint for step 1: we want to express the computation of y (the price) as a matrix multiplication of X and theta.
\[\vec{y} = X \vec{\theta}\]
Where \(\vec{\theta} = \begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix}\). In order for each \(y_i = \theta_0 + \theta_1 x_i\), we set up the design matrix (X) as follows: \[X = \begin{pmatrix} 1 & 50 \\ 1 & 100 \\ 1 & 150 \end{pmatrix}\]
Calculate the product \(X \vec{\theta}\) to verify that each \(y_i\) corresponds to plugging in \(x_i\) into the line equation.
Step 1: Set up the design matrix X and target vector y
For the data points (50, 100), (100, 180), (150, 280), we need to include an intercept column:
\[X = \begin{pmatrix} 1 & 50 \\ 1 & 100 \\ 1 & 150 \end{pmatrix}, \quad \vec{y} = \begin{pmatrix} 100 \\ 180 \\ 280 \end{pmatrix}\]
Step 2: Solve the normal equation
First, compute \(X^T X\): \[X^T = \begin{pmatrix} 1 & 1 & 1 \\ 50 & 100 & 150 \end{pmatrix}\]
\[X^T X = \begin{pmatrix} 1 & 1 & 1 \\ 50 & 100 & 150 \end{pmatrix} \begin{pmatrix} 1 & 50 \\ 1 & 100 \\ 1 & 150 \end{pmatrix} = \begin{pmatrix} 3 & 300 \\ 300 & 35000 \end{pmatrix}\]
Next, compute \(X^T \vec{y}\): \[X^T \vec{y} = \begin{pmatrix} 1 & 1 & 1 \\ 50 & 100 & 150 \end{pmatrix} \begin{pmatrix} 100 \\ 180 \\ 280 \end{pmatrix} = \begin{pmatrix} 560 \\ 65000 \end{pmatrix}\]
Now solve \((X^T X) \vec{\theta} = X^T \vec{y}\): \[\begin{pmatrix} 3 & 300 \\ 300 & 35000 \end{pmatrix} \begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix} = \begin{pmatrix} 560 \\ 65000 \end{pmatrix}\]
Using Gaussian elimination: - From the first equation: \(3\theta_0 + 300\theta_1 = 560\) - From the second equation: \(300\theta_0 + 35000\theta_1 = 65000\)
Multiply the first equation by 100: \(300\theta_0 + 30000\theta_1 = 56000\)
Subtract from the second equation: \(5000\theta_1 = 9000\), so \(\theta_1 = 1.8\)
Substitute back: \(3\theta_0 + 300(1.8) = 560\), so \(3\theta_0 = 20\), giving \(\theta_0 = \tfrac{20}{3} \approx 6.67\)
Step 3: Line equation \[y = 6.67 + 1.8x\]
Step 4: Verification Check our solution with the data points: - For x = 50: \(y = 6.67 + 1.8(50) = 96.67\) (close to 100) - For x = 100: \(y = 6.67 + 1.8(100) = 186.67\) (close to 180) - For x = 150: \(y = 6.67 + 1.8(150) = 276.67\) (close to 280)
Step 5: Prediction for 120 square meters \[y = 6.67 + 1.8(120) = 222.67\]
The predicted price for a 120 square meter house is approximately 222.67 (thousand units).
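A quick NumPy cross-check of the hand calculation (just a sketch, not required for the homework):
```python
import numpy as np

X = np.array([[1., 50.], [1., 100.], [1., 150.]])
y = np.array([100., 180., 280.])

# Solve the normal equations (X^T X) theta = X^T y directly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                          # ≈ [6.67, 1.8]
print(theta[0] + theta[1] * 120)      # prediction for 120 m²: ≈ 222.67
```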
03: The cheese shop multicollinearity problem
A cheese shop tracks the following features for each cheese: - Price in AMD: \(p_1\) - Price in USD: \(p_2\) - Weight in kilograms: \(w_1\) - Weight in pounds: \(w_2\)
Given the conversion rates: 1 USD = 400 AMD and 1 kg = 2.2 pounds.
Linear dependence detection: Which features are linearly dependent? Write the exact relationships.
Matrix rank problem: If you create a data matrix with these 4 features for 100 cheeses, what would be the maximum possible rank? Why?
Feature selection: Which 2 features should you keep to avoid multicollinearity? Explain your choice.
System solvability: If you try to fit a model using all 4 features, what problems might arise? Hint: Think about the normal equation and matrix invertibility.
1. Linear dependencies. With \(1\,\text{USD} = 400\,\text{AMD}\) and \(1\,\text{kg} = 2.2\,\text{lb}\):
\[p_1 = 400 \cdot p_2, \qquad w_2 = 2.2 \cdot w_1\]
Equivalently, \(p_1 - 400\,p_2 = 0\) and \(w_2 - 2.2\,w_1 = 0\). Each is a linear relation among the columns.
2. Maximum rank of the data matrix. With 100 cheeses and 4 columns, the matrix is \(100 \times 4\). The column rank is at most \(\min(100, 4) = 4\), but the two dependencies above eliminate two independent directions: only “price” (one currency suffices) and “weight” (one unit suffices) carry independent information. So:
\[\text{max rank} = 4 - 2 = 2\]
3. Feature selection. Keep one feature per “real” quantity, e.g., \(\{p_2, w_1\}\) (USD and kg). Either currency for price and either unit for weight gives identical information — they’re just rescaled copies. Picking one of each removes the redundancy.
4. What goes wrong if you keep all 4 features?
The data matrix \(X\) has linearly dependent columns. Then:
- \(X^T X\) is singular — its determinant is \(0\) — so the normal-equation formula \(\vec\theta = (X^TX)^{-1} X^T \vec y\) literally cannot be evaluated (no inverse exists).
- Even if you regularize and force a solution, the individual coefficients become uninterpretable and unstable. Tiny numerical changes in \(X\) cause large swings in \(\vec\theta\). You can’t separate “the effect of AMD price” from “the effect of USD price” — they’re the same thing in disguise.
- Common fixes: drop redundant features (feature selection), use ridge regression (\(X^TX + \lambda I\) is always invertible), or project to a smaller independent feature set via PCA.
This phenomenon is called multicollinearity and is one of the most common real-world ML pitfalls — especially when features come from automated pipelines that bring in different units or correlated metrics.
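A small NumPy experiment makes the failure mode visible; the cheese data below is invented purely for illustration:
```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: USD price and kg weight carry the information;
# the AMD price and pound weight columns are exact rescaled copies.
p_usd = rng.uniform(5, 50, size=100)
w_kg = rng.uniform(0.1, 2.0, size=100)
X = np.column_stack([400 * p_usd, p_usd, w_kg, 2.2 * w_kg])   # p1, p2, w1, w2
y = 3 * p_usd + 10 * w_kg + rng.normal(scale=0.5, size=100)   # some target

print(np.linalg.matrix_rank(X))      # 2, not 4: two redundant columns
print(np.linalg.cond(X.T @ X))       # huge: X^T X is (numerically) singular

# Ridge fix: X^T X + lambda*I is invertible for any lambda > 0.
lam = 1e-3
theta = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
print(theta)
```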
04: System consistency analysis
For which values of \(a\) does the following system have 0, 1, or infinitely many solutions?
\[\begin{cases} x + 2y + z = 1 \\ 2x + 4y + az = 2 \\ -x - y + (a-1)z = 0 \end{cases}\]
Form the augmented matrix and row-reduce. Apply \(R_2 \to R_2 - 2R_1\) and \(R_3 \to R_3 + R_1\):
\[\left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 2 & 4 & a & 2 \\ -1 & -1 & a-1 & 0 \end{array}\right] \;\longrightarrow\; \left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 0 & a-2 & 0 \\ 0 & 1 & a & 1 \end{array}\right]\]
Now swap rows 2 and 3 to get a clean staircase:
\[\left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 1 & a & 1 \\ 0 & 0 & a-2 & 0 \end{array}\right]\]
The bottom-right entry \(a - 2\) controls everything.
Case 1: \(a \neq 2\). Then \((a-2) z = 0\) gives \(z = 0\). Back-substituting: \(y + a \cdot 0 = 1 \Rightarrow y = 1\), and \(x + 2 \cdot 1 + 0 = 1 \Rightarrow x = -1\).
→ Unique solution \((x, y, z) = (-1, 1, 0)\).
Case 2: \(a = 2\). The third equation becomes \(0 = 0\) — automatically satisfied, so it disappears. Two equations remain:
\[y + 2z = 1, \qquad x + 2y + z = 1\]
From the first: \(y = 1 - 2z\). Substituting into the second: \(x + 2(1 - 2z) + z = 1 \Rightarrow x = -1 + 3z\).
→ Infinitely many solutions parameterized by \(z \in \mathbb{R}\):
\[(x, y, z) = (-1 + 3z, \; 1 - 2z, \; z)\]
Case 3: 0 solutions? Never. The right-hand side of the second row stays at \(0\) regardless of \(a\), so we never get a contradiction like “\(0 = (\text{nonzero})\)”. The system is consistent for every \(a\).
| \(a\) | # of solutions |
|---|---|
| \(a = 2\) | infinitely many |
| \(a \neq 2\) | exactly one |
| any | never \(0\) |
Geometric interpretation. Each equation defines a plane in \(\mathbb{R}^3\). For \(a \neq 2\), the three planes meet at a single point. For \(a = 2\), the second equation is just twice the first, so those two planes coincide; the three planes then share a common line, and the solutions form a line in \(\mathbb{R}^3\).
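You can reproduce the case analysis symbolically with SymPy (a sketch; the exact shape of the output depends on the SymPy version):
```python
import sympy as sp

x, y, z, a = sp.symbols('x y z a')
eqs = [sp.Eq(x + 2*y + z, 1),
       sp.Eq(2*x + 4*y + a*z, 2),
       sp.Eq(-x - y + (a - 1)*z, 0)]

# Generic a (SymPy implicitly divides by a - 2, i.e. assumes a != 2):
print(sp.solve(eqs, [x, y, z]))                            # {x: -1, y: 1, z: 0}

# a = 2: underdetermined, solutions form a line parameterized by z.
print(sp.solve([e.subs(a, 2) for e in eqs], [x, y, z]))    # x, y expressed through z
```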
05: Linear independence in polynomial space
Determine if the set \(\{1 + t, 1 + 2t, 1 + t + t^2\} \subset P_2\) is linearly independent. If dependent, express one polynomial as a linear combination of the others.
Set up the linear-independence test:
\[c_1(1+t) + c_2(1+2t) + c_3(1+t+t^2) = 0\]
For this polynomial to be the zero polynomial, the coefficient of each power of \(t\) must be \(0\):
| coefficient of | equation |
|---|---|
| \(t^2\) | \(c_3 = 0\) |
| \(t^1\) | \(c_1 + 2c_2 + c_3 = 0\) |
| \(t^0\) | \(c_1 + c_2 + c_3 = 0\) |
From the first: \(c_3 = 0\). Substituting into the other two: \(c_1 + 2c_2 = 0\) and \(c_1 + c_2 = 0\). Subtracting these gives \(c_2 = 0\), hence \(c_1 = 0\).
Only the trivial solution \(c_1 = c_2 = c_3 = 0\) → the set is linearly independent.
Alternative method via the coordinate-vector trick. Each polynomial maps to a coordinate vector in the standard basis \(\{1, t, t^2\}\):
\[1+t \leftrightarrow (1, 1, 0), \quad 1+2t \leftrightarrow (1, 2, 0), \quad 1+t+t^2 \leftrightarrow (1, 1, 1)\]
Place them as columns of a matrix and compute the determinant:
\[\det\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix} = 1\cdot(2 - 0) - 1\cdot(1 - 0) + 1\cdot 0 = 1 \neq 0\]
Nonzero determinant ⇒ columns are linearly independent ⇒ the original polynomials are linearly independent.
This trick — “convert vectors in any 3-dimensional vector space to \(\mathbb{R}^3\) via coordinates, then use a determinant” — is one of the main payoffs of the basis machinery. It works for matrices, polynomials, function spaces — anything with a finite basis.
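The same determinant check in NumPy, with the coordinate vectors as columns (a quick sanity-check sketch):
```python
import numpy as np

# Coordinates of 1+t, 1+2t, 1+t+t^2 in the basis {1, t, t^2}, as columns.
M = np.array([[1., 1., 1.],
              [1., 2., 1.],
              [0., 0., 1.]])

print(np.linalg.det(M))           # 1.0 -> nonzero, so the polynomials are independent
print(np.linalg.matrix_rank(M))   # 3
```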
06: Matrix subspaces and trace
Let \(V = \{M \in M_{2 \times 2} : \text{tr}(M) = 0\}\) be the set of \(2 \times 2\) matrices with zero trace.
- Prove that \(V\) is a subspace of \(M_{2 \times 2}\).
- Find a basis and determine the dimension of \(V\).
Part 1: \(V\) is a subspace.
We check the three subspace axioms.
Zero vector. The zero matrix has \(\text{tr}(0) = 0\), so \(0 \in V\). ✓
Closed under addition. If \(A, B \in V\), then \(\text{tr}(A) = \text{tr}(B) = 0\). Trace is linear in its argument, so
\[\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B) = 0\]
Hence \(A + B \in V\). ✓
Closed under scalar multiplication. If \(A \in V\) and \(c \in \mathbb{R}\), then \(\text{tr}(cA) = c\,\text{tr}(A) = 0\), so \(cA \in V\). ✓
Part 2: Basis and dimension.
A general \(2 \times 2\) matrix is \(M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) with \(\text{tr}(M) = a + d\). The constraint \(\text{tr}(M) = 0\) forces \(d = -a\), leaving three free parameters \(a, b, c\):
\[M = \begin{pmatrix} a & b \\ c & -a \end{pmatrix} = a\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}}_{E_1} + b\underbrace{\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}}_{E_2} + c\underbrace{\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}}_{E_3}\]
These three matrices \(E_1, E_2, E_3\) span \(V\) by construction, and they’re linearly independent (each one has a \(1\) in a unique position the others don’t). So:
\[\mathcal{B} = \{E_1, E_2, E_3\}, \qquad \dim(V) = 3\]
Sanity check via “constraint counting”. \(M_{2\times 2}\) has dimension \(4\). The trace condition \(a + d = 0\) is one linear constraint, so it cuts the dimension by exactly \(1\):
\[\dim(V) = \dim(M_{2\times2}) - 1 = 4 - 1 = 3 \;\checkmark\]
This is a general principle: each independent linear constraint on a vector space reduces dimension by exactly \(1\).
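A tiny NumPy check of the decomposition; the sample values of a, b, c are arbitrary:
```python
import numpy as np

E1 = np.array([[1., 0.], [0., -1.]])
E2 = np.array([[0., 1.], [0., 0.]])
E3 = np.array([[0., 0.], [1., 0.]])

a, b, c = 2.0, -1.0, 3.0          # arbitrary sample coefficients
M = a * E1 + b * E2 + c * E3
print(M)                          # [[a, b], [c, -a]]
print(np.trace(M))                # 0.0 -- every combination is traceless
```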
07: Change of basis and coordinates
Let \(B = \{(3,1), (2,1)\}\) and \(C = \{(1,1), (1,0)\}\) be bases of \(\mathbb{R}^2\).
- Find the change-of-basis matrix \(P_{C \leftarrow B}\).
- If \([v]_B = (1, 2)^T\), compute \([v]_C\).
Hint: the \([v]_B\) notation means that the coordinates of vector \(v\) in basis \(B\) are \((1, 2)^T\), i.e. \(v = 1 \cdot (3,1) + 2 \cdot (2,1)\).
Part 1: Find the change-of-basis matrix \(P_{C \leftarrow B}\)
The columns of the change-of-basis matrix \(P_{C \leftarrow B}\) are the coordinate vectors of the basis vectors in \(B\) relative to the basis \(C\). Let \(\vec{b_1} = (3,1)\), \(\vec{b_2} = (2,1)\), and \(\vec{c_1} = (1,1)\), \(\vec{c_2} = (1,0)\).
We need to express \(\vec{b_1}\) and \(\vec{b_2}\) as linear combinations of \(\vec{c_1}\) and \(\vec{c_2}\).
For \(\vec{b_1} = (3,1)\): We need to find scalars \(x_1, y_1\) such that \((3,1) = x_1(1,1) + y_1(1,0)\). This gives the system: \[ \begin{cases} x_1 + y_1 = 3 \\ x_1 = 1 \end{cases} \] Solving this system yields \(x_1 = 1\) and \(y_1 = 2\). So, \([\vec{b_1}]_C = \begin{pmatrix} 1 \\ 2 \end{pmatrix}\).
For \(\vec{b_2} = (2,1)\): We need to find scalars \(x_2, y_2\) such that \((2,1) = x_2(1,1) + y_2(1,0)\). This gives the system: \[ \begin{cases} x_2 + y_2 = 2 \\ x_2 = 1 \end{cases} \] Solving this system yields \(x_2 = 1\) and \(y_2 = 1\). So, \([\vec{b_2}]_C = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\).
Combining these column vectors gives the change-of-basis matrix: \[ P_{C \leftarrow B} = \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} \]
Part 2: Compute \([v]_C\)
To find the coordinates of \(v\) with respect to basis \(C\), we use the formula: \[ [v]_C = P_{C \leftarrow B} [v]_B \]
Given \([v]_B = \begin{pmatrix} 1 \\ 2 \end{pmatrix}\), we can compute: \[ [v]_C = \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \cdot 1 + 1 \cdot 2 \\ 2 \cdot 1 + 1 \cdot 2 \end{pmatrix} = \begin{pmatrix} 1 + 2 \\ 2 + 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \]
So, the coordinates of the vector \(v\) in the basis \(C\) are \((3, 4)^T\).
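In NumPy the whole computation reduces to solving \(C \, P_{C \leftarrow B} = B\), where the columns of \(B\) and \(C\) are the respective basis vectors (a sketch):
```python
import numpy as np

B = np.array([[3., 2.], [1., 1.]])    # columns are the B basis vectors
C = np.array([[1., 1.], [1., 0.]])    # columns are the C basis vectors

P = np.linalg.solve(C, B)             # P_{C<-B} solves C @ P = B
print(P)                              # [[1, 1], [2, 1]]

v_B = np.array([1., 2.])
print(P @ v_B)                        # [3, 4] = [v]_C
print(B @ v_B, C @ (P @ v_B))         # both give v = (7, 3) in standard coordinates
```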
08: Basis verification in different vector spaces
In each case, determine whether \(S\) is a basis for \(V\):
a) \(V = P_3\), \(S = \{0, x, x^2, x^3\}\)
b) \(V = M_{2 \times 2}\), \(S = \left\{I_2, \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\right\}\)
c) \(V = \mathbb{R}^3\), \(S = \{e_1, e_2, e_1 + 3e_2\}\)
d) \(V = P_1\), \(S = \{1 - 2x, 2 - x\}\)
A subset \(S\) is a basis of \(V\) iff (i) \(|S| = \dim(V)\), and (ii) the elements of \(S\) are linearly independent. (When the size matches \(\dim(V)\), linear independence and spanning are equivalent, so checking either one suffices.)
a) \(V = P_3\) has dimension \(4\). \(|S| = 4\), but \(S\) contains the zero polynomial. Not a basis. Any set containing the zero vector is automatically linearly dependent — the relation \(1 \cdot 0 = 0\) is non-trivial.
b) \(V = M_{2 \times 2}\) has dimension \(4\). \(|S| = 3 < 4\). Not a basis. Three vectors can never span a \(4\)-dimensional space, no matter what they are.
c) \(V = \mathbb{R}^3\) has dimension \(3\). \(|S| = 3\), but check linear independence: the third vector is
\[e_1 + 3e_2 = 1 \cdot e_1 + 3 \cdot e_2 + 0 \cdot e_3\]
a linear combination of the first two. Not a basis. Equivalently: every linear combination of \(S\) stays in the \(xy\)-plane, so \(S\) doesn’t reach \(e_3\).
d) \(V = P_1\) has dimension \(2\). \(|S| = 2\), so we just need linear independence. Check \(c_1(1 - 2x) + c_2(2 - x) = 0\):
| coefficient of | equation |
|---|---|
| constant | \(c_1 + 2c_2 = 0\) |
| \(x\) | \(-2c_1 - c_2 = 0\) |
From the first: \(c_1 = -2c_2\). Substituting into the second: \(-2(-2c_2) - c_2 = 3c_2 = 0\), so \(c_2 = 0\) and \(c_1 = 0\). Only the trivial solution.
\(S\) is a basis of \(P_1\).
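For case (d), the coordinate-vector trick from problem 05 gives a one-line numerical check (a sketch):
```python
import numpy as np

# Coordinates of 1 - 2x and 2 - x in the basis {1, x}, as columns.
M = np.array([[1., 2.],
              [-2., -1.]])

print(np.linalg.det(M))   # 3.0 -> nonzero, so the two polynomials form a basis of P_1
```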
09: Linear transformation from coordinates
Suppose \(T\) is a linear transformation from \(\mathbb{R}^2\) to \(P_2\) such that: \[T\begin{pmatrix} -1 \\ 1 \end{pmatrix} = x + 3, \quad T\begin{pmatrix} 2 \\ 3 \end{pmatrix} = 2x^2 - x\]
Find \(T\begin{pmatrix} a \\ b \end{pmatrix}\) for arbitrary \(a, b \in \mathbb{R}\).
Key idea. A linear transformation is fully determined by its action on a basis. So once we express \((a, b)\) in terms of the given input vectors, linearity hands us \(T(a, b)\) for free.
Step 1: Verify the inputs form a basis.
The vectors \(\vec{v}_1 = (-1, 1)\) and \(\vec{v}_2 = (2, 3)\) have
\[\det\begin{pmatrix} -1 & 2 \\ 1 & 3 \end{pmatrix} = -3 - 2 = -5 \neq 0\]
so they’re linearly independent — and any 2 independent vectors in \(\mathbb{R}^2\) form a basis.
Step 2: Express \((a, b)\) in this basis.
Find \(\alpha, \beta\) such that \((a, b) = \alpha\vec{v}_1 + \beta\vec{v}_2\):
\[-\alpha + 2\beta = a, \qquad \alpha + 3\beta = b\]
Adding: \(5\beta = a + b \;\Rightarrow\; \beta = \frac{a + b}{5}\).
Substituting into \(\alpha = b - 3\beta\): \(\alpha = b - \frac{3(a+b)}{5} = \frac{2b - 3a}{5}\).
Step 3: Apply linearity.
\[T\!\begin{pmatrix} a \\ b \end{pmatrix} = \alpha \cdot T(\vec{v}_1) + \beta \cdot T(\vec{v}_2) = \frac{2b - 3a}{5}(x + 3) + \frac{a + b}{5}(2x^2 - x)\]
Expanding and grouping by powers of \(x\):
\[\boxed{\;T\!\begin{pmatrix} a \\ b \end{pmatrix} = \frac{2(a+b)}{5}\,x^2 + \frac{b - 4a}{5}\,x + \frac{6b - 9a}{5}\;}\]
Verification.
For \((a, b) = (-1, 1)\): \(\alpha = \frac{2 + 3}{5} = 1\), \(\beta = 0\). So \(T = 1 \cdot (x + 3) + 0 = x + 3\) ✓
For \((a, b) = (2, 3)\): \(\alpha = \frac{6 - 6}{5} = 0\), \(\beta = \frac{5}{5} = 1\). So \(T = 0 + 1 \cdot (2x^2 - x) = 2x^2 - x\) ✓
Why this matters. This “specify-on-a-basis, extend-by-linearity” recipe is the way linear transformations are defined and stored in practice. A neural-network weight matrix \(W \in \mathbb{R}^{m \times n}\) is exactly this: \(W\) tells you what the transformation does to each standard basis vector \(e_j\) (column \(j\) of \(W\) is \(W e_j\)), and linearity gives you the action on every other input.
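To see the "matrix of T" concretely, here is a NumPy sketch that recovers the matrix representing \(T\) from the two given values, with coordinates taken in the basis \(\{1, x, x^2\}\):
```python
import numpy as np

# Coordinates in the basis {1, x, x^2}: [constant, x-coeff, x^2-coeff].
T_v1 = np.array([3., 1., 0.])     # x + 3
T_v2 = np.array([0., -1., 2.])    # 2x^2 - x
V = np.array([[-1., 2.],
              [1., 3.]])          # columns are the inputs v1, v2

# The matrix M of T satisfies M @ V = [T_v1 | T_v2], so M = [T_v1 | T_v2] V^{-1}.
M = np.column_stack([T_v1, T_v2]) @ np.linalg.inv(V)
print(M)                          # rows: constant, x, x^2 coefficients of T(a, b)

print(M @ np.array([2., 3.]))     # [0, -1, 2] -> reproduces T(v2) = 2x^2 - x
```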
10: Geometric eigenvalue intuition
Without doing any calculations, determine the eigenvalues and eigenvectors for the following transformations by thinking about their geometric effects:
a) Shear matrix: \(S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\)
- What vectors remain on the same line after this transformation?
- What vectors get stretched or shrunk (and by what factor)?
b) Rotation matrix: \(R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) (90° counterclockwise)
c) Reflection matrix: \(F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\) (reflection across x-axis)
d) Scaling matrix: \(D = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}\)
The recipe in all four cases: an eigenvector is a direction that the matrix maps to the same line through the origin; the eigenvalue is the stretch factor (negative if reversed).
a) Shear \(S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\).
Vectors on the \(x\)-axis (\(y = 0\)) don’t move at all: \(S(k, 0)^T = (k, 0)^T\). So the \(x\)-axis is fixed → eigenvalue \(\lambda = 1\) with eigenvector \((1, 0)^T\).
What about other directions? A vector \((x, y)\) with \(y \neq 0\) goes to \((x + y, y)\) — not a scalar multiple of \((x, y)\). So this is the only eigenvector direction.
→ One eigenvalue \(\lambda = 1\) with one independent eigenvector. Such a matrix is called defective — it doesn’t have enough eigenvectors to form a basis of \(\mathbb{R}^2\), and so it cannot be diagonalized.
b) Rotation \(R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) (90° CCW).
Every direction gets rotated by 90°. No real direction stays the same, so there are no real eigenvectors. (Algebraically, the characteristic polynomial \(\lambda^2 + 1 = 0\) has only complex roots \(\pm i\).) This generalizes: a 2D rotation by \(\theta \neq 0°, 180°\) has no real eigenvectors at all.
c) Reflection \(F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\) (across \(x\)-axis).
Two natural fixed directions:
- Along the mirror line (\(x\)-axis): \(F(k, 0)^T = (k, 0)^T\). Eigenvalue \(\lambda_1 = 1\), eigenvector \((1, 0)^T\).
- Perpendicular to the mirror (\(y\)-axis): \(F(0, k)^T = (0, -k)^T\). Eigenvalue \(\lambda_2 = -1\), eigenvector \((0, 1)^T\).
The eigenvalues \(\pm 1\) are characteristic of every reflection: the mirror gets \(+1\), the perpendicular gets \(-1\). In any orthonormal eigenbasis a reflection looks exactly like the diagonal matrix \(\text{diag}(1, -1)\).
d) Scaling \(D = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}\).
The matrix is already diagonal, so:
- \(D(1, 0)^T = (3, 0)^T = 3 \cdot (1, 0)^T\) → \(\lambda_1 = 3\), eigenvector \((1, 0)^T\).
- \(D(0, 1)^T = (0, 2)^T = 2 \cdot (0, 1)^T\) → \(\lambda_2 = 2\), eigenvector \((0, 1)^T\).
For a diagonal matrix, the diagonal entries are the eigenvalues and the standard basis vectors are the eigenvectors. This is why people work so hard to diagonalize general matrices — once you have a matrix in (or transformed to) diagonal form, eigenvalues and eigenvectors are immediate.
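np.linalg.eig confirms all four geometric guesses, including the complex eigenvalues of the rotation (a quick sketch):
```python
import numpy as np

mats = {
    "shear":      np.array([[1., 1.], [0., 1.]]),
    "rotation":   np.array([[0., -1.], [1., 0.]]),
    "reflection": np.array([[1., 0.], [0., -1.]]),
    "scaling":    np.array([[3., 0.], [0., 2.]]),
}

for name, M in mats.items():
    vals, vecs = np.linalg.eig(M)
    # Columns of vecs are normalized eigenvectors; for the defective shear
    # the two columns point in (numerically) the same direction.
    print(name, vals)    # rotation shows the complex pair ±i
```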
11: Matrix powers using eigenvalues
Consider the matrix \(A = \begin{pmatrix} 5 & 8 \\ 2 & 5 \end{pmatrix}\). We want to compute \(A^{10}\) efficiently.
Find eigenvalues and eigenvectors of matrix \(A\).
Diagonalize the matrix: Express \(A = PDP^{-1}\) where \(D\) is diagonal.
Compute high powers efficiently: Use diagonalization to find \(A^{10}\) without multiplying the matrix 10 times.
Pattern recognition: What happens to \(A^n\) as \(n \to \infty\)? Which eigenvalue dominates?
1. Eigenvalues and eigenvectors of \(A = \begin{pmatrix} 5 & 8 \\ 2 & 5 \end{pmatrix}\).
Solve the characteristic equation \(\det(A - \lambda I) = 0\):
\[\det\begin{pmatrix} 5 - \lambda & 8 \\ 2 & 5 - \lambda \end{pmatrix} = (5 - \lambda)^2 - 16 = 0\]
So \((5 - \lambda)^2 = 16 \Rightarrow 5 - \lambda = \pm 4 \Rightarrow \lambda = 1 \text{ or } 9\).
For \(\lambda = 9\): solve \((A - 9I)\vec v = 0\):
\[\begin{pmatrix} -4 & 8 \\ 2 & -4 \end{pmatrix}\vec v = 0 \;\Rightarrow\; -4v_1 + 8v_2 = 0 \;\Rightarrow\; v_1 = 2v_2\]
Eigenvector: \(\vec v_9 = (2, 1)^T\).
For \(\lambda = 1\): solve \((A - I)\vec v = 0\):
\[\begin{pmatrix} 4 & 8 \\ 2 & 4 \end{pmatrix}\vec v = 0 \;\Rightarrow\; v_1 = -2v_2\]
Eigenvector: \(\vec v_1 = (-2, 1)^T\).
2. Diagonalize \(A = PDP^{-1}\).
Place the eigenvectors as columns of \(P\) (in the order matching the eigenvalues on \(D\)’s diagonal):
\[P = \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 9 & 0 \\ 0 & 1 \end{pmatrix}\]
\(\det(P) = 2 \cdot 1 - (-2) \cdot 1 = 4\), so
\[P^{-1} = \frac{1}{4}\begin{pmatrix} 1 & 2 \\ -1 & 2 \end{pmatrix}\]
Sanity check. \(A P = P D\)? Column-by-column: \(A(2,1)^T = (18, 9)^T = 9 \cdot (2, 1)^T\) ✓ and \(A(-2, 1)^T = (-2, 1)^T = 1 \cdot (-2, 1)^T\) ✓.
3. Compute \(A^{10}\) via diagonalization.
The trick: \(A^n = (PDP^{-1})^n = PDP^{-1} \cdot PDP^{-1} \cdots PDP^{-1} = P D^n P^{-1}\), with all the inner \(P^{-1}P\) pairs collapsing to \(I\). And \(D^n\) is trivial — just power each diagonal entry:
\[D^{10} = \begin{pmatrix} 9^{10} & 0 \\ 0 & 1 \end{pmatrix}\]
So:
\[A^{10} = P D^{10} P^{-1} = \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 9^{10} & 0 \\ 0 & 1 \end{pmatrix} \cdot \frac{1}{4}\begin{pmatrix} 1 & 2 \\ -1 & 2 \end{pmatrix}\]
Multiplying out the first two:
\[= \begin{pmatrix} 2 \cdot 9^{10} & -2 \\ 9^{10} & 1 \end{pmatrix} \cdot \frac{1}{4}\begin{pmatrix} 1 & 2 \\ -1 & 2 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2\cdot 9^{10} + 2 & 4\cdot 9^{10} - 4 \\ 9^{10} - 1 & 2\cdot 9^{10} + 2 \end{pmatrix}\]
Cleaning up:
\[A^{10} = \begin{pmatrix} \dfrac{9^{10}+1}{2} & 9^{10} - 1 \\[4pt] \dfrac{9^{10}-1}{4} & \dfrac{9^{10}+1}{2} \end{pmatrix}\]
(Numerically, \(9^{10} = 3{,}486{,}784{,}401\).)
The same calculation gives a closed form for every power:
\[A^n = \begin{pmatrix} \dfrac{9^n + 1}{2} & 9^n - 1 \\[4pt] \dfrac{9^n - 1}{4} & \dfrac{9^n + 1}{2} \end{pmatrix}\]
Sanity-check at \(n = 0\): \(A^0 = I\) (since \(9^0 = 1\), the top-left becomes \(\frac{1+1}{2} = 1\), etc.) ✓ and at \(n = 1\) the formula reproduces \(A\) itself. ✓
Why this is a big deal. Computing \(A^{10}\) by direct multiplication takes \(9\) matrix multiplications, each \(O(n^3)\) for an \(n \times n\) matrix. With diagonalization, you do one factorization plus a trivial power of a diagonal matrix — and you also get a closed-form formula for every power simultaneously, not just \(n = 10\).
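The same three-step recipe in NumPy (eigendecompose, power the diagonal, undo the change of basis), checked against np.linalg.matrix_power as a sketch:
```python
import numpy as np

A = np.array([[5., 8.], [2., 5.]])

vals, P = np.linalg.eig(A)                     # eigenvalues and eigenvector matrix
A10 = P @ np.diag(vals ** 10) @ np.linalg.inv(P)

print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True
print(A10[0, 0], (9**10 + 1) / 2)                        # matches the closed form
```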
4. Behavior as \(n \to \infty\).
The dominant eigenvalue \(\lambda_1 = 9\) controls the growth — entries of \(A^n\) scale like \(9^n\). The smaller eigenvalue \(\lambda_2 = 1\) contributes a constant correction that becomes negligible compared to \(9^n\). For large \(n\):
\[A^n \;\approx\; \frac{9^n}{4}\begin{pmatrix} 2 & 4 \\ 1 & 2 \end{pmatrix}\]
Notice that the matrix on the right is rank 1: the second column is just twice the first, and the matrix factors as
\[\begin{pmatrix} 2 & 4 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \end{pmatrix}\]
— the outer product of the dominant eigenvector \((2, 1)^T\) with another vector. So as \(n \to \infty\), \(A^n\) collapses onto a rank-1 matrix in the direction of the dominant eigenvector, scaled by \(9^n\).
This is the seed of several powerful ideas:
- Power method for finding dominant eigenvectors: just iterate \(\vec x \mapsto A\vec x\) and (almost) any starting vector converges (after normalization) to the dominant eigenvector.
- PageRank: Google ranks web pages as the dominant eigenvector of a giant link-graph matrix, computed by iterated multiplication.
- PCA: the principal component is the dominant eigenvector of the covariance matrix.
In every case, the largest-magnitude eigenvalue dominates and “pulls” the iteration onto its eigenvector direction.
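Here is the power method itself in a few lines of NumPy, using the matrix \(A\) from this problem (a sketch):
```python
import numpy as np

A = np.array([[5., 8.], [2., 5.]])

x = np.array([1., 0.])                 # (almost) any nonzero start works
for _ in range(50):
    x = A @ x
    x /= np.linalg.norm(x)             # re-normalize so the iterates don't blow up

print(x)                               # ≈ (2, 1)/sqrt(5): the dominant eigenvector
print(x @ A @ x)                       # Rayleigh quotient ≈ 9: the dominant eigenvalue
```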
15: From ellipse equation to eigenanalysis
This one is still a work in progress; we'll look at it properly when we get to optimization, but I'm not deleting it from here in case someone wants to start poking at it already.
You are given the equation of an ellipse
\[3x^2 + 2y^2 + 2xy\]
Part A: Extract the matrix representation
Rewrite as quadratic form: Express the equation in the form \(\mathbf{v}^T A \mathbf{v}\) where \(\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}\).
Build the symmetric matrix: What is the matrix \(A\)?
Part B: Eigenanalysis
Find eigenvalues and eigenvectors: Compute the eigenvalues \(\lambda_1, \lambda_2\) and their corresponding eigenvectors \(\mathbf{v}_1, \mathbf{v}_2\).
Verify orthogonality: Check that the eigenvectors are orthogonal to each other.
Part C: Geometric interpretation
- Principal axes: The eigenvectors represent the principal axes of the ellipse. What are the directions of these axes in the plane?
- Sketch and describe:
- Which axis is the major axis (longer)? Which is the minor axis (shorter)?
- How is the ellipse oriented relative to the standard x-y coordinate axes?
- What’s the angle of rotation from the standard axes?
Exercises from Armenian notes
The five problems below are translated from chapter 4 (Structure of Linear Spaces) of «Կարճ մաթեմ մեքենայական ուսուցման համար» ("Short math for machine learning"), section 4.6 (page 50 of the book / page 56 of the PDF). Original numbering preserved.
4.1: Filling in a matrix from constraints
Fill in the missing entries of the matrix
\[\begin{pmatrix} 2 & * \\ -1 & * \end{pmatrix}\]
if it is known that:
- the transformation stretches \(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\) by a factor of \(2\)
- the trace is \(2\) and the determinant is \(3\)
- \(1\) and \(5\) are eigenvalues of the transformation
The fixed entries pin the first column of \(A\) to \((2, -1)^T\). We need to determine \(a_{12}\) and \(a_{22}\) in each case. Two useful identities for \(2 \times 2\) matrices:
\[\text{tr}(A) = a_{11} + a_{22}, \qquad \det(A) = a_{11} a_{22} - a_{12} a_{21}\]
and the eigenvalues satisfy \(\lambda_1 + \lambda_2 = \text{tr}(A)\) and \(\lambda_1 \lambda_2 = \det(A)\).
(a) Stretches \((1, 0)^T\) by factor 2.
“Stretches by \(2\)” means the vector becomes twice as long along its own direction:
\[A\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}\]
But \(A \begin{pmatrix} 1 \\ 0 \end{pmatrix}\) is the first column of \(A\), which the template fixes at \((2, -1)^T\). The two are not equal — the bottom entry should be \(0\), not \(-1\).
→ No completion is possible with this template. To make the constraint work we’d need \(a_{21} = 0\), but \(a_{21} = -1\) is given.
(General principle: applying \(A\) to a basis vector \(\vec e_i\) just reads off column \(i\) of \(A\). So saying “\(\vec e_i\) is scaled by \(\lambda\)” forces the entire \(i\)-th column.)
(b) trace = 2, det = 3.
\(\text{tr}(A) = 2 + a_{22} = 2 \Rightarrow a_{22} = 0\).
\(\det(A) = 2 \cdot 0 - a_{12} \cdot (-1) = a_{12} = 3 \Rightarrow a_{12} = 3\).
\[A = \begin{pmatrix} 2 & 3 \\ -1 & 0 \end{pmatrix}\]
Verify. \(\text{tr} = 2 + 0 = 2\) ✓, \(\det = 0 - (-3) = 3\) ✓.
(c) eigenvalues \(1\) and \(5\).
Then \(\text{tr}(A) = 1 + 5 = 6\) and \(\det(A) = 1 \cdot 5 = 5\).
\(2 + a_{22} = 6 \Rightarrow a_{22} = 4\).
\(2 \cdot 4 - a_{12} \cdot (-1) = 8 + a_{12} = 5 \Rightarrow a_{12} = -3\).
\[A = \begin{pmatrix} 2 & -3 \\ -1 & 4 \end{pmatrix}\]
Verify by characteristic polynomial. \(\det(A - \lambda I) = (2 - \lambda)(4 - \lambda) - (-3)(-1) = \lambda^2 - 6\lambda + 8 - 3 = \lambda^2 - 6\lambda + 5 = (\lambda - 1)(\lambda - 5)\) ✓.
The trace-and-determinant trick is the fastest way to recover a \(2 \times 2\) matrix from its eigenvalues — no characteristic polynomial needed.
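A quick NumPy check of parts (b) and (c), as a sketch:
```python
import numpy as np

A_b = np.array([[2., 3.], [-1., 0.]])
print(np.trace(A_b), np.linalg.det(A_b))   # 2.0, 3.0

A_c = np.array([[2., -3.], [-1., 4.]])
print(np.linalg.eigvals(A_c))              # eigenvalues 1 and 5 (in some order)
```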
4.2: Maximum dimension of a span
What is the largest possible dimension of the subspace spanned by these vectors?
- \(\vec v_1 = \begin{pmatrix} 6 \\ 1 \\ 5 \end{pmatrix}\), \(\vec v_2 = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix}\)
- \(\vec v_1 = \begin{pmatrix} 6 \\ 2 \end{pmatrix}\), \(\vec v_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}\), \(\vec v_3 = \begin{pmatrix} 5 \\ 2 \end{pmatrix}\)
- \(\vec v_1 = \begin{pmatrix} 4 \\ 1 \\ 2 \end{pmatrix}\), \(\vec v_2 = \begin{pmatrix} -4 \\ 2 \\ -1 \end{pmatrix}\), \(\vec v_3 = \begin{pmatrix} 8 \\ -1 \\ 7 \end{pmatrix}\)
- \(\vec v_1 = \begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}\), \(\vec v_2 = \begin{pmatrix} 6 \\ -4 \\ 4 \end{pmatrix}\)
The dimension of the span equals the number of linearly independent vectors in the set, capped by the dimension of the ambient space.
For two nonzero vectors, “linearly independent” is the same as “not parallel”. A quick check: see whether one is a scalar multiple of the other by comparing the ratios of corresponding components. If the ratios disagree even in one coordinate, they’re not parallel.
(a) Two vectors in \(\mathbb{R}^3\). Component ratios \(6/2 = 3\), \(1/3 \approx 0.33\), \(5/2 = 2.5\) — different. So they’re not parallel ⇒ linearly independent. dim = 2.
(b) Three vectors in \(\mathbb{R}^2\). Capped by \(\dim(\mathbb{R}^2) = 2\). Check that some pair is independent: \(\vec v_1 = (6, 2)\) and \(\vec v_2 = (1, 3)\) have component ratios \(6/1 = 6\) and \(2/3\), which differ, so they're independent. dim = 2. The three vectors must satisfy some linear dependency in \(\mathbb{R}^2\) (any 3 vectors in a 2D space do).
(c) Three vectors in \(\mathbb{R}^3\). Stack as columns of a matrix and compute the determinant:
\[\det\begin{pmatrix} 4 & -4 & 8 \\ 1 & 2 & -1 \\ 2 & -1 & 7 \end{pmatrix}\]
Expand along row 1:
\[= 4(2 \cdot 7 - (-1)(-1)) - (-4)(1 \cdot 7 - (-1) \cdot 2) + 8(1 \cdot (-1) - 2 \cdot 2)\]
\[= 4(13) + 4(9) + 8(-5) = 52 + 36 - 40 = 48 \neq 0\]
Nonzero determinant ⇒ linearly independent ⇒ dim = 3.
(d) Two vectors in \(\mathbb{R}^3\). Try \(\vec v_2 = c \vec v_1\): ratios \(6/3 = 2\), \(-4/(-2) = 2\), \(4/1 = 4\). The first two match but the third doesn’t, so they’re not parallel. dim = 2.
(Easy to fool yourself: it looks like \(\vec v_2 = 2\vec v_1\) from the first two coordinates. Always check every coordinate.)
| case | # vectors | ambient dim | answer |
|---|---|---|---|
| (a) | 2 | 3 | 2 |
| (b) | 3 | 2 | 2 (capped) |
| (c) | 3 | 3 | 3 |
| (d) | 2 | 3 | 2 |
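np.linalg.matrix_rank reproduces the whole table when the vectors are placed as columns (a sketch):
```python
import numpy as np

cases = {
    "a": [[6, 1, 5], [2, 3, 2]],
    "b": [[6, 2], [1, 3], [5, 2]],
    "c": [[4, 1, 2], [-4, 2, -1], [8, -1, 7]],
    "d": [[3, -2, 1], [6, -4, 4]],
}

for name, vectors in cases.items():
    M = np.array(vectors, dtype=float).T      # vectors as columns
    print(name, np.linalg.matrix_rank(M))     # 2, 2, 3, 2
```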
4.3: Eigenvalues and eigenvectors
Find the eigenvalues and eigenvectors of:
- \(A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\)
- \(B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\)
- \(C = \begin{pmatrix} 8 & 1 \\ 8 & 1 \end{pmatrix}\)
- \(D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}\)
You can use any method (characteristic polynomial, trace-and-determinant, column-basis reasoning), but try to picture each transformation geometrically too.
(a) \(A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\) — a shear.
Upper-triangular ⇒ eigenvalues are the diagonal entries: \(\lambda = 1\) (with algebraic multiplicity \(2\)).
Solve \((A - I)\vec v = 0\): \(\begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}\vec v = 0 \Rightarrow 2 v_2 = 0 \Rightarrow v_2 = 0\), \(v_1\) free.
Single eigenvector direction: \(\vec v = (1, 0)^T\).
This is a defective matrix — only one independent eigenvector for a double eigenvalue. Geometrically, this is a horizontal shear: it pins the \(x\)-axis (eigenvalue \(1\)) and pushes everything else into a non-parallel direction.
(b) \(B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\) — symmetric.
Characteristic polynomial: \((2 - \lambda)^2 - 1 = 0 \Rightarrow 2 - \lambda = \pm 1 \Rightarrow \lambda = 1, 3\).
For \(\lambda = 3\): \((B - 3I)\vec v = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}\vec v = 0 \Rightarrow v_1 = v_2\). Eigenvector \((1, 1)^T\).
For \(\lambda = 1\): \((B - I)\vec v = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\vec v = 0 \Rightarrow v_1 = -v_2\). Eigenvector \((1, -1)^T\).
Notice the eigenvectors \((1, 1)^T\) and \((1, -1)^T\) are perpendicular — this isn’t a coincidence; symmetric matrices always have perpendicular eigenvectors. Geometrically: \(B\) stretches by \(3\) along the \((1, 1)\) diagonal and leaves the perpendicular direction \((1, -1)\) unchanged.
(c) \(C = \begin{pmatrix} 8 & 1 \\ 8 & 1 \end{pmatrix}\) — rank 1 (rows are equal).
Trace \(= 9\), det \(= 8 - 8 = 0\). So \(\lambda_1 + \lambda_2 = 9\) and \(\lambda_1 \lambda_2 = 0\) ⇒ eigenvalues are \(\lambda = 0, 9\).
For \(\lambda = 0\): \(C\vec v = 0 \Rightarrow 8 v_1 + v_2 = 0 \Rightarrow v_2 = -8 v_1\). Eigenvector \((1, -8)^T\).
For \(\lambda = 9\): \((C - 9I)\vec v = \begin{pmatrix} -1 & 1 \\ 8 & -8 \end{pmatrix}\vec v = 0 \Rightarrow v_1 = v_2\). Eigenvector \((1, 1)^T\).
Having \(0\) as an eigenvalue is the same as the matrix being singular (\(\det = 0\)). The corresponding eigenvector \((1, -8)^T\) shows the direction that gets crushed to the origin: the entire line through \((1, -8)\) is sent to \(\vec 0\).
(d) \(D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}\) — diagonal.
For diagonal matrices, the eigenvalues are the diagonal entries and the eigenvectors are the standard basis vectors:
\[\lambda_1 = -1, \;\vec v_1 = \vec e_1; \quad \lambda_2 = 1, \;\vec v_2 = \vec e_2; \quad \lambda_3 = 3, \;\vec v_3 = \vec e_3\]
Geometrically: \(D\) flips the \(x\)-axis (eigenvalue \(-1\)), fixes the \(y\)-axis (eigenvalue \(1\)), and stretches the \(z\)-axis by \(3\).
| matrix | eigenvalues | eigenvectors | type |
|---|---|---|---|
| (a) shear | \(1, 1\) | \((1, 0)^T\) only | defective |
| (b) symmetric | \(1, 3\) | \((1, -1)^T\), \((1, 1)^T\) | orthogonal |
| (c) rank 1 | \(0, 9\) | \((1, -8)^T\), \((1, 1)^T\) | singular |
| (d) diagonal | \(-1, 1, 3\) | \(\vec e_1, \vec e_2, \vec e_3\) | standard |
4.4: Diagonal matrix in \(\mathbb{R}^5\)
Suppose
\[A = \begin{pmatrix} a_1 & 0 & 0 & 0 & 0 \\ 0 & a_2 & 0 & 0 & 0 \\ 0 & 0 & a_3 & 0 & 0 \\ 0 & 0 & 0 & a_4 & 0 \\ 0 & 0 & 0 & 0 & a_5 \end{pmatrix}\]
is a linear transformation, where \(a_1, \ldots, a_5\) are some fixed positive numbers.
- What is the standard basis of \(\mathbb{R}^5\)?
- Where does \(A\) send the standard basis vectors of \(\mathbb{R}^5\)?
- What do you think the determinant of \(A\) is?
- Can you find the eigenvalues and eigenvectors of \(A\)?
(a) Standard basis. \(\vec e_1 = (1, 0, 0, 0, 0)^T\), \(\vec e_2 = (0, 1, 0, 0, 0)^T\), …, \(\vec e_5 = (0, 0, 0, 0, 1)^T\). The vector \(\vec e_i\) has \(1\) in position \(i\) and \(0\) elsewhere.
(b) Action on basis vectors. Multiplying out:
\[A \vec e_i = a_i \vec e_i\]
That is, \(A\) scales each \(\vec e_i\) by the corresponding diagonal entry \(a_i\). (Equivalently: column \(i\) of \(A\) is \(a_i \vec e_i\), and column \(i\) is exactly \(A \vec e_i\).)
(c) Determinant. For any triangular matrix — including diagonal — the determinant is the product of the diagonal entries:
\[\det(A) = a_1 \cdot a_2 \cdot a_3 \cdot a_4 \cdot a_5\]
Geometric meaning: \(A\) stretches space by factor \(a_1\) along the first axis, \(a_2\) along the second, and so on. The volume of the unit hypercube becomes the product of stretches — exactly \(a_1 a_2 a_3 a_4 a_5\).
(d) Eigenvalues and eigenvectors. From part (b), \(A\vec e_i = a_i \vec e_i\) — exactly the eigenvector relation. So:
| eigenvalue | eigenvector |
|---|---|
| \(a_1\) | \(\vec e_1\) |
| \(a_2\) | \(\vec e_2\) |
| \(a_3\) | \(\vec e_3\) |
| \(a_4\) | \(\vec e_4\) |
| \(a_5\) | \(\vec e_5\) |
This is the punchline of diagonalization: a diagonal matrix is the easiest case — eigenvalues sit on the diagonal, eigenvectors are the standard basis. Every diagonalizable matrix is “secretly” a diagonal matrix viewed in the wrong basis; finding the right basis (the eigenvectors) is what eigendecomposition does.
Sanity check on (c): \(\det(A) = \prod \lambda_i = a_1 a_2 a_3 a_4 a_5\) ✓ (since the determinant equals the product of eigenvalues).
4.5: Apple-pear system as \(A\vec x = \vec b\)
Encountering the following puzzle on Facebook:
2 apples and 3 pears together cost 1100 dram, and 1 apple and 4 pears cost 700 dram. Only people with IQ above 140 can find the price of the apple and the pear.
you realize the moment has finally come to apply what you've learned.
- Letting the unknowns be \(x_1, x_2\), write the problem as a system.
- Can you express the conditions in the form \(\begin{pmatrix} * & * \\ * & * \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} * \\ * \end{pmatrix}\), in short \(A\vec x = \vec b\)?
- Using the inverse formula, try to find \(\vec x\).
What you got is called a system of linear equations.
(a) System form. Let \(x_1\) = price of one apple, \(x_2\) = price of one pear (both in dram).
\[\begin{cases} 2 x_1 + 3 x_2 = 1100 \\ x_1 + 4 x_2 = 700 \end{cases}\]
(b) Matrix form. Read off the coefficients row by row on the left (one row per equation), and the right-hand sides as \(\vec b\):
\[\underbrace{\begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}}_{\vec x} = \underbrace{\begin{pmatrix} 1100 \\ 700 \end{pmatrix}}_{\vec b}\]
Multiplying out the left side reproduces the original system — that’s the verification.
(c) Solve via the inverse.
\[\det(A) = 2 \cdot 4 - 3 \cdot 1 = 5 \neq 0\]
so \(A^{-1}\) exists. The \(2 \times 2\) inverse formula \(A^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\) gives:
\[A^{-1} = \frac{1}{5}\begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix}\]
Then:
\[\vec x = A^{-1}\vec b = \frac{1}{5}\begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 1100 \\ 700 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4400 - 2100 \\ -1100 + 1400 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 2300 \\ 300 \end{pmatrix} = \begin{pmatrix} 460 \\ 60 \end{pmatrix}\]
→ apple = 460 dram, pear = 60 dram.
Verify. \(2(460) + 3(60) = 920 + 180 = 1100\) ✓, \(\;460 + 4(60) = 460 + 240 = 700\) ✓.
Big picture. Every linear system \(A\vec x = \vec b\) with invertible \(A\) has the unique solution \(\vec x = A^{-1}\vec b\). In practice, for large systems, nobody actually computes \(A^{-1}\) — Gaussian elimination is faster and numerically more stable. But conceptually, the inverse view is what makes the system “solvable in one shot” and ties the system back to the matrix machinery from chapter 2.
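Both routes in NumPy, for completeness (a sketch):
```python
import numpy as np

A = np.array([[2., 3.], [1., 4.]])
b = np.array([1100., 700.])

print(np.linalg.inv(A) @ b)    # [460., 60.] -- the explicit-inverse route
print(np.linalg.solve(A, b))   # same answer, the way you'd do it in practice
```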