02 Linear Algebra - Matrices

Republic Square (photo link). Photo posted on Facebook by Marine Tovmasyan.

📚 Material

YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!

Lecture

Practical session

🏡 Homework

Note
  1. ❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
  2. Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
  3. The harder the problem is, the more 🧀cheeses🧀 it has.
  4. Problems with 🎁 are extra bonuses. It’s good to try them, but they are not the highest priority.
  5. If a problem involves many boring calculations, feel free to skip them; the important part is understanding the concepts.
  6. Submit your solutions here (even if they’re unfinished)

01: Matrix transformations

What vectors do you get by applying the matrix \(A = \begin{pmatrix} 3 & -3 \\ 3 & 3 \end{pmatrix}\) to the vectors:

  1. \(\vec{a} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\)
  2. \(\vec{b} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\)
  3. \(\vec{c} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\)
  4. Draw the vectors before and after multiplying with \(A\). What can you say visually about the matrix? Can you guess how it will act on the vector \(\begin{pmatrix} 2 \\ -2 \end{pmatrix}\)?

1. \(A\vec{a} = \begin{pmatrix} 3 & -3 \\ 3 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix}\)

2. \(A\vec{b} = \begin{pmatrix} 3 & -3 \\ 3 & 3 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ 3 \end{pmatrix}\)

3. \(A\vec{c} = \begin{pmatrix} 3 & -3 \\ 3 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 6 \end{pmatrix}\)

Plotting the originals (dashed) alongside the transformed vectors (solid):

[Figure: the original vectors a, b, c (dashed) and their images A·a = (3,3), A·b = (-3,3), A·c = (0,6) (solid).]

4. Visually: every transformed vector is rotated \(45^\circ\) counterclockwise from its original, and stretched by the same factor. To make this rigorous, we want to write \(A = k \cdot R_\theta\) where \(R_\theta\) is a pure rotation. Two diagnostics nail this down:

(i) Find the scale \(k\) via the determinant. A pure rotation has \(\det(R) = 1\) and a uniform scaling by \(k\) multiplies areas by \(k^2\), so \(\det(A) = k^2\):

\[\det(A) = 3 \cdot 3 - (-3) \cdot 3 = 9 + 9 = 18 \;\Rightarrow\; k = \sqrt{18} = 3\sqrt{2}\]

(ii) Confirm “no shearing” via \(A^TA\). A rotation preserves the inner product, so \(R^T R = I\). If \(A = kR\), then \(A^TA = k^2 R^T R = k^2 I\). Check:

\[A^TA = \begin{pmatrix} 3 & 3 \\ -3 & 3 \end{pmatrix}\begin{pmatrix} 3 & -3 \\ 3 & 3 \end{pmatrix} = \begin{pmatrix} 18 & 0 \\ 0 & 18 \end{pmatrix} = 18\,I \;\checkmark\]

So \(A\) really is a rotation scaled by \(3\sqrt{2}\), with no shear or non-uniform stretching.

Find the angle. Since \(\vec{a} = (1,0)\) maps to \((3,3)\), the rotation takes \(+x\) to direction \((3,3)\):

\[\theta = \arctan\!\left(\tfrac{3}{3}\right) = \arctan(1) = 45^\circ\]

Putting it together:

\[A = 3\sqrt{2}\cdot \underbrace{\begin{pmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{pmatrix}}_{R_{45^\circ}} = 3\sqrt{2}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = 3\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \;\checkmark\]

Predicting \(A\vec{v}\) for \(\vec{v} = \begin{pmatrix} 2 \\ -2 \end{pmatrix}\). This vector points at \(-45^\circ\) with length \(2\sqrt{2}\). Rotating by \(+45^\circ\) aligns it with the \(+x\)-axis; the length becomes \(3\sqrt{2} \cdot 2\sqrt{2} = 12\). So we predict \(A\vec{v} = \begin{pmatrix} 12 \\ 0 \end{pmatrix}\). Verifying directly:

\[A\begin{pmatrix} 2 \\ -2 \end{pmatrix} = \begin{pmatrix} 6 + 6 \\ 6 - 6 \end{pmatrix} = \begin{pmatrix} 12 \\ 0 \end{pmatrix} \;\checkmark\]
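The diagnostics above can also be checked numerically. The following is a minimal sketch in plain Python; the helper names (`matvec`, `matmul`, `transpose`) are illustrative choices and not part of the course material.

```python
import math

A = [[3, -3],
     [3, 3]]

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def transpose(M):
    """Transpose of a 2x2 matrix."""
    return [[M[0][0], M[1][0]],
            [M[0][1], M[1][1]]]

def matmul(M, N):
    """Multiply two 2x2 matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # equals k^2 if A = k * R
k = math.sqrt(det_A)                           # scale factor
AtA = matmul(transpose(A), A)                  # should be k^2 * I (no shear)

# Angle of the image of (1, 0), measured from the +x axis, in degrees.
image_of_e1 = matvec(A, [1, 0])
theta = math.degrees(math.atan2(image_of_e1[1], image_of_e1[0]))

print(det_A)               # 18
print(AtA)                 # [[18, 0], [0, 18]]
print(matvec(A, [2, -2]))  # [12, 0]
```

Using `atan2` rather than `arctan(y/x)` is a deliberate choice here: it returns the angle in the correct quadrant even when the image of \((1,0)\) has a negative or zero \(x\)-coordinate.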

02: Matrix products

Compute the following products:

  1. \((A - B)(A + B)\), where \(A = \begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix}\), \(B = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}\)
  2. \(A^2 - B^2\), with the same \(A\) and \(B\) as in part 1.
  3. Any comments on the results?

1. Compute \(A - B\) and \(A + B\):

\[A - B = \begin{pmatrix} 1 & 1 \\ -3 & 3 \end{pmatrix}, \qquad A + B = \begin{pmatrix} 3 & 5 \\ 1 & 1 \end{pmatrix}\]

\[(A - B)(A + B) = \begin{pmatrix} 1\cdot 3 + 1\cdot 1 & 1\cdot 5 + 1\cdot 1 \\ -3\cdot 3 + 3\cdot 1 & -3\cdot 5 + 3\cdot 1 \end{pmatrix} = \begin{pmatrix} 4 & 6 \\ -6 & -12 \end{pmatrix}\]

2. Compute \(A^2\) and \(B^2\):

\[A^2 = \begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 12 \\ -4 & 1 \end{pmatrix}\]

\[B^2 = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = 5I\]

\[A^2 - B^2 = \begin{pmatrix} 1 - 5 & 12 \\ -4 & 1 - 5 \end{pmatrix} = \begin{pmatrix} -4 & 12 \\ -4 & -4 \end{pmatrix}\]

3. The two results are different! Expanding the product carefully (no rearranging factors):

\[(A - B)(A + B) = A \cdot A + A \cdot B - B \cdot A - B \cdot B = A^2 + AB - BA - B^2\]

This equals \(A^2 - B^2\) only when \(AB - BA = 0\), i.e. when \(A\) and \(B\) commute.

Why does the high-school identity work for numbers? Because \(ab = ba\) for all real numbers. The identity \((a-b)(a+b) = a^2 - b^2\) silently uses commutativity to cancel the cross terms: \(ab - ba = 0\). With matrices, that cancellation no longer happens automatically, because you cannot reorder factors.

Verifying that \(A\) and \(B\) here don’t commute. Compute both products:

\[AB = \begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix} = \begin{pmatrix} 2 + 6 & 4 - 3 \\ -1 + 4 & -2 - 2 \end{pmatrix} = \begin{pmatrix} 8 & 1 \\ 3 & -4 \end{pmatrix}\]

\[BA = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 2 - 2 & 3 + 4 \\ 4 + 1 & 6 - 2 \end{pmatrix} = \begin{pmatrix} 0 & 7 \\ 5 & 4 \end{pmatrix}\]

So \(AB \neq BA\). Their difference

\[[A, B] := AB - BA = \begin{pmatrix} 8 & -6 \\ -2 & -8 \end{pmatrix}\]

is called the commutator of \(A\) and \(B\), and it measures exactly how much they fail to commute. (You’ll meet the commutator again in quantum mechanics, Lie algebras, and almost anywhere matrices show up.)

Sanity check. The discrepancy between our two products should be precisely \(AB - BA\):

\[(A-B)(A+B) - (A^2 - B^2) = \begin{pmatrix} 4 & 6 \\ -6 & -12 \end{pmatrix} - \begin{pmatrix} -4 & 12 \\ -4 & -4 \end{pmatrix} = \begin{pmatrix} 8 & -6 \\ -2 & -8 \end{pmatrix} \;\checkmark\]
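The whole computation for this problem fits in a few lines of code. This is a small sketch in plain Python with hypothetical helper names (`matmul`, `add`, `sub`); it recomputes both products and confirms that they differ by exactly the commutator.

```python
def matmul(M, N):
    """Multiply two square matrices given as lists of rows."""
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(M, N):
    """Entrywise sum of two matrices."""
    return [[m + n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]

def sub(M, N):
    """Entrywise difference of two matrices."""
    return [[m - n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]

A = [[2, 3], [-1, 2]]
B = [[1, 2], [2, -1]]

lhs = matmul(sub(A, B), add(A, B))            # (A - B)(A + B)
rhs = sub(matmul(A, A), matmul(B, B))         # A^2 - B^2
commutator = sub(matmul(A, B), matmul(B, A))  # AB - BA

print(lhs)                          # [[4, 6], [-6, -12]]
print(rhs)                          # [[-4, 12], [-4, -4]]
print(commutator)                   # [[8, -6], [-2, -8]]
print(sub(lhs, rhs) == commutator)  # True
```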

When do matrices commute? Some important cases where \(AB = BA\) does hold:

  • A matrix commutes with itself and its powers: \(A \cdot A^k = A^k \cdot A\)
  • The identity \(I\) and any scalar matrix \(cI\) commute with every matrix
  • Two diagonal matrices always commute (try multiplying two of them in both orders)
  • \(A\) and \(A^{-1}\) commute (when the inverse exists)
  • Polynomials in the same matrix commute: \(p(A)\) and \(q(A)\) for any polynomials \(p, q\) (e.g. \(A^2 + 3A\) and \(A^5 - A\))

Generic matrices, like \(A\) and \(B\) in this problem, do not commute. This non-commutativity is a defining feature of matrix algebra: it’s why “the order of operations matters” in linear transformations, neural network layers, and rotations in 3D.

03: Shear matrix transformations

Shear transformations are commonly used in computer graphics for creating italic text effects, perspective corrections, and geometric distortions. They preserve area but change angles and shapes.

Consider the following matrix (it is called the shear matrix): \(S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\)

  1. What do you get if you apply \(S\) to the vector \(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\)?
  2. What do you get if you apply \(S\) again to the result of the previous part?
  3. What if you apply \(S\) one more time?
  4. What do you think happens when we apply \(S\) 100 times to that vector?
  5. Can you compute \(S^{100}\)?

1. \(S\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\)

2. \(S\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}\)

3. \(S\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix}\)

4. Pattern: each application increases the \(x\)-coordinate by exactly the (constant) \(y\)-coordinate, while \(y\) itself stays fixed. After \(100\) applications, starting from \((0,1)\): \(\begin{pmatrix} 100 \\ 1 \end{pmatrix}\).

Geometric intuition. The shear \(S\) slides each point horizontally by an amount equal to its height. Points on the \(x\)-axis (\(y=0\)) don’t move at all; points at height \(y=1\) slide right by \(1\); points at height \(y=5\) would slide right by \(5\). Horizontal lines stay horizontal, but vertical lines tilt. A unit square gets sheared into a parallelogram of the same area — and indeed \(\det(S) = 1\), so shears are area-preserving. (This is a clean instance of the rule “determinant = signed volume scaling factor.”)

5. Conjecture from the pattern: \(S^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}\).

Proof by induction. Base case \(n = 1\): just \(S\) itself. Inductive step: assuming \(S^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}\),

\[S^{n+1} = S \cdot S^n = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & n+1 \\ 0 & 1 \end{pmatrix} \;\checkmark\]

So \(S^{100} = \begin{pmatrix} 1 & 100 \\ 0 & 1 \end{pmatrix}\), consistent with part 4: applied to \((0,1)\) this gives \((100, 1)\).
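The induction result can be cross-checked by brute force. The sketch below multiplies \(S\) by itself repeatedly; `matmul` and `matrix_power` are illustrative helpers written for this example, and a real project would more likely use a linear algebra library.

```python
def matmul(M, N):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matrix_power(M, n):
    """Compute M^n for a positive integer n by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = matmul(result, M)
    return result

S = [[1, 1],
     [0, 1]]

S100 = matrix_power(S, 100)
print(S100)  # [[1, 100], [0, 1]]

# Apply S^100 to the vector (0, 1).
v = [0, 1]
image = [S100[0][0] * v[0] + S100[0][1] * v[1],
         S100[1][0] * v[0] + S100[1][1] * v[1]]
print(image)  # [100, 1]
```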

A subtle observation worth noting. A diagonal matrix \(D = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}\) has \(D^n = \begin{pmatrix} a^n & 0 \\ 0 & b^n \end{pmatrix}\), so its entries grow (or shrink) exponentially in \(n\). The shear is qualitatively different: the off-diagonal entry of \(S^n\) is just \(n\), growing only linearly. So even after \(100\) applications, \(S^{100}\) still has small entries (max value \(100\)), while a diagonal matrix with \(|a| > 1\) would already have an entry of size \(|a|^{100}\).

Both behaviors come from the same operation — taking matrix powers — yet produce very different growth rates. The deeper reason involves coordinate changes: some matrices “look diagonal” if you pick the right coordinate system, others (like the shear) genuinely don’t. We’ll come back to this idea later in the course.

04: Diagonal matrix powers

Diagonal matrices are particularly useful in linear algebra because their powers are easy to compute. This property is extensively used in eigenvalue decomposition and diagonalization of matrices (more on this later).

Consider the diagonal matrix \(A = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}\).

  1. Compute \(A^2\), \(A^3\), and \(A^4\).
  2. Find a general formula for \(A^n\) where \(n\) is any positive integer.
  3. What does this transformation represent geometrically? How does it affect the unit circle when applied repeatedly?
  4. What happens when you apply this transformation to the vector \(\begin{pmatrix} 1 \\ 1 \end{pmatrix}\) multiple times?

1. For diagonal matrices, multiplication acts componentwise — the off-diagonal zeros stay zero, and the diagonal entries multiply with each other:

\[A^2 = \begin{pmatrix} 2 \cdot 2 & 0 \\ 0 & (-1)\cdot(-1) \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}\]

\[A^3 = \begin{pmatrix} 8 & 0 \\ 0 & -1 \end{pmatrix}, \quad A^4 = \begin{pmatrix} 16 & 0 \\ 0 & 1 \end{pmatrix}\]

2. General formula:

\[A^n = \begin{pmatrix} 2^n & 0 \\ 0 & (-1)^n \end{pmatrix}\]

Why are diagonal matrices so easy to take powers of? Because they don’t mix the coordinates. \(A\) stretches the \(x\)-direction by \(2\) and acts on the \(y\)-direction by \(-1\), completely independently — so taking \(A^n\) just means doing each independent action \(n\) times. Compare with the shear \(S\) from Problem 03 or the rotation+scaling in Problem 01, where the action mixes \(x\) and \(y\) together: powers of those matrices required actual computation.

This is one reason diagonal matrices feel “simpler than they look” in linear algebra. A natural follow-up question — can we always rewrite a matrix in some clever coordinate system where it becomes diagonal? — turns out to be a very deep one, and it’ll be a major theme later in the course.

3. Geometrically, \(A\) stretches along the \(x\)-axis by factor \(2\) and flips the \(y\)-axis (multiplying \(y\) by \(-1\), so points above the axis end up below).

The unit circle \(x^2 + y^2 = 1\) gets mapped to the ellipse \(\frac{x^2}{4} + y^2 = 1\) after one application. After \(n\) applications, the circle becomes \(\frac{x^2}{4^n} + y^2 = 1\): the horizontal semi-axis explodes as \(2^n\), while the vertical semi-axis stays pinned at \(1\). Visually, the circle gets stretched into an ever longer horizontal ellipse of constant height:

[Figure: the image of the unit circle after \(n=0\), \(n=1\), and \(n=2\) applications of \(A\).]

4. Applying \(A\) repeatedly to \(\begin{pmatrix} 1 \\ 1 \end{pmatrix}\):

\[A^n \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2^n \\ (-1)^n \end{pmatrix}\]

The \(x\)-coordinate grows without bound (doubling each step), while the \(y\)-coordinate just oscillates between \(+1\) and \(-1\) — a flip on every application.

A general principle worth pocketing: when iterating a diagonal linear map, the diagonal entry that is largest in absolute value dominates the long-term behavior. Here \(|2| > |-1|\), so the \(x\)-component runs away while the \(y\)-component stays bounded. This dominance phenomenon is the seed of several powerful numerical techniques for analyzing iterated systems.
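To see the runaway \(x\)-component and the flipping \(y\)-component side by side, one can simply iterate the map. This is a short sketch; the function name `apply_matrix` and the choice of five steps are arbitrary.

```python
A = [[2, 0],
     [0, -1]]

def apply_matrix(M, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

v = [1, 1]
trajectory = [v]
for _ in range(5):
    v = apply_matrix(A, v)
    trajectory.append(v)

print(trajectory)
# [[1, 1], [2, -1], [4, 1], [8, -1], [16, 1], [32, -1]]

# Each step agrees with the closed form A^n (1, 1)^T = (2^n, (-1)^n)^T.
print(all(trajectory[n] == [2 ** n, (-1) ** n] for n in range(6)))  # True
```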

05: Determinant properties

  1. Prove that \(\det(B^{-1}AB) = \det(A)\) if \(B\) is invertible.

  2. Suppose \(Q\) is a \(3 \times 3\) real matrix such that \(Q^T Q = I\). What values can \(\det(Q)\) take?

1. Use the multiplicativity of the determinant — \(\det(XY) = \det(X)\det(Y)\) — and \(\det(B^{-1}) = \frac{1}{\det(B)}\) (which follows from \(\det(B \cdot B^{-1}) = \det(I) = 1\)):

\[\det(B^{-1} A B) = \det(B^{-1})\det(A)\det(B) = \frac{1}{\det(B)} \cdot \det(A) \cdot \det(B) = \det(A)\]

What this really says. Two matrices related by \(B^{-1} A B\) are called similar. They represent the same linear transformation expressed in different coordinate systems — \(B\) being the change-of-basis matrix that converts between them. The determinant is a property of the linear map itself (a signed volume scaling factor), not of any particular coordinates, so it has to come out the same. Similarity is the matrix-algebra version of “renaming variables doesn’t change the function.”
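A quick numerical illustration of this invariance, as a sketch only: the matrices `A` and `B` below are arbitrary examples chosen for this check (they do not come from the problem), and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    """Inverse of a 2x2 matrix, using exact fractions."""
    d = det2(M)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(M[1][1], d), Fraction(-M[0][1], d)],
            [Fraction(-M[1][0], d), Fraction(M[0][0], d)]]

def matmul(M, N):
    """Multiply two 2x2 matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 3], [-1, 2]]  # arbitrary example, det = 7
B = [[1, 2], [3, 4]]   # arbitrary invertible change of basis, det = -2

similar = matmul(matmul(inv2(B), A), B)  # B^{-1} A B

print(det2(A), det2(similar))  # 7 7
```

The entries of `similar` look nothing like those of `A`, yet the determinant is unchanged, as the proof predicts.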

2. Take determinants on both sides of \(Q^T Q = I\):

\[\det(Q^T Q) = \det(I) = 1\]

Using multiplicativity and the fact that \(\det(Q^T) = \det(Q)\):

\[\det(Q^T)\det(Q) = \det(Q)^2 = 1\]

Therefore \(\det(Q) = \pm 1\).

Both signs actually occur. Matrices satisfying \(Q^T Q = I\) are called orthogonal — they preserve lengths and angles. Geometrically:

  • \(\det(Q) = +1\): pure rotations (e.g., spinning a cube around an axis). They preserve orientation — a right-handed coordinate frame stays right-handed.

  • \(\det(Q) = -1\): rotations combined with a reflection (e.g., a mirror flip). They reverse orientation — right-handed becomes left-handed. A clean example is the reflection across the \(yz\)-plane:

\[Q = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \det(Q) = -1\]

The whole collection of orthogonal \(3 \times 3\) matrices is called \(O(3)\), and the subgroup with \(\det = +1\) — pure rotations — is called \(SO(3)\). This \(SO(3)\) shows up everywhere physics meets linear algebra: rigid body motion, angular momentum, robot kinematics, the orientation of a phone in 3D.
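Both signs can be observed on concrete matrices. The sketch below checks \(Q^T Q = I\) and computes \(\det(Q)\) for the reflection above and for a \(90^\circ\) rotation about the \(z\)-axis; the rotation matrix is an example added here for illustration, and the helper names are hypothetical.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

reflection = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]   # mirror across the yz-plane
rotation_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]] # 90 degrees about the z-axis

for name, Q in [("reflection", reflection), ("rotation_z90", rotation_z90)]:
    is_orthogonal = matmul(transpose(Q), Q) == IDENTITY
    print(name, is_orthogonal, det3(Q))
# reflection True -1
# rotation_z90 True 1
```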

06: Normal equation for linear regression

The normal equation is a closed-form solution to linear regression problems. It directly computes the optimal parameters using matrix operations, avoiding the need for iterative optimization algorithms like gradient descent.

Consider a simple linear regression problem where you want to fit a line \(y = \theta_0 + \theta_1 x\) to the following data points:

| \(x\) | \(y\) |
|---|---|
| 1 | 2 |
| 2 | 4 |

  1. Set up the design matrix \(X\) (including the intercept column) and the target vector \(\vec{y}\).

  2. Use the normal equation \(\vec{\theta} = (X^T X)^{-1} X^T \vec{y}\) to find the optimal parameters \(\theta_0\) and \(\theta_1\).

  3. What line equation did you get? Does it make sense given the data?

  4. Verify your result by checking that this line passes through the given data points.

Part 1: Setting up the matrices

Design matrix \(X\) (with intercept column): \[X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\]

Target vector: \[\vec{y} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}\]

Part 2: Computing the normal equation

First, compute \(X^T\) (here \(X\) happens to be symmetric, so \(X^T = X\)): \[X^T = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\]

Next, compute \(X^T X\): \[X^T X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}\]

Find \((X^T X)^{-1}\): \[\det(X^T X) = 2 \cdot 5 - 3 \cdot 3 = 10 - 9 = 1\] \[(X^T X)^{-1} = \frac{1}{1} \begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix} = \begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix}\]

Compute \(X^T \vec{y}\): \[X^T \vec{y} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 6 \\ 10 \end{pmatrix}\]

Finally, compute \(\vec{\theta}\): \[\vec{\theta} = (X^T X)^{-1} X^T \vec{y} = \begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 6 \\ 10 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}\]

Part 3: The line equation is \(y = 0 + 2x = 2x\). This makes perfect sense since both data points lie exactly on this line.

Part 4: Verification:

  • For \(x = 1\): \(y = 2(1) = 2\)
  • For \(x = 2\): \(y = 2(2) = 4\)
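The four parts can be reproduced step by step in code. This is a minimal sketch using exact fractions from the standard library; the helper names are illustrative, and in practice one would typically solve the linear system with a numerical library rather than form the inverse explicitly.

```python
from fractions import Fraction

xs = [1, 2]
ys = [2, 4]

# Part 1: design matrix with an intercept column of ones.
X = [[1, x] for x in xs]

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

# Part 2: theta = (X^T X)^{-1} X^T y.
Xt = transpose(X)
XtX = matmul(Xt, X)                  # [[2, 3], [3, 5]]
Xty = matmul(Xt, [[y] for y in ys])  # column vector [[6], [10]]

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
if det == 0:
    raise ValueError("X^T X is singular; the formula does not apply")
XtX_inv = [[Fraction(XtX[1][1], det), Fraction(-XtX[0][1], det)],
           [Fraction(-XtX[1][0], det), Fraction(XtX[0][0], det)]]

theta = matmul(XtX_inv, Xty)
theta0, theta1 = theta[0][0], theta[1][0]
print(theta0, theta1)  # 0 2

# Part 4: the fitted line reproduces both data points.
print([theta0 + theta1 * x for x in xs] == ys)  # True
```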

Where does the normal equation come from?

We want the parameter vector \(\vec{\theta}\) that minimizes the sum of squared residuals \(\|\vec{y} - X\vec{\theta}\|^2\) — the total squared distance between the predicted values \(X\vec{\theta}\) and the observed targets \(\vec{y}\).

Geometrically: as \(\vec{\theta}\) varies over all possible parameters, \(X\vec{\theta}\) sweeps out a plane (or more generally, a subspace) inside \(\mathbb{R}^n\) — the set of all points reachable by linear combinations of the columns of \(X\). We want the point in that plane that is closest to \(\vec{y}\). By a standard geometric fact, that closest point is the orthogonal projection of \(\vec{y}\) onto the plane: the residual \(\vec{y} - X\vec{\theta}\) must be perpendicular to every column of \(X\), i.e.

\[X^T(\vec{y} - X\vec{\theta}) = \vec{0} \;\Longleftrightarrow\; X^T X \vec{\theta} = X^T \vec{y}\]

If \(X^T X\) is invertible, you can solve directly: \(\vec{\theta} = (X^T X)^{-1} X^T \vec{y}\). That’s the normal equation.

Why did we get an exact fit (residual = \(\vec{0}\)) here?

We had \(2\) data points and \(2\) parameters (intercept + slope), so the system \(X\vec{\theta} = \vec{y}\) has as many equations as unknowns. Because the two \(x\)-values differ, \(X\) is invertible, so a unique exact solution exists. With \(3\) or more points not all on the same line, you’d see a true approximation with nonzero residuals; that’s where the “fitting” in linear regression actually does meaningful work.
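To see nonzero residuals in action, here is a sketch that adds a third data point. The point \((3, 5)\) is made up for illustration and is not part of the exercise; `fit_line` is a hypothetical helper that solves the \(2 \times 2\) normal equations for a straight-line fit in closed form.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Solve X^T X theta = X^T y for the line y = theta0 + theta1 * x."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # det(X^T X), with X^T X = [[n, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("X^T X is singular")
    theta0 = Fraction(sxx * sy - sx * sxy, det)
    theta1 = Fraction(n * sxy - sx * sy, det)
    return theta0, theta1

xs = [1, 2, 3]
ys = [2, 4, 5]  # the third point (3, 5) is a made-up addition

theta0, theta1 = fit_line(xs, ys)
residuals = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]

print(theta0, theta1)  # 2/3 3/2

# The residual is orthogonal to both columns of X (the ones and the x's).
print(sum(residuals), sum(x * r for x, r in zip(xs, residuals)))  # 0 0
```

The residuals themselves are \(-\tfrac16, \tfrac13, -\tfrac16\): no line passes through all three points, but the residual vector is perpendicular to the column space of \(X\), which is exactly the condition \(X^T(\vec{y} - X\vec{\theta}) = \vec{0}\).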

When does the formula break?

When \(X^T X\) is not invertible. This happens if the columns of \(X\) are linearly dependent — for example, if two features in your data are perfect multiples of each other. In that case there’s no unique best \(\vec{\theta}\) and one switches to the Moore–Penrose pseudoinverse, ridge regression, or iterative methods like gradient descent.
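The failure mode is easy to reproduce. In the sketch below, `X_bad` is a hypothetical design matrix whose second column is exactly twice the first; `gram_det` is an illustrative helper that computes \(\det(X^T X)\) for a two-column design matrix.

```python
def gram_det(X):
    """det(X^T X) for a design matrix X with two columns (list of rows)."""
    s00 = sum(r[0] * r[0] for r in X)
    s01 = sum(r[0] * r[1] for r in X)
    s11 = sum(r[1] * r[1] for r in X)
    return s00 * s11 - s01 * s01

X_ok = [[1, 1], [1, 2]]           # intercept + x, from this problem
X_bad = [[1, 2], [2, 4], [3, 6]]  # made-up: second feature = 2 * first

print(gram_det(X_ok))   # 1
print(gram_det(X_bad))  # 0
```

A zero determinant means \((X^T X)^{-1}\) does not exist, so the closed-form expression cannot be evaluated as written.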

Exercises from Armenian notes

The six problems below are translated from chapter 3 (Geometric Interpretation of Matrices) of «Կարճ մաթեմ մեքենայական ուսուցման համար», section 3.8 (pages 43–44 of the PDF). Original numbering preserved.

3.1: Matching transformations to matrices

Match each transformation description to one of the candidate matrices.

Descriptions:

    (a) Rotation by \(70^\circ\)
    (b) Does nothing (identity)
    (c) Reflection across the \(x\)-axis
    (d) Stretches by factor \(5\) along the \(y\)-axis
    (e) Collapses every point onto the line \(y = 6x\)
    (f) Collapses every point onto the line \(y = 6x + 1\)

Candidate matrices:

  1. \(\begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix}\)
  2. no such matrix exists
  3. \(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\)
  4. \(\begin{pmatrix} 0.342 & -0.939 \\ 0.939 & 0.342 \end{pmatrix}\)
  5. \(\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\)
  6. \(\begin{pmatrix} 3 & -1 \\ 18 & -6 \end{pmatrix}\)

Hint: compute determinants. Also track where the origin \((0, 0)\) goes.

| Description | Matrix | Why |
|---|---|---|
| (a) Rotation by \(70^\circ\) | 4 | Standard rotation matrix is \(\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\). With \(\theta = 70^\circ\): \(\cos 70^\circ \approx 0.342\), \(\sin 70^\circ \approx 0.940\) (the candidate matrix truncates this to \(0.939\)). |
| (b) Identity | 3 | \(I\) leaves every vector unchanged. |
| (c) Reflection across \(x\)-axis | 5 | Takes \((x, y) \to (x, -y)\). |
| (d) Stretches \(y\) by \(5\) | 1 | Takes \((x, y) \to (x, 5y)\). |
| (e) Collapses onto \(y = 6x\) | 6 | \(\det = 3\cdot(-6) - (-1)\cdot 18 = 0\), so the matrix is singular and the image is at most \(1\)-dimensional. Apply to \((1,0) \to (3, 18)\) and \((0,1) \to (-1, -6)\): both land on \(y=6x\) ✓. |
| (f) Collapses onto \(y = 6x + 1\) | 2 | The line \(y = 6x + 1\) does not pass through the origin, so no matrix can do this. |

The deep takeaway from (f). Every linear map sends \(\vec 0\) to \(\vec 0\) — this is forced by linearity (\(A \vec 0 = A(0 \cdot \vec v) = 0 \cdot A\vec v = \vec 0\)). So the image of a linear map always contains the origin. Any “shifted” line (or plane, hyperplane) that doesn’t pass through \(\vec 0\) is unreachable as the image of a purely linear map. To translate, you’d need an affine transformation \(\vec x \mapsto A\vec x + \vec b\) with \(\vec b \neq \vec 0\) — fundamentally a different beast.
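Both observations, the collapse onto \(y = 6x\) in (e) and the fixed origin behind (f), can be spot-checked on a handful of points. This is a small sketch; the test points are arbitrary and `apply_matrix` is an illustrative helper.

```python
M = [[3, -1],
     [18, -6]]

def apply_matrix(M, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

points = [(1, 0), (0, 1), (2, 5), (-3, 4), (0, 0)]  # arbitrary test points
images = [apply_matrix(M, p) for p in points]

print(images)
# [[3, 18], [-1, -6], [1, 6], [-13, -78], [0, 0]]

# Every image lies on y = 6x, and none lies on the shifted line y = 6x + 1.
print(all(y == 6 * x for x, y in images))      # True
print(any(y == 6 * x + 1 for x, y in images))  # False
```

A finite sample is not a proof, of course; the proof is the determinant and linearity argument given above.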

3.2: Determinants

What is the determinant of a matrix that…

    (a) Rotates by \(14^\circ\) and stretches by \(1.5\times\) along the \(x\)-axis
    (b) Is the inverse of the matrix in (a)
    (c) Sends \((1, 0)^T\) to \((2, 4)^T\) and \((0, 1)^T\) to \((-1, 0)^T\)
    (d) Has the form \(\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}\)
    (e) Has the form \(\begin{pmatrix} 8 & 3 & 5 \\ 1 & 4 & 2 \\ -4 & 0 & 4 \end{pmatrix}\)
    (f) Is the matrix from (e), cubed

(a) Rotations preserve area, so \(\det(\text{rotation}) = 1\). A stretch by \(1.5\) in one direction has determinant \(1.5\). Determinants multiply under composition:

\[\det = 1 \cdot 1.5 = 1.5\]

(b) \(\det(A^{-1}) = 1/\det(A) = 1/1.5 = \dfrac{2}{3}\).

(This follows from \(\det(A) \det(A^{-1}) = \det(A A^{-1}) = \det(I) = 1\).)

(c) A matrix sends \((1, 0)^T\) to its first column and \((0, 1)^T\) to its second column (this is exactly Problem 3.5), so the matrix is

\[M = \begin{pmatrix} 2 & -1 \\ 4 & 0 \end{pmatrix}, \qquad \det(M) = 2 \cdot 0 - (-1) \cdot 4 = 4\]

(d) \(\det\!\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} = 2 \cdot 1 - 3 \cdot 2 = -4\).

(e) Expand along row 1:

\[\det = 8 \cdot \det\!\begin{pmatrix} 4 & 2 \\ 0 & 4 \end{pmatrix} - 3 \cdot \det\!\begin{pmatrix} 1 & 2 \\ -4 & 4 \end{pmatrix} + 5 \cdot \det\!\begin{pmatrix} 1 & 4 \\ -4 & 0 \end{pmatrix}\]

\[= 8 \cdot 16 - 3 \cdot (4 - (-8)) + 5 \cdot (0 - (-16)) = 128 - 36 + 80 = 172\]

(f) \(\det(A^3) = \det(A)^3 = 172^3 = 5{,}088{,}448\).

(Useful identity: \(\det(A^n) = \det(A)^n\) for any positive integer \(n\), immediate from \(\det(AB) = \det(A)\det(B)\).)
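Parts (c) through (f) are easy to double-check in code, since everything is integer arithmetic. The sketch below uses illustrative helpers (`det2`, `det3`, `matmul`) and also verifies (f) the long way, by actually cubing the matrix before taking its determinant.

```python
def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*N)]
            for row in M]

E = [[8, 3, 5],
     [1, 4, 2],
     [-4, 0, 4]]

print(det2([[2, -1], [4, 0]]))  # (c) 4
print(det2([[2, 3], [2, 1]]))   # (d) -4
print(det3(E))                  # (e) 172
print(det3(E) ** 3)             # (f) 5088448

# (f) again, without the shortcut: cube E first, then take the determinant.
E_cubed = matmul(matmul(E, E), E)
print(det3(E_cubed) == det3(E) ** 3)  # True
```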

3.3: Ellipse area

Take the unit circle \(x^2 + y^2 = 1\) and stretch it by factor \(2\) along the \(x\)-axis. The result is an ellipse — what is its area?

Generalize: derive a formula for the area of an ellipse with semi-axes \(a\) and \(b\).

The stretch is the linear map \(T = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\), with \(\det(T) = 2\).

A linear map scales every area by \(|\det|\) — exactly what Problem 3.5 establishes. So:

\[\text{ellipse area} = |\det(T)| \cdot \text{circle area} = 2 \cdot \pi = 2\pi\]

Generalization. For an ellipse with semi-axes \(a\) and \(b\) — i.e., the curve \((x/a)^2 + (y/b)^2 = 1\) — apply the stretch \(T = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}\) to the unit circle:

\[\text{ellipse area} = |\det(T)| \cdot \pi = ab \cdot \pi = \boxed{\pi a b}\]

This is the standard formula — but here you got it geometrically in one line, instead of from a calculus integral. The determinant is the area-scaling factor, so the area of the ellipse is the area of the unit circle times the product of the semi-axes.
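The area-scaling claim can be tested numerically by approximating the circle with a many-sided polygon, stretching the polygon, and measuring its area with the shoelace formula. This is a sketch under that polygon approximation; the function names and the vertex count `n` are arbitrary choices.

```python
import math

def shoelace_area(points):
    """Area of a simple polygon given its vertices in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def stretched_circle_area(a, b, n=10_000):
    """Polygon estimate of the area of the unit circle stretched by (a, b)."""
    circle = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
              for i in range(n)]
    ellipse = [(a * x, b * y) for x, y in circle]  # apply T = diag(a, b)
    return shoelace_area(ellipse)

circle_area = stretched_circle_area(1, 1)
ellipse_area = stretched_circle_area(2, 1)

print(ellipse_area / circle_area)  # ratio of areas, close to det(T) = 2
print(ellipse_area, 2 * math.pi)   # polygon estimate vs. the exact 2*pi
```

The ratio of the two polygon areas matches \(\det(T)\) up to floating-point rounding for any `n`, because a linear map scales polygon areas exactly; only the comparison with \(2\pi\) depends on how finely the circle is sampled.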

3.4: Finding inputs from outputs

The matrix

\[A = \begin{pmatrix} 4 & -8 \\ 1 & 2 \end{pmatrix}\]

sends some vector \(\vec v\) to:

    (a) \(\begin{pmatrix} 1 \\ 2 \end{pmatrix}\)
    (b) \(\begin{pmatrix} 3 \\ 4 \end{pmatrix}\)

Find \(\vec v\) in each case.

We need \(\vec v\) with \(A\vec v = \vec b\), i.e., \(\vec v = A^{-1}\vec b\).

First compute \(A^{-1}\):

\[\det(A) = 4 \cdot 2 - (-8) \cdot 1 = 16\]

\[A^{-1} = \frac{1}{16}\begin{pmatrix} 2 & 8 \\ -1 & 4 \end{pmatrix}\]

(a)

\[\vec v = A^{-1}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{16}\begin{pmatrix} 2 + 16 \\ -1 + 8 \end{pmatrix} = \frac{1}{16}\begin{pmatrix} 18 \\ 7 \end{pmatrix} = \begin{pmatrix} 9/8 \\ 7/16 \end{pmatrix}\]

Verify: \(A \cdot (9/8,\; 7/16)^T = \big(\tfrac{72-56}{16},\; \tfrac{18+14}{16}\big) = (1, 2)\). ✓

(b)

\[\vec v = A^{-1}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \frac{1}{16}\begin{pmatrix} 6 + 32 \\ -3 + 16 \end{pmatrix} = \frac{1}{16}\begin{pmatrix} 38 \\ 13 \end{pmatrix} = \begin{pmatrix} 19/8 \\ 13/16 \end{pmatrix}\]

Verify: \(A \cdot (19/8,\; 13/16)^T = \big(\tfrac{152-104}{16},\; \tfrac{38+26}{16}\big) = (3, 4)\). ✓

Geometric remark. The inverse \(A^{-1}\) “undoes” what \(A\) did. Going from output back to input is exactly inverting the matrix. This is why being invertible is the same as being able to uniquely reconstruct an input from its output — and that’s the same as \(\det(A) \neq 0\) (a non-singular matrix doesn’t collapse different inputs onto the same output).
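The same calculation in code, as a sketch with exact fractions. The helper `solve2` is hypothetical; it applies the \(2 \times 2\) inverse formula directly and refuses to proceed when \(\det(A) = 0\), which is the case where the input cannot be reconstructed uniquely.

```python
from fractions import Fraction

def solve2(A, b):
    """Solve A v = b for a 2x2 matrix A using the explicit inverse formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("A is singular; no unique solution")
    # A^{-1} = (1/det) * [[d, -b], [-c, a]] for A = [[a, b], [c, d]]
    x = Fraction(A[1][1] * b[0] - A[0][1] * b[1], det)
    y = Fraction(-A[1][0] * b[0] + A[0][0] * b[1], det)
    return [x, y]

A = [[4, -8],
     [1, 2]]

print(solve2(A, [1, 2]))  # [Fraction(9, 8), Fraction(7, 16)]
print(solve2(A, [3, 4]))  # [Fraction(19, 8), Fraction(13, 16)]

# Verify by multiplying back: A v should reproduce the given output.
v = solve2(A, [3, 4])
print([A[0][0] * v[0] + A[0][1] * v[1],
       A[1][0] * v[0] + A[1][1] * v[1]] == [3, 4])  # True
```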

3.5: Where standard basis vectors go, and parallelogram area

Suppose \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\). Applying \(A\):

  (a) Where does \(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\) go?

  (b) Where does \(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\) go?

  (c) Why is the area of the parallelogram spanned by these two output vectors equal to \(|\det(A)|\)?

(a) \(A\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix}\) — the first column of \(A\).

(b) \(A\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}\) — the second column of \(A\).

This is one of the most important facts about matrices: the columns of \(A\) tell you exactly where the standard basis vectors land. To define a linear transformation, you only need to specify where \(\vec e_1, \vec e_2, \ldots\) go — everything else follows by linearity. Matrix multiplication is then just “look up where each basis vector goes, then combine.”

(c) The area of the parallelogram spanned by \(\binom{a}{c}\) and \(\binom{b}{d}\) is

\[\text{area} = |ad - bc| = |\det(A)|\]

Why? From the formula “area = base × height”. Take base \(= \|(a, c)\| = \sqrt{a^2 + c^2}\). The height is the perpendicular distance from \((b, d)\) to the line through the origin in direction \((a, c)\), which works out (after some algebra with the perpendicular-foot formula) to \(\dfrac{|ad - bc|}{\sqrt{a^2 + c^2}}\). Multiplying:

\[\text{area} = \sqrt{a^2 + c^2} \cdot \frac{|ad - bc|}{\sqrt{a^2 + c^2}} = |ad - bc| = |\det(A)|\]

This is the foundational geometric fact about determinants: \(\det\) is the area-scaling factor of the linear map.

The unit square (area \(1\)) gets mapped by \(A\) to the parallelogram spanned by the columns of \(A\) (area \(|\det(A)|\)). In 3D, the analogous statement uses \(|\det|\) for volumes; in \(n\)D, for \(n\)-dimensional volumes. Almost every property of determinants — multiplicativity (\(\det(AB) = \det(A)\det(B)\)), behavior under inverses, change of variables in integrals — is a consequence of this single geometric meaning.
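The base × height argument can be carried out numerically for a concrete matrix. The sketch below uses the matrix \(M\) from Problem 3.2(c) as its example; `parallelogram_area` is an illustrative helper that finds the height by subtracting from the second vector its projection onto the first, and it assumes the base vector is nonzero.

```python
import math

def parallelogram_area(u, w):
    """Area of the parallelogram spanned by u and w via base * height."""
    base = math.hypot(u[0], u[1])  # assumes u is not the zero vector
    # Component of w perpendicular to u: remove the projection of w onto u.
    t = (w[0] * u[0] + w[1] * u[1]) / base ** 2
    perp = (w[0] - t * u[0], w[1] - t * u[1])
    height = math.hypot(perp[0], perp[1])
    return base * height

a, b, c, d = 2, -1, 4, 0  # entries of M = [[2, -1], [4, 0]]

area = parallelogram_area((a, c), (b, d))  # columns of M
print(area, abs(a * d - b * c))            # both are 4, up to rounding
```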

3.6: Permutation matrix and row swaps

Take the \(4 \times 4\) identity matrix and swap rows \(2\) and \(3\):

\[B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\]

  (a) What happens to a \(4 \times 4\) matrix when you multiply it by \(B\) from the left?

  (b) What is \(\det(B)\)?

  (c) How does the determinant of a matrix change when you swap two of its rows?

  (d) Generalize the result.

(a) Left-multiplying any \(4 \times 4\) matrix \(X\) by \(B\) swaps rows \(2\) and \(3\) of \(X\):

\[BX = \begin{pmatrix} \text{row 1 of } X \\ \text{row 3 of } X \\ \text{row 2 of } X \\ \text{row 4 of } X \end{pmatrix}\]

Reason. Row \(i\) of \(BX\) equals (row \(i\) of \(B\)) \(\cdot X\). Row \(1\) of \(B\) has a single \(1\) in column \(1\), so it picks out row \(1\) of \(X\). Row \(2\) of \(B\) has its single \(1\) in column \(3\), so it picks out row \(3\) of \(X\). Row \(3\) picks out row \(2\). Row \(4\) picks out row \(4\). Net effect: rows \(2\) and \(3\) swap.

A matrix like \(B\) (every row and column has exactly one \(1\), rest zeros) is called a permutation matrix. Left-multiplication permutes the rows of whatever it’s applied to.

(b) Compute \(\det(B)\) by cofactor expansion. Expand along the first column (only \(B_{11} = 1\) contributes):

\[\det(B) = 1 \cdot \det\!\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

Expand the \(3 \times 3\) along its third column (only the \((3,3)\) entry \(= 1\) contributes):

\[= 1 \cdot \det\!\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = 0 - 1 = -1\]

So \(\det(B) = -1\).

(c) Combining (a) and (b): left-multiplying by \(B\) performs the row swap, and by multiplicativity \(\det(BX) = \det(B)\det(X) = -\det(X)\). So swapping two rows of a matrix flips the sign of its determinant.

(d) Generalization. Swapping any two rows (or columns) of any square matrix multiplies its determinant by \(-1\). More generally, if you apply a permutation \(\sigma\) to the rows, the determinant gets multiplied by \(\text{sgn}(\sigma)\): \(+1\) for even permutations, \(-1\) for odd.

This is one of the three core “row operations” you can use to compute determinants efficiently:

  1. Swap two rows → determinant flips sign
  2. Multiply a row by \(c\) → determinant scales by \(c\)
  3. Add a multiple of one row to another → determinant is unchanged

A suitable sequence of these operations reduces any square matrix to upper-triangular form, where the determinant is just the product of the diagonal entries (after accounting for the sign flips and scalings made along the way). This is exactly how Gaussian elimination computes determinants in \(O(n^3)\) operations, drastically faster than the \(n!\)-term cofactor expansion you’d otherwise need.
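As a rough sketch of the elimination idea, the function below reduces a matrix to upper-triangular form using only row swaps (rule 1) and row additions (rule 3), tracking the sign as it goes. It uses exact fractions to avoid rounding; `det_by_elimination` is a hypothetical name, and production code would normally call a numerical library with pivoting tuned for floating point.

```python
from fractions import Fraction

def det_by_elimination(M):
    """Determinant via row reduction to upper-triangular form."""
    n = len(M)
    rows = [[Fraction(x) for x in row] for row in M]  # exact working copy
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in col.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            sign = -sign        # rule 1: a row swap flips the sign
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            # rule 3: subtracting a multiple of a row leaves det unchanged
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    product = Fraction(1)
    for i in range(n):
        product *= rows[i][i]   # triangular: det is the diagonal product
    return sign * product

B = [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
E = [[8, 3, 5],
     [1, 4, 2],
     [-4, 0, 4]]

print(det_by_elimination(B))  # -1
print(det_by_elimination(E))  # 172
```

For \(B\), the only nontrivial step is the single swap of rows 2 and 3, which is where the \(-1\) comes from. For the matrix from Problem 3.2(e), the result agrees with the cofactor expansion.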

