01 Vectors and Linear Algebra Fundamentals

[Image: photo by Suren Sargsyan]

📚 Material

ToDo

🏡 Homework

Note
  1. ❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
  2. Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
  3. The harder the problem is, the more 🧀cheeses🧀 it has.
  4. Problems marked with 🎁 are just extra bonuses. It’s good to try to solve them, but it’s not the highest-priority task.
  5. If a problem involves many boring calculations, feel free to skip them; the important part is understanding the concepts.
  6. Submit your solutions here (even if they’re unfinished)

Vector Operations

01 RGB color mixing with vectors

In computer graphics and image processing, colors can be represented as RGB vectors where each component (Red, Green, Blue) ranges from 0 to 255. Vector operations on these RGB values correspond to color mixing and transformations.

Consider these RGB color vectors:

  • Red: \(\vec{r} = (255, 0, 0)\)
  • Cyan: \(\vec{c} = (0, 255, 255)\)
  1. Calculate what color you get by adding red and cyan: \(\vec{r} + \vec{c}\).
  2. Find the “average” color between red and cyan: \(\frac{1}{2}(\vec{r} + \vec{c})\).
  3. Use a color picker to verify your answers from parts (1) and (2). What colors do you actually see?
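After you have worked these out by hand, a quick NumPy sketch like the following can confirm the arithmetic (clipping to the 0-255 range is our assumption about how out-of-range components would be handled):

```python
import numpy as np

r = np.array([255, 0, 0])    # red
c = np.array([0, 255, 255])  # cyan

# Part 1: component-wise sum; values above 255 are clipped to stay in RGB range
mixed = np.clip(r + c, 0, 255)

# Part 2: the "average" color, with integer division for whole RGB components
average = (r + c) // 2

print(mixed)
print(average)
```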

02 Dot product

A translation office translated \(a = [24, 17, 9, 13]\) documents from English, French, German and Russian, respectively. For each of those languages, it takes about \(b = [5, 10, 11, 7]\) minutes to translate one document. How much time did they spend translating in total? How much time did each translator spend on average, given that there are 4 translators in the office? Write an expression for this amount in terms of the vectors \(a\) and \(b\).
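Once you have an expression in terms of \(a\) and \(b\), a short NumPy check might look like this (a sketch; the problem’s title already hints that the total is a dot product):

```python
import numpy as np

a = np.array([24, 17, 9, 13])  # documents per language
b = np.array([5, 10, 11, 7])   # minutes per document

total_minutes = a @ b               # dot product: total translation time
per_translator = total_minutes / 4  # average over the 4 translators

print(total_minutes, per_translator)
```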

03 Feature vector normalization

In machine learning, we often work with data that has very different scales, like comparing a person’s age (around 20-80) with their salary (around 20,000-100,000). Without normalization, i.e. bringing all the values to a similar scale (e.g. scaling vectors to length 1), algorithms might treat salary as much more important just because the numbers are bigger. Normalizing vectors to unit length helps ensure all features are treated equally.

A customer is represented by the vector \(\vec{v} = (25, 50000, 3)\) where components represent [age, income in $, number of purchases].

  1. Calculate the Euclidean norm (magnitude) \(||\vec{v}||_2\)
  2. Find the unit vector \(\hat{v} = \frac{\vec{v}}{||\vec{v}||_2}\)
  3. Verify that \(||\hat{v}||_2 = 1\)

Note: No need to carry out the calculations explicitly.
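Since the arithmetic itself is not the point here, a minimal NumPy sketch can carry out all three parts:

```python
import numpy as np

v = np.array([25.0, 50000.0, 3.0])  # [age, income in $, purchases]

norm = np.linalg.norm(v)  # Euclidean (L2) norm
v_hat = v / norm          # unit vector in the direction of v

print(norm)
print(v_hat)
print(np.linalg.norm(v_hat))  # should be 1.0, up to floating-point error
```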

04 Triangle inequality

For vectors \(\vec{u} = (3, 4)\) and \(\vec{v} = (5, -12)\):

  1. Calculate \(||\vec{u}||\), \(||\vec{v}||\), and \(||\vec{u} + \vec{v}||\)
  2. Verify the triangle inequality: \(||\vec{u} + \vec{v}|| \leq ||\vec{u}|| + ||\vec{v}||\)
  3. When does equality hold in the triangle inequality?
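A small NumPy sketch for checking parts (1) and (2) numerically:

```python
import numpy as np

u = np.array([3, 4])
v = np.array([5, -12])

nu = np.linalg.norm(u)      # ||u||
nv = np.linalg.norm(v)      # ||v||
ns = np.linalg.norm(u + v)  # ||u + v||

print(nu, nv, ns)
print(ns <= nu + nv)  # the triangle inequality should hold
```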

05 Model selection with regularization

In machine learning, we constantly face a tradeoff: should we use a complex model that fits our training data very well, or a simpler model that captures the general pattern? This is where regularization comes in.

Imagine you’re Netflix trying to predict movie ratings. You could create an extremely complex formula with thousands of parameters that perfectly predicts every rating in your training data. But when a new user comes along, your model might fail spectacularly - it memorized the training data instead of learning the underlying patterns. This is called overfitting. (Kargin example)

Regularization prevents overfitting by adding a penalty for model complexity to our optimization goal:

\[\text{Total Error} = \text{Prediction Error} + \lambda \cdot \text{Complexity Penalty}\]

where \(\lambda\) controls how much we penalize complexity (having large parameter values).

The two most common regularization methods use different norms to measure complexity:

  • L1 Regularization (Lasso): Uses the sum of absolute values \[\text{L1 penalty} = \lambda \sum_{i=1}^{n} |w_i|\]

  • L2 Regularization (Ridge): Uses the sum of squares \[\text{L2 penalty} = \lambda \sum_{i=1}^{n} w_i^2\]

Real-world example: Suppose you’re predicting house prices using features like size, location, age, etc. Without regularization, your model might learn that “houses with exactly 2,347 sq ft, built in 1987, with 3.5 bathrooms, facing north-northeast, with blue doors” sell for $523,456. With regularization, it learns more general rules like “larger houses in good neighborhoods cost more.”
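To make the two penalties concrete before the exercise below, here is a small sketch that computes the regularized objective for an arbitrary weight vector (the function name and the illustrative numbers are ours, not part of the problem):

```python
import numpy as np

def total_error(pred_error, w, lam, norm="l1"):
    """Prediction error plus a norm-based complexity penalty."""
    w = np.asarray(w, dtype=float)
    if norm == "l1":
        penalty = np.sum(np.abs(w))  # L1 (Lasso): sum of absolute values
    else:
        penalty = np.sum(w ** 2)     # L2 (Ridge): sum of squares
    return pred_error + lam * penalty

# Illustrative numbers only, not the homework answer:
print(total_error(50, [2.0, -1.0], lam=0.1, norm="l1"))  # 50 + 0.1 * 3 = 50.3
print(total_error(50, [2.0, -1.0], lam=0.1, norm="l2"))  # 50 + 0.1 * 5 = 50.5
```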

I’m not sure I’ve worded this problem (especially) well; if you have questions, let me know.

You’re comparing two models that predict house prices:

  • Model A: Complex formula with weights (coefficients) \(\vec{w_A} = (10, -8, 4)\) (corresponding to the quadratic \(10x^2 - 8x + 4\)) and prediction error = 100
  • Model B: Simpler formula with weights \(\vec{w_B} = (0.1, -3, 1)\) (corresponding to \(0.1x^2 - 3x + 1\), almost a linear function) and prediction error = 120

Model B makes slightly worse predictions, but which model is better when considering both error and simplicity?

  1. L1 Regularization (λ = 0.5): Calculate the total error for each model
    • Model A: \(\text{Error} + \lambda \cdot ||\vec{w_A}||_1 = ?\)
    • Model B: \(\text{Error} + \lambda \cdot ||\vec{w_B}||_1 = ?\)
  2. L2 Regularization (λ = 0.5): Calculate the total error for each model
    • Model A: \(\text{Error} + \lambda \cdot ||\vec{w_A}||_2^2 = ?\)
    • Model B: \(\text{Error} + \lambda \cdot ||\vec{w_B}||_2^2 = ?\)
  3. Model Selection: Which model would you choose under each regularization method? How does the choice of \(\lambda\) affect your decision?
  4. Practical Insight: In production systems, why might we prefer a model with slightly worse accuracy but much simpler weights?

06 k-Nearest Neighbors Classification

Attached you will find a csv file with three columns: feature_1, feature_2, label. You can imagine that feature_1 represents a flower’s height, feature_2 its width, and label indicates which of the 4 flower species (0, 1, 2, 3) it belongs to.

You need to build a model (an algorithm) that, given the feature_1 and feature_2 values, predicts the flower’s species.

It works as follows: for a new flower, find the K closest flowers in the existing data, see which species dominates among those K neighbors, and use that as the prediction.

As the distance, use L1 (Manhattan) in one case and L2 (Euclidean) in the other. For K, experiment with different values: 2, 3, 5, 10.

A couple of light side notes:
  1. The algorithm is called K Nearest Neighbors and works purely on the principle of “tell me who your friends are and I’ll tell you who you are”. It is almost never used in practice, but it can be fun for homework.
  2. One of the reasons it isn’t used is the “Curse of dimensionality”, a fun effect whereby in high-dimensional spaces data points become almost equidistant from each other and pile up in the corners (in other words, if you peel a high-dimensional orange, nothing is left underneath). Source: https://slds-lmu.github.io/i2ml/chapters/14_cod/
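For reference, a minimal sketch of the whole pipeline is below (the file name data.csv is an assumption based on the description above; try your own implementation first):

```python
import numpy as np
import pandas as pd
from collections import Counter

df = pd.read_csv("data.csv")  # columns: feature_1, feature_2, label
X = df[["feature_1", "feature_2"]].to_numpy()
y = df["label"].to_numpy()

def knn_predict(x_new, X, y, k=5, p=2):
    """Predict the label of x_new by majority vote among its k nearest neighbors.

    p=1 gives the L1 (Manhattan) distance, p=2 the L2 (Euclidean) distance.
    """
    dists = np.sum(np.abs(X - x_new) ** p, axis=1) ** (1 / p)
    nearest = np.argsort(dists)[:k]  # indices of the k closest points
    return Counter(y[nearest]).most_common(1)[0][0]

# Example: classify one new flower with both distances and several values of K
for p in (1, 2):
    for k in (2, 3, 5, 10):
        print(p, k, knn_predict(np.array([5.0, 3.0]), X, y, k=k, p=p))
```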

You can ignore the rest for now

04: Similarity measurement

The dot product is fundamental in measuring similarity between vectors. In recommendation systems, we often use cosine similarity (based on dot products) to find similar users or items.

Two user preference vectors are \(\vec{u_1} = (5, 3, 1, 4)\) and \(\vec{u_2} = (3, 5, 2, 2)\) where each component represents rating for different movie genres.

  1. Calculate the dot product \(\vec{u_1} \cdot \vec{u_2}\)
  2. Calculate the cosine similarity: \(\cos(\theta) = \frac{\vec{u_1} \cdot \vec{u_2}}{||\vec{u_1}|| \cdot ||\vec{u_2}||}\)
  3. What does a cosine similarity close to 1 indicate about user preferences?
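A short NumPy sketch of both computations:

```python
import numpy as np

u1 = np.array([5, 3, 1, 4])
u2 = np.array([3, 5, 2, 2])

dot = u1 @ u2  # dot product
cos_sim = dot / (np.linalg.norm(u1) * np.linalg.norm(u2))  # cosine similarity

print(dot, cos_sim)
```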

04.5: Word embeddings similarity

ToDo

Check out this 3Blue1Brown video on word vectors for more insights!

06: Finding perpendicular vectors

Given the vector \(\vec{v} = (2, 3)\):

  1. Find a non-zero vector \(\vec{w} = (x, y)\) such that \(\vec{v}\) and \(\vec{w}\) are perpendicular.
  2. Verify that your chosen vector \(\vec{w}\) satisfies \(\vec{v} \cdot \vec{w} = 0\).
  3. Find a unit vector in the direction of \(\vec{w}\) by computing \(\frac{\vec{w}}{||\vec{w}||}\).
  4. Explain why there are infinitely many vectors perpendicular to \(\vec{v}\) and describe the general form of all such vectors.
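One way to sanity-check an answer (the particular \(\vec{w}\) below is just one of the infinitely many valid choices):

```python
import numpy as np

v = np.array([2, 3])
w = np.array([-3, 2])  # one candidate: swap the components and negate one

print(v @ w)                   # 0 confirms perpendicularity
w_hat = w / np.linalg.norm(w)  # unit vector in the direction of w
print(w_hat, np.linalg.norm(w_hat))
```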

08: Deriving the cosine angle formula

Derive the formula for the cosine of the angle between two vectors: \(\cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{||\vec{a}|| \cdot ||\vec{b}||}\)

Start with the law of cosines for a triangle: \(c^2 = a^2 + b^2 - 2ab\cos(\theta)\). Consider a triangle formed by vectors \(\vec{a}\), \(\vec{b}\), and \(\vec{a} - \vec{b}\). The side lengths are \(||\vec{a}||\), \(||\vec{b}||\), and \(||\vec{a} - \vec{b}||\). Express \(||\vec{a} - \vec{b}||^2\) using the dot product and substitute into the law of cosines.

  1. Write down the law of cosines for the triangle with sides \(||\vec{a}||\), \(||\vec{b}||\), and \(||\vec{a} - \vec{b}||\)
  2. Express \(||\vec{a} - \vec{b}||^2\) in terms of dot products by expanding \((\vec{a} - \vec{b}) \cdot (\vec{a} - \vec{b})\)
  3. Substitute your result from part (2) into the law of cosines and solve for \(\cos(\theta)\)
  4. Verify your derived formula using vectors \(\vec{u} = (3, 4)\) and \(\vec{v} = (1, 0)\)
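For part (4), a numerical cross-check of the derived formula, using arctan2 as an independent way to get the angle:

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

# Cosine of the angle via the derived formula
cos_formula = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Independent check: angle of each vector from the x-axis via arctan2
theta = np.arctan2(u[1], u[0]) - np.arctan2(v[1], v[0])

print(cos_formula, np.cos(theta))  # both should agree
```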

Geometric Interpretation

21: High-dimensional vector geometry

In high-dimensional spaces (common in ML), our intuition about geometry can be misleading.

Consider the unit sphere in \(\mathbb{R}^n\) (all vectors with norm 1):

  1. In 2D, what fraction of the square \([-1,1] \times [-1,1]\) is occupied by the unit disk (the interior of the unit circle)?
  2. Estimate this fraction for the cube \([-1,1]^3\) in 3D
  3. Research: What happens to this fraction as the dimension \(n\) increases? This is known as the “curse of dimensionality.”
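For part (3), a Monte Carlo sketch can make the effect visible (the sample count and the dimensions below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def ball_fraction(n, samples=100_000):
    """Monte Carlo estimate of the fraction of the cube [-1, 1]^n
    that lies inside the unit ball (points with norm <= 1)."""
    pts = rng.uniform(-1, 1, size=(samples, n))
    return np.mean(np.linalg.norm(pts, axis=1) <= 1)

# The fraction collapses toward 0 as the dimension grows
for n in (2, 3, 5, 10, 20):
    print(n, ball_fraction(n))
```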

Video

🛠️ Practice ToDo

🎲 38 (01) TODO