01 Linear Algebra - Vectors

Photo link, author: Suren Sargsyan

📚 Material

📚 Reading at home:

Vectors, dot product, vector norm

Poole, pages 2-12 (vectors)
Johnston, pages 10-14 (norm)

and watch 3b1b's first video lecture on linear algebra (the same in Armenian)

Angle, cosine similarity, vector spaces

Johnston, pages 15-19 (Cauchy-Schwarz, angle)
Poole, pages 26-28 (Pythagorean theorem, projection)
Johnston, pages 121-124 (vector spaces), and optionally watch the StatQuest video (https://youtu.be/e9U0QAFbfLI) on cosine similarity

All the books are here.

🏡 Homework

Note

❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
The harder the problem is, the more 🧀cheeses🧀 it has.
Problems with 🎁 are just extra bonuses. It would be good to try to solve them, but also it’s not the highest priority task.
If a problem involves many boring calculations, feel free to skip them - the important part is understanding the concepts.
Submit your solutions here (even if they're unfinished)

Vector Operations

01 RGB color mixing with vectors

Context

In computer graphics and image processing, colors can be represented as RGB vectors where each component (Red, Green, Blue) ranges from 0 to 255. Vector operations on these RGB values correspond to color mixing and transformations.

Consider these RGB color vectors:

Red: $\vec{r} = (255, 0, 0)$
Cyan: $\vec{c} = (0, 255, 255)$

Calculate what color you get by adding red and cyan: $\vec{r} + \vec{c}$.
Find the “average” color between red and cyan: $\frac{1}{2}(\vec{r} + \vec{c})$.
Use a color picker to verify your answers from parts (1) and (2). What colors do you actually see?
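As a warm-up before the color picker, here is a minimal sketch of component-wise addition and averaging of RGB vectors. It deliberately uses two other colors (green and blue) so the exercise itself stays unsolved. The helper names `mix_add` and `mix_average` are hypothetical, and clipping to the 0-255 range is an assumption about how out-of-range values should be handled.

```python
import numpy as np

# Example colors chosen for illustration (not the ones from the exercise)
green = np.array([0, 255, 0])
blue = np.array([0, 0, 255])

def mix_add(c1, c2):
    """Add two RGB vectors component-wise and clip to the valid 0-255 range."""
    return np.clip(c1 + c2, 0, 255)

def mix_average(c1, c2):
    """Average two RGB vectors component-wise (integer division keeps whole values)."""
    return (c1 + c2) // 2

added = mix_add(green, blue)         # [0, 255, 255]
averaged = mix_average(green, blue)  # [0, 127, 127]
print(added, averaged)
```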

02 Dot product

A translation office translated $a = [24, 17, 9, 13]$ pages from English, French, German and Russian, respectively. For these languages, it takes about $b = [5, 10, 11, 7]$ minutes, respectively, to translate one page. How much time did they spend translating in total? How much time did each translator spend on average if there are 4 translators in the office? Write an expression for each amount in terms of the vectors $a$ and $b$.
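A "quantity times rate, summed up" computation like this one is exactly a dot product. The sketch below shows the pattern on made-up numbers (units sold and unit prices, both hypothetical), not on the vectors from the problem.

```python
import numpy as np

# Hypothetical data: units sold per product and price per unit
quantities = np.array([3, 1, 2])
prices = np.array([4.0, 10.0, 2.5])

# Dot product: 3*4 + 1*10 + 2*2.5
total = float(np.dot(quantities, prices))

# Splitting the total evenly between 2 people is just a scalar multiple of the dot product
per_person = total / 2
print(total, per_person)
```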

03 Feature vector normalization

Context

In machine learning, we often work with data that has very different scales - like comparing a person’s age (around 20-80) with their salary (around 20,000-100,000). Without normalization (bringing all the values to a similar scale, e.g. rescaling a vector to length 1), algorithms might think salary is much more important just because the numbers are bigger. Normalizing vectors to unit length is one simple way to bring values to a comparable scale.

A customer is represented by the vector $\vec{v} = (25, 50000, 3)$ where components represent [age, income in $, number of purchases].

Calculate the Euclidean norm (magnitude) $||\vec{v}||_2$
Find the unit vector $\hat{v} = \frac{\vec{v}}{||\vec{v}||_2}$
Verify that $||\hat{v}||_2 = 1$

Note: No need to carry out the calculations explicitly.
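If you want to check the idea numerically, here is a small sketch with a different, hypothetical vector whose norm happens to be a whole number. It uses `np.linalg.norm`, which computes the Euclidean (L2) norm by default.

```python
import numpy as np

v = np.array([3.0, 4.0, 12.0])   # hypothetical vector, not the customer from the problem

norm = np.linalg.norm(v)          # sqrt(9 + 16 + 144) = 13
v_hat = v / norm                  # unit vector pointing in the same direction as v
unit_norm = np.linalg.norm(v_hat)
print(norm, v_hat, unit_norm)
```

Dividing by the norm only rescales the vector, so the direction stays the same and the length becomes 1 (up to floating-point rounding).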

04 Triangle inequality

For vectors $\vec{u} = (3, 4)$ and $\vec{v} = (5, -12)$:

Calculate $||\vec{u}||$, $||\vec{v}||$, and $||\vec{u} + \vec{v}||$
Verify the triangle inequality: $||\vec{u} + \vec{v}|| \leq ||\vec{u}|| + ||\vec{v}||$
When does equality hold in the triangle inequality?
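The inequality can also be probed experimentally. The sketch below checks it on 1000 random pairs of 3-dimensional vectors (the dimension, sample size, and seed are arbitrary choices). A passing check is evidence, not a proof, and it says nothing about when equality holds.

```python
import numpy as np

rng = np.random.default_rng(0)
pairs = rng.normal(size=(1000, 2, 3))   # 1000 random pairs (u, v) of 3D vectors
u, v = pairs[:, 0], pairs[:, 1]

lhs = np.linalg.norm(u + v, axis=1)                          # ||u + v||
rhs = np.linalg.norm(u, axis=1) + np.linalg.norm(v, axis=1)  # ||u|| + ||v||

# Small tolerance guards against floating-point rounding
holds = bool(np.all(lhs <= rhs + 1e-12))
print(holds)
```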

05 Model selection with regularization

Context

In machine learning, we constantly face a tradeoff: should we use a complex model that fits our training data very well, or a simpler model that captures the general pattern? This is where regularization comes in.

Imagine you’re Netflix trying to predict movie ratings. You could create an extremely complex formula with thousands of parameters that perfectly predicts every rating in your training data. But when a new user comes along, your model might fail spectacularly - it memorized the training data instead of learning the underlying patterns. This is called overfitting. (Kargin example)

Regularization prevents overfitting by adding a penalty for model complexity to our optimization goal:

\[\text{Total Error} = \text{Prediction Error} + \lambda \cdot \text{Complexity Penalty}\]

where $\lambda$ controls how much we penalize complexity (having large parameter values).

The two most common regularization methods use different norms to measure complexity:

L1 Regularization (Lasso): Uses the sum of absolute values \[\text{L1 penalty} = ||\vec{w}||_1 = \sum_{i=1}^{n} |w_i|\]
L2 Regularization (Ridge): Uses the sum of squares \[\text{L2 penalty} = ||\vec{w}||_2^2 = \sum_{i=1}^{n} w_i^2\]

Real-world example: Suppose you’re predicting house prices using features like size, location, age, etc. Without regularization, your model might learn that “houses with exactly 2,347 sq ft, built in 1987, with 3.5 bathrooms, facing north-northeast, with blue doors” sell for $523,456. With regularization, it learns more general rules like “larger houses in good neighborhoods cost more.”

I’m not sure I’ve formulated this problem (in particular) well, so if you have questions, let me know.

You’re comparing two models that predict house prices:

Model A: Complex formula with weights (coefficients) $\vec{w_A} = (10, -8, 4)$, which can correspond to the quadratic $10x^2 - 8x + 4$, and prediction error = 100
Model B: Simpler formula with weights $\vec{w_B} = (0.1, -3, 1)$, corresponding to $0.1x^2 - 3x + 1$ (almost just a linear function), and prediction error = 120

Model B makes slightly worse predictions, but which model is better when considering both error and simplicity?

L1 Regularization (λ = 0.5): Calculate the total error for each model
- Model A: $\text{Error} + \lambda \cdot ||\vec{w_A}||_1 = ?$
- Model B: $\text{Error} + \lambda \cdot ||\vec{w_B}||_1 = ?$
L2 Regularization (λ = 0.5): Calculate the total error for each model
- Model A: $\text{Error} + \lambda \cdot ||\vec{w_A}||_2^2 = ?$
- Model B: $\text{Error} + \lambda \cdot ||\vec{w_B}||_2^2 = ?$
Model Selection: Which model would you choose under each regularization method? How does the choice of $\lambda$ affect your decision?
Practical Insight: In production systems, why might we prefer a model with slightly worse accuracy but much simpler weights?
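The two totals from parts (a) and (b) follow the same recipe, so they can be wrapped in two small helpers. This is a sketch under assumptions: the function names `l1_total` and `l2_total` are hypothetical, and the model below (weights, error, and $\lambda$) is made up so that the actual exercise is left for you.

```python
import numpy as np

def l1_total(error, weights, lam):
    """Prediction error plus lam times the L1 norm (sum of absolute values)."""
    return error + lam * np.sum(np.abs(weights))

def l2_total(error, weights, lam):
    """Prediction error plus lam times the squared L2 norm (sum of squares)."""
    return error + lam * np.sum(np.square(weights))

# Hypothetical model, not Model A or Model B
weights = np.array([2.0, -1.0, 0.5])
error = 50.0
lam = 0.1

total_l1 = l1_total(error, weights, lam)   # 50 + 0.1 * 3.5
total_l2 = l2_total(error, weights, lam)   # 50 + 0.1 * 5.25
print(total_l1, total_l2)
```

Note how the L2 penalty grows much faster than the L1 penalty when a weight is large, because each weight is squared.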

06 k-Nearest Neighbors Classification

Attached you will find a csv file with three columns: feature_1, feature_2, label. You can imagine that feature_1 represents the height of a flower, feature_2 its width, and label says which of the 4 flower species (0, 1, 2, 3) it is.

You need to build a model (algorithm) that takes the feature_1 and feature_2 values and predicts the flower species.

It should work as follows: for a new flower, find the k closest flowers in the data we have, see which species is the most common among those k neighbors, and use that as the prediction.

As the distance, use L1 (Manhattan) in one case and L2 (Euclidean) in the other. Also try different values of k: 2, 3, 5, 10.

A few extra notes: 1. The algorithm is called K Nearest Neighbors and works purely on the principle “tell me who your friends are and I’ll tell you who you are”; it is almost never used in practice, but it can be fun for homework. 2. One of the reasons it is not used is the “curse of dimensionality”, a very cool effect whereby in high-dimensional spaces the data points become almost equidistant from each other and pile up in the corners (in other words, if we peel a high-dimensional orange, nothing is left underneath). Source (https://slds-lmu.github.io/i2ml/chapters/14_cod/)
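One possible shape for the classifier is sketched below. It is not a reference solution: the function name `knn_predict`, its parameters, and the tiny two-class toy dataset are all hypothetical, and you would still need to load the csv columns (feature_1, feature_2, label) yourself.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3, metric="l2"):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    diffs = X_train - x_new
    if metric == "l1":
        dists = np.sum(np.abs(diffs), axis=1)      # Manhattan distance
    elif metric == "l2":
        dists = np.sqrt(np.sum(diffs**2, axis=1))  # Euclidean distance
    else:
        raise ValueError("metric must be 'l1' or 'l2'")
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())     # labels counted in order of closeness
    return votes.most_common(1)[0][0]

# Toy data (made up): two well-separated clusters with labels 0 and 1
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_l1 = knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3, metric="l1")
pred_l2 = knn_predict(X_train, y_train, np.array([5.0, 5.1]), k=3, metric="l2")
print(pred_l1, pred_l2)
```

With an even k (such as 2 or 10) the vote can end in a tie. In this sketch a tie goes to the label that appears first among the nearest neighbors, which is one possible convention rather than the only one.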

Dot Products and Angles Between Vectors

07 Finding perpendicular vectors

Given the vector $\vec{v} = (2, 3)$:

Find a non-zero vector $\vec{w} = (x, y)$ such that $\vec{v}$ and $\vec{w}$ are perpendicular.
Verify that your chosen vector $\vec{w}$ satisfies $\vec{v} \cdot \vec{w} = 0$.
Find a unit vector in the direction of $\vec{w}$ by computing $\frac{\vec{w}}{||\vec{w}||}$.
Explain why there are infinitely many vectors perpendicular to $\vec{v}$ and describe the general form of all such vectors.
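For a quick numerical sanity check of parts (1) to (3), the sketch below works with a different, hypothetical vector. It uses one common trick in 2D, swapping the components and negating one of them, which is only one of many valid ways to obtain a perpendicular vector.

```python
import numpy as np

v = np.array([1.0, 4.0])        # hypothetical vector, not the one from the problem
w = np.array([-v[1], v[0]])     # swap the components and negate one: (-4, 1)

dot = float(np.dot(v, w))       # 1*(-4) + 4*1 = 0, so v and w are perpendicular
w_hat = w / np.linalg.norm(w)   # unit vector in the direction of w
print(dot, w_hat)
```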

08: Word embeddings similarity

Context

Computers understand numbers, not words. To make sense of text, we convert words into vectors in a high-dimensional space, called word embeddings. In this space, words with similar meanings are located close to each other. For example, “king” and “queen” are closer than “king” and “car”.

# install gensim if you haven't already (you can also do it without `uv`, but why would you? uv is fantastic)
# !uv pip install gensim

import gensim.downloader as api

# this might take a few minutes first time
model_name = "glove-twitter-25" # smaller model 

# If you're willing to wait a bit longer for a better model, uncomment line below:
# model_name = "word2vec-google-news-300"  # 1.5 gb model

model = api.load(model_name)  # 25-dimensional vectors for glove-twitter-25 (300 for the word2vec model)

print("Model loaded successfully!")

# to get vector for a word we just get it like from a dictionary
word = "cheese"
if word in model:
    vector = model[word]
    print(f"Vector for '{word}' has shape: {vector.shape}")
    print(f"First 10 dimensions: {vector[:10]}")
else:
    print(f"'{word}' not found in vocabulary")

Familiarize yourself with the code above. It loads a pre-trained word embedding model and retrieves the vector for the word “cheese”.

Your task is to calculate how similar the word “cheese” (or any other word you choose) is to a list of other words (given below) using cosine similarity.

potential_words = ["elephant", "cheese", "butter", "bread", "watermelon", "potato", "iron", "clock", "computer", "chicken", "fries"]

Return a sorted dictionary where keys are words and values are their cosine similarity to “cheese”.
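To see the mechanics without downloading a model, here is a sketch of cosine similarity and ranking on tiny hand-made 2D vectors. The dictionary `toy_vectors` and its keys are hypothetical stand-ins for real embeddings, and sorting from most to least similar is an assumption about what "sorted" should mean here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product divided by the product of norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (made up); with gensim you would use model[word] instead
toy_vectors = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([1.0, 1.0]),
    "c": np.array([0.0, 1.0]),
    "d": np.array([-1.0, 0.0]),
}

query = toy_vectors["a"]
scores = {word: cosine_similarity(query, vec) for word, vec in toy_vectors.items()}

# Dictionaries keep insertion order, so building one from a sorted list gives a sorted dict
ranked = dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
print(ranked)
```

The query word is always most similar to itself (similarity 1), and a vector pointing the opposite way gets similarity -1.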

You can AFTERWARDS also use the built-in method to find most similar words:

model.most_similar("cheese", topn=10)

Bonus: Play around with vector arithmetic to explore relationships - you can add and subtract word vectors to see how meanings combine. For example, try “king” - “man” + “woman” and see what word is closest to the resulting vector! Or try “Paris” - “France” + “Armenia” to see if you get “Yerevan”.

Check out this fantastic 3Blue1Brown video on word vectors (embeddings) for more insights! Also, this is a cool tool to play with

Vector Spaces and Subspaces

09: Identifying vector spaces and non-vector spaces

For each of the following sets, determine whether it is a vector space or not. If it is a vector space, prove it by verifying all the required axioms. If it is not a vector space, identify which axiom(s) fail and provide counterexamples.

$A = \left\{\begin{pmatrix} a \\ 0 \end{pmatrix} \mid a \in \mathbb{R}\right\}$ (vectors with second component zero)
$B = \left\{\begin{pmatrix} a \\ -a \end{pmatrix} \mid a \in \mathbb{R}\right\}$ (vectors where second component is negative of first)
$C = \mathbb{N}$ (the set of natural numbers)
$D = \left\{\begin{pmatrix} a \\ 1 \end{pmatrix} \mid a \in \mathbb{R}\right\}$ (vectors with second component always 1)

Note: By default, if we don’t mention the operations, we mean the standard vector addition and scalar multiplication (i.e. the good old + and * we learned at school).

Hint: For the non-vector spaces, show that there are some “bad” elements such that if we add them or multiply with some number (not necessarily positive), the result would not belong to the set.
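The hint can be turned into a small experiment: pick elements of the set, apply an operation, and test whether the result is still in the set. The sketch below does this for a hypothetical set $H = \{(a, b) \mid b \ge 0\}$ that is not among the exercises. A failed check is a valid counterexample, but a passed check proves nothing, so the actual proofs still need algebra.

```python
# Hypothetical set H = {(a, b) : b >= 0}, chosen for illustration only
def in_H(v):
    """Membership test for H: the second component must be non-negative."""
    return v[1] >= 0

def add(u, v):
    """Standard component-wise vector addition."""
    return (u[0] + v[0], u[1] + v[1])

def scale(c, v):
    """Standard scalar multiplication."""
    return (c * v[0], c * v[1])

u = (1.0, 2.0)
w = (-3.0, 0.5)

sum_stays_in_H = in_H(add(u, w))        # (-2.0, 2.5) is still in H
scaled_stays_in_H = in_H(scale(-1, u))  # (-1.0, -2.0) has left H
print(sum_stays_in_H, scaled_stays_in_H)
```

Here multiplying by a negative number produces a "bad" result, so $H$ is not closed under scalar multiplication and cannot be a vector space.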

10 Vector space or not?

Check whether each of the following sets is a vector space:

$A=\mathbb{Z}$, with the usual operations $+$ and $\cdot$.
$B=\left\{ \begin{bmatrix} 0 \\ 0 \\ a \end{bmatrix} \,\middle|\, a\in\mathbb{R} \right\}$ with the usual operations $+$ and $\cdot$.
$C=\mathbb{R}^2=\left\{ \begin{bmatrix} a \\ b \end{bmatrix} \,\middle|\, a,b\in\mathbb{R} \right\}$, with the usual operation $\cdot$ and the addition defined as \[\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 + 1 \end{bmatrix}.\]

The set of all polynomials of degree $\le 2$, with the usual operations $+$ and $\cdot$. Bonus question - is this maybe “equivalent” to some other vector space we already know?

11 Vector subspaces

Video solution

12 Deriving the cosine angle formula

Derive the formula for the cosine of the angle between two vectors: $\cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{||\vec{a}|| \cdot ||\vec{b}||}$

Hint

Start with the law of cosines for a triangle: $c^2 = a^2 + b^2 - 2ab\cos(\theta)$. Consider a triangle formed by vectors $\vec{a}$, $\vec{b}$, and $\vec{a} - \vec{b}$. The side lengths are $||\vec{a}||$, $||\vec{b}||$, and $||\vec{a} - \vec{b}||$. Express $||\vec{a} - \vec{b}||^2$ using the dot product and substitute into the law of cosines.

Write down the law of cosines for the triangle with sides $||\vec{a}||$, $||\vec{b}||$, and $||\vec{a} - \vec{b}||$
Express $||\vec{a} - \vec{b}||^2$ in terms of dot products by expanding $(\vec{a} - \vec{b}) \cdot (\vec{a} - \vec{b})$
Substitute your result from part (2) into the law of cosines and solve for $\cos(\theta)$
Verify your derived formula using vectors $\vec{u} = (3, 4)$ and $\vec{v} = (1, 0)$
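A numerical cross-check can catch algebra slips in the derivation. The sketch below computes the cosine twice for a hypothetical pair of vectors (different from the pair in part 4): once from the three side lengths via the law of cosines, and once from the dot product.

```python
import numpy as np

a = np.array([2.0, 1.0])   # hypothetical vectors, not the ones from part (4)
b = np.array([1.0, 3.0])

na = np.linalg.norm(a)
nb = np.linalg.norm(b)
nc = np.linalg.norm(a - b)   # length of the third side of the triangle

# Law of cosines rearranged for cos(theta): c^2 = a^2 + b^2 - 2ab*cos(theta)
cos_from_sides = (na**2 + nb**2 - nc**2) / (2 * na * nb)

# Dot-product formula for the same angle
cos_from_dot = np.dot(a, b) / (na * nb)
print(cos_from_sides, cos_from_dot)
```

Both values should agree up to floating-point rounding. If they do not, one of the steps in the derivation went wrong.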

13 High-dimensional vector geometry

In high-dimensional spaces (common in ML), our intuition about geometry can be misleading (we will explore this later).

Consider the unit ball in $\mathbb{R}^n$ (all vectors with norm at most 1):

In 2D, what fraction of the square $[-1,1] \times [-1,1]$ is occupied by the unit disk?
Estimate this fraction for the cube $[-1,1]^3$ and the unit ball in 3D (you can google the formula)
Try to guess, and then google, what happens to this fraction as the dimension $n$ increases. This is one manifestation of the “curse of dimensionality.”
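After making your guess, you can also test it empirically instead of googling. The sketch below is a Monte Carlo estimate: it samples random points in the cube $[-1,1]^n$ and counts how many land within distance 1 of the origin. The function name, sample size, and seed are arbitrary choices, and the estimates are approximate.

```python
import numpy as np

def fraction_inside_unit_ball(n, samples=200_000, seed=0):
    """Estimate the share of the cube [-1, 1]^n that lies within distance 1 of the origin."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-1.0, 1.0, size=(samples, n))  # uniform points in the cube
    inside = np.sum(points**2, axis=1) <= 1.0           # squared norm at most 1
    return float(np.mean(inside))

estimates = {n: fraction_inside_unit_ball(n) for n in (2, 3, 5, 10)}
for n, fraction in estimates.items():
    print(n, fraction)
```

Compare the printed values for $n = 2$ and $n = 3$ with your exact answers from parts (1) and (2).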

Video

🎲 38 (01) TODO

▶️ToDo
🔗Random link
🇦🇲🎶ToDo
🌐🎶ToDo
🤌Kargin
