00 Preliminaries: Sets, Combinatorics, Functions
լուսանկարի հղումը, Հեղինակ՝ Misak Aghababyan
📚 Նյութը
- 📚 Ամբողջական նյութը
- 📺 Բազմություններ, կոմբինատորիկա, արտապատկերումներ
- 🎞️ Սլայդեր
- 🛠️📺 Գործնականի տեսագրությունը
- 🛠️🗂️ Գործնականի PDF-ը
Մինչ բուն նյութին անցնելը ծանոթանում ենք մի քանի նախնական կարևոր գաղափարների՝
- Բազմություններ (գործողություններ, կցման-արտաքսման սկզունք, դեկարտյան արտադրյալ)
- Զուգորդություններ, կարգավորություններ
- Ֆունկցիաներ (պատկեր, նախապատկեր, ինյեկտիվ, սյուրյեկտիվ, բիյեկտիվ, կենտ, զույգ)
📚 Տանը կարդում ենք՝
- Գևորգյան, Սահակյան 12 83-88 (բազմություններ), 91-103 (կոմբինատորիկա) էջերը
- Stewart 59-62 (ֆունկցիայի հակադարձ), 17-18 (զույգ և կենտ ֆունկցիա) էջերը
Բոլոր գրքերը այստեղ են։
🏡 Տնային
- ❗❗❗ DON’T CHECK THE SOLUTIONS BEFORE TRYING TO DO THE HOMEWORK BY YOURSELF❗❗❗
- Please don’t hesitate to ask questions, never forget about the 🍊karalyok🍊 principle!
- The harder the problem is, the more 🧀cheeses🧀 it has.
- Problems with 🎁 are just extra bonuses. It would be good to try to solve them, but also it’s not the highest priority task.
- If the problem involve many boring calculations, feel free to skip them - important part is understanding the concepts.
- Submit your solutions here (even if it’s unfinished)
Sets
01 Two-annotator inclusion–exclusion
Out of 1,000 images, annotator \(A\) flags 260 as “elephant,” annotator \(B\) flags 310 as “elephant,” and both flag 190.
- How many images at least one annotator called “elephant”?
- How many images neither annotator called “elephant”?
Inclusion-exclusion for two sets: \[|A \cup B| = |A| + |B| - |A \cap B| = 260 + 310 - 190 = 380.\]
Neither annotator flagged the image means it’s outside \(A \cup B\): \[1000 - 380 = 620.\]
We subtract the intersection once because \(|A| + |B|\) counts the 190 shared images twice.
02 Train/Test split
You collected 200 images. Train set \(T\) has 140 indices; Test set \(S\) has 60. We noticed an overlap \(|T\cap S|=4\).
- Are the splits disjoint? If not, how many unique images actually exist across both sets?
- How many items must be moved to make \(T\) and \(S\) disjoint with target sizes \(140\) and \(60\) (no duplicates)? Propose a minimal-move strategy.
Part 1. The splits are not disjoint because \(|T \cap S| = 4 > 0\). By inclusion-exclusion: \[|T \cup S| = |T| + |S| - |T \cap S| = 140 + 60 - 4 = 196\] unique images. So 4 images are double-counted and there are only 196 distinct images across both sets (out of the 200 you collected, meaning 4 images appear in neither set).
Part 2. The 4 overlapping images must each be in only one of the two sets. The target sizes 140 and 60 sum to 200, but currently the union is 196. To get to disjoint sets of those exact sizes we need 4 more “slots” filled, which means recruiting the 4 unused images.
Minimal-move strategy:
- For each of the 4 overlap images, remove it from \(S\) (keep in \(T\)). Now \(|T| = 140\), \(|S| = 56\), disjoint.
- Add the 4 unused images to \(S\) to bring \(|S|\) back to 60.
Total: 4 removals + 4 additions = 8 single-image moves. The key insight is that any image in the overlap must “lose” one of its two memberships, and you can always remove it from the smaller set to keep things balanced.
03 Divisibility via inclusion–exclusion
How many integers in \({1,\dots,100}\) are divisible by \(5\) or \(7\).
Let \(A\) = multiples of 5 in \(\{1,\dots,100\}\) and \(B\) = multiples of 7. Their intersection is multiples of \(\operatorname{lcm}(5,7) = 35\).
- \(|A| = \lfloor 100/5 \rfloor = 20\)
- \(|B| = \lfloor 100/7 \rfloor = 14\)
- \(|A \cap B| = \lfloor 100/35 \rfloor = 2\) (these are 35 and 70)
\[|A \cup B| = 20 + 14 - 2 = 32.\]
Sanity check: 35 and 70 are the only numbers up to 100 divisible by both, so without the correction we’d be double-counting them.
04 Inclusion-exclusion principle for 3 sets
Figure out the equation of the inclusion-exclusion principle for 3 sets. You may find it useful to draw a Venn diagram.
The formula is: \[|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|.\]
Why each term? Imagine pouring \(|A| + |B| + |C|\) into a counter and watching how each region of the Venn diagram gets counted:
- An element in only one set: counted 1 time. Correct.
- An element in exactly two sets (say \(A\) and \(B\), not \(C\)): counted 2 times by \(|A| + |B| + |C|\). We subtract \(|A \cap B|\) to bring it down to 1. Correct.
- An element in all three sets: counted 3 times by \(|A| + |B| + |C|\). We then subtract it 3 times (once in each pairwise intersection), bringing it to 0. We add it back once with \(|A \cap B \cap C|\) to bring it to 1. Correct.
Pattern. Add singletons, subtract pairs, add triples. For \(n\) sets the formula alternates signs: \(+\) for odd-size groups, \(-\) for even-size. This is the start of the general inclusion-exclusion principle.
Combinatorics
05 Augmentation pipeline counting
An image pipeline does: random crop (3 sizes), color jitter (on/off), horizontal flip (on/off), and chooses one of 4 background textures.
Assuming independence, how many distinct augmentation “recipes” can be produced for a given image? Example recipe is (crop: 2nd size, flip: 1 (on), texture: 3rd type).
By the multiplication principle, since the four choices are independent: \[3 \times 2 \times 2 \times 4 = 48 \text{ recipes}.\]
Each recipe is an element of the Cartesian product \(\{\text{crop}\} \times \{\text{jitter}\} \times \{\text{flip}\} \times \{\text{texture}\}\).
06 Two hyperparameters — combinations
Let hyperparameter set \(H_1\) have \(m\) options and \(H_2\) have \(n\) options.
If we want to try the whole grid of options how many combinations do we have? How is the set of all combinations called?
The total number of combinations is \(m \cdot n\). The set of all pairs \((h_1, h_2)\) with \(h_1 \in H_1\), \(h_2 \in H_2\) is called the Cartesian product \(H_1 \times H_2\).
In ML this brute-force search over the full grid is called grid search.
07 Armenian car plates
Armenian cars have the format DD LL DDD, digits 0–9, letters A–Z, repetition allowed.
- How many distinct plates exist?
- How does the answer change if letters cannot repeat inside the same plate?
There are 10 digits and 26 letters. The plate has 5 digit slots and 2 letter slots.
Part 1. With repetition everywhere: \[10^2 \cdot 26^2 \cdot 10^3 = 100 \cdot 676 \cdot 1000 = 67{,}600{,}000.\]
Part 2. Letters cannot repeat (digits still can): \[10^2 \cdot (26 \cdot 25) \cdot 10^3 = 100 \cdot 650 \cdot 1000 = 65{,}000{,}000.\]
Sanity check: forbidding letter repeats removes the \(26 \cdot 1 = 26\) “double-letter” choices and leaves \(26 \cdot 25 = 650\) instead of \(676\). The ratio \(650/676 \approx 0.96\) matches the drop from 67.6M to 65.0M.
08 Anagrams of “cheeses”
How many distinct words can you form by rearranging the letters of “cheeses”? (Handle repeated letters carefully.)
The word “cheeses” has 7 letters with multiplicities: c=1, h=1, e=3, s=2.
If all 7 letters were distinct we’d have \(7! = 5040\) arrangements. But swapping the three e’s among themselves gives the same word, so we divide by \(3!\). Same for the two s’s, divide by \(2!\): \[\frac{7!}{3! \cdot 2! \cdot 1! \cdot 1!} = \frac{5040}{6 \cdot 2} = \frac{5040}{12} = 420.\]
This is the multinomial coefficient \(\binom{7}{1,1,3,2}\).
Sanity check: a fully distinct 7-letter word gives 5040; “cheeses” has heavy repetition so 420 (about 12x fewer) makes sense.
Functions
09 (my favorite exercise) Real-world examples
Give one real-life example (and short justification) for 1. injective 2. surjective and 3. bijective mapping.
There’s no single right answer here — these are just a few good ones. The point is that you can articulate why the example fits.
Injective (one-to-one, but not necessarily covering the codomain): person \(\to\) fingerprint. No two people share a fingerprint, so different inputs go to different outputs. Not surjective because most theoretically possible “fingerprint patterns” are never realized by any actual person.
Surjective (covers everything in the codomain, but inputs can collide): student in a course \(\to\) letter grade in \(\{A, B, C, D, F\}\). Assuming all five grades are used in the class, the map covers the codomain. Not injective because many students get the same grade.
Bijective (one-to-one and covers the codomain): seat number \(\to\) ticket holder at a sold-out concert. Every seat has exactly one occupant and every ticket holder has exactly one seat. Other good ones: passport number \(\to\) citizen, or position in a finished race \(\to\) runner.
A useful test: ask “can two different inputs map to the same output?” (if no \(\Rightarrow\) injective) and “is every possible output actually hit?” (if yes \(\Rightarrow\) surjective).
10 Counting small functions
How many functions \(f:\{1,2,3\}\to\{a,b\}\)?
How many are injective? How many are surjective? (Hint: total is \(2^3\).)
Total functions. Each of 1, 2, 3 independently chooses an output in \(\{a,b\}\): \(2^3 = 8\) functions.
Injective. Injective requires distinct inputs to map to distinct outputs, but here we have 3 inputs trying to fit into a codomain of size 2 — by the pigeonhole principle at least two inputs must collide. So 0 injective functions.
Surjective. A surjection must hit both \(a\) and \(b\). The only failures are constant functions: “everything maps to \(a\)” or “everything maps to \(b\)” — exactly 2 of them. So surjections = \(8 - 2 = 6\).
Sanity check: list them. The 8 functions correspond to the 8 binary strings of length 3 (think \(a=0, b=1\)): 000, 001, 010, 011, 100, 101, 110, 111. The strings 000 and 111 are the constants — the other 6 use both letters.
11 Even/odd — activation symmetry
Classify each as even, odd, or neither, and briefly note the symmetry implication:
- \(f(x)=x^2\)
- \(g(x)=x^3\)
- \(p(x)=\max(0,x)\) (same as above, but the name is
REctified Linear Unit (ReLu)) - \(s(x)=\frac{1}{1+e^{-x}}\) (this function is called
Sigmoidand we’ll encounter it a lot in ML) (if the function is neither please specify what we can do with it to make it even/odd)
Recall: even means \(f(-x) = f(x)\) (symmetric across the y-axis); odd means \(f(-x) = -f(x)\) (point symmetry through the origin).
\(f(x) = x^2\): \(f(-x) = (-x)^2 = x^2 = f(x)\). Even. Mirror-symmetric across the y-axis.
\(g(x) = x^3\): \(g(-x) = (-x)^3 = -x^3 = -g(x)\). Odd. Symmetric about the origin.
\(p(x) = \max(0, x)\) (ReLU): \(p(1) = 1\) but \(p(-1) = 0\), so \(p(-1) \neq p(1)\) (not even) and \(p(-1) \neq -p(1)\) (not odd). Neither. ReLU asymmetrically zeros out negative inputs — that asymmetry is the whole point in ML, since it lets the network represent non-linear behavior.
\(s(x) = \frac{1}{1+e^{-x}}\) (Sigmoid): \(s(0) = 1/2\), so it can’t be odd (an odd function must satisfy \(s(0) = 0\)). Also \(s(1) \approx 0.73\) but \(s(-1) \approx 0.27 \neq s(1)\), so not even either. Neither. A common trick: shift it down by \(\tfrac{1}{2}\). The function \(\tilde{s}(x) = s(x) - \tfrac{1}{2}\) satisfies \(\tilde{s}(-x) = -\tilde{s}(x)\), so it is odd. Equivalently, \(2s(x) - 1 = \tanh(x/2)\), which is the standard odd version.
12 Function properties
Given \(f:\mathbb{R}\to\mathbb{R}\), \(f(x)=x+3x^2\), determine:
- Is it injective or not?
- Is it surjective or not?
- Is it bijective or not?
- What is the image of \(A\) under \(f\), when \(A=[-1,2]\)?
- What is the preimage of \(B\) under \(f\), when \(B=[0,5]\)?
\(f(x) = 3x^2 + x\) is an upward-opening parabola. Its vertex (minimum) is at \(x = -\tfrac{1}{2 \cdot 3} \cdot 1 = -\tfrac{1}{6}\), and the minimum value is \(f(-\tfrac{1}{6}) = 3 \cdot \tfrac{1}{36} - \tfrac{1}{6} = \tfrac{1}{12} - \tfrac{2}{12} = -\tfrac{1}{12}\).
Injective? No. For example \(f(0) = 0\) and \(f(-\tfrac{1}{3}) = 3 \cdot \tfrac{1}{9} - \tfrac{1}{3} = \tfrac{1}{3} - \tfrac{1}{3} = 0\). Two different inputs give the same output. (More generally any horizontal line above the minimum hits the parabola twice.)
Surjective? No. The range is \([-\tfrac{1}{12}, +\infty)\), which doesn’t cover all of \(\mathbb{R}\) — for instance \(-1\) has no preimage.
Bijective? No, since it’s neither injective nor surjective.
Image of \(A = [-1, 2]\). The vertex \(x = -1/6\) lies inside \([-1, 2]\), so the minimum on \(A\) is \(f(-1/6) = -1/12\). Endpoints: \(f(-1) = 3 - 1 = 2\), \(f(2) = 12 + 2 = 14\). So \(f(A) = \left[-\tfrac{1}{12}, 14\right]\).
Preimage of \(B = [0, 5]\). We need all \(x\) with \(0 \le 3x^2 + x \le 5\).
- \(3x^2 + x \ge 0 \iff x(3x + 1) \ge 0 \iff x \le -\tfrac{1}{3}\) or \(x \ge 0\).
- \(3x^2 + x \le 5 \iff 3x^2 + x - 5 \le 0\). Roots: \(x = \frac{-1 \pm \sqrt{1 + 60}}{6} = \frac{-1 \pm \sqrt{61}}{6}\). So \(\frac{-1 - \sqrt{61}}{6} \le x \le \frac{-1 + \sqrt{61}}{6}\).
Intersecting: \[f^{-1}(B) = \left[\tfrac{-1 - \sqrt{61}}{6},\, -\tfrac{1}{3}\right] \cup \left[0,\, \tfrac{-1 + \sqrt{61}}{6}\right].\]
Numerically \(\sqrt{61} \approx 7.81\), so this is roughly \([-1.47, -0.33] \cup [0, 1.13]\).
13 Both even and odd
Describe all functions \(f:\mathbb{R}\to\mathbb{R}\) that are simultaneously even and odd.
Suppose \(f\) is both even and odd. Then for every \(x \in \mathbb{R}\):
- Even: \(f(-x) = f(x)\)
- Odd: \(f(-x) = -f(x)\)
Setting these equal: \(f(x) = -f(x)\), so \(2f(x) = 0\), giving \(f(x) = 0\) for all \(x\).
So the only function \(\mathbb{R} \to \mathbb{R}\) that is both even and odd is the zero function \(f \equiv 0\).
It does satisfy both conditions trivially: \(f(-x) = 0 = f(x)\) and \(f(-x) = 0 = -0 = -f(x)\).
14 Inclusion–exclusion for surjections
Count surjective functions \(f:\{1,2,3,4\}\to\{a,b,c\}\) using inclusion–exclusion. Show your work.
Let \(X = \{1,2,3,4\}\) (size \(n = 4\)) and \(Y = \{a,b,c\}\) (size \(k = 3\)). Total functions \(f: X \to Y\): there are \(k^n = 3^4 = 81\).
A function is not surjective iff some element of \(Y\) is missed. Let \(A_y\) = set of functions whose image misses element \(y \in Y\). We want \[\#\text{surjections} = 3^4 - |A_a \cup A_b \cup A_c|.\]
By inclusion-exclusion (problem 04 generalized): \[|A_a \cup A_b \cup A_c| = \sum |A_y| - \sum |A_{y_1} \cap A_{y_2}| + |A_a \cap A_b \cap A_c|.\]
Single term. Functions missing one specific \(y\) map all 4 inputs into the remaining 2 elements: \(2^4 = 16\). There are \(\binom{3}{1} = 3\) such terms, contributing \(3 \cdot 16 = 48\).
Pairwise term. Functions missing 2 specific elements map all 4 inputs into the remaining 1 element: \(1^4 = 1\). There are \(\binom{3}{2} = 3\) such terms, contributing \(3 \cdot 1 = 3\).
Triple term. Functions missing all 3 elements: \(0^4 = 0\).
So \(|A_a \cup A_b \cup A_c| = 48 - 3 + 0 = 45\), and \[\#\text{surjections} = 81 - 45 = 36.\]
General formula. The same argument with \(|X| = n\), \(|Y| = k\) gives the standard surjection count: \[\#\text{surjections} = \sum_{i=0}^{k} (-1)^i \binom{k}{i}(k-i)^n.\]
For \(n=4, k=3\): \(\binom{3}{0} 3^4 - \binom{3}{1} 2^4 + \binom{3}{2} 1^4 - \binom{3}{3} 0^4 = 81 - 48 + 3 - 0 = 36\). Same answer.
Sanity check by structure. A surjection from a 4-set onto a 3-set must group the 4 inputs into 3 nonempty blocks (one per output), then assign blocks to outputs. The only way to split 4 elements into 3 nonempty blocks is \(2+1+1\) — choose which 2 stick together: \(\binom{4}{2} = 6\) ways. Then assign these 3 blocks to \(\{a, b, c\}\): \(3! = 6\) ways. Total: \(6 \cdot 6 = 36\). Matches.