05 Clustering
🎲 Random
TBD
📚 Materials
📝 Quiz on the topic (Google Form): TBD
References:
- scikit-learn example: Color Quantization using K-Means
- Bishop, Pattern Recognition and Machine Learning (2006), section 9.1.1 (image segmentation / compression with k-means).
🏡 Homework
Project — Image compression with k-means 🧀🧀
An image is just a grid of pixels, and every pixel is a point in RGB space (3 numbers). If you run k-means on the pixels and repaint each one with its cluster's centroid color, the result uses at most k distinct colors: a much smaller palette, usually at a modest cost in visual quality. This is color quantization, the practical we set up at the end of the lecture.
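To make the idea concrete, here is a minimal sketch of the repainting step, assuming scikit-learn and NumPy are available. The helper name `quantize` and the synthetic gradient image are illustrative assumptions, not part of the assignment; a real photo would take the place of `image`.

```python
import numpy as np
from sklearn.cluster import KMeans

SEED = 509  # the seed the assignment asks for


def quantize(image, k, seed=SEED):
    """Repaint every pixel with the centroid color of its k-means cluster.

    `image` is a float array of shape (height, width, 3) with values in [0, 1].
    Returns the quantized image (same shape) and the fitted KMeans model.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c)        # (n_pixels, 3): one RGB point per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    palette = km.cluster_centers_        # (k, 3): the k palette colors
    quantized = palette[km.labels_]      # look up each pixel's centroid color
    return quantized.reshape(h, w, c), km


# Synthetic stand-in for a photo: a smooth horizontal red-to-blue gradient.
h, w = 32, 48
t = np.linspace(0.0, 1.0, w)
image = np.stack(
    [np.tile(t, (h, 1)), np.full((h, w), 0.5), np.tile(1.0 - t, (h, 1))],
    axis=-1,
)

quantized, km = quantize(image, k=4)
n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
print(n_colors)  # number of distinct colors left after quantization
```

The indexing `palette[km.labels_]` is the whole trick: the labels are the per-pixel palette indices, and the centroids are the palette.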
Setup. Use any image you like (a photo, a painting; img/saryan_mountains.jpg is provided). Deliver one reproducible notebook with the random seed fixed to 509, ending with a 3–5 sentence conclusion. If the image is large, resize it down first: k-means on millions of pixels is slow.
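One possible way to do the loading and downsizing is sketched below, assuming Pillow and NumPy are installed. `MAX_SIDE` is a hypothetical cap chosen for illustration, and the random in-memory image only stands in for a real file so that the snippet runs anywhere.

```python
import numpy as np
from PIL import Image

MAX_SIDE = 256  # hypothetical cap on the longer side; pick what your machine handles

# Stand-in for: img = Image.open("img/saryan_mountains.jpg").convert("RGB")
rng = np.random.default_rng(509)
img = Image.fromarray(rng.integers(0, 256, size=(400, 600, 3), dtype=np.uint8))

img.thumbnail((MAX_SIDE, MAX_SIDE))              # shrinks in place, keeps the aspect ratio
rgb = np.asarray(img, dtype=np.float64) / 255.0  # (height, width, 3), values in [0, 1]
pixels = rgb.reshape(-1, 3)                      # (n_pixels, 3), ready for k-means

print(img.size, pixels.shape)
```

`Image.thumbnail` never enlarges an image, so small inputs pass through unchanged.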
Tasks.
- Load the image, reshape it to (n_pixels, 3), and scale the values to [0, 1].
- Quantize with k-means for several k (e.g. 2, 4, 8, 16, 32) and show the results next to the original.
- Choose a k: plot inertia (the elbow) and the silhouette against k, and argue for one value.
- Show the palette: the k centroid colors.
- Measure the compression honestly. A quantized image stores a palette (k × 3 bytes) plus one index per pixel (ceil(log2 k) bits), versus 24 bits/pixel originally. Compute that ratio and the reconstruction error (MSE between original and quantized pixels). Explain in one line why comparing re-saved .jpg file sizes is misleading.
- Where does it break? Find a k low enough that banding / posterization shows up in smooth gradients (sky, shadows).
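The storage arithmetic in the compression task can be sketched as follows. The function names `compression_ratio` and `mse` are illustrative assumptions, and the sketch assumes k ≥ 2 and 8-bit RGB channels for both the original image and the palette.

```python
import math

import numpy as np


def compression_ratio(n_pixels, k):
    """Original bits divided by quantized bits (palette plus per-pixel indices)."""
    bits_per_index = math.ceil(math.log2(k))  # bits needed to address k palette entries
    palette_bits = k * 3 * 8                  # k colors, 3 bytes each
    quantized_bits = palette_bits + n_pixels * bits_per_index
    original_bits = n_pixels * 24             # 3 channels x 8 bits per pixel
    return original_bits / quantized_bits


def mse(original, quantized):
    """Mean squared error over all pixels and channels (same scale as the inputs)."""
    return float(np.mean((original - quantized) ** 2))


ratio = compression_ratio(n_pixels=256 * 256, k=16)
print(round(ratio, 2))
```

For large images the palette cost becomes negligible and the ratio approaches 24 / ceil(log2 k). Note that the MSE depends on the scale of the inputs: values in [0, 1] and values in 0–255 give numbers that differ by a factor of 255².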
Bonus: MiniBatchKMeans to quantize the full-resolution image fast; try a different color space (e.g. Lab); swap k-means for another algorithm and compare; pick k automatically from the elbow.
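For the first bonus item, a minimal sketch with scikit-learn's MiniBatchKMeans might look like this. The random pixel array only stands in for a full-resolution image already reshaped to (n_pixels, 3), and the `batch_size` and `n_init` values are arbitrary illustrative choices, not tuned recommendations.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

SEED = 509
K = 16

# Stand-in for a full-resolution image reshaped to (n_pixels, 3) with values in [0, 1].
rng = np.random.default_rng(SEED)
pixels = rng.random((100_000, 3))

# Mini-batch k-means updates the centroids from small random batches
# instead of scanning every pixel on every iteration.
mbk = MiniBatchKMeans(n_clusters=K, batch_size=4096, n_init=3, random_state=SEED)
labels = mbk.fit_predict(pixels)              # palette index for every pixel
quantized_pixels = mbk.cluster_centers_[labels]

print(quantized_pixels.shape)
```

The result is an approximation of full k-means, so comparing its inertia or MSE against the regular KMeans run on the resized image is a reasonable sanity check.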