05 Clustering

🎲 Random

TBD

📚 Material

📝 Quiz on this topic (Google Form): TBD

References:

  • scikit-learn example: Color Quantization using K-Means
  • Bishop, Pattern Recognition and Machine Learning (2006), section 9.1.1 (image segmentation / compression with k-means).

🏡 Homework

Project — Image compression with k-means 🧀🧀

An image is a grid of pixels, and every pixel is a point in RGB space (3 numbers). If you run k-means on the pixels and repaint each one with its cluster's centroid color, the result uses only k colors: a much smaller palette, at a cost in quality that stays small as long as k is not too low. This is color quantization, the practical we set up at the end of the lecture.
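As a minimal sketch of that idea, assuming scikit-learn's KMeans and a synthetic random array standing in for a real photo (the helper name quantize and the 32 × 32 test image are illustrative assumptions, not part of the assignment):

```python
import numpy as np
from sklearn.cluster import KMeans


def quantize(image, k, seed=509):
    """Repaint every pixel with the centroid color of its k-means cluster.

    image: float array of shape (height, width, 3) with values in [0, 1].
    Returns the quantized image and the palette of shape (k, 3).
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c)  # one row per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    palette = km.cluster_centers_  # the k centroid colors
    quantized = palette[km.labels_].reshape(h, w, c)
    return quantized, palette


# Synthetic stand-in for a photo (an assumption, so the sketch runs without a file).
rng = np.random.default_rng(509)
image = rng.random((32, 32, 3))

quantized, palette = quantize(image, k=4)
n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
```

With a real image you would load it, divide by 255 to reach [0, 1], and pass it to the same function; n_colors then confirms that the output uses at most k distinct colors.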

Setup. Use any image you like (a photo or a painting; img/saryan_mountains.jpg is provided). Deliver one reproducible notebook with the random seed fixed to 509, ending with a 3–5 sentence conclusion. If the image is large, downscale it first: k-means on millions of pixels is slow.

Tasks.

  1. Load the image, reshape it to (n_pixels, 3), and scale the values to [0, 1].
  2. Quantize with k-means for several k (e.g. 2, 4, 8, 16, 32) and show the results next to the original.
  3. Choose a k: plot inertia (the elbow) and the silhouette against k, and argue for one value.
  4. Show the palette — the k centroid colors.
  5. Measure the compression honestly. A quantized image stores a palette (k × 3 bytes) plus one index per pixel (ceil(log2 k) bits), versus 24 bits/pixel originally — compute that ratio and the reconstruction error (MSE between original and quantized pixels). Explain in one line why comparing re-saved .jpg file sizes is misleading.
  6. Where does it break? Find a k low enough that banding / posterization shows up in smooth gradients (sky, shadows).
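The storage arithmetic from task 5 can be sketched as follows. This is one possible way to do the accounting, not a reference solution: the function name compression_ratio and the 512 × 512 image size are hypothetical, chosen only to make the numbers concrete.

```python
import math


def compression_ratio(n_pixels, k):
    """Raw 24-bit RGB size divided by palette-plus-indices size (both in bits)."""
    original_bits = 24 * n_pixels
    palette_bits = k * 3 * 8                      # k colors, 3 bytes each
    bits_per_index = math.ceil(math.log2(k))      # index width per pixel
    quantized_bits = palette_bits + bits_per_index * n_pixels
    return original_bits / quantized_bits


# Hypothetical example: a 512 x 512 image quantized to 16 colors.
ratio = compression_ratio(n_pixels=512 * 512, k=16)
```

For an image of this size the palette is negligible next to the per-pixel indices, so the ratio is very close to 24 / ceil(log2 k), here just under 6.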

Bonus: MiniBatchKMeans to quantize the full-resolution image fast; try a different color space (e.g. Lab); swap k-means for another algorithm and compare; pick k automatically from the elbow.
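For the first bonus item, a hedged sketch of how MiniBatchKMeans could replace KMeans. The synthetic 200 × 300 array, k = 8, and the batch size are all assumptions made for illustration; a real run would use the full-resolution photo.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic placeholder for a full-resolution image with values in [0, 1].
rng = np.random.default_rng(509)
image = rng.random((200, 300, 3))
pixels = image.reshape(-1, 3)

# Mini-batch k-means updates the centroids from small random batches,
# trading a little accuracy for a much shorter fit on many pixels.
mbk = MiniBatchKMeans(n_clusters=8, batch_size=4096, n_init=3, random_state=509)
mbk.fit(pixels)

labels = mbk.predict(pixels)
quantized = mbk.cluster_centers_[labels].reshape(image.shape)

# Reconstruction error between original and quantized pixels.
mse = float(np.mean((image - quantized) ** 2))
```

Comparing this mse and the fit time against plain KMeans on the same pixels is one way to report the speed versus quality trade-off.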
