MrTimmyJ


MNIST Handwritten Digit Classifier

April 2024

A Python tool that decodes, visualizes, and processes raw MNIST dataset files to identify handwritten numbers.


Project Overview

        MNIST Image Decoder is a Python script that loads and visualizes the MNIST handwritten digits dataset directly from the original binary files. It reads the binary image and label data, reconstructs the grayscale images with NumPy and Pillow (PIL), and provides a simple interface to decode, reshape, and display them.

        The project demonstrates low-level parsing of the IDX file format and is intended for educational use: it shows how a deep learning dataset is preprocessed from its raw format before it reaches a model.


    🧱 Binary Decoding

  • Parses the 16-byte header of the IDX3 file (magic number, image count, rows, columns) to extract the image count, height, and width
  • Reads each image as a flattened array of 8-bit grayscale pixel values
  • Reshapes each array to 28×28 and visualizes it with PIL.Image
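
The decoding steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: `read_idx3_header` and `read_image` are hypothetical helper names, and the sketch assumes the standard IDX3 layout of four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. A small synthetic file stands in for the real dataset so the example runs on its own.

```python
import io
import struct

import numpy as np

IDX3_MAGIC = 2051  # 0x00000803: unsigned-byte data, three dimensions


def read_idx3_header(stream):
    """Read the 16-byte IDX3 header and return (count, rows, cols)."""
    magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
    if magic != IDX3_MAGIC:
        raise ValueError(f"unexpected IDX3 magic number: {magic}")
    return count, rows, cols


def read_image(stream, rows, cols):
    """Read the next image and return it as a (rows, cols) uint8 array."""
    flat = np.frombuffer(stream.read(rows * cols), dtype=np.uint8)
    return flat.reshape(rows, cols)


# Synthetic file: a header plus one all-black and one all-white 28x28 image.
fake_file = (
    struct.pack(">IIII", IDX3_MAGIC, 2, 28, 28)
    + bytes([0]) * 784
    + bytes([255]) * 784
)

stream = io.BytesIO(fake_file)
count, rows, cols = read_idx3_header(stream)
first = read_image(stream, rows, cols)
second = read_image(stream, rows, cols)
```

With Pillow installed, `Image.fromarray(first)` turns such an array into an 8-bit grayscale image that can be shown or saved.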

    🔢 Label Extraction

  • Reads the labels from the IDX1 label file, one byte per image
  • Converts each label to a one-hot encoded vector
  • Pairs image and label data to build training sets
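
A possible shape for the label handling is sketched below. The function names `read_labels` and `one_hot` are hypothetical, and the sketch assumes the standard IDX1 layout: an 8-byte header (magic number 2049 and label count, both big-endian 32-bit integers) followed by one byte per label.

```python
import io
import struct

import numpy as np

IDX1_MAGIC = 2049  # 0x00000801: unsigned-byte data, one dimension


def read_labels(stream):
    """Read an IDX1 stream and return the labels as a 1-D uint8 array."""
    magic, count = struct.unpack(">II", stream.read(8))
    if magic != IDX1_MAGIC:
        raise ValueError(f"unexpected IDX1 magic number: {magic}")
    labels = np.frombuffer(stream.read(count), dtype=np.uint8)
    if labels.size != count:
        raise ValueError("label file is shorter than its header claims")
    return labels


def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    encoded = np.zeros((labels.size, num_classes), dtype=np.uint8)
    encoded[np.arange(labels.size), labels] = 1
    return encoded


# Synthetic label file holding the digits 5, 0 and 9.
fake_file = struct.pack(">II", IDX1_MAGIC, 3) + bytes([5, 0, 9])

labels = read_labels(io.BytesIO(fake_file))
vectors = one_hot(labels)
```

Because images and labels are stored in the same order, pairing them can be as simple as `zip(images, vectors)`.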

    📦 Full Dataset Loading

  • load_all_training_images() reads all images into a NumPy array of shape (60000, 784, 1)
  • Data type: np.ubyte (unsigned 8-bit grayscale values)
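
One way a loader with this output shape could look is sketched below. Only the function name and the resulting shape and dtype come from the description above; the `path` parameter and the body are assumptions made for illustration. The demo writes a three-image synthetic file, so the resulting shape is (3, 784, 1) rather than (60000, 784, 1).

```python
import struct
import tempfile
from pathlib import Path

import numpy as np


def load_all_training_images(path):
    """Return every image in an IDX3 file as an (N, rows * cols, 1) array."""
    raw = Path(path).read_bytes()
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected IDX3 magic number: {magic}")
    # Everything after the 16-byte header is pixel data, one byte per pixel.
    pixels = np.frombuffer(raw, dtype=np.ubyte, offset=16)
    if pixels.size != count * rows * cols:
        raise ValueError("image file size does not match its header")
    return pixels.reshape(count, rows * cols, 1)


# Demo on a tiny synthetic file instead of the real training set.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample-images-idx3-ubyte"
    sample.write_bytes(struct.pack(">IIII", 2051, 3, 28, 28) + bytes(3 * 784))
    images = load_all_training_images(sample)
```

`np.frombuffer` returns a read-only view of the file bytes, so call `.copy()` on the result if the pixel values need to be modified in place.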

Project Features

  • 🧾 Parses IDX3/IDX1 binary formats from the MNIST dataset
  • 🖼️ Converts image data into grayscale PNGs using PIL
  • 🔢 One-hot encodes label data for training models
  • 📚 Loads the full dataset into structured NumPy arrays
  • 💡 Designed for educational purposes in ML, DL, and data engineering

Project User Workflow

  • Download MNIST dataset files (train-images-idx3-ubyte, train-labels-idx1-ubyte)
  • Run the Python script (python mnist_decoder.py)
  • Reconstructed grayscale images are saved as PNG files
  • Explore how the binary data maps to image matrices and label vectors
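
The PNG export step of this workflow might look roughly like the sketch below. `export_pngs`, its parameters, and the `digit_00000.png` naming scheme are hypothetical and are not taken from mnist_decoder.py; the sketch assumes an uncompressed IDX3 file and uses a synthetic two-image file so it runs without the dataset.

```python
import struct
import tempfile
from pathlib import Path

import numpy as np
from PIL import Image


def export_pngs(idx3_path, out_dir, limit=10):
    """Save the first `limit` images of an IDX3 file as grayscale PNGs."""
    raw = Path(idx3_path).read_bytes()
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected IDX3 magic number: {magic}")
    images = np.frombuffer(raw, dtype=np.uint8, offset=16)
    images = images.reshape(count, rows, cols)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, pixels in enumerate(images[:limit]):
        target = out_dir / f"digit_{index:05d}.png"
        # A 2-D uint8 array becomes an 8-bit grayscale ("L" mode) image.
        Image.fromarray(pixels).save(target)
        written.append(target)
    return written


# Demo: two synthetic 28x28 gradient images written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample-images-idx3-ubyte"
    payload = bytes(i % 256 for i in range(2 * 784))
    source.write_bytes(struct.pack(">IIII", 2051, 2, 28, 28) + payload)

    paths = export_pngs(source, Path(tmp) / "out")
    names = [p.name for p in paths]
    with Image.open(paths[0]) as img:
        size, mode = img.size, img.mode
```

Against the real data the call would be along the lines of `export_pngs("train-images-idx3-ubyte", "out")`, and opening a few of the resulting files next to their labels is a quick way to check that the parsing is correct.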

Technologies Used

  • 🐍 Python: Core programming language
  • 🖼️ Pillow: Image processing and PNG saving
  • 🧮 NumPy: Efficient numerical array manipulation
  • 🗃️ File I/O: Low-level binary file reading of the IDX format