Yes, the original README.md needs a few updates.
The primary issue is the mention of Python 3.13 as a requirement. While Python 3.13 is the latest major stable release, it's still very new and may not be compatible with all common machine learning libraries, including older versions of PyTorch. The program code itself doesn't use any 3.13-specific features, so the more pragmatic and compatible requirement would be a version of Python that's widely supported by the deep learning ecosystem.
Here is a revised README.md with the necessary updates.
A Python program that trains a Convolutional Neural Network (CNN) to classify images of cats and dogs.
This program demonstrates how to build and train a deep learning model for image classification. It uses PyTorch, a popular deep learning framework, to create a neural network that can learn to distinguish between pictures of cats and dogs.
If you're new to machine learning, this readme will guide you through the concepts, requirements, and usage of this program.
A Convolutional Neural Network (CNN) is a type of artificial neural network specifically designed for processing structured grid data like images. Here's how it works in simple terms:
- Convolutional Layers: These scan the image with small filters to detect features like edges, textures, and patterns. Think of them as feature detectors that learn what parts of an image are important.
- Pooling Layers: These reduce the image size while preserving important information. They help make the network more efficient and focus on what matters.
- Fully Connected Layers: After extracting features with convolutional and pooling layers, these layers connect all the extracted features to make the final decision (cat or dog).
- Training Process: The network initially makes random guesses, compares them to the correct answers, and gradually adjusts its internal parameters to make better predictions.
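To make the first two layer types concrete, here is a toy sketch in plain Python rather than PyTorch. It is not the program's model: the 6×6 image, the hand-written vertical-edge filter, and the helper names `convolve2d` and `max_pool2d` are made up for illustration. In a real CNN the filter values are learned during training instead of being written by hand.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding).

    Like deep learning frameworks, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 6x6 grayscale image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

features = convolve2d(image, vertical_edge)  # 4x4 feature map
pooled = max_pool2d(features)                # 2x2 after pooling

for row in features:
    print(row)
print(pooled)
```

The feature map is largest in the columns where the filter straddles the dark-to-bright boundary, and pooling halves each dimension while keeping that strong response.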
- Python 3.8 or higher
- PyTorch (1.7.0 or higher recommended)
- torchvision
- tqdm (for progress bars)
- Pillow (for image processing)
- GPU-enabled machine (Apple or NVIDIA; optional, but recommended for faster training)
- onnxruntime
- onnx
You can install the required packages with:
pip install -r requirements.txt
The program expects your dataset to be organized in a specific way:
data/
├── cat/
│   ├── cat_image1.jpg
│   ├── cat_image2.jpg
│   └── ...
└── dog/
    ├── dog_image1.jpg
    ├── dog_image2.jpg
    └── ...
Each class (cat, dog) should have its own folder containing the relevant images. You can use the Kaggle Cats and Dogs Dataset, which can be downloaded here.
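Before training, it can help to confirm that the folder layout matches what is described above. The following is a small standalone sketch, not part of the program: the helper name `count_images_per_class` and the set of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(data_dir):
    """Return {class_name: image_count} for each subfolder of data_dir."""
    counts = {}
    for class_dir in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


data_dir = Path("./data")
if data_dir.is_dir():
    for name, count in count_images_per_class(data_dir).items():
        print(f"{name}: {count} images")
else:
    print(f"{data_dir} not found")
```

If the output lists anything other than a `cat` and a `dog` folder, or one of the counts is zero, the dataset is probably not organized as the program expects.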
Run the program with default parameters:
python main.py
This will:
- Look for images in the ./data directory
- Resize all images to 256×256 pixels
- Train for 10 epochs (complete passes through the dataset)
- Save the trained model as cat_dog_classifier.pth and cat_dog_classifier.onnx
You can customize the training process with various arguments:
- --data_dir PATH: Path to your dataset directory (default: './data')
- --image_size SIZE: Size to resize images to (default: 256)
- --augmentation: Enable data augmentation for training
- --batch_size SIZE: Number of images to process at once (default: 32)
- --learning_rate RATE: Controls how quickly the model adapts (default: 0.001)
- --num_epochs NUM: Number of complete passes through the dataset (default: 10)
- --momentum VAL: Helps accelerate training in consistent directions (default: 0.9)
- --weight_decay VAL: Helps prevent overfitting (default: 1e-4)
- --model_path PATH: Where to save the PyTorch model (default: 'cat_dog_classifier.pth')
- --onnx_path PATH: Where to save the ONNX model (default: 'cat_dog_classifier.onnx')
- --val_split RATIO: Portion of data used for validation during training (default: 0.2)
- --patience NUM: Epochs to wait before reducing learning rate (default: 2)
- --early_stopping: Enable early stopping to prevent overfitting
- --early_stopping_patience NUM: Number of epochs to wait before stopping if validation loss doesn't improve (default: 3)
- --early_stopping_min_delta VAL: Minimum change in validation loss to qualify as improvement (default: 0.001)
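For readers curious how such options are typically declared, here is one way the flags above could be set up with Python's standard `argparse` module. It only mirrors the names and defaults listed in this README; the actual parser in main.py may be written differently, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Sketch of a parser mirroring the training options documented above."""
    parser = argparse.ArgumentParser(description="Train a cat/dog CNN classifier")
    parser.add_argument("--data_dir", type=str, default="./data")
    parser.add_argument("--image_size", type=int, default=256)
    parser.add_argument("--augmentation", action="store_true")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--num_epochs", type=int, default=10)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--weight_decay", type=float, default=1e-4)
    parser.add_argument("--model_path", type=str, default="cat_dog_classifier.pth")
    parser.add_argument("--onnx_path", type=str, default="cat_dog_classifier.onnx")
    parser.add_argument("--val_split", type=float, default=0.2)
    parser.add_argument("--patience", type=int, default=2)
    parser.add_argument("--early_stopping", action="store_true")
    parser.add_argument("--early_stopping_patience", type=int, default=3)
    parser.add_argument("--early_stopping_min_delta", type=float, default=0.001)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--image_size", "224", "--augmentation"])
print(args.image_size, args.augmentation, args.batch_size)
```

Flags declared with `action="store_true"` take no value: they are False unless present on the command line.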
python main.py --data_dir ./my_images --image_size 224 --batch_size 64 --num_epochs 20 --learning_rate 0.0005 --augmentation
Example with early stopping enabled:
python main.py --early_stopping --early_stopping_patience 5 --num_epochs 50
This will stop training early if the validation loss doesn't improve for 5 consecutive epochs, even if it hasn't reached 50 epochs.
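The early stopping rule described above can be sketched in a few lines of plain Python. This is an illustration of the general technique under the documented meaning of the patience and minimum-delta options, not the program's own code; the class name `EarlyStopping` and the validation losses are made up.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement large enough to count: remember it and reset the counter.
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up validation losses, one per epoch.
val_losses = [0.70, 0.55, 0.50, 0.4995, 0.51, 0.52, 0.49]

stopper = EarlyStopping(patience=3, min_delta=0.001)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(f"Stopped after epoch {stopped_at}, best loss {stopper.best_loss}")
```

Note that the drop from 0.50 to 0.4995 is smaller than the minimum delta, so it does not reset the counter; training stops before the final, lower loss is ever seen. That is the trade-off a small patience value makes.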
After training a model, you can use it to classify individual images without retraining. The program supports inference mode for this purpose.
To classify a single image using a trained model:
python main.py --inference --image_path path/to/your/image.jpg
This will:
- Automatically look for a trained model (cat_dog_classifier.pth or cat_dog_classifier.onnx)
- Load the image and preprocess it
- Run the model to predict whether it's a cat or dog
- Display the prediction and confidence score
- --inference: Enable inference mode (required to run inference)
- --image_path PATH: Path to the image file you want to classify (required for inference)
- --model_file PATH: Specify a particular model file to use (optional, supports .pth or .onnx)
- --image_size SIZE: Image size the model was trained with (default: 256)
Using the default model:
python main.py --inference --image_path ./test_images/cat1.jpg
Using a specific PyTorch model:
python main.py --inference --image_path ./test_images/dog1.jpg --model_file ./models/best_model.pth
Using an ONNX model:
python main.py --inference --image_path ./test_images/cat2.jpg --model_file ./models/exported_model.onnx
Using a model trained with a different image size:
python main.py --inference --image_path ./test_images/dog2.jpg --model_file custom_model.pth --image_size 224
The inference mode will display:
- The path to the image being classified
- The predicted class (cat or dog)
- The confidence score as a percentage
Example output:
Using NVIDIA GPU (CUDA).
Using default PyTorch model: cat_dog_classifier.pth
Inference Results:
Image: ./test_images/fluffy_cat.jpg
Prediction: cat
Confidence: 96.52%
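A confidence percentage like the one above is commonly obtained by passing the model's two raw output scores through a softmax. The sketch below shows that calculation in plain Python. The raw scores are made up, the class order `['cat', 'dog']` is an assumption, and the `softmax` helper is written here only for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score first avoids overflow in exp().
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["cat", "dog"]   # assumed class order
raw_scores = [2.0, -1.0]       # made-up model outputs for one image

probabilities = softmax(raw_scores)
best = max(range(len(probabilities)), key=probabilities.__getitem__)

print(f"Prediction: {class_names[best]}")
print(f"Confidence: {probabilities[best]:.2%}")
```

Because the probabilities always sum to 1, a two-class confidence can never fall below 50%; a value near 50% means the model is effectively undecided.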
During training, you'll see progress bars and information about:
- Loss: How wrong the model's predictions are (lower is better)
- Accuracy: Percentage of correct predictions on the validation set
- Learning Rate: Current learning rate (may decrease during training)
At the end, the program saves:
- A PyTorch model (.pth) file: For use within other PyTorch applications
- An ONNX model (.onnx) file: For cross-platform deployment or use with other frameworks
- Training set: The images the model learns from
- Validation set: Images kept separate to test how well the model generalizes
- Batch size: Number of images processed at once (higher uses more memory but can be faster)
- Learning rate: Controls how quickly the model changes its parameters (too high can overshoot, too low can be slow)
- Epochs: Number of complete passes through the dataset (more epochs = more learning time)
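These terms fit together through simple arithmetic. The sketch below works through a hypothetical 25,000-image dataset with the default settings from this README. The function name `describe_run` is made up, and two details are assumptions rather than documented behavior: the validation count is rounded down, and the last partial batch of each epoch is kept.

```python
import math


def describe_run(num_images, val_split=0.2, batch_size=32, num_epochs=10):
    """Work out how a dataset is divided up for one training run."""
    num_val = int(num_images * val_split)        # assumption: rounded down
    num_train = num_images - num_val
    # Assumption: a final, smaller batch is still processed.
    batches_per_epoch = math.ceil(num_train / batch_size)
    return {
        "train_images": num_train,
        "val_images": num_val,
        "batches_per_epoch": batches_per_epoch,
        "total_updates": batches_per_epoch * num_epochs,
    }


summary = describe_run(25_000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Doubling the batch size halves the number of parameter updates per epoch, which is one reason batch size and learning rate are often tuned together.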
Overfitting occurs when a model performs well on training data but poorly on new data. Several techniques in this program help prevent it:
- Dropout (randomly ignoring some neurons during training)
- Weight decay (penalizing large weights)
- Batch normalization (normalizing layer inputs)
- Early stopping (stopping training when validation performance stops improving)
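To show what "penalizing large weights" means in practice, here is a single parameter update written out in plain Python. It assumes an SGD-style optimizer with momentum, which the `--momentum` and `--weight_decay` options suggest but this README does not confirm. The function name `sgd_step` and the numbers are illustrative only.

```python
def sgd_step(weight, grad, velocity, lr=0.001, momentum=0.9, weight_decay=1e-4):
    """One SGD update for a single weight, with momentum and weight decay."""
    # Weight decay adds a pull toward zero that grows with the weight's size.
    grad = grad + weight_decay * weight
    # Momentum keeps a running direction so consistent gradients accumulate.
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


# With a zero gradient from the data, only weight decay acts on the weight.
# Exaggerated settings (lr=0.1, weight_decay=0.1) make the effect visible.
weight, velocity = sgd_step(weight=1.0, grad=0.0, velocity=0.0,
                            lr=0.1, momentum=0.9, weight_decay=0.1)
print(weight, velocity)
```

Even when the data provides no gradient, the weight shrinks slightly on every step, so weights only stay large if the data keeps pushing them there.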
After training, you have two options to classify new images:
The easiest way is to use the program's built-in inference mode:
python main.py --inference --image_path your_image.jpg
See the "Running Inference on a Single Image" section above for more details.
You can also load the model in your own Python scripts:
import torch
from torchvision import transforms
from PIL import Image
from main import CatDogCNN  # Import the model class

# Load the trained model (map_location lets a GPU-trained model load on a CPU-only machine)
model = CatDogCNN(image_size=256)
model.load_state_dict(torch.load('cat_dog_classifier.pth', map_location='cpu'))
model.eval()  # Set to evaluation mode

# Prepare image transformation (must match the preprocessing used during training)
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load and transform an image (convert to RGB so grayscale or RGBA files also work)
image = Image.open('new_image.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0)  # Add batch dimension

# Make prediction
with torch.no_grad():
    output = model(input_tensor)

# Get class prediction with confidence
probabilities = torch.nn.functional.softmax(output, dim=1)
confidence, predicted = torch.max(probabilities, 1)
class_names = ['cat', 'dog']
print(f'Prediction: {class_names[predicted.item()]}')
print(f'Confidence: {confidence.item():.2%}')
If you're interested in learning more about machine learning and CNNs:
- PyTorch Tutorials: https://pytorch.org/tutorials/
- Convolutional Neural Networks: CS231n
- Deep Learning Book: Goodfellow, Bengio, and Courville. Deep Learning. MIT, 2016.
- Out of Memory Error: Reduce batch size using --batch_size
- Slow Training: Check if you're using GPU; if not, consider setting up CUDA
- Poor Accuracy: Try training for more epochs, adjusting learning rate, or getting more training data
- Model Not Learning: Ensure your dataset is correctly organized and contains sufficient examples
MIT license. See LICENSE.md for details.
This program uses the Kaggle Dogs vs. Cats image set: Will Cukierski. Dogs vs. Cats. https://kaggle.com/competitions/dogs-vs-cats, 2013. Kaggle.