- TensorFlow implementation of YOLOv3 for object detection.
- Both inference and training pipelines are implemented.
- For inference with a pre-trained model, the model stored in a `.weights` file is first downloaded from the official YOLO website (Section 'Performance on the COCO Dataset', YOLOv3-416 link), then converted to a `.npy` file, and finally loaded by the TensorFlow model for prediction.
- For training, the pre-trained DarkNet-53 is used as the feature extractor, and the YOLO prediction layers at three scales are trained from scratch. Data augmentation (random flipping, cropping, resizing, affine transformation, and color changes in hue, saturation, and brightness) is applied. Anchor clustering and multi-scale training (training images are rescaled every 10 epochs) are implemented as well.
- Convert pre-trained `.weights` model to `.npy` file (detail).
- Pre-trained DarkNet-53 for image classification (detail).
- Object detection using pre-trained YOLOv3 trained on COCO dataset (detail).
- YOLOv3 training pipeline
- Train on VOC dataset (detail).
- Performance evaluation.
- Train on custom dataset.
- Python 3.0
- TensorFlow 1.12.0+
- Numpy
- Scipy
- imageio
- Matplotlib
- Download the pre-trained model `yolov3.npy` from here. This model is converted from the `.weights` file from here (Section 'Performance on the COCO Dataset', YOLOv3-416 link).
- More details for converting models can be found here.
- Modify the config file `configs/config_path.cfg` with the following content:

      [path]
      coco_pretrained_npy = DIRECTORY/TO/MODEL/yolov3.npy
      save_path = DIRECTORY/TO/SAVE/RESULT/
      test_image_path = DIRECTORY/OF/TEST/IMAGE/
      test_image_name = .jpg

  - Put the converted pre-trained model `yolov3.npy` in `coco_pretrained_npy`.
  - Put testing images in `test_image_path`.
  - `test_image_name` specifies part of the testing image file names (here, the `.jpg` extension).
  - Result images will be saved in `save_path`.
- Use `obj_score_thresh` and `nms_iou_thresh` in the config file `configs/coco80.cfg` to set the parameters of non-maximum suppression, which removes duplicate bounding boxes for a single detected object.
  - `obj_score_thresh` is the score threshold for deciding whether a bounding box detects an object class. Default is `0.8`.
  - `nms_iou_thresh` is the IoU threshold for deciding whether two bounding boxes overlap too much. Default is `0.45`.
- Put testing images in `test_image_path` in `pretrain_coco_path.cfg`, go to `experiment\`, and run `python yolov3.py --detect`.
- Testing images are rescaled to 416 × 416 before being fed into the network.
- Result images are saved in the `save_path` set in `configs/pretrain_coco_path.cfg`.
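The role of the two thresholds can be illustrated with a minimal NumPy sketch of greedy non-maximum suppression for a single class. This is not the repository's implementation: the function names `iou_one_to_many` and `nms` and the `[x_min, y_min, x_max, y_max]` box layout are assumptions made for illustration. Only the default values `0.8` and `0.45` come from the description above.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x_min, y_min, x_max, y_max]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Non-overlapping boxes give negative extents, which are clipped to zero.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, obj_score_thresh=0.8, nms_iou_thresh=0.45):
    """Return indices (into the input arrays) of the boxes kept for one class."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    # Step 1: drop boxes whose score is below the score threshold.
    candidates = np.flatnonzero(scores >= obj_score_thresh)
    # Step 2: visit the remaining boxes from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Step 3: suppress boxes that overlap the kept box too much.
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= nms_iou_thresh]
    return keep


# Hypothetical detections: boxes 0 and 1 cover the same object,
# box 2 is a separate object, box 3 has a low score.
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300], [0, 0, 50, 50]]
scores = [0.95, 0.90, 0.85, 0.30]
kept = nms(boxes, scores)
print(kept)
```

In this example, box 3 is removed by the score threshold and box 1 is suppressed because its IoU with the higher-scoring box 0 exceeds `0.45`. Raising `nms_iou_thresh` keeps more overlapping boxes, and raising `obj_score_thresh` keeps fewer low-confidence ones.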
Train on VOC2012 dataset (20 classes)
- Download VOC2012 training/validation data from here (2GB tar file).
- Download the pre-trained Darknet-53 `yolov3_feat.npy` from here. This model is converted from the `.weights` file from here (Section 'Pre-Trained Models', Darknet53 448x448 link).
- More details for converting models can be found here.
- Modify the config file `configs/config_path.cfg` with the following content:

      [path]
      yolo_feat_pretraind_npy = DIRECTORY/TO/MODEL/yolov3_feat.npy
      train_data_path = DIRECTORY/OF/TRAINING/SET/
      save_path = DIRECTORY/TO/SAVE/RESULT/

  - Put the converted pre-trained model `yolov3_feat.npy` in `yolo_feat_pretraind_npy`.
  - `train_data_path` is the parent directory of `JPEGImages` and `Annotations` for the training/validation set.
  - The TensorBoard summary and the trained model will be saved in `save_path`.
- Use the config file `configs/voc.cfg` to set the hyper-parameters for training on VOC2012. The default values are the current settings.
  - `anchor` lists the 9 anchors (width and height) obtained from anchor clustering, in ascending order.
  - `obj_weight` and `nobj_weight` are the weights of the object loss and the non-object loss.
  - `multiscale` is the set of scales used for training.
- Go to `experiment\` and run `python yolov3.py --train`.
- The entire dataset is randomly divided into 14556 training images (85%) and 2568 validation images (15%).
- Data augmentation (flipping, cropping, resizing, affine transformation, and color change) is applied to the training set. The training images are rescaled every 10 epochs to a scale randomly picked from `multiscale` in `configs/voc.cfg`.
- Validation images are all rescaled to 416 × 416, without augmentation.
- The learning rate schedule needs further tuning. The current setting is 0.1 (epochs 1-50), 0.01 (epochs 51-100), and 0.001 (epochs 101-150).
- The TensorBoard summary, which includes losses and sample predictions for both the training set (every 100 steps) and the validation set (every epoch), is saved in the `save_path` set in `configs/config_path.cfg`. Note that non-maximum suppression is not used in the sample predictions; only the top 20 predicted bounding boxes by class score are shown. This lets you monitor how the model is doing during training.
- Prediction after 150 epochs. Performance evaluation will be added soon.
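The step-wise learning rate and the every-10-epochs rescaling described in the training notes above can be sketched in plain Python as follows. This is an illustration rather than the repository's TensorFlow code: the names `learning_rate`, `scales_per_epoch`, and `EXAMPLE_SCALES` are hypothetical, the scale values are placeholders and not the contents of `multiscale` in `configs/voc.cfg`, and keeping the final rate past epoch 150 is an assumption.

```python
import random

# (last epoch of the phase, learning rate), taken from the schedule stated above.
LR_SCHEDULE = [(50, 0.1), (100, 0.01), (150, 0.001)]

# Placeholder scales for illustration only; the real values are set by
# `multiscale` in configs/voc.cfg.
EXAMPLE_SCALES = [320, 416, 512]


def learning_rate(epoch):
    """Learning rate for a 1-based epoch number."""
    for last_epoch, lr in LR_SCHEDULE:
        if epoch <= last_epoch:
            return lr
    # Assumption: keep the final rate if training runs past epoch 150.
    return LR_SCHEDULE[-1][1]


def scales_per_epoch(n_epochs, scales, rescale_every=10, seed=None):
    """Pick a random scale at the start of every `rescale_every`-epoch block."""
    rng = random.Random(seed)
    picked = []
    current = None
    for epoch in range(1, n_epochs + 1):
        if (epoch - 1) % rescale_every == 0:
            current = rng.choice(scales)
        picked.append(current)
    return picked


schedule = scales_per_epoch(150, EXAMPLE_SCALES, seed=0)
for epoch in (1, 50, 51, 100, 101, 150):
    print(epoch, learning_rate(epoch), schedule[epoch - 1])
```

Each block of 10 consecutive epochs shares one training scale, while the learning rate drops by a factor of 10 at epochs 51 and 101.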
- https://github.com/pjreddie/darknet
- https://github.com/experiencor/keras-yolo3
- https://github.com/qqwweee/keras-yolo3
Qian Ge