Evaluating Chest Abnormalities Detection: YOLOv7 and Detection Transformer with CycleGAN Data Augmentation

  • Yoshua Kaleb Purwanto (Department of Computer Engineering, Dongseo University) ;
  • Suk-Ho Lee (Department of Computer Engineering, Dongseo University) ;
  • Dae-Ki Kang (Department of Computer Engineering, Dongseo University)
  • Received : 2024.05.06
  • Accepted : 2024.05.21
  • Published : 2024.06.30

Abstract

In this paper, we investigate the comparative performance of two leading object detection architectures, YOLOv7 and Detection Transformer (DETR), across varying levels of data augmentation using CycleGAN. Our experiments focus on chest scan images within the context of biomedical informatics, specifically targeting the detection of abnormalities. The study reveals that YOLOv7 consistently outperforms DETR across all levels of augmented data, maintaining better performance even with 75% augmented data. Additionally, YOLOv7 demonstrates significantly faster convergence, requiring approximately 30 epochs compared to DETR's 300 epochs. These findings underscore the superiority of YOLOv7 for object detection tasks, especially in scenarios with limited data and when rapid convergence is essential. Our results provide valuable insights for researchers and practitioners in the field of computer vision, highlighting the effectiveness of YOLOv7 and the importance of data augmentation in improving model performance and efficiency.
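The experiments described above train each detector on datasets containing varying proportions of CycleGAN-generated images (e.g., 75% augmented data). As a minimal sketch of how such a mixed training set might be assembled, the hypothetical helper below keeps all real chest X-rays and samples just enough synthetic images so that a target fraction of the final set is CycleGAN output; the function name, arguments, and sampling strategy are illustrative assumptions, not the authors' actual pipeline.

```python
import math
import random

def mix_training_set(real_images, synthetic_images, aug_fraction, seed=0):
    """Combine real images with CycleGAN-generated ones so that roughly
    `aug_fraction` of the returned training set is synthetic.

    All real images are kept; synthetic images are sampled so that
    n_syn / (n_real + n_syn) ~= aug_fraction.
    """
    if not 0.0 <= aug_fraction < 1.0:
        raise ValueError("aug_fraction must be in [0, 1)")
    n_real = len(real_images)
    # Solve s / (n_real + s) = aug_fraction for the synthetic count s.
    n_syn = 0
    if aug_fraction > 0.0:
        n_syn = math.ceil(aug_fraction * n_real / (1.0 - aug_fraction))
    n_syn = min(n_syn, len(synthetic_images))  # cannot exceed the pool
    rng = random.Random(seed)  # fixed seed for reproducible splits
    mixed = list(real_images) + rng.sample(synthetic_images, n_syn)
    rng.shuffle(mixed)
    return mixed
```

For example, with 100 real scans and `aug_fraction=0.75`, the helper draws 300 synthetic images, yielding a 400-image set in which three quarters are CycleGAN output.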

Acknowledgement

This work was supported by Dongseo University, "Dongseo Cluster Project" Research Fund of 2023 (DSU-20230004).

References

  1. C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," Jul. 2022. DOI: https://doi.org/10.48550/arXiv.2207.02696
  2. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. DOI: https://doi.org/10.48550/arXiv.1506.02640
  3. J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," Apr. 2018. DOI: https://doi.org/10.48550/arXiv.1804.02767
  4. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," Apr. 2020. DOI: https://doi.org/10.48550/arXiv.2004.10934
  5. N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in European Conference on Computer Vision (ECCV), Springer, 2020, pp. 213-229. DOI: https://doi.org/10.48550/arXiv.2005.12872
  6. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017. DOI: https://doi.org/10.48550/arXiv.1706.03762
  7. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," Oct. 2020. DOI: https://doi.org/10.48550/arXiv.2010.11929
  8. Z. Chen, Y. Duan, W. Wang, J. He, T. Lu, J. Dai, and Y. Qiao, "Vision Transformer Adapter for Dense Predictions," in Proceedings of the 11th International Conference on Learning Representations (ICLR), Feb. 2023. DOI: https://doi.org/10.48550/arXiv.2205.08534
  9. H. Q. Nguyen et al., "VinDr-CXR: An Open Dataset of Chest X-rays with Radiologist's Annotations," Jan. 2022. DOI: https://doi.org/10.48550/arXiv.2012.15029
  10. J. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. DOI: https://doi.org/10.48550/arXiv.1703.10593
  11. R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in European Conference on Computer Vision (ECCV), 2016. DOI: https://doi.org/10.48550/arXiv.1603.08511
  12. L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  13. M. Tan and Q. V. Le, "EfficientDet: Scalable and efficient object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. DOI: https://doi.org/10.48550/arXiv.1911.09070
  14. J. Hosang, R. Benenson, P. Dollar, and B. Schiele, "Learning non-maximum suppression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. DOI: https://doi.org/10.48550/arXiv.1705.02950
  15. D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), Sep. 2014. DOI: https://doi.org/10.48550/arXiv.1409.0473