GAM-YOLO: An Enhanced Small Object Detection Model Integrating Global Attention Mechanism
Abstract
Key improvements:
- GAM integration: embedded at critical Backbone and Neck nodes to suppress information diffusion and enhance salient feature capture (a module of this kind is sketched below).
- P2 detection head: added on high-resolution features, forming a four-head architecture for small objects.
Experiments on PASCAL VOC show that GAM-YOLO achieves 89.7% mAP@.5, a 3.2% improvement over YOLOv5s, providing a robust solution for challenging small object detection tasks such as drone imagery.
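To make the first improvement concrete, the following is a minimal PyTorch sketch of a Global Attention Mechanism block in the style of Liu et al. (arXiv:2112.05561): channel attention via an MLP applied across the channel dimension, followed by spatial attention via 7x7 convolutions. The reduction ratio and layer sizes here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class GAM(nn.Module):
    """Sketch of a GAM block: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        # Channel attention: an MLP that mixes information across channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial attention: two 7x7 convolutions with channel reduction.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: move channels last so the MLP acts on the channel
        # dimension at every spatial position, then gate the input.
        attn = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(attn)
        # Spatial attention: gate with a convolutional sigmoid map.
        return x * torch.sigmoid(self.spatial(x))


if __name__ == "__main__":
    # A block like this would be inserted after selected Backbone/Neck stages;
    # here it simply processes a dummy 128-channel feature map.
    feat = torch.randn(1, 128, 40, 40)
    print(GAM(128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```

In a YOLOv5-style model, such a block would typically be placed after the feature-fusion stages feeding each detection head, so the attention weights are computed on the same maps the heads consume.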
DOI: http://dx.doi.org/10.70711/aitr.v2i11.7407