GAM-YOLO: An Enhanced Small Object Detection Model Integrating Global Attention Mechanism
Abstract
Key improvements:
- GAM integration: embedded at critical Backbone and Neck nodes to suppress information diffusion and enhance salient feature capture (a module of this kind is sketched below).
- P2 detection head: added on high-resolution features, forming a four-head architecture for small objects.
Experiments on PASCAL VOC show that GAM-YOLO achieves 89.7% mAP@.5, a 3.2% improvement over YOLOv5s, providing a robust solution for challenging small object detection tasks such as drone imagery.
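To make the first improvement concrete, the following is a minimal PyTorch sketch of a Global Attention Mechanism block in the style of Liu et al. (arXiv:2112.05561): channel attention via an MLP applied across the channel dimension, followed by spatial attention via 7x7 convolutions. The reduction ratio and layer sizes here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class GAM(nn.Module):
    """Sketch of a GAM block: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        # Channel attention: an MLP that mixes information across channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial attention: two 7x7 convolutions with channel reduction.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: move channels last so the MLP acts on the channel
        # dimension at every spatial position, then gate the input.
        attn = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(attn)
        # Spatial attention: gate with a convolutional sigmoid map.
        return x * torch.sigmoid(self.spatial(x))


if __name__ == "__main__":
    # A block like this would be inserted after selected Backbone/Neck stages;
    # here it simply processes a dummy 128-channel feature map.
    feat = torch.randn(1, 128, 40, 40)
    print(GAM(128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```

In a YOLOv5-style model, such a block would typically be placed after the feature-fusion stages feeding each detection head, so the attention weights are computed on the same maps the heads consume.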
DOI: http://dx.doi.org/10.70711/aitr.v2i11.7407