REMOTE SENSING OBJECT DETECTION BASED ON CONVOLUTION AND SWIN TRANSFORMER
Keywords:
Remote sensing images, object detection, attention mechanism, Swin Transformer, multiscale features.
Abstract
This study addresses challenges in remote sensing object detection by proposing RAST-YOLO, an algorithm that
integrates Region Attention (RA) with a Swin Transformer backbone. The method effectively handles difficulties such as
widely varying target scales, complex backgrounds, and densely packed small objects. Incorporating the C3D module
improves the handling of multi-scale small objects and thereby enhances detection accuracy. Evaluations on
the DIOR and TGRS-HRRSD datasets show that RAST-YOLO achieves state-of-the-art performance and surpasses the baseline
networks, with a substantial improvement in mean average precision (mAP) on both datasets. In addition, its lightweight
structure enables real-time detection, making RAST-YOLO a practical choice for efficient and robust remote sensing
object detection. The study extends the analysis to other prominent models, including YOLOv5s, YOLOv3, Faster R-CNN,
RetinaNet, YOLOv5x6, and
YOLOv8. Among these, YOLOv5x6 stands out with an mAP of 0.80 or higher, suggesting its potential for further
improving detection performance in remote sensing applications.
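Because the results above are reported as mean average precision (mAP), the following Python sketch shows one common way this metric is computed for object detection: a per-class average precision (AP), taken as the area under an all-point interpolated precision-recall curve, is averaged over all classes. It is a minimal illustration under assumptions the abstract does not state: a PASCAL VOC-style protocol with an IoU threshold of 0.5, greedy matching of detections by confidence, and hypothetical helper names (`iou`, `average_precision`, `mean_average_precision`). It is not the evaluation code used for RAST-YOLO.

```python
from typing import Dict, List, Tuple

# Assumption: VOC-style evaluation at IoU >= 0.5; not necessarily the paper's protocol.
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]       # (image_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection],
                      gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for a single class (hypothetical helper)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp_flags = []
    # Greedy matching: visit detections in order of decreasing confidence.
    for img, _, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True   # true positive
            tp_flags.append(1)
        else:
            tp_flags.append(0)         # false positive (no match or duplicate)
    # Cumulative precision and recall after each ranked detection.
    recall, precision, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        recall.append(tp / n_gt)
        precision.append(tp / k)
    # Make precision monotonically non-increasing, then integrate over recall.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(dets_by_class: Dict[str, List[Detection]],
                           gts_by_class: Dict[str, Dict[str, List[Box]]],
                           iou_thr: float = 0.5) -> float:
    """mAP: the unweighted mean of per-class AP values."""
    aps = [average_precision(dets_by_class.get(c, []), gts, iou_thr)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Usage with made-up boxes: "ship" is detected perfectly (AP = 1.0);
# "plane" has one hit, one miss and one false positive (AP = 0.5).
gts = {"ship": {"img1": [(0, 0, 10, 10)]},
       "plane": {"img1": [(20, 20, 30, 30), (40, 40, 50, 50)]}}
dets = {"ship": [("img1", 0.9, (0, 0, 10, 10))],
        "plane": [("img1", 0.9, (20, 20, 30, 30)),
                  ("img1", 0.8, (60, 60, 70, 70))]}
print(mean_average_precision(dets, gts))  # 0.75
```

Reported mAP values may be given either as a fraction between 0 and 1 or as a percentage; this sketch returns a fraction. Other protocols (for example, averaging AP over several IoU thresholds) produce different numbers for the same detections.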