A real-time surface defect detection model based on adaptive feature information selection and fusion
Karimi, Hamid Reza
2026-01-01
Abstract
In contemporary computer vision, You Only Look Once (YOLO) has become a benchmark for object detection, widely used in domains ranging from intelligent manufacturing (such as industrial quality control and automated inspection) to real-time video surveillance. For example, detecting surface defects on steel products or electronic components on production lines relies on such algorithms to maintain quality and safety. Despite YOLO's excellent speed and accuracy in many tasks, it still struggles under certain challenging conditions, notably high-dynamic-range scenes, complex backgrounds, and the detection of small or subtle objects. These conditions are common in practice, for instance on shiny metal surfaces with uneven lighting or in busy surveillance scenes, where conventional YOLO models fail to capture fine details reliably. To overcome these limitations, we propose an improved YOLO-based framework featuring a novel Dynamic Cross-Scale Feature Fusion Module (Dy-CCFM) and a Dual-path Downsampling Convolution Module (DDConv). These modules enhance multi-scale feature representation and preserve detail under extreme lighting and background clutter, which is crucial for monitoring in complex environments. Additionally, we employ the Minimum Point Distance Intersection over Union (MPDIoU) as an optimized loss function for bounding box regression, significantly improving the localization of small objects. Thanks to these innovations, the model achieves a mean Average Precision (mAP) of 75.1% on the challenging Northeastern University surface defect (NEU-DET) dataset, while the smallest variant is only 1.6M in size. Compared to YOLOv8, our approach improves mAP by 2.1% while also delivering a higher inference speed (FPS), and it surpasses the Detection Transformer (DETR) by 5.0% mAP. The model further demonstrates excellent generalization on the Google Cloud 10 Defect Detection (GC10-DET) dataset.
This enhanced detection algorithm not only improves performance but also offers significant practical value in intelligent manufacturing and automated inspection systems, intelligent video surveillance, and autonomous vehicles, where reliable real-time detection of small defects or targets is critical.
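The MPDIoU loss used for bounding box regression can be sketched as follows. This is a minimal illustration of the published MPDIoU formulation (standard IoU penalized by the squared distances between corresponding top-left and bottom-right corners, normalized by the squared image dimensions), not the authors' exact implementation; the box format (x1, y1, x2, y2) and the helper name `mpdiou_loss` are assumptions for illustration.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """MPDIoU loss sketch for axis-aligned boxes given as (x1, y1, x2, y2).

    Returns 1 - MPDIoU, where MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2),
    d1/d2 being the distances between the top-left / bottom-right corners
    of the predicted and ground-truth boxes, and (w, h) the image size.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection area (zero if the boxes do not overlap).
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and plain IoU.
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner-point distances, normalized by the squared image diagonal terms.
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2  # top-left corners
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2

    return 1.0 - (iou - d1 / norm - d2 / norm)
```

Note that, unlike plain IoU loss, the corner-distance terms keep the gradient informative even when the predicted and ground-truth boxes do not overlap at all, which is what helps with the localization of small objects.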


