STSI-YOLOv5: Improved YOLOv5 Combining Slicing Training and Slicing Inference for Dense Workpiece Detection in Industrial Scenes
Quan, Hao
2025-01-01
Abstract
Deep learning-based object detection models have achieved significant progress on natural scene tasks. In industrial environments, however, challenges such as densely stacked workpieces, blurred boundaries, and complex backgrounds make it difficult for existing general-purpose detection models to perform effectively on dense workpiece detection. To address this issue, we propose a dense workpiece detection framework for industrial scenarios, called improved YOLOv5 combining slicing training and slicing inference (STSI-YOLOv5). Specifically, we design a selective channel attention module (SCAM), integrated into the neck of YOLOv5, which enhances feature aggregation by dynamically selecting relevant channel information and suppressing redundant channels, thereby improving detection performance against complex backgrounds. In addition, STSI-YOLOv5 introduces the slicing training and slicing inference (STSI) strategy, which divides the input image into uniformly sized patches during both training and inference to strengthen the representation of local features, helping the model determine workpiece boundaries more accurately and improving localization in dense scenes. Correspondingly, for slicing inference in this setting, we propose a novel postprocessing method, area-based non-maximum suppression (A-NMS), which ranks detection boxes by area before suppression, selecting boxes that better match the target characteristics. Finally, the proposed method was evaluated on the WPCD dataset, with results showing its superiority over existing state-of-the-art methods.
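The STSI strategy described above tiles each image into uniformly sized patches for both training and inference. A minimal sketch of such tiling is shown below; the patch size (640) and overlap ratio (0.2) are illustrative assumptions, not values taken from the paper, and the edge-alignment rule (shifting the last patch back so every tile keeps the same size) is likewise an assumption.

```python
def slice_coords(size, patch, step):
    """1-D start coordinates so that every patch has identical size
    and the image edge is still covered (step < patch gives overlap)."""
    starts = list(range(0, max(size - patch, 0) + 1, step))
    if starts[-1] + patch < size:
        # Shift the final patch back to the edge to keep a uniform size.
        starts.append(size - patch)
    return starts

def slice_image(img_w, img_h, patch=640, overlap=0.2):
    """Return (x1, y1, x2, y2) windows tiling the image.
    patch/overlap values are assumptions for illustration only."""
    step = int(patch * (1 - overlap))
    tiles = []
    for y in slice_coords(img_h, patch, step):
        for x in slice_coords(img_w, patch, step):
            tiles.append((x, y, x + patch, y + patch))
    return tiles
```

For a 1000x640 image this yields two overlapping 640x640 tiles along the width; every tile has the full patch size, so the network always sees consistently sized inputs.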

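The A-NMS postprocessing ranks detection boxes by area before suppression, rather than by confidence score as in standard NMS. The sketch below illustrates that idea only; the abstract does not specify the ranking direction or the suppression criterion, so descending area order and a plain IoU threshold are assumptions here, not the paper's exact procedure.

```python
import numpy as np

def area_nms(boxes, iou_thresh=0.5):
    """Area-ranked NMS sketch. boxes: [[x1, y1, x2, y2], ...].

    Assumptions (not specified in the abstract): boxes are ranked by
    descending area, and overlapping boxes are suppressed by IoU.
    """
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = areas.argsort()[::-1]  # largest-area box first (assumption)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top-ranked box with the remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

With two heavily overlapping boxes and one distant box, the smaller overlapping box is suppressed while the other two survive, which matches the abstract's goal of keeping boxes whose extent matches the target.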

