Vision–Language Models in Construction: Evaluating Open-Vocabulary Object Detection under Complex Scenarios

F. Madaschi; M. L. A. Trani
In press

Abstract

Artificial Intelligence (AI) and computer vision (CV) methods are increasingly applied in construction for managerial and operational tasks, including quality control, progress monitoring, and safety management, driven by the demand for automated object detection during the construction and post-construction phases. However, the complexity of construction jobsites poses substantial challenges for CV methods, particularly conventional deep learning techniques, which operate on closed sets of predefined object classes and rely on extensive, task-specific training data. This limits their generalization to dynamic site conditions and new scenes. Open-set object detection methods have emerged as a promising alternative, enabling object recognition without large datasets or task-specific training and thus offering generalization to complex and dynamic construction phases. This study investigates the performance of GroundingDINO, an advanced open-vocabulary vision–language model, in generalizing to the recognition of construction-related objects. A dataset of 240 images across 48 object classes was developed, encompassing both indoor and outdoor environments. Five scenarios per object class were considered to replicate real-world, complex conditions: normal appearance, small scale within context, cluttered environment, and high and low brightness. The results highlight both the potential and the current limitations of open-set object detection in the construction domain across diversified scenarios. This study advances automated vision-based construction management by providing a comprehensive assessment of open-set object detection in construction-specific environments, while highlighting the need for further research on prompt formulation to improve the robustness of open-vocabulary models in real-world managerial and operational workflows.
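The abstract's closing point about prompt formulation can be made concrete with a minimal sketch. GroundingDINO takes a free-text prompt rather than a fixed label set; a common convention is to supply the target classes as lowercase phrases separated by periods. The helper below is purely illustrative (the function name and the example class names are assumptions, not the paper's actual 48-class list):

```python
# Illustrative sketch of open-vocabulary prompt formulation for a
# GroundingDINO-style detector. The helper and example classes are
# hypothetical; they are not taken from the study's dataset.

def build_prompt(class_names):
    """Join class names into one lowercase, period-separated text prompt."""
    phrases = [name.strip().lower() for name in class_names if name.strip()]
    if not phrases:
        return ""
    return ". ".join(phrases) + "."

# Example: a few construction-related classes
prompt = build_prompt(["Excavator", "Safety Helmet", "Scaffolding"])
print(prompt)  # excavator. safety helmet. scaffolding.
```

The resulting string would be passed, together with an image, to the model's processor; varying how the phrases are worded (synonyms, qualifiers, compound names) is precisely the kind of prompt-formulation question the study flags for further research.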
University of Toronto Press

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309216