Risk-driven Online Testing and Test Case Diversity Analysis for ML-enabled Critical Systems
Camilli, Matteo;
2023-01-01
Abstract
Machine Learning (ML)-enabled systems that run in safety-critical settings expose humans to risks. Hence, it is important to build such systems with strong assurances for domain-specific safety requirements. Simulation as well as metaheuristic optimizing search have proven to be valuable tools for online testing of ML-enabled systems and early detection of hazards. However, the efficient generation of effective test cases remains a challenging issue. In particular, the testing process should not only produce as many failures as possible but also unveil diverse sets of failure scenarios. To address this challenge, we introduce a risk-driven test case generation and diversity analysis method tailored to ML-enabled systems. Our approach uses an online testing technique based on metaheuristic optimizing search to falsify domain-specific safety requirements. All test cases leading to hazards are then analyzed to assess their diversity by using clustering and interpretable ML. We evaluated our approach in a collaborative robotics case study, showing that generating tests driven by risk metrics is an effective strategy. Furthermore, we compare alternative optimizing search algorithms and rank them based on the overall diversity of the test cases, ultimately showing that selecting a testing strategy based only on the number of failures may be misleading.
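To illustrate the diversity-analysis step mentioned in the abstract, the following is a minimal sketch of how failure-inducing test cases could be clustered and then characterized with an interpretable model. It is not the authors' implementation: the feature names, data, cluster count, and use of scikit-learn are assumptions for illustration only; the risk-driven search that generates the failing tests is omitted.

```python
# Hypothetical sketch: cluster failing test cases and explain the clusters
# with an interpretable model. Features and data are made up for
# illustration; the paper's actual pipeline is not reproduced here.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)

# Assumed test-case parameters for a collaborative-robotics scenario
# (robot speed, human-robot distance, payload mass) -- hypothetical.
feature_names = ["robot_speed", "min_distance", "payload_mass"]
failing_tests = rng.uniform(
    low=[0.1, 0.05, 0.5], high=[2.0, 1.5, 10.0], size=(200, 3)
)

# 1) Normalize features so clustering is not dominated by scale differences.
X = StandardScaler().fit_transform(failing_tests)

# 2) Cluster the failure-inducing test cases; each cluster approximates a
#    distinct failure scenario, so the spread of tests across clusters is a
#    rough indicator of the diversity achieved by a testing strategy.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print("failures per cluster:", np.bincount(labels))

# 3) Fit a shallow decision tree on the original (unscaled) features to
#    obtain human-readable rules characterizing each failure cluster.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(failing_tests, labels)
print(export_text(tree, feature_names=feature_names))
```

In this sketch, comparing how evenly failures spread across clusters under different search algorithms would mirror the paper's idea that ranking strategies by failure count alone can be misleading.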