Zoom-Anomaly: Multimodal vision-language fusion industrial anomaly detection with synthetic data

Li, Jiaqi; Karimi, Hamid Reza
2026-01-01

Abstract

This paper proposes a multimodal cross-domain anomaly detection framework that uses synthetic anomalies as auxiliary training data. Existing methods overlook the cold-start problem. To address it, we design the Synthetic Anomaly Module (SyAM). During forward diffusion, SyAM embeds potential anomaly patterns into the high-frequency regions of normal samples. This produces realistic anomalies while preserving the low-frequency distribution of the normal data. For detection, a pyramid graph network extracts multi-scale topological features, and a zooming mechanism establishes cross-scale correlations to improve anomaly localization. Anomalies are then identified by vision-text matching. Experimental results show that the proposed model, Zoom-Anomaly, achieves high accuracy when trained solely on synthetic anomalies. It also performs robustly on the MVTec AD, VisA, and PV_actual AD datasets, confirming its effectiveness in real-world industrial environments.
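The abstract does not give SyAM's equations, so the following is only a minimal NumPy sketch of the general idea under stated assumptions. An ideal circular low-pass filter in the 2-D Fourier domain separates low- and high-frequency content. The high-frequency detail of a hypothetical anomaly pattern replaces that of the normal image inside a binary mask. The result is then noised with the standard DDPM forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, using the common linear β schedule (1e-4 to 0.02 over 1000 steps).

The following are all illustrative assumptions, not the authors' implementation:
- the function names (`frequency_split`, `embed_anomaly_high_freq`);
- the cutoff radius, the square mask, and the mixing `strength`;
- the choice to embed before noising.

The learned reverse (denoising) process that would make the anomaly realistic is not shown.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear DDPM schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def frequency_split(image, cutoff):
    """Split a 2-D image into low- and high-frequency parts using an ideal circular
    low-pass mask of radius `cutoff` (in frequency bins) on the centred spectrum."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * (radius <= cutoff))))
    return low, image - low  # high band is the residual, so low + high == image


def embed_anomaly_high_freq(normal, pattern, mask, t, alpha_bar,
                            cutoff=8.0, strength=1.0, rng=None):
    """Hypothetical SyAM-like step (an assumption, not the paper's code):
    keep the normal image's low-pass component, swap in the pattern's high-pass
    detail inside `mask`, then noise the result to timestep t with
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    rng = np.random.default_rng() if rng is None else rng
    low_n, high_n = frequency_split(normal, cutoff)
    _, high_p = frequency_split(pattern, cutoff)
    # Blend high-frequency detail only inside the anomaly mask; outside it, keep the normal detail.
    high_mix = np.where(mask, (1.0 - strength) * high_n + strength * high_p, high_n)
    x0 = low_n + high_mix
    eps = rng.standard_normal(normal.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x0, xt


# Toy usage: random arrays stand in for a grayscale normal image and a defect texture.
rng = np.random.default_rng(0)
normal = rng.random((64, 64))
pattern = rng.random((64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[20:36, 20:36] = True
alpha_bar = linear_alpha_bar()
x0, xt = embed_anomaly_high_freq(normal, pattern, mask, t=200, alpha_bar=alpha_bar, rng=rng)
```

In this sketch the clean sample `x0` equals the normal image outside the mask, so the change is confined to high-frequency detail in the masked region. A smaller `cutoff` puts more of the image into the band that gets replaced. Spatial masking of the high band can leak a little energy back into low frequencies, so the low-frequency content is preserved only approximately.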
2026
Keywords: Denoising diffusion probabilistic model; Graph neural networks; Industrial anomaly detection; Multimodality; Synthetic data

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310753
Citations
  • PubMed Central: n/a
  • Scopus: 2
  • Web of Science (ISI): 2