RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

Enhancing accuracy and explainability in colorectal lesion classification with attention-supervised Vision Transformers

Carlini L.; Di Stefano L.; Lena C.; Massimi D.; Rizkala T.; Hassan C.; De Momi E.

2026-01-01

Abstract

Objective: Accurate assessment of colorectal lesion morphology during colonoscopy is essential for guiding treatment and estimating cancer risk. The Paris classification is widely adopted for this purpose but suffers from substantial inter-observer variability, while Vision Transformers (ViTs) can base their decisions on diffuse, off-lesion attention patterns that are hard to interpret. This study investigates whether directly supervising ViT attention maps with expert lesion annotations can improve Paris classification performance and model explainability at the same time.

Method: We propose a Lesion-Focused Attention Loss (GLFA), an attention-supervised pretraining objective that uses expert polyp bounding boxes to focus last-layer [CLS] attention on annotated lesion regions, followed by standard cross-entropy fine-tuning. GLFA is applied to six ViT architectures and evaluated on the public SUN dataset for binary (0-I vs. 0-II) and three-class (0-Ip, 0-Is, 0-IIa) Paris classification. Performance is assessed using frame-wise accuracy and the AttIn metric; an ablation study additionally compares GLFA against a Grad-CAM consistency baseline.

Results: Attention-supervised pretraining yields consistent gains in both accuracy and lesion-focused attention. Across the six ViTs, adding GLFA improves three-class accuracy by up to 7 percentage points. In a detailed ablation on ViT-B/16, GLFA outperforms a Grad-CAM consistency baseline by about 5-13 percentage points across the 2-class and 3-class tasks, and χ² tests confirm a significant association between high AttIn and correct predictions.

Conclusion: Direct supervision of ViT attention with GLFA leverages expert knowledge to jointly boost Paris classification accuracy and spatial interpretability, and compares favourably with Grad-CAM-based explanation regularisation. The source code and dataset splits are publicly available at https://github.com/LucaCarlini/SUNDatasetPretraining.
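The abstract does not give the formula for GLFA or for AttIn, so the following is only an illustrative sketch of the general idea of supervising [CLS] attention with a bounding box. It assumes, hypothetically, that the score is the fraction of [CLS]-to-patch attention mass falling on patches that overlap the annotated box, and that the loss is the negative log of that fraction. The function names, the overlap rule, and the `eps` parameter are all assumptions made for illustration; the authors' actual implementation is in the repository linked above and may differ.

```python
import math

def box_to_patch_mask(box, image_size, patch_size):
    """Flat 0/1 mask over the patch grid: 1.0 where a patch overlaps the box.

    box is (x_min, y_min, x_max, y_max) in pixels (hypothetical convention).
    """
    x_min, y_min, x_max, y_max = box
    grid = image_size // patch_size
    mask = []
    for row in range(grid):
        for col in range(grid):
            px0, py0 = col * patch_size, row * patch_size
            px1, py1 = px0 + patch_size, py0 + patch_size
            overlaps = px0 < x_max and px1 > x_min and py0 < y_max and py1 > y_min
            mask.append(1.0 if overlaps else 0.0)
    return mask

def attention_inside_box(cls_attention, mask):
    """Share of [CLS]-to-patch attention mass on in-box patches (assumed score)."""
    total = sum(cls_attention)
    inside = sum(a * m for a, m in zip(cls_attention, mask))
    return inside / total if total > 0 else 0.0

def lesion_focus_loss(cls_attention, mask, eps=1e-8):
    """Assumed loss: negative log of the in-box attention share."""
    return -math.log(attention_inside_box(cls_attention, mask) + eps)

# Toy example: 64x64 image, 16x16 patches (4x4 grid), box covering the centre.
mask = box_to_patch_mask((16, 16, 48, 48), image_size=64, patch_size=16)
uniform = [1.0 / 16] * 16                       # diffuse attention
focused = [m / sum(mask) for m in mask]         # all attention inside the box
loss_uniform = lesion_focus_loss(uniform, mask)
loss_focused = lesion_focus_loss(focused, mask)
```

In this toy case the diffuse attention map places a quarter of its mass inside the box and receives a larger loss than the fully focused map, which is the behaviour an attention-supervision objective of this kind would be expected to reward.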

Year of publication: 2026
Journal title: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE
Keywords: Attention supervision; Colorectal lesion classification; Paris classification; Trustworthy AI; Vision transformers
Appears in types: 01.1 Journal article

Files in this product:

File	Size	Format
paper_expl.pdf (open access, publisher's version)	1.96 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1308394
