RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano

Graph Against the Machine: a Neuro-Symbolic Approach for Enhanced Video Question Answering

Fabio Lusha; Agnese Chiatti; Sara Pidò; Nico Catalano; Matteo Matteucci

2025-01-01

Abstract

Video Question Answering (VideoQA) is a key problem contributing to advanced video understanding. The rise of Multimodal Large Language Models (MLLMs) has accelerated progress on VideoQA tasks. However, MLLMs can produce inconsistent output even for similar prompts and suffer from hallucinations and biases. In this position paper, we envisage a novel pipeline in which scene graphs representing the people, objects, and relationships in a video are injected into the MLLM prompt. We hypothesise that leveraging a symbolic representation of the video content can improve the accuracy and verifiability of MLLMs for VideoQA and reduce their latency.
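To make the envisaged pipeline more concrete, the following is a minimal sketch, assuming the scene graph is already available as time-stamped (subject, predicate, object) triples. The abstract does not specify how the graph is encoded, so the `Relation` type, the one-relation-per-line serialization, and the prompt wording are all hypothetical illustrations rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical scene-graph edge: subject --predicate--> obj observed at a frame."""
    frame: int
    subject: str
    predicate: str
    obj: str


def serialize_scene_graph(relations: list[Relation]) -> str:
    """Render one relation per line, ordered by frame so temporal order is explicit."""
    ordered = sorted(relations, key=lambda r: (r.frame, r.subject, r.predicate, r.obj))
    return "\n".join(f"frame {r.frame}: {r.subject} {r.predicate} {r.obj}" for r in ordered)


def build_prompt(relations: list[Relation], question: str) -> str:
    """Place the serialized graph before the question (assumed prompt layout)."""
    return (
        "Scene graph of the video (one relation per line):\n"
        f"{serialize_scene_graph(relations)}\n\n"
        f"Question: {question}\n"
        "Answer using the scene graph and the video frames."
    )


if __name__ == "__main__":
    graph = [
        Relation(12, "person_1", "puts_down", "cup_2"),
        Relation(3, "person_1", "holds", "cup_2"),
    ]
    print(build_prompt(graph, "What did the person do after holding the cup?"))
```

Sorting by frame is one simple way to expose temporal order to the model as symbolic text; because each line names an explicit relation, an answer can in principle be checked against the graph, which is the kind of verifiability the abstract refers to.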

Year of publication: 2025
Book title: ANSyA 2025 Advanced Neuro-Symbolic Applications
Series title: CEUR Workshop Proceedings
Appears in types: 04.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
paper_17.pdf (open access; description: full paper manuscript, publisher's version)	1.07 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1308431