A Deep Dive into MediaPipe Pose for Postural Assessment: A Comparative Investigation

Cerfoglio, Serena; Cimolin, Veronica
2025-01-01

Abstract

Markerless pose estimation is an accessible alternative to marker-based methods. Although Google MediaPipe Pose is a promising RGB-only solution, a comprehensive performance investigation of its 2D and 3D models for postural assessment is lacking. This study systematically evaluated the accuracy, reliability, and symmetry preservation of MediaPipe models at varying complexities, and compared them against a marker-based gold standard and a markerless RGB-D reference system based on Azure Kinect. Twenty-four healthy subjects performed static 40-second postural tasks; five categories of angular and linear measures were computed from the frontal recordings to compare the models' behaviors. One of the primary findings was the counterintuitive degradation in 3D MediaPipe reconstruction performance with increasing model complexity. Indeed, the high-complexity model introduced severe distortions and asymmetries, which we demonstrated to originate from the 3D-uplifting process rather than from the 2D tracking, which remained robust. All 2D models showed excellent performance in frontal plane analysis, albeit with some limitations for specific angular measurements. This study revealed a clear trade-off between the precision of RGB-D systems and the accessibility of RGB-only models. In addition, the analysis provided detailed recommendations for selecting the most suitable model for specific applications of postural assessment. Overall, the findings suggest that current 3D MediaPipe Pose models should be used with caution for quantitative postural assessment, especially at higher complexities, whereas 2D models represent an effective and practical choice for many lightweight front-view applications.
Files in this item:
Ferraris_IEEEAccess_2025.pdf (publisher's version, Adobe PDF, 3.38 MB, restricted access)

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1302490