LWMOcc: Lightweight Monocular 3D Occupancy Prediction Method

Cadini, Francesco
2025-01-01

Abstract

Environmental perception is the foundation of autonomous driving systems and directly affects both operational safety and intelligent decision-making capability. Among emerging technologies, vision-based 3D occupancy prediction is attracting growing attention because of its cost-effectiveness and high-resolution scene understanding. However, existing methods often suffer from excessive model complexity and limited inference efficiency, which makes deployment on resource-constrained embedded platforms difficult. To address these limitations, we propose LWMOcc, a lightweight monocular 3D occupancy prediction framework. Its main component is a lightweight Encoder-Decoder module for fine-grained scene perception that combines a simplified backbone with an efficient decoding strategy. Through structural simplification and parameter compression, LWMOcc reduces computational overhead while retaining high predictive accuracy, improving both inference speed and model compactness. Extensive experiments on the SemanticKITTI benchmark show that LWMOcc reduces the parameter count of the 2D visual feature extraction module by 83.2%, increases inference speed by 82.6%, and maintains an occupancy prediction accuracy of 98.2%. Compared with existing lightweight methods, LWMOcc achieves a better balance between efficiency and accuracy, demonstrating strong potential for real-time deployment on embedded autonomous driving systems.
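The efficiency figures in the abstract are relative changes against a baseline. As a minimal sketch of how such percentages are typically computed, the Python snippet below defines two helper functions. The function names and the input values are hypothetical placeholders chosen for round arithmetic; they are not taken from the paper. The abstract does not state whether inference speed is measured as throughput or latency, so the sketch assumes a throughput-style quantity where larger is better.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


def relative_increase(baseline: float, new: float) -> float:
    """Percentage by which `new` is larger than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (new - baseline) / baseline


# Hypothetical placeholder values, NOT results from the paper.
baseline_params, compact_params = 100.0, 25.0  # e.g. millions of parameters
baseline_fps, compact_fps = 10.0, 15.0         # e.g. frames per second

print(relative_reduction(baseline_params, compact_params))  # 75.0
print(relative_increase(baseline_fps, compact_fps))         # 50.0
```

Under this convention, a reported parameter reduction of 83.2% would mean the compact module keeps 16.8% of the baseline module's parameters.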
2025
SAE Technical Paper

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1311196