STF: A Unified Framework for Joint Pixel-Level Segmentation and Tracking of Tissues in Endoscopic Surgery

Li, Y.; Cruciani, L.; Mistretta, F. A.; Luzzago, S.; Ferrigno, G.; Musi, G.; De Momi, E.

doi:10.1109/TBME.2026.3656751

Endoscopic minimally invasive surgery relies on precise tissue video segmentation to avoid complications such as vascular bleeding or nerve injury. However, existing video segmentation methods often fail to remain robust over long sequences because of target loss and challenging conditions (e.g., occlusion, motion blur), which limits their applicability in prolonged surgical procedures. To address these limitations, we propose the Unified Framework for Joint Pixel-Level Segmentation and Tracking (STF), which integrates a synergistic segmentation-guided tracking pipeline with an adaptive re-detection mechanism. First, a deep learning-based segmentation network precisely localizes the target tissue. A cost-efficient Hough Voting Network then tracks the segmented region, while a Bayesian refinement module improves compatibility between segmentation and tracking. If tracking reliability drops, an evaluation module triggers re-segmentation, ensuring continuous and stable long-term performance. Extensive experiments confirm that STF achieves higher accuracy and temporal consistency than segmentation networks alone in long-term surgical video segmentation, particularly under extreme conditions. This automated methodology significantly improves robustness and re-detection capability for sustained tissue analysis, markedly reducing the dependency on manual intervention prevalent in many model-based tracking solutions.
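The control flow described above (segment once, track and refine frame by frame, re-segment when reliability drops) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `run_pipeline`, its callable parameters (`segment`, `track`, `refine`, `reliability`), and the scalar `threshold` are all hypothetical placeholders standing in for the segmentation network, the Hough Voting Network, the Bayesian refinement module, and the evaluation module, whose actual interfaces the abstract does not specify.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_pipeline(
    frames: Iterable[Any],
    segment: Callable[[Any], Any],           # hypothetical: frame -> mask
    track: Callable[[Any, Any], Any],        # hypothetical: (frame, previous mask) -> mask
    refine: Callable[[Any, Any], Any],       # hypothetical: (tracked mask, frame) -> mask
    reliability: Callable[[Any, Any], float],  # hypothetical: (mask, frame) -> score
    threshold: float = 0.5,                  # assumed cut-off, not from the paper
) -> Tuple[List[Any], List[str]]:
    """Segmentation-guided tracking loop with re-detection (illustrative only).

    Returns the per-frame masks and a log of which step produced each mask.
    """
    masks: List[Any] = []
    events: List[str] = []
    mask = None
    for frame in frames:
        if mask is None:
            # First frame: localize the target with the segmentation step.
            mask = segment(frame)
            events.append("segment")
        else:
            # Later frames: propagate the previous mask, then refine it.
            candidate = refine(track(frame, mask), frame)
            if reliability(candidate, frame) < threshold:
                # Tracking judged unreliable: fall back to re-segmentation.
                mask = segment(frame)
                events.append("re-segment")
            else:
                mask = candidate
                events.append("track")
        masks.append(mask)
    return masks, events
```

Passing the four stages as callables keeps the sketch independent of any particular network or scoring rule; in practice the reliability score and threshold would have to come from the evaluation module the paper describes.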