Coding video sequences of visual features

Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro Enrico Cesare; Tubaro, Stefano; Tagliasacchi, Marco

doi:10.1109/ICIP.2013.6738390

Visual features provide a convenient representation of image content and are exploited in several applications, such as visual search and object tracking. In many cases, visual features need to be transmitted over a bandwidth-limited network, which calls for coding techniques that reduce the required rate while attaining a target efficiency for the task at hand. The literature has recently addressed the problem of coding local features extracted from still images; in this paper we propose, for the first time, a coding architecture designed for local features extracted from video content. We exploit both spatial and temporal redundancy by means of intra-frame and inter-frame coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. Experimental results demonstrate that, in the case of SIFT descriptors, exploiting temporal redundancy leads to substantial gains in coding efficiency.
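
As a rough illustration of a rate-distortion optimized mode decision between intra-frame and inter-frame coding, the sketch below picks, for a single descriptor, the mode with the lower Lagrangian cost J = D + λR. It is not the architecture proposed in the paper: the function names, the uniform quantizer, the Exp-Golomb bit count used as a rate proxy, the one-bit mode flag, and the parameters `step` and `lam` are all assumptions made for the example.

```python
# Illustrative sketch only: names, the rate proxy, and parameter values are
# assumptions for this example, not details taken from the paper.

def exp_golomb_bits(v):
    """Length in bits of the signed order-0 Exp-Golomb codeword for integer v."""
    k = 2 * abs(v) - (1 if v > 0 else 0)  # map signed value to unsigned index
    return 2 * (k + 1).bit_length() - 1


def rate_and_distortion(signal, step):
    """Uniformly quantize `signal`; return (estimated bits, squared error)."""
    q = [round(s / step) for s in signal]
    rate = sum(exp_golomb_bits(v) for v in q)
    dist = sum((s - v * step) ** 2 for s, v in zip(signal, q))
    return rate, dist


def choose_mode(desc, ref=None, step=1.0, lam=1.0):
    """Return (mode, cost) minimizing J = D + lam * R for one descriptor.

    desc: descriptor from the current frame (sequence of numbers).
    ref:  matching descriptor from the previous frame, or None if no match.
    """
    costs = {}

    # Intra mode: code the descriptor elements directly.
    rate, dist = rate_and_distortion(desc, step)
    costs["intra"] = dist + lam * (rate + 1)  # +1 bit for the mode flag

    # Inter mode: code only the residual with respect to the reference.
    if ref is not None:
        residual = [a - b for a, b in zip(desc, ref)]
        rate, dist = rate_and_distortion(residual, step)
        costs["inter"] = dist + lam * (rate + 1)

    mode = min(costs, key=costs.get)
    return mode, costs[mode]


if __name__ == "__main__":
    current = [10, 12, 0, 7]
    previous = [10, 11, 0, 8]  # similar descriptor, so the residual is cheap
    print(choose_mode(current, previous))
    print(choose_mode(current, None))
```

When the reference descriptor is close to the current one, the residual has small magnitudes and costs fewer bits, so the inter mode tends to win; when no reference is available or the match is poor, the decision falls back to intra mode. Larger values of `lam` weight rate more heavily relative to distortion.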