Analyses of the errors characteristics in land cover maps: Aggregation of errors for individual land cover class confusions

Bratic, G.; Brovelli, M. A.

The methodology and Earth observation (EO) data behind land cover maps are constantly improving. The quality of the maps is improving as well, and this is known thanks to accuracy assessment. Accuracy assessment helps map users understand how reliable a map is. It also tells the producer how much the map can still be improved. In an accuracy assessment, classified data are compared with reference data. Reference data can come from a field survey, from visual inspection of satellite imagery or orthophotos, or from another land cover map. The first two sources are considered more reliable. However, another land cover map has one advantage as reference data: it is continuous over the study area rather than a set of samples. This work exploits that advantage to develop a methodology for analyzing the aggregation of errors in a land cover map for each confusion type, where a confusion type is a combination of two confused classes. Aggregation serves as a proxy for deciding whether the errors of a given confusion type are systematic or random. If a confusion type is not random, the producer can tune the parameters of the classification algorithm or increase the number of training samples to improve the results for the classes involved in that confusion.

Before the reference and classified maps are compared, they must be preprocessed so that they share the same class codes, reference system and resolution. Afterwards, all pixels lying on borders between classes, in either the classified or the reference map, are masked. This step excludes pixels possibly affected by mixed-pixel or georeferencing errors. Such errors are usually not expected to be aggregated and could therefore distort the overall aggregation of a confusion type. A confusion map between the classification and the reference is then computed. Its values are obtained by combining the reference and classified values through arithmetic operations on the rasters (e.g. 1020 denotes a confusion between class 10 in the reference and class 20 in the classified data).

Based on this map, the aggregation of each confusion type is estimated as the ratio between two join counts. The numerator is the number of joins of the confusion type with itself. The denominator is the total number of its joins with all classes, itself included. The equation originates from the FRAGSTATS software (https://www.umass.edu/landeco/research/fragstats/fragstats.html), but it is adapted from the 4-cell to the 8-cell neighborhood. Joins are double-counted, i.e. each join between two neighboring cells is counted once from each cell. The number of joins was derived through arithmetic operations from the output of the GRASS GIS module r.neighbors (interspersion method), which also relies on the number of joins to compute the interspersion of a pixel. All operations were carried out with different GRASS GIS modules called from Python (PyGRASS, https://grass.osgeo.org/grass78/manuals/libpython/pygrass_index.html) on the CINECA HPC cluster Galileo (http://www.hpc.cineca.it/).

To illustrate, Figure 'Confusion type aggregation' shows the results of this procedure for two confusion types. The confusion types were computed from 'GlobeLand30' (GL30, http://www.globallandcover.com/) for 2015 and the 'S2 prototype LC map at 20m of Africa 2016' (CCI S2 Prototype, http://2016africalandcover20m.esrin.esa.int/) over Rwanda, Africa. The GL30 map represents the classification results and was compared against the CCI S2 Prototype as the reference map. The figure shows two confusion types: Cultivated area with Water bodies, and Shrubland with Forest. The aggregation of the former is estimated at 95% and that of the latter at 79%. The figure confirms these numbers: the Cultivated area/Water bodies confusion is more strongly aggregated than the Shrubland/Forest confusion.
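The confusion-map encoding, border masking and aggregation ratio described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' implementation, which used GRASS GIS modules (including r.neighbors) through PyGRASS. It assumes both rasters are already harmonized integer arrays with non-negative class codes. It also assumes classified codes below the multiplier `factor` (default 100, matching the 1020 example). Masked cells are marked with a hypothetical NODATA value of -1, and joins touching a masked or out-of-raster cell are ignored, which is an assumption about how masked cells are treated. The helper names `shifted`, `border_mask`, `confusion_map` and `aggregation` are illustrative.

```python
import numpy as np

# Offsets of the 8-cell (queen's case) neighborhood.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
NODATA = -1  # hypothetical marker for masked or out-of-raster cells


def shifted(a, dr, dc, fill=NODATA):
    """Return b with b[r, c] == a[r + dr, c + dc], or `fill` outside the raster."""
    rows, cols = a.shape
    p = np.pad(a, 1, constant_values=fill)
    return p[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]


def border_mask(reference, classified):
    """True where any in-raster 8-neighbor has a different class in either map."""
    mask = np.zeros(reference.shape, dtype=bool)
    for a in (reference, classified):
        for dr, dc in OFFSETS:
            nb = shifted(a, dr, dc)
            mask |= (nb != NODATA) & (nb != a)
    return mask


def confusion_map(reference, classified, factor=100):
    """Code each pixel as reference * factor + classified (10, 20 -> 1020).

    Pixels on a class border in either map are set to NODATA. Assumes the
    classified codes are non-negative and smaller than `factor`.
    """
    reference = np.asarray(reference, dtype=np.int64)
    classified = np.asarray(classified, dtype=np.int64)
    codes = reference * factor + classified
    return np.where(border_mask(reference, classified), NODATA, codes)


def aggregation(conf):
    """Like joins / all joins per confusion code, 8-neighborhood, double-counted.

    Each join is counted from both of its cells; joins that touch a NODATA
    cell or the raster edge are ignored.
    """
    conf = np.asarray(conf, dtype=np.int64)
    valid = conf != NODATA
    codes = [int(c) for c in np.unique(conf[valid])]
    like = dict.fromkeys(codes, 0)
    total = dict.fromkeys(codes, 0)
    for dr, dc in OFFSETS:
        nb = shifted(conf, dr, dc)
        usable = valid & (nb != NODATA)  # both cells of the join are usable
        for code in codes:
            at_code = usable & (conf == code)
            total[code] += int(at_code.sum())
            like[code] += int((at_code & (nb == code)).sum())
    return {c: like[c] / total[c] for c in codes if total[c] > 0}
```

For example, `confusion_map(np.array([[10]]), np.array([[20]]))` yields `[[1020]]`. `aggregation(np.array([[1, 1, 2]]))` gives 2/3 for code 1 (two like joins out of three) and 0 for code 2. Shifting a padded copy of the raster keeps the neighbor logic vectorized, loosely mirroring a moving-window module such as r.neighbors.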
Confusion types with high aggregation indicate that the cause of the confusion is systematic. Since two land cover maps are compared, we must first make sure that the error is not caused by a mismatch between the two legends. The map producer can then investigate whether the number of training samples was sufficient, or whether the parameters of the classification algorithm can be tuned. This work focuses only on estimating aggregation. A complete accuracy assessment must also consider the error matrix and its associated indices (e.g. overall accuracy, user's accuracy, producer's accuracy). Using an existing land cover map for validation is not the most reliable technique, but it has one important advantage. It is the only type of reference data that is continuous. It therefore allows the aggregation of class confusions to be analyzed and confusions to be separated into random and systematic ones. If a confusion is strongly aggregated, this indicates that the classification may be wrong. In that case the classification algorithm should be tuned, or more training samples should be used for the class concerned.
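The following sketch shows how the complementary indices mentioned above are derived from an error matrix. It computes overall, user's and producer's accuracy and assumes rows correspond to the classified map and columns to the reference data, a common convention. Transpose the input if your matrix is laid out the other way. The function name `accuracy_indices` is hypothetical.

```python
import numpy as np


def accuracy_indices(matrix):
    """Return overall, user's and producer's accuracy of an error matrix.

    Assumes rows = classified map and columns = reference data.
    """
    m = np.asarray(matrix, dtype=float)
    correct = np.diag(m)                 # correctly classified pixels per class
    overall = correct.sum() / m.sum()    # share of all pixels on the diagonal
    users = correct / m.sum(axis=1)      # per class, relative to the map (rows)
    producers = correct / m.sum(axis=0)  # per class, relative to the reference (columns)
    return overall, users, producers
```

For the matrix `[[45, 5], [10, 40]]`, overall accuracy is 0.85. User's accuracies are 0.9 and 0.8, and producer's accuracies are about 0.818 and 0.889.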

RE.PUBLIC@POLIMI Research Publications of the Politecnico di Milano