Towards an unbiased approach for the evaluation of social data geolocation
BERNASCHINA, CARLO; CATALLO, ILIO; CICERI, ELEONORA; FEDOROV, ROMAN; FRATERNALI, PIERO
2015-01-01
Abstract
We present a study that reveals a significant statistical bias between the distributions of geolocated and non-geolocated social data. We argue that this bias affects the real-world performance of social geolocation algorithms and can impair their results, since these algorithms are commonly trained and tested on datasets consisting of crawled geolocated data. Finally, we propose the construction of an a-posteriori geolocated dataset for the unbiased evaluation of new and state-of-the-art algorithms alike.

Files in this item:
File | Access | Type | Size | Format
---|---|---|---|---
gir2015_twittergeotag.pdf | Open access | Post-Print (DRAFT or Author's Accepted Manuscript, AAM) | 191.51 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.