Towards an unbiased approach for the evaluation of social data geolocation

Bernaschina, Carlo; Catallo, Ilio; Ciceri, Eleonora; Fedorov, Roman; Fraternali, Piero
2015

Abstract

We present a study that reveals a significant statistical bias between the distributions of geolocated and non-geolocated social data. We argue that this bias affects the actual performance of social geolocation algorithms, which are commonly trained and tested on datasets of crawled geolocated data, and can impair their results. Finally, we propose building an a-posteriori geolocated dataset to obtain an unbiased performance estimate for new and state-of-the-art algorithms alike.
Proceedings of the 9th Workshop on Geographic Information Retrieval
ISBN 9781450339377
Files in this item:
gir2015_twittergeotag.pdf (open access; Post-Print, DRAFT or Author's Accepted Manuscript (AAM); 191.51 kB; Adobe PDF)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11311/981110
Citations
  • Scopus 1