
Finding Doppelgängers in Scopus: how to build scientists control groups using sosia

Baruffaldi, Stefano H.
2025-01-01

Abstract

Constructing control groups of scientists is often a daunting effort. This paper presents sosia, an open-source Python-based software designed to efficiently query the Scopus database via its RESTful API. sosia searches for researchers whose publication profiles, up to a given year, are similar to that of a given researcher, based on all the main standard bibliometric indicators. The user can flexibly choose a set of parameters to restrict the search to narrower or wider boundaries upfront, and can obtain additional similarity indicators to select a subset of authors after the search. Advanced settings also allow narrowing the search to a list of affiliations and minimizing possible errors arising from ambiguous author profiles. A basic search can be set up in a few lines of code, and the average computation time ranges from 60 to 300 minutes. We discuss the functioning, characteristics, limitations and possible extensions of the software.
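The margin-based search described in the abstract can be illustrated with a minimal sketch. This is not sosia's API and does not query Scopus: the `Profile` class, the `find_matches` function, the chosen indicators, the margin values and the toy data are all hypothetical, and serve only to show how candidates might be kept when their indicators fall within user-chosen margins of the target researcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical publication profile, measured up to a comparison year."""
    author_id: str
    first_year: int    # year of first publication
    n_pubs: int        # number of publications
    n_coauthors: int   # number of distinct coauthors


def within(value, target, margin):
    """Return True if value lies in [target - margin, target + margin]."""
    return abs(value - target) <= margin


def find_matches(target, candidates, year_margin=1, pub_margin=0.2,
                 coauth_margin=0.2):
    """Return IDs of candidates whose indicators are close to the target's.

    year_margin is absolute (in years); pub_margin and coauth_margin are
    relative to the target's counts. All defaults are illustrative assumptions.
    """
    matches = []
    for cand in candidates:
        if cand.author_id == target.author_id:
            continue  # the target is never its own match
        if (within(cand.first_year, target.first_year, year_margin)
                and within(cand.n_pubs, target.n_pubs,
                           pub_margin * target.n_pubs)
                and within(cand.n_coauthors, target.n_coauthors,
                           coauth_margin * target.n_coauthors)):
            matches.append(cand.author_id)
    return matches


# Toy data (invented for illustration only)
target = Profile("A", first_year=2010, n_pubs=20, n_coauthors=30)
candidates = [
    Profile("B", 2011, 22, 33),  # close on every indicator
    Profile("C", 2005, 20, 30),  # started publishing too early
    Profile("D", 2010, 30, 30),  # too many publications
    Profile("E", 2009, 18, 28),  # close on every indicator
]
matched = find_matches(target, candidates)
print(matched)
```

Widening or narrowing the margins corresponds to the trade-off mentioned in the abstract: looser margins yield more candidates, which can then be filtered further using additional similarity indicators.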
2025
Control group
Diff-in-diff
Scopus
Statistical software
Files in this item:
File: s11192-025-05298-y.pdf
Access: open access
Description: Publisher's version
Size: 958.22 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309939