Finding Doppelgängers in Scopus: how to build scientists control groups using sosia
Baruffaldi, Stefano H.
2025-01-01
Abstract
Constructing control groups of scientists is often a daunting task. This paper presents sosia, an open-source Python package designed to query the Scopus database efficiently via its RESTful API. sosia searches for researchers whose publication profiles, up to a given year, are similar to that of a given researcher, based on all the main standard bibliometric indicators. Users can flexibly set parameters that widen or narrow the search boundaries upfront, and can obtain additional similarity indicators to select a subset of authors after the search. Advanced settings also allow restricting the search to a list of affiliations and minimizing errors arising from ambiguous author profiles. A basic search can be set up in a few commands, and the average computation time ranges between 60 and 300 minutes. We discuss the functioning, characteristics, limitations, and possible extensions of the software.

| File | Access | Version | Size | Format |
|---|---|---|---|---|
| s11192-025-05298-y.pdf | Open access | Publisher’s version | 958.22 kB | Adobe PDF |


