
Self-labeling methods for unsupervised transfer ranking

Carman M.
2020-01-01

Abstract

A lack of reliable relevance labels for training ranking functions is a significant problem for many search applications. Transfer ranking aims to transfer knowledge from an existing learning-to-rank task to a new ranking task. Unsupervised transfer ranking is the special case in which no relevance labels are available for the new task, only queries and retrieved documents. One approach to this problem is to impute relevance labels for (document, query) instances in the target collection using knowledge from the source collection. We propose three self-labeling methods for unsupervised transfer ranking: an expectation-maximization based method (RankPairwiseEM) that estimates pairwise preferences between documents, a hard-assignment expectation-maximization algorithm (RankHardLabelEM) that directly assigns imputed relevance labels to documents, and a self-training algorithm (RankSelfTrain) that gradually increases the number of imputed labels. We compared the three algorithms on three large public test collections using LambdaMART as the base ranker and found that (i) all the proposed algorithms improve over the original source ranker in different transfer scenarios; (ii) RankPairwiseEM and RankSelfTrain significantly outperform the source rankers across all settings and are not significantly worse than a model trained directly on the target collection; and (iii) the self-labeling methods are significantly better than previous instance-weighting solutions on a variety of collections.
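The self-training idea behind RankSelfTrain can be illustrated with a minimal sketch: a source-trained scorer ranks the unlabeled target documents, and a growing top fraction is kept as pseudo-relevant each round, mimicking the gradual increase in imputed labels the abstract describes. The toy linear scorer, function names, and the fraction schedule below are illustrative assumptions, not the paper's implementation (which uses LambdaMART as the base ranker).

```python
def score(weights, features):
    """Linear relevance score; stands in for a source-trained LambdaMART model."""
    return sum(w * f for w, f in zip(weights, features))

def self_train(source_weights, target_docs, rounds=3, start_keep=0.3):
    """Impute relevance labels for one query's unlabeled target documents.

    Each round keeps a larger top fraction of documents (ranked by the
    source scorer) as pseudo-relevant, so the imputed label set grows
    gradually. Returns a dict mapping document index -> imputed label.
    """
    labels = {}
    for r in range(1, rounds + 1):
        keep = min(1.0, start_keep * r)  # grow the kept fraction per round
        ranked = sorted(
            enumerate(target_docs),
            key=lambda d: score(source_weights, d[1]),
            reverse=True,
        )
        cutoff = max(1, int(len(ranked) * keep))
        for rank, (idx, _) in enumerate(ranked):
            if rank < cutoff:
                labels[idx] = 1  # mark as pseudo-relevant
        # In the actual method, the ranker would be retrained on the
        # imputed labels here before the next round refines them.
    return labels

docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.1, 0.1]]
imputed = self_train([1.0, 0.5], docs, rounds=2, start_keep=0.25)
print(imputed)  # → {0: 1, 2: 1}
```

In a full pipeline the retraining step inside the loop is what transfers knowledge: each round's ranker is fit on the pseudo-labels produced so far, then re-scores the target collection.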
Domain adaptation
Information retrieval
Learning to rank
Ranking adaptation
Transfer learning
Transfer ranking
Files in this product:
No files are associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1145131
Citations
  • PMC: n/a
  • Scopus: 2
  • Web of Science: 3