{"title":"Machine learning methods for results merging in patent retrieval","authors":"Vasileios Stamatis, M. Salampasis, K. Diamantaras","doi":"10.1108/dta-06-2021-0156","DOIUrl":null,"url":null,"abstract":"PurposeIn federated search, a query is sent simultaneously to multiple resources and each one of them returns a list of results. These lists are merged into a single list using the results merging process. In this work, the authors apply machine learning methods for results merging in federated patent search. Even though several methods for results merging have been developed, none of them were tested on patent data nor considered several machine learning models. Thus, the authors experiment with state-of-the-art methods using patent data and they propose two new methods for results merging that use machine learning models.Design/methodology/approachThe methods are based on a centralized index containing samples of documents from all the remote resources, and they implement machine learning models to estimate comparable scores for the documents retrieved by different resources. The authors examine the new methods in cooperative and uncooperative settings where document scores from the remote search engines are available and not, respectively. In uncooperative environments, they propose two methods for assigning document scores.FindingsThe effectiveness of the new results merging methods was measured against state-of-the-art models and found to be superior to them in many cases with significant improvements. The random forest model achieves the best results in comparison to all other models and presents new insights for the results merging problem.Originality/valueIn this article the authors prove that machine learning models can substitute other standard methods and models that used for results merging for many years. Our methods outperformed state-of-the-art estimation methods for results merging, and they proved that they are more effective for federated patent search.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-06-2021-0156","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
PurposeIn federated search, a query is sent simultaneously to multiple resources and each one of them returns a list of results. These lists are merged into a single list using the results merging process. In this work, the authors apply machine learning methods for results merging in federated patent search. Even though several methods for results merging have been developed, none of them were tested on patent data nor considered several machine learning models. Thus, the authors experiment with state-of-the-art methods using patent data and they propose two new methods for results merging that use machine learning models.Design/methodology/approachThe methods are based on a centralized index containing samples of documents from all the remote resources, and they implement machine learning models to estimate comparable scores for the documents retrieved by different resources. The authors examine the new methods in cooperative and uncooperative settings where document scores from the remote search engines are available and not, respectively. In uncooperative environments, they propose two methods for assigning document scores.FindingsThe effectiveness of the new results merging methods was measured against state-of-the-art models and found to be superior to them in many cases with significant improvements. The random forest model achieves the best results in comparison to all other models and presents new insights for the results merging problem.Originality/valueIn this article the authors prove that machine learning models can substitute other standard methods and models that used for results merging for many years. Our methods outperformed state-of-the-art estimation methods for results merging, and they proved that they are more effective for federated patent search.