A model to address the cold-start in peer recommendation by using k-means clustering and sentence embedding
Authors: Deepika Shukla, C. Ravindranath Chowdary
Journal: Journal of Computational Science, volume 83, Article 102465
Publication date: 2024-11-12
DOI: 10.1016/j.jocs.2024.102465
URL: https://www.sciencedirect.com/science/article/pii/S1877750324002588
Impact factor: 3.1 (JCR Q2, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
In academia, research collaboration plays a vital role in enhancing research quality and enriching the academic profile of authors. Recommending appropriate collaborators from a vast scholarly database, particularly for newcomers, poses a challenging cold-start problem. This study addresses the cold-start problem in peer recommendation, treating a dynamic coauthorship graph as the network structure of academic collaborators. Because the coauthorship graph is large and complex, an efficient indexing method is essential for speeding up the initial search for similar coauthors. The study introduces an efficient Global Inverted List (GIL) for indexing research areas and active authors in the coauthorship network. An attribute-based search and filtering mechanism is proposed to identify relevant collaborators, followed by k-means clustering and doc2vec metrics to rank and select the top recommendations. A cold user is associated with attributes that identify coauthors with similar research interests. For each attribute of the cold user, the model retrieves the associated authors from the GIL. Two filtering approaches are then applied to refine the retrieved author list. The first ensures that the authors have a significant presence in the specified research areas, whereas the second helps avoid recommending authors with only superficial connections to the cold user. The model creates a feature matrix of the filtered authors from their publication features. k-means clustering applied to the feature matrix generates k clusters, and only the clusters that contain seed nodes are selected for further processing. The selected clusters are ranked using doc2vec metrics, with the top-ranked cluster providing the final recommendation. The model recommends the top L members of the selected cluster, where L is the length of the recommendation list provided to the new user. Extensive experiments show the efficacy of the proposed model.
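The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the form of the two filters, how seed nodes are chosen, or which doc2vec metrics are used. Here the filters are hypothetical thresholds (`min_papers`, `min_shared`), seed nodes are passed in as a given set, and plain cosine similarity over precomputed embedding vectors stands in for the doc2vec-based ranking. All function and parameter names are illustrative.

```python
from collections import defaultdict
from math import sqrt


def build_gil(author_areas):
    """Invert author -> areas into research area -> set of authors."""
    gil = defaultdict(set)
    for author, areas in author_areas.items():
        for area in areas:
            gil[area].add(author)
    return gil


def candidates(gil, papers_in_area, cold_attrs, min_papers=2, min_shared=2):
    """Look up each cold-user attribute in the GIL, then apply two filters."""
    shared = defaultdict(int)
    for attr in cold_attrs:
        for author in gil.get(attr, ()):
            # Filter 1 (assumed form): enough papers in the research area.
            if papers_in_area.get((author, attr), 0) >= min_papers:
                shared[author] += 1
    # Filter 2 (assumed form): enough attributes shared with the cold user.
    return sorted(a for a, n in shared.items() if n >= min_shared)


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centers start at the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(authors, features, embeddings, seeds, query_vec, k, top_l):
    """Cluster authors, keep seeded clusters, rank them, return top L."""
    labels = kmeans([features[a] for a in authors], k)
    clusters = defaultdict(list)
    for author, lab in zip(authors, labels):
        clusters[lab].append(author)
    # Keep only clusters that contain at least one seed node.
    seeded = [m for m in clusters.values() if seeds & set(m)]
    if not seeded:
        return []

    def sim(author):
        # Stand-in for the doc2vec metric: similarity to the cold user.
        return cosine(embeddings[author], query_vec)

    # Rank clusters by mean member similarity and take the best one.
    best = max(seeded, key=lambda m: sum(sim(a) for a in m) / len(m))
    return sorted(best, key=sim, reverse=True)[:top_l]
```

In this sketch, `features` would hold the publication-feature rows of the filtered authors and `embeddings` the sentence-embedding vectors used for ranking; keeping the two separate mirrors the abstract's split between clustering on a feature matrix and ranking with doc2vec metrics.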
About the journal:
Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory.
Recent advances in experimental techniques, such as detectors, on-line sensor networks and high-resolution imaging, have opened up new windows into physical and biological processes at many levels of detail. The resulting data explosion allows for detailed data-driven modeling and simulation.
This new discipline in science combines computational thinking, modern computational methods, devices and collateral technologies to address problems far beyond the scope of traditional numerical methods.
Computational science typically unifies three distinct elements:
• Modeling, Algorithms and Simulations (e.g. numerical and non-numerical, discrete and continuous);
• Software developed to solve science (e.g., biological, physical, and social), engineering, medicine, and humanities problems;
• Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components (e.g. problem solving environments).