Pub Date: 2024-07-25 DOI: 10.1177/01655515241263272
Muhammad Asim, Muhammad Arif, Muhammad Rafiq
This study aims to synthesise the findings of research on cloud computing adoption and use in libraries. This systematic literature review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method and comprises English-language publications indexed in four world-renowned databases. The study identified various cloud computing practices, including cloud-based library automation systems, email services, social media applications, cloud storage (Dropbox), consortium services, digital libraries and file sharing. Libraries adopted cloud computing because of its cost-effectiveness, storage capacity, ease of use, flexibility and scalability, time savings, the lack of in-house skill sets and the ubiquitous nature of the technologies. Several factors, for example, security issues, data privacy, slow Internet connectivity and high subscription rates, affect the adoption of cloud computing. The critical adoption and usage factors and the challenges identified should give library professionals valuable insight into how to employ cloud-based practices to offer innovative services in academic libraries.
Title: Adoption and uses of cloud computing in academic libraries: A systematic literature
Journal: Journal of Information Science
Pub Date: 2024-05-30 DOI: 10.1177/01655515241244462
Alex J Yang, Star X Zhao, Sanhong Deng
Delayed recognition, exemplified by the phenomenon of sleeping beauties, presents a compelling narrative within the dynamics of scientific impact and innovation. Our investigation delves into the nuanced facets of delayed acknowledgement, uncovering its profound implications and innovation pathways. Through the analysis of extensive datasets and advanced methodologies, we elucidate the intricate connections between delayed recognition and the realms of scientific and technological influence. Our study not only reveals correlations between atypical combinations of knowledge and the emergence of sleeping beauties but also sheds light on the relationship between delayed recognition and disruptive paradigm shifts in scientific evolution, suggesting their potential role in shaping scientific breakthroughs. Furthermore, our analysis highlights the journey of delayed recognition, often culminating in significant contributions across diverse fields, including notable achievements, such as Nobel-worthy milestones. This article advances our understanding of scientific evolution and the complex landscape of acknowledging pioneering research.
Title: Revisiting delayed recognition in science: A large-scale and comprehensive study
Journal: Journal of Information Science
Pub Date: 2024-05-08 DOI: 10.1177/01655515241227533
Nursabrina Abdul Jalil, Suraya Hamid
Billions of web searches are recorded every day; however, little is known about the types of financial information that users search for. While many studies have investigated information exchanges in financial forums, this is the first study to identify the financial information needs of Internet users in Malaysia through an analysis of search queries. We identified financial topics and discovered subtopics of interest using text mining. We found that topics with high search volume were related to financial products and services, and very little was related to concepts and information that would increase financial knowledge. The results of this study can be used to develop more strategic online financial education content that not only meets users’ financial information needs but also increases their financial knowledge, especially when the financial knowledge of the global population has remained low over the years.
Title: What financial topics do people search for? An analysis of search queries using text mining
Journal: Journal of Information Science
Pub Date: 2024-04-29 DOI: 10.1177/01655515241244497
Somnath Bhattacharya, Shankar Prawesh
News reading is an important social activity, and news content providers and aggregators use recommender systems to help readers quickly find articles of interest. Such systems are designed to address a variety of challenges. Inspiration for algorithmic design is taken from various domains, which has resulted in an enormous body of literature. Different methods are also used to evaluate the recommendation algorithms. In this study, we review these developments and present three major components of news recommendation research. First, we list and categorise the challenges faced while designing news recommender systems. In particular, we list the different algorithmic designs used for generating personalised and non-personalised recommendations, and we discuss the major neural network architectures that are increasingly used for both collaborative and content-based recommender systems. Next, we list the two major evaluation methods and some popular datasets used in evaluation. Finally, we identify the emerging trends in news recommender research. We find that issues related to fake news, trust and the use of personal data for news recommendation are gaining wider attention, and deep learning methods are increasingly used to address these issues.
Title: A review of challenges, algorithms and evaluation methods in news recommendation
Journal: Journal of Information Science
Pub Date: 2024-04-29 DOI: 10.1177/01655515241238405
Jaihyun Park, JungHwan Yang, Amanda Tolbert, Katherine Bunsold
This study examines the code-switching behaviours of cross-platform social media users, specifically between Twitter and Parler, during the 2020 US Presidential Election. Using social identity theory as a framework, we examine messages related to voter fraud by users who migrated from Twitter to Parler following Twitter bans. Our analysis covers 38,798 accounts active on both platforms, comprising 1.5 million tweets and more than 100,000 parleys. The key findings are as follows. First, we discovered differing levels of network homophily between cross-platform users with high degree centrality and those with low degree centrality, illustrating how individuals with varying degrees of influence engage differently across platforms. Second, we observed higher toxicity levels in heterogeneous networks, which include both in-group and out-group members, compared with homogeneous networks that are primarily composed of in-group members. This suggests that the level of toxicity in online spaces correlates with the level of group diversity. Third, we found that cross-platform users created distinctive discourse communities with in-group and out-group members, indicating that content and discussions within these networks are influenced by the social identity dynamics of the users. Our study contributes to current research in political communication and information science by proposing comparative user analyses across multiple social media platforms. Focusing on a critical period of platform transition during a contentious political event, our study offers insights into the dynamics of online communities and the shifting nature of political language used by social media users.
Title: You change the way you talk: Examining the network, toxicity and discourse of cross-platform users on Twitter and Parler during the 2020 US Presidential Election
Journal: Journal of Information Science
Pub Date: 2024-04-20 DOI: 10.1177/01655515241245952
Henrik Karlstrøm, Dag W Aksnes, Fredrik N Piro
The main objective of the open access (OA) movement is to make scientific literature freely available to everyone. This may be of particular importance to researchers in lower-income countries, who often face barriers due to high subscription costs. In this article, we address this issue by analysing over time the reference lists of scientific publications around the world. Our study focuses on key issues, including whether researchers from lower-income countries reference fewer publications in their research and how this trend evolves over time. We also investigate whether researchers from lower-income countries rely more on the literature that is openly available through different OA routes compared with other researchers. Our study revealed that the proportion of OA references has increased over time for all publications and country groups. However, publications from lower-income countries have seen a higher growth rate of OA-based references, suggesting that the emergence of OA publishing has been particularly advantageous to researchers in these countries.
Title: Benefits of open access to researchers from lower-income countries: A global analysis of reference patterns in 1980–2020
Journal: Journal of Information Science
Pub Date: 2024-04-09 DOI: 10.1177/01655515241244460
Jorge Revez, Luís Corujo
How are scientists coping with misinformation and disinformation? Focusing on the triangle scientists/mis-disinformation/behaviour, this study systematically reviews the literature to answer three research questions: What are the main approaches described in the literature concerning scientists’ behaviour towards mis-disinformation? Which techniques or strategies are discussed to tackle information disorder? Is there a research gap in including scientists as subjects of research projects concerning strategies for tackling information disorder? Following the PRISMA 2020 statement, a checklist and flow diagram for reporting systematic reviews, a set of 14 documents was analysed. Findings revealed that the literature might be interpreted following Wilson and Maceviciute’s model as creation, acceptance and dissemination categories. Crossing over these categories, we advanced three standpoints from which to analyse scientists’ positions towards mis-disinformation: inside, inside-out and outside-in. The stage ‘Creation/facilitation’ was the least present in our sample, but ‘Use/rejection/acceptance’ and ‘Dissemination’ were depicted in the literature retrieved. Most of the approaches in the literature took inside-out perspectives, meaning that the topic is mainly studied in relation to communication issues. Regarding strategies against information disorder, findings suggest that preventive and reactive strategies are used simultaneously. A strong appeal for a multidisciplinary effort against mis-disinformation is widely present, but there is a gap in including scientists as subjects of research projects.
Title: Scientists’ behaviour towards information disorder: A systematic review
Journal: Journal of Information Science
Pub Date: 2024-03-26 DOI: 10.1177/01655515241238401
Ziqiang Zeng, Cuicui Jia, Weiye Zhang, Xinyi Zhuang, Xuan Li
Journal assessment is of great significance in promoting the development of academic platforms. Citation analysis is a recognised tool for assessing the citation performance of journals. However, owing to shortcomings such as the inflated Journal Impact Factor and the heterogeneity of citations, additional dimensions need to be considered alongside article citations. This article constructs a three-dimensional journal assessment framework to measure the comprehensive influence of a journal based on article citation, author influence and institution influence. The CRITIC-Entropy weighting method is employed to calculate the weighted average score for each of the three dimensions. Then, a newly defined Pareto-dominated set-based sum of TOPSIS score (PDS-based STS) approach is developed to assess the comprehensive influence of journals. A sample of 76 journals in the field of Economics is selected to demonstrate the effectiveness and validity of the proposed assessment method. The Chartered Association of Business Schools’ Academic Journal Guide 2021 (CABS-AJG 2021), an expert-based journal rating, is chosen as the baseline. When the CABS-AJG 2021 ratings are used as the benchmark, the assessment method using PDS-based STS yields a more rational journal ranking than one based on the Journal Impact Factor.
Title: Assessing journals through a three-dimensional framework based on article citation, author and institution influence
Journal: Journal of Information Science
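As a rough illustration of the weighting-and-ranking idea in the abstract above, the following Python sketch computes plain entropy weights and standard TOPSIS closeness scores for three hypothetical journals. It is not the authors' method: the CRITIC-Entropy combination and the PDS-based STS aggregation are not specified in the abstract and are not reproduced here, and the journal names and scores are invented for illustration.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of a positive decision matrix.

    Assumes at least two rows and that the columns are not all uniform.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    k = 1.0 / math.log(n_rows)
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Normalised Shannon entropy of the column; 0 * log(0) is taken as 0.
        entropy = -k * sum((v / total) * math.log(v / total) for v in column if v > 0)
        divergences.append(1.0 - entropy)
    divergence_sum = sum(divergences)
    return [d / divergence_sum for d in divergences]


def topsis(matrix, weights):
    """Relative closeness of each row to the ideal solution.

    All criteria are treated as benefits (larger is better). Assumes the rows
    are not all identical, otherwise both distances would be zero.
    """
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    best = [max(row[j] for row in weighted) for j in range(n_cols)]
    worst = [min(row[j] for row in weighted) for j in range(n_cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical scores per journal: [article citation, author influence, institution influence]
names = ["Journal A", "Journal B", "Journal C"]
matrix = [
    [3.0, 3.0, 4.0],
    [1.0, 1.0, 1.0],
    [2.0, 3.0, 2.0],
]

weights = entropy_weights(matrix)
scores = topsis(matrix, weights)

# Journal A is best on every criterion and Journal B is worst on every
# criterion, so their closeness scores are 1.0 and 0.0 respectively.
for name, score in sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Entropy weighting gives more weight to criteria whose values differ more across journals, and TOPSIS then ranks journals by how close they are to the best observed value on every criterion and how far from the worst.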
Pub Date: 2024-03-21 DOI: 10.1177/01655515241230793
Farid Uddin, Yibo Chen, Zuping Zhang, Xin Huang
Modelling short text is challenging because the small number of word co-occurrences and insufficient semantic information affect downstream Natural Language Processing (NLP) tasks, for example, text classification. Gathering information from external sources is expensive and may increase noise. For efficient short text classification without depending on external knowledge sources, we propose Expressive Short text Classification (EStC). EStC consists of a novel document context-aware, semantically enriched topic model called the Short text Topic Model (StTM) that captures word, topic and document semantics in a joint learning framework. In StTM, the probability of predicting a context word involves the topic distribution of word embeddings and the document vector as the global context. The document vector is obtained on the fly by weighted averaging of word embeddings, simultaneously with the topic distribution of words, without requiring an additional inference method for the document embedding. EStC represents documents in an expressive (number of topics × number of word embedding features) embedding space and uses a linear support vector machine (SVM) classifier for their classification. Experimental results demonstrate that EStC outperforms many state-of-the-art language models in short text classification on several publicly available short text data sets.
Title: Short text classification using semantically enriched topic model
Journal: Journal of Information Science
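To make the "number of topics × number of word embedding features" document space more concrete, here is a minimal Python sketch of one plausible construction. It is an assumption, not the procedure stated in the abstract: each word's embedding is spread across topics in proportion to that word's topic probabilities, and the resulting topic-by-feature blocks are averaged over the document. The function name, the toy vocabulary and all numbers are hypothetical, and the StTM training step and the linear SVM classifier are omitted.

```python
def document_vector(tokens, word_topics, word_vectors, num_topics, dim):
    """Flattened (num_topics x dim) representation of a document.

    word_topics maps a word to its topic probabilities (length num_topics).
    word_vectors maps a word to its embedding (length dim).
    Words missing from either table are skipped.
    """
    doc = [0.0] * (num_topics * dim)
    used = 0
    for token in tokens:
        if token not in word_topics or token not in word_vectors:
            continue
        topics = word_topics[token]
        vector = word_vectors[token]
        # Outer product of topic probabilities and embedding, accumulated per word.
        for k in range(num_topics):
            for d in range(dim):
                doc[k * dim + d] += topics[k] * vector[d]
        used += 1
    if used:
        doc = [value / used for value in doc]
    return doc


# Hypothetical toy tables: 2 topics, 3 embedding features.
word_topics = {"cheap": [0.9, 0.1], "flight": [0.2, 0.8]}
word_vectors = {"cheap": [1.0, 0.0, 2.0], "flight": [0.0, 1.0, 1.0]}

doc = document_vector(["cheap", "flight", "unknown"], word_topics, word_vectors, 2, 3)
print(len(doc))  # 2 topics x 3 features = 6 values
print(doc)
```

A vector of this shape could then be passed to any linear classifier; the abstract names a linear SVM for that step.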
Pub Date : 2024-03-11 DOI: 10.1177/01655515241233742
Jiawei Peng, Yong Mi, Zhenwen Ren, Yu Kang
Multi-view clustering (MVC) has achieved promising performance gains over traditional single-view clustering by exploiting the complementary information in multiple views. However, existing MVC methods learn the clustering structure through a single-layer mapping, so they cannot capture the deep-level semantic information in complex and interleaved multi-view data. Moreover, existing methods usually perform multi-view fusion and clustering separately, which degrades performance. To address these problems, one-step MVC via deep-level semantics exploiting (DLSE) is proposed to exploit deep-level semantic information and learn the indicator matrix in one step. Specifically, a novel deep matrix factorisation (DMF) paradigm is designed to exploit the hierarchical semantics via a layer-wise scheme, so that samples from the same cluster are pulled closer together in the low-dimensional space layer by layer. Furthermore, so that the learned representation preserves the local geometric structure of the data, DLSE introduces a local preservation regularisation to guide DMF. Meanwhile, by employing spectral rotating fusion, the cluster indicator can be obtained directly. Extensive experiments demonstrate the superiority of DLSE over several state-of-the-art methods.
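The layer-wise idea behind deep matrix factorisation can be illustrated with a minimal sketch. It is not the authors' DLSE algorithm: the abstract does not give the objective, constraints or solver, so the sketch assumes a single view, uses a truncated SVD at each layer, and omits the local preservation regularisation and the spectral rotating fusion. The names `deep_factorise`, `reconstruct` and `layer_dims` are hypothetical.

```python
import numpy as np


def deep_factorise(X, layer_dims):
    """Layer-wise factorisation X ~= Z1 @ Z2 @ ... @ Zm @ Hm.

    Assumption: each layer is solved with a truncated SVD for brevity.
    X has shape (features, samples); layer_dims lists the shrinking
    dimensionality of the representation at each layer.
    """
    factors, H = [], X
    for k in layer_dims:
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Z = U[:, :k]               # basis matrix of this layer
        H = s[:k, None] * Vt[:k]   # k-dimensional representation of the samples
        factors.append(Z)
    return factors, H


def reconstruct(factors, H):
    """Multiply the factors back together, deepest layer first."""
    out = H
    for Z in reversed(factors):
        out = Z @ out
    return out


# Synthetic rank-3 data: 20 features, 50 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 50))

# Two layers: 20 -> 8 -> 3 dimensions.
factors, H = deep_factorise(X, layer_dims=[8, 3])
print([Z.shape for Z in factors], H.shape)
print(np.allclose(reconstruct(factors, H), X))
```

In this sketch the deepest representation `H` holds one low-dimensional column per sample. A method such as the one described would derive the cluster indicator from representations of this kind across all views, which the sketch does not attempt.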
{"title":"One-step multi-view clustering via deep-level semantics exploiting","authors":"Jiawei Peng, Yong Mi, Zhenwen Ren, Yu Kang","doi":"10.1177/01655515241233742","DOIUrl":"https://doi.org/10.1177/01655515241233742","url":null,"PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":"41 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140106125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}