Relevance meets calibration: Triple calibration distance design for neighbour-based recommender systems
Zhuang Chen, Haitao Zou, Hualong Yu, Shang Zheng, Shang Gao
Pub Date: 2023-07-07, DOI: 10.1177/01655515231182069
Calibrated recommendation aims to reflect the various preferences of a user, in appropriate proportions, in the recommendation list. Most existing calibration-oriented methods take an extra postprocessing step to rerank the initial outputs. However, this postprocessing strategy may decrease recommendation relevance, since the originally accurate outputs are scattered, and it usually ignores the calibration between pairwise users/items. Instead of reranking the recommendation outputs, this article modifies the criterion for selecting neighbour users, aiming to strengthen recommendation relevance by calibrating the neighbourhood. We propose first-order, second-order and third-order calibration distances, motivated by the idea that if a user has a genre distribution or genre rating schema similar to that of the target user, then his or her suggestions will be more useful for rating prediction. We also provide an equivalent transformation of the original method that speeds up the algorithm, with solid theoretical proof. Experimental analysis on two publicly available data sets shows empirically that our approaches outperform several state-of-the-art methods in terms of recommendation relevance, calibration and efficiency.
{"title":"Relevance meets calibration: Triple calibration distance design for neighbour-based recommender systems","authors":"Zhuang Chen, Haitao Zou, Hualong Yu, Shang Zheng, Shang Gao","doi":"10.1177/01655515231182069","DOIUrl":"https://doi.org/10.1177/01655515231182069","url":null,"abstract":"Calibrated recommendations are devoted to revealing the various preferences of users with the appropriate proportions in the recommendation list. Most of the existing calibrated-oriented recommendations take an extra postprocessing step to rerank the initial outputs. However, applying this postprocessing strategy may decrease the recommendation relevance, since the origin accurate outputs have been scattered, and they usually ignore the calibration between pairwise users/items. Instead of reranking the recommendation outputs, this article is dedicated to modifying the criterion of neighbour users’ selection, where we look forward to strengthening the recommendation relevance by calibrating the neighbourhood. We propose the first-order, second-order and the third-order calibration distance based on the motivation that if a user has a similar genre distribution or genre rating schema towards the target user, then his or her suggestions will be more useful for rating prediction. We also provide an equivalent transformation for the original method to speed up the algorithm with solid theoretical proof. Experimental analysis on two publicly available data sets empirically shows that our approaches are better than some of the state-of-the-art methods in terms of recommendation relevance, calibration and efficiency.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45064146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Exploring interdisciplinarity of science projects based on the text mining
Zhang Xue, Zhiqiang Zhang, Zhengyin Hu
Pub Date: 2023-07-03, DOI: 10.1177/01655515231182075
Interdisciplinary research has gradually become one of the main driving forces of original innovation in scientific research, and how to measure the interdisciplinarity of science projects is becoming an important topic in science foundation management. Existing research mainly measures interdisciplinarity using methods such as authors' academic degrees, institutional disciplines or the discipline categories of journals. This study proposes an approach that mines and captures the distinct or complementary characteristics of project interdisciplinarity by combining text mining and machine learning methods. First, we construct the classification system and extract, for each raw paper, a discipline matrix according to the discipline categories of the journals in which its references were published. Second, we reduce the matrix to summarise the distribution of key disciplines in each paper and extract text features from the abstract and title to form a training set. Finally, we compare and analyse the classification performance of the Naive Bayes model, the Support Vector Machine and Bidirectional Encoder Representations from Transformers (BERT). The model evaluation indicators show that BERT achieves the best classification performance, so this deep pre-trained language model is chosen to predict the discipline distribution of each project. In addition, different aspects of interdisciplinarity are measured using network coherence and discipline diversity indicators, and experts are invited to evaluate and interpret the results. The proposed approach can be applied to understand discipline integration in depth from a new perspective.
{"title":"Exploring interdisciplinarity of science projects based on the text mining","authors":"Zhang Xue, Zhiqiang Zhang, Zhengyin Hu","doi":"10.1177/01655515231182075","DOIUrl":"https://doi.org/10.1177/01655515231182075","url":null,"abstract":"Interdisciplinary research has gradually become one of the main driving forces to promote original innovation of scientific research, and how to measure the interdisciplinarity of science project is becoming an important topic in the science foundation managements. Existing researches mainly using methods, such as academic degree or institutional discipline or discipline category mapping of journals, to measure the interdisciplinarity. This study proposes an approach to mine and capture the different or complementary characteristics of interdisciplinarity of projects by combining text mining and machine learning methods. First, we construct the classification system and extract a raw paper and its discipline matrix according to the discipline category of journals where the references were published in. Second, we cut the matrix to summarise the distribution of key disciplines in each paper and extract the text features in the abstract and title to form a training set. Finally, we compare and analyse the classification effects of Naive Bayesian Model, Support Vector Machine and Bidirectional Encoder Representations from Transformers (BERT) model. Then, the model evaluation indicators show that the best classification effect was achieved by the BERT model. Therefore, the deep pre-trained linguistic model BERT is chosen to predict the discipline distribution of each project. In addition, the different aspects of interdisciplinarity are measured using network coherence and discipline diversity indicators. Besides, experts are invited to evaluate and interpret the results. This proposed approach could be applied to deeply understand the discipline integration from a new perspective.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42368051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A mathematical analysis of the h-index and study of its correlation with some improved metrics: A conceptual approach
Dhrubajyoti Borgohain, B. Lund, M. Verma
Pub Date: 2023-07-03, DOI: 10.1177/01655515231184832
This article aims to express the h-index in different mathematical forms and to measure the correlation of the h-index with other metrics, using bibliographic data from selected journals in the Library and Information Science (LIS) domain. The h-index is expressed using the concepts of relation and function. Data collected from authors in three major LIS journals, including h-index, g-index, m-index, total citations and total publications, are analysed using correlation and regression analyses. The findings indicate a high level of relationship between h-index and g-index scores and with the m-index at the starting year of publication; conversely, a lower level of relationship is found between the h-index, the g-index, the starting year of publication, the total number of publications and the number of citations. This original study will interest LIS researchers who want to understand this performance indicator in its different aspects, the dependence or independence of the h-index with respect to other metrics, and the impact and performance of the three journals over time in terms of h-index, m-index, total citations and number of publications. The study is limited to the analysis of performance- and citation-related measurements: h-index, g-index, m-index, total citations and number of publications.
{"title":"A mathematical analysis of the h-index and study of its correlation with some improved metrics: A conceptual approach","authors":"Dhrubajyoti Borgohain, B. Lund, M. Verma","doi":"10.1177/01655515231184832","DOIUrl":"https://doi.org/10.1177/01655515231184832","url":null,"abstract":"This article aims to establish the h-index in different mathematical aspects and measure the correlation of the h-index with other metrics using the bibliographic data of selected journals in the Library and Information Science (LIS) domain. Using the concept of relation and function, the h-index is expressed. Data collected from authors in three major LIS journals, including h-index, g-index, m-index, total citations and total publications, are analysed using correlation and regression analyses. The findings indicate a high level of relationship between h-index and g-index scores and m-index at starting year of publication. Conversely, a lower level of relationship between the h-index, g-index, starting year of publication, the total number of publications and the number of citations. This is an original study and will be of interest to the researchers of LIS who wants to know this performance indicator in different aspects, the dependency/independency of the h-index with other metrics, and the impact/performance of the three journals over time in terms of h-index, m-index, total citations and the number of publications. The study is limited to analysis of performance-and-citation-related measurements h-index, g-index, m-index, total citations, and the number of publications.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41619474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Scientometric review of Web 3.0
Dhaval Kukreja, Shikha Gupta, Dheeraj Patel, J. Rai
Pub Date: 2023-06-30, DOI: 10.1177/01655515231182073
Web 3.0 is a next-generation web architecture that envisions a more decentralised, secure and intelligent Internet. Its implications are vast and could potentially impact various research areas, such as e-commerce, social networking, finance, healthcare and education. It can be seen as a confluence of various technological advancements, including blockchain, artificial intelligence, the semantic web and decentralised web technologies, which have continued to attract substantial research interest across several dimensions and categories over the last few decades. A detailed scientometric analysis was undertaken to obtain a concise understanding of the development and publication trends of this multi-dimensional field. A corpus of 1154 articles extracted from Web of Science (2002-2022) was used to identify networks of co-authorship, keywords, subject categories, institutions and countries engaged in publishing on Web 3.0, along with co-citation and cluster analyses. Networks and interactive visualisations created with CiteSpace revealed new research areas where Web 3.0 may be beneficial and potential directions of development for the Web 3.0 discipline. We identify Journalism 3.0, Personal Data Stores (PDS), Decentralised File Storage (DFS) and the Metaverse as emerging domains of Web 3.0 research that are attracting overwhelming research attention globally.
{"title":"Scientometric review of Web 3.0","authors":"Dhaval Kukreja, Shikha Gupta, Dheeraj Patel, J. Rai","doi":"10.1177/01655515231182073","DOIUrl":"https://doi.org/10.1177/01655515231182073","url":null,"abstract":"Web 3.0 is a next-generation web architecture that envisions a more decentralised, secure and intelligent Internet. Its implications are vast and could potentially impact various research areas, such as e-commerce, social networking, finance, healthcare and education. It can be seen as a confluence of various technological advancements, including blockchain, artificial intelligence, semantic web and decentralised web technologies, which continue to attract substantial research interest in several dimensions and categories throughout the last few decades. A detailed scientometric analysis was undertaken to obtain concise understanding on development and publication trends of this multi-dimensional field. Corpus of 1154 articles, extracted from Web of Science from 2002 to 2022, were used to identify networks of co-authorship, keywords, subject categories, institutions and countries engaged in publishing on Web 3.0 along with co-citation and cluster analysis. Networks and interactive visualisations created using CiteSpace revealed new research areas where Web 3.0 may be beneficial and potential directions of development for Web 3.0 discipline. We identify Journalism 3.0, Personal Data Stores (PDS), Decentralised File Storages (DFS) and Metaverse as emerging domains Web 3.0 research, seeking overwhelming research attention globally.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48033695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Modelling and analysis of misinformation diffusion based on the double intervention mechanism
Cheng Jiang, Yong-tian Yu, Xinyu Zhang
Pub Date: 2023-06-26, DOI: 10.1177/01655515231182076
Although official departments attempt to intervene against misinformation, activity in the personal sphere often conflicts with these departments' goals. Thus, when rumours spread widely on social media, decision-makers often use a combination of rigid and soft control measures, such as blocking keywords, deleting misinformation, suspending accounts or refuting misinformation, to reduce its diffusion. However, existing methods rarely consider the interplay of blocking and rebuttal measures, leaving the effect of the double intervention mechanism unclear. To address these issues, we propose a novel misinformation diffusion model called SEIRI (susceptible-exposed-infective-removed-infective) that considers both the double intervention mechanism and secondary diffusion characteristics. We analyse the stability of the proposed model, obtain the rumour-free and rumour-spread equilibria and calculate the basic reproduction number. Furthermore, we conduct numerical simulations through comparative experiments to analyse the influence of key parameters. Finally, we validate the effectiveness of the proposed approach on a real-world data set of COVID-19-related misinformation posts crawled from Sina Weibo. Comparison experiments with similar works show that the SEIRI model characterises the actual spread of misinformation better. Our findings have several practical implications for public health policymaking.
{"title":"Modelling and analysis of misinformation diffusion based on the double intervention mechanism","authors":"Cheng Jiang, Yong-tian Yu, Xinyu Zhang","doi":"10.1177/01655515231182076","DOIUrl":"https://doi.org/10.1177/01655515231182076","url":null,"abstract":"Although official departments attempt to intervene against misinformation, the personal field often conflicts with the goals of these departments. Thus, when rumours spread widely on social media, decision-makers often use a combination of rigid and soft control measures, such as blocking keywords, deleting misinformation, suspending accounts or refuting misinformation, to decrease the diffusion of misinformation. However, existing methods rarely consider the interplay of blocking and rebuttal measures, resulting in an unclear effect of the double intervention mechanism. To address these issues, we propose a novel misinformation diffusion model called SEIRI (susceptible, exposed, infective, removed, and infective) that considers the double intervention mechanism and secondary diffusion characteristics. We analyse the stability of the proposed model, obtain rumour-free and rumour-spread equilibriums, and calculate the basic reproduction number. Furthermore, we conduct numerical simulations to analyse the influence of key parameters through comparative experiments. Finally, we validate the effectiveness of the proposed approach by crawling a real-world data set of COVID-19-related misinformation tweets from Sina Weibo. Our comparison experiments with other similar works show that the SEIRI model provides superior performance in characterising the actual spread of misinformation. Our findings lead to several practical implications for public health policymaking.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42374534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Investigating the reviewer assignment problem: A systematic literature review
Ana Carolina Ribeiro, Amanda Sizo, Luís Paulo Reis
Pub Date: 2023-06-26, DOI: 10.1177/01655515231176668
The assignment of appropriate reviewers to academic articles, known as the reviewer assignment problem (RAP), has become a crucial issue in academia. While there has been much research on RAP, there has not yet been a systematic literature review (SLR) examining the various approaches, techniques, algorithms and discoveries related to this topic. To conduct the SLR, we identified and evaluated relevant articles from four databases using defined inclusion and exclusion criteria, analysed the selected articles, extracted information and assessed their quality. Our review identified 67 articles on RAP published in conferences and journals up to mid-2022. As one of the main challenges in RAP is acquiring open data, we studied the data sources used by researchers and found that most studies use real data from conferences, bibliographic databases and online academic search engines. RAP is divided into two main phases: (1) finding/recommending expert reviewers and (2) assigning reviewers to submitted manuscripts. In Phase 1, decision support systems, recommendation systems and machine learning-oriented approaches are more commonly used because they produce better results. In Phase 2, heuristics and metaheuristics produce better results and are consequently more commonly used by researchers. Based on the analysed studies, we identify potential areas for future research that could lead to improved results, specifically exploring deep neural networks for calculating the degree of correspondence and using the Boolean satisfiability problem to optimise the assignment process.
{"title":"Investigating the reviewer assignment problem: A systematic literature review","authors":"Ana Carolina Ribeiro, Amanda Sizo, Luís Paulo Reis","doi":"10.1177/01655515231176668","DOIUrl":"https://doi.org/10.1177/01655515231176668","url":null,"abstract":"The assignment of appropriate reviewers to academic articles, known as the reviewer assignment problem (RAP), has become a crucial issue in academia. While there has been much research on RAP, there has not yet been a systematic literature review (SLR) examining the various approaches, techniques, algorithms and discoveries related to this topic. To conduct the SLR, we identified and evaluated relevant articles from four databases using defined inclusion and exclusion criteria. We analysed the selected articles and extracted information, and assessed their quality. Our review identified 67 articles on RAP published in conferences and journals up to mid-2022. As one of the main challenges in RAP is acquiring open data, we have studied the data sources used by researchers and found that most studies use real data from conferences, bibliographic databases and online academic search engines. RAP is divided into two main phases: (1) finding/recommending expert reviewers and (2) assigning reviewers to submitted manuscripts. In Phase 1, we have identified that decision support systems, recommendation systems, and machine learning-oriented approaches are more commonly used due to better results. In Phase 2, heuristics and metaheuristics are the approaches that present better results and are consequently more commonly used by researchers. Based on the analysed studies, we have identified potential areas for future research that could lead to improved results. Specifically, we suggest exploring the application of deep neural networks for calculating the degree of correspondence and using the Boolean satisfiability problem to optimise the attribution process.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45537641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Does scientific collaboration variety influence the impact of articles?
Hou Jianhua, Yuyao He, Conglin Ye, Zhuomalamu, Suolanglamu, Song Haoyang
Pub Date: 2023-06-22, DOI: 10.1177/01655515231182067
This study investigates the relationship between scientific collaboration variety and scientific output in a specific field. Indicators were defined for co-author variety (co-authors' academic background variety and network structure variety, as independent variables) and article impact (academic impact and social impact, as dependent variables). To account for other factors affecting the results, we also set control variables (the number of co-authors, the proportion of high-level authors and the ratio of highly productive authors). Using the Scopus database as the data source, we collected all articles published in the dental field in 2018. We used multiple linear regression analyses to examine the impact of co-author variety on article impact. The results demonstrate that the relationship between scientific collaboration variety and article impact is complicated and depends on the type of variety of the collaborating scientists. Meanwhile, each variety indicator presents consistent results across the correlation analyses of academic impact and social impact. The findings indicate that authors can improve their scientific output by collaborating with authors of similar academic backgrounds or with stable groups of authors, which provides guidance for scientific cooperation.
{"title":"Does scientific collaboration variety influence the impact of articles?","authors":"Hou Jianhua, Yuyao He, Conglin Ye, Zhuomalamu, Suolanglamu, Song Haoyang","doi":"10.1177/01655515231182067","DOIUrl":"https://doi.org/10.1177/01655515231182067","url":null,"abstract":"This study attempts to investigate the relationship between scientific collaboration variety and scientific output in a specific field. The indicators were set from co-author variety (co-author’s academic background variety and co-author’s network structure variety as independent variables) and article impact (academic impact and social impact as dependent variables). Considering other factors affecting the research results, we also set up control variables (the number of co-authors, proportion of high-level authors and ratio of highly productive authors). We used the Scopus database as the data source and collected all articles published in the dental field in 2018 as data. We used multiple linear regression analyses to examine the impact of co-authors’ variety on the article impact. The results demonstrate that the relationship between scientific collaboration variety and article impact is complicated, which depends on the type of variety of the cooperative scientist. Conversely, the same variety indicator presents the same results as the correlation analysis of academic and social impact articles. The findings indicate that authors can improve their scientific output by collaborating with similar authors of academic backgrounds or stable groups of authors, which provides guidance for scientific cooperation.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43142108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Analysis and prediction of the formation of new technical phrases for inventive ideation
Haiying Ren, Lu Zhang, Chao Wang
Pub Date: 2023-06-08, DOI: 10.1177/01655515231171090
Despite the fast pace of technological development, the process of inventive ideation remains fuzzy. Meanwhile, fierce competition has made improving innovation efficiency critical for research and development (R&D) teams. This study argues that new technical phrases (NTPs) are important carriers of novel inventive ideas and that their formation is key to understanding and improving ideation processes. This article therefore proposes a methodology to analyse and predict the formation of NTPs. First, based on recombinant search theory and link prediction, four variables of the prior co-word network of a phrase that may influence its formation are collected. Thereafter, logistic regression and a classification tree are employed on patent data to explore the effects of these variables on NTPs. Moreover, various machine learning methods are used to develop NTP prediction models, and procedures for applying the prediction models in real-world R&D settings are designed. Finally, a case study in neural network technology is conducted to demonstrate and validate the proposed methodology. The case study reveals that all four variables have a significant impact on the formation of NTPs, and the prediction models achieve a top accuracy of 78.6% on the test set. The proposed methodology sheds light on the ideation process in innovation theory and provides R&D teams with practical tools for generating new technical ideas.
{"title":"Analysis and prediction of the formation of new technical phrases for inventive ideation","authors":"Haiying Ren, Lu Zhang, Chao Wang","doi":"10.1177/01655515231171090","DOIUrl":"https://doi.org/10.1177/01655515231171090","url":null,"abstract":"Despite the fast pace of technological development, the process of inventive ideation remains fuzzy. Meanwhile, improving innovation efficiency has become critical for research and development (R&D) teams because of the fierce competition. This study claimed that new technical phrases (NTPs) were important carriers of novel inventive ideas, and their formation was key to understanding and improving ideation processes. Therefore, this article proposed a methodology to analyse and predict the formation of NTPs. First, based on the recombinant search theory and link prediction, four variables in the prior co-word network of a phrase that may influence its formation were collected. Thereafter, logistic regression and a classification tree were employed on patent data to explore the effects of these variables on NTPs. Moreover, various machine learning methods were used for developing NTP prediction models, and procedures for applying the prediction models in real-world R&D settings were designed. Finally, a case study was conducted using the proposed methodology for its demonstration and validation in neural network technology. The case study revealed that all the four variables posed significant impact on the formation of NTPs, and the prediction models yielded the highest prediction accuracy of 78.6% on the test set. The proposed methodology would shed light on the ideation process in innovation theory and provide R&D teams with practical tools for generating new technical ideas.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46199024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Hybrid approach for text categorization: A case study with Bangla news article
Ankita Dhar, Himadri Mukherjee, K. Roy, K. Santosh, N. Dash
Pub Date: 2023-06-01, DOI: 10.1177/01655515211027770, Journal of Information Science 49, pp. 762-777
The incredible expansion of online text due to the Internet has intensified and revived interest in sorting, managing and categorising documents into their respective domains. This shows the pressing need for an automatic text categorisation system that assigns a document to its appropriate domain. This article showcases the effectiveness of a hybrid approach that combines text-based and graph-based features. The hybrid approach was applied to 14,373 Bangla articles with 5,722,569 tokens collected from various online news corpora covering nine categories. The article also presents the individual application of both feature types to explicate how they generally work. For classification, the feature sets were passed to Bayesian classification methods, which yield satisfactory results, with 98.73% accuracy for Naïve Bayes Multinomial (NBM). To test the robustness and language independence of the system, the experiments were also performed on two popular English datasets.
{"title":"Hybrid approach for text categorization: A case study with Bangla news article","authors":"Ankita Dhar, Himadri Mukherjee, K. Roy, K. Santosh, N. Dash","doi":"10.1177/01655515211027770","DOIUrl":"https://doi.org/10.1177/01655515211027770","url":null,"abstract":"The incredible expansion of online texts due to the Internet has intensified and revived the interest of sorting, managing and categorising the documents into their respective domains. This shows the pressing need for automatic text categorization system to assign a document into its appropriate domain. In this article, the focus is on showcasing the effectiveness of a hybrid approach that works elegantly by combining text-based and graph-based features. The hybrid approach was applied on 14,373 Bangla articles with 57,22,569 tokens collected from various online news corpora covering nine categories. This article also presents the individual application of both the features to explicate how they generally work. For classification purposes, the feature sets were passed through the Bayesian classification methods which yield satisfactory results with 98.73% accuracy for Naïve Bayes Multinomial (NBM). Also, to test the robustness and language independency of the system, the experiments were performed on two popular English datasets as well.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":"49 1","pages":"762 - 777"},"PeriodicalIF":2.4,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46025373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Relevance feedback for building pooled test collections
David Otero, Javier Parapar, Álvaro Barreiro
Pub Date: 2023-05-26, DOI: 10.1177/01655515231171085
Offline evaluation of information retrieval systems depends on test collections. These datasets provide researchers with a corpus of documents, topics and relevance judgements indicating which documents are relevant for each topic. Gathering the judgements is costly, requiring human assessors to judge the documents, so experts usually judge only a portion of the corpus. The most common approach for selecting that subset is pooling. By intelligently choosing which documents to assess, it is possible to maximise the number of positive labels for a given budget. For this reason, much work has focused on developing techniques to better select which documents from the corpus merit human assessment. In this article, we propose using relevance feedback to prioritise the documents when building new pooled test collections. We explore several state-of-the-art statistical feedback methods for prioritising the documents the algorithm presents to the assessors. A thorough comparison on eight Text Retrieval Conference (TREC) datasets against strong baselines shows that, among other results, our proposals retrieve relevant documents with lower assessment effort than other state-of-the-art adjudication methods, without harming reliability, fairness or reusability.
{"title":"Relevance feedback for building pooled test collections","authors":"David Otero, Javier Parapar, Álvaro Barreiro","doi":"10.1177/01655515231171085","DOIUrl":"https://doi.org/10.1177/01655515231171085","url":null,"abstract":"Offline evaluation of information retrieval systems depends on test collections. These datasets provide the researchers with a corpus of documents, topics and relevance judgements indicating which documents are relevant for each topic. Gathering the latter is costly, requiring human assessors to judge the documents. Therefore, experts usually judge only a portion of the corpus. The most common approach for selecting that subset is pooling. By intelligently choosing which documents to assess, it is possible to optimise the number of positive labels for a given budget. For this reason, much work has focused on developing techniques to better select which documents from the corpus merit human assessments. In this article, we propose using relevance feedback to prioritise the documents when building new pooled test collections. We explore several state-of-the-art statistical feedback methods for prioritising the documents the algorithm presents to the assessors. A thorough comparison on eight Text Retrieval Conference (TREC) datasets against strong baselines shows that, among other results, our proposals improve in retrieving relevant documents with lower assessment effort than other state-of-the-art adjudicating methods without harming the reliability, fairness and reusability.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41335507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}