Abstract. Purpose: Because knowledge graphs (KGs) are inherently incomplete, predicting missing links between entities is an important task. Many previous approaches are static, which means that all meanings of a polysemous entity share one embedding vector. This study proposes a polysemous embedding approach, KG embedding under relational contexts (ContE for short), for missing link prediction. Design/methodology/approach: ContE models and infers different relationship patterns by considering the context of the relationship, which is implicit in the local neighborhood of the relationship. The forward and backward impacts of the relationship in ContE are mapped to two different embedding vectors, which represent the contextual information of the relationship. Then, according to the position of the entity, the entity's polysemous representation is obtained by adding its static embedding vector to the corresponding context vector of the relationship. Findings: ContE is fully expressive; that is, given any ground truth over the triples, there are embedding assignments to entities and relations that precisely separate the true triples from the false ones. ContE is capable of modeling four connectivity patterns: symmetry, antisymmetry, inversion and composition. Research limitations: In practice, ContE needs a grid search to find the parameters that give the best performance, which is time-consuming. It sometimes requires longer entity vectors than other models to achieve better performance. Practical implications: ContE is a bilinear model, simple enough to be applied to large-scale KGs. By considering the contexts of relations, ContE can distinguish the exact meaning of an entity in different triples, so that when performing compositional reasoning it can infer the connectivity patterns of relations and achieve good performance on link prediction tasks. Originality/value: ContE considers the contexts of entities in terms of their positions in triples and the relationships they link to. It decomposes a relation vector into two vectors, a forward impact vector and a backward impact vector, in order to capture the relational contexts. ContE has the same low computational complexity as TransE. It therefore provides a new approach to contextualized knowledge graph embedding.
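The position-dependent representation described in this abstract can be illustrated with a minimal sketch. The abstract does not give ContE's actual scoring function, so the plain dot product used here is an assumption standing in for it, and the names `contextualize` and `score` are hypothetical.

```python
def contextualize(entity_vec, context_vec):
    """Polysemous representation: static entity embedding plus a relation context vector."""
    return [e + c for e, c in zip(entity_vec, context_vec)]


def score(head, rel_forward, rel_backward, tail):
    """Score a triple (head, relation, tail).

    The head is shifted by the relation's forward impact vector and the tail by its
    backward impact vector. The dot product is an assumed stand-in for the paper's
    scoring function, which the abstract does not specify.
    """
    h_ctx = contextualize(head, rel_forward)
    t_ctx = contextualize(tail, rel_backward)
    return sum(a * b for a, b in zip(h_ctx, t_ctx))


# Toy two-dimensional embeddings (made-up values, for illustration only).
head, tail = [1.0, 0.0], [0.0, 1.0]
forward, backward = [0.5, 0.5], [0.5, -0.5]

print(score(head, forward, backward, tail))  # triple (h, r, t)
print(score(tail, forward, backward, head))  # reversed triple (t, r, h)
```

Because the two relation vectors differ, swapping head and tail generally changes the score, which is what lets a model of this shape represent antisymmetric relations.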
{"title":"Learning Context-based Embeddings for Knowledge Graph Completion","authors":"Fei Pu, Zhongwei Zhang, Yangde Feng, Bailin Yang","doi":"10.2478/jdis-2022-0009","DOIUrl":"https://doi.org/10.2478/jdis-2022-0009","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"84 - 106"},"publicationDate":"2022-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"45530191"}
M. Thelwall, E. Stuart, Amalia Más-Bleda, Meiko Makita, Mahshid Abdoli
Abstract. Purpose: Performers may generate loyalty partly by eliciting illusory personal connections with their audience, parasocial relationships (PSRs), and individual illusory exchanges, parasocial interactions (PSIs). On social media, semi-PSIs are real but imbalanced exchanges with audiences, including through comments on influencers’ videos, and strong semi-PSIs are those that occur within PSRs. This article introduces and assesses an automatic method to detect videos with strong PSI potential. Design/methodology/approach: Strong semi-PSIs were hypothesized to occur when commenters used a variant of the pronoun “you”, typically addressing the influencer. Comments on the videos of UK female influencer channels were used to test whether the proportion of comments containing a you pronoun could be an automated indicator of strong PSI potential, and to find factors associated with the strong PSI potential of influencer videos. The videos with the highest and lowest strong PSI potential for 117 influencers were classified with content analysis for strong PSI potential and for evidence of factors that might elicit PSIs. Findings: The you pronoun proportion was effective at indicating a video's strong PSI potential, making it the first automated method to detect any type of PSI. Gazing at the camera, head and shoulders framing, discussing personal issues, and focusing on the influencer were associated with higher strong PSI potential for influencer videos. New social media factors found include requesting feedback and discussing the channel itself. Research limitations: Only one country, genre and social media platform were analysed. Practical implications: The method can be used to automatically detect YouTube videos with strong PSI potential, helping influencers to monitor their performance. Originality/value: This is the first automatic method to detect any aspect of PSI or PSR.
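The indicator described in this abstract, the proportion of comments that contain a variant of "you", can be sketched as follows. The abstract does not list the variants that were counted, so the word list in `YOU_PATTERN` is an assumption, and `you_proportion` is a hypothetical helper name.

```python
import re

# Assumed variant list: the abstract only says "a variant of the pronoun 'you'".
# Contractions such as "you're" match because the apostrophe ends the word "you".
YOU_PATTERN = re.compile(r"\b(?:you|your|yours|yourself)\b", re.IGNORECASE)


def you_proportion(comments):
    """Return the share of comments containing at least one 'you' variant."""
    if not comments:
        return 0.0
    hits = sum(1 for comment in comments if YOU_PATTERN.search(comment))
    return hits / len(comments)


# Made-up comments for illustration.
comments = [
    "You are so brave for sharing this",
    "Great lighting in this video",
    "I love your honesty",
    "First!",
]
print(you_proportion(comments))
```

Word boundaries keep words such as "youtube" from counting as a match. Videos could then be ranked by this proportion within each channel.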
{"title":"I’m Nervous about Sharing This Secret with You: Youtube Influencers Generate Strong Parasocial Interactions by Discussing Personal Issues","authors":"M. Thelwall, E. Stuart, Amalia Más-Bleda, Meiko Makita, Mahshid Abdoli","doi":"10.2478/jdis-2022-0011","DOIUrl":"https://doi.org/10.2478/jdis-2022-0011","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"31 - 56"},"publicationDate":"2022-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"43886029"}
Sichao Tong, Zhesi Shen, Tianyuan Huang, Liying Yang
{"title":"Fighting Against Academic Misconduct: What Can Scientometricians Do?","authors":"Sichao Tong, Zhesi Shen, Tianyuan Huang, Liying Yang","doi":"10.2478/jdis-2022-0013","DOIUrl":"https://doi.org/10.2478/jdis-2022-0013","url":null,"abstract":"","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"4 - 5"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43053860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"I Don’t Peer-Review for Non-Open Journals, and Neither Should You","authors":"Michael P. Taylor","doi":"10.2478/jdis-2022-0010","DOIUrl":"https://doi.org/10.2478/jdis-2022-0010","url":null,"abstract":"","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"1 - 3"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41991224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract. Purpose: Based on real-world academic data, this study uses network embedding technology to mine academic relationships and investigates the effectiveness of the proposed embedding model on academic collaborator recommendation tasks. Design/methodology/approach: We propose an academic collaborator recommendation model based on attributed network embedding (ACR-ANE), which obtains enhanced scholar embeddings and takes full advantage of the topological structure of the network and multi-type scholar attributes. Non-local neighbors of scholars are defined to capture strong relationships among scholars. A deep auto-encoder is adopted to encode the academic collaboration network structure and scholar attributes into a low-dimensional representation space. Findings: 1. The proposed non-local neighbors describe the relationships among scholars in the real world better than first-order neighbors do. 2. It is important to consider the structure of the academic collaboration network and the scholar attributes simultaneously when recommending collaborators for scholars. Research limitations: The designed method works for static networks and does not take network dynamics into account. Practical implications: The designed model embeds both the academic collaboration network structure and scholar attributes, and can be used to recommend potential collaborators to scholars. Originality/value: Experiments on two real-world scholarly datasets, Aminer and APS, show that our proposed method performs better than other baselines.
{"title":"Academic Collaborator Recommendation Based on Attributed Network Embedding","authors":"Ouxia Du, Ya Li","doi":"10.2478/jdis-2022-0005","DOIUrl":"https://doi.org/10.2478/jdis-2022-0005","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"37 - 56"},"publicationDate":"2022-02-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"42912037"}
Abstract. Purpose: This study explores the underlying research topics regarding CRISPR based on the LDA model and identifies trends in knowledge transfer from science to technology in this area over the latest 10 years. Design/methodology/approach: We collected publications on CRISPR between 2011 and 2020 from the Web of Science, and traced all the patents citing them from lens.org. In total, 15,904 articles and 18,985 patents were downloaded and analyzed. The LDA model was applied to identify underlying research topics in related research. In addition, some indicators were introduced to measure the knowledge transfer from research topics of scientific publications to IPC-4 classes of patents. Findings: The emerging research topics on CRISPR were identified and their evolution over time displayed. Furthermore, a big picture of knowledge transition from research topics to technological classes of patents was presented. We found that for all topics on CRISPR, the average first transition year, the ratio of articles cited by patents, and the NPR transition rate are 1.08, 15.57%, and 1.19, respectively, indicating a much shorter and more intensive transition than in general fields. Moreover, the transition patterns differ among research topics. Research limitations: Our research is limited to publications retrieved from the Web of Science and their citing patents indexed in lens.org. A limitation inherent in LDA analysis is the manual interpretation and labeling of “topics”. Practical implications: Our study provides good references for policy-makers on allocating scientific resources and regulating financial budgets to face challenges related to the transformative technology of CRISPR. Originality/value: The LDA model is applied here to topic identification in the area of transformative research for the first time, as exemplified by CRISPR. Additionally, the dataset of all citing patents in this area helps to provide a full picture of the knowledge transition between S&T.
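Two of the indicators named in this abstract can be sketched under assumed definitions, since the abstract does not define them: the ratio of articles cited by patents is taken as the share of articles with at least one citing patent, and the first transition year as the gap between an article's publication year and the year of its earliest citing patent. The record layout and function names are hypothetical, and the NPR transition rate is left out because its definition cannot be inferred from the abstract.

```python
from statistics import fmean


def cited_ratio(articles):
    """Share of articles cited by at least one patent (assumed definition)."""
    if not articles:
        return 0.0
    cited = [a for a in articles if a["patent_years"]]
    return len(cited) / len(articles)


def average_first_transition_lag(articles):
    """Mean gap in years between publication and the earliest citing patent.

    Only articles with at least one citing patent are included (assumed definition).
    Returns None when no article has been cited by a patent.
    """
    lags = [min(a["patent_years"]) - a["pub_year"] for a in articles if a["patent_years"]]
    return fmean(lags) if lags else None


# Made-up records for illustration only.
articles = [
    {"pub_year": 2015, "patent_years": [2016, 2018]},
    {"pub_year": 2016, "patent_years": []},
    {"pub_year": 2017, "patent_years": [2019]},
    {"pub_year": 2018, "patent_years": []},
]
print(cited_ratio(articles))
print(average_first_transition_lag(articles))
```

Grouping the records by LDA topic before calling these functions would give per-topic values of the kind the abstract compares.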
{"title":"Progress and Knowledge Transfer from Science to Technology in the Research Frontier of CRISPR Based on the LDA Model","authors":"Yushuang Lyu, Muqi Yin, Fangjie Xi, Xiaojun Hu","doi":"10.2478/jdis-2022-0004","DOIUrl":"https://doi.org/10.2478/jdis-2022-0004","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"1 - 19"},"publicationDate":"2022-02-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48706809"}
Pablo Dorta-González, María Isabel Dorta-González
Abstract. Purpose: Researchers are more likely to read and cite papers to which they have access than those that they cannot obtain. Thus, the objective of this work is to analyze the contribution of the Open Access (OA) modality to the impact of hybrid journals. Design/methodology/approach: The “research articles” published in 2017 in 200 hybrid journals in four subject areas, and the citations received by those articles in the period 2017–2020 in the Scopus database, were analyzed. The hybrid OA papers were compared with the paywalled ones. The journals were randomly selected from those with a share of OA papers higher than some minimal value. More than 60 thousand research articles were analyzed in the sample, of which 24% were published under the OA modality. Findings: At the journal level, citations per article in the two hybrid modalities (OA and paywalled) correlate strongly. However, there is no correlation between OA prevalence and citations per article. There is an OA citation advantage in 80% of hybrid journals. Moreover, the OA citation advantage is consistent across fields and holds over time. We obtain an OA citation advantage of 50% on average, and higher than 37% in half of the hybrid journals. Finally, the OA citation advantage is higher in Humanities than in Science and Social Science. Research limitations: Some of the citation advantage likely arises because wider access allows more people to read, and hence cite, articles they otherwise would not. However, causation is difficult to establish and there are many possible biases. Several factors can affect the observed differences in citation rates. Funder mandates can be one of them: funders are likely to have an OA requirement, and well-funded studies are more likely to receive more citations than poorly funded ones. Another factor discussed is the selection bias postulate, which suggests that authors choose only their most impactful studies to be open access. Practical implications: For hybrid journals, the open access modality is positive, in the sense that it provides a greater number of potential readers. This in turn translates into a greater number of citations and an improvement in the position of the journal in the rankings by impact factor. For researchers it is also positive, because it increases the potential number of readers and citations received. Originality/value: Our study refines previous results by comparing documents that are more similar to each other. Although it does not examine the cause of the observed citation advantage, we find that it exists in a very large sample.
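The journal-level comparison in this abstract can be illustrated with a minimal sketch. The abstract does not state how the OA citation advantage was computed, so this assumes the relative difference in mean citations per article between a journal's OA and paywalled papers. The function names and sample figures are hypothetical.

```python
from statistics import fmean, median


def citation_advantage(oa_citations, paywalled_citations):
    """Relative OA citation advantage for one journal (assumed definition).

    Returns (mean OA citations per article / mean paywalled citations per article) - 1,
    so 0.5 means OA papers received 50% more citations per article.
    """
    oa_cpa = fmean(oa_citations)
    paywalled_cpa = fmean(paywalled_citations)
    if paywalled_cpa == 0:
        raise ValueError("paywalled papers have no citations; advantage is undefined")
    return oa_cpa / paywalled_cpa - 1.0


def summarize(journals):
    """Return (share of journals with a positive advantage, mean advantage, median advantage)."""
    advantages = [citation_advantage(oa, pw) for oa, pw in journals.values()]
    share_positive = sum(a > 0 for a in advantages) / len(advantages)
    return share_positive, fmean(advantages), median(advantages)


# Made-up citation counts per paper: (OA papers, paywalled papers) for each journal.
journals = {
    "Journal A": ([6, 6], [4, 4]),
    "Journal B": ([2], [4]),
    "Journal C": ([3], [2]),
}
print(summarize(journals))
```

The three summary values correspond to the kinds of figures the abstract reports: the share of journals showing an advantage, the average advantage, and the value exceeded by half of the journals.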
{"title":"Contribution of the Open Access Modality to the Impact of Hybrid Journals Controlling by Field and Time Effects","authors":"Pablo Dorta-González, María Isabel Dorta-González","doi":"10.2478/jdis-2022-0007","DOIUrl":"https://doi.org/10.2478/jdis-2022-0007","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"57 - 83"},"publicationDate":"2022-01-23","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"44573569"}
Abstract. Purpose: With the availability and utilization of Inter-Country Input-Output (ICIO) tables, it is possible to construct quantitative indices to assess its impact on the Global Value Chain (GVC). For visualization purposes, ICIO networks with a tremendous number of low-weight edges are too dense to show the substantial structure. These redundant edges inevitably fill the network data with noise and eventually exert negative effects on Social Network Analysis (SNA). We therefore need a method to filter such edges and obtain a sparser network with only the meaningful connections. Design/methodology/approach: In this paper, we propose two parameterless pruning algorithms, one from the global and one from the local perspective, and examine their performance using ICIO tables from different databases. Findings: The Searching Paths (SP) method extracts the strongest association paths from the global perspective, while the Filtering Edges (FE) method captures the key links according to the local weight ratio. The results show that the output of the FE method largely includes that of the SP method, making FE the best solution for ICIO networks. Research limitations: This research still has two limitations. One is that the computational complexity may increase rapidly when processing large-scale networks, so the proposed method should be further improved. The other is that many more empirical networks should be introduced to test the soundness and practicability of our methodology. Practical implications: The network pruning methods we propose will promote the analysis of the ICIO network in areas such as community detection, link prediction, and spatial econometrics. They can also be applied to many other complex networks with similar characteristics. Originality/value: This paper improves on existing research in two respects, namely, considering the heterogeneity of weights and avoiding the interference of parameters. It therefore provides a new idea for research on network backbone extraction.
{"title":"Parameterless Pruning Algorithms for Similarity-Weight Network and Its Application in Extracting the Backbone of Global Value Chain","authors":"Lizhi Xing, Yuanqing Han","doi":"10.2478/jdis-2022-0002","DOIUrl":"https://doi.org/10.2478/jdis-2022-0002","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"57 - 75"},"publicationDate":"2021-12-11","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"46397038"}
Minh-Hoang Nguyen, N. Huyen, Manh-Toan Ho, T. Le, Q. Vuong
Abstract Purpose The open-access (OA) publishing model can help improve researchers’ outreach, thanks to its accessibility and visibility to the public. Therefore, the presentation of female researchers can benefit from the OA publishing model. Despite that, little is known about how gender affects OA practices. Thus, the current study explores the effects of female involvement and risk aversion on OA publishing patterns in the Vietnamese social sciences and humanities (SS&H). Design/methodology/approach The study applied the Bayesian Mindsponge Framework (BMF) to a dataset of 3,122 Vietnamese SS&H publications from 2008–2019. The Mindsponge mechanism was used to construct the theoretical models, while Bayesian inference was used to fit them. Findings The results showed a positive association between female participation and OA publishing probability. However, this positive effect was negated when the ratio of female researchers in a publication was high. OA status was negatively associated with the Journal Impact Factor (JIF) of the journal in which the publication appeared, but the relationship was moderated by the involvement of female researcher(s). The findings suggest that Vietnamese female researchers might be more likely to publish under the OA model in journals with a high JIF to avoid the risk of public criticism. Research limitations The study could only provide evidence on the association between female involvement and OA publishing probability. Moreover, the decision to publish under OA terms is often made by the first or corresponding author and is not necessarily gender-based. Practical implications Systematically coordinated actions are suggested to better support women and promote the OA movement in Vietnam. Originality/value The findings show the OA publishing patterns of female researchers in Vietnamese SS&H.
{"title":"The Roles of Female Involvement and Risk Aversion in Open Access Publishing Patterns in Vietnamese Social Sciences and Humanities","authors":"Minh-Hoang Nguyen, N. Huyen, Manh-Toan Ho, T. Le, Q. Vuong","doi":"10.2478/jdis-2022-0001","DOIUrl":"https://doi.org/10.2478/jdis-2022-0001","url":null,"PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"76 - 96"},"PeriodicalIF":0.0,"publicationDate":"2021-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43088303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Purpose Social media users share their ideas, thoughts, and emotions with other users. However, it is not clear how online users respond to new research outcomes. This study aims to predict the nature of the emotions expressed by Twitter users toward scientific publications. Additionally, we investigate which features of research articles help in such prediction. Identifying the sentiment expressed toward research articles on social media will help scientists gauge a new kind of societal impact of their work. Design/methodology/approach Because several tools are available for sentiment analysis, we applied five of them to check which were suitable for capturing a tweet's sentiment value and chose NLTK VADER and TextBlob. We segregated the sentiment values into negative, positive, and neutral. We measured the mean and median of the tweets’ sentiment values for research articles with more than one tweet. We then built machine learning models to predict the sentiments of tweets related to scientific publications and investigated the features that most influenced the prediction models. Findings We found that the most important feature in all the models was the sentiment of the research article title, followed by the author count. We observed that the tree-based models performed better than the other classification models, with Random Forest achieving 89% accuracy for binary classification and 73% accuracy for three-label classification. Research limitations In this research, we used state-of-the-art sentiment analysis libraries. However, these libraries might vary at times in their sentiment prediction behavior. Tweet sentiment may be influenced by a multitude of circumstances and is not always directly tied to the paper's details. In the future, we intend to broaden the scope of our research by employing word2vec models. Practical implications Many studies have focused on understanding the impact of science on scientists or how science communicators can improve their outcomes. Research in this area has relied on few and limited measures, such as citations and user studies with small datasets. There is currently a critical need to find novel methods to quantify and evaluate the broader impact of research. This study will help scientists better comprehend the emotional impact of their work. Additionally, understanding the public's interest and reactions helps science communicators identify effective ways to engage with the public and build positive connections between scientific communities and the public. Originality/value This study will extend work on public engagement with science, sociology of science, and computational social science. It will enable researchers to identify areas in which there is a gap between public and expert understanding and provide strategies by which this gap can be bridged.
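The labelling and aggregation step described in the design section of the abstract above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each tweet already has a polarity score in [-1, 1], such as a VADER compound score or a TextBlob polarity value. The ±0.05 cut-offs follow the convention commonly used with VADER and are an assumption, since the abstract does not state thresholds. The names `label_sentiment`, `summarize_articles`, and the article ids are hypothetical.

```python
from statistics import mean, median

# Assumed cut-offs: the abstract gives no thresholds. +/-0.05 is the boundary
# conventionally used with VADER compound scores, used here for illustration.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_sentiment(score):
    """Map a polarity score in [-1, 1] to one of three labels."""
    if score >= POS_THRESHOLD:
        return "positive"
    if score <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def summarize_articles(tweet_scores):
    """Aggregate tweet scores per article.

    tweet_scores maps a (hypothetical) article id to a list of tweet scores.
    Articles with a single tweet are skipped, mirroring the abstract's
    "more than one tweet" condition.
    """
    summary = {}
    for article_id, scores in tweet_scores.items():
        if len(scores) < 2:
            continue
        avg = mean(scores)
        mid = median(scores)
        summary[article_id] = {
            "mean": avg,
            "median": mid,
            "mean_label": label_sentiment(avg),
            "median_label": label_sentiment(mid),
        }
    return summary


# Made-up scores, for illustration only.
example = {
    "article-a": [0.6, 0.2, -0.1],
    "article-b": [-0.4, -0.3],
    "article-c": [0.9],
}
result = summarize_articles(example)
```

Here `result` holds entries for `article-a` and `article-b` only, because `article-c` has a single tweet. Whether the mean or the median is the better per-article summary depends on how skewed the tweet scores are, which is presumably why the abstract reports both.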
{"title":"Public Reaction to Scientific Research via Twitter Sentiment Prediction","authors":"Murtuza Shahzad, Hamed Alhoori","doi":"10.2478/jdis-2022-0003","DOIUrl":"https://doi.org/10.2478/jdis-2022-0003","url":null,"PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"97 - 124"},"PeriodicalIF":0.0,"publicationDate":"2021-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42492680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}