Journal of data and information science (Warsaw, Poland): latest articles, page 4

Learning Context-based Embeddings for Knowledge Graph Completion

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-04-01 DOI: 10.2478/jdis-2022-0009

Fei Pu, Zhongwei Zhang, Yangde Feng, Bailin Yang

Abstract
Purpose: Because knowledge graphs (KGs) are inherently incomplete, predicting missing links between entities is an important task. Many previous approaches are static, which means that all meanings of a polysemous entity share a single embedding vector. This study proposes a polysemous embedding approach, KG embedding under relational contexts (ContE for short), for missing link prediction.
Design/methodology/approach: ContE models and infers different relationship patterns by considering the context of a relationship, which is implicit in the relationship's local neighborhood. The forward and backward impacts of a relationship are mapped to two different embedding vectors, which represent the relationship's contextual information. Then, according to the entity's position in the triple, its polysemous representation is obtained by adding its static embedding vector to the corresponding context vector of the relationship.
Findings: ContE is fully expressive; that is, given any ground truth over the triples, there are embedding assignments to entities and relations that precisely separate the true triples from the false ones. ContE is capable of modeling four connectivity patterns: symmetry, antisymmetry, inversion, and composition.
Research limitations: In practice, ContE needs a grid search to find the parameters that give the best performance, which is time-consuming. It sometimes requires longer entity vectors than some other models to achieve better performance.
Practical implications: ContE is a bilinear model, simple enough to be applied to large-scale KGs. By considering the contexts of relations, ContE can distinguish the exact meaning of an entity in different triples, so that when performing compositional reasoning it can infer the connectivity patterns of relations and achieve good performance on link prediction tasks.
Originality/value: ContE considers the contexts of entities in terms of their positions in triples and the relationships they link to. It decomposes a relation vector into two vectors, a forward impact vector and a backward impact vector, to capture the relational contexts. ContE has the same low computational complexity as TransE. It therefore provides a new approach to contextualized knowledge graph embedding.
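
The way an entity's representation depends on its position in a triple can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that the model is bilinear and that a static entity vector is added to a relation's context vector. The dot-product score, the choice of the forward vector for the head and the backward vector for the tail, the function names, and the toy vectors are all assumptions made for illustration.

```python
def contextual_embedding(entity_vec, context_vec):
    """Polysemous representation: static entity vector plus a relation context vector."""
    return [e + c for e, c in zip(entity_vec, context_vec)]


def score(head, rel_forward, rel_backward, tail):
    """Hypothetical bilinear score of a triple (head, relation, tail).

    Assumption: the head is shifted by the relation's forward impact vector and
    the tail by its backward impact vector, and the score is the dot product of
    the two shifted vectors. The paper's exact scoring function may differ.
    """
    h_ctx = contextual_embedding(head, rel_forward)
    t_ctx = contextual_embedding(tail, rel_backward)
    return sum(a * b for a, b in zip(h_ctx, t_ctx))


# Toy 3-dimensional vectors, made up for illustration.
head = [1.0, 0.0, 2.0]
tail = [0.5, 1.0, 0.0]
rel_forward = [0.0, 1.0, -1.0]
rel_backward = [0.5, 0.0, 1.0]

print(score(head, rel_forward, rel_backward, tail))
```

Because the same entity receives a different shift under each relation and position, a single static vector can yield several context-specific representations.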

Record details: volume "7 1", pages 84-106, publication type Journal Article, source Semantic Scholar (paper ID 45530191).

Citations: 0

I’m Nervous about Sharing This Secret with You: Youtube Influencers Generate Strong Parasocial Interactions by Discussing Personal Issues

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-04-01 DOI: 10.2478/jdis-2022-0011

M. Thelwall, E. Stuart, Amalia Más-Bleda, Meiko Makita, Mahshid Abdoli

Abstract
Purpose: Performers may generate loyalty partly by eliciting illusory personal connections with their audience, known as parasocial relationships (PSRs), and individual illusory exchanges, known as parasocial interactions (PSIs). On social media, semi-PSIs are real but imbalanced exchanges with audiences, including through comments on influencers’ videos, and strong semi-PSIs are those that occur within PSRs. This article introduces and assesses an automatic method to detect videos with strong PSI potential.
Design/methodology/approach: Strong semi-PSIs were hypothesized to occur when commenters used a variant of the pronoun “you”, typically addressing the influencer. Comments on the videos of UK female influencer channels were used to test whether the proportion of comments containing a “you” pronoun could be an automated indicator of strong PSI potential, and to find factors associated with the strong PSI potential of influencer videos. For 117 influencers, the videos with the highest and lowest strong PSI potential were classified with content analysis for strong PSI potential and for evidence of factors that might elicit PSIs.
Findings: The “you” pronoun proportion was effective at indicating a video’s strong PSI potential, making it the first automated method to detect any type of PSI. Gazing at the camera, head-and-shoulders framing, discussing personal issues, and focusing on the influencer were associated with higher strong PSI potential for influencer videos. New social media factors found include requesting feedback and discussing the channel itself.
Research limitations: Only one country, genre, and social media platform were analysed.
Practical implications: The method can be used to automatically detect YouTube videos with strong PSI potential, helping influencers to monitor their performance.
Originality/value: This is the first automatic method to detect any aspect of PSI or PSR.
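
The indicator described above, the share of comments that contain a “you” pronoun, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not list the pronoun variants that were counted, so the pattern below (you, your, yours, yourself, yourselves, plus contractions such as "you're") and the function name are hypothetical.

```python
import re

# Assumed set of "you" variants; the study's exact list is not given in the abstract.
# The word boundary after "you" also matches contractions such as "you're".
YOU_PATTERN = re.compile(r"\byou(r|rs|rself|rselves)?\b", re.IGNORECASE)


def you_pronoun_proportion(comments):
    """Return the fraction of comments containing at least one 'you' variant."""
    if not comments:
        return 0.0
    hits = sum(1 for comment in comments if YOU_PATTERN.search(comment))
    return hits / len(comments)


# Toy comments, made up for illustration.
comments = [
    "You are so brave for sharing this",
    "Great video!",
    "I love your channel",
    "YouTube keeps recommending this",
]

print(you_pronoun_proportion(comments))
```

Words that merely start with the same letters, such as "YouTube" or "young", are not counted, because the pattern requires a word boundary after the pronoun.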

Record details: volume "7 1", pages 31-56, publication type Journal Article, source Semantic Scholar (paper ID 43886029).

Citations: 3

Fighting Against Academic Misconduct: What Can Scientometricians Do?

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-04-01 DOI: 10.2478/jdis-2022-0013

Sichao Tong, Zhesi Shen, Tianyuan Huang, Liying Yang

Citations: 0

I Don’t Peer-Review for Non-Open Journals, and Neither Should You

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-04-01 DOI: 10.2478/jdis-2022-0010

Michael P. Taylor

Citations: 0

Academic Collaborator Recommendation Based on Attributed Network Embedding

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-02-01 DOI: 10.2478/jdis-2022-0005

Ouxia Du, Ya Li

Abstract
Purpose: Based on real-world academic data, this study uses network embedding technology to mine academic relationships and investigates the effectiveness of the proposed embedding model on academic collaborator recommendation tasks.
Design/methodology/approach: We propose an academic collaborator recommendation model based on attributed network embedding (ACR-ANE), which produces enhanced scholar embeddings and takes full advantage of both the topological structure of the network and multi-type scholar attributes. Non-local neighbors are defined for scholars to capture strong relationships among them. A deep auto-encoder is adopted to encode the academic collaboration network structure and scholar attributes into a low-dimensional representation space.
Findings: 1. The proposed non-local neighbors describe the relationships among scholars in the real world better than first-order neighbors do. 2. When recommending collaborators for scholars, it is important to consider the structure of the academic collaboration network and scholar attributes simultaneously.
Research limitations: The designed method works for static networks and does not take network dynamics into account.
Practical implications: The designed model embeds the academic collaboration network structure and scholarly attributes, and can be used to recommend potential collaborators to scholars.
Originality/value: Experiments on two real-world scholarly datasets, Aminer and APS, show that the proposed method performs better than other baselines.

Record details: volume "7 1", pages 37-56, publication type Journal Article, source Semantic Scholar (paper ID 42912037).

Citations: 2

Progress and Knowledge Transfer from Science to Technology in the Research Frontier of CRISPR Based on the LDA Model

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-02-01 DOI: 10.2478/jdis-2022-0004

Yushuang Lyu, Muqi Yin, Fangjie Xi, Xiaojun Hu

Abstract
Purpose: This study explores the underlying research topics on CRISPR using the LDA model and identifies trends in knowledge transfer from science to technology in this area over the latest 10 years.
Design/methodology/approach: We collected publications on CRISPR between 2011 and 2020 from the Web of Science, and traced all the patents citing them from lens.org. In total, 15,904 articles and 18,985 patents were downloaded and analyzed. The LDA model was applied to identify underlying research topics in related research. In addition, several indicators were introduced to measure the knowledge transfer from research topics of scientific publications to IPC-4 classes of patents.
Findings: The emerging research topics on CRISPR were identified and their evolution over time displayed. Furthermore, a big picture of knowledge transition from research topics to technological classes of patents was presented. We found that for all topics on CRISPR, the average first transition year, the ratio of articles cited by patents, and the NPR transition rate are 1.08, 15.57%, and 1.19, respectively: a much shorter and more intensive transition than in general fields. Moreover, the transition patterns differ among research topics.
Research limitations: Our research is limited to publications retrieved from the Web of Science and their citing patents indexed in lens.org. A limitation inherent in LDA analysis is the manual interpretation and labeling of “topics”.
Practical implications: Our study provides a useful reference for policy-makers on allocating scientific resources and regulating financial budgets to face challenges related to the transformative technology of CRISPR.
Originality/value: The LDA model is applied here for the first time to topic identification in the area of transformative research, as exemplified by CRISPR. Additionally, the dataset of all citing patents in this area helps to provide a full picture for detecting the knowledge transition between science and technology (S&T).
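
Two of the indicators named above can be sketched on toy data. The abstract does not define them, so the definitions here are assumptions: the ratio of articles cited by patents is taken as the share of articles with at least one citing patent, and the first transition year of an article is taken as the gap between its publication year and the year of its earliest citing patent. The function name and the data are hypothetical, and the NPR transition rate is left out because its definition is not given.

```python
def transfer_indicators(articles):
    """Compute two assumed knowledge-transfer indicators.

    articles: list of (publication_year, [years of citing patents]) tuples.
    Returns (ratio of articles cited by patents, average first transition year).
    The average is None when no article has a citing patent.
    """
    if not articles:
        return 0.0, None
    cited = [(year, patent_years) for year, patent_years in articles if patent_years]
    ratio_cited = len(cited) / len(articles)
    # Lag between publication and the earliest citing patent, per cited article.
    lags = [min(patent_years) - year for year, patent_years in cited]
    avg_first_transition = sum(lags) / len(lags) if lags else None
    return ratio_cited, avg_first_transition


# Toy records, made up for illustration.
toy_articles = [
    (2013, [2014, 2016]),
    (2015, []),
    (2016, [2018]),
    (2018, []),
]

ratio, avg_lag = transfer_indicators(toy_articles)
print(ratio, avg_lag)
```

In the study these quantities are reported per research topic, so a fuller version would group the articles by their LDA topic before aggregating.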

Record details: volume "7 1", pages 1-19, publication type Journal Article, source Semantic Scholar (paper ID 48706809).

Citations: 1

Contribution of the Open Access Modality to the Impact of Hybrid Journals Controlling by Field and Time Effects

Journal of data and information science (Warsaw, Poland)

Pub Date : 2022-01-23 DOI: 10.2478/jdis-2022-0007

Pablo Dorta-González, María Isabel Dorta-González

Abstract
Purpose: Researchers are more likely to read and cite papers to which they have access than those they cannot obtain. The objective of this work is therefore to analyze the contribution of the Open Access (OA) modality to the impact of hybrid journals.
Design/methodology/approach: The “research articles” published in 2017 in 200 hybrid journals in four subject areas, and the citations received by those articles in the period 2017–2020 in the Scopus database, were analyzed. The hybrid OA papers were compared with the paywalled ones. The journals were randomly selected from those with a share of OA papers above a minimum value. More than 60 thousand research articles were analyzed in the sample, of which 24% were published under the OA modality.
Findings: At the journal level, cites per article in the two hybrid modalities (OA and paywalled) correlate strongly. However, there is no correlation between OA prevalence and cites per article. There is an OA citation advantage in 80% of hybrid journals. Moreover, the OA citation advantage is consistent across fields and holds over time. We obtain an OA citation advantage of 50% on average, and higher than 37% in half of the hybrid journals. Finally, the OA citation advantage is higher in the Humanities than in Science and Social Science.
Research limitations: Some of the citation advantage is likely because greater access allows more people to read, and hence cite, articles they otherwise would not. However, causation is difficult to establish and there are many possible biases. Several factors can affect the observed differences in citation rates. Funder mandates can be one of them: funders are likely to have an OA requirement, and well-funded studies are more likely to receive citations than poorly funded ones. Another factor discussed is the selection bias postulate, which suggests that authors choose only their most impactful studies to be open access.
Practical implications: For hybrid journals, the open access modality is positive in the sense that it provides a greater number of potential readers. This in turn translates into a greater number of citations and an improvement in the journal's position in the rankings by impact factor. For researchers it is also positive, because it increases the potential number of readers and citations received.
Originality/value: Our study refines previous results by comparing documents that are more similar to each other. Although it does not examine the cause of the observed citation advantage, we find that it exists in a very large sample.
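
The per-journal comparison described above can be illustrated with a short worked example. The abstract does not give the formula, so the definition used here is an assumption: the OA citation advantage of a journal is taken as the relative difference between cites per OA article and cites per paywalled article, expressed as a percentage. The function names and citation counts are hypothetical.

```python
def cites_per_article(citation_counts):
    """Mean number of citations per article."""
    return sum(citation_counts) / len(citation_counts)


def oa_citation_advantage(oa_cites, paywalled_cites):
    """Assumed definition: percentage by which cites per OA article exceed
    cites per paywalled article within the same journal."""
    oa_mean = cites_per_article(oa_cites)
    paywalled_mean = cites_per_article(paywalled_cites)
    return 100.0 * (oa_mean - paywalled_mean) / paywalled_mean


# Toy citation counts for one hybrid journal, made up for illustration.
oa_cites = [6, 9, 3]             # 6.0 cites per OA article
paywalled_cites = [4, 5, 3, 4]   # 4.0 cites per paywalled article

print(oa_citation_advantage(oa_cites, paywalled_cites))
```

Comparing the two modalities inside the same journal is what controls for field and journal effects; a positive value only shows an association and does not separate access from funder mandates or author self-selection.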

Record details: volume "7 1", pages 57-83, publication type Journal Article, source Semantic Scholar (paper ID 44573569).

Citations: 3

Parameterless Pruning Algorithms for Similarity-Weight Network and Its Application in Extracting the Backbone of Global Value Chain

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-12-11 DOI: 10.2478/jdis-2022-0002

Lizhi Xing, Yuanqing Han

Abstract Purpose With the availability and utilization of Inter-Country Input-Output (ICIO) tables, it is possible to construct quantitative indices to assess its impact on the Global Value Chain (GVC). For visualization purposes, ICIO networks with a tremendous number of low-weight edges are too dense to show the essential structure. These redundant edges inevitably fill the network data with noise and eventually exert negative effects on Social Network Analysis (SNA). In this case, we need a method to filter such edges and obtain a sparser network with only the meaningful connections. Design/methodology/approach In this paper, we propose two parameterless pruning algorithms, from the global and local perspectives respectively, and examine their performance using ICIO tables from different databases. Findings The Searching Paths (SP) method extracts the strongest association paths from the global perspective, while the Filtering Edges (FE) method captures the key links according to the local weight ratio. The results show that the FE method largely subsumes the SP method and is the best solution for ICIO networks. Research limitations There are still two limitations in this research. One is that the computational complexity may increase rapidly when processing large-scale networks, so the proposed method should be further improved. The other is that many more empirical networks should be introduced to test the scientific soundness and practicability of our methodology. Practical implications The network pruning methods we propose will promote the analysis of the ICIO network in terms of community detection, link prediction, spatial econometrics, etc. They can also be applied to many other complex networks with similar characteristics. Originality/value This paper improves on existing research in two respects, namely, considering the heterogeneity of weights and avoiding the interference of parameters. Therefore, it provides a new idea for research on network backbone extraction.
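The abstract does not spell out either algorithm, so the following is only an illustrative sketch of what a parameter-free, local pruning rule can look like. It is not the authors' FE method. It assumes a directed network given as `(source, target, weight)` tuples with positive weights, defines the local weight ratio as an edge's share of its source node's total outgoing weight, and keeps only the edge(s) with the largest share per source. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def local_weight_ratios(edges):
    """Return (source, target, ratio) where ratio = weight / total out-weight of source.

    Assumes every weight is positive, so no source has zero out-strength.
    """
    edges = list(edges)
    out_strength = defaultdict(float)
    for source, _, weight in edges:
        out_strength[source] += weight
    return [(s, t, w / out_strength[s]) for s, t, w in edges]


def keep_strongest_local_edges(edges):
    """Keep, for each source node, the edge(s) with the largest local weight ratio.

    No threshold is supplied by the caller, which is the sense in which this
    toy rule is 'parameterless'. Ties are all kept.
    """
    ratios = local_weight_ratios(edges)
    best = {}
    for source, _, ratio in ratios:
        if source not in best or ratio > best[source]:
            best[source] = ratio
    return [(s, t, r) for s, t, r in ratios if r == best[s]]


# Hypothetical toy network: A sends most of its flow to B, B mostly to C.
toy_edges = [
    ("A", "B", 8.0),
    ("A", "C", 1.0),
    ("A", "D", 1.0),
    ("B", "C", 3.0),
    ("B", "A", 1.0),
]

backbone = keep_strongest_local_edges(toy_edges)
for source, target, ratio in backbone:
    print(f"{source} -> {target}: local weight ratio {ratio:.2f}")
```

Because the rule compares each edge only with its neighbours at the same node, a weak edge in absolute terms can survive if it dominates locally, which is one way to respect heterogeneous weights without choosing a global cut-off.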

Volume 7, Issue 1, pp. 57–75

Citations: 1

The Roles of Female Involvement and Risk Aversion in Open Access Publishing Patterns in Vietnamese Social Sciences and Humanities

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-12-11 DOI: 10.2478/jdis-2022-0001

Minh-Hoang Nguyen, N. Huyen, Manh-Toan Ho, T. Le, Q. Vuong

Abstract Purpose The open-access (OA) publishing model can help improve researchers’ outreach, thanks to its accessibility and visibility to the public. Therefore, the presentation of female researchers can benefit from the OA publishing model. Despite that, little is known about how gender affects OA practices. Thus, the current study explores the effects of female involvement and risk aversion on OA publishing patterns in the Vietnamese social sciences and humanities. Design/methodology/approach The study applied the Bayesian Mindsponge Framework (BMF) to a dataset of 3,122 Vietnamese social sciences and humanities (SS&H) publications from 2008–2019. The Mindsponge mechanism was used to construct the theoretical models, while Bayesian inference was used to fit them. Findings The results showed a positive association between female participation and OA publishing probability. However, the positive effect of female involvement on OA publishing probability was negated by a high ratio of female researchers in a publication. OA status was negatively associated with the Journal Impact Factor (JIF) of the journal in which the publication appeared, but the relationship was moderated by the involvement of one or more female researchers. The findings suggest that Vietnamese female researchers might be more likely to publish under the OA model in journals with a high JIF to avoid the risk of public criticism. Research limitations The study could only provide evidence on the association between female involvement and OA publishing probability. However, whether to publish under OA terms is often determined by the first or corresponding authors, and that decision is not necessarily gender-based. Practical implications Systematically coordinated actions are suggested to better support women and promote the OA movement in Vietnam. Originality/value The findings show the OA publishing patterns of female researchers in Vietnamese SS&H.

Volume 7, Issue 1, pp. 76–96

Citations: 8

Public Reaction to Scientific Research via Twitter Sentiment Prediction

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-12-11 DOI: 10.2478/jdis-2022-0003

Murtuza Shahzad, Hamed Alhoori

Abstract Purpose Social media users share their ideas, thoughts, and emotions with other users. However, it is not clear how online users respond to new research outcomes. This study aims to predict the nature of the emotions expressed by Twitter users toward scientific publications. Additionally, we investigate which features of the research articles help in such prediction. Identifying the sentiments expressed about research articles on social media will help scientists gauge a new societal impact of their research articles. Design/methodology/approach Several tools are available for sentiment analysis, so we applied five of them to check which are suitable for capturing a tweet's sentiment value and decided to use NLTK VADER and TextBlob. We segregated the sentiment values into negative, positive, and neutral. We measured the mean and median of the tweets’ sentiment values for research articles with more than one tweet. We next built machine learning models to predict the sentiments of tweets related to scientific publications and investigated the essential features that controlled the prediction models. Findings We found that the most important feature in all the models was the sentiment of the research article title, followed by the author count. We observed that the tree-based models performed better than other classification models, with Random Forest achieving 89% accuracy for binary classification and 73% accuracy for three-label classification. Research limitations In this research, we used state-of-the-art sentiment analysis libraries. However, these libraries might vary at times in their sentiment prediction behavior. Tweet sentiment may be influenced by a multitude of circumstances and is not always directly tied to the paper's details. In the future, we intend to broaden the scope of our research by employing word2vec models. Practical implications Many studies have focused on understanding the impact of science on scientists or how science communicators can improve their outcomes. Research in this area has relied on fewer and more limited measures, such as citations and user studies with small datasets. There is currently a critical need to find novel methods to quantify and evaluate the broader impact of research. This study will help scientists better comprehend the emotional impact of their work. Additionally, understanding the public's interest and reactions helps science communicators identify effective ways to engage with the public and build positive connections between scientific communities and the public. Originality/value This study will extend work on public engagement with science, sociology of science, and computational social science. It will enable researchers to identify areas in which there is a gap between public and expert understanding and provide strategies by which this gap can be bridged.
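The labelling and aggregation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each tweet already has a polarity score in [-1, 1] (the range used by VADER's compound score and TextBlob's polarity), and it assumes cut-offs of ±0.05, a value commonly used with VADER but not stated in the abstract. The function names and sample scores are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List

# Assumed cut-offs; the abstract does not say which thresholds were used.
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def label_sentiment(score: float) -> str:
    """Map a polarity score in [-1, 1] to negative, neutral, or positive."""
    if score >= POSITIVE_CUTOFF:
        return "positive"
    if score <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def summarize_article(tweet_scores: List[float]) -> Dict[str, object]:
    """Aggregate tweet-level scores for one article by mean and median."""
    avg = mean(tweet_scores)
    mid = median(tweet_scores)
    return {
        "mean": avg,
        "median": mid,
        "mean_label": label_sentiment(avg),
        "median_label": label_sentiment(mid),
    }


# Hypothetical scores for two articles that each received several tweets.
scores_by_article = {
    "article_a": [0.5, 0.5, -0.25],
    "article_b": [0.0, 0.02, -0.01],
}

for article, scores in scores_by_article.items():
    print(article, summarize_article(scores))
```

Reporting both the mean and the median is useful because a single strongly worded tweet can shift the mean across a cut-off while leaving the median unchanged.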

Volume 7, Issue 1, pp. 97–124

Citations: 4