Journal of data and information science (Warsaw, Poland): Latest Articles, Page 8

The Association between Researchers’ Conceptions of Research and Their Strategic Research Agendas

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-11-01 DOI: 10.2478/jdis-2020-0032

João M. Santos, H. Horta

Abstract
Purpose: In studies of the research process, the association between how researchers conceptualize research and their strategic research agendas has been largely overlooked. This study aims to address this gap.
Design/methodology/approach: This study analyzes this relationship using a dataset of more than 8,500 researchers across all scientific fields and the globe. It studies the associations between the dimensions of two inventories: the Conceptions of Research Inventory (CoRI) and the Multi-Dimensional Research Agenda Inventory—Revised (MDRAI-R).
Findings: The findings show a relatively strong association between researchers’ conceptions of research and their research agendas. While all conceptions of research are positively related to scientific ambition, the findings are mixed regarding how the dimensions of the two inventories relate to one another, which is significant for those seeking to understand the knowledge production process better.
Research limitations: The study relies on self-reported data, which always carries a risk of response bias.
Practical implications: The findings provide a greater understanding of the inner workings of knowledge processes and indicate that the two inventories, whether used individually or in combination, may provide complementary analytical perspectives to research performance indicators. They may thus offer important insights for managers of research environments regarding how to assess the research culture, beliefs, and conceptualizations of individual researchers and research teams when designing strategies to promote specific institutional research focuses and strategies.
Originality/value: To the best of the authors’ knowledge, this is the first study to associate research agendas and conceptions of research. It is based on a large sample of researchers working worldwide and in all fields of knowledge, which ensures that the findings have a reasonable degree of generalizability to the global population of researchers.


Volume: 5 1; Pages: 56 - 74; Publication type: Journal Article; Platform: Semanticscholar; Paper ID: 44315615

Citations: 6

Bilateral Co-authorship Indicators Based on Fractional Counting

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-10-23 DOI: 10.2478/jdis-2021-0005

R. Rousseau, Lin Zhang

Abstract
Purpose: In this contribution we provide two new co-authorship indicators based on fractional counting.
Design/methodology/approach: Based on the idea of fractional counting we reflect on what should be an acceptable indicator for co-authorship between two entities. From this reflection we propose an indicator, the co-authorship score, denoted as cs, using the harmonic mean. Dividing this new indicator by the classical co-authorship indicator based on full counting leads to a co-authorship intensity indicator.
Findings: We show that the indicators we propose have many necessary or at least highly desirable properties for a proper cs-score. It is pointed out that the two new indicators can be used for countries, but also for institutions and other pairs of entities. A small example shows the feasibility of the co-authorship score and the co-authorship intensity indicator.
Research limitations: The indicators are not yet tested in real cases.
Practical implications: As the notions of co-authorship and collaboration have many aspects, we think that our contribution may help policy management to take yet another aspect into account as part of a multi-faceted description of research outcomes.
Originality/value: The indicators we propose cover yet another aspect of co-authorship.
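The abstract names the ingredients of the two indicators (fractional counting, a harmonic mean, and division by the full-counting indicator) but does not give the formulas. The Python sketch below shows one plausible reading and should not be taken as the authors' definition. It assumes that each paper's credit is split equally over its author affiliations, that the per-paper contribution to cs is the harmonic mean of the two entities' fractional shares, and that the full count is the number of papers on which both entities appear. The function names and the toy data are hypothetical.

```python
from collections import Counter


def fractional_shares(affiliations):
    """Split one paper's unit credit equally over its author affiliations."""
    counts = Counter(affiliations)
    total = len(affiliations)
    return {entity: n / total for entity, n in counts.items()}


def harmonic_mean(x, y):
    """Harmonic mean of two non-negative numbers (0 if both are 0)."""
    return 0.0 if x + y == 0 else 2 * x * y / (x + y)


def co_authorship(papers, a, b):
    """Return (cs, full_count, intensity) for entities a and b.

    Assumption: cs sums, over co-authored papers, the harmonic mean of
    the two entities' fractional shares; intensity is cs / full_count.
    """
    cs = 0.0
    full_count = 0
    for affiliations in papers:
        shares = fractional_shares(affiliations)
        if a in shares and b in shares:
            full_count += 1
            cs += harmonic_mean(shares[a], shares[b])
    intensity = cs / full_count if full_count else 0.0
    return cs, full_count, intensity


# Hypothetical toy data: one affiliation label per author of each paper.
papers = [
    ["A", "B"],            # shares 1/2 and 1/2
    ["A", "A", "B", "C"],  # shares 1/2 (A) and 1/4 (B)
    ["A", "C"],            # B is absent, so the paper is not counted
]
cs, full_count, intensity = co_authorship(papers, "A", "B")
print(cs, full_count, intensity)
```

Under these assumptions the intensity stays between 0 and 1, because the harmonic mean of two shares that together cannot exceed 1 is at most 1/2 per paper.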


Volume: 6 1; Pages: 1 - 12; Publication type: Journal Article; Platform: Semanticscholar; Paper ID: 41647492

Citations: 2

National Lists of Scholarly Publication Channels: An Overview and Recommendations for Their Construction and Maintenance

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-10-23 DOI: 10.2478/jdis-2021-0004

Janne Pölönen, Raf Guns, Emanuel Kulczycki, G. Sivertsen, Tim C. E. Engels

Abstract
Purpose: This paper presents an overview of different kinds of lists of scholarly publication channels and of experiences related to the construction and maintenance of national lists supporting performance-based research funding systems. It also contributes a set of recommendations for the construction and maintenance of national lists of journals and book publishers.
Design/methodology/approach: The study is based on analysis of previously published studies, policy papers, and reported experiences related to the construction and use of lists of scholarly publication channels.
Findings: Several countries have systems for research funding and/or evaluation that involve the use of national lists of scholarly publication channels (mainly journals and publishers). Typically, such lists are selective (do not include all scholarly or non-scholarly channels) and differentiated (distinguish between channels of different levels and quality). At the same time, most lists are embedded in a system that encompasses multiple or all disciplines. This raises the question of how such lists can be organized and maintained to ensure that all relevant disciplines and all types of research are adequately represented.
Research limitations: The conclusions and recommendations of the study are based on the authors’ interpretation of a complex and sometimes controversial process with many different stakeholders involved.
Practical implications: The recommendations and the related background information provided in this paper enable mutual learning that may feed into improvements in the construction and maintenance of national and other lists of scholarly publication channels in any geographical context. This may foster the development of responsible evaluation practices.
Originality/value: This paper presents the first general overview and typology of different kinds of publication channel lists, provides insights on expert-based versus metrics-based evaluation, and formulates a set of recommendations for the responsible construction and maintenance of publication channel lists.


Volume: 6 1; Pages: 50 - 86; Publication type: Journal Article; Platform: Semanticscholar; Paper ID: 46941913

Citations: 20

Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-10-09 DOI: 10.2478/jdis-2021-0023

J. D’Souza, S. Auer

Abstract
Purpose: This work aims to normalize the NlpContributions scheme (henceforward, NlpContributionGraph) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper).
Design/methodology/approach: We re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme.
Findings: The application of NlpContributionGraph on the 50 articles resulted finally in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that with increased granularity of the information, the annotation decision variance is greater.
Research limitations: NlpContributionGraph has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, following which the same annotator reannotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the initial scheme is first proposed and the complexity of the annotation task in the realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model.
Practical implications: We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks.
Originality/value: NlpContributionGraph is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluation over the two-stage annotation tasks offers insights into task difficulty.
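The agreement figures in the findings are F1-scores between two annotation passes. The abstract does not say how annotated items were matched, so the Python sketch below assumes exact matching of item identifiers and treats the first stage as the reference. The function name and the toy sentence identifiers are hypothetical and are not taken from the dataset.

```python
def f1_agreement(stage1_items, stage2_items):
    """F1-score between two annotation passes, using exact item matching.

    Assumption: stage 1 is the reference and stage 2 is scored against it.
    Swapping the roles swaps precision and recall but leaves F1 unchanged.
    """
    reference, candidate = set(stage1_items), set(stage2_items)
    overlap = len(reference & candidate)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: sentence identifiers selected in each stage.
stage1 = ["s1", "s2", "s3", "s4"]
stage2 = ["s2", "s3", "s5"]
score = f1_agreement(stage1, stage2)
print(f"{score:.2%}")
```

Exact matching is the strictest choice. Phrases and triples have more ways to differ than whole sentences, which is consistent with the lower agreement the abstract reports at finer granularity.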

Volume: 6 1; Pages: 6 - 34; Publication type: Journal Article; Platform: Semanticscholar; Paper ID: 48540740

Citations: 6

Current Status and Enhancement of Collaborative Research in the World: A Case Study of Osaka University

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-09-22 DOI: 10.2478/jdis-2020-0035

S. Iwami, Toshihiko Shimizu, M. Empizo, J. Gabayno, N. Sarukura, Shota Fujii, Yoshinari Sumimura

Abstract
Purpose: The purpose of this research is to provide evidence for decision-makers to realize the potentials of collaborations between countries/regions via the scientometric analysis of co-authoring in academic publications.
Design/methodology/approach: The approach is that Osaka University, which has set a strategy to become a global campus, is positioned to have a leading role to enhance such collaborations. This research measures co-authoring relations between Osaka University and other countries/regions to identify networks for fostering strong research collaborations.
Findings: Five countries are identified as candidates for the future global campuses of Osaka University based on three factors: co-authoring relations, GDP growth, and population growth.
Research limitations: The main limitation of this study is not being able to use the relations by the former positions of authors in Osaka University, because the data retrieved is limited by the query of the organization name at the first step.
Practical implications: The significance of this work is to provide evidence for the university strategy to expand abroad based on the quantity and visualization of trends.
Originality/value: With wider practical implementations, the approach of this research is useful in making a strategic roadmap for scientific organizations that intend to collaborate internationally.


Volume: 5 1; Pages: 75 - 85; Publication type: Journal Article; Platform: Semanticscholar; Paper ID: 47406637

Citations: 2

Identifying Scientific and Technical “Unicorns”

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-09-22 DOI: 10.2478/jdis-2021-0002

Lucy L. Xu, Miao Qi, F. Y. Ye

Abstract
Purpose: Using the metaphor of the “unicorn,” we identify the scientific papers and technical patents characterized by the informetric feature of very high citations in the first ten years after publishing, which may provide a new pattern to understand very high impact works in science and technology.
Design/methodology/approach: We set CT as the total citations of a paper or patent in the first ten years after publication. With CT≥5,000 for a scientific “unicorn” and CT≥500 for a technical “unicorn,” we have an absolute standard for identifying scientific and technical “unicorn” publications.
Findings: We identify 165 scientific “unicorns” in 14,301,875 WoS papers and 224 technical “unicorns” in 13,728,950 DII patents during 2001–2012. About 50% of “unicorns” belong to biomedicine, in which selected cases are individually discussed. The number of rare “unicorns” increases following a linear model; the fitted data show 95% confidence, with an RMSE of 0.2127 for scientific “unicorns” and 0.0923 for technical “unicorns.”
Research limitations: A “unicorn” is a purely quantitative consideration that does not concern quality, and “potential unicorns,” with CT≤5,000 for papers and CT≤500 for patents, are left for future studies.
Practical implications: Scientific and technical “unicorns” provide a new pattern to understand high-impact works in science and technology. The “unicorn” pattern supplies a concise approach to identify very high-impact scientific papers and technical patents.
Originality/value: The “unicorn” pattern supplies a concise approach to identify very high impact scientific papers and technical patents.
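The identification rule in this abstract is a fixed threshold on ten-year citations, which can be sketched in a few lines of Python. The thresholds (5,000 for papers, 500 for patents) come from the abstract. The function names, the per-year citation dictionary, and the choice to count the publication year as the first year of the ten-year window are assumptions made for illustration, and the citation figures in the example are invented.

```python
# Thresholds stated in the abstract: CT >= 5,000 (papers), CT >= 500 (patents).
THRESHOLDS = {"paper": 5000, "patent": 500}


def ten_year_citations(pub_year, citations_by_year, window=10):
    """CT: citations received in the first `window` years after publication.

    Assumption: the publication year counts as the first year of the window.
    """
    return sum(
        count
        for year, count in citations_by_year.items()
        if pub_year <= year < pub_year + window
    )


def is_unicorn(ct, kind):
    """Apply the absolute threshold for a 'paper' or a 'patent'."""
    return ct >= THRESHOLDS[kind]


# Hypothetical citation history of a paper published in 2001.
history = {2001: 100, 2005: 4000, 2010: 900, 2011: 50}
ct = ten_year_citations(2001, history)
print(ct, is_unicorn(ct, "paper"))
```

Because the standard is absolute rather than percentile-based, the same rule applies regardless of field or publication year, which is part of what makes "unicorns" so rare in the reported counts.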


Volume: 6 1; Pages: 96 - 115; Publication type: Journal Article; Platform: Semanticscholar; Paper ID: 48100575

Citations: 1

An Automatic Approach to Extending the Consumer Health Vocabulary

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-09-22 DOI: 10.2478/jdis-2021-0003

Michal Monselise, J. Greenberg, Ou Stella Liang, Sonia M. Pascua, Heejun Kim, Mat Kelly, Joan Boone, Christopher C. Yang

Abstract Purpose Given the ubiquitous presence of the internet in our lives, many individuals turn to the web for medical information. A challenge here is that many laypersons (as “consumers”) do not use the professional terms found in medical nomenclature when describing their conditions and searching the internet. The Consumer Health Vocabulary (CHV) ontology, initially developed in 2007, aimed to bridge this gap, although updates have been limited over the last decade. The purpose of this research is to implement a means of automatically creating a hierarchical consumer health vocabulary. The overall aim is to improve consumers’ ability to search for medical conditions and symptoms with an enhanced CHV and to improve the search capabilities of our searching and indexing tool, HIVE (Helping Interdisciplinary Vocabulary Engineering). Design/methodology/approach The research design uses ontological fusion, an approach for automatically extracting and integrating the Medical Subject Headings (MeSH) ontology into CHV, and further converts CHV from a flat mapping to a hierarchical ontology. The additional relationships and parent terms from MeSH also allow us to uncover relationships between existing terms in the CHV ontology. The research design also included improving the search capabilities of HIVE by identifying alternate relationships and consolidating them into a single entry. Findings The key finding is an improved CHV with a hierarchical structure that enables consumers to search through the ontology and uncover more relationships. Research limitations In some cases, the improved search results in HIVE return terms that are related but not completely synonymous. We present an example and discuss the implications of this result. Practical implications This research makes available an updated and richer CHV ontology through the HIVE tool. Consumers may use this tool to search consumer terminology for medical conditions and symptoms. The HIVE tool returns the medical term linked with the consumer term as well as the hierarchy of other medical terms connected to it. Originality/value This is the first attempt in over a decade to improve and enhance the CHV ontology with current terminology, and the first research effort to convert CHV's original flat ontology structure to a hierarchical structure. This research also enhances the HIVE infrastructure and provides consumers with a simple, efficient mechanism for searching the CHV ontology and retrieving meaningful data.
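
To make the flat-to-hierarchical conversion concrete, here is a minimal sketch, assuming a flat consumer-term mapping and a single-parent table of MeSH-style headings. Both dictionaries are toy data chosen for illustration, the function names `ancestors` and `fuse` are hypothetical, and the single-parent simplification is an assumption (MeSH headings can have several parents). It is not the authors' ontological fusion pipeline or the HIVE API.

```python
from typing import Dict, List

# Flat CHV-style mapping: consumer term -> professional heading (toy data).
flat_chv: Dict[str, str] = {
    "heart attack": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
}

# Simplified parent table in the spirit of MeSH (one parent per heading;
# an assumption made to keep the sketch short).
parent_of: Dict[str, str] = {
    "Myocardial Infarction": "Myocardial Ischemia",
    "Myocardial Ischemia": "Heart Diseases",
    "Heart Diseases": "Cardiovascular Diseases",
    "Hypertension": "Vascular Diseases",
    "Vascular Diseases": "Cardiovascular Diseases",
}


def ancestors(heading: str, parents: Dict[str, str]) -> List[str]:
    """Walk parent links upward, guarding against accidental cycles."""
    chain: List[str] = []
    seen = {heading}
    while heading in parents:
        heading = parents[heading]
        if heading in seen:
            break
        seen.add(heading)
        chain.append(heading)
    return chain


def fuse(flat: Dict[str, str], parents: Dict[str, str]) -> Dict[str, List[str]]:
    """Attach each consumer term to its heading plus all broader headings."""
    return {
        term: [heading] + ancestors(heading, parents)
        for term, heading in flat.items()
    }


hierarchy = fuse(flat_chv, parent_of)
print(hierarchy["heart attack"])
# ['Myocardial Infarction', 'Myocardial Ischemia', 'Heart Diseases', 'Cardiovascular Diseases']
```

Once each consumer term carries its chain of broader headings, two terms that were unrelated in the flat mapping can be linked through a shared ancestor; in the toy data both chains end at the same top-level heading.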

Volume : 6 1 Pages : 35 - 49 Publication Type : Journal Article

Citations: 3

Correction: Corrigendum: The Gender Patenting Gap: A Study on the Iberoamerican Countries

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-09-22 DOI: 10.2478/jdis-2020-0039

Danilo S. Carvalho, Lydia Bares, Kelyane Silva

Citations: 0

Priorities for Social and Humanities Projects Based on Text Analysis

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-09-21 DOI: 10.2478/jdis-2020-0036

Ülle Must

Abstract Purpose Changes in the world show that the role, importance, and coherence of SSH (social sciences and the humanities) will increase significantly in the coming years. This paper aims to monitor and analyze the evolution (or overlap) of SSH thematic patterns across three funding instruments since 2007. Design/methodology/approach The goal of the paper is to check to what extent the EU Framework Programme (FP) affects research at the national level, and to highlight hot topics from a given period with the help of text analysis. Titles and abstracts of funded projects, drawn from the EU FP and from the Slovenian and Estonian research information systems (RIS), were used. The final analysis and the comparisons between datasets were based on the 200 most frequent words. After removing punctuation marks, numeric values, articles, prepositions, conjunctions, and auxiliary verbs, 4,854 unique words were identified in the Estonian Research Information System (ETIS), 4,421 in the Slovenian Research Information System (SICRIS), and 3,950 in FP. Findings Across all funding instruments, about a quarter of the top words account for half of the word occurrences. The text analysis shows that in the majority of cases words do not overlap between FP and nationally funded projects. In some cases, this may be due to the use of different vocabulary. There is more overlap between the words of Slovenia (SL) and Estonia (EE) and less between those of Estonia and FP. At the same time, the overlapping words have a wider reach (culture, education, social, history, human, innovation, etc.). In nationally funded (bottom-up) projects, it was relatively difficult to observe changes in thematic trends over time. More specific results emerged from comparing the different programmes within FP (top-down). Research limitations Only projects with English titles and abstracts were analyzed. Practical implications The specifics of SSH have to be taken into account: the one-to-one meaning of terms and words is not as important as it is, for example, in the exact sciences. Thus, even in co-word analysis, the final content may go unnoticed. Originality/value This was the first attempt to monitor the trends of SSH projects using text analysis. The text analysis of SSH projects from the two new EU Member States used in the study showed that SSH's thematic coverage is not much affected by the EU Framework Programme. Whether this result is field-specific or country-specific should be examined in a follow-up study targeting SSH projects in the so-called old Member States.
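
The word-frequency comparison described above can be approximated with the standard library. The sketch below is illustrative only: the stop-word list is a small stand-in for the articles, prepositions, conjunctions, and auxiliary verbs the study removed, the function names are hypothetical, and the sample texts are made up rather than drawn from ETIS, SICRIS, or FP.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Small illustrative stop-word list; the study's actual list is not given.
STOPWORDS: Set[str] = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "for", "to",
    "with", "is", "are", "was", "were", "be", "been", "has", "have", "had",
}


def top_words(texts: Iterable[str], n: int = 200) -> List[str]:
    """Return the n most frequent words, ignoring digits and punctuation."""
    counts: Counter = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of letters only, so numbers are dropped.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]


def overlap(words_a: Iterable[str], words_b: Iterable[str]) -> Set[str]:
    """Words that appear in both top lists."""
    return set(words_a) & set(words_b)


# Made-up project titles standing in for two funding instruments.
national = ["Culture and education in 2007", "Education of the nation"]
framework = ["Social innovation and education", "History of innovation"]

shared = overlap(top_words(national), top_words(framework))
print(shared)  # {'education'}
```

Comparing sets of top words only captures exact lexical matches, which is consistent with the abstract's caution that differing vocabulary may hide thematic overlap.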

Volume : 5 1 Pages : 116 - 125 Publication Type : Journal Article

Citations: 1

A New Citation Recommendation Strategy Based on Term Functions in Related Studies Section

Journal of data and information science (Warsaw, Poland)

Pub Date : 2020-09-11 DOI: 10.2478/jdis-2021-0022

Haihua Chen

Abstract Purpose Researchers frequently encounter two problems when writing scientific articles: (1) selecting appropriate citations to support the research idea is challenging, and (2) the literature review is not conducted extensively enough, which leads to work on a research problem that others have already addressed well. The study focuses on citation recommendation in the related studies section by applying the term function of a citation context, potentially improving the efficiency of writing a literature review. Design/methodology/approach We present nine term functions, three newly created and six identified from the existing literature. Using these term functions as labels, we annotate 531 research papers on three topics to evaluate our proposed recommendation strategy. BM25 and Word2vec with VSM are implemented as the baseline models for the recommendation. The term function information is then applied to enhance performance. Findings The experiments show that the term function-based methods outperform the baseline methods in recall, precision, and F1-score, demonstrating that term functions are useful in identifying valuable citations. Research limitations The dataset is limited in size because annotating citation functions for paragraphs in the related studies section is complex. More recent deep learning models should be applied to further validate the proposed approach. Practical implications The citation recommendation strategy can be helpful for valuable citation discovery, semantic scientific retrieval, and automatic literature review generation. Originality/value The proposed citation function-based citation recommendation can generate intuitive explanations of the results for users, improving the transparency, persuasiveness, and effectiveness of recommender systems.
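
As a rough illustration of a BM25 baseline enhanced with term function information, the sketch below scores candidates with the standard Okapi BM25 formula and then multiplies the score of any candidate whose annotated function matches the function of the citing context. The boosting rule, the `boost` value, the function labels (`"method"`, `"dataset"`), and the candidate records are all assumptions made for illustration; the abstract does not specify how the authors combine the two signals.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rerank(query: List[str], query_function: str,
           candidates: List[Dict], boost: float = 0.5) -> List[str]:
    """Hypothetical rule: scale BM25 by (1 + boost) when functions match."""
    base = bm25_scores(query, [c["tokens"] for c in candidates])
    boosted = [
        s * (1 + boost) if c["function"] == query_function else s
        for s, c in zip(base, candidates)
    ]
    order = sorted(range(len(candidates)), key=lambda i: boosted[i], reverse=True)
    return [candidates[i]["id"] for i in order]


# Made-up candidates; "function" stands in for an annotated term function.
candidates = [
    {"id": "A", "tokens": ["citation", "recommendation", "method"], "function": "method"},
    {"id": "B", "tokens": ["citation", "recommendation", "dataset"], "function": "dataset"},
    {"id": "C", "tokens": ["image", "segmentation", "network"], "function": "method"},
]
ranking = rerank(["citation", "recommendation"], "dataset", candidates)
print(ranking)  # ['B', 'A', 'C']
```

In this toy case A and B tie on lexical relevance, so the function match alone decides the order, which is the kind of effect the abstract attributes to term function information.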

Volume : 6 1 Pages : 75 - 98 Publication Type : Journal Article

Citations: 4