Journal of data and information science (Warsaw, Poland): latest articles, page 6

The Three-Step Workflow: A Pragmatic Approach to Allocating Academic Hospitals’ Affiliations for Bibliometric Purposes

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-05-28 DOI: 10.2478/jdis-2022-0006

Andrea Reyes Elizondo, C. Calero-Medina, M. Visser

Abstract
Purpose: A key question when ranking universities is whether or not to allocate the publication output of affiliated hospitals to universities. This paper presents a method for classifying the varying degrees of interdependency between academic hospitals and universities in the context of the Leiden Ranking.
Design/methodology/approach: Hospital nomenclatures vary worldwide to denote some form of collaboration with a university; however, they do not correspond to universally standard definitions. Thus, rather than seeking a normative definition of academic hospitals, we propose a three-step workflow that aligns the university-hospital relationship with one of three general models: full integration of the hospital and the medical faculty into a single organization; health science centres in which hospitals and medical faculty remain separate entities, albeit within the same governance structure; and structures in which universities and hospitals are separate entities that collaborate with one another. This classification system provides a standard through which publications that mention affiliations with academic hospitals can be better allocated.
Findings: In the paper we illustrate how the three-step workflow translates the three above-mentioned models into two types of instrumental relationships for the assignment of publications: “associate” and “component”. When a hospital and a medical faculty are fully integrated, or when a hospital is part of a health science centre, the relationship is classified as component. When a hospital follows the model of collaboration and support, the relationship is classified as associate. Compiling data according to these standards allows for a more uniform comparison between educational and research systems worldwide.
Research limitations: The workflow is resource intensive, depends heavily on the information provided by universities and hospitals, and is more challenging for languages that use non-Latin characters. Further, applying the workflow demands a careful evaluation of different types of input, which can result in ambiguity and makes the workflow difficult to automate.
Practical implications: Determining the type of affiliation an academic hospital has with a university can have a substantial impact on the publication counts for universities. This workflow can also aid in analysing collaborations between the two types of organizations.
Originality/value: The three-step workflow is a unique way to establish the type of relationship an academic hospital has with a university, accounting for national and regional differences in nomenclature.
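The mapping described in the Findings, from three organizational models to two relationship types, can be sketched in a few lines of Python. This is only an illustration of the final mapping step, not the authors' workflow: the enum and function names are hypothetical, and the manual evidence gathering that decides which model applies to a given hospital is not modelled.

```python
from enum import Enum


class Model(Enum):
    """The three general university-hospital models named in the abstract."""
    FULL_INTEGRATION = "full integration"
    HEALTH_SCIENCE_CENTRE = "health science centre"
    COLLABORATION = "collaboration"


class Relationship(Enum):
    """The two relationship types used when assigning publications."""
    COMPONENT = "component"
    ASSOCIATE = "associate"


def relationship_for(model: Model) -> Relationship:
    """Map an organizational model to its relationship type (hypothetical helper)."""
    # Full integration and health science centres both count as "component".
    if model in (Model.FULL_INTEGRATION, Model.HEALTH_SCIENCE_CENTRE):
        return Relationship.COMPONENT
    # Separate entities that merely collaborate are "associate".
    return Relationship.ASSOCIATE


if __name__ == "__main__":
    for model in Model:
        print(model.value, "->", relationship_for(model).value)
```

Keeping the models and relationship types as separate enumerations mirrors the abstract's distinction between how an institution is organized and how its publications are assigned.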


Volume (as listed): 7 1; pages: 20-36; type: Journal Article; indexed via Semantic Scholar.

Citations: 1

RDFAdaptor: Efficient ETL Plugins for RDF Data Process

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-04-14 DOI: 10.2478/jdis-2021-0020

Jiao Li, Guojian Xian, Ruixue Zhao, Yongwen Huang, Yuantao Kou, Tingting Luo, Tan Sun

Abstract
Purpose: The interdisciplinary nature and rapid development of the Semantic Web have led to the mass publication of RDF data in a large number of widely accepted serialization formats, creating a need for RDF data processing with specific purposes. The paper reports on an assessment of the chief RDF data endpoint challenges and introduces RDFAdaptor, a set of plugins for RDF data processing that covers the whole life-cycle with high efficiency.
Design/methodology/approach: RDFAdaptor is built on the prominent ETL tool Pentaho Data Integration, which provides a user-friendly and intuitive interface and allows connections to various data sources and formats. It reuses the Java framework RDF4J as middleware that provides access to data repositories, SPARQL endpoints, and all leading RDF database solutions with SPARQL 1.1 support. It offers various configuration templates for multi-scenario applications and can help extend data processing tasks in other services or tools to complement missing functions.
Findings: The proposed comprehensive RDF ETL solution, RDFAdaptor, provides an easy-to-use and intuitive interface, supports data integration and federation over multi-source heterogeneous repositories or endpoints, and manages linked data in hybrid storage mode.
Research limitations: The plugin set supports several application scenarios of RDF data processing, but error detection and checking, as well as interaction with other graph repositories, remain to be improved.
Practical implications: The plugin set provides a user interface and configuration templates that make it usable in various applications: RDF data generation, multi-format data conversion, remote RDF data migration, and RDF graph update in the semantic query process.
Originality/value: This is the first attempt to develop components, rather than systems, that can extract, consolidate, and store RDF data on the basis of an ecologically mature data warehousing environment.


Volume (as listed): 6 1; pages: 123-145; type: Journal Article; indexed via Semantic Scholar.

Citations: 2

Bibliometric-based Study of Scientist Academic Genealogy

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-04-14 DOI: 10.2478/jdis-2021-0021

R. Lv, Huan Chang

Abstract
Purpose: This study aims to construct new models and methods of academic genealogy research based on bibliometrics.
Design/methodology/approach: This study proposes an academic influence scale for academic genealogy and introduces the w index for bibliometric scaling of the academic genealogy. We then construct a two-dimensional (academic fecundity versus academic influence) evaluation system of academic genealogy and validate it on the academic genealogy of a famous Chinese geologist.
Findings: The two-dimensional evaluation system can characterize the development and evolution of the academic genealogy, compare the academic influences of different genealogies, and evaluate individuals’ contributions to the inheritance and evolution of the academic genealogy. Individual academic influence is mainly indicated by the w index (the improved h index), which avoids the repeated measurements and distorted results that otherwise arise in the academic genealogy.
Practical implications: The two-dimensional evaluation system for the academic genealogy can better demonstrate the reproduction and the academic inheritance ability of a genealogy.
Research limitations: Using only the w index to characterize academic influence is not comprehensive; scholars’ academic awards, part-time academic positions, and similar factors should also be included. In future work, we will integrate scholars’ academic awards and part-time academic positions into the w index for a comprehensive reflection of scholars’ individual academic influence.
Originality/value: This study constructs new models and methods of academic genealogy research based on bibliometrics, which improves the quantitative assessment of academic genealogy and enriches its research and evaluation methods.
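The abstract describes the w index as an improved h index but does not define it. As a point of reference only, the standard h index that it builds on can be sketched as follows; the function name is illustrative, and this is not the authors' w index.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the paper at this rank still has enough citations
        else:
            break      # every later paper has fewer citations than its rank
    return h


if __name__ == "__main__":
    # Five papers; four of them have at least four citations each.
    print(h_index([10, 8, 5, 4, 3]))
```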


Volume (as listed): 6 1; pages: 146-163; type: Journal Article; indexed via Semantic Scholar.

Citations: 2

Lone Geniuses or One among Many? An Explorative Study of Contemporary Highly Cited Researchers

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-03-08 DOI: 10.2478/jdis-2021-0019

D. Aksnes, K. Aagaard

Abstract
Purpose: The ranking lists of highly cited researchers receive much public attention. In common interpretations, highly cited researchers are perceived to have made extraordinary contributions to science. Thus, the metrics of highly cited researchers are often linked to notions of breakthroughs, scientific excellence, and lone geniuses.
Design/methodology/approach: In this study, we analyze a sample of individuals who appear on Clarivate Analytics’ Highly Cited Researchers list. The main purpose is to juxtapose the characteristics of their research performance against the claim that the list captures a small fraction of the researcher population that contributes disproportionately to extending the frontier and gaining, on behalf of society, knowledge and innovations that make the world healthier, richer, sustainable, and more secure.
Findings: The study reveals that the highly cited articles of the selected individuals generally have a very large number of authors. Thus, these papers seldom represent individual contributions but rather are the result of large collective research efforts conducted in research consortia. This challenges the common perception of highly cited researchers as individual geniuses who can be singled out for their extraordinary contributions. Moreover, the study indicates that a few of the individuals have not even contributed to highly cited original research but rather to reviews or clinical guidelines. Finally, the large number of authors of the papers implies that the ranking list is very sensitive to the specific method used for allocating papers and citations to individuals. In the “whole count” methodology applied by Clarivate Analytics, each author gets full credit for the papers regardless of the number of additional co-authors. The study shows that the ranking list would look very different using an alternative fractionalised methodology.
Research limitations: The study is based on a limited part of the total population of highly cited researchers.
Practical implications: It is concluded that “excellence” understood as highly cited encompasses very different types of research and researchers, many of which do not fit dominant preconceptions.
Originality/value: The study develops further knowledge on highly cited researchers, addressing questions such as who becomes highly cited and which types of research benefit when excellence is defined in terms of citation scores and specific counting methods.
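The difference between whole and fractional counting can be made concrete with a small sketch. It assumes the simplest form of fractionalisation, in which each of the n distinct authors of a paper receives 1/n of the credit; the abstract does not specify the exact scheme used in the study, and the author names and papers below are invented for illustration.

```python
from collections import defaultdict


def whole_counts(papers):
    """Whole counting: every author of a paper receives one full credit."""
    credit = defaultdict(float)
    for authors in papers:
        for author in set(authors):
            credit[author] += 1.0
    return dict(credit)


def fractional_counts(papers):
    """Fractional counting (assumed 1/n scheme): credit is split equally."""
    credit = defaultdict(float)
    for authors in papers:
        unique = set(authors)
        for author in unique:
            credit[author] += 1.0 / len(unique)
    return dict(credit)


if __name__ == "__main__":
    # Invented example: one solo paper, one pair, one four-author paper.
    papers = [["Ada"], ["Ada", "Ben"], ["Ben", "Cy", "Di", "Eve"]]
    print(whole_counts(papers))
    print(fractional_counts(papers))
```

Under whole counting Ada and Ben tie with two papers each, while under the assumed fractional scheme Ada's total is twice Ben's, which illustrates why a ranking built on papers with very long author lists is sensitive to the counting method.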


Volume (as listed): 6 1; pages: 41-66; type: Journal Article; indexed via Semantic Scholar.

Citations: 6

Male, Female, and Nonbinary Differences in UK Twitter Self-descriptions: A Fine-grained Systematic Exploration

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-03-08 DOI: 10.2478/jdis-2021-0018

M. Thelwall, Saheeda Thelwall, Ruth Fairclough

Abstract
Purpose: Although gender identities influence how people present themselves on social media, previous studies have tested pre-specified dimensions of difference, potentially overlooking other differences and ignoring nonbinary users.
Design/methodology/approach: Word association thematic analysis was used to systematically check for fine-grained statistically significant gender differences in Twitter profile descriptions between 409,487 UK-based female, male, and nonbinary users in 2020. A series of statistical tests systematically identified 1,474 differences at the individual word level, and a follow-up thematic analysis grouped these words into themes.
Findings: The results reflect offline variations in interests and in jobs. They also show differences in personal disclosures, as reflected by words, with females mentioning qualifications, relationships, pets, and illnesses much more, nonbinaries discussing sexuality more, and males declaring political and sports affiliations more. Other themes were internally imbalanced, including personal appearance (e.g. male: beardy; female: redhead), self-evaluations (e.g. male: legend; nonbinary: witch; female: feisty), and gender identity (e.g. male: dude; nonbinary: enby; female: queen).
Research limitations: The methods are affected by linguistic styles and probably under-report nonbinary differences.
Practical implications: The gender differences found may inform gender theory, and aid social web communicators and marketers.
Originality/value: The results show a much wider range of gender expression differences than previously acknowledged for any social media site.
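The abstract does not name the statistical tests applied at the word level. One common choice for comparing how often a word appears in two groups of profiles is a chi-square test on a 2x2 table, sketched below purely as an assumption; the counts are invented, the function name is hypothetical, and any correction for running many tests at once is left out.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: group 1 profiles containing the word, b: group 1 profiles without it,
    c: group 2 profiles containing the word, d: group 2 profiles without it.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    # Invented counts: the word appears in 30 of 100 profiles in one group
    # and in 10 of 100 profiles in the other.
    statistic = chi_square_2x2(30, 70, 10, 90)
    # 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
    print(statistic, statistic > 3.841)
```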


Volume (as listed): 6 1; pages: 1-27; type: Journal Article; indexed via Semantic Scholar.

Citations: 4

A Causal Configuration Analysis of Payment Decision Drivers in Paid Q&A

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-03-08 DOI: 10.2478/jdis-2021-0017

Wenyu Chen, Yan Cheng, Jia Li

Abstract
Purpose: This paper examines the factors behind payment decisions, as well as the role each factor plays in the causal configurations that lead to high payment intention under systematic and heuristic information processing routes.
Design/methodology/approach: Based on the heuristic-systematic model (HSM), we propose a configurational analytic framework to investigate complex causal relationships between influencing factors and payment decisions. In line with this approach, we use fuzzy-set qualitative comparative analysis (fsQCA) to analyze data crawled from Zhihu.com.
Findings: The number of previous consultations is a necessary element in all five equivalent configurations that lead to high payment intention. The heuristic processing route plays a core role, while the systematic processing route plays a peripheral role, in the payment decision-making process.
Research limitations: The research is limited in that the moderating effect of professional fields has not been considered in the framework.
Practical implications: The configurations in the results can assist managers of knowledge communities and paid Q&A service providers in managing information elements to motivate more payment decisions.
Originality/value: This paper is one of the few studies to apply HSM theory and the fsQCA method to payment decisions in paid Q&A.
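In fsQCA, whether a condition or configuration X is sufficient for an outcome Y is usually judged with Ragin's consistency and coverage measures: consistency is the sum of min(x, y) over cases divided by the sum of x, and coverage is the same numerator divided by the sum of y. The sketch below computes both for invented fuzzy membership scores; the function names and data are hypothetical, and the abstract does not report which thresholds or software the authors used.

```python
def _overlap(x, y):
    """Sum of case-wise minimum memberships in X and Y."""
    if len(x) != len(y):
        raise ValueError("x and y must have one score per case")
    return sum(min(a, b) for a, b in zip(x, y))


def consistency(x, y):
    """Degree to which membership in X is a subset of membership in Y."""
    total_x = sum(x)
    if total_x == 0:
        raise ValueError("no case has membership in X")
    return _overlap(x, y) / total_x


def coverage(x, y):
    """Share of the outcome Y that is accounted for by X."""
    total_y = sum(y)
    if total_y == 0:
        raise ValueError("no case has membership in Y")
    return _overlap(x, y) / total_y


if __name__ == "__main__":
    # Invented fuzzy membership scores for four cases.
    x = [0.8, 0.6, 0.2, 0.9]   # membership in the configuration
    y = [0.9, 0.5, 0.4, 0.7]   # membership in "high payment intention"
    print(round(consistency(x, y), 2), round(coverage(x, y), 2))
```

Swapping the denominators gives the corresponding necessity measures, which is how a claim such as "a necessary element in all configurations" would typically be checked.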


Volume (as listed): 6 1; pages: 139-162; type: Journal Article; indexed via Semantic Scholar.

Citations: 2

Content Characteristics of Knowledge Integration in the eHealth Field: An Analysis Based on Citation Contexts

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-03-02 DOI: 10.2478/jdis-2021-0015

Shiyun Wang, Jin Mao, Jing Tang, Yujie Cao

Abstract

Purpose: This study attempts to disclose the characteristics of knowledge integration in an interdisciplinary field by looking into the content aspect of knowledge.

Design/methodology/approach: The eHealth field was chosen for the case study. Associated knowledge phrases (AKPs) that are shared between citing papers and their references were extracted from the citation contexts of the eHealth papers by applying a stem-matching method. A classification schema that considers the functions of knowledge in the domain was proposed to categorize the identified AKPs. The source disciplines of each knowledge type were analyzed. Quantitative indicators and a co-occurrence analysis were applied to disclose the integration patterns of different knowledge types.

Findings: The annotated AKPs show which major disciplines supply each type of knowledge. Different knowledge types have remarkably different integration patterns in terms of knowledge amount, the breadth of source disciplines, and the integration time lag. We also find several frequent co-occurrence patterns of different knowledge types.

Research limitations: The collected articles are limited to the two leading open access journals in the field. The stem-matching method used to extract AKPs cannot identify phrases that have the same meaning but are expressed in words with different stems. The Research Subject type dominates the recognized AKPs, which calls for an improved classification schema to support better knowledge integration analysis on knowledge units.

Practical implications: The methodology proposed in this paper sheds new light on the knowledge integration characteristics of an interdisciplinary field from the content perspective. The findings have practical implications for the future development of research strategies in eHealth and for policies on interdisciplinary research.

Originality/value: This study proposes a new methodology to explore the content characteristics of knowledge integration in an interdisciplinary field.
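The stem-matching idea can be sketched as follows. The abstract does not specify the stemmer or the phrase length used, so `crude_stem` (a toy suffix stripper standing in for a real stemmer such as Porter's) and the bigram window are assumptions made purely for illustration.

```python
import re

# Toy suffix list; a hypothetical stand-in for a proper stemming algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s")


def crude_stem(word):
    """Lowercase a word and strip the first matching suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_ngrams(text, n=2):
    """Return the set of n-grams of stemmed words found in the text."""
    stems = [crude_stem(w) for w in re.findall(r"[A-Za-z]+", text)]
    return {tuple(stems[i:i + n]) for i in range(len(stems) - n + 1)}


def shared_phrases(citation_context, reference_text, n=2):
    """Phrases (as stem tuples) that occur in both the citing context and the reference."""
    return stemmed_ngrams(citation_context, n) & stemmed_ngrams(reference_text, n)


# Invented example sentences.
context = "Mobile health interventions improved outcomes"
reference = "We evaluate a mobile health intervention for diabetes"
print(sorted(shared_phrases(context, reference)))
```

The example also illustrates the stated limitation: two phrases with the same meaning but different stems (for instance "mobile health" and "mHealth") would not be matched.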

Volume : 6 1 Pages : 58 - 74

Citations: 4

Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-03-02 DOI: 10.2478/jdis-2021-0013

Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu

Abstract

Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and a pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research.

Design/methodology/approach: We treat AKE from Chinese text as a character-level sequence labeling task to avoid the segmentation errors of Chinese tokenizers, and we initialize our model with the pretrained language model BERT, which was released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set, and 3,094 abstracts as the test set. As baselines, we use unsupervised keyphrase extraction methods, including term frequency (TF), TF-IDF, and TextRank, and supervised machine learning methods, including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory Network (BiLSTM), and BiLSTM-CRF. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models.

Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, an absolute improvement of 9.64 percentage points.

Research limitations: We consider only the automatic keyphrase extraction task rather than the keyphrase generation task, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases.

Practical implications: We make our character-level IOB format dataset for Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction.

Originality/value: Through comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset also provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
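As a rough illustration of what character-level IOB labeling means, the sketch below tags every character of a sentence as B (beginning of a keyphrase), I (inside a keyphrase), or O (outside). The function name, the bare B/I/O tag set, and the example sentence are assumptions; the exact tag scheme of the CAKE dataset is not stated in the abstract.

```python
def char_level_iob(text, keyphrases):
    """Label each character of `text` with B, I, or O given a list of keyphrases.

    Longer phrases are placed first and overlapping matches are skipped, which
    mirrors the stated limitation that nested keyphrases are not handled.
    """
    tags = ["O"] * len(text)
    for phrase in sorted(keyphrases, key=len, reverse=True):
        if not phrase:
            continue  # an empty phrase would match everywhere
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(phrase, end)
    return list(zip(text, tags))


# Invented example: "blood glucose control in diabetes patients".
labeled = char_level_iob("糖尿病患者的血糖控制", ["糖尿病", "血糖控制"])
print(" ".join(f"{char}/{tag}" for char, tag in labeled))
# prints: 糖/B 尿/I 病/I 患/O 者/O 的/O 血/B 糖/I 控/I 制/I
```

Because every character receives its own tag, no word segmentation step is needed, which is the motivation the abstract gives for the character-level formulation.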

Volume : 6 1 Pages : 35 - 57

Citations: 7

“Sparking” and “Igniting” Key Publications of 2020 Nobel Prize Laureates

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-03-02 DOI: 10.2478/jdis-2021-0016

Fangjie Xi, R. Rousseau, Xiaojun Hu

Abstract

Purpose: This article aims to determine the percentage of “Sparking” articles among the work of the 2020 Nobel Prize winners in medicine, physics, and chemistry.

Design/methodology/approach: We focus on under-cited influential research among the key publications mentioned by the Nobel Prize Committee for the 2020 Nobel Prize laureates. Specifically, we extracted data from the Web of Science and calculated the Sparking Indices using the formulas proposed by Hu and Rousseau in 2016 and 2017. In addition, we identified another type of article, “igniting” articles, based on the notion introduced in 2017.

Findings: In medicine and physics, articles with sparking characteristics account for 78.571% and 68.75% of the key publications respectively, whereas in chemistry 90% of the articles are characterized as “igniting”. Together, the two types of articles account for more than 93% of the Nobel Prize work included in this study.

Research limitations: Our research did not cover the impact of topic, socio-political factors, and author reputation on the Sparking Indices.

Practical implications: Our study shows that the Sparking Indices reflect the influence of the best research work, so they can be used to detect under-cited influential articles as well as to identify fundamental work.

Originality/value: Our findings suggest that the Sparking Indices have good applicability for research evaluation.

Volume : 6 1 Pages : 28 - 40

Citations: 2

The Scientometric Measurement of Interdisciplinarity and Diversity in the Research Portfolios of Chinese Universities

Journal of data and information science (Warsaw, Poland)

Pub Date : 2021-02-25 DOI: 10.2139/ssrn.3798519

Lin Zhang, L. Leydesdorff

Abstract

Purpose: Interdisciplinarity is a hot topic in science and technology policy. However, the concept of interdisciplinarity is both abstract and complex, and therefore difficult to measure using a single indicator. A variety of metrics for measuring the diversity and interdisciplinarity of articles, journals, and fields have been proposed in the literature. In this article, we ask whether institutions can be ranked in terms of their (inter-)disciplinary diversity.

Design/methodology/approach: We developed a software application (interd_vb.exe) that outputs the values of relevant diversity indicators for any document set or network structure. The software is available online, free to the public. The indicators it considers include the advanced diversity indicators Rao-Stirling (RS) diversity and DIV*, as well as standard measures of diversity, such as the Gini coefficient, Shannon entropy, and the Simpson Index. As an empirical demonstration of how the application works, we compared the research portfolios of 42 “Double First-Class” Chinese universities across Web of Science Subject Categories (WCs).

Findings: The empirical results suggest that DIV* provides results that are more in line with one's intuitive impressions than RS, particularly when the results are based on sample-dependent disparity measures. Furthermore, the scores for diversity are more consistent when based on a global disparity matrix than on a local map.

Research limitations: “Interdisciplinarity” can be operationalized as bibliographic coupling among (sets of) documents with references to disciplines. At the institutional level, however, diversity may also indicate comprehensiveness. Unlike impact (e.g. citation), diversity and interdisciplinarity are context-specific and therefore provide a second dimension to the evaluation.

Policy or practical implications: Operationalization and quantification make it necessary for analysts to make their choices and options clear. Although the equations used to calculate diversity are often mathematically transparent, specifying them in computer code helps analysts make their decisions more precise. Although diversity is not necessarily a goal of universities, a high diversity score may inform potential policies concerning interdisciplinarity at the university level.

Originality/value: This article introduces a non-commercial online application to the public domain that allows researchers and policy analysts to measure “diversity” and “interdisciplinarity” using a set of indicators that is as encompassing as possible, for any document set or network structure (e.g. a network of co-authors). As far as we know, such a professional computing tool for evaluating data sets using diversity indicators has not yet been made available online.
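For readers unfamiliar with the indicators named above, the sketch below gives textbook versions of Shannon entropy, the Simpson index, the Gini coefficient, and Rao-Stirling diversity. It is not the authors' interd_vb.exe and does not attempt DIV*; the publication counts and the disparity matrix are invented, and Rao-Stirling is summed here over ordered pairs i ≠ j, whereas some authors sum over i < j (which halves the value).

```python
import math


def proportions(counts):
    """Convert raw counts per category into shares that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(p):
    """H = -sum(p_i * ln p_i), skipping empty categories."""
    return -sum(x * math.log(x) for x in p if x > 0)


def simpson_index(p):
    """Sum of squared shares; 1 minus this value is the Gini-Simpson diversity."""
    return sum(x * x for x in p)


def gini_coefficient(values):
    """Gini coefficient of inequality for non-negative values."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * sum(xs))


def rao_stirling(p, disparity):
    """Sum over ordered pairs i != j of p_i * p_j * d_ij."""
    n = len(p)
    return sum(p[i] * p[j] * disparity[i][j]
               for i in range(n) for j in range(n) if i != j)


# Hypothetical portfolio: publications of one university in three subject categories.
counts = [50, 30, 20]
# Hypothetical symmetric disparity matrix (0 = identical, 1 = maximally different).
disparity = [[0.0, 0.2, 0.9],
             [0.2, 0.0, 0.7],
             [0.9, 0.7, 0.0]]

shares = proportions(counts)
print(shannon_entropy(shares), simpson_index(shares))
print(gini_coefficient(counts), rao_stirling(shares, disparity))
```

Only Rao-Stirling uses the disparity matrix, which is why the choice between a global and a local disparity measure matters for it and not for the entropy or Simpson values.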

Volume : 6 1 Pages : 13 - 35

Citations: 7