
Data and Information Management: Latest Articles

An Empirical Study on Knowledge Aggregation in Academic Virtual Community Based on Deep Learning
Pub Date : 2021-10-01 DOI: 10.2478/dim-2021-0010
Liangfeng Qian , Shengli Deng

Academic virtual communities provide an environment for users to exchange knowledge, so they gather a large amount of knowledge resources, which grow rapidly and in a disorderly way. We study how to organize the scattered and disordered knowledge of online communities effectively and provide personalized services for users. We focus on comprehensively analyzing the knowledge associations among titles using deep learning, so as to achieve effective knowledge aggregation in academic virtual communities. Taking ResearchGate (RG) “online community” resources as an example, we use a Word2Vec model to perform deep knowledge aggregation. Principal component analysis (PCA) is then used to verify its scientific validity, and a Wide & Deep learning model is used to verify its running effect. The empirical results show that the knowledge aggregation system for the “online community” works well and is scientifically sound.
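
As a rough illustration of how word vectors can relate titles to one another, the sketch below averages per-word vectors into a title vector and compares titles by cosine similarity. It is not the authors' pipeline: the three-dimensional vectors, the `WORD_VECTORS` table, and the function names are hypothetical stand-ins for what a trained Word2Vec model would supply.

```python
import math

# Hypothetical toy vectors; a trained Word2Vec model would supply real ones.
WORD_VECTORS = {
    "deep": [0.9, 0.1, 0.0],
    "learning": [0.8, 0.2, 0.1],
    "neural": [0.85, 0.15, 0.05],
    "networks": [0.7, 0.3, 0.1],
    "library": [0.1, 0.9, 0.2],
    "services": [0.2, 0.8, 0.3],
}


def title_vector(title):
    """Average the vectors of the known words in a title."""
    vectors = [WORD_VECTORS[w] for w in title.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in title: {title!r}")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, titles):
    """Return the candidate title whose averaged vector is closest to the query."""
    q = title_vector(query)
    return max(titles, key=lambda t: cosine(q, title_vector(t)))
```

With these toy vectors, `most_similar("deep learning", ["neural networks", "library services"])` selects "neural networks"; grouping titles by such similarities is one simple way to aggregate related items.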

Data and information management, vol. 5, no. 4, pp. 372-388
Citations: 0
How Should One Explore the Digital Library of the Future?
Pub Date : 2021-10-01 DOI: 10.2478/dim-2021-0003
Edward A. Fox , Prashant Chandrasekar

This article partially addresses a challenge from Licklider in his 1965 book on “Libraries of the Future,” focusing on how to build extensible digital libraries that can dramatically expand the support of exploration. A new methodology connects the efforts of User eXperience researchers with those of subject matter experts (domain scientists, curators, researchers, and so on) and developers. This allows constructing a knowledge graph representing the relationships among goals, tasks, workflows, and services. A reasoner empowers authorized users to have their goals met with suitable workflows that are dynamically generated and executed. Student teams have applied the new methodology to support users interested in tweets, web pages, or electronic theses and dissertations, as well as those curating and experimenting with those collections. Exploration is thus broadened across content types and their elements, with an extensible set of services, to address an arbitrary set of stakeholder goals.

Data and information management, vol. 5, no. 4, pp. 349-362
Citations: 0
Research Librarians' Experiences of Research Data Management Activities at an Academic Library in a Developing Country
Pub Date : 2021-10-01 DOI: 10.2478/dim-2021-0002
Johnson Masinde , Jing Chen , Daniel Wambiri , Angela Mumo

University libraries have historically supported scientific research by collecting, organizing, maintaining, and making research materials available for access. Researchers reckon that, given the expertise acquired from conventional cataloging, classification, and indexing, together with that gained in developing and maintaining institutional repositories, it is only rational that libraries take a dominant and central role in research data management (RDM) and extend their capacity as curators. Accordingly, university libraries are expected to assemble the capabilities needed to manage and provide research data for efficient sharing and reuse. This study examined research librarians' experiences of RDM activities at the UON Library in order to recommend measures to enhance the managing, sharing, and reusing of research data. The study was informed by the DCC Curation Lifecycle Model and the Community Capability Model Framework (CCMF), which enabled the investigator to purposively capture qualitative data from a sample of 5 research librarians at the UON Library. The data were analysed thematically to generate themes that enabled the investigator to address the research problem. Although the UON Library had policies on research data, quality assurance, and intellectual property, the findings showed no explicit policies to guide each stage of data curation and capabilities. There were also inadequacies in skills and training capability, technological infrastructure, and collaborative partnerships. Overall, RDM faced challenges in all the examined capabilities. These challenges limited the managing, sharing, and reusing of research data. The study recommends developing an RDM unit within the UON Library to oversee the implementation of RDM activities by assembling all the needed capabilities (policy guidelines, skills and training, technological infrastructure, and collaborative partnerships) to support data curation activities and enable efficient managing, sharing, and reusing of research data.

Data and information management, vol. 5, no. 4, pp. 412-424
Citations: 5
Interactive Evolution of Multidimensional Information in Social Media for Public Emergency: A Perspective from Optics Scattering
Pub Date : 2021-10-01 DOI: 10.2478/dim-2021-0008
Xiaoyue Ma , Xiao Meng , Hao Ma

Most current research on the information analysis of social media (SM) for public emergencies has focused on a single dimension, such as emotion, while neglecting the interaction between multidimensional information. Therefore, in this study, an information dispersing–superimposing model is proposed to explain the implicit regularity of the mutual impact among symbol, sentiment, and context information and their dependent evolution on SM. Information hue, saturation, and flux (HSF) are defined to measure the interaction process. An online event was selected to verify the concept and hypothesis of this study. The results showed that interaction among multidimensional information did exist on SM for a public emergency. The turning points of information dispersing–superimposing often emerged when the number of online users involved changed significantly, and sentiment and context information were shown to interact strongly and tended to spread at the same time. The results also showed that the dominant information component varied at each stage of the emergency. This paper is one of the first to study the interaction of multidimensional information on SM from the perspective of optics scattering. The findings offer a theoretical explanation for why certain information components may be enhanced during online dissemination and suggest practical support for information prediction and interface design for SM.

Data and information management, vol. 5, no. 4, pp. 389-411
Citations: 3
Which Message? Which Channel? Which Customer? - Exploring Response Rates in Multi-Channel Marketing Using Short-Form Advertising
Pub Date : 2021-09-24 DOI: 10.2478/dim-2021-0011
Omar Marzouk, Joni O. Salminen, Pengyi Zhang, B. Jansen

Formulating short-form advertising messages with little ad content that work and choosing high-performing channels to disseminate them are persistent challenges in multichannel marketing. Drawing on the persuasive systems design (PSD) model, we experimented with 33,848 actual customers of an international telecom company. In a real-life setting, we compared the effectiveness of three persuasion strategies (rational, emotional, and social) tested in three marketing channels (short message service (SMS), social media advertising, and mobile application), evaluating their effect on influencing customers to purchase international mobile phone credits. Results suggest that companies should send rational messages when using short-form advertising messages regardless of the channel to achieve higher response rates. Findings further show that certain customer characteristics are predictive of positive responses and differ by channel but not by message type. Findings from crowdsourced evaluations also indicate that people noticeably disagree on what persuasive strategy was applied to these short messages, indicating that consumers are not well-equipped to identify persuasive strategies or that what advertisers see as a “pure” strategy actually involves elements from multiple strategies as interpreted by consumers. The results have implications for the theoretical understanding of persuasive short-form commercial messaging in multichannel marketing and practical insights for advertising within a limited amount of space and attention afforded by many digital channels.
Citations: 9
Discovering Booming Bio-entities and Their Relationship with Funds
Pub Date : 2021-07-01 DOI: 10.2478/dim-2021-0007
Fang Tan , Tongyang Zhang , Siting Yang , Xiaoyan Wu , Jian Xu

With the increasing pressure on the National Institutes of Health (NIH) budget, cutting waste and improving efficiency in research funding allocation is a major challenge. To meet this challenge, this paper explores research hotspots and disciplinary trends in the biomedical area and discusses the relationship between these factors and government funding, thereby uncovering biomedical hotspots of interest to academia and the evolution of U.S. federal government funding through an entitymetrics analysis. Considering that the rapid proliferation of biomedical literature provides large amounts of information resources for knowledge discovery, entities extracted from articles in PubMed and from NIH-funded projects during 1988–2017 are taken as experimental data. They are divided into four categories: species, diseases, genes, and drugs. Subsequently, a comparative analysis of entity trajectories in the four domains is performed, which includes occurrence frequency calculations of disease entities to explore frequency variation trends in high-frequency entities and the distribution of research funds. Finally, we conduct an evolutionary analysis of two relationships: that between research popularity and the amount of funding, and that between research popularity and the number of funded projects. The results suggest that research on gene and disease entities is at a stage of rapid development. Diseases with high prevalence and mortality rates and diseases associated with genetic factors will be the emphasis of future research trends. The distribution of NIH grants shows an obvious long-tail effect and can influence overall trends in the popularity of research topics. We also find a strong linear correlation between the research popularity of bio-entities and both the amount and the number of funding grants. However, the impact of the amount and number of grants on entity research popularity is decreasing. These results indicate the broad applicability of entitymetrics in funding research.
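
The "strong linear correlation" mentioned above is commonly measured with the Pearson coefficient. The abstract does not say which statistic the authors computed, so the following is only a sketch under that assumption; the function name and the numbers in the usage note are invented for illustration and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

For instance, made-up yearly mention counts `[1, 2, 3, 4]` against made-up funding amounts `[2, 4, 6, 8]` give a coefficient of 1 (up to floating-point rounding), while a perfectly inverse pair gives -1.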

Data and information management, vol. 5, no. 3, pp. 312-328
Citations: 3
Knowledge Entity Extraction and Text Mining in the Era of Big Data
Pub Date : 2021-07-01 DOI: 10.2478/dim-2021-0009
Chengzhi Zhang , Philipp Mayr , Wei Lu , Yi Zhang
Data and information management, vol. 5, no. 3, pp. 309-311
Citations: 4
A Pattern and POS Auto-Learning Method for Terminology Extraction from Scientific Text
Pub Date : 2021-07-01 DOI: 10.2478/dim-2021-0005
Wei Shao , Bolin Hua , Linqi Song

Many new scientific documents are published on various platforms every day, and it is increasingly important to discover new words and meanings in these documents quickly and efficiently. However, most related work relies on labeled data, which makes it difficult to handle unlabeled new documents efficiently. To address this, we introduce an unsupervised method based on sentence patterns and part-of-speech (POS) sequences. Our method needs only a few initial learnable patterns to obtain the initial terminology tokens and their POS sequences. In this process, new patterns are constructed that can match more sentences and thus find more POS sequences of terminology. Finally, we use the obtained POS sequences and sentence patterns to extract terminology from new scientific text. Experiments on paper abstracts from Web of Knowledge show that this method is practical and achieves good performance on our test data.
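
The idea of learning POS sequences from a seed pattern and reusing them on new text can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it omits the step that constructs new sentence patterns, and the trigger word "called", the `TERM_TAGS` set, the hand-tagged sentences, and the function names are all hypothetical.

```python
# Tags allowed inside a candidate term (an assumption made for this sketch).
TERM_TAGS = {"ADJ", "NOUN"}


def learn_pos_sequences(sentences, trigger):
    """Collect the POS sequence of the ADJ/NOUN run that directly follows the trigger word."""
    learned = set()
    for sent in sentences:
        for i, (token, _) in enumerate(sent):
            if token.lower() != trigger:
                continue
            run = []
            for _, tag in sent[i + 1:]:
                if tag not in TERM_TAGS:
                    break
                run.append(tag)
            if run:
                learned.add(tuple(run))
    return learned


def extract_terms(sentence, pos_sequences):
    """Return every token span whose POS tags equal one of the learned sequences."""
    tags = [tag for _, tag in sentence]
    terms = []
    for seq in sorted(pos_sequences, key=len, reverse=True):
        n = len(seq)
        for start in range(len(sentence) - n + 1):
            if tuple(tags[start:start + n]) == seq:
                terms.append(" ".join(tok for tok, _ in sentence[start:start + n]))
    return terms


# Hand-tagged toy sentences; a real system would run a POS tagger first.
seed_sentences = [[
    ("A", "DET"), ("method", "NOUN"), ("called", "VERB"), ("latent", "ADJ"),
    ("semantic", "ADJ"), ("analysis", "NOUN"), ("is", "VERB"), ("used", "VERB"),
]]
new_sentence = [
    ("We", "PRON"), ("apply", "VERB"), ("conditional", "ADJ"),
    ("random", "ADJ"), ("fields", "NOUN"), ("here", "ADV"),
]

sequences = learn_pos_sequences(seed_sentences, "called")
terms = extract_terms(new_sentence, sequences)
```

Here the seed sentence teaches the sequence ADJ ADJ NOUN, which then matches "conditional random fields" in the unlabeled sentence even though that sentence contains no trigger word.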

Data and information management, vol. 5, no. 3, pp. 329-335
Citations: 6
Automatic Subject Classification of Public Messages in E-government Affairs
Pub Date : 2021-07-01 DOI: 10.2478/dim-2021-0004
Pei Pan , Yijin Chen

Public messages on the Internet political inquiry platform rely on manual classification, which has the problems of heavy workload, low efficiency, and high error rate. A Bi-directional long short-term memory (Bi-LSTM) network model based on attention mechanism was proposed in this paper to realize the automatic classification of public messages. Considering the network political inquiry data set provided by the BdRace platform as samples, the Bi-LSTM algorithm is used to strengthen the correlation between the messages before and after the training process, and the semantic attention to important text features is strengthened in combination with the characteristics of attention mechanism. Feature weights are integrated through the full connection layer to carry out classification calculations. The experimental results show that the F1 value of the message classification model proposed here reaches 0.886 and 0.862, respectively, in the data set of long text and short text. Compared with three algorithms of long short-term memory (LSTM), logistic regression, and naive Bayesian, the Bi-LSTM model can achieve better results in the automatic classification of public message subjects.

Data and information management, Volume 5, Issue 3, Pages 336-347. PDF: https://www.sciencedirect.com/science/article/pii/S2543925122000043/pdfft?md5=9eb8a1ad631981af104c47aa695f8e57&pid=1-s2.0-S2543925122000043-main.pdf
Citations: 4
Towards a Sustainable Infrastructure for the Preservation of Cultural Heritage and Digital Scholarship
Pub Date : 2021-04-01 DOI: 10.2478/dim-2020-0052
Peter X. Zhou

The digital lifecycle encompasses definitive processes for data curation and management, long-term preservation, and dissemination, all of which are key building blocks in the development of a digital library. Maintaining a complete digital lifecycle workflow is vital to the preservation of digital cultural heritage and digital scholarship. This paper considers digital lifecycle programs for digital libraries, noting similarities between the digital and print lifecycles and referring to the example of the Digital Dunhuang project. Only through a systematic and sustainable digital lifecycle program can platforms for cross-disciplinary research and repositories for large aggregations of digital content be built. Moreover, advancing digital lifecycle development will ensure that knowledge and scholarship created in the digital age will have the same chances for survival that print-and-paper scholarship has had for centuries. It will also ensure that digital library users will have effective access to aggregated content across different domains and platforms.

Data and information management, Volume 5, Issue 2, Pages 253-261. PDF: https://www.sciencedirect.com/science/article/pii/S2543925122000122/pdfft?md5=fc16c10f00b08b6e8a9abc81a05fc721&pid=1-s2.0-S2543925122000122-main.pdf
Citations: 0