Abstract Purpose In studies of the research process, the association between how researchers conceptualize research and their strategic research agendas has been largely overlooked. This study aims to address this gap. Design/methodology/approach This study analyzes this relationship using a dataset of more than 8,500 researchers across all scientific fields and the globe. It studies the associations between the dimensions of two inventories: the Conceptions of Research Inventory (CoRI) and the Multi-Dimensional Research Agenda Inventory—Revised (MDRAI-R). Findings The findings show a relatively strong association between researchers’ conceptions of research and their research agendas. While all conceptions of research are positively related to scientific ambition, the findings are mixed regarding how the dimensions of the two inventories relate to one another, which is significant for those seeking to understand the knowledge production process better. Research limitations The study relies on self-reported data, which always carries a risk of response bias. Practical implications The findings provide a greater understanding of the inner workings of knowledge processes and indicate that the two inventories, whether used individually or in combination, may provide complementary analytical perspectives to research performance indicators. They may thus offer important insights for managers of research environments regarding how to assess the research culture, beliefs, and conceptualizations of individual researchers and research teams when designing strategies to promote specific institutional research focuses and strategies. Originality/value To the best of the authors’ knowledge, this is the first study to associate research agendas and conceptions of research. 
It is based on a large sample of researchers working worldwide and in all fields of knowledge, which ensures that the findings have a reasonable degree of generalizability to the global population of researchers.
{"title":"The Association between Researchers’ Conceptions of Research and Their Strategic Research Agendas","authors":"João M. Santos, H. Horta","doi":"10.2478/jdis-2020-0032","DOIUrl":"https://doi.org/10.2478/jdis-2020-0032","url":null,"abstract":"Abstract Purpose In studies of the research process, the association between how researchers conceptualize research and their strategic research agendas has been largely overlooked. This study aims to address this gap. Design/methodology/approach This study analyzes this relationship using a dataset of more than 8,500 researchers across all scientific fields and the globe. It studies the associations between the dimensions of two inventories: the Conceptions of Research Inventory (CoRI) and the Multi-Dimensional Research Agenda Inventory—Revised (MDRAI-R). Findings The findings show a relatively strong association between researchers’ conceptions of research and their research agendas. While all conceptions of research are positively related to scientific ambition, the findings are mixed regarding how the dimensions of the two inventories relate to one another, which is significant for those seeking to understand the knowledge production process better. Research limitations The study relies on self-reported data, which always carries a risk of response bias. Practical implications The findings provide a greater understanding of the inner workings of knowledge processes and indicate that the two inventories, whether used individually or in combination, may provide complementary analytical perspectives to research performance indicators. They may thus offer important insights for managers of research environments regarding how to assess the research culture, beliefs, and conceptualizations of individual researchers and research teams when designing strategies to promote specific institutional research focuses and strategies. 
Originality/value To the best of the authors’ knowledge, this is the first study to associate research agendas and conceptions of research. It is based on a large sample of researchers working worldwide and in all fields of knowledge, which ensures that the findings have a reasonable degree of generalizability to the global population of researchers.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"56 - 74"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44315615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Purpose In this contribution we provide two new co-authorship indicators based on fractional counting. Design/methodology/approach Based on the idea of fractional counting, we reflect on what should be an acceptable indicator for co-authorship between two entities. From this reflection we propose an indicator, the co-authorship score, denoted as cs, using the harmonic mean. Dividing this new indicator by the classical co-authorship indicator based on full counting leads to a co-authorship intensity indicator. Findings We show that the indicators we propose have many necessary or at least highly desirable properties for a proper cs-score. The two new indicators can be used for countries, but also for institutions and other pairs of entities. A small example shows the feasibility of the co-authorship score and the co-authorship intensity indicator. Research limitations The indicators have not yet been tested on real cases. Practical implications As the notions of co-authorship and collaboration have many aspects, we think that our contribution may help policy management to take yet another aspect into account as part of a multi-faceted description of research outcomes. Originality/value The indicators we propose cover yet another aspect of co-authorship.
{"title":"Bilateral Co-authorship Indicators Based on Fractional Counting","authors":"R. Rousseau, Lin Zhang","doi":"10.2478/jdis-2021-0005","DOIUrl":"https://doi.org/10.2478/jdis-2021-0005","url":null,"abstract":"Abstract Purpose In this contribution we provide two new co-authorship indicators based on fractional counting. Design/methodology/approach Based on the idea of fractional counting we reflect on what should be an acceptable indicator for co-authorship between two entities. From this reflection we propose an indicator, the co-authorship score, denoted as cs, using the harmonic mean. Dividing this new indicator by the classical co-authorship indicator based on full counting, leads to a co-authorship intensity indicator. Findings We show that the indicators we propose have many necessary or at least highly desirable properties for a proper cs-score. It is pointed out that the two new indicators can be used for countries, but also for institutions and other pairs of entities. A small example shows the feasibility of the co-authorship score and the co-authorship intensity indicator. Research limitations The indicators are not yet tested in real cases. Practical implications As the notions of co-authorship and collaboration have many aspects, we think that our contribution may help policy management to take yet another aspect into account as part of a multi-faceted description of research outcomes. 
Originality/value The indicators we propose cover yet another aspect of co-authorship.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"1 - 12"},"PeriodicalIF":0.0,"publicationDate":"2020-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41647492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
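The co-authorship score and intensity indicator described in the Rousseau and Zhang abstract above can be illustrated with a minimal sketch. The abstract states only that the score uses the harmonic mean of fractional counts and that the intensity is the score divided by the full-counting indicator; the per-paper summation, the author-level fractional shares, and all function names below are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter


def fractional_shares(author_entities):
    """Fractional share of one paper for each entity (shares sum to 1).

    Assumption: each author counts equally and belongs to one entity.
    """
    counts = Counter(author_entities)
    total = len(author_entities)
    return {entity: n / total for entity, n in counts.items()}


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0 if both are 0)."""
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


def coauthorship_indicators(papers, x, y):
    """Return (score, full_count, intensity) for the entity pair (x, y).

    Assumption: the score is the sum, over jointly authored papers, of the
    harmonic mean of the two fractional shares; full_count is the number
    of jointly authored papers; intensity is score / full_count.
    """
    score = 0.0
    full_count = 0
    for author_entities in papers:
        shares = fractional_shares(author_entities)
        if x in shares and y in shares:
            score += harmonic_mean(shares[x], shares[y])
            full_count += 1
    intensity = score / full_count if full_count else 0.0
    return score, full_count, intensity


# Hypothetical toy data: each paper is the list of its authors' countries.
papers = [["A", "B"], ["A", "A", "B", "C"], ["A", "C"]]
score, full_count, intensity = coauthorship_indicators(papers, "A", "B")
```

Under these assumptions the intensity lies between 0 and 0.5 for a single paper, and it is highest when the two entities contribute equally and no third entity is involved.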
Janne Pölönen, Raf Guns, Emanuel Kulczycki, G. Sivertsen, Tim C. E. Engels
Abstract Purpose This paper presents an overview of different kinds of lists of scholarly publication channels and of experiences related to the construction and maintenance of national lists supporting performance-based research funding systems. It also contributes a set of recommendations for the construction and maintenance of national lists of journals and book publishers. Design/methodology/approach The study is based on analysis of previously published studies, policy papers, and reported experiences related to the construction and use of lists of scholarly publication channels. Findings Several countries have systems for research funding and/or evaluation that involve the use of national lists of scholarly publication channels (mainly journals and publishers). Typically, such lists are selective (do not include all scholarly or non-scholarly channels) and differentiated (distinguish between channels of different levels and quality). At the same time, most lists are embedded in a system that encompasses multiple or all disciplines. This raises the question of how such lists can be organized and maintained to ensure that all relevant disciplines and all types of research are adequately represented. Research limitation The conclusions and recommendations of the study are based on the authors’ interpretation of a complex and sometimes controversial process with many different stakeholders involved. Practical implications The recommendations and the related background information provided in this paper enable mutual learning that may feed into improvements in the construction and maintenance of national and other lists of scholarly publication channels in any geographical context. This may foster the development of responsible evaluation practices.
Originality/value This paper presents the first general overview and typology of different kinds of publication channel lists, provides insights on expert-based versus metrics-based evaluation, and formulates a set of recommendations for the responsible construction and maintenance of publication channel lists.
{"title":"National Lists of Scholarly Publication Channels: An Overview and Recommendations for Their Construction and Maintenance","authors":"Janne Pölönen, Raf Guns, Emanuel Kulczycki, G. Sivertsen, Tim C. E. Engels","doi":"10.2478/jdis-2021-0004","DOIUrl":"https://doi.org/10.2478/jdis-2021-0004","url":null,"abstract":"Abstract Purpose This paper presents an overview of different kinds of lists of scholarly publication channels and of experiences related to the construction and maintenance of national lists supporting performance-based research funding systems. It also contributes with a set of recommendations for the construction and maintenance of national lists of journals and book publishers. Design/methodology/approach The study is based on analysis of previously published studies, policy papers, and reported experiences related to the construction and use of lists of scholarly publication channels. Findings Several countries have systems for research funding and/or evaluation, that involve the use of national lists of scholarly publication channels (mainly journals and publishers). Typically, such lists are selective (do not include all scholarly or non-scholarly channels) and differentiated (distinguish between channels of different levels and quality). At the same time, most lists are embedded in a system that encompasses multiple or all disciplines. This raises the question how such lists can be organized and maintained to ensure that all relevant disciplines and all types of research are adequately represented. Research limitation The conclusions and recommendations of the study are based on the authors’ interpretation of a complex and sometimes controversial process with many different stakeholders involved. 
Practical implications The recommendations and the related background information provided in this paper enable mutual learning that may feed into improvements in the construction and maintenance of national and other lists of scholarly publication channels in any geographical context. This may foster a development of responsible evaluation practices. Originality/value This paper presents the first general overview and typology of different kinds of publication channel lists, provides insights on expert-based versus metrics-based evaluation, and formulates a set of recommendations for the responsible construction and maintenance of publication channel lists.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"50 - 86"},"PeriodicalIF":0.0,"publicationDate":"2020-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46941913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Purpose This work aims to normalize the NlpContributions scheme (henceforward, NlpContributionGraph) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) pilot stage—to define the scheme (described in prior work); and 2) adjudication stage—to normalize the graphing model (the focus of this paper). Design/methodology/approach We re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising: contribution-centered sentences, phrases, and triple statements. To this end, specifically, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme. Findings The application of NlpContributionGraph on the 50 articles resulted finally in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements indicating that with increased granularity of the information, the annotation decision variance is greater. Research limitations NlpContributionGraph has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus—a single annotator first annotated the data to propose the initial scheme, following which, the same annotator reannotated the data to normalize the annotations in an adjudication stage. 
However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the initial scheme is first proposed and the complexity of the annotation task in the realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model. Practical implications We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. Originality/value NlpContributionGraph is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge
{"title":"Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset","authors":"J. D’Souza, S. Auer","doi":"10.2478/jdis-2021-0023","DOIUrl":"https://doi.org/10.2478/jdis-2021-0023","url":null,"abstract":"Abstract Purpose This work aims to normalize the NlpContributions scheme (henceforward, NlpContributionGraph) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) pilot stage—to define the scheme (described in prior work); and 2) adjudication stage—to normalize the graphing model (the focus of this paper). Design/methodology/approach We re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising: contribution-centered sentences, phrases, and triple statements. To this end, specifically, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme. Findings The application of NlpContributionGraph on the 50 articles resulted finally in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements indicating that with increased granularity of the information, the annotation decision variance is greater. Research limitations NlpContributionGraph has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. 
Further, the annotation scheme in this work is designed by only an intra-annotator consensus—a single annotator first annotated the data to propose the initial scheme, following which, the same annotator reannotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the initial scheme is first proposed and the complexity of the annotation task in the realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model. Practical implications We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. 
Originality/value NlpContributionGraph is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"6 - 34"},"PeriodicalIF":0.0,"publicationDate":"2020-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48540740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
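The F1-based agreement figures reported in the D’Souza and Auer abstract above compare the annotations of the first and second stages. As a minimal sketch of how such an agreement score can be computed, the function below treats each stage's output as a set of items (sentences, phrases, or triples) and scores exact matches only. The exact-match criterion and the function name are assumptions; the abstract does not state how matches were determined.

```python
def f1_agreement(first_stage, second_stage):
    """F1-score between two annotation passes, using exact set overlap.

    Assumption: the second stage is treated as the reference, so precision
    is measured against the first stage and recall against the second.
    F1 is symmetric, so swapping the roles gives the same value.
    """
    first, second = set(first_stage), set(second_stage)
    overlap = len(first & second)
    if overlap == 0:
        return 0.0
    precision = overlap / len(first)
    recall = overlap / len(second)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence identifiers selected in each annotation stage.
pilot = {"s1", "s2", "s3"}
adjudicated = {"s2", "s3", "s4", "s5"}
agreement = f1_agreement(pilot, adjudicated)
```

With exact matching, finer-grained units such as triples have more ways to differ between passes, which is consistent with the lower agreement the abstract reports at higher granularity.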
S. Iwami, Toshihiko Shimizu, M. Empizo, J. Gabayno, N. Sarukura, Shota Fujii, Yoshinari Sumimura
Abstract Purpose The purpose of this research is to provide evidence for decision-makers to realize the potential of collaborations between countries/regions via the scientometric analysis of co-authoring in academic publications. Design/methodology/approach The approach takes Osaka University, which has set a strategy to become a global campus, as an institution positioned to play a leading role in enhancing such collaborations. This research measures co-authoring relations between Osaka University and other countries/regions to identify networks for fostering strong research collaborations. Findings Five countries are identified as candidates for the future global campuses of Osaka University based on three factors: co-authoring relations, GDP growth, and population growth. Research limitations The main limitation of this study is that relations arising from authors’ former positions at Osaka University could not be used, because the retrieved data are limited by the organization-name query in the first step. Practical implications The significance of this work is to provide evidence for the university strategy to expand abroad based on the quantity and visualization of trends. Originality/value With wider practical implementations, the approach of this research is useful in making a strategic roadmap for scientific organizations that intend to collaborate internationally.
{"title":"Current Status and Enhancement of Collaborative Research in the World: A Case Study of Osaka University","authors":"S. Iwami, Toshihiko Shimizu, M. Empizo, J. Gabayno, N. Sarukura, Shota Fujii, Yoshinari Sumimura","doi":"10.2478/jdis-2020-0035","DOIUrl":"https://doi.org/10.2478/jdis-2020-0035","url":null,"abstract":"Abstract Purpose The purpose of this research is to provide evidence for decision-makers to realize the potentials of collaborations between countries/regions via the scientometric analysis of co-authoring in academic publications. Design/methodology/approach The approach is that Osaka University, which has set a strategy to become a global campus, is positioned to have a leading role to enhance such collaborations. This research measures co-authoring relations between Osaka University and other countries/regions to identify networks for fostering strong research collaborations. Findings Five countries are identified as candidates for the future global campuses of Osaka University based on three factors, co-authoring relations, GDP growth, and population growth. Research limitations The main limitation of this study is not being able to use the relations by the former positions of authors in Osaka University, because the data retrieved is limited by the query of the organization name at the first step. Practical implications The significance of this work is to provide evidence for the university strategy to expand abroad based on the quantity and visualization of trends. 
Originality/value With wider practical implementations, the approach of this research is useful in making a strategic roadmap for scientific organizations that intend to collaborate internationally.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"75 - 85"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47406637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Purpose Using the metaphor of “unicorn,” we identify the scientific papers and technical patents characterized by the informetric feature of very high citations in the first ten years after publication, which may provide a new pattern for understanding very high impact works in science and technology. Design/methodology/approach When we set CT as the total citations of papers or patents in the first ten years after publication, with CT ≥ 5,000 for a scientific “unicorn” and CT ≥ 500 for a technical “unicorn,” we have an absolute standard for identifying scientific and technical “unicorn” publications. Findings We identify 165 scientific “unicorns” in 14,301,875 WoS papers and 224 technical “unicorns” in 13,728,950 DII patents during 2001–2012. About 50% of “unicorns” belong to biomedicine, and selected cases from this field are individually discussed. The number of rare “unicorns” increases following a linear model; the fit shows 95% confidence, with an RMSE of 0.2127 for scientific “unicorns” and 0.0923 for technical “unicorns.” Research limitations The “unicorn” label is a purely quantitative criterion that does not consider quality, and “potential unicorns” with CT≤5,000 for papers and CT≤500 for patents are left for future studies. Practical implications Scientific and technical “unicorns” provide a new pattern to understand high-impact works in science and technology. The “unicorn” pattern supplies a concise approach to identify very high-impact scientific papers and technical patents. Originality/value The “unicorn” pattern supplies a concise approach to identify very high impact scientific papers and technical patents.
{"title":"Identifying Scientific and Technical “Unicorns”","authors":"Lucy L. Xu, Miao Qi, F. Y. Ye","doi":"10.2478/jdis-2021-0002","DOIUrl":"https://doi.org/10.2478/jdis-2021-0002","url":null,"abstract":"Abstract Purpose Using the metaphor of “unicorn,” we identify the scientific papers and technical patents characterized by the informetric feature of very high citations in the first ten years after publishing, which may provide a new pattern to understand very high impact works in science and technology. Design/methodology/approach When we set CT as the total citations of papers or patents in the first ten years after publication, with CT≥ 5,000 for scientific “unicorn” and CT≥ 500 for technical “unicorn,” we have an absolute standard for identifying scientific and technical “unicorn” publications. Findings We identify 165 scientific “unicorns” in 14,301,875 WoS papers and 224 technical “unicorns” in 13,728,950 DII patents during 2001–2012. About 50% of “unicorns” belong to biomedicine, in which selected cases are individually discussed. The rare “unicorns” increase following linear model, the fitting data show 95% confidence with the RMSE of scientific “unicorn” is 0.2127 while the RMSE of technical “unicorn” is 0.0923. Research limitations A “unicorn” is a pure quantitative consideration without concerning its quality, and “potential unicorns” as CT≤5,000 for papers and CT≤500 for patents are left in future studies. Practical implications Scientific and technical “unicorns” provide a new pattern to understand high-impact works in science and technology. The “unicorn” pattern supplies a concise approach to identify very high-impact scientific papers and technical patents. 
Originality/value The “unicorn” pattern supplies a concise approach to identify very high impact scientific papers and technical patents.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"96 - 115"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48100575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
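The threshold rule stated in the “unicorn” abstract above (CT ≥ 5,000 for papers, CT ≥ 500 for patents, with CT counted over the first ten years after publication) is simple enough to sketch directly. The record layout, the field names, and the choice to count the publication year as the first of the ten years are assumptions for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List

# Thresholds on ten-year citations (CT) as given in the abstract.
THRESHOLDS = {"paper": 5000, "patent": 500}


@dataclass
class Record:
    identifier: str              # hypothetical ID, e.g. an accession number
    kind: str                    # "paper" or "patent"
    yearly_citations: List[int]  # citations per year, starting at publication


def ten_year_citations(record: Record) -> int:
    """CT: total citations over the first ten yearly counts.

    Assumption: index 0 is the publication year itself.
    """
    return sum(record.yearly_citations[:10])


def is_unicorn(record: Record) -> bool:
    """True if the record's CT meets the threshold for its kind."""
    return ten_year_citations(record) >= THRESHOLDS[record.kind]


# Hypothetical records; citations after year ten do not count toward CT.
records = [
    Record("p1", "paper", [600] * 10),
    Record("p2", "paper", [400] * 10 + [5000]),
    Record("t1", "patent", [50] * 10),
]
unicorns = [r.identifier for r in records if is_unicorn(r)]
```

Because the rule is an absolute cutoff rather than a percentile, it can be applied record by record without knowing the rest of the dataset.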
Michal Monselise, J. Greenberg, Ou Stella Liang, Sonia M. Pascua, Heejun Kim, Mat Kelly, Joan Boone, Christopher C. Yang
Abstract Purpose Given the ubiquitous presence of the internet in our lives, many individuals turn to the web for medical information. A challenge here is that many laypersons (as “consumers”) do not use professional terms found in the medical nomenclature when describing their conditions and searching the internet. The Consumer Health Vocabulary (CHV) ontology, initially developed in 2007, aimed to bridge this gap, although updates have been limited over the last decade. The purpose of this research is to implement a means of automatically creating a hierarchical consumer health vocabulary. The overall purpose is to improve consumers’ ability to search for medical conditions and symptoms with an enhanced CHV and to improve the search capabilities of our searching and indexing tool HIVE (Helping Interdisciplinary Vocabulary Engineering). Design/methodology/approach The research design uses ontological fusion, an approach for automatically extracting and integrating the Medical Subject Headings (MeSH) ontology into CHV, and further converts CHV from a flat mapping to a hierarchical ontology. The additional relationships and parent terms from MeSH allow us to uncover relationships between existing terms in the CHV ontology as well. The research design also included improving the search capabilities of HIVE by identifying alternate relationships and consolidating them into a single entry. Findings The key finding is an improved CHV with a hierarchical structure that enables consumers to search through the ontology and uncover more relationships. Research limitations There are some cases where the improved search results in HIVE return terms that are related but not completely synonymous. We present an example and discuss the implications of this result. Practical implications This research makes available an updated and richer CHV ontology using the HIVE tool. Consumers may use this tool to search consumer terminology for medical conditions and symptoms. 
The HIVE tool will return results about the medical term linked with the consumer term as well as the hierarchy of other medical terms connected to the term. Originality/value This is the first attempt in over a decade to improve and enhance the CHV ontology with current terminology and the first research effort to convert CHV's original flat ontology structure to a hierarchical structure. This research also enhances the HIVE infrastructure and gives consumers a simple, efficient mechanism for searching the CHV ontology and obtaining meaningful data.
{"title":"An Automatic Approach to Extending the Consumer Health Vocabulary","authors":"Michal Monselise, J. Greenberg, Ou Stella Liang, Sonia M. Pascua, Heejun Kim, Mat Kelly, Joan Boone, Christopher C. Yang","doi":"10.2478/jdis-2021-0003","DOIUrl":"https://doi.org/10.2478/jdis-2021-0003","url":null,"abstract":"Abstract Purpose Given the ubiquitous presence of the internet in our lives, many individuals turn to the web for medical information. A challenge here is that many laypersons (as “consumers”) do not use professional terms found in the medical nomenclature when describing their conditions and searching the internet. The Consumer Health Vocabulary (CHV) ontology, initially developed in 2007, aimed to bridge this gap, although updates have been limited over the last decade. The purpose of this research is to implement a means of automatically creating a hierarchical consumer health vocabulary. This overall purpose is improving consumers’ ability to search for medical conditions and symptoms with an enhanced CHV and improving the search capabilities of our searching and indexing tool HIVE (Helping Interdisciplinary Vocabulary Engineering). Design/methodology/approach The research design uses ontological fusion, an approach for automatically extracting and integrating the Medical Subject Headings (MeSH) ontology into CHV, and further convert CHV from a flat mapping to a hierarchical ontology. The additional relationships and parent terms from MeSH allow us to uncover relationships between existing terms in the CHV ontology as well. The research design also included improving the search capabilities of HIVE identifying alternate relationships and consolidating them to a single entry. Findings The key findings are an improved CHV with a hierarchical structure that enables consumers to search through the ontology and uncover more relationships. 
Research limitations There are some cases where the improved search results in HIVE return terms that are related but not completely synonymous. We present an example and discuss the implications of this result. Practical implications This research makes available an updated and richer CHV ontology using the HIVE tool. Consumers may use this tool to search consumer terminology for medical conditions and symptoms. The HIVE tool will return results about the medical term linked with the consumer term as well as the hierarchy of other medical terms connected to the term. Originality/value This is a first attempt in over a decade to improve and enhance the CHV ontology with current terminology and the first research effort to convert CHV's original flat ontology structure to a hierarchical structure. This research also enhances the HIVE infrastructure and provides consumers with a simple, efficient mechanism for searching the CHV ontology and providing meaningful data to consumers.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"35 - 49"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45007645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Corrigendum: The Gender Patenting Gap: A Study on the Iberoamerican Countries","authors":"Danilo S. Carvalho, Lydia Bares, Kelyane Silva","doi":"10.2478/jdis-2020-0039","DOIUrl":"https://doi.org/10.2478/jdis-2020-0039","url":null,"abstract":"","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"147 - 150"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42213468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Priorities for Social and Humanities Projects Based on Text Analysis","authors":"Ülle Must","doi":"10.2478/jdis-2020-0036","DOIUrl":"https://doi.org/10.2478/jdis-2020-0036","url":null,"abstract":"Abstract Purpose Changes in the world show that the role, importance, and coherence of SSH (social sciences and the humanities) will increase significantly in the coming years. This paper aims to monitor and analyze the evolution (or overlapping) of the SSH thematic pattern through three funding instruments since 2007. Design/methodology/approach The goal of the paper is to check to what extent the EU Framework Program (FP) affects research at the national level, and to highlight hot topics from a given period with the help of text analysis. Funded project titles and abstracts derived from the EU FP, Slovenian, and Estonian RIS were used. The final analysis and comparisons between different datasets were made based on the 200 most frequent words. After removing punctuation marks, numeric values, articles, prepositions, conjunctions, and auxiliary verbs, 4,854 unique words in the Estonian Research Information System (ETIS), 4,421 unique words in the Slovenian Research Information System (SICRIS), and 3,950 unique words in FP were identified. Findings Across all funding instruments, about a quarter of the top words constitute half of the word occurrences. The text analysis results show that in the majority of cases words do not overlap between FP and nationally funded projects. In some cases, this may be due to the use of different vocabulary. There is more overlap between words in the case of Slovenia (SL) and Estonia (EE) and less in the case of Estonia and the EU Framework Programmes (FP). At the same time, overlapping words indicate a wider reach (culture, education, social, history, human, innovation, etc.). In nationally funded projects (bottom-up), it was relatively difficult to observe the change in thematic trends over time. 
More specific results emerged from the comparison of the different programs throughout FP (top-down). Research limitations Only projects with English titles and abstracts were analyzed. Practical implications The specifics of SSH have to be taken into account: the one-to-one meaning of terms/words is not as important as, for example, in the exact sciences. Thus, even in co-word analysis, the final content may go unnoticed. Originality/value This was the first attempt to monitor the trends of SSH projects using text analysis. The text analysis of the SSH projects of the two new EU Member States used in the study showed that SSH's thematic coverage is not much affected by the EU Framework Program. Whether this result is field-specific or country-specific should be shown in a follow-up study targeting SSH projects in the so-called old Member States.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"5 1","pages":"116 - 125"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44997062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Citation Recommendation Strategy Based on Term Functions in Related Studies Section","authors":"Haihua Chen","doi":"10.2478/jdis-2021-0022","DOIUrl":"https://doi.org/10.2478/jdis-2021-0022","url":null,"abstract":"Abstract Purpose Researchers frequently encounter the following problems when writing scientific articles: (1) Selecting appropriate citations to support the research idea is challenging. (2) The literature review is not conducted extensively, which leads to working on a research problem that others have already addressed well. The study focuses on citation recommendation in the related studies section by applying the term function of a citation context, potentially improving the efficiency of writing a literature review. Design/methodology/approach We present nine term functions, three newly created and six identified from the existing literature. Using these term functions as labels, we annotate 531 research papers on three topics to evaluate our proposed recommendation strategy. BM25 and Word2vec with VSM are implemented as the baseline models for the recommendation. Then the term function information is applied to enhance the performance. Findings The experiments show that the term function-based methods outperform the baseline methods in terms of recall, precision, and F1-score, demonstrating that term functions are useful in identifying valuable citations. Research limitations The dataset is insufficient due to the complexity of annotating citation functions for paragraphs in the related studies section. More recent deep learning models should be applied to further validate the proposed approach. Practical implications The citation recommendation strategy can be helpful for valuable citation discovery, semantic scientific retrieval, and automatic literature review generation. 
Originality/value The proposed citation function-based citation recommendation can generate intuitive explanations of the results for users, improving the transparency, persuasiveness, and effectiveness of recommender systems.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"75 - 98"},"PeriodicalIF":0.0,"publicationDate":"2020-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46889303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}