Alexander S. Behr, Diana Chernenko, Dominik Koßmann, Arjun Neyyathala, Schirin Hanf, Stephan A. Schunk, Norbert Kockmann
Generating knowledge graphs through text mining of catalysis research related literature
Catalysis Science & Technology, vol. 14, no. 19, pp. 5699-5713, published 30 September 2024 (journal article). DOI: 10.1039/d4cy00369a
Citations: 0
Abstract
Structured research data management in catalysis is crucial, especially for large amounts of data, and should be guided by the FAIR principles so that data are easy to access and compatible. Ontologies help to organize knowledge in a structured and FAIR way. The growing number of scientific publications calls for automated methods that preselect and retrieve the desired knowledge while minimizing the effort of searching for relevant publications. Ontology learning can be used to create structured knowledge graphs, while named entity recognition (NER) detects and categorizes important information in text. This work combines ontology learning and NER to automatically extract key data from publications and organize the implicit knowledge in a machine- and user-readable knowledge graph and data. CatalysisIE is a pre-trained model for this kind of information extraction in catalysis research; in this work it is applied and extended with a new dataset, which increases the model's precision and recall on that dataset. The workflow is validated on two catalysis research datasets. Preformulated SPARQL queries are provided to demonstrate the usability and applicability of the resulting knowledge graph for researchers.
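The abstract describes a pipeline in which NER labels spans of text, the labelled entities are organized into a knowledge graph, and the graph is then queried with SPARQL. The sketch below illustrates that idea using only the Python standard library; it is not the authors' implementation and uses neither CatalysisIE nor an RDF library. The entity labels ("catalyst", "reactant", "product"), the `ex:` prefix, the `ex:mentions` predicate and the helper names `to_iri`, `build_graph`, `match` and `select` are all hypothetical. `select` evaluates a conjunction of triple patterns, roughly what the WHERE block of a query such as `SELECT ?paper ?cat WHERE { ?cat rdf:type ex:Catalyst . ?paper ex:mentions ?cat . }` does.

```python
"""Minimal sketch: NER output -> triples -> SPARQL-like pattern query.

Hypothetical throughout: the entity labels, the "ex:" prefix, the
"ex:mentions" predicate and all function names are illustrative only.
"""
import re
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def to_iri(text: str) -> str:
    """Normalize a surface string into a prefixed, IRI-like name (assumption)."""
    return "ex:" + re.sub(r"\W+", "_", text.lower()).strip("_")


def build_graph(ner_output: Dict[str, List[Tuple[str, str]]]) -> Set[Triple]:
    """Turn {paper: [(entity text, label), ...]} into a set of triples."""
    graph: Set[Triple] = set()
    for paper, entities in ner_output.items():
        paper_iri = to_iri(paper)
        for text, label in entities:
            entity_iri = to_iri(text)
            # Class membership, e.g. ex:cu_zno rdf:type ex:Catalyst
            graph.add((entity_iri, "rdf:type", "ex:" + label.capitalize()))
            # Provenance link from the publication to the extracted entity
            graph.add((paper_iri, "ex:mentions", entity_iri))
    return graph


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Unify one triple pattern with one triple; '?x' terms are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind a new variable, or check consistency with an existing binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(graph: Set[Triple], patterns: List[Triple]) -> List[Dict[str, str]]:
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_bindings: List[Dict[str, str]] = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    ner_output = {
        "Paper A": [("Cu/ZnO", "catalyst"), ("CO2", "reactant"), ("methanol", "product")],
        "Paper B": [("Ni", "catalyst"), ("methane", "reactant")],
    }
    graph = build_graph(ner_output)
    # Roughly: SELECT ?paper ?cat WHERE { ?cat rdf:type ex:Catalyst . ?paper ex:mentions ?cat . }
    rows = select(graph, [("?cat", "rdf:type", "ex:Catalyst"),
                          ("?paper", "ex:mentions", "?cat")])
    for row in sorted(rows, key=lambda r: r["?paper"]):
        print(row["?paper"], row["?cat"])
    # prints:
    # ex:paper_a ex:cu_zno
    # ex:paper_b ex:ni
```

Entities here are merged purely by their normalized surface string, which is a naive assumption; a real workflow would map extracted entities onto ontology classes and individuals, and a triple store with a SPARQL engine would take the place of `select`.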
About the journal:
A multidisciplinary journal focusing on cutting-edge research across all fundamental science and technological aspects of catalysis.
Editor-in-chief: Bert Weckhuysen
Impact factor: 5.0
Time to first decision (peer reviewed only): 31 days