Keyphrase Extraction for Technical Language Processing
Authors: Alden Dima, Aaron Massey
DOI: https://doi.org/10.6028/jres.126.053
Published: 2022-03-09
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11249704/pdf/
Citations: 0
Abstract
Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases, and the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC) and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 % while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui's F-measure of 18.8 %.
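To make the reported F-measures concrete, the following is a minimal sketch of how precision, recall, and F-measure can be computed for a single document by comparing extracted keyphrases against author-provided ones. It is an illustration only: the abstract does not state the matching or averaging scheme, so exact matching on lowercased phrases is an assumption here, and the function name `f_measure` and the example phrases are hypothetical.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) for one document.

    Assumption: phrases match only if they are identical after
    lowercasing and collapsing whitespace. The evaluation in the
    paper may use stemming or a different matching rule.
    """
    def normalize(phrase):
        return " ".join(phrase.lower().split())

    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0

    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0, 0.0, 0.0

    precision = true_positives / len(pred)  # share of extracted phrases that are correct
    recall = true_positives / len(ref)      # share of gold phrases that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical example: 4 extracted phrases, 3 gold phrases, 2 in common.
extracted = ["Vapor pressure", "enthalpy of formation", "heat capacity", "density"]
author_provided = ["vapor pressure", "heat capacity", "phase equilibria"]
precision, recall, f1 = f_measure(extracted, author_provided)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

A corpus-level score such as those quoted above would additionally require an averaging choice across documents (for example, averaging per-document scores or pooling counts), which this sketch leaves open.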
About the journal:
Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance.
Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research, affording the reader a closer look at the content and significance of an article. By describing the article contents in more detail, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.