Keyphrase Extraction for Technical Language Processing
Authors: Alden Dima, Aaron Massey
Journal: Journal of Research of the National Institute of Standards and Technology
Publication date: 2022-03-09
DOI: https://doi.org/10.6028/jres.126.053
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11249704/pdf/
Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches for keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases, and the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC) and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer achieved an F-measure of 29.4 % while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui's F-measure of 18.8 %.
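For readers unfamiliar with the metric, the F-measure quoted above is the harmonic mean of precision and recall over extracted versus reference keyphrases. The following is a minimal sketch of that calculation for a single article, not the authors' evaluation code. The abstract does not state the matching rule, so exact matching after lowercasing and whitespace normalization is an assumption here, and the function names `normalize` and `f_measure` and the sample phrases are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (assumed matching rule, not from the paper)."""
    return " ".join(phrase.lower().split())


def f_measure(extracted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) using exact match on normalized phrases."""
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    matched = len(extracted_set & gold_set)

    # Guard against empty sets so the ratios are always defined.
    precision = matched / len(extracted_set) if extracted_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative phrases only; they are not taken from either collection.
extracted = ["vapor pressure", "Enthalpy of  fusion", "heat capacity", "density"]
gold = ["enthalpy of fusion", "heat capacity", "phase equilibria"]

precision, recall, f1 = f_measure(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the four extracted phrases match two of the three reference phrases, so precision is 1/2, recall is 2/3, and F1 is 4/7. A collection-level score would aggregate over all articles; whether that aggregation is micro- or macro-averaged is not specified in the abstract.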
About the journal:
The Journal of Research of the National Institute of Standards and Technology is the flagship publication of the National Institute of Standards and Technology. It has been published under various titles and forms since 1904, with its roots as Scientific Papers issued as the Bulletin of the Bureau of Standards.
In 1928, the Scientific Papers were combined with Technologic Papers, which reported results of investigations of material and methods of testing. This new publication was titled the Bureau of Standards Journal of Research.
The Journal of Research of NIST reports NIST research and development in metrology and related fields of physical science, engineering, applied mathematics, statistics, biotechnology, and information technology.