A Pattern and POS Auto-Learning Method for Terminology Extraction from Scientific Text

Wei Shao, Bolin Hua, Linqi Song
Data and Information Management, Vol. 5, No. 3, pp. 329-335 (published 2021-07-01)
DOI: 10.2478/dim-2021-0005
https://www.sciencedirect.com/science/article/pii/S2543925122000031
Citations: 6

Abstract

Large numbers of new scientific documents are published on various platforms every day, so it is increasingly important to discover new words and meanings in them quickly and efficiently. However, most related work relies on labeled data and cannot efficiently handle new, unlabeled documents. To address this, we introduce an unsupervised method based on sentence patterns and part-of-speech (POS) sequences. The method needs only a few initial learnable patterns to obtain the initial terminology tokens and their POS sequences. During this process, new patterns are constructed. These patterns match more sentences and thereby reveal further POS sequences of terms. Finally, the learned POS sequences and sentence patterns are used to extract terms from new scientific text. Experiments on paper abstracts from Web of Knowledge show that the method is practical and performs well on our test data.
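
The abstract describes a bootstrapping loop. Seed patterns yield terms and their POS sequences. The contexts of those terms yield new patterns. Finally, both the patterns and the POS sequences are applied to new text. That loop can be sketched in Python as below.

This is a hypothetical illustration, not the authors' implementation. It makes the following assumptions:

- Sentences are already POS-tagged with Penn Treebank tags.
- A pattern is a pair of left and right context word tuples.
- New patterns are learned from a one-word window around each known term.
- Candidate terms are capped at four tokens.
- The loop runs a fixed number of rounds.

The function names `tag`, `match_pattern`, `contexts`, `bootstrap` and `extract`, the seed pattern, and the toy sentences are all invented for the example. The sketch requires Python 3.9+ for the built-in generic type hints.

```python
"""Minimal pattern/POS bootstrapping sketch for term extraction (hypothetical, illustrative only)."""

Tagged = list[tuple[str, str]]                      # one sentence: [(token, POS tag), ...]
Pattern = tuple[tuple[str, ...], tuple[str, ...]]   # (left context words, right context words)


def tag(s: str) -> Tagged:
    """Parse a pre-tagged 'word/TAG word/TAG ...' string into (word, TAG) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in s.split()]


def match_pattern(sent: Tagged, pat: Pattern, max_len: int = 4) -> list[Tagged]:
    """Return each span (at most max_len tokens) enclosed by the pattern's left and right contexts."""
    words = [w.lower() for w, _ in sent]
    left, right = pat
    spans = []
    for i in range(len(left), len(words)):
        if tuple(words[i - len(left):i]) != left:
            continue
        # Take the shortest span that the right context closes.
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            if tuple(words[j:j + len(right)]) == right:
                spans.append(sent[i:j])
                break
    return spans


def contexts(sent: Tagged, term: tuple[str, ...], width: int = 1) -> set[Pattern]:
    """Turn the words around each occurrence of a known term into new (left, right) patterns."""
    words = tuple(w.lower() for w, _ in sent)
    n = len(term)
    found = set()
    for i in range(width, len(words) - n - width + 1):
        if words[i:i + n] == term:
            found.add((words[i - width:i], words[i + n:i + n + width]))
    return found


def bootstrap(corpus: list[Tagged], seeds: set[Pattern], rounds: int = 2):
    """Alternate between (1) applying patterns to harvest terms and POS sequences
    and (2) turning the contexts of harvested terms into new patterns."""
    patterns = set(seeds)
    terms: set[tuple[str, ...]] = set()
    pos_seqs: set[tuple[str, ...]] = set()
    for _ in range(rounds):
        # Step 1: current patterns -> candidate terms and their POS sequences.
        for sent in corpus:
            for pat in patterns:
                for span in match_pattern(sent, pat):
                    terms.add(tuple(w.lower() for w, _ in span))
                    pos_seqs.add(tuple(t for _, t in span))
        # Step 2: contexts of known terms become patterns for the next round.
        for sent in corpus:
            for term in terms:
                patterns |= contexts(sent, term)
    return patterns, terms, pos_seqs


def extract(sents: list[Tagged], patterns: set[Pattern], pos_seqs: set[tuple[str, ...]]) -> set[str]:
    """Keep pattern-matched spans only if their POS sequence was learned during bootstrapping."""
    found = set()
    for sent in sents:
        for pat in patterns:
            for span in match_pattern(sent, pat):
                if tuple(t for _, t in span) in pos_seqs:
                    found.add(" ".join(w for w, _ in span))
    return found


if __name__ == "__main__":
    corpus = [
        tag("We/PRP propose/VBP conditional/JJ random/JJ fields/NNS for/IN tagging/NN ./."),
        tag("Results/NNS show/VBP that/IN conditional/JJ random/JJ fields/NNS outperform/VBP baselines/NNS ./."),
        tag("We/PRP show/VBP that/IN support/NN vector/NN machines/NNS outperform/VBP rules/NNS ./."),
    ]
    seeds = {(("we", "propose"), ("for",))}  # one hypothetical seed pattern: "we propose X for"
    patterns, terms, pos_seqs = bootstrap(corpus, seeds)
    new_text = [
        tag("We/PRP propose/VBP recurrent/JJ neural/JJ networks/NNS for/IN parsing/NN ./."),
        tag("We/PRP propose/VBP to/TO run/VB experiments/NNS for/IN efficiency/NN ./."),
    ]
    print(sorted(terms))  # [('conditional', 'random', 'fields'), ('support', 'vector', 'machines')]
    print(extract(new_text, patterns, pos_seqs))  # {'recurrent neural networks'}
```

In the toy run, the seed finds "conditional random fields". Its context "that ... outperform" becomes a new pattern, which then finds "support vector machines". On the new text, the POS filter does visible work. The span "to run experiments" fills the same "we propose ... for" slot as "recurrent neural networks", but it is rejected because its POS sequence (TO VB NNS) was never learned. A practical system would presumably also score or prune learned patterns to limit drift, and the sketch omits that step.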

Source journal: Data and Information Management (subject areas: Management Information Systems; Library and Information Sciences)