Semi-automatic construction of thyroid cancer intervention corpus from biomedical abstracts

Wutthipong Kongburan, P. Padungweang, Worarat Krathu, Jonathan H. Chan
Published in: 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), 2016-04-11
DOI: 10.1109/ICACI.2016.7449819 (https://doi.org/10.1109/ICACI.2016.7449819)
Citations: 6

Abstract

Thyroid cancer is a common endocrine tumor whose incidence is steadily increasing worldwide. The latest discoveries about the disease and its treatment are mostly disseminated through biomedical publications such as those indexed in PubMed. Unfortunately, this information is distributed as unstructured text, with over two thousand articles added annually. Text mining plays an important role in information extraction, since it can uncover hidden value in a vast amount of text in reasonable time. A preliminary task of text mining is typically Named Entity Recognition (NER). NER in turn requires a gold standard corpus, since its capability depends on a trustworthy corpus. However, constructing a gold standard corpus is a laborious and time-consuming process. To obtain a reasonably practical corpus in a limited time, this paper proposes a semi-automatic approach to constructing a thyroid cancer intervention corpus. The experimental results demonstrate that the proposed method can construct such a corpus reasonably well in terms of both performance and overfitting avoidance.
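The abstract does not describe the mechanics of the semi-automatic approach, so the following is only an illustrative sketch of one common way to speed up corpus construction: dictionary-based pre-annotation, whose output a human annotator would then review and correct. It is not the authors' method. The seed lexicon `SEED_TERMS`, the label `INTERVENTION`, and the function names are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical seed lexicon of intervention terms (an assumption, not from the paper).
SEED_TERMS = ["radioactive iodine", "total thyroidectomy", "levothyroxine"]


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def pre_annotate(text, terms=SEED_TERMS, label="INTERVENTION"):
    """Return (token, BIO tag) pairs using longest-match dictionary lookup."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    # Tokenize each term, drop empty ones, and try longer terms first.
    term_tokens = sorted(
        filter(None, (tokenize(t.lower()) for t in terms)),
        key=len,
        reverse=True,
    )
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for term in term_tokens:
            n = len(term)
            if lowered[i:i + n] == term:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No term starts at this position; move to the next token.
            i += 1
    return list(zip(tokens, tags))


sentence = "Patients received radioactive iodine after total thyroidectomy."
tagged = pre_annotate(sentence)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

The token-per-line BIO output is a format many NER toolkits accept for training, which is why it is used here; a real pipeline would draw its lexicon from a curated vocabulary and would still need manual review before the result could be treated as a gold standard.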