FRE @ BC8 SympTEMIST 赛道分析:命名实体识别。

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC ACS Applied Electronic Materials Pub Date : 2024-09-16 DOI:10.1093/database/baae101
Ander Martinez, Nuria García-Santa
{"title":"FRE @ BC8 SympTEMIST 赛道分析:命名实体识别。","authors":"Ander Martinez, Nuria García-Santa","doi":"10.1093/database/baae101","DOIUrl":null,"url":null,"abstract":"<p><p>This paper is a more in-depth analysis of the approaches used in our submission (Martínez A, García-Santa N. (2023) FRE @ BC8 SympTEMIST track: Named Entity Recognition Zenodo.) to the 'SympTEMIST' Named Entity Recognition (NER) shared subtask at 'BioCreative 2023'. We participated on the challenge submitting two systems based on a RoBERTa architecture LLM trained on Spanish-language clinical data available at 'HuggingFace' model repository. Before choosing the systems that would be submitted, we tried different combinations of the techniques described here: Conditional Random Fields and Byte-Pair Encoding dropout. In the second system we also included Sub-Subword feature based embeddings (SSW). The test set used in the challenge has now been released (López SL, Sánchez LG, Farré E et al. (2024) SympTEMIST Corpus: Gold Standard annotations for clinical symptoms, signs and findings information extraction. Zenodo), allowing us to analyze more in depth our methods, as well as measuring the impact of introducing data from CARMEN-I (Lima-López S, Farré-Maduell E, Krallinger M. (2023) CARMEN-I: Clinical Entities Annotation Guidelines in Spanish. Zenodo) corpus. Our experiments show the moderate effect of using the Sub-Subword feature based embeddings and the impact of including the symptom NER data from the CARMEN-I dataset. Database URL: https://physionet.org/content/carmen-i/1.0/.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11403810/pdf/","citationCount":"0","resultStr":"{\"title\":\"An analysis of FRE @ BC8 SympTEMIST track: named entity recognition.\",\"authors\":\"Ander Martinez, Nuria García-Santa\",\"doi\":\"10.1093/database/baae101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>This paper is a more in-depth analysis of the approaches used in our submission (Martínez A, García-Santa N. (2023) FRE @ BC8 SympTEMIST track: Named Entity Recognition Zenodo.) to the 'SympTEMIST' Named Entity Recognition (NER) shared subtask at 'BioCreative 2023'. We participated on the challenge submitting two systems based on a RoBERTa architecture LLM trained on Spanish-language clinical data available at 'HuggingFace' model repository. Before choosing the systems that would be submitted, we tried different combinations of the techniques described here: Conditional Random Fields and Byte-Pair Encoding dropout. In the second system we also included Sub-Subword feature based embeddings (SSW). The test set used in the challenge has now been released (López SL, Sánchez LG, Farré E et al. (2024) SympTEMIST Corpus: Gold Standard annotations for clinical symptoms, signs and findings information extraction. Zenodo), allowing us to analyze more in depth our methods, as well as measuring the impact of introducing data from CARMEN-I (Lima-López S, Farré-Maduell E, Krallinger M. (2023) CARMEN-I: Clinical Entities Annotation Guidelines in Spanish. Zenodo) corpus. Our experiments show the moderate effect of using the Sub-Subword feature based embeddings and the impact of including the symptom NER data from the CARMEN-I dataset. Database URL: https://physionet.org/content/carmen-i/1.0/.</p>\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11403810/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/database/baae101\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baae101","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

摘要

本文是对我们提交的论文(Martínez A, García-Santa N. (2023) FRE @ BC8 SympTEMIST track:命名实体识别 Zenodo.)提交给 "BioCreative 2023 "的 "SympTEMIST "命名实体识别(NER)共享子任务。我们参加了这项挑战,提交了两个基于 RoBERTa 架构 LLM 的系统,该 LLM 在 "HuggingFace "模型库中的西班牙语临床数据上进行了训练。在选择提交的系统之前,我们尝试了本文所述技术的不同组合:条件随机场和字节对编码剔除。在第二个系统中,我们还加入了基于子分词特征的嵌入(SSW)。挑战赛中使用的测试集现已发布(López SL, Sánchez LG, Farré E et al. (2024) SympTEMIST Corpus:用于临床症状、体征和检查结果信息提取的黄金标准注释。Zenodo),让我们能够更深入地分析我们的方法,并衡量引入 CARMEN-I (Lima-López S, Farré-Maduell E, Krallinger M. (2023) CARMEN-I: 西班牙语临床实体注释指南。Zenodo)语料库。我们的实验表明,使用基于 Sub-Subword 特征的嵌入效果适中,而纳入 CARMEN-I 数据集的症状 NER 数据则会产生影响。数据库网址:https://physionet.org/content/carmen-i/1.0/.
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
An analysis of FRE @ BC8 SympTEMIST track: named entity recognition.

This paper is a more in-depth analysis of the approaches used in our submission (Martínez A, García-Santa N. (2023) FRE @ BC8 SympTEMIST track: Named Entity Recognition Zenodo.) to the 'SympTEMIST' Named Entity Recognition (NER) shared subtask at 'BioCreative 2023'. We participated on the challenge submitting two systems based on a RoBERTa architecture LLM trained on Spanish-language clinical data available at 'HuggingFace' model repository. Before choosing the systems that would be submitted, we tried different combinations of the techniques described here: Conditional Random Fields and Byte-Pair Encoding dropout. In the second system we also included Sub-Subword feature based embeddings (SSW). The test set used in the challenge has now been released (López SL, Sánchez LG, Farré E et al. (2024) SympTEMIST Corpus: Gold Standard annotations for clinical symptoms, signs and findings information extraction. Zenodo), allowing us to analyze more in depth our methods, as well as measuring the impact of introducing data from CARMEN-I (Lima-López S, Farré-Maduell E, Krallinger M. (2023) CARMEN-I: Clinical Entities Annotation Guidelines in Spanish. Zenodo) corpus. Our experiments show the moderate effect of using the Sub-Subword feature based embeddings and the impact of including the symptom NER data from the CARMEN-I dataset. Database URL: https://physionet.org/content/carmen-i/1.0/.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
期刊最新文献
Hyperbaric oxygen treatment promotes tendon-bone interface healing in a rabbit model of rotator cuff tears. Oxygen-ozone therapy for myocardial ischemic stroke and cardiovascular disorders. Comparative study on the anti-inflammatory and protective effects of different oxygen therapy regimens on lipopolysaccharide-induced acute lung injury in mice. Heme oxygenase/carbon monoxide system and development of the heart. Hyperbaric oxygen for moderate-to-severe traumatic brain injury: outcomes 5-8 years after injury.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1