ABB-BERT: A BERT model for disambiguating abbreviations and contractions

Icon (Q3, Arts and Humanities) · Published: 2022-07-08 · DOI: 10.48550/arXiv.2207.04008
Prateek Kacker, Andi Cupallari, Aswin Giridhar Subramanian, Nimit Jain
Citations: 0

Abstract

Abbreviations and contractions are common in text across many domains. For example, doctors’ notes contain many contractions, which can be personalized to each writer’s habits. Existing spelling-correction models are poorly suited to expanding them, because contractions drop many characters from the original words. In this work, we propose ABB-BERT, a BERT-based model that handles ambiguous language containing abbreviations and contractions. ABB-BERT can rank candidate expansions from among thousands of options and is designed for scale. It is trained on Wikipedia text, and the algorithm allows it to be fine-tuned with little compute to improve performance for a specific domain or person. We are publicly releasing the training dataset of abbreviations and contractions derived from Wikipedia.
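The abstract describes ABB-BERT as ranking candidate expansions for an ambiguous abbreviation from among thousands of options. The sketch below illustrates only the ranking interface, not the paper's actual method: `overlap_score` is a hypothetical stand-in for the BERT-derived score, and all names and the example data are assumptions for illustration.

```python
def rank_expansions(context_tokens, candidates, score_fn):
    """Rank candidate expansions by score, highest first.

    In ABB-BERT the score would come from a BERT-based model;
    here score_fn is pluggable so any scorer can be substituted.
    """
    scored = [(cand, score_fn(context_tokens, cand)) for cand in candidates]
    return [cand for cand, _ in sorted(scored, key=lambda x: x[1], reverse=True)]


def overlap_score(context_tokens, candidate):
    # Toy stand-in scorer (NOT the paper's model): counts context
    # words that also appear in the candidate expansion.
    cand_words = set(candidate.lower().split())
    return sum(1 for tok in context_tokens if tok.lower() in cand_words)


# Hypothetical clinical-note example: disambiguating "BP".
context = "the patient was admitted with elevated blood pressure".split()
candidates = ["blood pressure", "british petroleum", "base pair"]
ranking = rank_expansions(context, candidates, overlap_score)
print(ranking[0])  # the clinical context favors "blood pressure"
```

In practice the candidate list could hold thousands of entries, and the scorer would be the fine-tuned model; the ranking step itself stays the same.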