Discriminative Optimization of String Similarity and Its Application to Biomedical Abbreviation Clustering

Atsuko Yamaguchi, Yasunori Yamamoto, Jin-Dong Kim, T. Takagi, A. Yonezawa
Published in: 2011 10th International Conference on Machine Learning and Applications and Workshops, 2011-12-18
DOI: 10.1109/ICMLA.2011.58 (https://doi.org/10.1109/ICMLA.2011.58)

Abstract

Many string similarity measures have been developed to deal with the variety of expressions in natural language texts. Given the abundance of such measures, the choice of measure and its parameters should be made so as to maximize performance on a given task. During a preliminary experiment to find the best measure and parameters for clustering terms, with the aim of improving our abbreviation dictionary for the life sciences, we found that the character sequences of chemical names have different characteristics from those of other terms. Based on this observation, we experimented with four string similarity measures to test the hypothesis that "chemical names have a different morphology, and thus their similarity should be computed differently from that of other terms." The experimental results show that edit distance is the best measure for chemical names, and that applying string similarity methods discriminatively to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
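The idea of applying a different similarity measure depending on term type can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names only edit distance, so the character-bigram Dice coefficient used here for non-chemical terms, the normalization of edit distance by the longer string length, and the `looks_chemical` heuristic are all assumptions introduced for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (an assumed stand-in measure)."""
    grams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    grams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    overlap = sum((grams_a & grams_b).values())
    return 2.0 * overlap / total


def looks_chemical(term: str) -> bool:
    """Hypothetical heuristic: digits or locant-style punctuation in the term."""
    return any(ch.isdigit() or ch in "()[],'" for ch in term)


def similarity(a: str, b: str) -> float:
    """Use edit-distance similarity for chemical pairs, Dice otherwise."""
    a, b = a.lower(), b.lower()
    if looks_chemical(a) and looks_chemical(b):
        return edit_similarity(a, b)
    return dice_similarity(a, b)


if __name__ == "__main__":
    print(similarity("1,2-dichloroethane", "1,1-dichloroethane"))
    print(similarity("tumor necrosis factor", "tumour necrosis factor"))
```

In a real pipeline the chemical/non-chemical decision would presumably come from a dictionary or a trained classifier rather than a character heuristic, and the resulting pairwise scores would feed whatever clustering procedure is used for the abbreviation dictionary.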