Using edit distance in point-pattern matching

V. Makinen
Published in: Proceedings Eighth Symposium on String Processing and Information Retrieval
DOI: 10.1109/SPIRE.2001.989751 (https://doi.org/10.1109/SPIRE.2001.989751)
Citations: 1

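Abstract

Edit distance is a powerful measure of similarity in string matching: it is the minimum number of insertions, deletions, and substitutions needed to convert one string into another. This measure is often contrasted with time warping in speech processing, which measures how close two trajectories are by allowing compression and expansion operations on the time scale. Time warping can be easily generalized to measure the similarity between 1D point-patterns (ascending lists of real values), as the difference between the ith and (i-1)th points in a point-pattern can be considered as the value of a trajectory at time i. However, we show that edit distance is a more natural choice, and derive a measure by calculating the minimum amount of space that must be inserted and deleted between points to convert one point-pattern into another. We show that this measure defines a metric. We also define a substitution operation such that the distance calculation automatically separates the points into matching and mismatching points. The algorithms are based on dynamic programming. The main motivation for these methods is two- and higher-dimensional point-pattern matching, and therefore we generalize these methods to the 2D case, and show that this generalization leads to an NP-complete problem. There are also applications for the 1D case; we briefly discuss the matching of tree ring sequences in dendrochronology.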
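The two baseline measures that the abstract contrasts can be sketched with textbook dynamic programming. The block below is only an illustration under stated assumptions, not the measure derived in the paper: `edit_distance` is the standard unit-cost string edit distance, and `time_warping` is a basic warping cost applied to the gap sequence of a 1D point-pattern. The function names, the helper `gaps`, and the local cost `|x_i - y_j|` are all hypothetical choices made for this example.

```python
from __future__ import annotations

from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    # prev[j] = distance between the empty prefix of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def gaps(points: Sequence[float]) -> list[float]:
    """Differences between consecutive points of an ascending 1D point-pattern."""
    return [q - p for p, q in zip(points, points[1:])]


def time_warping(x: Sequence[float], y: Sequence[float]) -> float:
    """Basic time-warping cost with |x_i - y_j| as the assumed local cost."""
    inf = float("inf")
    # prev[j] = best cost of aligning the processed prefix of x with y[:j]
    prev = [0.0] + [inf] * len(y)
    for xi in x:
        curr = [inf]  # a non-empty prefix of x cannot align with an empty y
        for j, yj in enumerate(y, start=1):
            best_previous = min(prev[j], curr[j - 1], prev[j - 1])
            curr.append(abs(xi - yj) + best_previous)
        prev = curr
    return prev[-1]
```

As a usage example, the point-patterns `[0, 1, 3, 6]` and `[0, 1, 3, 5, 8]` have gap sequences `[1, 2, 3]` and `[1, 2, 2, 3]`. Under this sketch, `time_warping` may reuse the gap `2` for free, so the two gap sequences align at cost 0 even though the second pattern has an extra point.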