An N-ary Tree-based Model for Similarity Evaluation on Mathematical Formulae

Yifan Dai, Liangyu Chen, Zihan Zhang
Published in: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2578-2584
Publication date: 2020-10-11
DOI: 10.1109/SMC42975.2020.9283495 (https://doi.org/10.1109/SMC42975.2020.9283495)
Citations: 5

Abstract

Accurate and efficient measures of similarity between mathematical formulae play an important role in mathematical information retrieval. Most previous studies have focused on representing formulae in different forms to capture their features and on combining these representations with traditional structure-matching algorithms. This paper presents a new unsupervised model, the N-ary Tree-based Formula Embedding Model (NTFEM), for mathematical similarity evaluation. Using an n-ary tree to represent a formula, we convert the formula into a linear sequence that can be treated as an input sentence, and then embed the formula with a word embedding model. Based on the characteristics of mathematical formulae, a weighting function is also used to obtain the final weighted-average embedding vector. In experiments on the NTCIR-12 Wikipedia Formula Browsing Task, our model outperforms previous formula search engines on the Bpref metric. In addition, compared with traditional tree-based models, NTFEM not only improves retrieval effectiveness but also greatly reduces training time, improving training efficiency.
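The pipeline described in the abstract (n-ary tree, linear sequence, token embeddings, weighted average) can be illustrated with a minimal sketch. This is not the authors' implementation: the pre-order traversal, the `depth_weight` function, and the hash-based `toy_embedding` (a stand-in for a trained word embedding model) are all assumptions made for illustration, as are the names `Node`, `linearize`, and `embed_formula`.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of an n-ary formula tree: a symbol plus any number of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def linearize(root: Node) -> List[Tuple[str, int]]:
    """Pre-order traversal returning (token, depth) pairs.

    Pre-order is an assumption; the abstract only says the tree is
    converted into a linear sequence.
    """
    out: List[Tuple[str, int]] = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        out.append((node.label, depth))
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return out


def toy_embedding(token: str, dim: int = 8) -> List[float]:
    """Deterministic stand-in for a trained word-embedding lookup (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map the first `dim` bytes (0..255) to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:dim]]


def depth_weight(depth: int) -> float:
    """Hypothetical weighting function: shallower tokens count more."""
    return 1.0 / (1.0 + depth)


def embed_formula(root: Node, dim: int = 8) -> List[float]:
    """Weighted average of the token vectors of the linearized tree."""
    total = [0.0] * dim
    weight_sum = 0.0
    for token, depth in linearize(root):
        w = depth_weight(depth)
        vec = toy_embedding(token, dim)
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# x^2 + y and x^2 + z as n-ary trees: "+" and "^" each have two children here.
f1 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("y")])
f2 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("z")])

print([tok for tok, _ in linearize(f1)])  # ['+', '^', 'x', '2', 'y']
print(cosine(embed_formula(f1), embed_formula(f2)))
```

In a real system the `toy_embedding` lookup would be replaced by vectors learned from a corpus of linearized formulae, and the weighting function would be chosen to reflect the characteristics of formulae that the paper refers to; neither is specified in the abstract.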