Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity

Ankush Chopra, S. Agrawal, Sohom Ghosh
Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence. Published 2020-12-24. DOI: 10.1145/3446132.3446403 (https://doi.org/10.1145/3446132.3446403)
Citation count: 3

Abstract

Search is one of the most common platforms used to seek information, yet users are often overloaded with results when they use it to resolve their queries. Direct answers to queries are now increasingly provided as part of the search experience, and the question-answer (QA) retrieval process plays a significant role in enriching it. Most off-the-shelf Semantic Textual Similarity models work well for well-formed search queries, but their performance degrades in domain-specific settings where incomplete or grammatically ill-formed queries are prevalent. In this paper, we discuss a framework that calculates similarities between a given input query and a set of predefined questions in order to retrieve the question that matches it best. We have applied it to the financial domain, but the framework generalizes to any domain-specific search engine. We use a Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier that generates un-normalized and normalized similarity scores for a given pair of questions. For each question pair, we also calculate three other similarity scores: the cosine similarity between their average word2vec embeddings [15], the cosine similarity between their sentence embeddings [7] generated using RoBERTa [17], and a customized fuzzy-match score. Finally, we develop a meta-classifier using Support Vector Machines [19] that combines these five scores to detect whether a given pair of questions is similar. We benchmark our model against existing state-of-the-art (SOTA) models on the Quora Question Pairs (QQP) dataset as well as on a dataset specific to the financial domain. On the financial-domain data, it outperforms several existing SOTA models on F1 score while also achieving decent accuracy.
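As a rough illustration of two of the five scores described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes toy hand-written word vectors in place of word2vec, and it uses Python's difflib.SequenceMatcher as a stand-in for the customized fuzzy-match score, which the abstract does not specify. The Siamese LSTM scores, the RoBERTa sentence-embedding similarity, and the SVM meta-classifier are omitted; the plain averaging in the hypothetical helper best_match is only a placeholder for that meta-classifier.

```python
import math
from difflib import SequenceMatcher

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "account": [0.1, 0.9, 0.1],
    "savings": [0.2, 0.8, 0.3],
    "how": [0.0, 0.1, 0.9],
    "to": [0.1, 0.0, 0.8],
}


def average_embedding(text, vectors):
    """Average the vectors of the known words in text (zero vector if none are known)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def fuzzy_score(a, b):
    """Character-level similarity ratio in [0, 1]; an assumed stand-in, not the paper's score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(query, question, vectors=TOY_VECTORS):
    """Return [average-embedding cosine similarity, fuzzy score] for one query/question pair."""
    embedding_sim = cosine_similarity(
        average_embedding(query, vectors), average_embedding(question, vectors)
    )
    return [embedding_sim, fuzzy_score(query, question)]


def best_match(query, questions, vectors=TOY_VECTORS):
    """Pick the predefined question with the highest mean score (placeholder for the meta-classifier)."""
    return max(questions, key=lambda q: sum(pair_features(query, q, vectors)) / 2)


if __name__ == "__main__":
    candidates = ["how to open account", "zzz qqq"]
    print(best_match("open savings account", candidates))
```

In a fuller pipeline, the list returned by pair_features would be extended with the remaining three scores and passed to a trained classifier rather than averaged.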