Learning to Select the Relevant History Turns in Conversational Question Answering
Pub Date : 2023-08-04 | DOI: 10.48550/arXiv.2308.02294 | Venue: WISE
Munazza Zaib, Wei Emma Zhang, Quan Z. Sheng, S. Sagar, A. Mahmood, Yang Zhang
The increasing demand for web-based digital assistants has given rise to rapidly growing interest in conversational question answering (ConvQA) within the Information Retrieval (IR) community. However, one of the critical aspects of ConvQA is the effective selection of conversational history turns to answer the question at hand. The dependency between relevant history selection and correct answer prediction is an intriguing but under-explored area. Relevant context can better guide the system as to where exactly in the passage to look for an answer. Irrelevant context, on the other hand, brings noise to the system and thereby degrades the model's performance. In this paper, we propose a framework, DHS-ConvQA (Dynamic History Selection in Conversational Question Answering), that first generates the context and question entities for all the history turns, which are then pruned on the basis of their similarity to the question at hand. We also propose an attention-based mechanism to re-rank the pruned terms according to calculated weights that reflect how useful they are in answering the question. Finally, we further aid the model by highlighting the terms in the re-ranked conversational history using a binary classification task, keeping the useful terms (predicted as 1) and ignoring the irrelevant terms (predicted as 0). We demonstrate the efficacy of our proposed framework with extensive experimental results on CANARD and QuAC, the two most widely used datasets in ConvQA. We also show that selecting relevant turns works better than rewriting the original question. Furthermore, we investigate how adding irrelevant history turns negatively impacts the model's performance and discuss the research challenges that demand more attention from the IR community.
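The pruning step can be illustrated with a minimal Python sketch. This is not the DHS-ConvQA implementation. As an assumption, it replaces the paper's generated context and question entities with lowercase bag-of-words tokens. It then scores each history turn by cosine similarity to the current question. `prune_history`, `threshold`, and `top_k` are hypothetical names and defaults. The attention-based re-ranking and the binary term classification described above are not modeled.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; an assumed stand-in for the paper's entity extraction."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_history(question: str, history: list[str],
                  threshold: float = 0.1, top_k: int = 2) -> list[str]:
    """Keep at most top_k history turns whose similarity to the question is at
    least `threshold`, ordered from most to least similar."""
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(turn)), turn) for turn in history]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [turn for _, turn in kept[:top_k]]


if __name__ == "__main__":
    history = [
        "Who founded the Beatles?",
        "John Lennon founded the Beatles in Liverpool.",
        "What is the weather like today?",
    ]
    print(prune_history("When did Lennon leave the Beatles?", history))
    # -> ['John Lennon founded the Beatles in Liverpool.', 'Who founded the Beatles?']
```

The unrelated weather turn still passes the low threshold, but it is cut once `top_k` turns have been kept. A learned re-ranker would replace the fixed similarity score here.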
FASTAGEDS: Fast Approximate Graph Entity Dependency Discovery
Pub Date : 2023-04-05 | DOI: 10.48550/arXiv.2304.02323 | Venue: WISE
Guangtong Zhou, Selasi Kwashie, Yidi Zhang, Michael Bewong, V. M. Nofong, Debo Cheng, K. He, Zaiwen Feng
This paper studies the discovery of approximate rules in property graphs. We propose a semantically meaningful measure of error for mining graph entity dependencies (GEDs) that almost hold, in order to tolerate the errors and inconsistencies that exist in real-world graphs. We present a new characterisation of GED satisfaction and devise a depth-first search strategy to traverse the search space of candidate rules efficiently. Further, we perform experiments on three real-world graphs to demonstrate the feasibility and scalability of our solution, FASTAGEDS.
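The following sketch makes the idea of a rule that "almost holds" concrete. It scores a dependency X -> Y over a list of pattern matches using the classic g3 measure: the smallest fraction of matches that must be removed for the rule to hold exactly. It then searches for left-hand sides depth-first.

This is a swapped-in, simplified stand-in. The paper defines its own semantically meaningful error measure for GEDs over graph patterns, and this code does not reproduce it. Matches are assumed to be flattened into attribute dictionaries, and `g3_error` and `discover` are hypothetical names.

```python
from collections import Counter, defaultdict

# Each match of a graph pattern is flattened into attribute -> value pairs (an assumption).
Match = dict[str, str]


def g3_error(matches: list[Match], lhs: tuple[str, ...], rhs: str) -> float:
    """Smallest fraction of matches to delete so that lhs -> rhs holds exactly (g3)."""
    if not matches:
        return 0.0
    groups: dict[tuple, Counter] = defaultdict(Counter)
    for m in matches:
        groups[tuple(m[a] for a in lhs)][m[rhs]] += 1
    # In each lhs group, keep the matches that agree on the most common rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1.0 - kept / len(matches)


def discover(matches: list[Match], attributes: list[str], rhs: str,
             epsilon: float) -> list[tuple[str, ...]]:
    """Depth-first search for minimal lhs sets whose g3 error is at most epsilon."""
    candidates = [a for a in attributes if a != rhs]
    found: list[tuple[str, ...]] = []

    def dfs(start: int, lhs: tuple[str, ...]) -> None:
        for i in range(start, len(candidates)):
            new_lhs = lhs + (candidates[i],)
            if any(set(f) <= set(new_lhs) for f in found):
                continue  # a subset already qualifies, so this candidate is not minimal
            if g3_error(matches, new_lhs, rhs) <= epsilon:
                found.append(new_lhs)  # approximately holds: record it, do not extend
            else:
                dfs(i + 1, new_lhs)  # adding attributes cannot raise the g3 error

    dfs(0, ())
    # DFS order can record a superset before its subset, so filter once more.
    return [f for f in found if not any(set(g) < set(f) for g in found)]
```

Consider four matches in which one of the three zip-2000 records spells the city "Sidney". The rule zip -> city then has a g3 error of 0.25. It is therefore reported for epsilon = 0.25 but not for epsilon = 0.1.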
Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation
Pub Date : 2022-08-24 | DOI: 10.48550/arXiv.2208.11628 | Venue: WISE
Z. Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao
Deep neural networks can capture intricate interaction history information between queries and documents because of their many complex nonlinear units, which allows them to provide accurate search recommendations. However, in real-world settings service providers frequently face more complex obstacles, such as deployment cost constraints and fairness requirements. Knowledge distillation, which transfers the knowledge of a well-trained complex model (teacher) to a simple model (student), has been proposed to alleviate the former concern. However, the best current distillation methods focus only on making the student model imitate the predictions of the teacher model. To better facilitate the application of deep models, we propose a fair information retrieval framework based on knowledge distillation. This framework can improve the exposure-based fairness of models while considerably decreasing model size. Our extensive experiments on three large datasets show that the proposed framework can reduce the model size to as little as 1% of its original size while maintaining its black-box state. It also improves fairness performance by 15% to 46% while keeping a high level of recommendation effectiveness.
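The abstract contrasts the framework with the usual "imitate the teacher" objective. That objective can be written as the standard soft-target distillation loss of Hinton et al.: the KL divergence between temperature-softened teacher and student score distributions, scaled by T². The sketch below pairs it with a common position-based exposure measure, 1/log2(1 + rank), averaged per group.

Both are generic formulations assumed for illustration. The paper's fairness-aware framework and its exact exposure definition are not reproduced, and all function names are hypothetical.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened scores, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def group_exposure(ranking: list[str], groups: dict[str, str]) -> dict[str, float]:
    """Mean position-based exposure 1 / log2(1 + rank) per group, with rank starting at 1."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for rank, doc in enumerate(ranking, start=1):
        g = groups[doc]
        totals[g] = totals.get(g, 0.0) + 1.0 / math.log2(1 + rank)
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}
```

The loss is zero when the student reproduces the teacher's scores. A gap between the per-group exposures is one simple signal of unequal treatment that a fairness-aware objective could penalize.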
A Multi-Threading Algorithm for Constrained Path Optimization Problem on Road Networks
Pub Date : 2022-08-03 | DOI: 10.48550/arXiv.2208.02296 | Venue: WISE
Kousik Kumar Dutta, Ankita Dewan, Venkata M. V. Gunturi
The constrained path optimization (CPO) problem takes the following input: (a) a road network represented as a directed graph, where each edge is associated with a “cost” and a “score” value; (b) a source-destination pair; and (c) a budget value, which denotes the maximum permissible cost of the solution. Given this input, the goal is to determine a path from source to destination that maximizes the “score” while keeping the total “cost” of the path within the given budget. The CPO problem has applications in urban navigation. However, it is computationally challenging, as it can be reduced to an instance of the arc orienteering problem, which is known to be NP-hard. The current state-of-the-art algorithms for this problem are essentially serial. They therefore cannot take full advantage of increasingly available multi-core systems (i.e., achieve good load balance) when solving a CPO query. Our proposed parallel algorithm, with its intelligent task-assignment scheme, achieves both superior solution quality and very low execution times through good load balancing. Moreover, our approach demonstrates an almost linear speed-up as the number of cores increases.
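A serial, exhaustive baseline helps pin down the CPO problem statement. The sketch below enumerates simple source-to-destination paths by depth-first search. It prunes any partial path whose cost already exceeds the budget and keeps the highest-scoring feasible path.

Its running time is exponential in the worst case, which is consistent with the NP-hardness noted above. It is not the paper's parallel algorithm or task-assignment scheme. The adjacency-list format and the name `best_constrained_path` are assumptions.

```python
from typing import Optional

# Adjacency list: node -> [(neighbor, edge_cost, edge_score), ...]  (assumed format)
Graph = dict[str, list[tuple[str, float, float]]]


def best_constrained_path(graph: Graph, source: str, target: str,
                          budget: float) -> Optional[tuple[float, float, list[str]]]:
    """Return (score, cost, path) of the highest-score simple path from source to
    target with total cost <= budget, or None if no such path exists."""
    best: Optional[tuple[float, float, list[str]]] = None

    def dfs(node: str, path: list[str], cost: float, score: float) -> None:
        nonlocal best
        if node == target:
            if best is None or score > best[0]:
                best = (score, cost, list(path))
            return
        for nxt, edge_cost, edge_score in graph.get(node, []):
            if nxt in path or cost + edge_cost > budget:
                continue  # keep the path simple and prune budget violations early
            path.append(nxt)
            dfs(nxt, path, cost + edge_cost, score + edge_score)
            path.pop()

    dfs(source, [source], 0, 0)
    return best


if __name__ == "__main__":
    roads: Graph = {
        "s": [("a", 2, 1), ("b", 1, 4)],
        "a": [("t", 2, 5)],
        "b": [("a", 1, 3), ("t", 4, 1)],
    }
    print(best_constrained_path(roads, "s", "t", 4))  # -> (12, 4, ['s', 'b', 'a', 't'])
    print(best_constrained_path(roads, "s", "t", 3))  # -> None
```

The independent branches explored from each node are what a parallel method can distribute across cores. Balancing that work is the hard part, since subtrees pruned by the budget can differ greatly in size.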
EEML: Ensemble Embedded Meta-learning
Pub Date : 2022-06-18 | DOI: 10.48550/arXiv.2206.09195 | Venue: WISE
Geng Li, Boyuan Ren, Hongzhi Wang
To accelerate learning from few samples, meta-learning draws on prior knowledge from previous tasks. However, inconsistent task distributions and task heterogeneity are hard to handle with a single globally shared model initialization. In this paper, building on gradient-based meta-learning, we propose an ensemble embedded meta-learning algorithm (EEML) that explicitly uses a multi-model ensemble to organize prior knowledge into diverse specialized experts. We rely on a task-embedding clustering mechanism to deliver diverse tasks to matching experts during training and to determine how the experts collaborate during testing. As a result, the experts can focus on their own areas of expertise and cooperate on upcoming tasks, addressing task heterogeneity. The experimental results show that the proposed method readily outperforms recent state-of-the-art methods on few-shot learning problems, which validates the importance of differentiation and cooperation.
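The expert-routing idea can be sketched as nearest-centroid assignment over task embeddings. The sketch makes four assumptions that are not taken from the paper:

- A task embedding is the mean of its support-set feature vectors.
- Each expert is represented by a centroid, updated as a running mean of the tasks routed to it.
- Training tasks go to the nearest centroid.
- At test time, experts are weighted by a softmax over negative distances.

The inner gradient-based updates of each expert are omitted, and every function name is hypothetical.

```python
import math

Vector = list[float]


def task_embedding(support_features: list[Vector]) -> Vector:
    """Mean of a task's support-set feature vectors (an assumed, simple embedding)."""
    n = len(support_features)
    return [sum(col) / n for col in zip(*support_features)]


def route_to_expert(embedding: Vector, centroids: list[Vector]) -> int:
    """Training-time routing: index of the expert whose centroid is nearest."""
    return min(range(len(centroids)), key=lambda k: math.dist(embedding, centroids[k]))


def update_centroid(centroid: Vector, embedding: Vector, count: int) -> Vector:
    """Running-mean update once the expert has received `count` tasks, this one included."""
    return [c + (e - c) / count for c, e in zip(centroid, embedding)]


def expert_weights(embedding: Vector, centroids: list[Vector],
                   temperature: float = 1.0) -> list[float]:
    """Test-time collaboration weights: softmax over negative centroid distances."""
    scores = [-math.dist(embedding, c) / temperature for c in centroids]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Hard routing during training lets each expert specialize on one cluster of tasks. The soft weights let several experts contribute when a test task lies between clusters.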
An Empirical Assessment of Security and Privacy Risks of Web based-Chatbots
Pub Date : 2022-05-17 | DOI: 10.48550/arXiv.2205.08252 | Venue: WISE
Nazar Waheed, M. Ikram, S. S. Hashmi, Xiangjian He, P. Nanda
Web-based chatbots provide website owners with the benefits of increased sales, immediate responses to their customers, and insight into customer behaviour. Although Web-based chatbots are becoming popular, they have not received much scrutiny from security researchers. The benefits to owners come at the cost of users' privacy and security. Vulnerabilities, such as tracking cookies and third-party domains, can be hidden in the chatbot's iFrame script. This paper presents a large-scale analysis of five Web-based chatbots among the top 1-million Alexa websites. Using our crawler tool, we identify the presence of chatbots in these 1-million websites. We discover that 13,515 of the top 1-million Alexa websites (1.59%) use one of the five analysed chatbots. Our analysis reveals that the top 300k Alexa-ranked websites are dominated by Intercom chatbots, which embed the fewest third-party domains. LiveChat chatbots dominate the remaining websites and embed the most third-party domains. We also find that 850 (6.29%) of the chatbots use insecure protocols to transfer users' chats in plain text. Furthermore, some chatbots rely heavily on cookies for tracking and advertisement purposes. More than two-thirds (68.92%) of the identified cookies in chatbot iFrames are used for ads and user tracking. Our results show that, despite the promises of privacy, security, and anonymity given by the majority of the websites, millions of users may unknowingly be subject to poor security guarantees by chatbot service providers.
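Two of the measurements above can be approximated with a few lines of standard-library Python. The sketch does three things:

- It flags endpoints whose scheme carries chat traffic without TLS (`http`, `ws`).
- It lists hosts that differ from the embedding site.
- It recomputes one reported share: 850 of 13,515 chatbots is 6.29%.

The input lists of URLs are assumed to come from a crawler, and the function names are hypothetical. The host comparison is a deliberate simplification; a real analysis would group hosts by registrable domain, for example with the Public Suffix List.

```python
from urllib.parse import urlparse

# Schemes that carry traffic without TLS, so chat content is readable in transit.
PLAINTEXT_SCHEMES = {"http", "ws"}


def insecure_endpoints(urls: list[str]) -> list[str]:
    """URLs whose scheme transfers data in plain text."""
    return [u for u in urls if urlparse(u).scheme.lower() in PLAINTEXT_SCHEMES]


def third_party_hosts(page_url: str, resource_urls: list[str]) -> list[str]:
    """Hosts that are neither the page's host nor one of its subdomains.
    Simplification: a real analysis would compare registrable domains."""
    site = urlparse(page_url).hostname or ""
    hosts = set()
    for u in resource_urls:
        host = urlparse(u).hostname
        if host and host != site and not host.endswith("." + site):
            hosts.add(host)
    return sorted(hosts)


def share(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    print(share(850, 13515))  # -> 6.29
```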
Controversy Detection: A Text and Graph Neural Network Based Approach
Pub Date : 2021-12-03 | DOI: 10.1007/978-3-030-90888-1_26 | Venue: WISE
Samy Benslimane, J. Azé, S. Bringay, Maximilien Servajean, C. Mollevi
Adversarial Training for a Hybrid Approach to Aspect-Based Sentiment Analysis
Pub Date : 2021-11-29 | DOI: 10.1007/978-3-030-91560-5_21 | Venue: WISE
R. Hochstenbach, F. Frasincar, Maria Mihaela Truşcǎ
Existence conditions for hidden feedback loops in online recommender systems
Pub Date : 2021-09-11 | DOI: 10.1007/978-3-030-91560-5_19 | Venue: WISE
A. Khritankov, Anton A. Pilkevich