
Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics: latest articles

GENESIS: a generic RDF data access interface
Timofey Ermilov, Diego Moussallem, Ricardo Usbeck, A. N. Ngomo
The availability of billions of facts represented in RDF on the Web provides novel opportunities for data discovery and access. In particular, keyword search and question answering approaches enable even lay people to access this data. However, the interpretation of the results of these systems, as well as the navigation through these results, remains challenging. In this paper, we present Genesis, a generic RDF data access interface. Genesis can be deployed on top of any knowledge base and search engine with minimal effort and allows for the representation of RDF data in a layperson-friendly way. This is facilitated by the modular architecture for reusable components underlying our framework. Currently, these include a generic search back-end, together with corresponding interactive user interface components based on a service for similar and related entities as well as verbalization services to bridge between RDF and natural language.
DOI: 10.1145/3106426.3106514 (https://doi.org/10.1145/3106426.3106514), published 2017-08-23
Citations: 5
A graph based approach to scientific paper recommendation
M. Amami, R. Faiz, Fabio Stella, G. Pasi
When looking for recently published scientific papers, a researcher usually focuses on the topics related to her/his scientific interests. The task of a recommender system is to provide a list of unseen papers that match these topics. The core idea of this paper is to leverage the latent topics of interest in the publications of the researchers, and to take advantage of the social structure of the researchers (relations among researchers in the same field) as reliable sources of knowledge to improve the recommendation effectiveness. In particular, we introduce a hybrid approach to the task of scientific papers recommendation, which combines content analysis based on probabilistic topic modeling and ideas from collaborative filtering based on a relevance-based language model. We conducted an experimental study on DBLP, which demonstrates that our approach is promising.
DOI: 10.1145/3106426.3106479 (https://doi.org/10.1145/3106426.3106479), published 2017-08-23
Citations: 34
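The abstract describes a hybrid of two signals. One is content, from topic models over a researcher's own publications. The other is collaborative, drawn from related researchers in the same field. The sketch below is a rough illustration only: it blends the two signals with a linear weight. It assumes topic vectors have already been inferred (for example with LDA). The function names, the cosine similarity, the neighbour weights and `lam` are all hypothetical. This is not the authors' relevance-based language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_scores(user_topics, neighbour_topics, neighbour_weights, paper_topics, lam=0.7):
    """Score unseen papers as lam * content match + (1 - lam) * social match.

    Content match: similarity between the researcher's own topic profile and the paper.
    Social match: weighted mean similarity between related researchers' profiles and the paper.
    (Hypothetical scoring rule for illustration.)
    """
    weight_sum = sum(neighbour_weights) or 1.0
    scores = {}
    for paper_id, topics in paper_topics.items():
        content = cosine(user_topics, topics)
        social = sum(
            w * cosine(n, topics) for n, w in zip(neighbour_topics, neighbour_weights)
        ) / weight_sum
        scores[paper_id] = lam * content + (1 - lam) * social
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical two-topic example: the researcher works on topic 0, a close colleague on topic 1.
ranking, scores = hybrid_scores(
    user_topics=[1.0, 0.0],
    neighbour_topics=[[0.0, 1.0]],
    neighbour_weights=[1.0],
    paper_topics={"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]},
)
print(ranking)  # ['p3', 'p1', 'p2']
```

Here the colleague's profile lifts the mixed-topic paper p3 just above p1, the paper that matches only the researcher's own topic. Tuning `lam` trades personal focus against the influence of the social structure.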
Machine learning is better than human to satisfy decision by majority
S. Hirokawa, Takahiko Suzuki, Tsunenori Mine
Government 2.0 activities have become very attractive and popular these days. Using platforms that support these activities, anyone can report issues or complaints about a city at any time on the Web, together with photographs and geographical information, and share them with other people. Since a wide variety of reports are posted, officials in the city management section have to check the importance of each report and prioritize the reports accordingly. However, judging the importance of the reports is not an easy task. When several officials work on the task, the agreement rate of their judgments is not always high. Even if the task is done by only one official, his/her judgment sometimes varies across similar reports. To remedy this low agreement rate of human judgments, we propose a method of detecting signs of danger or unsafe problems described in citizens' reports. The proposed method uses a machine learning technique with word feature selection. Experimental results clearly explain the low agreement rate of human judgments, and illustrate that the proposed machine learning method has much higher performance than human judgments.
DOI: 10.1145/3106426.3106520 (https://doi.org/10.1145/3106426.3106520), published 2017-08-23
Citations: 9
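The core quantities in this abstract are the agreement rate between officials and the decision by majority. The sketch below is minimal. It assumes binary labels (1 = signs of danger) and uses plain observed agreement, not any chance-corrected statistic the authors may have used. All names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Mean fraction of items on which each pair of annotators gives the same label."""
    rates = []
    for a, b in combinations(labels_by_annotator, 2):
        rates.append(sum(x == y for x, y in zip(a, b)) / len(a))
    return sum(rates) / len(rates)


def majority_labels(labels_by_annotator):
    """Per-item majority label; ties go to the label encountered first."""
    return [Counter(item).most_common(1)[0][0] for item in zip(*labels_by_annotator)]


def accuracy(predicted, reference):
    """Fraction of items where a predictor (human or model) matches the reference labels."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical judgements by three officials on four reports (1 = signs of danger, 0 = not).
officials = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
majority = majority_labels(officials)
print(pairwise_agreement(officials))  # 0.5
print(majority)  # [1, 0, 1, 1]
print([accuracy(o, majority) for o in officials])  # [1.0, 0.5, 0.75]
```

Each official's own vote counts toward the majority here, so their accuracy is inflated. A leave-one-out majority, which excludes the official being scored, gives a fairer comparison. A trained classifier would be scored against the majority with the same `accuracy` function.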
Information evolution modeling and tracking in social media
E. Shabunina, G. Pasi
Nowadays, User Generated Content is the main source of real time news and opinions on the world happenings. Social Media, which serves as an environment for the creation and spreading of User Generated Content, is, therefore, representative of our culture and constitutes a potential treasury of knowledge. In this paper we propose a fully automatic approach for modeling and tracking the information evolution in Social Media. In particular, we propose to model a Social Media stream as a text graph. A graph degeneracy technique is used to identify the temporal sequence of the core units of information streams represented by graphs. Furthermore, as the major novelty of this work, we propose a set of measures to track and evaluate the evolution of information in time. An experimental evaluation on the crawled datasets from one of the most popular Social Media platforms proves the validity and applicability of the proposed approach.
DOI: 10.1145/3106426.3106443 (https://doi.org/10.1145/3106426.3106443), published 2017-08-23
Citations: 0
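Graph degeneracy usually refers to k-core decomposition. The k-core is the maximal subgraph in which every vertex has degree at least k. The main core, the one with the largest k, is a common choice of "core unit" for a text graph. The sketch below is minimal and assumes a simple sliding-window word co-occurrence graph built per time slice. The window size, tokenisation and example posts are hypothetical, and the paper's own evolution measures are not reproduced.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking each word to the words in the next `window` positions."""
    adj = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                adj[word].add(other)
                adj[other].add(word)
    return dict(adj)


def core_numbers(adj):
    """k-core decomposition by repeatedly peeling off a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def main_core(adj):
    """Vertices of the highest-order (main) core: the most cohesive part of the graph."""
    core = core_numbers(adj)
    if not core:
        return set()
    top = max(core.values())
    return {v for v, c in core.items() if c == top}


# Hypothetical stream split into two time slices of tokenised posts.
slices = [
    "storm hits city storm floods city roads".split(),
    "city roads reopen after storm cleanup city".split(),
]
for t, tokens in enumerate(slices):
    print(t, sorted(main_core(cooccurrence_graph(tokens))))
```

The main cores can be compared from one slice to the next to see which core units persist, appear or fade. The peeling loop is quadratic, which is fine for small slices. A bucket-based variant runs in linear time for larger graphs.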
Haste makes waste: a case to favour voting bots
David Ben Yosef, L. Dery, S. Obraztsova, Zinovi Rabinovich, M. Bannikova
Voting is a common way to reach a group decision. When possible, voters will attempt to vote strategically, in order to optimize their satisfaction from the outcome. Previous research has modelled how rational voter agents (bots) vote to maximize their personal utility in an iterative voting process that has a deadline (a timeout). However, it remains an open question whether human beings behave rationally when faced with the same settings. The focus of this paper is therefore to examine how the deadline factor affects manipulative behavior in real-world scenarios where humans are required to reach a decision before a deadline. An online platform was built to enable voting games by all types of users: agents (bots), humans, and mixed games with both humans and agents. We compare the results of human behavior and bot behavior and conclude that it might be wise to allow bots to make (certain) decisions on our behalf.
DOI: 10.1145/3106426.3106532 (https://doi.org/10.1145/3106426.3106532), published 2017-08-23
Citations: 6
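The abstract refers to iterative voting with a deadline. The toy sketch below makes several assumptions: plurality rule, tie-breaking by candidate order, truthful first ballots and myopic best responses. This is one common model of rational voter bots, not necessarily the one used in the paper. The deadline caps the number of revision rounds, and the five-voter profile is hypothetical.

```python
from collections import Counter


def plurality_winner(ballots, candidates):
    """Most-voted candidate; ties broken in favour of the earlier candidate in `candidates`."""
    counts = Counter(ballots)
    return max(candidates, key=lambda c: (counts[c], -candidates.index(c)))


def best_response(i, ballots, prefs, candidates):
    """Vote giving voter i the most preferred winner; keep the current vote unless strictly better."""
    rank = {c: r for r, c in enumerate(prefs[i])}  # 0 = most preferred
    best_vote = ballots[i]
    best_winner = plurality_winner(ballots, candidates)
    for c in candidates:
        trial = ballots[:i] + [c] + ballots[i + 1 :]
        winner = plurality_winner(trial, candidates)
        if rank[winner] < rank[best_winner]:
            best_vote, best_winner = c, winner
    return best_vote


def iterative_vote(prefs, candidates, deadline):
    """Truthful first ballot, then rounds of best responses until stable or the deadline hits.

    Returns the winner and the number of rounds in which at least one voter changed their vote.
    """
    ballots = [p[0] for p in prefs]
    for round_no in range(deadline):
        moved = False
        for i in range(len(prefs)):
            vote = best_response(i, ballots, prefs, candidates)
            if vote != ballots[i]:
                ballots[i] = vote
                moved = True
        if not moved:
            return plurality_winner(ballots, candidates), round_no
    return plurality_winner(ballots, candidates), deadline


candidates = ["a", "b", "c"]
prefs = [
    ["a", "c", "b"],
    ["a", "c", "b"],
    ["b", "c", "a"],
    ["b", "c", "a"],
    ["c", "b", "a"],
]
print(iterative_vote(prefs, candidates, deadline=0))  # ('a', 0)
print(iterative_vote(prefs, candidates, deadline=5))  # ('b', 1)
```

With no time to revise (`deadline=0`), the truthful ballots elect `a`, even though three of the five voters prefer `b` to `a`. A single round of strategic revision lets the last voter switch and elect `b`.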
AQUAMan: QoE-driven cost-aware mechanism for SaaS acceptability rate adaptation
A. Najjar, Yazan Mualla, O. Boissier, Gauthier Picard
As more interactive and multimedia-rich applications are migrating to the cloud, end-user satisfaction and Quality of Experience (QoE) will become a determining factor in the success of any Software as a Service (SaaS) provider. Yet, in order to survive in this competitive market, SaaS providers also need to maximize their Quality of Business (QoBiz) and minimize costs paid to cloud providers. However, most of the existing works in the literature adopt a provider-centric approach where the end-user preferences are overlooked. In this article, we propose the AQUAMan mechanism that gives the provider a fine-grained QoE-driven control over the service acceptability rate while taking into account both end-users' satisfaction and provider's QoBiz. The proposed solution is implemented using a multi-agent simulation environment. The results show that the SaaS provider is capable of attaining the predefined acceptability rate while respecting the imposed average cost per user. Furthermore, the results help the SaaS provider identify the limits of the adaptation mechanism and estimate the best average cost to be invested per user.
DOI: 10.1145/3106426.3106485 (https://doi.org/10.1145/3106426.3106485), published 2017-08-23
Citations: 12
Solving DCSP problems in highly degraded communication environments
Saeid Samadidana, R. Mailler
Although there have been tremendous gains in network communication reliability, many real-world applications of distributed systems still face message loss, limitations, delay, and corruption. Yet despite this fact, most Distributed Constraint Satisfaction Problem (DCSP) protocols assume that communication is perfect (messages that are sent will be received) although not ideal (not in a timely manner). As a result, many protocols are designed to exploit this assumption and are severely impacted when applied to real-world conditions. This study compares the performance of several leading DCSP protocols, including the Distributed Stochastic Algorithm (DSA), Distributed Breakout Algorithm (DBA), Max-Gain Message (MGM) and Distributed Probabilistic Protocol (DPP), to analyse their behaviour in communication-degraded environments. The analysis begins by comparing the performance of all of the protocols in a perfect communication environment. We then use a simulated communication-degraded environment where messages are probabilistically lost. Finally, we compare their performance by limiting the communication rate, which introduces delay. We show that DBA, once modified with a message timeout, is quite resistant to high message loss, while DPP and DSA converge more slowly to worse solutions. Our results also show that the setting of the timeout value for DBA and MGM is an important factor in the convergence of these algorithms. Under conditions of message delay, DPP and DSA are less affected than DBA and MGM. Overall, DPP and DSA cause considerably less network load.
DOI: 10.1145/3106426.3106445 (https://doi.org/10.1145/3106426.3106445), published 2017-08-23
Citations: 3
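DSA is simple enough to simulate in a few lines. The sketch below is minimal and implements the DSA-A variant on graph colouring: an agent moves to its best value only on a strict improvement, and then only with probability `p_move`. It assumes synchronous rounds. Each value message is independently lost with probability `p_loss`, so agents may act on stale neighbour values. Parameter names and defaults are hypothetical, and DBA, MGM and DPP are not shown.

```python
import random


def dsa_coloring(edges, n_agents, n_colors, p_move=0.7, p_loss=0.0, rounds=200, seed=0):
    """Synchronous DSA-A on graph colouring with lossy value messages.

    Each agent sees its neighbours only through the last value message it received.
    Returns the final assignment and the number of truly violated edges.
    """
    rng = random.Random(seed)
    nbrs = {i: set() for i in range(n_agents)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    value = [rng.randrange(n_colors) for _ in range(n_agents)]
    # view[i][j]: agent i's (possibly stale) belief about neighbour j's colour;
    # assume the initial values are exchanged without loss.
    view = {i: {j: value[j] for j in nbrs[i]} for i in range(n_agents)}

    for _ in range(rounds):
        new_value = list(value)
        for i in range(n_agents):
            def conflicts(c, i=i):
                return sum(view[i][j] == c for j in nbrs[i])
            best = min(range(n_colors), key=conflicts)
            # DSA-A: move only on a strict (believed) improvement, with probability p_move
            if conflicts(best) < conflicts(value[i]) and rng.random() < p_move:
                new_value[i] = best
        value = new_value
        # Each agent sends its value to each neighbour; each message is dropped with probability p_loss.
        for i in range(n_agents):
            for j in nbrs[i]:
                if rng.random() >= p_loss:
                    view[j][i] = value[i]

    violated = sum(value[u] == value[v] for u, v in edges)
    return value, violated


triangle = [(0, 1), (1, 2), (0, 2)]
for loss in (0.0, 0.5, 0.9):
    print(loss, dsa_coloring(triangle, n_agents=3, n_colors=3, p_loss=loss))
```

Comparing `violated` across `p_loss` values on larger random graphs gives a crude version of the message-loss experiment described in the abstract. Stale views can make an agent "fix" a conflict that no longer exists, or miss one that does.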
Using re-ranking to boost deep learning based community question retrieval
K. Ghosh, Plaban Kumar Bhowmick, Pawan Goyal
The current study presents a two-stage question retrieval approach which, in the first phase, retrieves similar questions for a given query using a deep learning based approach and, in the second phase, re-ranks the initially retrieved questions on the basis of inter-question similarities. The suggested deep learning based approach is trained using several surface features of texts, and the associated weights are pre-trained using a deep generative model for better initialization. The proposed retrieval model outperforms standard baseline question retrieval approaches. The proposed re-ranking approach performs inference over a similarity graph constructed with the initially retrieved questions and re-ranks the questions based on their similarity with other relevant questions. The suggested re-ranking approach significantly improves the precision for the retrieval task.
DOI: 10.1145/3106426.3106442 (https://doi.org/10.1145/3106426.3106442), published 2017-08-23
Citations: 6
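The second-stage idea is to score candidates by their similarity to other relevant candidates. It can be illustrated with score propagation over the similarity graph, in the style of personalised PageRank. This is a generic stand-in, not the authors' inference procedure. `alpha`, the row normalisation and the example similarities are assumptions.

```python
def rerank(initial_scores, sim, alpha=0.5, iters=50):
    """Re-rank first-stage results by propagating scores over a question-similarity graph.

    s = alpha * s0 + (1 - alpha) * W^T s, where s0 are the normalised first-stage scores
    and W is the row-normalised similarity matrix (zero diagonal).
    """
    n = len(initial_scores)
    total = sum(initial_scores)
    s0 = [x / total for x in initial_scores]
    W = []
    for row in sim:
        row_sum = sum(row)
        W.append([x / row_sum if row_sum else 0.0 for x in row])
    s = list(s0)
    for _ in range(iters):
        # Jacobi update: every new score uses the previous iteration's scores
        s = [
            alpha * s0[i] + (1 - alpha) * sum(W[j][i] * s[j] for j in range(n))
            for i in range(n)
        ]
    order = sorted(range(n), key=lambda i: -s[i])
    return order, s


# Hypothetical example: q0 and q1 paraphrase each other; q2 is unrelated to both.
first_stage = [1.0, 0.9, 0.95]
similarity = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
order, scores = rerank(first_stage, similarity)
print(order)  # [0, 1, 2]  (first-stage order was [0, 2, 1])
```

Questions with no similar neighbours receive no propagated score. That is what pushes the isolated q2 below q1 here. Whether this penalty is desirable depends on the collection.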
MRR
Vinicius Woloszyn, H. D. Dos Santos, Leandro Krug Wives, Karin Becker
The automatic detection of relevant reviews plays a major role in tasks such as opinion summarization, opinion-based recommendation, and opinion retrieval. Supervised approaches for ranking reviews by relevance rely on the existence of a significant, domain-dependent training data set. In this work, we propose MRR (Most Relevant Reviews), a new unsupervised algorithm that identifies relevant reviews based on the concept of graph centrality. The intuition behind MRR is that central reviews highlight aspects of a product that many other reviews frequently mention, with similar opinions, as expressed in terms of ratings. MRR constructs a graph where nodes represent reviews, which are connected by edges when a minimum similarity between a pair of reviews is observed, and then employs PageRank to compute the centrality. The minimum similarity is graph-specific, and takes into account how reviews are written in specific domains. The similarity function does not require extensive pre-processing, thus reducing the computational cost. Using reviews from books and electronics products, our approach has outperformed the two unsupervised baselines and shown a comparable performance with two supervised regression models in a specific setting. MRR has also achieved a significantly superior run-time performance in a comparison with the unsupervised baselines.
DOI: 10.1145/3106426.3106444 (https://doi.org/10.1145/3106426.3106444), published 2017-08-23
Citations: 12
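The abstract describes MRR's pipeline as a review graph with edges above a minimum similarity, ranked by PageRank. That pipeline can be sketched in plain Python under several hypothetical assumptions:
- Jaccard word overlap as the similarity.
- The mean pairwise similarity as the graph-specific threshold.
- Linked reviews must have ratings at most one star apart.
- A damping factor of 0.85.

The abstract does not give the paper's actual similarity function or threshold.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap; needs no preprocessing beyond lower-casing and whitespace splitting."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def pagerank(adj, d=0.85, iters=100):
    """Power-iteration PageRank on an adjacency list; dangling mass is spread uniformly."""
    n = len(adj)
    pr = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(pr[v] for v in range(n) if not adj[v])
        new = [(1 - d) / n + d * dangling / n] * n
        for v in range(n):
            for u in adj[v]:
                new[u] += d * pr[v] / len(adj[v])
        pr = new
    return pr


def rank_reviews(reviews, ratings, max_rating_gap=1):
    """Link similar reviews with similar ratings, then rank reviews by PageRank centrality."""
    n = len(reviews)
    if n < 2:
        return list(range(n)), [1.0] * n
    sims = {(i, j): jaccard(reviews[i], reviews[j]) for i, j in combinations(range(n), 2)}
    threshold = sum(sims.values()) / len(sims)  # hypothetical graph-specific threshold
    adj = [[] for _ in range(n)]
    for (i, j), s in sims.items():
        if s >= threshold and abs(ratings[i] - ratings[j]) <= max_rating_gap:
            adj[i].append(j)
            adj[j].append(i)
    scores = pagerank(adj)
    return sorted(range(n), key=lambda i: -scores[i]), scores


reviews = [
    "battery life is great and screen is sharp",
    "great battery life and sharp screen",
    "screen is sharp battery life is great",
    "shipping was slow",
]
order, scores = rank_reviews(reviews, ratings=[4, 5, 3, 1])
print(order[0], order[-1])  # 0 3
```

Review 0 is the only one linked to both other product-focused reviews, so it is the most central. The off-topic shipping complaint is isolated and ranks last.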
The Adressa dataset for news recommendation
J. Gulla, Lemei Zhang, Peng Liu, Özlem Özgöbek, Xiaomeng Su
Datasets for recommender systems are few and often inadequate for the contextualized nature of news recommendation. News recommender systems are both time- and location-dependent, make use of implicit signals, and often include both collaborative and content-based components. In this paper we introduce the Adressa compact news dataset, which supports all these aspects of news recommendation. The dataset comes in two versions, the large 20M dataset of 10 weeks' traffic on Adresseavisen's news portal, and the small 2M dataset of only one week's traffic. We explain the structure of the dataset and discuss how it can be used in advanced news recommender systems.
DOI: 10.1145/3106426.3109436 (https://doi.org/10.1145/3106426.3109436), published 2017-08-23
Citations: 138
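The Adressa abstract stresses that news recommendation is time-dependent and relies on implicit signals. The sketch below shows one way such event logs might be prepared for a collaborative recommender. It builds per-user reading histories in chronological order and a user-article weight derived from reading time. Several details are assumptions for illustration:

* The logs are taken to be in JSON-lines layout.
* The field names `userId`, `time`, `documentId` and `activeTime` are assumed.
* The function names are hypothetical.

Check the field names against the dataset's own documentation before use.

```python
import json
from collections import defaultdict

# Assumed event schema (hypothetical; verify against the dataset docs):
# {"userId": str, "time": int (Unix seconds), "documentId": str, "activeTime": int | absent}


def load_events(lines):
    """Parse JSON-lines events, skipping blank lines and events without an article id."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("documentId") is None:
            continue
        events.append(event)
    return events


def user_histories(events):
    """Map each user to the article ids they read, oldest first."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["userId"]].append((e["time"], e["documentId"]))
    return {user: [doc for _, doc in sorted(h)] for user, h in by_user.items()}


def implicit_weights(events):
    """Sum active reading time per (user, article) as an implicit preference signal.

    Events with no activeTime field count as one second (an arbitrary assumption).
    """
    weights = defaultdict(float)
    for e in events:
        active = e.get("activeTime")
        weights[(e["userId"], e["documentId"])] += active if active is not None else 1
    return dict(weights)
```

Keeping the chronological order per user preserves the time dependence the abstract emphasises. The summed reading time gives a graded implicit signal rather than a bare click.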