Proceedings of the 25th International Conference on World Wide Web: Latest Articles

Characterizing Long-tail SEO Spam on Cloud Web Hosting Services

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883008

Xiaojing Liao, Chang Liu, Damon McCoy, E. Shi, S. Hao, R. Beyah

The popularity of long-tail search engine optimization (SEO) brings new security challenges: incidents of long-tail keyword poisoning to lower competition and increase revenue have been reported. The emergence of cloud web hosting services provides a new and effective platform for long-tail SEO spam attacks. There is growing evidence that large-scale long-tail SEO campaigns are being carried out on cloud hosting platforms because they offer low-cost, high-speed hosting services. In this paper, we take the first step toward understanding how long-tail SEO spam is implemented on cloud hosting platforms. After identifying 3,186 cloud directories and 318,470 doorway pages on the leading cloud platforms for long-tail SEO spam, we characterize their abusive behavior. One highlight of our findings is the effectiveness of cloud-based long-tail SEO spam, with 6% of the doorway pages successfully appearing in the top 10 search results for the poisoned long-tail keywords. Other important discoveries include how such doorway pages monetize traffic and their ability to manage cloud platforms' countermeasures. These findings bring such abuse into the spotlight and provide some insights into eliminating this practice.

Citations: 24

Hidden Topic Sentiment Model

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883072

Md. Mustafizur Rahman, Hongning Wang

Various topic models have been developed for sentiment analysis tasks. But the simple topic-sentiment mixture assumption prohibits them from finding fine-grained dependency between topical aspects and sentiments. In this paper, we build a Hidden Topic Sentiment Model (HTSM) to explicitly capture topic coherence and sentiment consistency in an opinionated text document to accurately extract latent aspects and corresponding sentiment polarities. In HTSM, 1) topic coherence is achieved by enforcing words in the same sentence to share the same topic assignment and modeling topic transition between successive sentences; 2) sentiment consistency is imposed by constraining topic transitions via tracking sentiment changes; and 3) both topic transition and sentiment transition are guided by a parameterized logistic function based on the linguistic signals directly observable in a document. Extensive experiments on four categories of product reviews from both Amazon and NewEgg validate the effectiveness of the proposed model.

Citations: 41

Mechanism Design for Mixed Bidders

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2882983

Y. Bachrach, S. Ceppi, Ian A. Kash, P. Key, M. Khani

The Generalized Second Price (GSP) auction has appealing properties when ads are simple (text based and identical in size), but does not generalize to richer ad settings, whereas truthful mechanisms such as VCG do. However, a straight switch from GSP to VCG incurs significant revenue loss for the search engine. We introduce a transitional mechanism which encourages advertisers to update their bids to their valuations, while mitigating revenue loss. In this setting, it is easier to propose first a payment function rather than an allocation function, so we give a general framework which guarantees incentive compatibility by requiring that the payment functions satisfy two specific properties. Finally, we analyze the revenue impacts of our mechanism on a sample of Bing data.
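
The revenue gap between the two mechanisms mentioned in this abstract can be illustrated with the textbook position-auction payment rules. The sketch below is not the transitional mechanism proposed in the paper; it only assumes the standard model in which slot i has a click-through rate `ctrs[i]` (sorted in decreasing order) and advertisers are ranked by bid. The function name and the sample numbers are hypothetical.

```python
def position_auction_payments(bids, ctrs):
    """Return (gsp, vcg): expected total payment per slot under each rule.

    Assumes the textbook position-auction model: bidders are ranked by bid,
    and ctrs lists slot click-through rates in decreasing order.
    """
    ranked = sorted(bids, reverse=True)
    n = len(ranked)
    k = min(len(ctrs), n)
    # Pad with zeros: bidders ranked below the last slot receive no clicks.
    c = list(ctrs[:k]) + [0.0] * (n - k + 1)
    gsp, vcg = [], []
    for i in range(k):
        # GSP: pay the next-highest bid for every click received.
        next_bid = ranked[i + 1] if i + 1 < n else 0.0
        gsp.append(c[i] * next_bid)
        # VCG: pay the click value that lower-ranked bidders lose
        # because this bidder occupies slot i.
        vcg.append(sum(ranked[j] * (c[j - 1] - c[j]) for j in range(i + 1, n)))
    return gsp, vcg


# Toy numbers, chosen only for illustration.
gsp, vcg = position_auction_payments(bids=[10.0, 4.0, 2.0], ctrs=[0.2, 0.1])
print("GSP revenue:", sum(gsp))
print("VCG revenue:", sum(vcg))
```

With bids held fixed, each VCG payment is at most the corresponding GSP payment, which is one way to see why switching the payment rule before advertisers have raised their bids toward their valuations would lower revenue.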

Citations: 9

Non-Linear Mining of Competing Local Activities

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883010

Yasuko Matsubara, Yasushi Sakurai, C. Faloutsos

Given a large collection of time-evolving activities, such as Google search queries, which consist of d keywords/activities for m locations of duration n, how can we analyze temporal patterns and relationships among all these activities and find location-specific trends? How do we go about capturing non-linear evolutions of local activities and forecasting future patterns? For example, assume that we have the online search volume for multiple keywords, e.g., "Nokia/Nexus/Kindle" or "CNN/BBC" for 236 countries/territories, from 2004 to 2015. Our goal is to analyze a large collection of multi-evolving activities, and specifically, to answer the following questions: (a) Is there any sign of interaction/competition between two different keywords? If so, who competes with whom? (b) In which country is the competition strong? (c) Are there any seasonal/annual activities? (d) How can we automatically detect important world-wide (or local) events? We present COMPCUBE, a unifying non-linear model, which provides a compact and powerful representation of co-evolving activities; and also a novel fitting algorithm, COMPCUBE-FIT, which is parameter-free and scalable. Our method captures the following important patterns: (B)asic trends, i.e., non-linear dynamics of co-evolving activities, signs of (C)ompetition and latent interaction, e.g., Nokia vs. Nexus, (S)easonality, e.g., a Christmas spike for iPod in the U.S. and Europe, and (D)eltas, e.g., unrepeated local events such as the U.S. election in 2008. Thanks to its concise but effective summarization, COMPCUBE can also forecast long-range future activities. Extensive experiments on real datasets demonstrate that COMPCUBE consistently outperforms the best state-of-the-art methods in terms of both accuracy and execution speed.

Citations: 22

Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883039

Frank H. Li, Grant Ho, Eric Kuan, Yuan Niu, L. Ballard, Kurt Thomas, Elie Bursztein, V. Paxson

As miscreants routinely hijack thousands of vulnerable web servers weekly for cheap hosting and traffic acquisition, security services have turned to notifications both to alert webmasters to ongoing incidents and to expedite recovery. In this work we present the first large-scale measurement study on the effectiveness of combinations of browser, search, and direct webmaster notifications at reducing the duration a site remains compromised. Our study captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. We observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%. Absent this open channel for communication, we find that browser interstitials, while intended to alert visitors to potentially harmful content, correlate with faster remediation. As part of our study, we also explore whether webmasters exhibit the necessary technical expertise to address hijacking incidents. Based on appeal logs where webmasters alert Google that their site is no longer compromised, we find 80% of operators successfully clean up symptoms on their first appeal. However, a sizeable fraction of site owners do not address the root cause of compromise, with over 12% of sites falling victim to a new attack within 30 days. We distill these findings into a set of recommendations for improving web security and best practices for webmasters.

Citations: 56

Recommendations in Signed Social Networks

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2882971

Jiliang Tang, C. Aggarwal, Huan Liu

Recommender systems play a crucial role in mitigating the information overload problem in social media by suggesting relevant information to users. The popularity of pervasively available social activities for social media users has encouraged a large body of literature on exploiting social networks for recommendation. The vast majority of these systems focus on unsigned social networks (or social networks with only positive links), while little work exists for signed social networks (or social networks with positive and negative links). The availability of negative links in signed social networks presents both challenges and opportunities in the recommendation process. We provide a principled and mathematical approach to exploit signed social networks for recommendation, and propose a model, RecSSN, to leverage positive and negative links in signed social networks. Empirical results on real-world datasets demonstrate the effectiveness of the proposed framework. We also perform further experiments to explicitly understand the effect of signed networks in RecSSN.

Citations: 101

User Fatigue in Online News Recommendation

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2874813

Hao Ma, Xueqing Liu, Zhihong Shen

Many aspects and properties of recommender systems have been well studied in the past decade; however, the impact of user fatigue has been mostly ignored in the literature. User fatigue is the phenomenon in which a user quickly loses interest in a recommended item if the same item has been presented to this user multiple times before. The direct impact of user fatigue is a dramatic decrease in the click-through rate (CTR, i.e., the ratio of clicks to impressions). In this paper, we present a comprehensive study of user fatigue in online recommender systems. By analyzing user behavioral logs from Bing Now news recommendation, we find that user fatigue is a severe problem that greatly affects the user experience. We also notice that different users engage differently with repeated recommendations. Depending on users' previous interaction with repeated recommendations, we illustrate that under certain conditions previously seen items should be demoted, while at other times they should be promoted. We demonstrate how statistics from the analysis of user fatigue can be incorporated into ranking algorithms for personalized recommendations. Our experimental results indicate that significant gains can be achieved by introducing features that reflect users' interaction with previously seen recommendations (up to 15% enhancement on all users and 34% improvement on heavy users).
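
The CTR definition given in this abstract (clicks divided by impressions) can be made concrete by grouping impressions according to how often the same user has already been shown the same item. The sketch below is only an illustration of that measurement, not the paper's feature pipeline; the log schema `(user, item, clicked)`, the function name, and the toy events are all assumptions.

```python
from collections import defaultdict


def ctr_by_prior_exposures(events):
    """Map 'number of earlier impressions of this item for this user' to CTR.

    events: iterable of (user, item, clicked) tuples in chronological order.
    This schema is hypothetical and chosen only for the example.
    """
    seen = defaultdict(int)         # (user, item) -> impressions so far
    clicks = defaultdict(int)       # prior exposures -> clicks
    impressions = defaultdict(int)  # prior exposures -> impressions
    for user, item, clicked in events:
        k = seen[(user, item)]
        impressions[k] += 1
        clicks[k] += int(clicked)
        seen[(user, item)] += 1
    # CTR = clicks / impressions, computed separately for each exposure count.
    return {k: clicks[k] / impressions[k] for k in sorted(impressions)}


# Made-up log: both users click the first time and ignore the repeats.
log = [
    ("u1", "a", True),
    ("u1", "a", False),
    ("u2", "a", True),
    ("u2", "a", False),
    ("u1", "a", False),
]
print(ctr_by_prior_exposures(log))
```

A CTR that falls as the prior-exposure count grows is the fatigue pattern the abstract describes; per-user versions of the same counts could in principle serve as ranking features.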

Citations: 41

From Freebase to Wikidata: The Great Migration

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2874809

Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, T. Steiner, Lydia Pintscher

Collaborative knowledge bases that make their data freely available in a machine-readable form are central for the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia's Wikidata and Google's Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. We describe the Primary Sources Tool, which aims to facilitate this and future data migrations. Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and share and discuss detailed statistics on both knowledge bases.

Citations: 193

GoCAD: GPU-Assisted Online Content-Adaptive Display Power Saving for Mobile Devices in Internet Streaming

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883064

Yao Liu, Mengbai Xiao, Ming Zhang, Xin Li, Mian Dong, Zhan Ma, Zhenhua Li, Songqing Chen

During Internet streaming, a significant portion of the battery power is always consumed by the display panel on mobile devices. To reduce display power consumption, backlight scaling, a scheme that intelligently dims the backlight, has been proposed. To maintain perceived video appearance in backlight scaling, a computationally intensive luminance compensation process is required. However, this step, if performed by the CPU as existing schemes suggest, could easily offset the power savings gained from backlight scaling. Furthermore, computing the optimal backlight scaling values requires per-frame luminance information, which is typically too energy intensive for mobile devices to compute. Thus, existing schemes require such information to be available in advance, and such an offline approach makes these schemes impractical. To address these challenges, in this paper, we design and implement GoCAD, a GPU-assisted Online Content-Adaptive Display power saving scheme for mobile devices in Internet streaming sessions. In GoCAD, we employ the mobile device's GPU rather than the CPU to reduce power consumption during the luminance compensation phase. Furthermore, we compute the optimal backlight scaling values for small batches of video frames in an online fashion using a dynamic programming algorithm. Lastly, we make novel use of the widely available video storyboard, a pre-computed set of thumbnails associated with a video, to intelligently decide whether or not to apply our backlight scaling scheme for a given video. For example, when the GPU power consumption would offset the savings from dimming the backlight, no backlight scaling is conducted. To evaluate the performance of GoCAD, we implement a prototype within an Android application and use a Monsoon power monitor to measure the real power consumption. Experiments are conducted on more than 460 randomly selected YouTube videos. Results show that GoCAD can effectively produce power savings without affecting rendered video quality.
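
The idea behind backlight scaling with luminance compensation can be sketched on the CPU in a few lines. This is not GoCAD's GPU implementation or its dynamic programming step; it assumes a simplified linear display model in which perceived luminance is proportional to backlight level times pixel value (real panels also involve gamma), and both function names are hypothetical.

```python
def min_backlight(pixels, max_clipped_fraction=0.0):
    """Lowest backlight level in (0, 1] that clips at most the given
    fraction of pixels once they are brightened to compensate.

    pixels: 8-bit luminance values (0..255) of one frame.
    Assumes a linear model: perceived = backlight * pixel value.
    """
    if not pixels:
        raise ValueError("frame has no pixels")
    ranked = sorted(pixels)
    # Number of brightest pixels we are willing to let saturate.
    allowed = min(int(len(ranked) * max_clipped_fraction), len(ranked) - 1)
    brightest_kept = ranked[len(ranked) - 1 - allowed]
    return max(brightest_kept, 1) / 255


def compensate_frame(pixels, backlight):
    """Brighten pixel values so backlight * value stays near the original."""
    if not 0.0 < backlight <= 1.0:
        raise ValueError("backlight must be in (0, 1]")
    return [min(255, round(p / backlight)) for p in pixels]


# Toy frame: a dark scene lets the backlight drop to about 78%.
frame = [10, 100, 200]
level = min_backlight(frame)
print(level, compensate_frame(frame, level))
```

The per-pixel division in `compensate_frame` is the step that runs on every pixel of every frame, which is why the abstract describes it as computationally intensive and moves it off the CPU.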

Citations: 15

A Study of Retrieval Models for Long Documents and Queries in Information Retrieval

Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883009

Ronan Cummins

Recent research has shown that long documents are unfairly penalised by a number of current retrieval methods. In this paper, we formally analyse two important but distinct reasons for normalising documents with respect to length, namely verbosity and scope, and discuss the practical implications of not normalising accordingly. We review a number of language modelling approaches and a range of recently developed retrieval methods, and show that most do not correctly model both phenomena, thus limiting their retrieval effectiveness in certain situations. Furthermore, the retrieval characteristics of long natural language queries have not traditionally had the same attention as short keyword queries. We develop a new discriminative query language modelling approach that demonstrates improved performance on long verbose queries by appropriately weighting salient aspects of the query. When combined with query expansion, we show that our new approach yields state-of-the-art performance for long verbose queries.

Citations: 10
