Xiaojing Liao, Chang Liu, Damon McCoy, E. Shi, S. Hao, R. Beyah. Characterizing Long-tail SEO Spam on Cloud Web Hosting Services. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2883008.
The popularity of long-tail search engine optimization (SEO) brings with it new security challenges: incidents of long-tail keyword poisoning to lower competition and increase revenue have been reported. The emergence of cloud web hosting services provides a new and effective platform for long-tail SEO spam attacks. There is growing evidence that large-scale long-tail SEO campaigns are being carried out on cloud hosting platforms because they offer low-cost, high-speed hosting services. In this paper, we take the first step toward understanding how long-tail SEO spam is implemented on cloud hosting platforms. After identifying 3,186 cloud directories and 318,470 doorway pages on the leading cloud platforms used for long-tail SEO spam, we characterize their abusive behavior. One highlight of our findings is the effectiveness of cloud-based long-tail SEO spam, with 6% of the doorway pages successfully appearing in the top 10 search results for the poisoned long-tail keywords. Other important discoveries include how such doorway pages monetize traffic and their ability to evade the cloud platforms' countermeasures. These findings bring such abuse into the spotlight and provide insights into eliminating this practice.
Md. Mustafizur Rahman, Hongning Wang. Hidden Topic Sentiment Model. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2883072.
Various topic models have been developed for sentiment analysis tasks, but the simple topic-sentiment mixture assumption prevents them from capturing fine-grained dependencies between topical aspects and sentiments. In this paper, we build a Hidden Topic Sentiment Model (HTSM) to explicitly capture topic coherence and sentiment consistency in an opinionated text document and thereby accurately extract latent aspects and their corresponding sentiment polarities. In HTSM, 1) topic coherence is achieved by enforcing words in the same sentence to share the same topic assignment and by modeling topic transitions between successive sentences; 2) sentiment consistency is imposed by constraining topic transitions via tracking sentiment changes; and 3) both topic transitions and sentiment transitions are guided by a parameterized logistic function based on linguistic signals directly observable in a document. Extensive experiments on four categories of product reviews from both Amazon and NewEgg validate the effectiveness of the proposed model.
Y. Bachrach, S. Ceppi, Ian A. Kash, P. Key, M. Khani. Mechanism Design for Mixed Bidders. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2882983.
The Generalized Second Price (GSP) auction has appealing properties when ads are simple (text-based and identical in size), but it does not generalize to richer ad settings, whereas truthful mechanisms such as VCG do. However, a straight switch from GSP to VCG incurs significant revenue loss for the search engine. We introduce a transitional mechanism that encourages advertisers to update their bids to their valuations while mitigating revenue loss. In this setting, it is easier to first propose a payment function rather than an allocation function, so we give a general framework that guarantees incentive compatibility by requiring that the payment functions satisfy two specific properties. Finally, we analyze the revenue impact of our mechanism on a sample of Bing data.
Yasuko Matsubara, Yasushi Sakurai, C. Faloutsos. Non-Linear Mining of Competing Local Activities. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2883010.
Given a large collection of time-evolving activities, such as Google search queries, which consist of d keywords/activities for m locations of duration n, how can we analyze temporal patterns and relationships among all these activities and find location-specific trends? How do we go about capturing non-linear evolutions of local activities and forecasting future patterns? For example, assume that we have the online search volume for multiple keywords, e.g., "Nokia/Nexus/Kindle" or "CNN/BBC", for 236 countries/territories from 2004 to 2015. Our goal is to analyze a large collection of multi-evolving activities, and specifically, to answer the following questions: (a) Is there any sign of interaction/competition between two different keywords? If so, who competes with whom? (b) In which country is the competition strong? (c) Are there any seasonal/annual activities? (d) How can we automatically detect important world-wide (or local) events? We present COMPCUBE, a unifying non-linear model, which provides a compact and powerful representation of co-evolving activities, and a novel fitting algorithm, COMPCUBE-FIT, which is parameter-free and scalable. Our method captures the following important patterns: (B)asic trends, i.e., non-linear dynamics of co-evolving activities; signs of (C)ompetition and latent interaction, e.g., Nokia vs. Nexus; (S)easonality, e.g., a Christmas spike for iPod in the U.S. and Europe; and (D)eltas, e.g., unrepeated local events such as the U.S. election in 2008. Thanks to its concise but effective summarization, COMPCUBE can also forecast long-range future activities. Extensive experiments on real datasets demonstrate that COMPCUBE consistently outperforms the best state-of-the-art methods in terms of both accuracy and execution speed.
Frank H. Li, Grant Ho, Eric Kuan, Yuan Niu, L. Ballard, Kurt Thomas, Elie Bursztein, V. Paxson. Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2883039.
As miscreants routinely hijack thousands of vulnerable web servers weekly for cheap hosting and traffic acquisition, security services have turned to notifications both to alert webmasters of ongoing incidents and to expedite recovery. In this work we present the first large-scale measurement study of the effectiveness of combinations of browser, search, and direct webmaster notifications at reducing the duration a site remains compromised. Our study captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. We observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%. Absent this open channel for communication, we find that browser interstitials, while intended to alert visitors to potentially harmful content, correlate with faster remediation. As part of our study, we also explore whether webmasters exhibit the necessary technical expertise to address hijacking incidents. Based on appeal logs where webmasters alert Google that their site is no longer compromised, we find that 80% of operators successfully clean up symptoms on their first appeal. However, a sizeable fraction of site owners do not address the root cause of compromise, with over 12% of sites falling victim to a new attack within 30 days. We distill these findings into a set of recommendations for improving web security and best practices for webmasters.
Jiliang Tang, C. Aggarwal, Huan Liu. Recommendations in Signed Social Networks. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2882971.
Recommender systems play a crucial role in mitigating the information overload problem in social media by suggesting relevant information to users. The popularity of pervasively available social activities for social media users has encouraged a large body of literature on exploiting social networks for recommendation. The vast majority of these systems focus on unsigned social networks (or social networks with only positive links), while little work exists for signed social networks (or social networks with positive and negative links). The availability of negative links in signed social networks presents both challenges and opportunities in the recommendation process. We provide a principled and mathematical approach to exploit signed social networks for recommendation, and propose a model, RecSSN, to leverage positive and negative links in signed social networks. Empirical results on real-world datasets demonstrate the effectiveness of the proposed framework. We also perform further experiments to explicitly understand the effect of signed networks in RecSSN.
Hao Ma, Xueqing Liu, Zhihong Shen. User Fatigue in Online News Recommendation. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2874813.
Many aspects and properties of recommender systems have been well studied in the past decade; however, the impact of user fatigue has been mostly ignored in the literature. User fatigue refers to the phenomenon in which a user quickly loses interest in a recommended item if the same item has been presented to that user multiple times before. The direct impact of user fatigue is a dramatic decrease in the click-through rate (CTR, i.e., the ratio of clicks to impressions). In this paper, we present a comprehensive study of user fatigue in online recommender systems. By analyzing user behavioral logs from Bing Now news recommendation, we find that user fatigue is a severe problem that greatly affects the user experience. We also notice that different users engage differently with repeated recommendations. Depending on users' previous interactions with repeated recommendations, we show that under certain conditions previously seen items should be demoted, while at other times they should be promoted. We demonstrate how statistics from the analysis of user fatigue can be incorporated into ranking algorithms for personalized recommendations. Our experimental results indicate that significant gains can be achieved by introducing features that reflect users' interactions with previously seen recommendations (up to 15% enhancement on all users and 34% improvement on heavy users).
Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, T. Steiner, Lydia Pintscher. From Freebase to Wikidata: The Great Migration. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2874809.
Collaborative knowledge bases that make their data freely available in a machine-readable form are central to the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia's Wikidata and Google's Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. We describe the Primary Sources Tool, which aims to facilitate this and future data migrations. Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and we share and discuss detailed statistics on both knowledge bases.
Yao Liu, Mengbai Xiao, Ming Zhang, Xin Li, Mian Dong, Zhan Ma, Zhenhua Li, Songqing Chen. GoCAD: GPU-Assisted Online Content-Adaptive Display Power Saving for Mobile Devices in Internet Streaming. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2883064.
During Internet streaming, a significant portion of the battery power on mobile devices is consumed by the display panel. To reduce display power consumption, backlight scaling, a scheme that intelligently dims the backlight, has been proposed. To maintain the perceived video appearance under backlight scaling, a computationally intensive luminance compensation process is required. However, this step, if performed by the CPU as existing schemes suggest, could easily offset the power savings gained from backlight scaling. Furthermore, computing the optimal backlight scaling values requires per-frame luminance information, which is typically too energy intensive for mobile devices to compute; existing schemes therefore require such information to be available in advance, and this offline approach makes them impractical. To address these challenges, in this paper we design and implement GoCAD, a GPU-assisted Online Content-Adaptive Display power saving scheme for mobile devices in Internet streaming sessions. In GoCAD, we employ the mobile device's GPU rather than the CPU to reduce power consumption during the luminance compensation phase. Furthermore, we compute the optimal backlight scaling values for small batches of video frames in an online fashion using a dynamic programming algorithm. Lastly, we make novel use of the widely available video storyboard, a pre-computed set of thumbnails associated with a video, to intelligently decide whether or not to apply our backlight scaling scheme for a given video; for example, when the GPU power consumption would offset the savings from dimming the backlight, no backlight scaling is conducted. To evaluate the performance of GoCAD, we implement a prototype within an Android application and use a Monsoon power monitor to measure the real power consumption. Experiments conducted on more than 460 randomly selected YouTube videos show that GoCAD can effectively produce power savings without affecting rendered video quality.
Ronan Cummins. A Study of Retrieval Models for Long Documents and Queries in Information Retrieval. In Proceedings of the 25th International Conference on World Wide Web (WWW 2016). DOI: 10.1145/2872427.2883009.
Recent research has shown that long documents are unfairly penalised by a number of current retrieval methods. In this paper, we formally analyse two important but distinct reasons for normalising documents with respect to length, namely verbosity and scope, and discuss the practical implications of not normalising accordingly. We review a number of language modelling approaches and a range of recently developed retrieval methods, and show that most do not correctly model both phenomena, thus limiting their retrieval effectiveness in certain situations. Furthermore, the retrieval characteristics of long natural language queries have not traditionally received the same attention as short keyword queries. We develop a new discriminative query language modelling approach that demonstrates improved performance on long verbose queries by appropriately weighting salient aspects of the query. When combined with query expansion, we show that our new approach yields state-of-the-art performance for long verbose queries.