Yaowei Zheng, Richong Zhang, Suyuchen Wang, Samuel Mensah, Yongyi Mao
Supervised learning relies heavily on readily available labelled data to infer an effective classification function. However, methods developed under the supervised learning paradigm face a scarcity of labelled data within a domain and often do not generalize well to other tasks. Transfer learning addresses these issues by allowing knowledge to be shared across domains and tasks. In this paper, we propose two transfer learning methods, Anchored Model Transfer (AMT) and Soft Instance Transfer (SIT). Both are based on multi-task learning, account for model transfer and instance transfer respectively, and can be combined into a common framework. We demonstrate the effectiveness of AMT and SIT for aspect-level sentiment classification, showing competitive performance against baseline models on benchmark datasets. Interestingly, the combination of the two methods, AMT+SIT, achieves state-of-the-art performance on the same task.
{"title":"Anchored Model Transfer and Soft Instance Transfer for Cross-Task Cross-Domain Learning: A Study Through Aspect-Level Sentiment Classification","authors":"Yaowei Zheng, Richong Zhang, Suyuchen Wang, Samuel Mensah, Yongyi Mao","doi":"10.1145/3366423.3380034","DOIUrl":"https://doi.org/10.1145/3366423.3380034","url":null,"abstract":"Supervised learning relies heavily on readily available labelled data to infer an effective classification function. However, proposed methods under the supervised learning paradigm are faced with the scarcity of labelled data within domains, and are not generalized enough to adapt to other tasks. Transfer learning has proved to be a worthy choice to address these issues, by allowing knowledge to be shared across domains and tasks. In this paper, we propose two transfer learning methods Anchored Model Transfer (AMT) and Soft Instance Transfer (SIT), which are both based on multi-task learning, and account for model transfer and instance transfer, and can be combined into a common framework. We demonstrate the effectiveness of AMT and SIT for aspect-level sentiment classification showing the competitive performance against baseline models on benchmark datasets. Interestingly, we show that the integration of both methods AMT+SIT achieves state-of-the-art performance on the same task.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79881668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adithya Kumar, Iyswarya Narayanan, T. Zhu, A. Sivasubramaniam
Small and medium-sized enterprises use the cloud to run online, user-facing, tail-latency-sensitive applications with well-defined, fixed monthly budgets. For these applications, adequate system capacity must be provisioned to extract maximal performance despite uncertainty in load and request sizes. In this paper, we address the problem of capacity provisioning under fixed budget constraints with the goal of minimizing tail latency. To tackle this problem, we propose building systems using a heterogeneous mix of expensive, low-latency resources and cheap resources that provide high throughput per dollar. As load changes through the day, we use more of the faster resources to reduce tail latency during low-load periods and more of the cheaper resources to handle high-load periods. To realize these tail-latency benefits, we introduce novel heterogeneity-aware scheduling and autoscaling algorithms designed to minimize tail latency. Using software prototypes and experiments on the public cloud, we show that our approach can outperform existing capacity provisioning systems, reducing tail latency by as much as 45% under fixed-budget settings.
{"title":"The Fast and The Frugal: Tail Latency Aware Provisioning for Coping with Load Variations","authors":"Adithya Kumar, Iyswarya Narayanan, T. Zhu, A. Sivasubramaniam","doi":"10.1145/3366423.3380117","DOIUrl":"https://doi.org/10.1145/3366423.3380117","url":null,"abstract":"Small and medium sized enterprises use the cloud for running online, user-facing, tail latency sensitive applications with well-defined fixed monthly budgets. For these applications, adequate system capacity must be provisioned to extract maximal performance despite the challenges of uncertainties in load and request-sizes. In this paper, we address the problem of capacity provisioning under fixed budget constraints with the goal of minimizing tail latency. To tackle this problem, we propose building systems using a heterogeneous mix of low latency expensive resources and cheap resources that provide high throughput per dollar. As load changes through the day, we use more faster resources to reduce tail latency during low load periods and more cheaper resources to handle the high load periods. To achieve these tail latency benefits, we introduce novel heterogeneity-aware scheduling and autoscaling algorithms that are designed for minimizing tail latency. Using software prototypes and by running experiments on the public cloud, we show that our approach can outperform existing capacity provisioning systems by reducing the tail latency by as much as 45% under fixed-budget settings.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77997205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuxuan Shi, Gong Cheng, E. Kharlamov
Keyword search is a prominent approach to querying Web data. For graph-structured data, a widely accepted semantics for keyword queries is based on group Steiner trees. For this NP-hard problem, existing algorithms with provable quality guarantees have prohibitive run times on large graphs. In this paper, we propose practical approximation algorithms with a guaranteed quality of computed answers and very low run time. Our algorithms rely on Hub Labeling (HL), a structure that labels each vertex in a graph with a list of reachable vertices and is used to compute distances and shortest paths. We devise two HLs: a conventional static HL that uses a new heuristic to improve pruned landmark labeling, and a novel dynamic HL that inverts and aggregates query-relevant static labels to process vertex sets more efficiently. Our approach computes reasonably good approximate answers to keyword queries in milliseconds on million-scale knowledge graphs.
Keyword Search over Knowledge Graphs via Static and Dynamic Hub Labelings. Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380110
Huda Nassar, Caitlin Kennedy, Shweta Jain, Austin R. Benson, D. Gleich
In the simplest setting, graph visualization is the problem of producing two-dimensional coordinates for each node that meaningfully show the connections and latent structure in a graph. Among other uses, a meaningful layout often helps interpret the results of network-science tasks such as community detection and link prediction. Several existing graph visualization techniques are based on spectral methods, graph embeddings, or optimizing graph distances. Despite the large number of methods, it is still often challenging or extremely time consuming to produce meaningful layouts of graphs with hundreds of thousands of vertices. Existing methods often either fail to produce a visualization in a reasonable amount of time or produce a layout colorfully called a “hairball”, which does not illustrate any internal structure in the graph. Here, we show that adding higher-order information based on cliques to a classic eigenvector-based graph visualization technique enables it to produce meaningful plots of large graphs. We further evaluate these visualizations along a number of graph visualization metrics and find that our method outperforms existing techniques on a metric that uses random walks to measure local structure. Finally, we show many examples in which our algorithm successfully produces layouts of large networks. Code to reproduce our results is available.
{"title":"Using Cliques with Higher-order Spectral Embeddings Improves Graph Visualizations","authors":"Huda Nassar, Caitlin Kennedy, Shweta Jain, Austin R. Benson, D. Gleich","doi":"10.1145/3366423.3380059","DOIUrl":"https://doi.org/10.1145/3366423.3380059","url":null,"abstract":"In the simplest setting, graph visualization is the problem of producing a set of two-dimensional coordinates for each node that meaningfully shows connections and latent structure in a graph. Among other uses, having a meaningful layout is often useful to help interpret the results from network science tasks such as community detection and link prediction. There are several existing graph visualization techniques in the literature that are based on spectral methods, graph embeddings, or optimizing graph distances. Despite the large number of methods, it is still often challenging or extremely time consuming to produce meaningful layouts of graphs with hundreds of thousands of vertices. Existing methods often either fail to produce a visualization in a meaningful time window, or produce a layout colorfully called a “hairball”, which does not illustrate any internal structure in the graph. Here, we show that adding higher-order information based on cliques to a classic eigenvector based graph visualization technique enables it to produce meaningful plots of large graphs. We further evaluate these visualizations along a number of graph visualization metrics and we find that it outperforms existing techniques on a metric that uses random walks to measure the local structure. Finally, we show many examples of how our algorithm successfully produces layouts of large networks. Code to reproduce our results is available.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90276559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hamed Rezaei, Balajee Vamanan
Datacenters host a mix of applications: foreground applications perform distributed lookups to service user queries, while background applications perform batch-processing tasks such as data reorganization, backup, and replication. While background flows produce most of the load, foreground applications produce the largest number of flows. Because packets from both types of applications compete for network bandwidth at switches, application performance is sensitive to the scheduling mechanism. Existing schedulers use flow size to distinguish critical flows from non-critical flows. However, recent studies of datacenter workloads reveal that most flows are small (e.g., most flows consist of only a handful of packets). In light of these findings, we make the key observation that, because most flows are small, flow size alone is not sufficient to distinguish critical flows from non-critical flows, and existing flow schedulers therefore do not achieve the desired prioritization. In this paper, we introduce ResQueue, which uses a combination of flow size and packet history to calculate the priority of each flow. Our evaluation shows that ResQueue improves tail flow completion times of short flows by up to 60% over state-of-the-art flow scheduling mechanisms.
ResQueue: A Smarter Datacenter Flow Scheduler. Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380012
Y. Gil
Future AI systems will be key contributors to science, but this is unlikely to happen unless we reinvent our current publications and embed our scientific records in the Web as structured Web objects. This implies that the scientific papers of the future will be complemented with explicit, structured descriptions of the experiments, software, data, and workflows used to reach new findings. These scientific papers of the future will not only fulfill the promise of open science and reproducible research, but also enable the creation of AI systems that can ingest and organize scientific methods and processes, re-run experiments and re-analyze results, and explore their own hypotheses in systematic and unbiased ways. In this talk, I will describe guidelines for writing scientific papers of the future that embed the scientific record on the Web, and our progress on AI systems capable of using them to systematically explore experiments. I will also outline a research agenda with seven key characteristics for creating AI scientists that will exploit the Web to independently make new discoveries [1]. AI scientists have the potential to transform science and the processes of scientific discovery [2, 3].
Embedding the Scientific Record on the Web: Towards Automating Scientific Discoveries. Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3382667
Chun Lo, Emilie de Longueau, Ankan Saha, S. Chatterjee
Social networks act as major content marketplaces where creators and consumers come together to share and consume various kinds of content. Content ranking applications (e.g., newsfeed, moments, notifications) and edge recommendation products (e.g., connect to members, follow celebrities or groups or hashtags) on such platforms aim at improving the consumer experience. In this work, we focus on the creator experience and specifically on improving edge recommendations to better serve creators in such ecosystems. The audience and reach of creators – individuals, celebrities, publishers and companies – are critically shaped by these edge recommendation products. Hence, incorporating creator utility in such recommendations can have a material impact on their success, and in turn, on the marketplace. In this paper, we (i) propose a general framework to incorporate creator utility in edge recommendations, (ii) devise a specific method to estimate edge-level creator utilities for currently unformed edges, (iii) outline the challenges of measurement and propose a practical experiment design, and finally (iv) discuss the implementation of our proposal at scale on LinkedIn, a professional network with 645M+ members, and report our findings.
{"title":"Edge formation in Social Networks to Nurture Content Creators","authors":"Chun Lo, Emilie de Longueau, Ankan Saha, S. Chatterjee","doi":"10.1145/3366423.3380267","DOIUrl":"https://doi.org/10.1145/3366423.3380267","url":null,"abstract":"Social networks act as major content marketplaces where creators and consumers come together to share and consume various kinds of content. Content ranking applications (e.g., newsfeed, moments, notifications) and edge recommendation products (e.g., connect to members, follow celebrities or groups or hashtags) on such platforms aim at improving the consumer experience. In this work, we focus on the creator experience and specifically on improving edge recommendations to better serve creators in such ecosystems. The audience and reach of creators – individuals, celebrities, publishers and companies – are critically shaped by these edge recommendation products. Hence, incorporating creator utility in such recommendations can have a material impact on their success, and in turn, on the marketplace. In this paper, we (i) propose a general framework to incorporate creator utility in edge recommendations, (ii) devise a specific method to estimate edge-level creator utilities for currently unformed edges, (iii) outline the challenges of measurement and propose a practical experiment design, and finally (iv) discuss the implementation of our proposal at scale on LinkedIn, a professional network with 645M+ members, and report our findings.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82244935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christof Naumzik, Patrick Zoechbauer, S. Feuerriegel
Points-of-interest (POIs; i.e., restaurants, bars, landmarks, and other entities) are common in web-mined data and go a long way toward explaining the spatial distributions of urban phenomena. The conventional modeling approach relies on feature engineering, yet it ignores the spatial structure among POIs. To overcome this shortcoming, the present paper proposes a novel spatial model for explaining spatial distributions based on web-mined POIs. Our key contributions are: (1) We present a rigorous yet highly interpretable formalization for modeling the influence of POIs on a given outcome variable. Specifically, we accommodate the spatial distributions of both the outcome and the POIs; in our case, this is modeled by a sum of latent Gaussian processes. (2) In contrast to previous literature, our model infers the influence of POIs without feature engineering; instead, we model the influence of POIs via distance-weighted kernel functions with fully learnable parameterizations. (3) We propose a scalable learning algorithm based on a sparse variational approximation. For this purpose, we derive a tailored evidence lower bound (ELBO) and, for appropriate likelihoods, show that an analytical expression can be obtained, allowing fast and accurate computation of the ELBO. Finally, the value of our approach for web mining is demonstrated in two real-world case studies. Our findings provide substantial improvements over state-of-the-art baselines with regard to both predictive and, in particular, explanatory performance. Altogether, this yields a novel spatial model for leveraging web-mined POIs. Within the context of location-based social networks, it promises an extensive range of new insights and use cases.
{"title":"Mining Points-of-Interest for Explaining Urban Phenomena: A Scalable Variational Inference Approach","authors":"Christof Naumzik, Patrick Zoechbauer, S. Feuerriegel","doi":"10.1145/3366423.3380298","DOIUrl":"https://doi.org/10.1145/3366423.3380298","url":null,"abstract":"Points-of-interest (POIs; i.e., restaurants, bars, landmarks, and other entities) are common in web-mined data: they greatly explain the spatial distributions of urban phenomena. The conventional modeling approach relies upon feature engineering, yet it ignores the spatial structure among POIs. In order to overcome this shortcoming, the present paper proposes a novel spatial model for explaining spatial distributions based on web-mined POIs. Our key contributions are: (1) We present a rigorous yet highly interpretable formalization in order to model the influence of POIs on a given outcome variable. Specifically, we accommodate the spatial distributions of both the outcome and POIs. In our case, this modeled by the sum of latent Gaussian processes. (2) In contrast to previous literature, our model infers the influence of POIs without feature engineering, instead we model the influence of POIs via distance-weighted kernel functions with fully learnable parameterizations. (3) We propose a scalable learning algorithm based on sparse variational approximation. For this purpose, we derive a tailored evidence lower bound (ELBO) and, for appropriate likelihoods, we even show that an analytical expression can be obtained. This allows fast and accurate computation of the ELBO. Finally, the value of our approach for web mining is demonstrated in two real-world case studies. Our findings provide substantial improvements over state-of-the-art baselines with regard to both predictive and, in particular, explanatory performance. Altogether, this yields a novel spatial model for leveraging web-mined POIs. Within the context of location-based social networks, it promises an extensive range of new insights and use cases.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"505 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75214721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Carmel, Elad Haramaty, Arnon Lazerson, L. Lewin-Eytan
Learning a ranking model in product search involves satisfying many requirements, such as maximizing the relevance of retrieved products with respect to the user query as well as maximizing the purchase likelihood of these products. Multi-Objective Ranking Optimization (MORO) is the task of learning a ranking model from training examples while optimizing multiple objectives simultaneously. Label aggregation is a popular approach to multi-objective optimization: it reduces the problem to a single-objective optimization problem by aggregating the multiple labels of each training example, each related to a different objective, into a single label. In this work we explore several label aggregation methods for MORO in product search. We propose a novel stochastic label aggregation method that randomly selects a label per training example according to a given distribution over the labels. We provide a theoretical proof that stochastic label aggregation is superior to alternative aggregation approaches, in the sense that any optimal solution of the MORO problem can be generated by a proper parameter setting of the stochastic aggregation process. We experiment on three datasets: two from the voice product search domain and one publicly available dataset from the Web product search domain. We demonstrate empirically on these three datasets that MORO with stochastic label aggregation provides a family of ranking models that fully dominates the set of MORO models built using deterministic label aggregation.
{"title":"Multi-Objective Ranking Optimization for Product Search Using Stochastic Label Aggregation","authors":"David Carmel, Elad Haramaty, Arnon Lazerson, L. Lewin-Eytan","doi":"10.1145/3366423.3380122","DOIUrl":"https://doi.org/10.1145/3366423.3380122","url":null,"abstract":"Learning a ranking model in product search involves satisfying many requirements such as maximizing the relevance of retrieved products with respect to the user query, as well as maximizing the purchase likelihood of these products. Multi-Objective Ranking Optimization (MORO) is the task of learning a ranking model from training examples while optimizing multiple objectives simultaneously. Label aggregation is a popular solution approach for multi-objective optimization, which reduces the problem into a single objective optimization problem, by aggregating the multiple labels of the training examples, related to the different objectives, to a single label. In this work we explore several label aggregation methods for MORO in product search. We propose a novel stochastic label aggregation method which randomly selects a label per training example according to a given distribution over the labels. We provide a theoretical proof showing that stochastic label aggregation is superior to alternative aggregation approaches, in the sense that any optimal solution of the MORO problem can be generated by a proper parameter setting of the stochastic aggregation process. We experiment on three different datasets: two from the voice product search domain, and one publicly available dataset from the Web product search domain. We demonstrate empirically over these three datasets that MORO with stochastic label aggregation provides a family of ranking models that fully dominates the set of MORO models built using deterministic label aggregation.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73630719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Jenkins, Jennifer Zhao, Heath Vinicombe, Anant Subramanian, Arun Prasad, Atillia Dobi, E. Li, Yunsong Guo
Understanding content at scale is a difficult but important problem for many platforms. Many previous studies focus on content understanding to optimize engagement with existing users, but little work studies how to leverage better content understanding to attract new users. In this work, we build a framework for generating natural language content annotations and show how they can be used for search engine optimization. The proposed framework relies on an XGBoost model that labels “pins” with high-probability phrases and a logistic regression layer that learns to rank aggregated annotations for groups of content. The pipeline identifies keywords that are descriptive and contextually meaningful. We perform a large-scale production experiment deployed on the Pinterest platform and show that natural language annotations cause a statistically significant 1-2% increase in traffic from leading search engines. Finally, we explore and interpret the characteristics of our annotations framework.
{"title":"Natural Language Annotations for Search Engine Optimization","authors":"P. Jenkins, Jennifer Zhao, Heath Vinicombe, Anant Subramanian, Arun Prasad, Atillia Dobi, E. Li, Yunsong Guo","doi":"10.1145/3366423.3380049","DOIUrl":"https://doi.org/10.1145/3366423.3380049","url":null,"abstract":"Understanding content at scale is a difficult but important problem for many platforms. Many previous studies focus on content understanding to optimize engagement with existing users. However, little work studies how to leverage better content understanding to attract new users. In this work, we build a framework for generating natural language content annotations and show how they can be used for search engine optimization. The proposed framework relies on an XGBoost model that labels “pins” with high probability phrases, and a logistic regression layer that learns to rank aggregated annotations for groups of content. The pipeline identifies keywords that are descriptive and contextually meaningful. We perform a large-scale production experiment deployed on the Pinterest platform and show that natural language annotations cause a 1-2% increase in traffic from leading search engines. This increase is statistically significant. Finally, we explore and interpret the characteristics of our annotations framework.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76820148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}