Proceedings of the 25th International Conference on World Wide Web: Latest Publications

Mining User Intentions from Medical Queries: A Neural Network Based Heterogeneous Jointly Modeling Approach
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2874810
Chenwei Zhang, Wei Fan, Nan Du, Philip S. Yu
Text queries naturally encode user intentions. An intention detection task tries to model and discover the intentions that users encode in their text queries. Unlike conventional text classification tasks, where the label of a text is highly correlated with a few topic-specific words, medical queries tend to contain co-occurring words from different topic categories. Beyond the presence of topic-specific words and word order, word correlations and the way words are organized into sentences are crucial to intention detection. In this paper, we present a neural-network-based joint modeling approach to capture user intentions in medical text queries. Regardless of the exact words in a query, the proposed method jointly models user intentions from two types of heterogeneous information: 1) pairwise word feature correlations and 2) the part-of-speech tags of a sentence. Variable-length text queries are first handled by mapping each one to a fixed-size pairwise feature correlation matrix. Convolution and pooling operations are then applied to the feature correlations to fully exploit the latent semantic structure within the query. Finally, sentence rephrasing is introduced as a data augmentation technique to improve the model's generalization during training. Experimental results on real-world medical queries show that the proposed method can extract complete and precise user intentions from text queries.
Citations: 47
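A minimal sketch of how a variable-length query can yield a fixed-size input, assuming each word already has a d-dimensional feature vector. The abstract does not state how the pairwise correlations are normalized, so the hypothetical helper `feature_correlation_matrix` simply uses the mean outer product of the word vectors; it illustrates the idea and is not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def feature_correlation_matrix(word_vectors: List[List[float]]) -> Matrix:
    """Map a query of n word vectors (each of dimension d) to a d x d matrix.

    Entry [a][b] is the average over words of x[a] * x[b] (mean outer
    product). This is one simple choice of "pairwise feature correlation"
    (an assumption); the output size depends only on d, never on n.
    """
    if not word_vectors:
        raise ValueError("query must contain at least one word")
    n, d = len(word_vectors), len(word_vectors[0])
    m = [[0.0] * d for _ in range(d)]
    for x in word_vectors:
        if len(x) != d:
            raise ValueError("all word vectors must have the same dimension")
        for a in range(d):
            for b in range(d):
                m[a][b] += x[a] * x[b] / n
    return m


# Two queries of different lengths with hypothetical 3-dimensional word features.
short_query = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
long_query = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [0.5, 0.5, 0.0], [1.0, 1.0, 1.0]]
print(len(feature_correlation_matrix(short_query)))  # 3
print(len(feature_correlation_matrix(long_query)))   # 3
```

Whatever the query length n, the result is d x d, which lets a convolutional network with a fixed input size consume it; the convolution, pooling, and part-of-speech components are not shown.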
Exploiting Green Energy to Reduce the Operational Costs of Multi-Center Web Search Engines
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2883021
Roi Blanco, Matteo Catena, N. Tonellotto
Carbon dioxide emissions from the combustion of fossil fuels (brown energy) are the main cause of global warming through the greenhouse effect. Large IT companies have recently increased their efforts to reduce the carbon dioxide footprint of their data centers' electricity consumption. On one hand, better infrastructure and modern hardware allow more efficient use of electrical resources. On the other hand, data centers can be powered by renewable sources (green energy) that are both environmentally friendly and economically convenient. In this paper, we tackle the problem of targeting the use of green energy to minimize the expenditure of running multi-center Web search engines, i.e., systems composed of multiple, geographically remote computing facilities. We propose a mathematical model that minimizes the operational costs of multi-center Web search engines by exploiting renewable energy whenever it is available at the different locations. Using this model, we design an algorithm that decides what fraction of the query load arriving at one processing facility must be forwarded to other sites so that it can be processed with green energy. We experiment with real traffic from a large search engine and compare our model against state-of-the-art query-forwarding baselines. Our experimental results show that the proposed solution maintains a high query throughput while reducing the energy operational costs of multi-center search engines by up to ~25%. Additionally, our algorithm can reduce brown energy consumption by almost 6% when energy-proportional servers are employed.
Citations: 12
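To make the forwarding decision concrete, here is a greedy sketch that fills each site's unused green-energy budget with load from other sites. It is a hypothetical heuristic, not the paper's mathematical model: `forward_to_green` and its inputs are assumptions, and it ignores electricity prices, network latency, and processing capacity, all of which a cost-minimizing model would have to account for.

```python
from typing import Dict, Tuple


def forward_to_green(
    load: Dict[str, float], green: Dict[str, float]
) -> Tuple[Dict[Tuple[str, str], float], Dict[str, float]]:
    """Greedy forwarding sketch (hypothetical, not the paper's algorithm).

    load[s]:  queries arriving at site s in the current time slot.
    green[s]: queries site s can process on green energy in that slot.
    Returns (plan, brown): plan[(src, dst)] is the number of queries forwarded
    from src to dst; brown[s] is the load s still processes on brown energy.
    Assumes unlimited processing capacity and free, instantaneous forwarding.
    """
    served = {s: min(load[s], green[s]) for s in load}   # local green first
    leftover = {s: load[s] - served[s] for s in load}    # still unserved
    spare = {s: green[s] - served[s] for s in load}      # unused green budget
    plan: Dict[Tuple[str, str], float] = {}
    for src in load:
        # Try the sites with the most unused green energy first.
        for dst in sorted(load, key=lambda s: spare[s], reverse=True):
            if dst == src or leftover[src] <= 0:
                continue
            moved = min(leftover[src], spare[dst])
            if moved > 0:
                plan[(src, dst)] = moved
                leftover[src] -= moved
                spare[dst] -= moved
    return plan, leftover


plan, brown = forward_to_green({"A": 100, "B": 20}, {"A": 40, "B": 70})
print(plan)   # {('A', 'B'): 50}
print(brown)  # {'A': 10, 'B': 0}
```

In this toy example, site A forwards 50 of its 100 queries (a fraction of 0.5) to site B, and only 10 queries remain on brown energy.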
Economic Recommendation with Surplus Maximization
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2882973
Yongfeng Zhang, Qi Zhao, Yi Zhang, D. Friedman, Min Zhang, Yiqun Liu, Shaoping Ma
A prime function of many major World Wide Web applications is Online Service Allocation (OSA): matching individual consumers with particular services or goods (which may include loans or jobs as well as products), each with its own producer. In the applications of interest, consumers are free to choose, so in practice OSA usually takes the form of personalized recommendation or search. The performance metrics of recommender and search systems currently tend to focus on just one side of the match: in some cases the consumers (e.g., satisfaction) and in others the producers (e.g., profit). However, a sustainable OSA platform needs to benefit both consumers and producers; otherwise, the neglected party may eventually stop using it. In this paper, we show how to adapt economists' traditional idea of maximizing total surplus (the sum of consumer net benefit and producer profit) to the heterogeneous world of online service allocation, in an effort to promote web intelligence for social good in online ecosystems. Modifications of traditional personalized recommendation algorithms enable us to apply Total Surplus Maximization (TSM) to three very different types of real-world tasks: e-commerce, P2P lending, and freelancing. The results for all three tasks suggest that TSM compares very favorably to currently popular approaches, to the benefit of both producers and consumers.
Citations: 36
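The total-surplus objective can be illustrated with a toy ranking. Per the abstract, total surplus is consumer net benefit plus producer profit; for one transaction at price p that is (wtp - p) + (p - cost) = wtp - cost, so the price only redistributes surplus between the two sides. This sketch assumes per-item estimates of willingness to pay (`wtp`), unit cost, and purchase probability are already available (the abstract does not say how TSM obtains them); the `Offer` type and its values are hypothetical. It contrasts ranking by expected total surplus with ranking by expected profit alone.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Offer:
    item: str
    price: float   # posted price p
    cost: float    # producer's unit cost (hypothetical input)
    wtp: float     # consumer's estimated willingness to pay (hypothetical input)
    p_buy: float   # estimated purchase probability, e.g. from a recommender


def expected_total_surplus(o: Offer) -> float:
    consumer = o.wtp - o.price   # consumer net benefit
    producer = o.price - o.cost  # producer profit
    return o.p_buy * (consumer + producer)  # price cancels: p_buy * (wtp - cost)


def expected_profit(o: Offer) -> float:
    return o.p_buy * (o.price - o.cost)  # producer-only objective, for contrast


def top_k(offers: List[Offer], key: Callable[[Offer], float], k: int) -> List[str]:
    return [o.item for o in sorted(offers, key=key, reverse=True)[:k]]


offers = [
    Offer("A", price=10, cost=8, wtp=15, p_buy=0.5),
    Offer("B", price=20, cost=5, wtp=22, p_buy=0.2),
    Offer("C", price=5, cost=4, wtp=6, p_buy=0.9),
]
print(top_k(offers, expected_total_surplus, 3))  # ['A', 'B', 'C']
print(top_k(offers, expected_profit, 3))         # ['B', 'A', 'C']
```

Item B is the most profitable for its producer, but item A creates more value in total once the consumer's side is counted.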
A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2883061
M. Cornolti, P. Ferragina, Massimiliano Ciaramita, Stefan Rüd, Hinrich Schütze
In this paper, we study the problem of linking open-domain web-search queries to entities drawn from the full entity inventory of Wikipedia articles. We introduce SMAPH-2, a second-order approach that piggybacks on a web search engine to alleviate the noise and irregularities that characterize the language of queries, placing queries in a larger context in which they are easier to make sense of. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link those entities back to their mentions in the input query. This confines the concepts considered pertinent to the query to those actually mentioned in it. The link-back is implemented via a collective disambiguation step based on a supervised ranking model that makes one joint prediction for the annotation of the complete query, directly optimizing the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features, such as an approximate distance between mentions and entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves state-of-the-art performance on the ERD@SIGIR2014 benchmark. We also publish GERDAQ (General Entity Recognition, Disambiguation and Annotation in Queries), a novel public dataset built via crowdsourcing specifically for web-query entity linking. On GERDAQ, SMAPH-2 also outperforms the benchmarks by comparable margins.
Citations: 33
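The abstract says SMAPH-2 makes one joint prediction for the whole query while optimizing F1. The sketch below shows the F1 measure over a query's set of (mention, entity) annotations, plus an exhaustive search that scores whole candidate subsets rather than individual links. The brute-force search, the oracle scorer, and the helper names are illustrative assumptions standing in for SMAPH-2's learned ranking model; the entity titles are toy examples.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Tuple

Annotation = Tuple[str, str]  # (mention text, Wikipedia entity title)


def annotation_f1(predicted: Iterable[Annotation], gold: Iterable[Annotation]) -> float:
    """F1 between the predicted and gold annotation sets of one query."""
    p, g = set(predicted), set(gold)
    if not p and not g:
        return 1.0  # convention (assumption): empty vs. empty counts as perfect
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def best_joint_annotation(
    candidates: Iterable[Annotation],
    score: Callable[[FrozenSet[Annotation]], float],
) -> FrozenSet[Annotation]:
    """Return the candidate subset with the highest score.

    Brute force over all 2^k subsets, so only viable for a handful of
    candidates; `score` is a placeholder for a learned ranking model.
    """
    cands = list(candidates)
    best, best_score = frozenset(), score(frozenset())
    for r in range(1, len(cands) + 1):
        for subset in combinations(cands, r):
            s = score(frozenset(subset))
            if s > best_score:
                best, best_score = frozenset(subset), s
    return best


candidates = [
    ("armstrong", "Neil_Armstrong"),
    ("armstrong", "Louis_Armstrong"),
    ("moon landing", "Moon_landing"),
]
gold = {("armstrong", "Neil_Armstrong"), ("moon landing", "Moon_landing")}
# With an oracle scorer (F1 against gold) the search recovers the gold annotation.
print(sorted(best_joint_annotation(candidates, lambda s: annotation_f1(s, gold))))
```

At training time, the F1 of each candidate subset against the gold annotation is a natural regression or ranking target; at test time, a model's predicted score would replace the oracle.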
Exploiting Dining Preference for Restaurant Recommendation
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2882995
Fuzheng Zhang, Nicholas Jing Yuan, Kai Zheng, Defu Lian, Xing Xie, Y. Rui
The wide adoption of location-based services makes it possible to understand people's mobility patterns at an unprecedented level, which can also enable the food-service industry to accurately predict consumers' dining behavior. In this paper, based on users' implicit dining feedback (restaurant visits via check-ins), explicit feedback (restaurant reviews), and metadata (e.g., location, user demographics, restaurant attributes), we aim to recommend to each user a list of restaurants for their next meal. Implicit and explicit feedback on dining behavior exhibit different characteristics of user preference. Therefore, in our work, a user's dining preference consists of two main parts: implicit preference derived from check-in data (implicit feedback) and explicit preference derived from rating and review data (explicit feedback). For implicit preference, we first apply a probabilistic tensor factorization (PTF) model to capture preference in a latent subspace. Then, to incorporate contextual signals from the metadata, we extend PTF into an Implicit Preference Model (IPM), which simultaneously captures user, restaurant, and time preferences in collaborative filtering and dining preferences in a specific context (e.g., spatial distance preference, environmental preference). For explicit preference, we propose an Explicit Preference Model (EPM) that combines matrix factorization with topic modeling to discover user preferences embedded in both rating scores and review text. Finally, we design a unified model, the Collective Implicit Explicit Preference Model (CIEPM), that combines implicit and explicit preference for restaurant recommendation. To evaluate the performance of our system, we conduct extensive experiments on large-scale datasets covering hundreds of thousands of users and restaurants. The results show that our system is effective for restaurant recommendation.
Citations: 53
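For the implicit-preference part, check-ins form a three-way (user, restaurant, time) tensor. A minimal sketch, assuming the common CP (PARAFAC) form in which each cell is approximated by a sum over K latent dimensions of products of user, restaurant, and time factors. A probabilistic model such as PTF would additionally place priors on these factors and learn them from data, and IPM's contextual terms are omitted; the factor values and names below are invented for illustration.

```python
from typing import Dict, List

Vector = List[float]


def cp_score(u: Vector, r: Vector, t: Vector) -> float:
    """Rank-K CP reconstruction of one tensor cell: sum_k u[k] * r[k] * t[k]."""
    return sum(a * b * c for a, b, c in zip(u, r, t))


def recommend(
    user: str, slot: str,
    U: Dict[str, Vector], R: Dict[str, Vector], T: Dict[str, Vector], k: int,
) -> List[str]:
    """Top-k restaurants for a user in a given time slot under the CP model."""
    scores = {rest: cp_score(U[user], vec, T[slot]) for rest, vec in R.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical factors with K = 2 latent dimensions; in practice these would
# be learned from check-in data.
U = {"alice": [1.0, 1.0]}
R = {"noodle_bar": [0.9, 0.1], "steakhouse": [0.2, 0.8], "cafe": [0.5, 0.5]}
T = {"lunch": [1.0, 0.2], "dinner": [0.3, 1.0]}
print(recommend("alice", "lunch", U, R, T, k=3))   # ['noodle_bar', 'cafe', 'steakhouse']
print(recommend("alice", "dinner", U, R, T, k=3))  # ['steakhouse', 'cafe', 'noodle_bar']
```

The time factor is what lets the same user receive different rankings at lunch and at dinner.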
On the Relevance of Irrelevant Alternatives
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2883025
Austin R. Benson, Ravi Kumar, A. Tomkins
Multinomial logistic regression is a powerful tool for modeling choice from a finite set of alternatives, but it comes with an underlying model assumption called the independence of irrelevant alternatives (IIA), which states that adding any item to the choice set decreases all other items' likelihoods by an equal fraction. We perform statistical tests of this assumption across a variety of datasets and report how often it is violated. When this axiom is violated, choice theorists often invoke a richer model known as nested logistic regression, in which information about competition among items is encoded in a tree structure known as a nest. However, to our knowledge, there are no known algorithms to induce the correct nest structure. We present the first such algorithm, which runs in quadratic time under an oracle model, and pair it with a matching lower bound. We then perform experiments on synthetic and real datasets to validate the algorithm and show that nested logit over learned nests outperforms traditional multinomial regression. Finally, in addition to learning nests automatically, we show how nests may be constructed by hand to test hypotheses about the data and evaluated by their explanatory power.
Citations: 55
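The independence of irrelevant alternatives (IIA) can be checked numerically with the textbook red-bus/blue-bus example (not taken from the paper). Under multinomial logit, adding an item scales every other item's probability by the same factor, so the odds between any two existing items stay fixed. The sketch also implements the standard two-level nested logit, where a close substitute placed in the same nest takes share disproportionately from its nest-mate. It does not implement the authors' nest-learning algorithm; the utilities and the dissimilarity parameter 0.3 are made up.

```python
import math
from typing import Dict, List


def mnl_probs(v: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit: P(i) = exp(v_i) / sum_j exp(v_j)."""
    m = max(v.values())  # shift by the max for numerical stability
    w = {i: math.exp(x - m) for i, x in v.items()}
    z = sum(w.values())
    return {i: x / z for i, x in w.items()}


def nested_logit_probs(
    v: Dict[str, float], nests: Dict[str, List[str]], lam: Dict[str, float]
) -> Dict[str, float]:
    """Two-level nested logit with dissimilarity parameters lam[n] in (0, 1].

    P(i) = P(nest n) * P(i | n), where
      P(i | n)  = exp(v_i / lam_n - I_n),  I_n = log sum_{j in n} exp(v_j / lam_n)
      P(nest n) = exp(lam_n * I_n) / sum_m exp(lam_m * I_m)
    With every lam_n = 1 this reduces to multinomial logit.
    """
    incl = {n: math.log(sum(math.exp(v[i] / lam[n]) for i in items))
            for n, items in nests.items()}
    nest_w = {n: math.exp(lam[n] * incl[n]) for n in nests}
    z = sum(nest_w.values())
    return {i: (nest_w[n] / z) * math.exp(v[i] / lam[n] - incl[n])
            for n, items in nests.items() for i in items}


v = {"car": 1.0, "blue_bus": 0.5}
v_plus = {**v, "red_bus": 0.5}

# MNL: adding red_bus leaves the car/blue_bus odds unchanged (IIA).
before, after = mnl_probs(v), mnl_probs(v_plus)
print(before["car"] / before["blue_bus"], after["car"] / after["blue_bus"])  # both about 1.649

# Nested logit with both buses in one nest: blue_bus loses share faster than car.
lam = {"car": 1.0, "bus": 0.3}
nb = nested_logit_probs(v, {"car": ["car"], "bus": ["blue_bus"]}, lam)
na = nested_logit_probs(v_plus, {"car": ["car"], "bus": ["blue_bus", "red_bus"]}, lam)
print(nb["car"] / nb["blue_bus"], na["car"] / na["blue_bus"])  # about 1.649, then about 2.678
```

Which items share a nest is exactly the structure the paper's algorithm tries to infer from data.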
An Empirical Analysis of Algorithmic Pricing on Amazon Marketplace
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2883089
Le Chen, A. Mislove, Christo Wilson
The rise of e-commerce has unlocked practical applications for algorithmic pricing (also called dynamic pricing), in which sellers set prices using computer algorithms. Travel websites and large, well-known e-retailers have already adopted algorithmic pricing strategies, but the tools and techniques are now available to small-scale sellers as well. While algorithmic pricing can make merchants more competitive, it also creates new challenges. There have been cases where competing pieces of algorithmic pricing software interacted in unexpected ways and produced unpredictable prices, as well as cases where algorithms were intentionally designed to implement price fixing. Unfortunately, the public currently lacks comprehensive knowledge about the prevalence and behavior of pricing algorithms in the wild. In this study, we develop a methodology for detecting algorithmic pricing and use it to empirically analyze the prevalence and behavior of such algorithms on Amazon Marketplace. We collect four months of data covering all merchants selling any of 1,641 best-selling products. Using this dataset, we uncover the algorithmic pricing strategies adopted by over 500 sellers. We explore the characteristics of these sellers and characterize the impact of these strategies on the dynamics of the marketplace.
Citations: 171
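The abstract does not describe the detection methodology in detail. As a hypothetical illustration only, one simple signal is a seller whose price changes frequently and moves in lockstep with a target series such as the lowest competing price. The sketch flags such sellers with a Pearson-correlation test; the function names, the choice of target, and both thresholds are assumptions, not the study's method.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no tracking signal
    return cov / (sx * sy)


def looks_algorithmic(
    seller: Sequence[float], target: Sequence[float],
    min_changes: int = 3, threshold: float = 0.8,
) -> bool:
    """Flag a seller whose price changes often and tracks the target series.

    `target` could be, e.g., the lowest competing price at each crawl; both
    thresholds are illustrative values, not taken from the study.
    """
    changes = sum(1 for a, b in zip(seller, seller[1:]) if a != b)
    return changes >= min_changes and pearson(seller, target) >= threshold


lowest = [20.00, 19.50, 19.00, 19.80, 18.90]   # hypothetical crawl of lowest prices
undercutter = [p - 0.01 for p in lowest]        # always one cent below
static = [21.00] * 5                            # never reprices
print(looks_algorithmic(undercutter, lowest))   # True
print(looks_algorithmic(static, lowest))        # False
```

A real detector would also need to handle sellers that track a different target (for example, a fixed offset from another seller) and noise from irregular crawl times.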
Beyond the Baseline: Establishing the Value in Mobile Phone Based Poverty Estimates
Pub Date: 2016-04-11 DOI: 10.1145/2872427.2883076
Chris Smith-Clarke, L. Capra
Within the remit of 'Data for Development', there have been a number of promising recent works that investigate the use of mobile phone Call Detail Records (CDRs) to estimate the spatial distribution of poverty or socio-economic status. The methods being developed could offer immense value to organisations and agencies that currently struggle to identify the poorest parts of a country because reliable, up-to-date survey data are lacking in certain parts of the world. However, the results of this research have so far been presented only in isolation, rather than in comparison with any alternative approach or benchmark. Consequently, the true practical value of these methods remains unknown. Here, we seek to address this shortcoming by proposing two baseline poverty estimators grounded in concrete usage scenarios: one that exploits correlation with population density only, to be used when no poverty data exist at all; and one that also exploits spatial autocorrelation, to be used when poverty data have been collected for a few regions within a country. We then compare the predictive performance of these baseline models with models that also include features derived from CDRs, so as to establish the real added value of those features. We present an extensive analysis of the performance of all these models on data acquired for two developing countries: Senegal and Ivory Coast. Our results reveal that CDR-based models do provide more accurate estimates in most cases; however, the improvement is modest, and it is more significant when estimating (extreme) poverty intensity rates than mean wealth.
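The second baseline in the abstract, which exploits spatial autocorrelation when only a few regions have been surveyed, can be sketched as a neighbour-average (spatial lag) estimator. The abstract does not give the authors' exact formulation; the function name, the adjacency lists, the fallback rule, and the values below are assumptions for illustration.

```python
from typing import Dict, List


def spatial_lag_baseline(
    neighbors: Dict[str, List[str]], observed: Dict[str, float]
) -> Dict[str, float]:
    """Estimate poverty in unsurveyed regions from surveyed neighbours.

    Each unsurveyed region gets the mean of its surveyed neighbours' values;
    a region with no surveyed neighbour falls back to the mean over all
    surveyed regions (an assumed fallback rule).
    """
    if not observed:
        raise ValueError("need poverty data for at least one region")
    fallback = sum(observed.values()) / len(observed)
    estimates: Dict[str, float] = {}
    for region, adj in neighbors.items():
        if region in observed:
            continue  # surveyed regions keep their measured values
        vals = [observed[n] for n in adj if n in observed]
        estimates[region] = sum(vals) / len(vals) if vals else fallback
    return estimates


# Hypothetical adjacency and survey values (poverty rates in [0, 1]).
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"], "E": []}
observed = {"B": 0.4, "C": 0.6}
# A -> 0.5 (mean of B and C), D -> 0.6 (only C), E -> 0.5 (fallback mean)
print(spatial_lag_baseline(neighbors, observed))
```

A CDR-based model only adds value to the extent that it beats an estimator like this on the unsurveyed regions.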
Citations: 14
Unsupervised, Efficient and Semantic Expertise Retrieval
Pub Date: 2016-04-11 · DOI: 10.1145/2872427.2882974
Christophe Van Gysel, M. de Rijke, M. Worring
We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We employ textual evidence exclusively and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model with state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance of state-of-the-art document-centric methods while retaining the low inference cost of so-called profile-centric approaches. In most cases it yields a statistically significant improvement in ranking over vector space and generative models, and it matches the performance of supervised methods on various benchmarks. That is, using text alone, we can do as well as methods that draw on external evidence and/or relevance feedback. A contrastive analysis of the rankings produced by discriminative and generative approaches shows that the two have complementary strengths, owing to the ability of the unsupervised discriminative model to perform semantic matching.
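To make the idea of a log-linear model over experts concrete, here is a toy scoring sketch. It assumes word vectors, expert vectors, and per-expert biases have already been learned, models P(expert | word) as a softmax over all candidate experts, and ranks experts by the sum of log-probabilities over the query words. The training procedure and the paper's exact ranking function are not reproduced here; `rank_experts` and its parameters are hypothetical.

```python
import numpy as np

# Toy, hypothetical scorer: assumes embeddings were already learned elsewhere.


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def rank_experts(query_word_ids, word_emb, expert_emb, expert_bias):
    """Rank experts by the sum over query words of log P(expert | word).

    word_emb:    (V, d) word vectors
    expert_emb:  (E, d) expert vectors
    expert_bias: (E,) per-expert bias
    Returns (expert indices ordered best first, per-expert scores).
    """
    scores = np.zeros(expert_emb.shape[0])
    for w in query_word_ids:
        logits = expert_emb @ word_emb[w] + expert_bias  # one logit per expert
        scores += log_softmax(logits)  # normalise over all candidate experts
    return np.argsort(-scores), scores
```

Because scoring needs only one matrix-vector product per query word, inference cost grows with the number of experts rather than the number of documents, which is the intuition behind the low-cost, profile-centric style of inference the abstract mentions.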
Citations: 67
Just in Time: Controlling Temporal Performance in Crowdsourcing Competitions
Pub Date: 2016-04-11 · DOI: 10.1145/2872427.2883075
Markus Rokicki, Sergej Zerr, Stefan Siersdorfer
Many modern data analytics applications in areas such as crisis management, stock trading, and healthcare rely on components capable of near-real-time processing of streaming data produced at varying rates. In addition to automatic processing methods, many tasks in these applications require further human assessment and analysis. However, current crowdsourcing platforms and systems do not support stream processing under variable loads. In this paper, we investigate how incentive mechanisms in competition-based crowdsourcing can be employed in such scenarios. More specifically, we explore techniques for stimulating workers to adapt dynamically to both anticipated and sudden changes in data volume and processing demand, and we analyze their effects on data processing throughput and peak-to-average ratios, as well as saturation effects. To this end, we study a wide range of incentive schemes and utility functions inspired by real-world applications. Our large-scale experimental evaluation, involving more than 900 participants and more than 6,200 hours of work by crowd workers, demonstrates that our competition-based mechanisms can adjust the throughput of online workers and lead to substantial on-demand performance boosts.
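Throughput and the peak-to-average ratio mentioned above can be computed from task-completion timestamps roughly as follows. This is a sketch under stated assumptions: timestamps are seconds since the start of an experiment and are grouped into fixed-width bins; the function names are hypothetical, and the paper's exact measurement setup is not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical helpers (not from the paper): timestamps are seconds since the
# start of an experiment, grouped into fixed-width bins.


def throughput(timestamps, bin_seconds, horizon_seconds):
    """Number of completed tasks in each bin of width bin_seconds over [0, horizon)."""
    n_bins = math.ceil(horizon_seconds / bin_seconds)
    counts = Counter(
        int(t // bin_seconds) for t in timestamps if 0 <= t < horizon_seconds
    )
    return [counts.get(i, 0) for i in range(n_bins)]


def peak_to_average(series):
    """Peak-to-average ratio of a throughput series; NaN if nothing was completed."""
    mean = sum(series) / len(series)
    return max(series) / mean if mean > 0 else float("nan")
```

For example, seven completions at 5, 10, 65, 70, 75, 80 and 130 s with 60 s bins over a 180 s horizon give the series [2, 4, 1] and a peak-to-average ratio of 12/7 (about 1.71); a ratio closer to 1 means work is spread more evenly across the bins.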
Citations: 8
Proceedings of the 25th International Conference on World Wide Web