
Proceedings of the 22nd ACM international conference on Information & Knowledge Management: Latest Publications

Spatial search for K diverse-near neighbors
Gregory Ference, Wang-Chien Lee, Hui-Ju Hung, De-Nian Yang
For many location-based service applications that prefer diverse results, finding locations that are spatially diverse and close in proximity to a query point (e.g., the current location of a user) can be more useful than finding the k nearest neighbors/locations. In this paper, we investigate the problem of searching for the k Diverse-Near Neighbors (kDNNs) in spatial space, based upon the spatial diversity and proximity of candidate locations to the query point. While employing a conventional distance measure for proximity, we develop a new and intuitive diversity metric based upon the variance of the angles among the candidate locations with respect to the query point. Accordingly, we create a dynamic programming algorithm that finds the optimal kDNNs. Unfortunately, the dynamic programming algorithm, with a time complexity of O(kn^3), incurs excessive computational cost. Therefore, we further propose two heuristic algorithms, namely Distance-based Browsing (DistBrow) and Diversity-based Browsing (DivBrow), which explore the search space prioritized by proximity to the query point and by spatial diversity, respectively, and provide high effectiveness while remaining efficient. Using real and synthetic datasets, we conduct a comprehensive performance evaluation. The results show that DistBrow and DivBrow have superior effectiveness compared to state-of-the-art algorithms while maintaining high efficiency.
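To make the angle-based diversity idea concrete, here is a minimal sketch assuming a 2-D Euclidean setting. The gap-variance reading of the metric, the weighting parameter alpha, and the brute-force subset search (standing in for the paper's dynamic program and the DistBrow/DivBrow heuristics) are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations

def angle_gap_variance(query, points):
    # Variance of the angular gaps of the candidate points around the query;
    # a low variance means the points are spread evenly (i.e. diverse).
    qx, qy = query
    angles = sorted(math.atan2(y - qy, x - qx) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    mean = sum(gaps) / len(gaps)
    return sum((g - mean) ** 2 for g in gaps) / len(gaps)

def brute_force_kdnn(query, candidates, k, alpha=0.5):
    # Exhaustively score every k-subset by a weighted sum of average distance
    # to the query and angular-gap variance.  Exponential in the input, so it
    # only illustrates the objective; the paper's DP and heuristics avoid this.
    def score(subset):
        avg_dist = sum(math.dist(query, p) for p in subset) / k
        return alpha * avg_dist + (1 - alpha) * angle_gap_variance(query, subset)
    return min(combinations(candidates, k), key=score)

# Three diverse-near neighbours of the origin: the near-duplicate (1.1, 0.1)
# and the far point (2, 2) tend to lose out.
candidates = [(1, 0), (0, 1), (-1, 0), (0, -1), (1.1, 0.1), (2, 2)]
print(brute_force_kdnn((0, 0), candidates, k=3))
```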
{"title":"Spatial search for K diverse-near neighbors","authors":"Gregory Ference, Wang-Chien Lee, Hui-Ju Hung, De-Nian Yang","doi":"10.1145/2505515.2505747","DOIUrl":"https://doi.org/10.1145/2505515.2505747","url":null,"abstract":"To many location-based service applications that prefer diverse results, finding locations that are spatially diverse and close in proximity to a query point (e.g., the current location of a user) can be more useful than finding the k nearest neighbors/locations. In this paper, we investigate the problem of searching for the k Diverse-Near Neighbors (kDNNs)} in spatial space that is based upon the spatial diversity and proximity of candidate locations to the query point. While employing a conventional distance measure for proximity, we develop a new and intuitive diversity metric based upon the variance of the angles among the candidate locations with respect to the query point. Accordingly, we create a dynamic programming algorithm that finds the optimal kDNNs. Unfortunately, the dynamic programming algorithm, with a time complexity of O(kn3), incurs excessive computational cost. Therefore, we further propose two heuristic algorithms, namely, Distance-based Browsing (DistBrow) and Diversity-based Browsing (DivBrow) that provide high effectiveness while being efficient by exploring the search space prioritized upon the proximity to the query point and spatial diversity, respectively. Using real and synthetic datasets, we conduct a comprehensive performance evaluation. The results show that DistBrow and DivBrow have superior effectiveness compared to state-of-the-art algorithms while maintaining high efficiency.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73891715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Entropy-based histograms for selectivity estimation
Hien To, Kuorong Chiang, C. Shahabi
Histograms have been extensively used for selectivity estimation by academics and have successfully been adopted by the database industry. However, the estimation error is usually large for skewed distributions and biased attributes, which are typical of real-world data. Therefore, we propose effective models to quantitatively measure bias and selectivity based on information entropy. These models, together with the principle of maximum entropy, are then used to develop a class of entropy-based histograms. Moreover, since entropy can be computed incrementally, we present incremental variants of our algorithms that reduce the complexity of histogram construction from quadratic to linear. We conducted an extensive set of experiments with both synthetic and real-world datasets to compare the accuracy and efficiency of our proposed techniques with many other histogram-based techniques, showing the superiority of the entropy-based approaches for both equality and range queries.
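The incremental-entropy idea can be illustrated with a small sketch: writing H = log N - (sum_i c_i log c_i)/N lets a single observation update touch only one term. The class below is a generic illustration of that identity, not the paper's histogram-construction algorithm.

```python
import math

class IncrementalEntropy:
    # Maintain the Shannon entropy (in nats) of a frequency distribution under
    # single-observation updates, using H = log N - (sum_i c_i log c_i) / N.
    # A sweep over n observations then costs O(n) overall instead of
    # recomputing the entropy from scratch after every update.
    def __init__(self):
        self.n = 0          # total number of observations
        self.s = 0.0        # running sum of c_i * log(c_i)
        self.counts = {}

    def add(self, value):
        c = self.counts.get(value, 0)
        if c > 0:
            self.s -= c * math.log(c)
        c += 1
        self.counts[value] = c
        self.s += c * math.log(c)
        self.n += 1

    def entropy(self):
        if self.n == 0:
            return 0.0
        return math.log(self.n) - self.s / self.n

h = IncrementalEntropy()
for v in [1, 1, 2, 3, 3, 3]:
    h.add(v)
print(round(h.entropy(), 4))   # entropy of counts {1: 2, 2: 1, 3: 3}
```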
{"title":"Entropy-based histograms for selectivity estimation","authors":"Hien To, Kuorong Chiang, C. Shahabi","doi":"10.1145/2505515.2505756","DOIUrl":"https://doi.org/10.1145/2505515.2505756","url":null,"abstract":"Histograms have been extensively used for selectivity estimation by academics and have successfully been adopted by database industry. However, the estimation error is usually large for skewed distributions and biased attributes, which are typical in real-world data. Therefore, we propose effective models to quantitatively measure bias and selectivity based on information entropy. These models together with the principles of maximum entropy are then used to develop a class of entropy-based histograms. Moreover, since entropy can be computed incrementally, we present the incremental variations of our algorithms that reduce the complexities of the histogram construction from quadratic to linear. We conducted an extensive set of experiments with both synthetic and real-world datasets to compare the accuracy and efficiency of our proposed techniques with many other histogram-based techniques, showing the superiority of the entropy-based approaches for both equality and range queries.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"584 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75238299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Exploring XML data is as easy as using maps
Yong Zeng, Z. Bao, Guoliang Li, T. Ling
For keyword search on XML data, a list of query results in the form of subtrees is traditionally returned to users. However, we find that this is still not sufficient to meet users' information needs because: (1) the search intention of a given keyword query varies from person to person; and (2) the query results may have sibling or containment relationships with one another (in the context of the whole XML database), which can be important for users to digest the results and should therefore be shown to them. We therefore equip the traditional XML keyword search engine with our new exploration model, XMAP, providing users an interactive yet novel way to explore the results with a better user experience.
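As a rough illustration of the sibling and containment relationships mentioned above, the sketch below works over result node paths; the sample paths and helper names are made up, and XMAP's map-style interface itself is not modeled.

```python
# Keyword-search results identified by their node paths in the XML tree.
results = ["/bib/book[1]", "/bib/book[1]/author[2]", "/bib/book[2]", "/bib/book[3]/title"]

def parent(path):
    return path.rsplit("/", 1)[0]

def contains(a, b):
    # Result a contains result b if b's path extends a's path.
    return b.startswith(a + "/")

containments = [(a, b) for a in results for b in results if contains(a, b)]
siblings = [(a, b) for i, a in enumerate(results) for b in results[i + 1:]
            if parent(a) == parent(b)]
print("containment:", containments)
print("siblings:", siblings)
```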
{"title":"Exploring XML data is as easy as using maps","authors":"Yong Zeng, Z. Bao, Guoliang Li, T. Ling","doi":"10.1145/2505515.2508201","DOIUrl":"https://doi.org/10.1145/2505515.2508201","url":null,"abstract":"For keyword search on XML data, traditionally, a list of query results in the form of subtrees will be returned to users. However, we find that it is still not sufficient to meet users' information needs because: (1) the search intention of a certain keyword query varies from person to person; (2) amongst the query results, they may have sibling or containment relationships (in the context of whole XML database), which could be important for users to digest the query results and should be shown to users. Therefore, we try to equip the traditional XML keyword search engine with our new exploration model XMAP, providing user an interactive yet novel way to explore the results with better user experience.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"17 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72563739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Identifying salient entities in web pages
Michael Gamon, T. Yano, Xinying Song, Johnson Apacible, Patrick Pantel
We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are central to the document, which can lead to degraded relevance for entity-triggered experiences. We address this problem by devising a system that scores each entity on a web page according to its centrality to the page content. We propose salience classification functions that incorporate various cues from document content, web search logs, and a large web graph. To cost-effectively train the models, we introduce a soft-labeling methodology that generates a set of annotations based on user behaviors observed in web search logs. We evaluate several variations of our model via a large-scale empirical study conducted over a test set, which we release publicly to the research community. We demonstrate that our methods significantly outperform competitive baselines and the previous state of the art, while keeping the human annotation cost to a minimum.
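A minimal sketch of the general recipe (features plus soft labels) might look as follows; the feature set, the click-derived label heuristic, and the plain logistic regression are assumptions for illustration, not the paper's actual salience classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per (page, entity): [first-mention position (0 = top of page),
# mention count, appears-in-title flag, normalized query-click frequency
# observed in search logs].  All feature names are hypothetical.
X = np.array([
    [0.05, 9, 1, 0.80],
    [0.40, 3, 0, 0.30],
    [0.90, 1, 0, 0.05],
    [0.10, 6, 1, 0.55],
    [0.75, 2, 0, 0.10],
])

# Soft labels in [0, 1] derived from user behaviour (e.g. how often searchers
# who clicked the page had queried this entity); threshold them into binary
# targets but keep the soft value as a per-example confidence weight.
soft = np.array([0.9, 0.4, 0.1, 0.7, 0.2])
y = (soft >= 0.5).astype(int)

model = LogisticRegression()
model.fit(X, y, sample_weight=np.abs(soft - 0.5) * 2)  # confident examples weigh more

# Salience score for a new entity mention on a page.
print(model.predict_proba([[0.2, 4, 1, 0.6]])[0, 1])
```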
{"title":"Identifying salient entities in web pages","authors":"Michael Gamon, T. Yano, Xinying Song, Johnson Apacible, Patrick Pantel","doi":"10.1145/2505515.2505602","DOIUrl":"https://doi.org/10.1145/2505515.2505602","url":null,"abstract":"We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are central to the document, which can lead to degraded relevance for entity triggered experiences. We address this problem by devising a system that scores each entity on a web page according to its centrality to the page content. We propose salience classification functions that incorporate various cues from document content, web search logs, and a large web graph. To cost-effectively train the models, we introduce a soft labeling methodology that generates a set of annotations based on user behaviors observed in web search logs. We evaluate several variations of our model via a large-scale empirical study conducted over a test set, which we release publicly to the research community. We demonstrate that our methods significantly outperform competitive baselines and the previous state of the art, while keeping the human annotation cost to a minimum.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78225908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
QBEES: query by entity examples
S. Metzger, Ralf Schenkel, M. Sydow
Structured knowledge bases are an increasingly important way of storing and retrieving information. Within such knowledge bases, an important search task is finding similar entities based on one or more example entities. We present QBEES, a novel framework for defining entity similarity based only on structural features, so-called aspects, of the entities, which includes query-dependent and query-independent entity-ranking components. We present evaluation results on a number of existing entity list completion benchmarks, comparing against several state-of-the-art baselines.
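A toy sketch of example-based entity search over structural aspects follows; the aspect encoding and the simple overlap ranking are illustrative, while QBEES's maximal-aspect model and query-independent ranking components are more involved.

```python
# Hypothetical aspect sets (type and relation facts) for a few entities.
aspects = {
    "Rome":  {"type:city", "country:Italy", "continent:Europe"},
    "Milan": {"type:city", "country:Italy", "continent:Europe"},
    "Paris": {"type:city", "country:France", "continent:Europe"},
    "Italy": {"type:country", "continent:Europe"},
}

def similar_entities(examples, k=2):
    # Aspects shared by all example entities characterise what the user means.
    common = set.intersection(*(aspects[e] for e in examples))
    candidates = [e for e in aspects if e not in examples]
    # Rank candidates by how many of the shared aspects they also carry.
    return sorted(candidates, key=lambda e: len(aspects[e] & common), reverse=True)[:k]

print(similar_entities(["Rome", "Milan"]))   # Paris ranks above Italy
```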
{"title":"QBEES: query by entity examples","authors":"S. Metzger, Ralf Schenkel, M. Sydow","doi":"10.1145/2505515.2507873","DOIUrl":"https://doi.org/10.1145/2505515.2507873","url":null,"abstract":"Structured knowledge bases are an increasingly important way for storing and retrieving information. Within such knowledge bases, an important search task is finding similar entities based on one or more example entities. We present QBEES, a novel framework for defining entity similarity based only on structural features, so-called aspects, of the entities, that includes query-dependent and query-independent entity ranking components. We present evaluation results with a number of existing entity list completion benchmarks, comparing to several state-of-the-art baselines.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"51 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75892252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Trustable aggregation of online ratings
Hyun-Kyo Oh, Sang-Wook Kim, Sunju Park, M. Zhou
The average of the customer ratings on a product, which we call its reputation, is one of the key factors in the online purchasing decision for that product. There is, however, no guarantee of the trustworthiness of the reputation, since it can be manipulated rather easily. In this paper, we define false reputation as the problem of a reputation being manipulated by unfair ratings, and design a general framework that provides trustable reputation. For this purpose, we propose TRUEREPUTATION, an algorithm that iteratively adjusts the reputation based on the confidence of customer ratings.
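A minimal sketch of confidence-based, iteratively re-weighted aggregation is shown below; the specific down-weighting function and stopping rule are assumptions for illustration and should not be read as the TRUEREPUTATION algorithm itself.

```python
def trustable_average(ratings, iterations=20, scale=1.0):
    # Start from the plain mean, then repeatedly down-weight ratings that sit
    # far from the current estimate, so a burst of unfairly high or low
    # ratings is progressively discounted.
    reputation = sum(ratings) / len(ratings)
    for _ in range(iterations):
        weights = [1.0 / (1.0 + ((r - reputation) / scale) ** 2) for r in ratings]
        reputation = sum(w * r for w, r in zip(weights, ratings)) / sum(weights)
    return reputation

honest = [4, 5, 4, 4, 5, 4, 5]
attack = [1, 1, 1]                          # unfair low ratings
print(sum(honest + attack) / 10)            # naive mean, dragged down to 3.4
print(trustable_average(honest + attack))   # much closer to the honest consensus
```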
{"title":"Trustable aggregation of online ratings","authors":"Hyun-Kyo Oh, Sang-Wook Kim, Sunju Park, M. Zhou","doi":"10.1145/2505515.2507863","DOIUrl":"https://doi.org/10.1145/2505515.2507863","url":null,"abstract":"The average of the customer ratings on the product, which we call reputation, is one of the key factors in online purchasing decision of a product. There is, however, no guarantee in the trustworthiness of the reputation since it can be manipulated rather easily. In this paper, we define false reputation as the problem of the reputation to be manipulated by unfair ratings, and design a general framework that provides trustable reputation. For this purpose, we propose TRUEREPUTATION, an algorithm that iteratively adjusts the reputation based on the confidence of customer ratings.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74962508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Random walk-based graphical sampling in unbalanced heterogeneous bipartite social graphs
Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, A. Choudhary, Lu Liu
We investigate sampling techniques in unbalanced heterogeneous bipartite graphs (UHBGs), which have wide applications in real-world web-scale social networks. We propose random walk-based link sampling and stratified sampling for UHBGs and show that they have advantages over generic random walk samplers. In addition, each sampler's node-degree-distribution parameter estimator is analytically derived to be used as a quality indicator. In the experiments, we apply the two sampling techniques, along with a baseline node sampling method, to both synthetic and real Facebook data. The experimental results show that the random walk-based stratified sampler has a significant advantage over the node sampler and link sampler on UHBGs.
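The following is a minimal sketch of random walk-based link sampling on a bipartite graph; the restart probability and the adjacency-dictionary representation are illustrative, and the paper's stratified variant and degree-distribution estimator are not reproduced here.

```python
import random

def random_walk_link_sample(adj_left, adj_right, start, steps, restart=0.15):
    # Sample edges of a bipartite graph by a random walk with restarts.
    # adj_left maps left-side nodes to their right-side neighbours and
    # adj_right the reverse; the walk alternates sides and records each
    # traversed edge.
    sampled_edges = []
    u = start
    for _ in range(steps):
        if random.random() < restart or not adj_left[u]:
            u = start
            continue
        v = random.choice(adj_left[u])         # left -> right
        sampled_edges.append((u, v))
        u = random.choice(adj_right[v])        # right -> left
    return sampled_edges

# Tiny unbalanced example: few pages on one side, many users on the other.
adj_pages = {"p1": ["u1", "u2", "u3", "u4"], "p2": ["u4", "u5"]}
adj_users = {"u1": ["p1"], "u2": ["p1"], "u3": ["p1"], "u4": ["p1", "p2"], "u5": ["p2"]}
print(random_walk_link_sample(adj_pages, adj_users, "p1", steps=10))
```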
{"title":"Random walk-based graphical sampling in unbalanced heterogeneous bipartite social graphs","authors":"Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, A. Choudhary, Lu Liu","doi":"10.1145/2505515.2507822","DOIUrl":"https://doi.org/10.1145/2505515.2507822","url":null,"abstract":"We investigate sampling techniques in unbalanced heterogeneous bipartite graphs (UHBGs), which have wide applications in real world web-scale social networks. We propose random walked-based link sampling and stratified sampling for UHBGs and show that they have advantages over generic random walk samplers. In addition, each sampler's node degree distribution parameter estimator statistic is analytically derived to be used as a quality indicator. In the experiments, we apply the two sampling techniques, with a baseline node sampling method, to both synthetic and real Facebook data. The experimental results show that random walk-based stratified sampler has significant advantage over node sampler and link sampler on UHBGs.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80145395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Nonparametric Bayesian multitask collaborative filtering
S. Chatzis
The dramatic rate at which new digital content becomes available has brought collaborative filtering systems to the epicenter of computer science research in the last decade. One of the greatest challenges collaborative filtering systems are confronted with is the data sparsity problem: users typically rate only very few items; thus, the available historical data is not adequate to effectively perform prediction. To alleviate these issues, in this paper we propose a novel multitask collaborative filtering approach. Our approach is based on a coupled latent factor model of the users' rating functions, which yields an agile information-sharing mechanism that extracts much richer task-correlation information compared to existing approaches. The formulation of our method is based on concepts from the field of Bayesian nonparametrics, specifically Indian Buffet Process priors, which allow for data-driven determination of the optimal number of underlying latent features (item characteristics and user traits) assumed in the context of the model. We experiment on several real-world datasets, demonstrating both the efficacy of our method and its superiority over existing approaches.
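To illustrate the nonparametric ingredient, the sketch below draws a binary object-by-feature matrix from an Indian Buffet Process prior, showing how the number of latent features emerges from the draw rather than being fixed in advance; alpha and the user-by-trait interpretation are illustrative assumptions.

```python
import math
import random

def poisson(rng, lam):
    # Knuth's method; adequate for the small rates used here.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_ibp(num_objects, alpha=2.0, seed=0):
    # Object i reuses an existing feature k with probability m_k / i and then
    # introduces Poisson(alpha / i) brand-new features, so the total number
    # of features is determined by the draw, not fixed in advance.
    rng = random.Random(seed)
    feature_counts = []          # m_k: how many objects use feature k so far
    rows = []
    for i in range(1, num_objects + 1):
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        new = poisson(rng, alpha / i)
        feature_counts = [m + z for m, z in zip(feature_counts, row)] + [1] * new
        rows.append(row + [1] * new)
    width = len(feature_counts)
    return [r + [0] * (width - len(r)) for r in rows]

# Five "users" and however many latent traits the prior produces.
for row in sample_ibp(5):
    print(row)
```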
{"title":"Nonparametric bayesian multitask collaborative filtering","authors":"S. Chatzis","doi":"10.1145/2505515.2505517","DOIUrl":"https://doi.org/10.1145/2505515.2505517","url":null,"abstract":"The dramatic rates new digital content becomes available has brought collaborative filtering systems to the epicenter of computer science research in the last decade. One of the greatest challenges collaborative filtering systems are confronted with is the data sparsity problem: users typically rate only very few items; thus, availability of historical data is not adequate to effectively perform prediction. To alleviate these issues, in this paper we propose a novel multitask collaborative filtering approach. Our approach is based on a coupled latent factor model of the users rating functions, which allows for coming up with an agile information sharing mechanism that extracts much richer task-correlation information compared to existing approaches. Formulation of our method is based on concepts from the field of Bayesian nonparametrics, specifically Indian Buffet Process priors, which allow for data-driven determination of the optimal number of underlying latent features (item characteristics and user traits) assumed in the context of the model. We experiment on several real-world datasets, demonstrating both the efficacy of our method, and its superiority over existing approaches.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80197138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Domain-dependent/independent topic switching model for online reviews with numerical ratings
Yasutoshi Ida, Takuma Nakamura, Takashi Matsumoto
We propose a domain-dependent/independent topic switching model based on Bayesian probabilistic modeling for online product reviews that are accompanied by numerical ratings provided by users. In this model, each word is allocated to a domain-dependent topic or a domain-independent topic, and the distribution of topics in an online review is connected to the observed numerical rating via a linear regression model. Domain-dependent topics utilize the domain information observed with a corpus, while domain-independent topics utilize the framework of Bayesian nonparametrics, which can estimate the number of topics in the posterior distribution. The posterior distribution is estimated via collapsed Gibbs sampling. On real data, our proposed model achieved smaller mean squared error and smaller average error with a small model size, and converged in fewer iterations for a regression task involving online review ratings, outperforming a baseline model that does not consider domains. Moreover, the proposed model can also indicate whether words are positive or negative in the form of continuous values. This feature allows us to extract domain-dependent and domain-independent sentiment words.
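A minimal sketch of the regression link between topic proportions and ratings is given below, with hand-made proportions and a plain least-squares fit; in the paper the proportions are inferred jointly with the regression via collapsed Gibbs sampling, and the topic labels here are purely illustrative.

```python
import numpy as np

# Columns: [domain topic "battery", domain topic "shipping", shared topic "price"],
# one row of topic proportions per review (made up for illustration).
theta = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.2, 0.2],
])
ratings = np.array([5.0, 2.0, 3.0, 4.5])

# Least-squares fit of rating ~ topic proportions (with an intercept).
X = np.hstack([theta, np.ones((len(theta), 1))])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print("per-topic weights:", coef[:-1], "intercept:", coef[-1])
print("predicted rating for a battery-heavy review:",
      np.array([0.8, 0.1, 0.1, 1.0]) @ coef)
```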
{"title":"Domain-dependent/independent topic switching model for online reviews with numerical ratings","authors":"Yasutoshi Ida, Takuma Nakamura, Takashi Matsumoto","doi":"10.1145/2505515.2505540","DOIUrl":"https://doi.org/10.1145/2505515.2505540","url":null,"abstract":"We propose a domain-dependent/independent topic switching model based on Bayesian probabilistic modeling for modeling online product reviews that are accompanied with numerical ratings provided by users. In this model, each word is allocated to a domain-dependent topic or a domain-independent topic, and the distribution of topics in an online review is connected to an observed numerical rating via a linear regression model. Domain-dependent topics utilize domain information observed with a corpus, and domain-independent topics utilize the framework of Bayesian Nonparametrics, which can estimate the number of topics in posterior distributions. The posterior distribution is estimated via collapsed Gibbs sampling. Using real data, our proposed model had smaller mean square error and smaller average mean error with a small model size and achieved convergence in fewer iterations for a regression task involving online review ratings, outperforming a baseline model that did not consider domains. Moreover, the proposed model can also tell us whether the words are positive or negative in the form of continuous values. This feature allows us to extract domain-dependent and -independent sentiment words.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80391731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Automated snippet generation for online advertising
Stamatina Thomaidou, Ismini Lourentzou, Panagiotis Katsivelis-Perakis, M. Vazirgiannis
Products, services, or brands can be advertised alongside the search results in major search engines, and the smaller displays of devices such as tablets and smartphones have recently imposed the need for shorter ad texts. In this paper, we propose a method that produces compact text ads (promotional text snippets) in an automated manner, given a product description webpage (landing page) as input. The challenge is to produce a small yet comprehensive ad while at the same time maintaining relevance, clarity, and attractiveness. Our method includes the following phases. Initially, it extracts relevant and important n-grams (keywords) from the landing page. The retained keywords must have a positive meaning in order to support a call-to-action style, so we apply sentiment analysis to them. Next, we build an Advertising Language Model to evaluate phrases in terms of their marketing appeal. We experiment with two variations of our method and show that they outperform all the baseline approaches.
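A toy sketch of the pipeline (n-gram extraction, positive-sentiment filtering, and language-model scoring) follows; the tiny sentiment lexicon and the bigram "advertising language model" built from a made-up ad corpus are illustrative stand-ins for the real components.

```python
import re
from collections import Counter
from itertools import chain

def ngrams(text, n):
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

landing_page = "Book your luxury beach resort today. Free cancellation and great prices."
ad_corpus = ["book today and save", "great prices on luxury resorts", "free cancellation"]

# Keep only candidate bigrams containing a positively-oriented word.
positive_words = {"luxury", "free", "great", "save"}          # toy sentiment lexicon
candidates = [g for g in ngrams(landing_page, 2)
              if positive_words & set(g.split())]

# Toy advertising language model: bigram counts over the ad corpus with
# add-one smoothing, used to score how "ad-like" each candidate sounds.
ad_bigrams = Counter(chain.from_iterable(ngrams(doc, 2) for doc in ad_corpus))
def ad_score(phrase):
    return ad_bigrams[phrase] + 1

for phrase in sorted(candidates, key=ad_score, reverse=True):
    print(phrase, ad_score(phrase))
```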
{"title":"Automated snippet generation for online advertising","authors":"Stamatina Thomaidou, Ismini Lourentzou, Panagiotis Katsivelis-Perakis, M. Vazirgiannis","doi":"10.1145/2505515.2507876","DOIUrl":"https://doi.org/10.1145/2505515.2507876","url":null,"abstract":"Products, services or brands can be advertised alongside the search results in major search engines, while recently smaller displays on devices like tablets and smartphones have imposed the need for smaller ad texts. In this paper, we propose a method that produces in an automated manner compact text ads (promotional text snippets), given as input a product description webpage (landing page). The challenge is to produce a small comprehensive ad while maintaining at the same time relevance, clarity, and attractiveness. Our method includes the following phases. Initially, it extracts relevant and important n-grams (keywords) given the landing page. The keywords reserved must have a positive meaning in order to have a call-to-action style, thus we attempt sentiment analysis on them. Next, we build an Advertising Language Model to evaluate phrases in terms of their marketing appeal. We experiment with two variations of our method and we show that they outperform all the baseline approaches.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"151 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80518137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25