2010 IEEE International Conference on Data Mining Workshops: Latest Papers

Integer Programming for Multi-class Active Learning
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.148
Dragomir Yankov, Suju Rajan, A. Ratnaparkhi
Active learning has been demonstrated to be a powerful tool for improving the effectiveness of binary classifiers. It iteratively identifies informative unlabeled examples which, after labeling, are used to augment the initial training set. Adapting the procedure to large-scale, multi-class classification problems, however, poses certain challenges. For instance, to guarantee improvement by the method we may need to select a large number of examples that require prohibitive labeling resources. Furthermore, the notion of informative examples also changes significantly when multiple classes are considered. In this paper we show that multi-class active learning can be cast into an integer programming framework, where a subset of examples that are informative across the maximum number of classes is selected. We test our approach on several large-scale document categorization problems. We demonstrate that in the case of limited labeling resources and a large number of classes the proposed method is more effective than other known approaches.
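The selection step mentioned in the abstract can be illustrated with a toy 0/1 problem. The sketch below is not the authors' formulation: it assumes each unlabeled example comes with a set of classes for which it is considered informative, and it uses exhaustive search in place of an integer programming solver. The function name `select_batch` and the example data are hypothetical.

```python
from itertools import combinations


def select_batch(informative, budget):
    """Pick `budget` examples that are jointly informative for the most classes.

    informative: dict mapping example id -> set of classes for which that
                 example is assumed to be informative (an illustrative input,
                 not something defined in the abstract).
    Exhaustive search stands in for an integer programming solver, so this
    is only practical for very small candidate pools.
    """
    best_ids, best_cover = (), -1
    for ids in combinations(sorted(informative), budget):
        covered = set().union(*(informative[i] for i in ids))
        if len(covered) > best_cover:  # strict '>' keeps the first best subset
            best_ids, best_cover = ids, len(covered)
    return list(best_ids), best_cover


# Hypothetical candidate pool: four documents, four classes.
pool = {
    "d1": {"sports", "politics"},
    "d2": {"sports"},
    "d3": {"tech"},
    "d4": {"politics", "tech", "health"},
}
chosen, n_classes = select_batch(pool, budget=2)
```

With a labeling budget of two, the search prefers a pair of documents whose informative classes overlap as little as possible, which is the intuition behind selecting examples that are informative across many classes.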
Citations: 1
Enhancing Document Exploration with OLAP
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.37
Zhibo Chen, Carlos Garcia-Alvarado, C. Ordonez
Finding relevant documents in digital libraries has been a well-studied problem in information retrieval. It is not uncommon to see users browsing digital collections without having a clear idea of the keyword search that they should perform. However, we believe that such an initial query search is not totally independent of the target search. Therefore, we use these initial document selections to further explore these documents. In the following demonstration, we exploit On-line Analytical Processing (OLAP) for knowledge discovery in digital collections to achieve query refinement. Such refinement is the result of applying a traditional ranking technique, based on the vector space model, selecting the top keywords in the resulting subset of documents, and then displaying certain cuboids of the keywords. Based on these cuboids, which are ranked by their frequency, the users can select a query that can better represent their actual target search. We show that this document exploration can be done efficiently within the DBMS and can exploit in-database extensions, such as User-Defined Functions, as well as standard SQL. Additionally, we demonstrate a novel approach to obtaining query refinement through OLAP data cubes.
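The keyword-cuboid step can be sketched in a few lines. This is an in-memory illustration only; the demonstration described in the abstract runs inside the DBMS with User-Defined Functions and SQL. It assumes each retrieved document is already reduced to a set of keywords, and the function name `keyword_cuboids` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cuboids(docs, top_k=3, max_dim=2):
    """Count keyword combinations over the top-k keywords of a result set.

    docs: list of keyword sets, one per retrieved document (assumed input).
    Returns (combination, document count) pairs ranked by frequency.
    """
    # Document frequency of every keyword in the result set.
    term_freq = Counter(k for d in docs for k in sorted(set(d)))
    # Keep the top-k keywords; ties are broken alphabetically.
    top = set(sorted(term_freq, key=lambda k: (-term_freq[k], k))[:top_k])

    counts = Counter()
    for d in docs:
        present = sorted(set(d) & top)
        # Each combination size corresponds to one level of the cube lattice.
        for dim in range(1, max_dim + 1):
            for combo in combinations(present, dim):
                counts[combo] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result set of four documents.
results = [
    {"olap", "cube", "sql"},
    {"olap", "sql"},
    {"olap", "cube"},
    {"ranking"},
]
ranked = keyword_cuboids(results)
```

Each ranked combination is a candidate refined query; a user would pick the one closest to the intended search.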
Citations: 3
A Block Mixture Model for Pattern Discovery in Preference Data
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.59
Nicola Barbieri, M. Guarascio, G. Manco
This paper presents a probabilistic co-clustering approach to pattern discovery in preference data. We extend the original formulation of the block mixture model to handle rating data; the resulting model allows the simultaneous clustering of users and items into homogeneous user communities and item categories. The parameters of the model are determined using a variational approximation and a two-phase application of the EM algorithm. The experimental evaluation shows that the proposed approach can be used both for rating prediction and for pattern discovery tasks, such as the analysis of common trends within the same user community and the identification of interesting relationships between products belonging to the same item category. In particular, using MovieLens data, we show how it is possible to infer topics for each item category, and how to model community interests and transitions among topics of interest.
Citations: 5
On Attribute Disclosure in Randomization Based Privacy Preserving Data Publishing
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.76
Ling Guo, Xiaowei Ying, Xintao Wu
Privacy-preserving microdata publication has received wide attention. In this paper, we investigate the randomization approach and focus on attribute disclosure under linking attacks. We give efficient solutions to determine optimal distortion parameters such that we can maximize utility preservation while still satisfying privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects (reconstructed distributions, accuracy of answering queries, and preservation of correlations). Our empirical results show that randomization incurs significantly smaller utility loss.
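The trade-off between distortion and disclosure can be made concrete with the simplest possible case. The sketch below assumes a single binary sensitive attribute that is kept with probability `p` and flipped otherwise (classic randomized response); the paper's own distortion scheme and its optimization of the parameters are not reproduced here, and all function names are hypothetical.

```python
def distorted_fraction(pi, p):
    """Expected fraction of records reporting 1 after randomization.

    pi: true fraction of records whose sensitive value is 1.
    p:  probability that a value is kept (flipped with probability 1 - p).
    """
    return p * pi + (1 - p) * (1 - pi)


def reconstruct_fraction(observed, p):
    """Invert the distortion to estimate the true fraction (utility side)."""
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information; cannot reconstruct")
    return (observed - (1 - p)) / (2 * p - 1)


def posterior_given_reported(pi, p):
    """P(true value is 1 | reported value is 1), a simple disclosure measure."""
    observed = distorted_fraction(pi, p)
    return 0.0 if observed == 0 else p * pi / observed


# Illustrative numbers only: 30% of records are sensitive, values kept 80% of the time.
reported = distorted_fraction(0.3, 0.8)
estimate = reconstruct_fraction(reported, 0.8)
risk = posterior_given_reported(0.3, 0.8)
```

Raising `p` improves the reconstructed distribution but also raises the posterior, which is the kind of tension a choice of distortion parameters has to balance against a privacy requirement.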
Citations: 9
System Biology Approach for Elucidating the Relationship Between Indonesian Herbal Plants and the Efficacy of Jamu
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.105
F. Afendi, L. K. Darusman, Aki Hirai, M. Altaf-Ul-Amin, Hiroki Takahashi, Kensuke Nakamura, S. Kanaya
Jamu is an Indonesian herbal medicine made from a mixture of several plants. Some plants serve as main ingredients and others as supporting ingredients. By utilizing biplot configuration, we explored the relationship between Indonesian herbal plants and the efficacy of jamu. Among 465 plants used in 3138 jamu, we determined that 190 plants were efficacious for at least one efficacy. We therefore consider these plants to be the main ingredients of jamu. The other 275 plants are considered to be supporting ingredients in jamu because their efficacy has not been established.
Citations: 15
A Framework for Emotion Mining from Text in Online Social Networks
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.75
Mohamed Yassine, Hazem M. Hajj
Online Social Networks are so popular nowadays that they are a major component of an individual’s social interaction. They are also emotionally-rich environments where close friends share their emotions, feelings and thoughts. In this paper, a new framework is proposed for characterizing emotional interactions in social networks, and then using these characteristics to distinguish friends from acquaintances. The goal is to extract the emotional content of texts in online social networks. The interest is in whether the text is an expression of the writer’s emotions or not. For this purpose, text mining techniques are performed on comments retrieved from a social network. The framework includes a model for data collection, database schemas, data processing and data mining steps. The informal language of online social networks is a main point to consider before performing any text mining techniques. This is why the framework includes the development of special lexicons. In general, the paper presents a new perspective for studying friendship relations and the expression of emotions in online social networks, where it deals with the nature of these sites and the nature of the language used. It considers Lebanese Facebook users as a case study. The technique adopted is unsupervised; it mainly uses the k-means clustering algorithm. Experiments show high accuracy for the model in both determining subjectivity of texts and predicting friendship.
Citations: 105
QueRIE: A Query Recommender System Supporting Interactive Database Exploration
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.43
Sarika Mittal, Jothi Swarubini Vindhiya Varman, Gloria Chatzopoulou, M. Eirinaki, N. Polyzotis
This demonstration presents QueRIE, a recommender system that supports interactive database exploration. This system aims at assisting non-expert users of scientific databases by generating personalized query recommendations. Drawing inspiration from Web recommender systems, QueRIE tracks the querying behavior of each user and identifies potentially “interesting” parts of the database related to the corresponding data analysis task by locating those database parts that were accessed by similar users in the past. It then generates and recommends the queries that cover those parts to the user.
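The idea of locating database parts accessed by similar users resembles user-based collaborative filtering, which can be sketched as follows. This is not QueRIE's actual algorithm: it assumes each user's query history has been summarized as a dictionary of access counts per database part, and the part names, function names, and cosine weighting are all illustrative choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse access-count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_parts(target, others, top_n=2):
    """Rank database parts the target user has not touched yet.

    target: dict part name -> access count for the current user.
    others: dict user id -> profile of the same shape for past users.
    Parts are scored by similarity-weighted access counts of similar users.
    """
    scores = {}
    for profile in others.values():
        sim = cosine(target, profile)
        if sim <= 0:
            continue  # ignore users with no overlap in accessed parts
        for part, weight in profile.items():
            if part not in target:
                scores[part] = scores.get(part, 0.0) + sim * weight
    return sorted(scores, key=lambda part: (-scores[part], part))[:top_n]


# Hypothetical profiles over hypothetical table.column names.
current_user = {"stars.ra": 2, "stars.dec": 2}
past_users = {
    "u1": {"stars.ra": 1, "stars.dec": 1, "stars.magnitude": 3},
    "u2": {"galaxies.redshift": 4},
}
suggested = recommend_parts(current_user, past_users)
```

A full system would then turn the highest-scoring parts into concrete queries that cover them, which is the final step the abstract describes.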
Citations: 18
Large-Scale Customized Models for Advertisers
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.157
A. Bagherjeiran, A. O. Hatch, A. Ratnaparkhi, R. Parekh
Performance advertisers want to maximize the return on their advertising spend. In the online advertising world, this means showing the ad only to those users most likely to convert, i.e., buy a product or service. Existing ad targeting solutions such as context targeting and rule-based segment targeting primarily leverage marketing intuition to identify audience segments that would be likely to convert. Even the more sophisticated model-based approaches such as behavioral targeting identify audience segments interested in certain coarse-grained categories defined by the publisher. Advertisers are now able, through beaconing, to tell us exactly who their preferred customers are. Advertisers want to augment their existing advertising campaign with custom models that learn from the campaign and focus on attracting new users. Motivated by our experience with advertisers, we pose this problem within the context of ensemble learning. Building custom models for an existing ad campaign can be viewed as operations on an ensemble classifier: add, modify, or complement a classifier. An ideal new classifier should incrementally improve the ensemble and minimize overlap with any existing classifiers already in the ensemble; it should learn something new. With the proposed approach we are able to augment the advertising campaigns of several large advertisers at a large online advertising company.
Citations: 14
RnR: Extracting Rationale from Online Reviews and Ratings
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.167
Dwi A. P. Rahayu, S. Krishnaswamy, O. Alahakoon, C. Labbé
Review mining is a part of web mining which focuses on extracting the main information from user reviews. State-of-the-art review mining systems focus on identifying the semantic orientation of reviews and providing sentences or feature scores. There has been little focus on understanding the rationale for the ratings that are provided. This paper presents our proposed RnR system for extracting rationale from online reviews and ratings. We have implemented the system for evaluation on online reviews for hotels from TripAdvisor.com and present extensive experimental evaluation that demonstrates the improved computational performance of our approach and the accuracy in terms of identifying the rationale. This RnR system is available for testing from http://rnrsystem.com/RnRSystem
Citations: 9
S4: Distributed Stream Computing Platform
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.172
L. Neumeyer, B. Robbins, Anish Nair, Anand Kesari
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. In this paper, we outline the S4 architecture in detail and describe various applications, including real-life deployments. Our design is primarily driven by large-scale applications for data mining and machine learning in a production environment. We show that the S4 design is surprisingly flexible and lends itself to running in large clusters built with commodity hardware.
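The routing of keyed events to Processing Elements can be illustrated with a single-process toy. The sketch below is not the S4 API: the class names `CountingPE` and `KeyedRouter` are hypothetical, and it leaves out distribution across nodes, fault tolerance, and emission of downstream events. It only shows the key-affinity idea, where every event with the same key value reaches the same PE instance.

```python
class CountingPE:
    """A toy Processing Element: one instance per key value, counting events."""

    def __init__(self, key):
        self.key = key
        self.count = 0

    def process(self, event):
        # Consume one event; a richer PE could also emit new keyed events here.
        self.count += 1

    def publish(self):
        # Publish the PE's current result as a (key, count) pair.
        return (self.key, self.count)


class KeyedRouter:
    """Routes each event to the PE instance that owns the event's key value."""

    def __init__(self, pe_factory, key_attr):
        self.pe_factory = pe_factory
        self.key_attr = key_attr
        self.instances = {}

    def dispatch(self, event):
        key = event[self.key_attr]
        pe = self.instances.get(key)
        if pe is None:
            # PEs are created lazily the first time a key value is seen.
            pe = self.instances[key] = self.pe_factory(key)
        pe.process(event)

    def publish_all(self):
        return dict(pe.publish() for pe in self.instances.values())


# Hypothetical stream of word events keyed on the "word" attribute.
router = KeyedRouter(CountingPE, key_attr="word")
for word in ["stream", "event", "stream"]:
    router.dispatch({"word": word})
totals = router.publish_all()
```

Because state lives inside each PE and is reached only through its key, the same structure could in principle be partitioned across machines by hashing the key, which is the property the abstract refers to as routing with affinity.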
Citations: 968