2010 IEEE International Conference on Data Mining Workshops: Latest Papers

Infrequent Purchased Product Recommendation Making Based on User Behaviour and Opinions in E-commerce Sites
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.116
N. Abdullah, Yue Xu, S. Geva, Jinghong Chen
Web-based commercial recommender systems (RS) can help users decide which product to purchase from the vast number of products available on the Internet. Currently, many commercial recommender systems are developed for recommending frequently purchased products, where a large amount of explicit ratings or purchase history data is available to predict user preferences. However, for products that users purchase infrequently, it is difficult to collect such data, and user profiling therefore becomes a major challenge for recommending these kinds of products. This paper proposes a recommendation approach for infrequently purchased products based on user opinions and navigation data. User opinion data, collected from product reviews, is used to generate product profiles, and user navigation data is used to generate user profiles; both are used to recommend the products that best satisfy the users' needs. Experiments conducted on real e-commerce data show that the proposed approach, named Adaptive Collaborative Filtering (ACF), which utilizes both user and product profiles, outperforms the Query Expansion (QE) approach, which utilizes only product profiles to recommend products. ACF also performs better than the Basic Search (BS) approach, which is widely applied in current e-commerce applications.
Citations: 9
Ant Colony Optimization Algorithm Based on Immunity Vaccine and Dynamic Pheromone Updating
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.16
Wanjun Liu, Juan Zhang, Junli Liu
To address the complexity and uncertainty of logistics and distribution systems, this paper puts forward an Ant Colony Optimization algorithm based on immunity vaccine and dynamic pheromone updating. In the new algorithm, the initial antibodies are vaccinated first to produce better solutions, and the ant colony then sets its initial parameters according to these better solutions. A mechanism of dynamic adjustment and pheromone updating is also introduced to slow the growth of the pheromone concentration difference between paths, which keeps the algorithm from settling on a partial (local) solution. Experimental results show that the algorithm, referred to as DACOIV, can effectively find the optimal path for a logistics and distribution system.
Citations: 1
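The abstract above does not give the DACOIV update rule, so the following is only a minimal sketch of a generic ant colony pheromone update: evaporation, deposit along each tour, and clamping to a fixed range. The clamping step is an assumption borrowed from Max-Min style ant systems as one common way to limit how far pheromone levels on different paths can drift apart; the function name `update_pheromone` and the parameters `rho`, `q`, `tau_min`, and `tau_max` are illustrative and do not come from the paper.

```python
def update_pheromone(tau, tours, lengths, rho=0.1, q=1.0,
                     tau_min=0.01, tau_max=10.0):
    """Return a new symmetric pheromone matrix after one iteration.

    tau     -- square list-of-lists of current pheromone levels
    tours   -- list of closed tours, each a list of node indices
    lengths -- total length of each tour (same order as tours)
    """
    n = len(tau)
    # Evaporation: every edge loses a fraction rho of its pheromone.
    new = [[(1.0 - rho) * tau[i][j] for j in range(n)] for i in range(n)]
    # Deposit: shorter tours leave more pheromone on the edges they use.
    for tour, length in zip(tours, lengths):
        deposit = q / length
        for a, b in zip(tour, tour[1:] + tour[:1]):
            new[a][b] += deposit
            new[b][a] += deposit
    # Clamping (assumed, not from the paper): bound every entry so the
    # gap between strong and weak paths cannot grow without limit.
    return [[min(tau_max, max(tau_min, v)) for v in row] for row in new]
```

Lowering `tau_max` or raising `tau_min` narrows the allowed pheromone range, which trades convergence speed for more exploration.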
Network Anomaly Detection Using a Commute Distance Based Approach
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.90
N. Khoa, T. Babaie, S. Chawla, Z. Zaidi
We propose the use of commute distance, a random walk metric, to discover anomalies in network traffic data. The commute distance based anomaly detection approach has several advantages over Principal Component Analysis (PCA), which is the method of choice for this task: (i) it generalizes both distance-based and density-based anomaly detection techniques, while PCA is primarily distance-based; (ii) it is agnostic about the underlying data distribution, while PCA assumes that the data follows a Gaussian distribution; and (iii) it is more robust than PCA, i.e., a perturbation of the underlying data or a change in the parameters used has a less significant effect on its output than on that of PCA. Experiments and analysis on simulated and real datasets are used to validate our claims.
Citations: 9
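To make the commute distance in the abstract above concrete, here is a small sketch that computes it for a connected, undirected weighted graph. It relies on the standard identity that the commute distance between nodes i and j equals the graph volume (the sum of all node degrees) times the effective resistance between them, and obtains the resistance by solving a grounded Laplacian system. The helper names `solve` and `commute_distance` are illustrative; how the paper builds its graph from traffic data and scores anomalies is not shown here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def commute_distance(w, i, j):
    """Commute distance between nodes i and j.

    w is a symmetric adjacency (weight) matrix of a connected graph.
    """
    if i == j:
        return 0.0
    n = len(w)
    deg = [sum(row) for row in w]
    vol = sum(deg)
    # Graph Laplacian L = D - W.
    lap = [[(deg[r] if r == c else 0.0) - w[r][c] for c in range(n)]
           for r in range(n)]
    # Ground the last node (potential 0) so the reduced system is invertible.
    g = n - 1
    reduced = [row[:g] for row in lap[:g]]
    rhs = [0.0] * g
    if i != g:
        rhs[i] = 1.0
    if j != g:
        rhs[j] = -1.0
    potential = solve(reduced, rhs) + [0.0]
    # Effective resistance is the potential difference for a unit current.
    return vol * (potential[i] - potential[j])
```

For the three-node path graph `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`, the two end nodes have effective resistance 2 and the volume is 4, so their commute distance is 8.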
Interesting Subset Discovery and Its Application on Service Processes
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.98
M. Natu, Girish Keshav Palshikar
Various real-life datasets can be viewed as a set of records consisting of attributes that describe the records and a set of measures that evaluate them. In this paper, we address the problem of automatically discovering interesting subsets of such a dataset, that is, subsets whose performance characteristics differ significantly from those of the rest of the dataset. We present an algorithm to discover such interesting subsets. The proposed algorithm uses a generic, domain-independent definition of interestingness and applies various heuristics to prune the search space intelligently, so that the solution scales to large datasets. This paper presents the application of the interesting subset discovery algorithm to four real-world case studies and demonstrates its effectiveness in extracting insights that identify problem areas and provide improvement recommendations for a wide variety of systems.
Citations: 8
Differential Analysis on Deep Web Data Sources
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.22
Tantan Liu, Fan Wang, Jiedan Zhu, G. Agrawal
The growing use of the Internet in everyday life has been creating new challenges and opportunities for data mining techniques. A relatively new trend on the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in the data available from different sources. This paper introduces data mining methods that extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity and formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning step that summarizes the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.
Citations: 7
FDCluster: Mining Frequent Closed Discriminative Bicluster without Candidate Maintenance in Multiple Microarray Datasets
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.10
Miao Wang, Xuequn Shang, Shaohua Zhang, Zhanhuai Li
Biclustering is a methodology that clusters condition sets and gene sets simultaneously. Almost all current biclustering algorithms find biclusters in a single microarray dataset. To reduce the influence of noise and find more biologically meaningful biclusters, we propose an algorithm, FDCluster, that mines frequent closed discriminative biclusters in multiple microarray datasets. FDCluster uses the Apriori property and several novel pruning techniques to mine frequent closed biclusters without candidate maintenance. The experimental results show that FDCluster is more effective than the traditional method on both single and multiple microarray datasets. We also test biological significance using Gene Ontology (GO) to show that the proposed method is able to produce biologically relevant biclusters.
Citations: 22
Evaluation of Protein Backbone Alphabets: Using Predicted Local Structure for Fold Recognition
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.168
Kyong Jin Shim
Optimally combining available information is one of the key challenges in knowledge-driven prediction techniques. In this study, we evaluate six Phi- and Psi-based backbone alphabets. We show that adding predicted backbone conformations to SVM classifiers can improve fold recognition. Our experimental results show that including predicted backbone conformations in our feature representation leads to higher overall accuracy than using amino acid residues alone.
Citations: 0
Efficient Additive Models via the Generalized Lasso
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.184
D. Semenovich, Nobuyuki Morioka, A. Sowmya
We propose a framework for learning generalized additive models at very little additional cost (a small constant) compared to some of the most efficient schemes for learning linear classifiers, such as linear SVMs and regularized logistic regression. We achieve this through a simple feature encoding scheme followed by a novel approach to regularization, which we term "generalized lasso". Additive models offer an attractive alternative to linear models for many large-scale tasks, as they have significantly higher predictive power while remaining easily interpretable. Furthermore, our regularization approach extends to arbitrary graphs, allowing, for example, the explicit incorporation of spatial information or similar priors. Traditional approaches for learning additive models, such as backfitting, do not scale to large datasets. Our new formulation of the resulting optimization problem allows us to investigate the use of recent accelerated gradient algorithms and to demonstrate speed comparable to state-of-the-art linear SVM training methods, making additive models suitable for very large problems. In our experiments we find that additive models consistently outperform linear models on various datasets.
Citations: 0
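The abstract above does not spell out its feature encoding or its exact penalty, so the sketch below rests on assumptions: each continuous feature is binned into one-hot indicators, which turns a linear model on the encoded features into a piecewise-constant additive model, and the regularizer is an absolute-difference penalty on the weights of adjacent bins, a common instance of a generalized (fused) lasso on a chain graph. All names here (`one_hot_bins`, `additive_score`, `difference_penalty`, `lam`) are hypothetical.

```python
from bisect import bisect_right


def one_hot_bins(value, edges):
    """Indicator vector for the bin containing value.

    edges are the sorted interior bin boundaries, so there are
    len(edges) + 1 bins in total.
    """
    k = bisect_right(edges, value)
    return [1.0 if b == k else 0.0 for b in range(len(edges) + 1)]


def additive_score(x, edges_per_feature, weights_per_feature, bias=0.0):
    """Score of an additive model: bias plus one bin weight per feature."""
    score = bias
    for value, edges, weights in zip(x, edges_per_feature, weights_per_feature):
        phi = one_hot_bins(value, edges)
        score += sum(w * p for w, p in zip(weights, phi))
    return score


def difference_penalty(weights_per_feature, lam=1.0):
    """Assumed penalty: lam times the sum of |w[k+1] - w[k]| over adjacent bins."""
    return lam * sum(abs(b - a)
                     for w in weights_per_feature
                     for a, b in zip(w, w[1:]))
```

Penalizing differences between neighbouring bins encourages each per-feature function to be smooth or piecewise flat; replacing the chain of adjacent bins with the edges of an arbitrary graph gives the kind of extension the abstract mentions for spatial priors.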
Data Analysis in Los Angeles Long Beach with Seasonal Time Series Model
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.93
Weiqiang Wang, Zhendong Niu
Air pollution has long been a major problem, and a growing number of scientists are focusing on it. In this paper we present a series of data analysis methods for Los Angeles Long Beach datasets, using a seasonal ARIMA (autoregressive integrated moving average) model and the MCMC (Markov chain Monte Carlo) method. The MCMC methods are studied with Los Angeles Long Beach air pollution (PM 2.5) and traffic observations from 1997 to 2008. The experimental results indicate that the seasonal ARIMA model can be an effective way to forecast air pollution, and that the MCMC model fits the datasets well. The approach applies to a large class of utility functions and models in the air pollution and traffic fields.
Citations: 2
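As a small illustration of the "integrated" part of the seasonal ARIMA model named in the abstract above, the sketch below applies seasonal and then regular differencing to a series, which is the usual preprocessing step before the autoregressive and moving-average terms are fitted. The function names and the default `period=12` (monthly data with a yearly cycle) are assumptions for illustration; the paper's actual model orders are not given in the abstract.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_differencing(series, d=1, D=1, period=12):
    """Apply D seasonal differences (at the given period), then d regular ones."""
    out = list(series)
    for _ in range(D):
        out = difference(out, period)
    for _ in range(d):
        out = difference(out, 1)
    return out
```

Each seasonal difference shortens the series by `period` observations and each regular difference by one, so the input needs more than `D * period + d` points for anything to remain.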
Sentence-Level and Document-Level Sentiment Mining for Arabic Texts
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.95
N. Farra, Elie Challita, R. A. Assi, Hazem M. Hajj
In this work, we investigate sentiment mining of Arabic text at both the sentence level and the document level. Existing research in Arabic sentiment mining remains very limited. For sentence-level classification, we investigate two approaches. The first is a novel grammatical approach that employs a general structure for the Arabic sentence. The second approach is based on the semantic orientation of words and their corresponding frequencies. To do this, we built an interactive learning semantic dictionary, which stores the polarities of the roots of different words and identifies new polarities based on these roots. For document-level classification, we use sentences of known classes to classify whole documents, using a novel approach whereby documents are divided dynamically into chunks and classification is based on the semantic contributions of the different chunks in the document. This dynamic chunking approach can also be investigated for sentiment mining in other languages. Finally, we propose a hierarchical classification scheme that uses the results of the sentence-level classifier as input to the document-level classifier, an approach that has not previously been investigated for Arabic documents. We also pinpoint the various challenges faced by sentiment mining for Arabic texts and propose suggestions for its development. We demonstrate promising results with our sentence-level approach, and our document-level experiments show, with high accuracy, that it is feasible to extract the sentiment of an Arabic document based on the classes of its sentences.
Citations: 186
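The hierarchical idea in the abstract above, where sentence-level classes feed a document-level decision, can be sketched roughly as follows. This is a simplified stand-in, not the authors' method: the sentence score is a plain sum of root polarities from a dictionary, and documents are split into fixed-size chunks, whereas the paper describes dynamic chunking whose rule is not given in the abstract. The names `sentence_polarity`, `document_polarity`, and `chunk_size` are hypothetical, and the extraction of word roots is assumed to have been done elsewhere.

```python
def sign(value):
    """Return 1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sentence_polarity(roots, lexicon):
    """Classify a sentence from the polarities of its word roots.

    roots   -- list of word roots found in the sentence
    lexicon -- dict mapping a root to +1 (positive) or -1 (negative)
    """
    return sign(sum(lexicon.get(root, 0) for root in roots))


def document_polarity(sentence_labels, chunk_size=3):
    """Classify a document from its sentence labels, chunk by chunk.

    Fixed-size chunks are an assumption standing in for dynamic chunking.
    """
    chunk_votes = []
    for start in range(0, len(sentence_labels), chunk_size):
        chunk = sentence_labels[start:start + chunk_size]
        chunk_votes.append(sign(sum(chunk)))
    return sign(sum(chunk_votes))
```

Voting per chunk rather than over all sentences at once keeps one long, strongly worded passage from dominating the document label.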