2010 IEEE International Conference on Data Mining Workshops: Latest Papers

Spatio-Temporal Symbolization of Multidimensional Time Series
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.86
S. Hidaka, Chen Yu
This study proposes a new symbolization algorithm for multidimensional time series. We view temporal sequences as observed data generated by a dynamical system, so the goal of symbolization is to estimate symbolic sequences that minimize the loss of information; in nonlinear physics, a partition with this property is called a generating partition. To exploit the theoretical properties of symbolic dynamics in data mining, our algorithm estimates symbols on multivariate time series by integrating both spatial and temporal information and by selecting the dimensions that contain useful information. The probabilistic symbolic sequences derived from our method can be used in various supervised and unsupervised data-mining tasks. To demonstrate this, we evaluate the algorithm on both simulated data and a real-world dataset. In both cases, the new algorithm outperforms alternative approaches.
Citations: 4
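For readers unfamiliar with symbolization, the sketch below shows a naive baseline, not the method of the paper above: each dimension is cut into equal-frequency bins independently, with no use of temporal information and no dimension selection. The function names and the choice of equal-frequency binning are assumptions made for illustration only.

```python
from bisect import bisect_right


def quantile_breakpoints(values, n_symbols):
    """Return n_symbols - 1 cut points that split values into equal-frequency bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_symbols] for k in range(1, n_symbols)]


def symbolize(series, n_symbols=3):
    """Map each value of a 1-D series to a symbol index in 0..n_symbols-1."""
    cuts = quantile_breakpoints(series, n_symbols)
    return [bisect_right(cuts, x) for x in series]


def symbolize_multivariate(rows, n_symbols=3):
    """Symbolize every dimension independently; rows is a list of equal-length tuples."""
    columns = list(zip(*rows))
    per_dimension = [symbolize(column, n_symbols) for column in columns]
    return list(zip(*per_dimension))
```

A baseline like this treats every dimension as equally informative, which is exactly the limitation the abstract says the proposed algorithm addresses.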
CUBS: Multivariate Sequence Classification Using Bounded Z-score with Sampling
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.38
A. Richardson, G. Kaminka, Sarit Kraus
Multivariate temporal sequence classification is an important and challenging task. Several attempts to address this problem exist, but none provides a full solution. In this paper we present CUBS: Classification Using Bounded Z-Score with Sampling. CUBS uses item set mining to produce frequent subsequences, and then selects the statistically significant subsequences among them to compose a classification model. We introduce an improved item set mining algorithm that solves the short sequence bias present in many item set mining algorithms. Unfortunately, the z-score normalization hinders pruning. We provide a bound on the z-score to address this issue. Computing the z-score normalization requires some statistical values of the data, which are gathered from a small sample of the database. The sampling distorts these values. We analyze this distortion and correct it. We evaluate CUBS for accuracy and scalability on a synthetic dataset and on two real-world datasets. The results demonstrate how the short subsequence bias is solved in the mining, and show how our bound and sampling technique enable speedup.
Citations: 5
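The CUBS abstract above does not spell out its normalization, so the following is only a minimal sketch of the general idea: raw support counts favor short subsequences, and a z-score computed within each length group removes that bias. Grouping by length, the use of the population standard deviation, and the name `z_scores_by_length` are all assumptions of this sketch.

```python
from statistics import mean, pstdev


def z_scores_by_length(support):
    """Normalize support counts within groups of equal subsequence length.

    support maps a subsequence (a tuple of items) to its frequency count.
    Returns a dict with one z-score per subsequence.
    """
    by_length = {}
    for seq, count in support.items():
        by_length.setdefault(len(seq), []).append(count)

    scores = {}
    for seq, count in support.items():
        group = by_length[len(seq)]
        mu, sigma = mean(group), pstdev(group)
        # A group with no spread carries no ranking information.
        scores[seq] = 0.0 if sigma == 0 else (count - mu) / sigma
    return scores
```

With raw counts, a length-1 subsequence seen 10 times outranks a length-2 subsequence seen 4 times; after normalization each is judged only against subsequences of its own length.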
Multiple Feature-Based Classifier and Its Application to Image Classification
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.82
Dong-Chul Park
This paper proposes a new image classification method based on a multiple feature-based classifier (MFC). MFC does not classify each datum using the entire set of feature vectors extracted from the original data in concatenated form; instead, it uses the group of features related to each feature vector separately. In the training stage, a confusion table is computed for each local classifier, which uses a specific feature vector group, to capture that classifier's accuracy. In the testing stage, the final classification result is obtained by applying weights corresponding to the confidence level of each local classifier. The proposed MFC algorithm is applied to the problem of image classification on a set of image data. The results demonstrate that the proposed MFC scheme can optimally enhance the classification accuracy of individual classifiers that each use a specific feature vector group.
Citations: 11
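The testing stage described in the MFC abstract above combines local classifiers through confidence weights. The sketch below shows one simple way such a combination could work, a weighted vote over predicted labels. How the paper derives its weights from the confusion tables is not stated in the abstract, so the weights here are supplied by the caller and the function name is hypothetical.

```python
def combine_local_classifiers(predictions, weights):
    """Return the label with the largest total confidence weight.

    predictions: one predicted class label per local classifier.
    weights: one confidence weight per local classifier (assumed here to be
             something like validation accuracy; the paper's exact rule is unknown).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)
```

In this scheme a single reliable local classifier can outvote two unreliable ones, which a plain majority vote would not allow.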
From Convex to Nonconvex: A Loss Function Analysis for Binary Classification
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.57
Lei Zhao, M. Mammadov, J. Yearwood
Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in applying regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, and logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, $\phi$-loss, ramp loss, normalized sigmoid loss, and the loss function of a two-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two binary classification algorithms, one for convex loss functions and the other for non-convex loss functions. A set of experiments is run on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially for noisy data sets with many outliers.
Citations: 42
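To make the convex versus non-convex distinction in the abstract above concrete, the sketch below writes several of the named losses as functions of the margin m = y·f(x), using common textbook forms. The scalings (for example, the ramp loss capped at 1 and the unscaled logistic loss) are assumptions and may differ from the paper's definitions; the paper's smoothed 0-1 loss is not reproduced because the abstract does not define it.

```python
import math


def zero_one(m):
    """0-1 loss: 1 for a misclassified point (margin <= 0), else 0."""
    return 1.0 if m <= 0 else 0.0


# Convex surrogates: unbounded as the margin becomes very negative.
def hinge(m):
    return max(0.0, 1.0 - m)


def square(m):
    return (1.0 - m) ** 2


def modified_square(m):
    return max(0.0, 1.0 - m) ** 2


def exponential(m):
    return math.exp(-m)


def logistic(m):
    return math.log(1.0 + math.exp(-m))


# Non-convex surrogates: bounded, so a single outlier contributes at most 1.
def sigmoid(m):
    return 1.0 / (1.0 + math.exp(m))


def ramp(m):
    return min(1.0, max(0.0, 1.0 - m))
```

For a badly misclassified outlier with m = -5, the hinge loss is 6 while the ramp and sigmoid losses stay at or below 1, which is the usual intuition for why bounded non-convex losses tolerate outliers better.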
Is Topotecan Effective at Killing Cancer Cells?
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.129
R. Santiago-Mozos, I. Khan, M. G. Madden
In this paper, we analyse the behaviour of osteosarcoma cancer cells that are either exposed to the anticancer agent Topotecan or not exposed to any external agent. To analyse cell lineage data encoded from time-lapse microscopy, we choose data mining tools that generate interpretable models of the data, and we address their statistical significance. We consider the mortality of unexposed cancer cells, the static and dynamic cytotoxic effects of the anticancer agent, the prediction of the clonal potential of resistant populations, and the differences between exposed and unexposed populations. We find that the anticancer agent affects cell dynamics and event ratios (death/division, etc.) in proportion to its concentration, but it is ineffective at stopping the proliferation of the cancer at all dosages considered. In addition, we observe that cells exposed to the anticancer agent have greater displacements over time, indicating a putative relationship between cytotoxic effect and cell motility.
Citations: 0
Fixed-Parameter Tractable Combinatorial Algorithms for Metabolic Networks Alignments
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.179
Qiong Cheng, Jinpeng Wei, A. Zelikovsky, M. Ogihara
The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of increasingly large and complex metabolic networks. In order to analyze accumulated data and reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks. But even finding similar networks is computationally challenging. Based on the properties of gene duplication and function sharing in biological networks, we have formulated the network alignment problem, which asks for the optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertion. In this paper we present fixed-parameter tractable combinatorial algorithms, which take into account the enzymes' functions and the similarity of arbitrary network topologies such as trees and arbitrary graphs, while allowing different types of vertex deletions. The proposed algorithms are fixed-parameter tractable in the size of the feedback vertex set or in its square, for the cases of disallowing or allowing deletions, respectively. We have developed the web service tool MetNetAligner, which aligns metabolic networks. We evaluated our results by a randomized P-value computation. In the computation, we followed two standard randomization procedures and further developed two other random graph generators that keep more stringent and consistent topology constraints. By comparing their distributions of significant alignment pairs, we observed that the more stringent the topology constraints of the random graph generator, the more pairs of significant alignments there are. We also performed pairwise mapping of all pathways for four organisms and found a set of statistically significant pathway similarities. We have applied the network alignment to identifying pathway holes, which result from inconsistencies and missing enzymes. MetNetAligner is available at http://alla.cs.gsu.edu:8080/MinePW/pages/gmapping/GMMain.html. Two random graph generators and the list of identified pathway holes are available online.
Citations: 0
Controlling Consistency in Top-N Recommender Systems
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.65
P. Cremonesi, R. Turrin
Recommender systems have become essential navigational tools for users to surf through vast on-line catalogs. However, recommender algorithms are often tuned to improve accuracy, without paying any attention to the consistency of the recommendations when small changes happen to the user profile or to the model. Consistency of recommendations is closely related to user satisfaction and trust. In this work we analyze how small changes in either the user profile or the recommender model may affect the consistency of Top-N recommendation systems. We also design two mechanisms that promote consistency without degrading the accuracy and novelty of recommendations. Finally, we investigate the consistency of Top-N recommendation algorithms over time by analyzing real data from a production IPTV recommender system.
Citations: 9
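The abstract above does not say how consistency is measured. One plausible proxy, assumed here purely for illustration, is the fraction of a user's top-N list that survives a small change to the profile or the model. The names `top_n` and `overlap_at_n` are hypothetical.

```python
def top_n(scores, n):
    """Return the n item ids with the highest predicted score (ties broken by id)."""
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]


def overlap_at_n(before, after, n):
    """Fraction of the top-n list that is unchanged between two score dicts."""
    kept = set(top_n(before, n)) & set(top_n(after, n))
    return len(kept) / n
```

A value of 1.0 means the recommended set is identical after the change; a value near 0 means the user sees an almost entirely new list, which is the kind of instability the paper sets out to control.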
IEEE ICDM 2010 Contest: TomTom Traffic Prediction for Intelligent GPS Navigation
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.51
M. Wojnarski, P. Góra, Marcin S. Szczuka, H. Nguyen, Joanna Swietlicka, Demetrios Zeinalipour-Yazti
In this foreword, we summarize the IEEE ICDM 2010 Contest: “TomTom Traffic Prediction for Intelligent GPS Navigation”. The challenge was held between Jun 22, 2010 and Sep 7, 2010 as an interactive on-line competition, using the TunedIT platform (http://tunedit.org). We present the scope of the ICDM contest series in general, the scope of this year’s contest, a description of its tasks, statistics about participation, and details about the TunedIT platform and the Traffic Simulation Framework. A detailed description of the winning solutions is part of this proceedings series.
Citations: 32
Paired Evaluators Method to Track Concept Drift: An Application for Hedge Funds Operations
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.131
Masabumi Furuhata, T. Mizuta, J. So
To deal with sudden, unexpected changes of circumstances, we propose a new forecasting method based on paired evaluators: the stable evaluator and the reactive evaluator. Together, these two evaluators are good at detecting consecutive concept drifts. We conduct back-testing on financial data to demonstrate the performance of our proposed forecasting method. The back-testing results show that our method is effective and robust, even during the late-2000s recession.
Citations: 2
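The abstract above names a stable and a reactive evaluator but gives no details, so the following is only a hypothetical sketch of how such a pair might track drift. Here the stable evaluator forecasts with the mean of everything it has kept, the reactive evaluator forecasts with the mean of a short recent window, and drift is declared when the reactive one has been more accurate often enough lately. The class name, the mean-based forecasts, and the `window` and `threshold` parameters are all assumptions.

```python
from collections import deque


class PairedForecaster:
    def __init__(self, window=5, threshold=3):
        self.threshold = threshold
        self.history = []                    # data kept by the stable evaluator
        self.recent = deque(maxlen=window)   # data kept by the reactive evaluator
        self.wins = deque(maxlen=window)     # 1 where reactive beat stable
        self.drifts = 0                      # number of drifts declared so far

    @staticmethod
    def _mean(values):
        return sum(values) / len(values)

    def forecast(self):
        """Forecast the next value with the stable evaluator (None before any data)."""
        return self._mean(self.history) if self.history else None

    def update(self, value):
        """Score both evaluators on the new value, then learn from it."""
        if self.history:
            stable_error = abs(value - self._mean(self.history))
            reactive_error = abs(value - self._mean(self.recent))
            self.wins.append(1 if reactive_error < stable_error else 0)
        self.history.append(value)
        self.recent.append(value)
        if sum(self.wins) >= self.threshold:
            # Drift: the stable evaluator restarts from the recent window only.
            self.history = list(self.recent)
            self.wins.clear()
            self.drifts += 1
```

On a series that jumps from 1.0 to 10.0, a forecaster without the reset would settle near the overall mean, whereas this pair discards the stale history and converges to the new level.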
W-LEACH: Weighted Low Energy Adaptive Clustering Hierarchy Aggregation Algorithm for Data Streams in Wireless Sensor Networks
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.28
Hanady M. Abdulsalam, Layla K. Kamel
Many recent applications deal with continuous flows of data (data streams). One important application area based on data streams is that of Wireless Sensor Networks (WSNs). Since sensors have a limited lifetime, developing algorithms for aggregating sensor data is an important concern in the area of WSNs. We present W-LEACH, a data-stream aggregation algorithm for WSNs that extends the LEACH algorithm by Heinzelman et al. W-LEACH is able to handle non-uniform networks as well as uniform networks, while not affecting the network lifetime. It, instead, increases the average lifetime for sensors. We simulate our algorithm to evaluate its performance. Results show that W-LEACH increases the network lifetime and the average lifetime for sensors for uniform and non-uniform WSNs.
Citations: 90
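The W-LEACH weighting is not described in the abstract above, so this sketch covers only the cluster-head election of the original LEACH protocol that it extends. In LEACH, a node that has not yet served as cluster head in the current epoch elects itself in round r with probability T = p / (1 - p · (r mod 1/p)), where p is the desired fraction of cluster heads. The function names and the explicit `rng` argument are choices made for this example.

```python
import random


def leach_threshold(p, round_number):
    """Election threshold for a node still eligible in the current epoch.

    p: desired fraction of nodes acting as cluster head in each round.
    round_number: index of the current round, starting at 0.
    """
    epoch_length = round(1 / p)
    return p / (1 - p * (round_number % epoch_length))


def elect_cluster_heads(eligible_nodes, p, round_number, rng):
    """Each eligible node independently becomes cluster head if its draw is below the threshold."""
    threshold = leach_threshold(p, round_number)
    return [node for node in eligible_nodes if rng.random() < threshold]
```

The threshold rises over an epoch and reaches 1 in its last round, so every node that has not yet served is forced to take a turn, which is how LEACH spreads the energy cost of being cluster head across the network.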