
2010 IEEE International Conference on Data Mining Workshops: Latest Papers

Trading Tests of Long-Term Market Forecast by Text Mining
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.60
K. Izumi, Takashi Goto, Tohgoroh Matsui
We propose a new approach for analyzing the Japanese government bond (JGB) market using text-mining technology. First, we extracted feature vectors from the monthly reports of the Bank of Japan (BOJ). Then, the trends in the JGB market were estimated by a regression analysis using the feature vectors. In a comparison with support vector regression and other methods, the proposed method forecast both the level and the direction of long-term market trends with higher accuracy. Moreover, our method showed high average annual returns in the implementation test.
Citations: 8
PTCR-Miner: Progressive Temporal Class Rule Mining for Multivariate Temporal Data Classification
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.171
Chao-Hui Lee, V. Tseng
Recently, multivariate temporal data classification has been widely applied in many fields, such as bio-signal analysis, stock prediction, and weather forecasting. Multivariate temporal data contains a mix of attribute types, such as numeric and categorical ones. However, most classification methods proposed in past research are not directly applicable to multivariate temporal data with multiple types. Additionally, existing methods provide no useful and readable rules for advanced classification analysis. In this paper, we propose a novel algorithm named Progressive Temporal Class Rule Miner (PTCR-Miner) for classification of multivariate temporal data with a rule-based design. The classification rules discovered by our algorithm follow the purification concept we define, so that they are comprehensible and intuitive for general users to apply to data classification. A series of experiments was conducted to evaluate our method with a multivariate temporal data simulator. The experimental results showed that PTCR-Miner performs effectively and efficiently on different simulated multivariate temporal datasets. Additionally, a real dataset related to asthma monitoring was also tested, and the results showed that our classification mechanism works stably for asthma attack prediction. This indicates that the discovered rules are helpful and comprehensible for data classification. Furthermore, the rule-based and flexible architecture makes PTCR-Miner more applicable to different application areas of multivariate temporal data classification.
Citations: 3
Unsupervised DRG Upcoding Detection in Healthcare Databases
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.108
Wei Luo, M. Gallagher
Diagnosis Related Group (DRG) upcoding is an anomaly in healthcare data that costs hundreds of millions of dollars in many developed countries. DRG upcoding is typically detected through resource-intensive auditing. As supervised modeling of DRG upcoding is severely constrained by the scope and timeliness of past audit data, we propose in this paper an unsupervised algorithm that filters data for potential identification of DRG upcoding. The algorithm has been applied to a hip replacement/revision dataset and a heart-attack dataset. The results are consistent with the assumptions held by domain experts.
Citations: 20
An Empirical Comparison of Platt Calibration and Inductive Confidence Machines for Predictions in Drug Discovery
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.111
Nikil Wale
During the early phase of drug discovery, machine learning methods are often used to select compounds to send for experimental screening. To accomplish this goal, any method that can provide estimates of the error rate for a given set of predictions is an extremely valuable tool. In this paper, we compare the Platt Calibration algorithm and the recently introduced Conformal algorithm for controlling the error rate in the sense of precision while preserving the ability to identify as many compounds as possible (recall) that are highly likely to be bio-active in a certain context. We empirically evaluate and compare the performance of Platt's Calibration and offline Mondrian ICM in the context of SVM-based classification on 75 distinct classification problems. We perform this evaluation in a real-world setting where the true class labels of compounds are unknown at the time of prediction and are only revealed after the biological experiment is completed. Our empirical results show that under this setting, offline Mondrian ICM and Platt Calibration are not able to bound precision rates very well on an absolute basis. Comparatively, Mondrian ICM, even though not theoretically designed to control precision directly, compares favorably with Platt Calibration for this task.
Citations: 0
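The Platt calibration step compared in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes SVM decision scores are already available, fits the sigmoid p = 1 / (1 + exp(a*f + b)) by plain gradient descent on the log loss with Platt's smoothed targets, and then selects compounds whose calibrated probability clears a threshold. The names `fit_platt`, `platt_prob`, and `select_compounds`, the toy scores, and the learning-rate settings are all hypothetical.

```python
import math


def platt_prob(score, a, b):
    """Calibrated probability of the positive class for one SVM decision score."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit the sigmoid parameters (a, b) by gradient descent on the log loss.

    Targets are smoothed as in Platt's formulation to limit overfitting:
    positives use (N+ + 1) / (N+ + 2), negatives use 1 / (N- + 2).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            p = platt_prob(f, a, b)
            # With z = a*f + b, d(log loss)/dz = t - p.
            grad_a += (t - p) * f
            grad_b += (t - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def select_compounds(scores, a, b, threshold):
    """Indices of compounds whose calibrated probability is at least threshold."""
    return [i for i, f in enumerate(scores) if platt_prob(f, a, b) >= threshold]


# Toy, non-separable calibration set (hypothetical SVM scores and activity labels).
cal_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
cal_labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(cal_scores, cal_labels)
picked = select_compounds(cal_scores, a, b, threshold=0.5)
```

Raising the threshold trades recall for precision. Note that the abstract reports that, in its real-world setting, neither method bounds precision well on an absolute basis, so a threshold chosen this way should be read as a heuristic rather than a guarantee.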
Efficient Dimensionality Reduction on Undersampled Problems through Incremental Discriminative Common Vectors
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.50
F. Ferri, Katerine Díaz-Chito, W. D. Villanueva
An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. Starting from the original batch method, an incremental formulation is given. The main idea is to minimize both matrix operations and space requirements. To this end, a straightforward per-sample correction is obtained, which makes it possible to set up an efficient online algorithm. The performance and the good properties of the original method are preserved, but with a very significant decrease in computational burden when used in dynamic contexts. Extensive experiments on several publicly available high-dimensional databases have been carried out to assess the properties of the proposed algorithms relative to previously proposed ones.
Citations: 4
Mining Users' Opinions Based on Item Folksonomy and Taxonomy for Personalized Recommender Systems
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.163
Huizhi Liang, Yue Xu, Yuefeng Li
Item folksonomy, or tag information, is a typical and prevalent kind of Web 2.0 information. Item folksonomy contains rich information about users' opinions on item classifications and descriptions. It can be used as another important information source for opinion mining. On the other hand, each item is associated with taxonomy information that reflects the viewpoints of experts. In this paper, we propose to mine users' opinions on items based on the item taxonomy developed by experts and the folksonomy contributed by users. In addition, we explore how to make personalized item recommendations based on users' opinions. Experiments conducted on real-world datasets collected from Amazon.com and CiteULike demonstrated the effectiveness of the proposed approaches.
Citations: 8
A Study on the Accuracy of Frequency Measures and Its Impact on Knowledge Discovery in Single Sequences
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.83
M. Gan, H. Dai
In knowledge discovery in single sequences, different results could be discovered from the same sequence when different frequency measures are adopted. It is natural to raise such questions as (1) do these frequency measures reflect actual frequencies accurately? (2) what impacts do frequency measures have on discovered knowledge? (3) are discovered results accurate and reliable? and (4) which measures are appropriate for reflecting frequencies accurately? In this paper, taking three major factors (anti-monotonicity, maximum-frequency and window-width restriction) into account, we identify inaccuracies inherent in seven existing frequency measures, and investigate their impacts on the soundness and completeness of two kinds of knowledge, frequent episodes and episode rules, discovered from single sequences. In order to obtain more accurate frequencies and knowledge, we provide three recommendations for defining appropriate frequency measures. Following the recommendations, we introduce a more appropriate frequency measure. Empirical evaluation reveals the inaccuracies and verifies our findings.
Citations: 9
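The observation that different frequency measures give different results on the same sequence can be illustrated with a small sketch. The two measures below (a fixed-width sliding-window count and a greedy count of non-overlapped occurrences of a serial episode) are common in the episode mining literature; the abstract does not name its seven measures, so treating these two as representative is an assumption. The function names and the toy sequence are hypothetical, and the window count is simplified to windows that lie fully inside the sequence.

```python
def contains_serial(window, episode):
    """True if the events of `episode` occur in `window` in the given order."""
    it = iter(window)
    # `e in it` advances the iterator, so later events must appear after earlier ones.
    return all(e in it for e in episode)


def window_count(seq, episode, width):
    """Number of sliding windows of `width` consecutive events that contain the episode."""
    return sum(
        contains_serial(seq[t:t + width], episode)
        for t in range(len(seq) - width + 1)
    )


def non_overlapped_count(seq, episode):
    """Greedy left-to-right count of occurrences that do not share events."""
    count, pos = 0, 0
    for event in seq:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count


seq = list("XABXX")
episode = ("A", "B")
by_window = window_count(seq, episode, width=3)      # windows XAB and ABX both match
by_occurrence = non_overlapped_count(seq, episode)   # only one A-then-B occurrence exists
```

Here a single occurrence of A followed by B is counted twice by the window measure and once by the occurrence measure, which is the kind of discrepancy the abstract refers to when it asks whether a measure reflects actual frequencies accurately.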
Learning Robust Bayesian Network Classifiers in the Space of Markov Equivalent Classes
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.91
Zhongfeng Wang, Zhihai Wang, Bin Fu
Tree Augmented Naïve Bayes (TAN) is a robust classification model. However, some researchers still attempt to improve its performance by considering the directions of edges, because the traditional learning method takes into account only the log likelihood, which is not suitable for learning classifiers, when learning a tree topological structure. In this paper, we analyze the search spaces of TAN and study the equivalence classes in them. Accordingly, we point out that it is not necessary to pay attention to the directions of dependence between conditional variables, because these directions play no role in maximizing the log conditional likelihood. For application, we propose a novel framework for learning TAN classifiers. Finally, we run experiments on the Weka platform using 45 problems from the University of California at Irvine repository. Experimental results show that classification accuracy and stability do not change statistically in our learning framework.
Citations: 0
An Integrative Scoring Approach to Identify Transcriptional Regulations Controlling Lung Surfactant Homeostasis
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.110
Minlu Zhang, C. Fang, Yan Xu, R. Bhatnagar, L. Lu
Transcriptional regulatory network identification is both a fundamental challenge in systems biology and an important practical application of data mining and machine learning. In this study, we propose a semi-supervised learning-based integrative scoring approach to tackle this challenge and predict transcriptional regulations. In validation, our approach outperforms a state-of-the-art label propagation method and reaches AUC scores above 0.96 on three datasets from microarray experiments. A map of the transcriptional regulatory network controlling lung surfactant homeostasis was constructed. The predicted and prioritized transcriptional regulations were further validated experimentally. Many other predicted novel regulations may serve as candidates for future experimental investigations.
Citations: 1
Computing Popular Places Using Graphics Processors
Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.45
Marta Fort, J. A. Sellarès, Nacho Valladares
Mobile devices make it possible to track and collect the trajectories of moving objects such as vehicles, people, or animals. There exists a well-known collection of patterns that can occur for a subset of trajectories. Specifically, we study the so-called Popular Places, that is, regions that are visited by many distinct moving objects. We propose algorithms that efficiently compute different forms of reporting Popular Places by taking advantage of the parallelism of the Graphics Processing Unit. We also describe how to visualize the reported solutions. Finally, we present and discuss experimental results obtained with the implementation of our algorithms.
Citations: 4
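The notion of a Popular Place in the abstract above (a region visited by many distinct moving objects) can be made concrete with a small CPU-only sketch. It is not the paper's GPU algorithm: the uniform grid of square cells, the point-in-cell notion of a visit, and the names `popular_cells`, `cell_size`, and `min_visitors` are all assumptions made for illustration.

```python
from collections import defaultdict


def popular_cells(trajectories, cell_size, min_visitors):
    """Map each grid cell to its number of distinct visitors, keeping popular cells only.

    `trajectories` maps an object id to a list of (x, y) sample points.
    A cell counts as visited by an object if at least one sample falls inside it.
    """
    visitors = defaultdict(set)
    for obj_id, points in trajectories.items():
        for x, y in points:
            cell = (int(x // cell_size), int(y // cell_size))
            visitors[cell].add(obj_id)  # a set ignores repeat visits by one object
    return {
        cell: len(ids)
        for cell, ids in visitors.items()
        if len(ids) >= min_visitors
    }


# Toy trajectories (hypothetical data).
tracks = {
    "a": [(0.5, 0.5), (1.5, 0.5)],
    "b": [(0.2, 0.8), (0.3, 0.9)],
    "c": [(1.7, 0.1)],
}
popular = popular_cells(tracks, cell_size=1.0, min_visitors=2)
```

Counting distinct object ids rather than raw samples is what separates a popular place from a merely dense one: object "b" contributes once to its cell even though it has two samples there.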