Pub Date: 2023-12-01; Epub Date: 2024-02-05; DOI: 10.1109/icdm58522.2023.00173
Yanchao Tan, Zihao Zhou, Leisheng Yu, Weiming Liu, Chaochao Chen, Guofang Ma, Xiao Hu, Vicki S Hertzberg, Carl Yang
Personalized diagnosis prediction based on patients' electronic health records (EHR) is a promising yet challenging task for AI in healthcare. Existing studies typically ignore the heterogeneity of diseases across patients. For example, diabetes can lead to different complications in different patients (e.g., hyperlipidemia and circulatory disorder), which requires personalized diagnoses and treatments. Specifically, existing models fail to consider 1) the varying severity of the same disease in different patients, 2) complex interactions among syndromic diseases, and 3) the dynamic progression of chronic diseases. In this work, we perform personalized diagnosis prediction from EHR data by capturing disease severity, interaction, and progression. At the disease level, we enable personalized disease representations via severity-driven embeddings. At the visit level, we capture higher-order interactions among diseases that collectively affect a patient's health status via hypergraph-based aggregation. At the patient level, we devise a personalized generative model based on neural ordinary differential equations to capture the continuous-time disease progression underlying discrete and incomplete visits. Extensive experiments on two real-world EHR datasets show significant performance gains from our approach, with an average improvement of 10.70% in diagnosis prediction over state-of-the-art competitors.
Title: Enhancing Personalized Healthcare via Capturing Disease Severity, Interaction, and Progression.
Venue: Proceedings. IEEE International Conference on Data Mining, 2023, pp. 1349-1354
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10868667/pdf/
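The patient-level component above models disease progression in continuous time so that a latent state can be carried across irregular, incomplete visits. A minimal sketch of that idea, assuming a hypothetical hand-written derivative `decay` in place of the learned neural network and a fixed-step Euler solver; the names `euler_integrate` and `latent_trajectory` are illustrative, not from the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Dynamics = Callable[[float, Vector], Vector]


def euler_integrate(f: Dynamics, h0: Vector, t0: float, t1: float,
                    steps: int = 100) -> Vector:
    """Approximate h(t1) for dh/dt = f(t, h), h(t0) = h0, with explicit Euler steps."""
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        dh = f(t, h)
        h = [hi + dt * di for hi, di in zip(h, dh)]
        t += dt
    return h


def latent_trajectory(f: Dynamics, h0: Vector, visit_times: Sequence[float],
                      steps_per_gap: int = 100) -> List[Vector]:
    """Latent state at each visit; the gaps between visits may have any length."""
    states = [list(h0)]
    for t_prev, t_next in zip(visit_times, visit_times[1:]):
        states.append(euler_integrate(f, states[-1], t_prev, t_next, steps_per_gap))
    return states


def decay(t: float, h: Vector) -> Vector:
    """Hypothetical stand-in for a learned network: every dimension decays at rate 0.5."""
    return [-0.5 * x for x in h]


# Visits at irregular times 0.0, 1.0 and 3.5 (e.g., in years).
states = latent_trajectory(decay, [1.0, 2.0], [0.0, 1.0, 3.5])
```

In a learned model, a network conditioned on the patient would replace `decay`, and an adaptive ODE solver could replace the fixed-step Euler loop; the point here is only that the state at each visit is obtained by integrating through the gap, however long it is.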
Pub Date: 2023-12-01; Epub Date: 2024-02-05; DOI: 10.1109/icdm58522.2023.00127
Seungyeon Lee, Ruoqi Liu, Wenyu Song, Ping Zhang
Deep learning models have shown promising results in treatment effect estimation (TEE). However, most of them overlook variations in treatment outcomes among subgroups with distinct characteristics, which limits their ability to provide accurate estimates and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, SubgroupTE, which integrates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates the treatment effect for each subgroup, improving treatment effect estimation by accounting for the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients.
Title: Heterogeneous Treatment Effect Estimation with Subpopulation Identification for Personalized Medicine in Opioid Use Disorder.
Venue: Proceedings. IEEE International Conference on Data Mining, 2023, pp. 1079-1084
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883421/pdf/
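SubgroupTE estimates a separate treatment effect for each subgroup rather than one pooled effect. A minimal sketch of why that matters, assuming the subgroup labels are already known (SubgroupTE learns them jointly with a neural network, which is not attempted here) and assuming randomized treatment so that a plain difference in means is a valid estimate; `subgroup_effects` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def subgroup_effects(groups: Sequence[Hashable], treated: Sequence[int],
                     outcomes: Sequence[float]) -> Dict[Hashable, float]:
    """Difference-in-means treatment effect within each subgroup.

    Assumes treatment is randomized within each subgroup; observational data
    would need confounding adjustment, which is not done here.
    """
    # Per subgroup: [treated outcome sum, treated count, control outcome sum, control count]
    stats: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for g, t, y in zip(groups, treated, outcomes):
        s = stats[g]
        if t:
            s[0] += y
            s[1] += 1
        else:
            s[2] += y
            s[3] += 1
    return {
        g: s[0] / s[1] - s[2] / s[3]
        for g, s in stats.items()
        if s[1] > 0 and s[3] > 0  # an effect needs both arms present in the subgroup
    }


groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
treated = [1, 1, 0, 0, 1, 1, 0, 0]
outcomes = [3.0, 5.0, 1.0, 1.0, 2.0, 2.0, 2.0, 4.0]
effects = subgroup_effects(groups, treated, outcomes)
pooled = subgroup_effects(["all"] * 8, treated, outcomes)
```

On this toy data the pooled effect is 1.0, while subgroup "a" has an effect of 3.0 and subgroup "b" an effect of -1.0: the single pooled number hides the fact that treatment helps one group and harms the other.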
AI-powered medical imaging has recently attracted enormous attention due to its ability to provide rapid healthcare diagnoses. However, it usually suffers from a lack of high-quality datasets due to high annotation cost, inter-observer variability, human annotator error, and errors in computer-generated labels. Deep learning models trained on noisily labeled datasets are sensitive to the noise type and generalize poorly to unseen samples. To address this challenge, we propose a Robust Stochastic Knowledge Distillation (RoS-KD) framework, which mimics the notion of learning a topic from multiple sources to deter the model from learning noisy information. More specifically, RoS-KD learns a smooth, well-informed, and robust student manifold by distilling knowledge from multiple teachers trained on overlapping subsets of the training data. Our extensive experiments on popular medical imaging classification tasks (cardiopulmonary disease and lesion classification) using real-world datasets show the performance benefit of RoS-KD, its ability to distill knowledge from many popular large networks (ResNet-50, DenseNet-121, MobileNet-V2) into a comparatively small network, and its robustness to adversarial attacks (PGD, FGSM). In particular, with ResNet-18 as the student, RoS-KD achieves F1-score improvements of more than 2% and more than 4% over a recent competitive knowledge distillation baseline on lesion classification and cardiopulmonary disease classification, respectively. Additionally, on the cardiopulmonary disease classification task, RoS-KD outperforms most SOTA baselines by about 1% in AUC.
Title: RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging.
Authors: Ajay Jaiswal, Kumar Ashutosh, Justin F Rousseau, Yifan Peng, Zhangyang Wang, Ying Ding
Pub Date: 2022-11-01; DOI: 10.1109/icdm54844.2022.00118
Venue: Proceedings. IEEE International Conference on Data Mining, 2022, pp. 981-986
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10082964/pdf/nihms-1888486.pdf
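RoS-KD distills a student from several teachers, each trained on an overlapping subset of the training data. A minimal sketch of those two ingredients, assuming the common temperature-softened KL distillation loss and a uniform average of the teachers' distributions; RoS-KD's stochastic teacher selection and any weighting it uses are not reproduced, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def overlapping_subsets(n: int, k: int, frac: float, seed: int = 0) -> List[List[int]]:
    """Draw k random subsets of range(n), each of size int(frac * n); subsets may overlap."""
    rng = random.Random(seed)
    size = int(frac * n)
    return [sorted(rng.sample(range(n), size)) for _ in range(k)]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(student_logits: Sequence[float],
                          teacher_logits: Sequence[Sequence[float]],
                          temperature: float = 4.0) -> float:
    """T^2 * KL(mean teacher distribution || student distribution) at temperature T."""
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    target = [sum(col) / len(teacher_probs) for col in zip(*teacher_probs)]
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(target, student) if p > 0)
    return temperature ** 2 * kl
```

Each teacher would be trained on one of the index lists returned by `overlapping_subsets`; averaging their softened outputs means a label error seen by only some teachers is diluted in the student's target.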
Unsupervised Domain Adaptation (UDA) provides a promising solution for learning without supervision by transferring knowledge from relevant source domains with accessible labeled training data. Existing UDA solutions rely on clean source-domain training data with a short-tail distribution, and they can be fragile when the source data is corrupted, either inherently or through adversarial attacks. In this work, we propose an effective framework that addresses UDA from corrupted source domains in a principled manner. Specifically, we ensemble knowledge from multiple domain-invariant models, each learned on a random partition of the training data. To further address the distribution shift from the source to the target domain, we refine each learned model via mutual information maximization, which adaptively obtains high-confidence predictive information about the target domain. Extensive empirical studies demonstrate that the proposed approach is robust against various types of poisoned-data attacks while achieving high asymptotic performance on the target domain.
Title: Robust Unsupervised Domain Adaptation from A Corrupted Source.
Authors: Shuyang Yu, Zhuangdi Zhu, Boyang Liu, Anil K Jain, Jiayu Zhou
Pub Date: 2022-11-01; DOI: 10.1109/icdm54844.2022.00171
Venue: Proceedings. IEEE International Conference on Data Mining, 2022, pp. 1299-1304
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10097501/pdf/nihms-1888097.pdf
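The refinement step above maximizes mutual information between unlabeled target inputs and the model's predictions. A minimal sketch, assuming the widely used batch estimate I(X; Ŷ) ≈ H(mean prediction) − mean H(prediction) over softmax outputs; the paper's exact objective may differ, and `information_maximization` is a hypothetical name.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def information_maximization(probs: List[Sequence[float]]) -> float:
    """Batch estimate of I(X; Y_hat) = H(mean prediction) - mean H(prediction)."""
    n = len(probs)
    marginal = [sum(col) / n for col in zip(*probs)]
    conditional = sum(entropy(p) for p in probs) / n
    return entropy(marginal) - conditional


confident_diverse = [[1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
```

The three toy batches score ln 2, 0 and 0: the per-sample entropy term rewards confident predictions, while the marginal-entropy term keeps the model from assigning every target sample to one class. Training would minimize the negative of this value on target batches.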
Pub Date: 2021-12-01; Epub Date: 2022-01-24; DOI: 10.1109/icdm51629.2021.00147
Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Sivasubramanium Bhavani, Joyce C Ho
Tensor factorization has proven to be an efficient unsupervised learning approach for health data analysis, especially for computational phenotyping, where high-dimensional Electronic Health Records (EHRs) containing patients' histories of medical procedures, medications, diagnoses, lab tests, etc., are converted into meaningful and interpretable medical concepts. Federated tensor factorization distributes the tensor computation to multiple workers under the coordination of a central server, which enables phenotypes to be learned jointly across multiple hospitals while preserving the privacy of patient information. However, because they depend on a central server, existing federated tensor factorization algorithms have a single point of failure that is not only exposed to external attacks but also limits the number of clients that can share information with the server under restricted uplink bandwidth. In this paper, we propose CiderTF, a communication-efficient decentralized generalized tensor factorization method that reduces the uplink communication cost through a four-level communication reduction strategy designed for generalized tensor factorization, which can model different tensor distributions with multiple kinds of loss functions. Experiments on two real-world EHR datasets demonstrate that CiderTF achieves comparable convergence while reducing communication by up to 99.99%.
Title: Communication Efficient Tensor Factorization for Decentralized Healthcare Networks.
Venue: Proceedings. IEEE International Conference on Data Mining, 2021, pp. 1216-1221
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652777/pdf/nihms-1846243.pdf
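"Generalized" tensor factorization here means the same CP model can be fitted under different elementwise losses, depending on the data distribution. A minimal sketch of that flexibility, assuming a rank-R CP model of a dense 3-way tensor (e.g., patient × diagnosis × medication) and two standard choices, Gaussian (least squares) and Bernoulli with a logit link; the decentralized, communication-reduced optimization in CiderTF is not shown, and all names are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]
Loss = Callable[[float, float], float]


def cp_entry(A: Matrix, B: Matrix, C: Matrix, i: int, j: int, k: int) -> float:
    """CP model value m_ijk = sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def gaussian_loss(x: float, m: float) -> float:
    """Squared error, suited to real-valued entries."""
    return (x - m) ** 2


def bernoulli_logit_loss(x: float, m: float) -> float:
    """Negative log-likelihood of binary x with P(x = 1) = sigmoid(m): log(1 + e^m) - x * m."""
    softplus = max(m, 0.0) + math.log1p(math.exp(-abs(m)))  # stable log(1 + e^m)
    return softplus - x * m


def total_loss(X: Tensor3, A: Matrix, B: Matrix, C: Matrix, loss: Loss) -> float:
    """Sum a chosen elementwise loss over every entry of the dense tensor X."""
    return sum(loss(X[i][j][k], cp_entry(A, B, C, i, j, k))
               for i in range(len(X))
               for j in range(len(X[0]))
               for k in range(len(X[0][0])))
```

Swapping `gaussian_loss` for `bernoulli_logit_loss` changes the assumed distribution (for example, binary presence or absence of a diagnosis) without changing the factor matrices or the model itself.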
Pub Date: 2021-12-01; DOI: 10.1109/icdm51629.2021.00097
Chengxi Zang, Fei Wang
Contrastive learning has demonstrated promising performance in image and text domains in both self-supervised and supervised settings. In this work, we extend the supervised contrastive learning framework to clinical risk prediction problems based on longitudinal electronic health records (EHR). We propose a general supervised contrastive loss $\mathcal{L}_{\text{Contrastive Cross Entropy}} + \lambda \mathcal{L}_{\text{Supervised Contrastive Regularizer}}$ for learning both binary classification (e.g., in-hospital mortality prediction) and multi-label classification (e.g., phenotyping) in a unified framework. Our supervised contrastive loss puts into practice the key idea of contrastive learning, namely pulling similar samples closer and pushing dissimilar ones apart, simultaneously through its two components: $\mathcal{L}_{\text{Contrastive Cross Entropy}}$ contrasts samples with learned anchors that represent positive and negative clusters, and $\mathcal{L}_{\text{Supervised Contrastive Regularizer}}$ contrasts samples with each other according to their supervised labels. We propose two versions of this supervised contrastive loss, and our experiments on real-world EHR data demonstrate that the proposed loss functions improve the performance of strong baselines and even state-of-the-art models on benchmark tasks for clinical risk prediction. Our loss functions work well with the extremely imbalanced data common in clinical risk prediction problems, and they can easily replace the (binary or multi-label) cross-entropy loss adopted in existing clinical predictive models. The PyTorch code is released at https://github.com/calvin-zcx/SCEHR.
Title: SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records.
Venue: Proceedings. IEEE International Conference on Data Mining, 2021, pp. 857-866
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9692209/pdf/nihms-1847610.pdf
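The regularizer in SCEHR contrasts samples with each other according to their labels. A minimal sketch of that idea, assuming the standard supervised contrastive (SupCon) formulation on L2-normalized embeddings with temperature `tau`; SCEHR's own terms (including the learned anchors of the cross-entropy component) may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                labels: Sequence[int], tau: float = 0.1) -> float:
    """SupCon-style loss: pull same-label samples together, push the others apart."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner are skipped
        others = [a for a in range(n) if a != i]
        m = max(sim[i][a] for a in others)  # log-sum-exp shift for stability
        log_denom = m + math.log(sum(math.exp(sim[i][a] - m) for a in others))
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]  # same label, same direction
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]      # same label, orthogonal
```

In training, the combined objective would take the form `ce_term + lam * supervised_contrastive_loss(batch_embeddings, batch_labels)`, with `lam` playing the role of λ. This sketch covers the binary-label case only; multi-label targets would need a different notion of "same label".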
Pub Date: 2017-11-01; Epub Date: 2017-12-18; DOI: 10.1109/ICDM.2017.114
Biwei Huang, Kun Zhang, Jiji Zhang, Ruben Sanchez-Romero, Clark Glymour, Bernhard Schölkopf
We address two important issues in causal discovery from nonstationary or heterogeneous data, where parameters associated with a causal structure may change over time or across data sets. First, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, given a causal mechanism that varies over time or across data sets and whose qualitative structure is known, we aim to extract from data a low-dimensional and interpretable representation of the main components of the changes. For this purpose we develop a novel kernel embedding of nonstationary conditional distributions that does not rely on sliding windows. Second, the embedding also leads to a measure of dependence between the changes of causal modules that can be used to determine the directions of many causal arrows. We demonstrate the power of our methods with experiments on both synthetic and real data.
Title: Behind Distribution Shift: Mining Driving Forces of Changes and Causal Arrows.
Venue: Proceedings. IEEE International Conference on Data Mining, 2017, pp. 913-918
Learning accurate probabilistic models from data is crucial in many practical data mining tasks. In this paper, we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered an extension of BBQ [20], a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) [27]. ENIR is designed to address the key limitation of IsoRegC: its assumption that calibrated probabilities are monotonically non-decreasing in the classifier's predictions. Like BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It improves the calibration of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, running in O(N log N) time, where N is the number of samples.
Title: Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models.
Authors: Mahdi Pakdaman Naeini, Gregory F Cooper
Pub Date: 2016-12-01; DOI: 10.1109/ICDM.2016.0047
Venue: Proceedings. IEEE International Conference on Data Mining, 2016, pp. 360-369
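ENIR extends the isotonic-regression calibrator IsoRegC, whose strict monotonicity constraint is the limitation noted above. A minimal sketch of the IsoRegC baseline only (not ENIR), assuming the standard pool-adjacent-violators (PAV) algorithm: sort by classifier score (the O(N log N) step), then merge adjacent blocks whose mean labels violate monotonicity. ENIR, as its name indicates, instead ensembles near-isotonic models that penalize rather than forbid such violations; that part is not shown, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_isotonic_calibrator(scores: Sequence[float],
                            labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators (PAV) fit of a non-decreasing map score -> P(y = 1).

    Returns (uppers, probs): the largest score in each pooled block and the
    block's calibrated probability (its mean label).
    """
    # Each block is [label_sum, count, largest_score]; sorting dominates at O(N log N).
    blocks: List[List[float]] = []
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][2] == s:
            blocks[-1][0] += y  # tied scores always share one block
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            y_sum, count, s_max = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score: float, uppers: List[float], probs: List[float]) -> float:
    """Step-function lookup; scores above the largest training score use the last block."""
    idx = bisect_left(uppers, score)
    return probs[min(idx, len(probs) - 1)]


uppers, probs = fit_isotonic_calibrator([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
```

Here the label 1 at score 0.2 followed by two 0s is a monotonicity violation, so PAV pools scores 0.2 to 0.4 into one block with probability 1/3. A near-isotonic fit could keep part of that local structure instead of flattening it.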
Pub Date: 2016-12-01; Epub Date: 2017-02-02; DOI: 10.1109/ICDM.2016.0146
Jingchao Ni, Wei Cheng, Wei Fan, Xiang Zhang
Joint clustering of multiple networks has been shown to be more accurate than clustering each network separately. Many multi-view and multi-domain network clustering methods have been developed for joint multi-network clustering. These methods typically assume that all networks share a common clustering structure and that different networks provide complementary information about this underlying structure. However, this assumption is too strict for many emerging real-life applications, where multiple networks have diverse data distributions. More commonly, the networks under consideration belong to different underlying groups, and only networks in the same group share similar clustering structures. Better clustering performance can be achieved by treating such groups differently. An ideal method should therefore automatically detect network groups so that networks in the same group share a common clustering structure. To address this problem, we propose a novel method, ComClus, that simultaneously groups and clusters multiple networks. ComClus treats node clusters as features of networks and uses them to differentiate network groups. Network grouping and clustering are coupled and mutually enhanced during the learning process. Extensive experimental evaluation on a variety of synthetic and real datasets demonstrates the effectiveness of our method.
{"title":"Self-Grouping Multi-Network Clustering.","authors":"Jingchao Ni, Wei Cheng, Wei Fan, Xiang Zhang","doi":"10.1109/ICDM.2016.0146","DOIUrl":"10.1109/ICDM.2016.0146","url":null,"PeriodicalId":74565,"journal":{"name":"Proceedings. IEEE International Conference on Data Mining","volume":" ","pages":"1119-1124"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/ICDM.2016.0146","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35161055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
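The ComClus abstract says node clusters serve as features that distinguish network groups. The sketch below illustrates only that idea under simplifying assumptions; it is not ComClus, which learns grouping and clustering jointly. It assumes one candidate node clustering per group is already known, describes each network by the fraction of its edges that fall inside the clusters of each candidate, and assigns the network to the best-fitting group. All helper names (`within_cluster_fraction`, `network_features`, `group_networks`, `block_network`) and the toy data are hypothetical.

```python
import numpy as np


def within_cluster_fraction(adj: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of a network's edge weight that falls inside the clusters given by `labels`."""
    same = labels[:, None] == labels[None, :]  # True where two nodes share a cluster
    total = adj.sum()
    return float(adj[same].sum() / total) if total > 0 else 0.0


def network_features(adjs, clusterings) -> np.ndarray:
    """Row i, column g: how well candidate clustering g explains network i."""
    return np.array([[within_cluster_fraction(a, c) for c in clusterings] for a in adjs])


def group_networks(adjs, clusterings) -> np.ndarray:
    """Assign each network to the group whose clustering fits it best (grouping step only)."""
    return network_features(adjs, clusterings).argmax(axis=1)


def block_network(labels: np.ndarray) -> np.ndarray:
    """Hypothetical noise-free toy network: two nodes are linked iff they share a label."""
    adj = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


if __name__ == "__main__":
    part_a = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # group A: nodes 0-3 vs 4-7
    part_b = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # group B: even vs odd nodes
    nets = [block_network(part_a), block_network(part_a),
            block_network(part_b), block_network(part_b)]
    print(group_networks(nets, [part_a, part_b]).tolist())  # [0, 0, 1, 1]
```

Here the first two networks fit partition A perfectly (feature row `[1.0, 1/3]`) and the last two fit partition B, so they separate into two groups. In a coupled method the candidate clusterings would themselves be re-estimated from the networks assigned to each group.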
Pub Date: 2016-12-01; Epub Date: 2017-02-02; DOI: 10.1109/ICDM.2016.0041
Dijun Luo, Zhouyuan Huo, Yang Wang, Andrew J Saykin, Li Shen, Heng Huang
Many recent scientific efforts have been devoted to constructing the human connectome from Diffusion Tensor Imaging (DTI) data in order to understand the large-scale brain networks that underlie higher-level cognition in humans. However, suitable computational tools for network analysis are still lacking in human brain connectivity research. To address this problem, we propose a novel probabilistic multi-graph decomposition model to identify consistent network modules from the brain connectivity networks of the studied subjects. First, we propose a new probabilistic graph decomposition model to address the high computational complexity of existing stochastic block models. We then extend this model to multiple networks/graphs to identify the modules shared across multiple brain networks by simultaneously incorporating multiple networks and predicting the hidden block state variables. We also derive an efficient optimization algorithm to solve the proposed objective and estimate the model parameters. We validate our method on both the weighted fiber connectivity networks constructed from DTI images and standard human face image clustering benchmark data sets. The empirical results demonstrate the superior performance of the proposed method.
{"title":"New Probabilistic Multi-Graph Decomposition Model to Identify Consistent Human Brain Network Modules.","authors":"Dijun Luo, Zhouyuan Huo, Yang Wang, Andrew J Saykin, Li Shen, Heng Huang","doi":"10.1109/ICDM.2016.0041","DOIUrl":"https://doi.org/10.1109/ICDM.2016.0041","url":null,"PeriodicalId":74565,"journal":{"name":"Proceedings. IEEE International Conference on Data Mining","volume":" ","pages":"301-310"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/ICDM.2016.0041","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36044857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
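The abstract positions its model against stochastic block models (SBMs) and targets modules shared across several brain networks. As background only, the sketch below scores a candidate module assignment shared by several graphs with the standard Bernoulli SBM profile log-likelihood, fitting a separate maximum-likelihood block-probability matrix for each graph. It is not the authors' probabilistic multi-graph decomposition model, whose formulation the abstract does not give; the function names, the `eps` clipping, and the toy graphs are hypothetical, and it assumes unweighted 0/1 adjacency matrices with zero diagonals (the paper's fiber networks are weighted).

```python
import numpy as np


def shared_sbm_log_likelihood(adjs, labels, n_blocks, eps=1e-9):
    """Bernoulli SBM log-likelihood of undirected 0/1 graphs (zero diagonal) that share
    one node-to-module assignment `labels`; each graph gets its own MLE block probabilities."""
    onehot = np.eye(n_blocks)[labels]            # n x K module membership matrix
    sizes = onehot.sum(axis=0)                   # number of nodes per module
    # ordered node pairs (i != j) between modules r and s
    pairs = np.outer(sizes, sizes) - np.diag(sizes)
    total = 0.0
    for adj in adjs:
        edges = onehot.T @ adj @ onehot          # ordered-pair edge counts between modules
        probs = np.divide(edges, pairs, out=np.zeros_like(edges), where=pairs > 0)
        probs = np.clip(probs, eps, 1.0 - eps)   # keep log() finite
        ll = edges * np.log(probs) + (pairs - edges) * np.log(1.0 - probs)
        total += 0.5 * ll.sum()                  # each undirected pair was counted twice
    return float(total)


def block_network(labels):
    """Hypothetical noise-free toy graph: two nodes are linked iff they share a label."""
    adj = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


if __name__ == "__main__":
    modules = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # true shared modules
    wrong = np.array([0, 1, 0, 1, 0, 1, 0, 1])    # a mismatched assignment
    graphs = [block_network(modules), block_network(modules)]
    good = shared_sbm_log_likelihood(graphs, modules, 2)
    bad = shared_sbm_log_likelihood(graphs, wrong, 2)
    print(good > bad)  # True
```

A higher score means the shared modules explain all graphs better; a practical method would optimize such an objective over assignments rather than compare a handful of candidates by hand.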