Xuan Yang, Keqin Li, Yang Zhang, Xi Yu, Junli Deng, Jianxiao Liu
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2022-12-06. DOI: https://doi.org/10.1109/BIBM55620.2022.9995264
EpiMCBN: A Kind of Epistasis Mining Approach Using MCMC Sampling Optimizing Bayesian Network
Effective and accurate detection of epistatic loci is of great significance for improving crop quality, treating disease, and related applications. Because Bayesian networks (BNs) are accurate and can model non-linear relationships, they have been widely used to construct networks of SNPs and phenotypes and thereby mine epistasis. However, the BN structure search space is so large that BN-based methods cannot process large-scale SNP data. In this work, we propose EpiMCBN, an epistasis mining method that optimizes Bayesian network learning with Markov Chain Monte Carlo (MCMC) sampling. First, we replace the space of network structures with the space of node orders composed of SNPs and the phenotype. The MCMC algorithm then samples multiple different initial orders in linear space or partial space. A Markov state transition matrix moves these initial samples along the Markov chain, yielding multiple order samples. Next, we use the $\alpha$-BICBN scoring function to score the Bayesian networks corresponding to these node orders. By estimating the probability that each edge occurs in these Bayesian networks, we obtain an approximate Bayesian network of SNPs and phenotype, from which we identify the epistatic loci affecting the phenotype. Finally, we compare EpiMCBN with currently popular epistasis mining algorithms on both simulated and real age-related macular degeneration (AMD) datasets. Experimental results show that EpiMCBN achieves better epistasis detection accuracy, a lower false positive rate, and a higher F1-score than the other methods. Availability and implementation: Source code and dataset are available at: http://122.205.95.139/EpiMCBN/.
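The pipeline described in the abstract (sample node orders, score the network each order implies, estimate edge frequencies) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made for illustration: a plain BIC score stands in for the $\alpha$-BICBN function, whose definition is not given here; a swap proposal with Metropolis acceptance stands in for the Markov state transition matrix; each order is scored by its single best network rather than by a sum over networks; and no burn-in is discarded. The names `toy_data`, `bic_local`, `best_network`, and `order_mcmc` are hypothetical, and the data are a synthetic binary toy set in which the phenotype depends on the interaction of SNP 0 and SNP 1.

```python
import itertools
import math
import random
from collections import Counter


def toy_data(n=200, seed=0):
    """Synthetic rows: columns 0-3 are binary SNPs, column 4 is the phenotype.
    The phenotype is SNP0 XOR SNP1 with 5% label noise (a purely epistatic toy effect)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        snps = [rng.randint(0, 1) for _ in range(4)]
        pheno = snps[0] ^ snps[1]
        if rng.random() < 0.05:
            pheno ^= 1
        rows.append(tuple(snps + [pheno]))
    return rows


def bic_local(data, child, parents):
    """Plain BIC of one node given a parent set (assumed stand-in for alpha-BICBN)."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
    child_states = len({r[child] for r in data})
    n_params = (child_states - 1) * len(parent_counts)
    return loglik - 0.5 * math.log(n) * n_params


def best_network(data, order, max_parents=2):
    """For a fixed order, each node picks its best-scoring parents among its predecessors."""
    total, edges = 0.0, set()
    for pos, child in enumerate(order):
        preds = order[:pos]
        candidates = [ps for k in range(max_parents + 1)
                      for ps in itertools.combinations(preds, k)]
        score, parents = max((bic_local(data, child, ps), ps) for ps in candidates)
        total += score
        edges.update((p, child) for p in parents)
    return total, edges


def order_mcmc(data, n_iter=100, max_parents=2, seed=0):
    """Metropolis sampling over node orders; returns the frequency of each directed edge."""
    rng = random.Random(seed)
    n_nodes = len(data[0])
    order = list(range(n_nodes))
    rng.shuffle(order)
    score, edges = best_network(data, order, max_parents)
    edge_counts = Counter()
    for _ in range(n_iter):
        i, j = rng.sample(range(n_nodes), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]  # swap two positions
        new_score, new_edges = best_network(data, proposal, max_parents)
        # Accept uphill moves always, downhill moves with probability exp(score difference).
        if new_score >= score or rng.random() < math.exp(new_score - score):
            order, score, edges = proposal, new_score, new_edges
        edge_counts.update(edges)
    return {edge: count / n_iter for edge, count in edge_counts.items()}


data = toy_data()
freqs = order_mcmc(data)
for edge, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(edge, round(freq, 2))
```

In this sketch, edges that link the phenotype column (index 4) to SNPs with a high estimated frequency would be read as candidate epistatic loci. Edge direction depends on the sampled order, so it is safer to look at which SNPs are adjacent to the phenotype than at the direction of any single edge.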