Recent advances in metagenomic sequencing methods have made it possible to directly analyze the microbial communities within the human body. To understand how microbial communities adapt, develop, and interact over time with the human body and the surrounding environment, a critical step is the inference of interactions among different microbes directly from sequencing data. However, metagenomics data is both compositional and high-dimensional in nature. Consequently, new approaches that can accurately and robustly estimate the interactions among microbial species are needed to analyze such data. To this end, we propose a novel framework called Microbial Time-series Prior Lasso (MTPLasso), which integrates sparse linear regression with microbial co-occurrences and associations obtained from the scientific literature and cross-sectional metagenomics data. We show that MTPLasso outperforms existing models in terms of precision and recall, as well as accuracy in inferring interaction types. Finally, the interaction networks we infer from human gut data demonstrate credible results when compared against real data.
{"title":"Inferring Microbial Interactions from Metagenomic Time-series Using Prior Biological Knowledge","authors":"Chieh Lo, R. Marculescu","doi":"10.1145/3107411.3107435","DOIUrl":"https://doi.org/10.1145/3107411.3107435","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130511934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
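As an illustration of how prior knowledge can enter a sparse linear regression, the sketch below fits a lasso with per-coefficient penalty weights by coordinate descent. This is a generic weighted lasso, not the MTPLasso formulation, which the abstract does not spell out. The function names, the `weights` argument (a smaller weight standing in for an interaction that has prior support), and the toy data are all assumptions made for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent. X is a list of rows, y a list of targets."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * weights[j]) / z if z > 0 else 0.0
    return beta


# Hypothetical toy data: two predictors (e.g. abundances of two taxa)
# and one response (e.g. the change in abundance of a third taxon).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.3, 1.0, 0.3]

uniform = weighted_lasso(X, y, lam=0.2, weights=[1.0, 1.0])
with_prior = weighted_lasso(X, y, lam=0.2, weights=[1.0, 0.25])
print(uniform, with_prior)
```

With uniform weights the weak second coefficient is shrunk to exactly zero; lowering its weight, as one might for an interaction reported in the literature, lets it survive the penalty.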
Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation, and studies of the natural history of an infection. Their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive, and many algorithms have been proposed for phylogenetic analysis of typing data, such as the goeBURST algorithm. These algorithms, however, must be rerun from scratch whenever new data becomes available. We address this issue by proposing a dynamic version of the goeBURST algorithm. Experimental results show that this new version is efficient at integrating new data and updating inferred evolutionary patterns, improving the update running time by at least one order of magnitude.
{"title":"Dynamic Phylogenetic Inference for Sequence-based Typing Data","authors":"Alexandre P. Francisco, M. Nascimento, Cátia Vaz","doi":"10.1145/3107411.3108214","DOIUrl":"https://doi.org/10.1145/3107411.3108214","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"25 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120856937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
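To make the idea of updating a tree instead of rebuilding it concrete, the sketch below maintains a minimum spanning tree over allelic profiles under Hamming distance. It relies on a textbook property: when one vertex is added, a minimum spanning tree of the enlarged graph can be found among the old tree edges plus the edges incident to the new vertex. This is not the authors' dynamic goeBURST algorithm, which the abstract does not describe, and it leaves out goeBURST's tie-breaking rules among equal-distance links. All function names and profiles are hypothetical.

```python
def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def kruskal(num_nodes, edges):
    """Minimum spanning tree of (weight, u, v) edges via union-find."""
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


def build_tree(profiles):
    """From-scratch tree: considers all n*(n-1)/2 profile pairs."""
    n = len(profiles)
    edges = [(hamming(profiles[i], profiles[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    return kruskal(n, edges)


def add_profile(profiles, tree, new_profile):
    """Incremental update: only the n-1 old tree edges and the n edges
    touching the new profile are candidates."""
    new_id = len(profiles)
    candidates = list(tree) + [(hamming(p, new_profile), i, new_id)
                               for i, p in enumerate(profiles)]
    return profiles + [new_profile], kruskal(new_id + 1, candidates)


# Hypothetical 4-locus profiles.
profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (3, 3, 3, 3)]
tree = build_tree(profiles)
profiles, tree = add_profile(profiles, tree, (1, 2, 2, 2))
print(tree)
```

The update sorts roughly 2n candidate edges rather than all pairs, which is where the saving over a full rebuild comes from in this simplified setting.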
Computing median trees from gene trees using path-difference metrics has provided several credible species tree estimates. Similar to these metrics is the cophenetic family of metrics, which originates from a dendrogram comparison metric introduced more than 50 years ago. Despite the tradition and appeal of the cophenetic metrics, the problem of computing median trees under this family of metrics has not been analyzed. Like other standard median tree problems relevant in practice, as we show here, this problem is also NP-hard. NP-hard median tree problems have been successfully addressed by local search heuristics that solve thousands of instances of a corresponding local search problem. For the local search problem under a cophenetic metric, the best known (naive) algorithm has a time complexity that is typically prohibitive for effective heuristic searches. Focusing on the Manhattan norm (Manhattan cophenetic metric), we describe an efficient algorithm for this problem that improves on the naive solution by a factor of n, where n is the size of the input trees. We demonstrate the performance of our local search algorithm in a comparative study using published empirical data sets.
{"title":"Cophenetic Median Trees Under the Manhattan Distance","authors":"Alexey Markin, O. Eulenstein","doi":"10.1145/3107411.3107443","DOIUrl":"https://doi.org/10.1145/3107411.3107443","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"39 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120885023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
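For readers unfamiliar with the metric, the sketch below computes the Manhattan cophenetic distance between two small rooted trees. It assumes the usual definition: the cophenetic value of a pair of leaves is the depth of their lowest common ancestor, the value of a leaf paired with itself is that leaf's depth, and the distance is the L1 norm of the difference between the two resulting vectors. The nested-tuple tree encoding and the function names are choices made for this example. This is the direct pairwise computation, not the efficient local search algorithm the abstract refers to.

```python
def cophenetic_vector(tree):
    """Map each unordered leaf pair (and each leaf with itself) to its
    cophenetic value. Trees are nested tuples with string leaf labels."""
    values = {}

    def visit(node, depth):
        if isinstance(node, str):
            values[(node, node)] = depth  # leaf paired with itself: its depth
            return [node]
        groups = [visit(child, depth + 1) for child in node]
        # Leaves from different children meet at this node.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for x in groups[a]:
                    for y in groups[b]:
                        values[tuple(sorted((x, y)))] = depth
        return [leaf for group in groups for leaf in group]

    visit(tree, 0)
    return values


def manhattan_cophenetic_distance(tree_a, tree_b):
    """L1 distance between the cophenetic vectors of two trees
    over the same leaf set."""
    vec_a, vec_b = cophenetic_vector(tree_a), cophenetic_vector(tree_b)
    if vec_a.keys() != vec_b.keys():
        raise ValueError("trees must have the same leaf set")
    return sum(abs(vec_a[key] - vec_b[key]) for key in vec_a)


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(manhattan_cophenetic_distance(balanced, caterpillar))  # prints 7
```

Each call touches every leaf pair, so evaluating many candidate trees this way during a local search becomes expensive quickly, which is the cost the abstract calls prohibitive.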
Clinical Decision Support (CDS) systems are widely used to support efficient evidence-based care and have become an important aspect of healthcare. CDS systems are complex and sometimes malfunction or exhibit anomalous behavior. We have previously shown how anomaly detection models can be used to successfully identify malfunctions in CDS systems. We have extended this work and applied two new anomaly detection models to CDS alert firing data from a large health system.
{"title":"Applying Bayesian Changepoint Model and Hierarchical Divisive Model for Detecting Anomalies in Clinical Decision Support Alert Firing","authors":"Soumi Ray, A. Wright","doi":"10.1145/3107411.3108200","DOIUrl":"https://doi.org/10.1145/3107411.3108200","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"89 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120903307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuefu Wang, Sujun Li, Wenjing Peng, Y. Mechref, Haixu Tang
Glycomics and glycotranscriptomics have emerged as two key high-throughput approaches to interrogating the glycome within specific cells, tissues, or organisms under specific conditions. Because glycotranscriptomic analysis uses the same experimental protocol as the whole-transcriptome sequencing (RNA-seq) commonly used in genomic research, glycotranscriptomic information can be conveniently extracted in silico for many biological samples from which RNA-seq data have been collected and made publicly available through large-scale projects such as The Cancer Genome Atlas (TCGA) project. However, glycomic data collection is constrained by specialized analytical tools that are less accessible to biological researchers. In this paper, we present a Bayesian sparse latent regression (BSLR) model for predicting quantitative glycan abundances from glycotranscriptomic data. The model is built using matched glycomic and glycotranscriptomic data collected from the same set of samples as training sets, and is then exploited to study the common properties of the training samples and to predict these properties (e.g., the glycan abundances) in similar samples from which only glycotranscriptomic data are available. The BSLR model assumes the glycomic and the glycotranscriptomic abundances are both modulated by a small number of independent latent variables, and thus can be constructed using only a relatively small number of training samples. When tested on simulated data, our approach achieves satisfactory performance using only 10-20 training samples. We also tested our model on five cancer cell lines and showed that the BSLR model can accurately predict glycan abundances from the transcription levels of glycan synthetic genes. Furthermore, the predicted glycan abundances can distinguish the metastatic cell line specifically targeting the brain from the remaining breast cancer cell lines as well as from a brain cancer cell line, with only slightly lower power than the glycan abundances observed in glycomic experiments, indicating that the BSLR prediction retains the variations of glycan abundances across different groups of samples from their glycotranscriptomic data.
{"title":"A Sparse Latent Regression Approach for Integrative Analysis of Glycomic and Glycotranscriptomic Data","authors":"Xuefu Wang, Sujun Li, Wenjing Peng, Y. Mechref, Haixu Tang","doi":"10.1145/3107411.3107468","DOIUrl":"https://doi.org/10.1145/3107411.3107468","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129952794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cryo-electron microscopy (cryo-EM) is able to achieve high-resolution density maps. These density maps are at near-atomic resolution, and individual atoms can be seen as well as large secondary structure elements. However, it is challenging to extract the backbone structure information automatically and efficiently. This paper presents a novel method for creating a backbone trace and predicting the locations of Cα atoms for high-resolution density maps. It is a graph-based method that uses density along a backbone trace and features of secondary structure elements to find the optimal backbone trace and Cα atom locations. The method is mostly automatic, requiring only an initial user-determined threshold value and the primary secondary structure type. We tested our method on fifteen simulated maps at 3 Å resolution and four experimental cryo-EM density maps at resolutions between 2.6 and 3.1 Å. The results show that our method is able to generate a complete Cα backbone trace when the density map is not missing data at near-atomic resolution.
{"title":"A Graph Based Method for the Prediction of Backbone Trace from Cryo-EM Density Maps","authors":"P. Collins, Dong Si","doi":"10.1145/3107411.3107501","DOIUrl":"https://doi.org/10.1145/3107411.3107501","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134446852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cell neighbor determination is a significant component in the simulation of a metazoan embryo system since it influences a number of fundamental biological processes, such as cell signaling, migration, and proliferation. Traditional approaches to finding the neighbors of a cell, such as the Voronoi diagram, successfully accomplish this goal but are too time-consuming as the number of cells grows exponentially. In this paper, we propose a learning-based algorithm that determines the neighbors of specific cells in the metazoan embryo in real time. We decrease the computational time by four orders of magnitude and achieve an accuracy of 99.66%. For verification, the simulation results indicate that our model successfully reproduces the neighbor relationships in C. elegans Notch signaling pathways and in cell-cell squeeze force modeling of the cell division process.
{"title":"Cell Neighbor Determination in the Metazoan Embryo System","authors":"Z. Wang, Dali Wang, Husheng Li, Z. Bao","doi":"10.1145/3107411.3107465","DOIUrl":"https://doi.org/10.1145/3107411.3107465","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132272437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ability to rationally manipulate the transcriptional states of cells would be of great use in medicine and bioengineering. We have developed an algorithm, NetSurgeon, which uses genome-wide gene-regulatory networks to identify interventions that force a cell toward a desired expression state. We first validated NetSurgeon extensively on existing datasets. Next, we used NetSurgeon to select transcription factor deletions aimed at improving ethanol production in Saccharomyces cerevisiae cultures that are catabolizing xylose. We reasoned that interventions that move the transcriptional state of cells using xylose toward that of cells producing large amounts of ethanol from glucose might improve xylose fermentation. Some of the interventions selected by NetSurgeon successfully promoted a fermentative transcriptional state in the absence of glucose, resulting in strains with a 2.7-fold increase in xylose import rates, a 4-fold improvement in xylose integration into central carbon metabolism, or a 1.3-fold increase in ethanol production rate. We conclude by presenting an integrated model of transcriptional regulation and metabolic flux that will enable future efforts aimed at improving xylose fermentation to prioritize functional regulators of central carbon metabolism.
{"title":"Model-based Transcriptome Engineering","authors":"M. Brent","doi":"10.1145/3107411.3107454","DOIUrl":"https://doi.org/10.1145/3107411.3107454","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129472373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Z. Park, Ko-Woon Choi, D. Seo, S. Ryu, Jong Gu Lee, W. Oh
Inflammatory bowel disease (IBD), which is subdivided into Crohn's disease (CD) and ulcerative colitis (UC), is a chronic intestinal inflammatory disorder. Infliximab (IFX), an anti-TNF-α agent, has been prescribed for the treatment of IBD patients. However, some patients show no response or a loss of response to this agent. In this study, we sought to identify genetic variants associated with response to IFX. A total of 148 IBD patients from Yonsei University Health System who received IFX were classified by IBD subtype, excluding 12 patients who were unsuitable for this study. We also categorized the patients by IFX response into three groups: sustained response, loss of response, and nonresponse. Whole exome sequencing (WES) identified on average 35,000 variants per sample, including silent, missense, and nonsense mutations. We performed a GWAS using the WES data to find genetic variants associated with response to IFX. We identified only missense variants with suggestive evidence of association. In CD patients, AEBP1 (rs2537188) was associated with nonresponse, and PLA2R1 (rs35771982, rs3749117) and IDO2 (rs10109853) were associated with loss of response. In UC patients, AMACR (rs10941112, rs3195676) was associated with loss of response. Furthermore, we will conduct in vitro studies at the cellular level for functional analysis of these genetic variants.
{"title":"Genome-Wide Association Study (GWAS) for the Infliximab Responsiveness in Korean Inflammatory Bowel Disease Patients","authors":"Z. Park, Ko-Woon Choi, D. Seo, S. Ryu, Jong Gu Lee, W. Oh","doi":"10.1145/3107411.3108218","DOIUrl":"https://doi.org/10.1145/3107411.3108218","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132235677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective in silico compound prioritization is critical for identifying promising candidates in the early stages of drug discovery. Current methods typically focus on ranking compounds by a single property, for example, activity against a single target. However, compound selectivity is also a key property that should be considered simultaneously so as to reduce the likelihood of undesired side effects of future drugs. In this paper, we present dCPPP, a novel machine-learning-based differential compound prioritization method. dCPPP learns compound prioritization models that rank active compounds well and, at the same time, preferentially rank selective compounds higher via a bi-directional push strategy. The bi-directional push is enhanced with push powers that are determined by the ranking differences of selective compounds over multiple bioassays. Our experiments demonstrate that dCPPP achieves an overall 19.221% improvement in prioritizing selective compounds over baseline models.
{"title":"Differential Compound Prioritization via Bi-Directional Selectivity Push with Power","authors":"Junfeng Liu, Xia Ning","doi":"10.1145/3107411.3107486","DOIUrl":"https://doi.org/10.1145/3107411.3107486","url":null,"PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133765257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}