ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine: latest articles
Alternative splicing contributes significantly to proteomic diversity, and mis-regulation of splicing can cause disease in humans. Although both genomic and chromatin features have been shown to associate with splicing, the mechanisms by which various chromatin marks influence splicing are largely unclear. Moreover, it is not known whether the influence of specific genomic features on splicing is modulated by the chromatin context. Here we report a deep neural network (DNN) model for predicting exon inclusion based on comprehensive genomic and chromatin features. Our analysis in three cell lines shows that, while both genomic and chromatin features can predict splicing to varying degrees, genomic features are the primary drivers of splicing, and the predictive power of chromatin features can largely be explained by their correlation with genomic features; chromatin features do not make a substantial independent contribution to splicing predictability. However, our model identified specific interactions between chromatin and genomic features, suggesting that the effect of genomic elements may be modulated by chromatin context.
Chromatin and Genomic determinants of alternative splicing. Kun Wang, Kan Cao, Sridhar Hannenhalli. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2015, pp. 345-354. Published 2015-09-01. DOI: https://doi.org/10.1145/2808719.2808755
Clinicians in intensive care units (ICUs) rely on standardized scores as risk prediction models to predict a patient's vulnerability to life-threatening events. Current scales calculate scores from a fixed set of conditions collected within a specific time window. However, modern monitoring technologies generate complex, temporal, and multimodal patient data that conventional scales cannot fully utilize. Thus, a more sophisticated model is needed to account for individual characteristics and incorporate multiple temporal modalities for personalized risk prediction. Furthermore, most scales focus on adult patients. To address this deficiency, we propose a newly designed ICU risk prediction system, called icuARM-II, using a large-scale pediatric ICU database from Children's Healthcare of Atlanta. This novel database contains clinical data collected in 5,739 ICU visits from 4,975 patients. We propose a temporal association rule mining framework that lets clinicians predict risks based on all available patient conditions without being restricted by a fixed observation window. We also develop a new metric that rigorously assesses the reliability of all generated association rules. In addition, icuARM-II features an interactive user interface. Using icuARM-II, we showed a use case of short-term mortality prediction using lab test results, which demonstrates a potential new solution for reliable ICU risk prediction using personalized clinical data in a previously neglected population.
icuARM-II: improving the reliability of personalized risk prediction in pediatric intensive care units. Chih-Wen Cheng, Nikhil Chanani, Kevin Maher, Wang. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 211-219. Published 2014-09-01. DOI: 10.1145/2649387.2649440. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983419/pdf/nihms805837.pdf
Robust prediction models are important for numerous science, engineering, and biomedical applications. However, best-practice procedures for optimizing prediction models can be computationally complex, especially when choosing models from among hundreds or thousands of parameter choices. Computational complexity has further increased with the growth of data in these fields, concurrent with the era of "Big Data". Grid computing is a potential solution to the computational challenges of Big Data. Desktop grid computing, which uses idle CPU cycles of commodity desktop machines, coupled with commercial cloud computing resources, can enable research labs to gain easier and more cost-effective access to vast computing resources. We have developed omniClassifier, a multi-purpose prediction modeling application that provides researchers with a tool for conducting machine learning research within the guidelines of recommended best practices. omniClassifier is implemented as a desktop grid computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) middleware. In addition to describing implementation details, we use various gene expression datasets to demonstrate the potential scalability of omniClassifier for efficient and robust Big Data prediction modeling. A prototype of omniClassifier can be accessed at http://omniclassifier.bme.gatech.edu/.
omniClassifier: a Desktop Grid Computing System for Big Data Prediction Modeling. John H Phan, Sonal Kothari, May D Wang. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2014, pp. 514-523. Published 2014-09-01. DOI: 10.1145/2649387.2649439. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983434/pdf/nihms805844.pdf
Mahbubur Rahman, Rummana Bari, Amin Ahsan Ali, Moushumi Sharmin, Andrew Raij, Karen Hovsepian, Syed Monowar Hossain, Emre Ertin, Ashley Kennedy, David H Epstein, Kenzie L Preston, Michelle Jobes, J Gayle Beck, Satish Kedia, Kenneth D Ward, Mustafa al'Absi, Santosh Kumar
Stress can lead to headaches and fatigue, precipitate addictive behaviors (e.g., smoking, alcohol and drug use), and lead to cardiovascular diseases and cancer. Continuous assessment of stress from sensors can be used for timely delivery of a variety of interventions to reduce or avoid stress. We investigate the feasibility of continuous stress measurement via two field studies using wireless physiological sensors: a four-week study with illicit drug users (n = 40) and a one-week study with daily smokers and social drinkers (n = 30). We find that 11+ hours/day of usable data can be obtained in a four-week study. A significant learning effect is observed after the first week, and data yield continues to increase over time, even in the fourth week. We propose a framework to analyze sensor data yield and find that losses in the wireless channel are negligible; the main hurdle in further improving data yield is the attachment constraint. We show the feasibility of measuring stress in the minutes preceding events of interest and observe that sensor-derived stress rises prior to self-reported stress and smoking events.
Are We There Yet? Feasibility of Continuous Stress Assessment via Wireless Physiological Sensors. Mahbubur Rahman, Rummana Bari, Amin Ahsan Ali, Moushumi Sharmin, Andrew Raij, Karen Hovsepian, Syed Monowar Hossain, Emre Ertin, Ashley Kennedy, David H Epstein, Kenzie L Preston, Michelle Jobes, J Gayle Beck, Satish Kedia, Kenneth D Ward, Mustafa al'Absi, Santosh Kumar. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2014, pp. 479-488. Published 2014-01-01. DOI: 10.1145/2649387.2649433. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4374173/pdf/nihms-671146.pdf
Many text-mining studies have focused on the issue of named entity recognition and normalization, especially in the field of biomedical natural language processing. However, entity recognition is a complicated and difficult task in biomedical text. One particular challenge is to identify and resolve composite named entities, where a single span refers to more than one concept (e.g., BRCA1/2). Most bioconcept recognition and normalization studies have either ignored this issue, used simple ad-hoc rules, or only handled coordination ellipsis, which is only one of the many types of composite mentions studied in this work. No systematic methods for simplifying composite mentions have been previously reported, making a robust approach greatly needed. To this end, we propose a hybrid approach that integrates a machine learning model with a pattern identification strategy to identify the antecedent and conjunct regions of a concept mention, and then reassembles the composite mention using those identified regions. Our method, which we have named SimConcept, is the first method to systematically handle most types of composite mentions. Our method achieves high performance in identifying and resolving composite mentions for three fundamental biological entities: genes (89.29% in F-measure), diseases (85.52% in F-measure) and chemicals (84.04% in F-measure). Furthermore, our results show that using SimConcept can subsequently help improve the performance of gene and disease concept recognition and normalization.
SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedicine. Chih-Hsuan Wei, Robert Leaman, Zhiyong Lu. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2014, pp. 138-146. Published 2014-01-01. DOI: 10.1145/2649387.2649420. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384177/pdf/nihms673019.pdf
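To make the idea of a composite mention concrete, the sketch below expands the simplest case mentioned in the SimConcept abstract, a shared stem with slash-separated numeric suffixes such as BRCA1/2. It is only an illustration under stated assumptions: the regular expression and the function name `expand_composite` are hypothetical, and SimConcept itself combines a machine learning model with pattern identification rather than a single hand-written rule.

```python
import re

# Hypothetical pattern (not from the paper): an alphabetic stem followed by
# two or more numeric suffixes separated by "/", e.g. "BRCA1/2".
COMPOSITE = re.compile(r"^([A-Za-z]+)(\d+(?:/\d+)+)$")


def expand_composite(mention: str) -> list[str]:
    """Expand a simple composite mention into its individual concepts.

    Mentions that do not match the pattern are returned unchanged.
    """
    match = COMPOSITE.match(mention)
    if match is None:
        return [mention]
    stem, suffixes = match.groups()
    # Re-attach the shared stem to every numeric suffix.
    return [stem + suffix for suffix in suffixes.split("/")]


if __name__ == "__main__":
    print(expand_composite("BRCA1/2"))
    print(expand_composite("TP53"))
```

A rule like this handles only one narrow surface form; coordination ellipsis and the other composite types discussed in the abstract would need additional patterns or a learned model.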
The dynamic temporal regulatory effects of microRNA are not well known. We introduce a technique for integrating miRNA and mRNA time series microarray data with known disease pathology. The integrated analysis includes identifying both mRNA and miRNA that are significantly similar to the quantitative pathology. Potential regulatory miRNA/mRNA target pairs are identified through databases of both predicted and validated pairs. Finally, potential target pairs are filtered by examining the second derivatives of the fold changes over time. Our system was used on genome-wide microarray expression data of mouse lungs (n = 160) following aspiration of multi-walled carbon nanotubes. This system shows promise for readily identifying miRNA for further study as potential biomarkers.
Integrated miRNA and mRNA Analysis of Time Series Microarray Data. Julian Dymacek, Nancy Lan Guo. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2014, pp. 122-127. Published 2014-01-01. DOI: https://doi.org/10.1145/2649387.2649411
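The miRNA/mRNA abstract above mentions examining second derivatives of fold changes over time. A minimal sketch of how such a quantity could be estimated from sampled time points is given below, using a three-point finite difference that tolerates unevenly spaced samples. The function name `second_derivative` and the sample values are hypothetical, and the abstract does not state how the authors compute the derivative or what filtering criterion they apply to it.

```python
def second_derivative(times, values):
    """Estimate the second derivative at each interior time point.

    Uses the three-point formula for unevenly spaced samples:
    f''(t_i) ~ 2 * (right_slope - left_slope) / (t_{i+1} - t_{i-1}).
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if len(times) < 3:
        raise ValueError("at least three time points are required")
    estimates = []
    for i in range(1, len(times) - 1):
        left_slope = (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        right_slope = (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        estimates.append(2.0 * (right_slope - left_slope) / (times[i + 1] - times[i - 1]))
    return estimates


if __name__ == "__main__":
    # Made-up fold changes that follow t**2, so the curvature is constant.
    days = [0, 1, 3, 4]
    fold_changes = [0, 1, 9, 16]
    print(second_derivative(days, fold_changes))
```

The formula is exact for quadratic curves, which makes it easy to sanity-check before applying it to noisy expression data.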
Raghu Chandramohan, Po-Yen Wu, John H Phan, May D Wang
RNA-sequencing (RNA-seq) technology has emerged as the preferred method for quantification of gene and isoform expression. Numerous RNA-seq quantification tools have been proposed and developed, bringing us closer to developing expression-based diagnostic tests based on this technology. However, because of the rapidly evolving technologies and algorithms, it is essential to establish a systematic method for evaluating the quality of RNA-seq quantification. We investigate how different RNA-seq experimental designs (i.e., variations in sequencing depth and read length) affect various quantification algorithms (i.e., HTSeq, Cufflinks, and MISO). Using simulated data, we evaluate the quantification tools based on four metrics, namely: (1) total number of usable fragments for quantification, (2) detection of genes and isoforms, (3) correlation, and (4) accuracy of expression quantification with respect to the ground truth. Results show that Cufflinks is able to use the largest number of fragments for quantification, leading to better detection of genes and isoforms. However, HTSeq produces more accurate expression estimates. Moreover, each quantification algorithm is affected differently by varying sequencing depth and read length, suggesting that the selection of quantification algorithms should be application-dependent.
Systematic Assessment of RNA-Seq Quantification Tools Using Simulated Sequence Data. Raghu Chandramohan, Po-Yen Wu, John H Phan, May D Wang. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2013. Published 2013-09-01. DOI: 10.1145/2506583.2506648
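Two of the four metrics named in the RNA-seq abstract, correlation and accuracy with respect to the ground truth, can be illustrated with a short sketch. The abstract does not say which correlation coefficient or error measure the authors use, so Pearson correlation and mean absolute error are assumptions here, and the expression values are made up for illustration.

```python
import math


def pearson(estimated, truth):
    """Pearson correlation between estimated and true expression values."""
    n = len(truth)
    mean_e = sum(estimated) / n
    mean_t = sum(truth) / n
    covariance = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, truth))
    spread_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth))
    return covariance / (spread_e * spread_t)


def mean_absolute_error(estimated, truth):
    """Average absolute deviation of the estimates from the ground truth."""
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)


if __name__ == "__main__":
    # Hypothetical simulated expression levels and one tool's estimates.
    truth = [1.0, 2.0, 3.0, 4.0]
    estimated = [2.0, 4.0, 6.0, 8.0]
    print(pearson(estimated, truth), mean_absolute_error(estimated, truth))
```

The example shows why the two metrics are reported separately: estimates that are consistently scaled correlate perfectly with the truth while still being inaccurate in absolute terms.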
Sonal Kothari, John H Phan, Adeboye O Osunkoya, May D Wang
We propose a framework for studying visual morphological patterns across histopathological whole-slide images (WSIs). Image representation is an important component of computer-aided decision support systems for histopathological cancer diagnosis. Such systems extract hundreds of quantitative image features from digitized tissue biopsy slides and produce models for prediction. The performance of these models depends on the identification of informative features for selection of appropriate regions-of-interest (ROIs) from heterogeneous WSIs and for development of models. However, identification of informative features is hindered by the semantic gap between human interpretation of visual morphological patterns and quantitative image features. We address this challenge by using data mining and information visualization tools to study spatial patterns formed by features extracted from sub-sections of WSIs. Using ovarian serous cystadenocarcinoma (OvCa) WSIs provided by The Cancer Genome Atlas (TCGA), we show that (1) individual and (2) multivariate image features correspond to biologically relevant ROIs, and (3) supervised image feature selection can map histopathology domain knowledge to quantitative image features.
Biological Interpretation of Morphological Patterns in Histopathological Whole-Slide Images. Sonal Kothari, John H Phan, Adeboye O Osunkoya, May D Wang. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2012, pp. 218-225. Published 2012-10-01. DOI: 10.1145/2382936.2382964. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5859578/pdf/nihms807306.pdf
Jie Liu, Humberto Vidaillet, Elizabeth Burnside, David Page
Genome-wide association studies (GWAS) analyze genetic variation (SNPs) across the entire human genome, searching for SNPs that are associated with certain phenotypes, most often diseases, such as breast cancer. In GWAS, we seek a ranking of SNPs in terms of their relevance to the given phenotype. However, because certain SNPs are known to be highly correlated with one another across individuals, it can be beneficial to take into account these correlations when ranking. If a SNP appears associated with the phenotype, and we question whether this association is real, the extent to which its neighbors (correlated SNPs) also appear associated can be informative. Therefore, we propose CollectRank, a ranking approach which allows SNPs to reinforce one another via the correlation structure. CollectRank is loosely analogous to the well-known PageRank algorithm. We first evaluate CollectRank on synthetic data generated from a variety of genetic models under different settings. The numerical results suggest CollectRank can significantly outperform common GWAS methods at the cost of a small amount of extra computation. We further evaluate CollectRank on two real-world GWAS on breast cancer and atrial fibrillation/flutter, and CollectRank performs well in both studies. We finally provide a theoretical analysis that also suggests CollectRank's advantages.
Jie Liu, Humberto Vidaillet, Elizabeth Burnside, David Page. "A Collective Ranking Method for Genome-wide Association Studies." ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2012, pp. 313-320. Published 2012-10-01. DOI: 10.1145/2382936.2382976
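The CollectRank abstract above describes SNPs reinforcing one another through their correlation structure, loosely in the spirit of PageRank. The following is a minimal sketch of that general idea only, not the published CollectRank algorithm: it applies a generic personalized-PageRank-style smoothing to per-SNP association scores. The function name `propagate_scores`, the `damping` parameter, the use of absolute correlation as edge weight, and the toy data are all assumptions made for illustration.

```python
def propagate_scores(base_scores, correlation, damping=0.5, iterations=100):
    """Smooth per-SNP scores over a correlation graph (illustrative only).

    base_scores: list of initial association scores, one per SNP.
    correlation: square matrix (list of lists) of pairwise SNP correlations.
    damping:     assumed weight given to neighbours' scores (0 <= damping < 1).
    """
    n = len(base_scores)
    # Row-normalise absolute correlations, ignoring self-correlation,
    # so each SNP distributes its score across its correlated neighbours.
    weights = []
    for i in range(n):
        row = [abs(correlation[i][j]) if i != j else 0.0 for j in range(n)]
        total = sum(row)
        weights.append([w / total if total > 0 else 0.0 for w in row])

    scores = list(base_scores)
    for _ in range(iterations):
        # Each SNP keeps part of its own evidence and receives the rest
        # from neighbours in proportion to the normalised correlation.
        scores = [
            (1 - damping) * base_scores[j]
            + damping * sum(weights[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
    return scores


# Hypothetical toy data: SNPs 0 and 2 have the same initial score, but
# SNP 0's correlated neighbour (SNP 1) also looks associated, while
# SNP 2's neighbour (SNP 3) does not.
base = [0.8, 0.7, 0.8, 0.1]
corr = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
smoothed = propagate_scores(base, corr)
ranking = sorted(range(len(base)), key=lambda i: -smoothed[i])
```

In this toy setting SNP 0 ends up ranked above SNP 2 even though their initial scores are tied, which is the kind of neighbour-based reinforcement the abstract describes. How the actual method weights and combines evidence is not specified in the abstract.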
Li C Xue, Rafael A Jordan, Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar
Computational protein-protein docking is a valuable tool for determining the conformation of complexes formed by interacting proteins. Selecting near-native conformations from the large number of possible models generated by docking software presents a significant challenge in practice. We introduce a novel method for ranking docked conformations based on the degree of overlap between the interface residues of a docked conformation formed by a pair of proteins and the set of predicted interface residues between them. Our approach relies on a method, called PS-HomPPI, for reliably predicting protein-protein interface residues by taking into account information derived from both interacting proteins. PS-HomPPI infers the residues of a query protein that are likely to interact with a partner protein based on known interface residues of the homo-interologs of the query-partner protein pair, i.e., pairs of interacting proteins that are homologous to the query protein and partner protein. Our results on Docking Benchmark 3.0 show that the quality of the ranking of docked conformations using our method is consistently superior to that produced using ClusPro cluster-size-based and energy-based criteria for 61 out of the 64 docking complexes for which PS-HomPPI produces interface predictions. An implementation of our method for ranking docked models is freely available at: http://einstein.cs.iastate.edu/DockRank/.
Li C Xue, Rafael A Jordan, Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar. "Ranking Docked Models of Protein-Protein Complexes Using Predicted Partner-Specific Protein-Protein Interfaces: A Preliminary Study." ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2011, pp. 441-445. Published 2011-08-01. DOI: 10.1145/2147805.2147866. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4403796/pdf/nihms314851.pdf
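The docking abstract above ranks docked conformations by how well each model's interface residues overlap with a predicted interface. The following is a minimal sketch of that idea under stated assumptions: the abstract does not say which overlap measure is used, so the Jaccard index here is a stand-in, and the function names, residue identifiers, and model names are hypothetical.

```python
def interface_overlap(model_interface, predicted_interface):
    """Jaccard overlap between two collections of interface residue IDs.

    The Jaccard index is an assumed choice of overlap measure.
    """
    model_set = set(model_interface)
    predicted_set = set(predicted_interface)
    union = model_set | predicted_set
    if not union:
        return 0.0
    return len(model_set & predicted_set) / len(union)


def rank_docked_models(models, predicted_interface):
    """Return model names ordered from highest to lowest overlap score.

    models: dict mapping a model name to its set of interface residues.
    Ties are broken alphabetically by model name.
    """
    scored = [
        (interface_overlap(residues, predicted_interface), name)
        for name, residues in models.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical toy data: residue IDs written as "chain:position".
predicted = {"A:45", "A:46", "A:78", "A:80"}
models = {
    "model_1": {"A:45", "A:46", "A:78", "A:81"},
    "model_2": {"A:10", "A:11", "A:12"},
    "model_3": {"A:45", "A:46", "A:78", "A:80"},
}
ranking = rank_docked_models(models, predicted)
```

Here `model_3` matches the predicted interface exactly and ranks first, `model_1` shares three of five residues in the union, and `model_2` shares none. In the setting the abstract describes, the predicted interface would come from PS-HomPPI rather than being supplied by hand.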