Proceedings. IEEE International Conference on Bioinformatics and Biomedicine: Latest Articles

Uncertainty Quantified Computational Analysis of the Energetics of Virus Capsid Assembly.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822775
N Clement, M Rasheed, C Bajaj

Most existing research on assembly pathway prediction and analysis for viral capsids makes the simplifying assumption that the configuration of the intermediate states can be extracted directly from the final configuration of the entire capsid. This assumption does not account for the conformational changes of the constituent proteins or for the minor changes to the binding interfaces that continue throughout the assembly process until stabilization. This paper presents a statistical-ensemble-based approach that samples the configurational space of each monomer with the relative local orientation between monomers, to capture the uncertainties in binding and conformations. Furthermore, instead of using larger capsomers (trimers, pentamers) as building blocks, we allow all possible subassemblies to bind in all possible combinations. We represent the resulting assembly graph in two different ways. First, we use the Wilcoxon signed-rank measure to compare the distributions of binding free energy computed on the sampled conformations to predict likely pathways. Second, we represent chemical equilibrium aspects of the transitions as a Bayesian factor graph in which both associations and dissociations are modeled based on concentrations and the binding free energies. We applied these protocols to the feline panleukopenia virus and the Nudaurelia capensis virus. Results from these experiments departed significantly from those one would obtain if only the static configurations of the proteins were considered. Hence, we establish the importance of an uncertainty-aware protocol for pathway analysis, and provide a statistical framework as an important first step towards assembly pathway prediction with high statistical confidence.
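
To make the first comparison step concrete, the following is a minimal sketch of the standard Wilcoxon signed-rank statistic applied to two paired lists of binding free energies. The function name, the pairing of sampled conformations, and the example values are assumptions made for illustration; the abstract does not describe how the study pairs or processes its samples.

```python
def signed_rank_statistic(a, b):
    """Return (W_plus, W_minus) for paired samples a and b.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical binding free energies (arbitrary units) for two candidate
# transitions, evaluated on the same four sampled conformations.
pathway_a = [-10.0, -12.5, -9.0, -11.0]
pathway_b = [-8.0, -12.0, -9.5, -7.0]
w_plus, w_minus = signed_rank_statistic(pathway_a, pathway_b)
```

A small W_plus relative to W_minus suggests that the first list tends to hold the lower (more favorable) energies; turning the statistic into a p-value requires the null distribution, which this sketch omits.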

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1706-1713. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604467/pdf/nihms894982.pdf
Citations: 0
Transcriptional Responses to Ultraviolet and Ionizing Radiation: An Approach Based on Graph Curvature.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822706
Yongxin Chen, Jung Hun Oh, Romeil Sandhu, Sangkyu Lee, Joseph O Deasy, Allen Tannenbaum

More than half of all cancer patients receive radiotherapy during their treatment. However, our understanding of abnormal transcriptional responses to radiation remains poor. In this study, we employ an extended definition of Ollivier-Ricci curvature based on the L1-Wasserstein distance to investigate genes and biological processes associated with ionizing radiation (IR) and ultraviolet radiation (UV) exposure using a microarray dataset. Gene expression levels were modeled on a gene interaction topology downloaded from the Human Protein Reference Database (HPRD). This was done separately for the IR, UV, and mock datasets. The difference in curvature value between the IR and mock graphs (and likewise between the UV and mock graphs) for each gene was used as a metric to estimate the extent to which the gene responds to radiation. Comparing the top 200 genes identified from the IR and UV graphs, we found that about 20-30% of the genes overlapped. Through gene ontology enrichment analysis, we found that metabolism-related biological processes were highly associated with both IR and UV radiation exposure.
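
The curvature-based metric can be written compactly. The block below gives Ollivier's standard definition only; the "extended definition" used in the study is not reproduced in the abstract, and the symbols for the per-gene differences are notation introduced here for illustration.

```latex
\kappa(x, y) = 1 - \frac{W_1(\mu_x, \mu_y)}{d(x, y)},
\qquad
\Delta\kappa_g^{\mathrm{IR}} = \kappa_g^{\mathrm{IR}} - \kappa_g^{\mathrm{mock}},
\qquad
\Delta\kappa_g^{\mathrm{UV}} = \kappa_g^{\mathrm{UV}} - \kappa_g^{\mathrm{mock}}
```

Here $d(x, y)$ is the graph distance between nodes $x$ and $y$, $\mu_x$ and $\mu_y$ are probability measures supported on their neighborhoods, and $W_1$ is the L1-Wasserstein (earth mover's) distance. $\kappa_g$ denotes the curvature value assigned to gene $g$ in a given graph; how edge curvatures are aggregated into a per-gene value is not stated in the abstract.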

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1302-1306.
Citations: 0
Classification of Use Status for Dietary Supplements in Clinical Notes.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822668
Yadan Fan, Lu He, Rui Zhang

Clinical notes contain rich information about dietary supplements, which is critical for detecting signals of dietary supplement side effects and interactions between drugs and supplements. One important element of supplement documentation is usage status, such as started or discontinued. Such information is usually stored in unstructured clinical notes. We developed a rule-based classifier to identify supplement usage status in clinical notes. The patient's status of supplement use was classified into four classes: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). Clinical notes containing 10 of the most commonly consumed supplements (i.e., alfalfa, echinacea, fish oil, garlic, ginger, ginkgo, ginseng, melatonin, St. John's Wort, and Vitamin E) were retrieved from the University of Minnesota Clinical Data Repository. The gold standard was defined by manually annotating 1000 randomly selected sentences or statements mentioning at least one of these 10 supplements. The rules in the classifier were initially developed on two-thirds of the set for 7 supplements (i.e., alfalfa, garlic, ginger, ginkgo, ginseng, St. John's Wort, and Vitamin E); performance was evaluated on the remaining one-third of this set. To evaluate the generalizability of the rules, we further validated them on a second testing set covering the other 3 supplements (i.e., echinacea, fish oil, and melatonin). The classifier achieved F-measures of 0.95, 0.97, 0.96, and 0.96 for status C, D, S, and U, respectively, on the 7 supplements. The classifier also showed good generalizability when applied to the other 3 supplements, with F-measures of 0.96 for C, 0.96 for D, 0.95 for S, and 0.89 for U. This study demonstrated that the classifier can accurately classify supplement usage status and can be further integrated as a module into an existing natural language processing pipeline to support dietary supplement knowledge discovery.
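
As an illustration of what a rule-based status classifier of this kind might look like, here is a minimal sketch using ordered regular-expression rules. The trigger phrases, their ordering, and the function name are hypothetical; they are not the rules developed in the study, which the abstract does not list.

```python
import re

# Hypothetical trigger patterns, checked in order. These are illustrative
# only and are NOT the rules from the study.
RULES = [
    ("D", re.compile(r"\b(stop(ped)?|discontinued?|quit|no longer tak(es|ing))\b", re.I)),
    ("S", re.compile(r"\b(start(ed|ing)?|began|begin|initiated?)\b", re.I)),
    ("C", re.compile(r"\b(continues?|continuing|still tak(es|ing)|currently tak(es|ing)|takes)\b", re.I)),
]


def classify_status(sentence):
    """Return 'C', 'D', 'S', or 'U' for a sentence mentioning a supplement."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "U"  # no rule fired: Unclassified


examples = [
    "Patient stopped taking fish oil last month.",
    "She started melatonin 3 mg at bedtime.",
    "He continues to take ginkgo daily.",
    "Discussed garlic supplements.",
]
labels = [classify_status(s) for s in examples]
```

Checking the Discontinued rules first is a design choice in this sketch so that phrases such as "stopped taking" are not captured by a Continuing cue like "taking"; a real rule set would also need to handle negation and supplement-specific context.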

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1054-1061.
Citations: 7
DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/bibm.2016.7822515
Hamid Reza Hassanzadeh, May D Wang

Transcription factors (TFs) are macromolecules that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription. Finding the exact location of these binding sites (also known as motifs) is important in a variety of domains such as drug design and development. To address this need, several in vivo and in vitro techniques have been developed that try to characterize and predict the binding specificity of a protein to different DNA loci. The major problem with these techniques is that they are not accurate enough in predicting the binding affinity and characterizing the corresponding motifs. As a result, downstream analysis is required to uncover the locations where proteins of interest bind. Here, we propose DeeperBind, a long short-term recurrent convolutional network for predicting protein binding specificities with respect to DNA probes. DeeperBind can model the positional dynamics of probe sequences and hence effectively accounts for the contributions made by individual sub-regions in DNA sequences. Moreover, it can be trained and tested on datasets containing varying-length sequences. We apply our pipeline to datasets derived from protein binding microarrays (PBMs), an in vitro high-throughput technology for quantifying protein-DNA binding preferences, and present promising results. To the best of our knowledge, this is the most accurate pipeline for predicting binding specificities of DNA sequences from data produced by high-throughput technologies, using the power of deep learning for feature generation and positional dynamics modeling.
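
Networks of this kind typically consume DNA probes as one-hot matrices. The sketch below shows one common way to encode sequences of different lengths and zero-pad them into a batch; the function names are hypothetical, and padding is only one possible way to handle varying lengths, not necessarily the mechanism DeeperBind uses.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T.

    Characters outside ACGT (for example N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def pad_batch(seqs):
    """One-hot encode each sequence and zero-pad to the longest length."""
    encoded = [one_hot(s) for s in seqs]
    longest = max(len(e) for e in encoded)
    return [e + [[0] * 4 for _ in range(longest - len(e))] for e in encoded]


batch = pad_batch(["AC", "ACGT"])
```

Each element of `batch` is a matrix with one row per position, which a convolutional layer can scan for motif-like patterns before a recurrent layer models how those matches are arranged along the probe.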

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 178-183. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302108/pdf/nihms-1595286.pdf
Citations: 0
Deep Convolutional Neural Networks for Detecting Secondary Structures in Protein Density Maps from Cryo-Electron Microscopy.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822490
Rongjian Li, Dong Si, Tao Zeng, Shuiwang Ji, Jing He

Detecting the secondary structure of proteins from three-dimensional (3D) cryo-electron microscopy (cryo-EM) images remains a challenging task when the spatial resolution of the images is at a medium level (5-10 Å). Prior research focused on local features that may not capture the global information of image objects. In this study, we propose to use deep learning methods to extract highly representative global features and then automatically detect secondary structures of proteins. In particular, we build a convolutional neural network (CNN) classifier that predicts, for every individual voxel in a 3D cryo-EM image, the probability of each label among the secondary structure elements of proteins: α-helix, β-sheet, and background. To effectively incorporate the 3D spatial information in protein structures, we propose to perform 3D convolutions in the convolutional layers of the CNN. We show that the proposed CNN classifier can outperform an existing SVM method at identifying the secondary structure elements of proteins from medium-resolution 3D cryo-EM images.
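
The core operation of such a network, a 3D convolution over a voxel grid, can be sketched in plain Python. This is a single-channel, valid-mode cross-correlation written for clarity rather than speed; the function name is hypothetical and the sketch is not the study's network, which the abstract does not specify in detail.

```python
def conv3d_valid(volume, kernel):
    """Valid-mode 3D cross-correlation of a nested-list volume with a kernel.

    volume is indexed [z][y][x]; the output shrinks by (kernel size - 1)
    along each axis because no padding is applied.
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - d + 1):
        plane = []
        for y in range(H - h + 1):
            row = []
            for x in range(W - w + 1):
                acc = 0.0
                # Sum the element-wise product over the kernel's 3D window.
                for i in range(d):
                    for j in range(h):
                        for k in range(w):
                            acc += volume[z + i][y + j][x + k] * kernel[i][j][k]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy density map: a 3x3x3 block of ones filtered by a 2x2x2 kernel of ones.
ones3 = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
ones2 = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
response = conv3d_valid(ones3, ones2)
```

Because the window spans all three axes, each output value depends on a small cube of density rather than a 2D slice, which is what lets the filter respond to shapes such as rod-like or sheet-like regions.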

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 41-46. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5952046/pdf/nihms874389.pdf
Citations: 0
A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/bibm.2016.7822516
Hamid Reza Hassanzadeh, John H Phan, May D Wang

Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients' quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that help predict different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict the survival of cancer patients. Despite the wealth of information available in the expression profiles of cancer tumors, fulfilling this objective remains a big challenge, largely because data samples are scarce relative to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can make maximal use of all the available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients' survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that under certain circumstances no single modality per se yields the best accuracy, and that fusing different models via a stacked generalization strategy may boost the accuracy synergistically. We apply our approach to two cancer datasets and present promising results. We maintain that a similar pipeline can be used for predictive tasks where labeled samples are expensive to acquire.
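
Graph-based semi-supervised methods such as Laplacian SVMs rest on a graph Laplacian built from both labeled and unlabeled samples. The sketch below constructs L = D - W for a symmetrized k-nearest-neighbor graph with Gaussian edge weights. The function name, the choice of k, and the kernel width sigma are assumptions for illustration; the abstract does not state how the study builds its graph.

```python
import math


def graph_laplacian(points, k=2, sigma=1.0):
    """Return L = D - W for a symmetrized k-NN graph with Gaussian weights."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k nearest neighbors of point i (excluding itself).
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: dist[i][j])[:k]
        for j in neighbors:
            w = math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))
            W[i][j] = W[j][i] = w  # symmetrize the adjacency
    # Degree on the diagonal, negative edge weight off the diagonal.
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


# Three hypothetical samples in a 2D feature space.
L = graph_laplacian([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], k=1)
```

The quadratic form f^T L f is small when f changes little across strongly connected samples, which is how unlabeled data can regularize a classifier trained on few labels.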

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 184-189.
Citations: 5
Analysis of Temporal Constraints in Qualitative Eligibility Criteria of Cancer Clinical Studies.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822607
Zhe He, Zhiwei Chen, Jiang Bian

Clinical studies, especially randomized controlled trials, generate gold-standard medical evidence. However, the lack of population representativeness in clinical studies has hampered their generalizability to the real-world population. Overly restrictive qualitative criteria are often applied to exclude patients. In this work, we develop a lexical-pattern-based tool to structure qualitative eligibility criteria with temporal constraints, and use it to analyze over 10,800 cancer clinical studies. Our results showed that restrictive temporal constraints are often applied to qualitative criteria in cancer studies, limiting the generalizability of their results.
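
To illustrate what structuring a temporal constraint with lexical patterns can mean, here is a minimal sketch built on a single regular expression. The pattern, the cue phrases, and the function name are hypothetical and far simpler than a real tool would need; the abstract does not describe the patterns the study uses.

```python
import re

# Illustrative pattern only: "<cue> N <unit>", for example "within the last 6 months".
TEMPORAL = re.compile(
    r"\b(?P<cue>within(?: the (?:last|past))?|in the (?:last|past)|for at least)\s+"
    r"(?P<value>\d+)\s+(?P<unit>day|week|month|year)s?\b",
    re.I,
)


def extract_temporal(criterion):
    """Return (cue, value, unit) tuples found in one eligibility criterion."""
    return [(m["cue"].lower(), int(m["value"]), m["unit"].lower())
            for m in TEMPORAL.finditer(criterion)]


found = extract_temporal("No myocardial infarction within the last 6 months")
```

Normalizing each match to a (cue, value, unit) tuple makes it possible to compare constraints across studies, for instance by converting every window to days before aggregating.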

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 717-722.
Citations: 2
CHALLENGES IN MATCHING SECONDARY STRUCTURES IN CRYO-EM: AN EXPLORATION.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822776
Devin Haslam, Mohammad Zubair, Desh Ranjan, Abhishek Biswas, Jing He

Cryo-electron microscopy is a rapidly emerging biophysical technique for structural determination of large protein complexes. While more atomic structures are being determined using this technique, it is still challenging to derive atomic structures from density maps produced at medium resolution when no suitable templates are available. A critical step in structure determination is establishing how a protein chain threads through the 3-dimensional density map. A dynamic programming method was previously developed to generate the K best matches of secondary structures between the density map and its protein sequence using shortest paths in a related weighted graph. We discuss challenges associated with the creation of the weighted graph and explore heuristic methods to solve the problem of matching secondary structures.
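
The idea of finding K best matches as shortest paths can be illustrated with a generic dynamic program over a weighted directed acyclic graph. This is a textbook K-best-paths sketch, not the previously developed method the abstract refers to; the function name, graph encoding, and toy graph are assumptions, and how the real weighted graph is built from the density map and sequence is not described here.

```python
import heapq


def k_shortest_paths_dag(edges, order, source, target, k):
    """Return up to k lowest-cost (cost, path) pairs from source to target.

    edges maps a node to a list of (neighbor, weight) pairs, and order is a
    topological ordering of all nodes in the acyclic graph.
    """
    best = {node: [] for node in order}  # up to k best (cost, path) per node
    best[source] = [(0.0, [source])]
    for u in order:
        # All predecessors of u were processed earlier, so best[u] is final.
        for v, w in edges.get(u, []):
            candidates = best[v] + [(c + w, p + [v]) for c, p in best[u]]
            best[v] = heapq.nsmallest(k, candidates)
    return best[target]


# Hypothetical toy graph with three source-to-target routes.
edges = {
    "s": [("a", 1.0), ("b", 2.0)],
    "a": [("t", 1.0), ("b", 0.5)],
    "b": [("t", 1.0)],
}
top2 = k_shortest_paths_dag(edges, ["s", "a", "b", "t"], "s", "t", k=2)
```

Keeping only k partial paths per node bounds the work at each step, which is what makes enumerating several near-optimal matchings feasible instead of only the single best one.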

Citations: 2
Sparse Canonical Correlation Analysis via Truncated ℓ1-norm with Application to Brain Imaging Genetics.
Pub Date : 2016-01-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822605
Lei Du, Tuo Zhang, Kefei Liu, Xiaohui Yao, Jingwen Yan, Shannon L Risacher, Lei Guo, Andrew J Saykin, Li Shen

Discovering bi-multivariate associations between genetic markers and neuroimaging quantitative traits is a major task in brain imaging genetics. Sparse Canonical Correlation Analysis (SCCA) is a popular technique in this area because it can identify bi-multivariate relationships while performing feature selection. Existing SCCA methods impose either the ℓ1-norm or its variants. The ℓ0-norm is more desirable but remains unexplored, since ℓ0-norm minimization is NP-hard. In this paper, we impose the truncated ℓ1-norm to improve the performance of ℓ1-norm based SCCA methods. In addition, we propose two efficient optimization algorithms and prove their convergence. The experimental results, compared with two benchmark methods, show that our method identifies better and more meaningful canonical loading patterns in both simulated and real imaging genetic analyses.
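
To make the contrast between the ℓ1-norm, the ℓ0-norm, and a truncated ℓ1 penalty concrete, here is a small sketch. It assumes one common form of the truncated ℓ1 penalty, the sum over entries of min(|w_i| / tau, 1). The abstract does not state the formula, so this form, the threshold name `tau`, and the function names are assumptions for illustration rather than the authors' definition.

```python
from typing import Sequence


def l1_norm(weights: Sequence[float]) -> float:
    """Sum of absolute values; grows with the magnitude of every entry."""
    return sum(abs(w) for w in weights)


def l0_norm(weights: Sequence[float]) -> int:
    """Number of nonzero entries; ignores magnitude entirely."""
    return sum(1 for w in weights if w != 0)


def truncated_l1(weights: Sequence[float], tau: float) -> float:
    """Assumed truncated l1 penalty: sum of min(|w| / tau, 1).

    Entries with |w| >= tau each contribute exactly 1, as in the l0 count;
    smaller entries are penalised linearly, as in a scaled l1-norm.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return sum(min(abs(w) / tau, 1.0) for w in weights)


# Hypothetical loading vector, for illustration only.
loadings = [0.0, 0.05, 2.0]
penalty = truncated_l1(loadings, tau=0.1)
```

Under this assumed form, shrinking `tau` toward zero makes the penalty approach the ℓ0 count, while large entries are no longer penalised in proportion to their size as they are under the plain ℓ1-norm.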

Citations: 7
Human Absorbable MicroRNA Prediction based on an Ensemble Manifold Ranking Model.
Pub Date : 2015-11-01 Epub Date: 2015-12-17 DOI: 10.1109/BIBM.2015.7359697
Jiang Shu, Kevin Chiang, Dongyu Zhao, Juan Cui

MicroRNAs, a class of short non-coding RNAs, are able to regulate more than half of human genes and affect many fundamental biological processes. They were long considered to be synthesized endogenously, until very recent discoveries showed that humans can absorb exogenous microRNAs from dietary sources. This finding has raised a challenging scientific question: which exogenous microRNAs can be integrated into human circulation and possibly exert functions in humans? Here we present a well-designed ensemble manifold ranking model for identifying human-absorbable exogenous miRNAs from 14 common dietary species. Specifically, we analyzed 4,910 dietary microRNAs with 1,120 features derived from microRNA sequence and structure. In total, 70 discriminative features were selected to characterize circulating microRNAs in humans and were used to infer the likelihood that a given exogenous microRNA is integrated into human circulation. Finally, 461 dietary microRNAs were identified as transportable exogenous microRNAs. To assess the performance of our ensemble model, we validated the top predictions through a milk-feeding study. In addition, 26 microRNAs from two virus species were predicted to be transportable and were validated in two external experiments. The results demonstrate that the data-driven computational model is highly promising for studying transportable microRNAs while bypassing the complex mechanistic details.
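
For readers unfamiliar with manifold ranking, the sketch below shows the standard single-graph formulation: build a Gaussian affinity graph over feature vectors, normalise it symmetrically, and propagate scores outward from known positive examples. It is not the authors' ensemble model, whose details are not given in this abstract. The function name `manifold_rank`, the parameters `alpha` and `sigma`, and the toy points are assumptions for illustration.

```python
import math
from typing import List, Sequence, Set


def manifold_rank(
    points: Sequence[Sequence[float]],
    query_idx: Set[int],
    alpha: float = 0.9,
    sigma: float = 1.0,
    iters: int = 200,
) -> List[float]:
    """Score every point by its relevance to the query points.

    Iterates f <- alpha * S f + (1 - alpha) * y, where S is the
    symmetrically normalised affinity matrix and y marks the queries.
    """
    n = len(points)
    # Gaussian affinities with a zero diagonal (no self-loops).
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    degree = [sum(row) for row in W]
    # S = D^(-1/2) W D^(-1/2)
    S = [
        [W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    y = [1.0 if i in query_idx else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Hypothetical 2-D feature vectors: three close together, one far away.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0)]
scores = manifold_rank(toy_points, query_idx={0})
```

In this toy setting the points near the query receive high scores and the distant point receives a score close to zero. An ensemble variant would presumably combine rankings from several such graphs, but how the paper does so is not stated here.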

Citations: 3