ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine最新文献_第2页

A deep learning fusion model for brain disorder classification: Application to distinguishing schizophrenia and autism spectrum disorder. 用于脑部疾病分类的深度学习融合模型：应用于区分精神分裂症和自闭症谱系障碍。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2020-09-01 DOI: 10.1145/3388440.3412478

Yuhui Du, Bang Li, Yuliang Hou, Vince D Calhoun

Deep learning has shown a great promise in classifying brain disorders due to its powerful ability in learning optimal features by nonlinear transformation. However, given the high-dimension property of neuroimaging data, how to jointly exploit complementary information from multimodal neuroimaging data in deep learning is difficult. In this paper, we propose a novel multilevel convolutional neural network (CNN) fusion method that can effectively combine different types of neuroimage-derived features. Importantly, we incorporate a sequential feature selection into the CNN model to increase the feature interpretability. To evaluate our method, we classified two symptom-related brain disorders using large-sample multi-site data from 335 schizophrenia (SZ) patients and 380 autism spectrum disorder (ASD) patients within a cross-validation procedure. Brain functional networks, functional network connectivity, and brain structural morphology were employed to provide possible features. As expected, our fusion method outperformed the CNN model using only single type of features, as our method yielded higher classification accuracy (with mean accuracy >85%) and was more reliable across multiple runs in differentiating the two groups. We found that the default mode, cognitive control, and subcortical regions contributed more in their distinction. Taken together, our method provides an effective means to fuse multimodal features for the diagnosis of different psychiatric and neurological disorders.

深度学习通过非线性变换学习最优特征的强大能力，使其在脑疾病分类方面大有可为。然而，鉴于神经影像数据的高维特性，如何在深度学习中联合利用多模态神经影像数据的互补信息是一个难题。在本文中，我们提出了一种新颖的多级卷积神经网络（CNN）融合方法，它能有效地结合不同类型的神经图像特征。重要的是，我们在 CNN 模型中加入了顺序特征选择，以提高特征的可解释性。为了评估我们的方法，我们使用来自 335 名精神分裂症（SZ）患者和 380 名自闭症谱系障碍（ASD）患者的大样本多站点数据，在交叉验证程序中对两种症状相关的脑部疾病进行了分类。大脑功能网络、功能网络连通性和大脑结构形态被用来提供可能的特征。不出所料，我们的融合方法优于只使用单一类型特征的 CNN 模型，因为我们的方法获得了更高的分类准确率（平均准确率大于 85%），并且在多次运行中区分两组的可靠性更高。我们发现，默认模式、认知控制和皮层下区域对它们的区分贡献更大。综上所述，我们的方法为融合多模态特征诊断不同的精神和神经疾病提供了有效手段。

{"title":"A deep learning fusion model for brain disorder classification: Application to distinguishing schizophrenia and autism spectrum disorder.","authors":"Yuhui Du, Bang Li, Yuliang Hou, Vince D Calhoun","doi":"10.1145/3388440.3412478","DOIUrl":"10.1145/3388440.3412478","url":null,"abstract":"Deep learning has shown a great promise in classifying brain disorders due to its powerful ability in learning optimal features by nonlinear transformation. However, given the high-dimension property of neuroimaging data, how to jointly exploit complementary information from multimodal neuroimaging data in deep learning is difficult. In this paper, we propose a novel multilevel convolutional neural network (CNN) fusion method that can effectively combine different types of neuroimage-derived features. Importantly, we incorporate a sequential feature selection into the CNN model to increase the feature interpretability. To evaluate our method, we classified two symptom-related brain disorders using large-sample multi-site data from 335 schizophrenia (SZ) patients and 380 autism spectrum disorder (ASD) patients within a cross-validation procedure. Brain functional networks, functional network connectivity, and brain structural morphology were employed to provide possible features. As expected, our fusion method outperformed the CNN model using only single type of features, as our method yielded higher classification accuracy (with mean accuracy >85%) and was more reliable across multiple runs in differentiating the two groups. We found that the default mode, cognitive control, and subcortical regions contributed more in their distinction. Taken together, our method provides an effective means to fuse multimodal features for the diagnosis of different psychiatric and neurological disorders.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7758676/pdf/nihms-1654686.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38750792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Combine Cryo-EM Density Map and Residue Contact for Protein Structure Prediction - A Case Study. 结合低温电镜密度图和残馀接触蛋白结构预测-一个案例研究。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2020-09-01 DOI: 10.1145/3388440.3414708

Maytha Alshammari, Jing He

Cryo-electron microscopy is a major structure determination technique for large molecular machines and membrane-associated complexes. Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. When combined with secondary structure sequence segments predicted from a protein sequence, it is possible to generate a set of likely topologies of α-traces and β-sheet traces. A topology describes the overall folding relationship among secondary structures; it is a critical piece of information for deriving the corresponding atomic structure. We propose a method for protein structure prediction that combines three sources of information: the secondary structure traces detected from the cryo-EM density map, predicted secondary structure sequence segments, and amino acid contact pairs predicted using MULTICOM. A case study shows that using amino acid contact prediction from MULTICOM improves the ranking of the true topology. Our observations convey that using a small set of highly voted secondary structure contact pairs enhances the ranking in all experiments conducted for this case.

低温电子显微镜是大分子机器和膜相关复合物的主要结构测定技术。虽然原子结构已经直接从高分辨率的低温电镜密度图中确定，但目前用于中分辨率(5到10 Å)低温电镜图的结构确定方法受到结构模板可用性的限制。二级结构痕迹是从蛋白质的α-螺旋和β-链的低温电镜密度图中检测到的线。当结合从蛋白质序列中预测的二级结构序列片段时，可以生成一组α-示踪和β-示踪的可能拓扑结构。拓扑描述二级结构之间的整体折叠关系;这是推导相应原子结构的关键信息。我们提出了一种结合三种信息来源的蛋白质结构预测方法:从低温电镜密度图中检测到的二级结构痕迹，预测的二级结构序列片段，以及使用MULTICOM预测的氨基酸接触对。实例研究表明，利用MULTICOM的氨基酸接触预测提高了真实拓扑的排序。我们的观察表明，使用一组高度投票的二级结构接触对提高了在这种情况下进行的所有实验中的排名。

{"title":"Combine Cryo-EM Density Map and Residue Contact for Protein Structure Prediction - A Case Study.","authors":"Maytha Alshammari, Jing He","doi":"10.1145/3388440.3414708","DOIUrl":"https://doi.org/10.1145/3388440.3414708","url":null,"abstract":"Cryo-electron microscopy is a major structure determination technique for large molecular machines and membrane-associated complexes. Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. When combined with secondary structure sequence segments predicted from a protein sequence, it is possible to generate a set of likely topologies of α-traces and β-sheet traces. A topology describes the overall folding relationship among secondary structures; it is a critical piece of information for deriving the corresponding atomic structure. We propose a method for protein structure prediction that combines three sources of information: the secondary structure traces detected from the cryo-EM density map, predicted secondary structure sequence segments, and amino acid contact pairs predicted using MULTICOM. A case study shows that using amino acid contact prediction from MULTICOM improves the ranking of the true topology. Our observations convey that using a small set of highly voted secondary structure contact pairs enhances the ranking in all experiments conducted for this case.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3388440.3414708","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40524905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Using Curriculum Learning in Pattern Recognition of 3-dimensional Cryo-electron Microscopy Density Maps. 课程学习在三维冷冻电镜密度图模式识别中的应用。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2020-09-01 DOI: 10.1145/3388440.3414710

Yangmei Deng, Yongcheng Mu, Salim Sazzed, Jiangwen Sun, Jing He

Although Cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structure when the resolution of cryo-EM density maps is in the medium range, e.g., 5-10 Å. Studies have attempted to utilize machine learning methods, especially deep neural networks to build predictive models for the detection of protein secondary structures from cryo-EM images, which ultimately helps to derive the atomic structure of proteins. However, the large variation in data quality makes it challenging to train a deep neural network with high prediction accuracy. Curriculum learning has been shown as an effective learning paradigm in machine learning. In this paper, we present a study using curriculum learning as a more effective way to utilize cryo-EM density maps with varying quality. We investigated three distinct training curricula that differ in whether/how images used for training in past are reused while the network was continually trained using new images. A total of 1,382 3-dimensional cryo-EM images were extracted from density maps of Electron Microscopy Data Bank in our study. Our results indicate learning with curriculum significantly improves the performance of the final trained network when the forgetting problem is properly addressed.

尽管冷冻电子显微镜(cryo-EM)已经成功地用于推导许多蛋白质的原子结构，但当冷冻电子显微镜密度图的分辨率在中等范围内(例如5-10 Å)时，推导原子结构仍然具有挑战性。研究试图利用机器学习方法，特别是深度神经网络来建立预测模型，用于从冷冻电镜图像中检测蛋白质二级结构，最终有助于推导蛋白质的原子结构。然而，数据质量的巨大变化给训练具有高预测精度的深度神经网络带来了挑战。课程学习已被证明是机器学习中一种有效的学习范式。在本文中，我们提出了一项研究，使用课程学习作为一种更有效的方法来利用不同质量的低温电镜密度图。我们研究了三种不同的训练课程，它们在使用新图像不断训练网络的同时，是否/如何重复使用过去用于训练的图像。本研究从电子显微镜数据库的密度图中提取了1382张三维冷冻电镜图像。我们的研究结果表明，当遗忘问题得到适当解决时，课程学习显著提高了最终训练网络的性能。

{"title":"Using Curriculum Learning in Pattern Recognition of 3-dimensional Cryo-electron Microscopy Density Maps.","authors":"Yangmei Deng, Yongcheng Mu, Salim Sazzed, Jiangwen Sun, Jing He","doi":"10.1145/3388440.3414710","DOIUrl":"https://doi.org/10.1145/3388440.3414710","url":null,"abstract":"Although Cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structure when the resolution of cryo-EM density maps is in the medium range, e.g., 5-10 Å. Studies have attempted to utilize machine learning methods, especially deep neural networks to build predictive models for the detection of protein secondary structures from cryo-EM images, which ultimately helps to derive the atomic structure of proteins. However, the large variation in data quality makes it challenging to train a deep neural network with high prediction accuracy. Curriculum learning has been shown as an effective learning paradigm in machine learning. In this paper, we present a study using curriculum learning as a more effective way to utilize cryo-EM density maps with varying quality. We investigated three distinct training curricula that differ in whether/how images used for training in past are reused while the network was continually trained using new images. A total of 1,382 3-dimensional cryo-EM images were extracted from density maps of Electron Microscopy Data Bank in our study. Our results indicate learning with curriculum significantly improves the performance of the final trained network when the forgetting problem is properly addressed.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3388440.3414710","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40507888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Correlation Imputation in Single cell RNA-seq using Auxiliary Information and Ensemble Learning. 基于辅助信息和集成学习的单细胞RNA-seq相关归算。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2020-09-01 DOI: 10.1145/3388440.3412462

Luqin Gan, Giuseppe Vinci, Genevera I Allen

Single cell RNA sequencing is a powerful technique that measures the gene expression of individual cells in a high throughput fashion. However, due to sequencing inefficiency, the data is unreliable due to dropout events, or technical artifacts where genes erroneously appear to have zero expression. Many data imputation methods have been proposed to alleviate this issue. Yet, effective imputation can be difficult and biased because the data is sparse and high-dimensional, resulting in major distortions in downstream analyses. In this paper, we propose a completely novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, graphical model estimation.

单细胞RNA测序是一种强大的技术，以高通量的方式测量单个细胞的基因表达。然而，由于测序效率低下，数据是不可靠的，由于辍学事件，或技术工件，基因错误地表现为零表达。为了缓解这一问题，人们提出了许多数据输入方法。然而，由于数据稀疏和高维，有效的归算可能是困难的和有偏差的，导致下游分析中的主要扭曲。在本文中，我们提出了一种全新的方法，计算基因间的相关性，而不是数据本身。我们称这种方法为SCENA:单细胞RNA-seq相关的集成学习和辅助信息完成。基于已知的基因连接辅助信息，对多个输入的相关矩阵进行模型叠加，得到SCENA基因间的相关矩阵估计。在一项基于真实scRNA-seq数据的广泛模拟研究中，我们证明了SCENA不仅准确地推测基因相关性，而且在下游分析(如降维、细胞聚类、图形模型估计)中优于现有的推测方法。

{"title":"Correlation Imputation in Single cell RNA-seq using Auxiliary Information and Ensemble Learning.","authors":"Luqin Gan, Giuseppe Vinci, Genevera I Allen","doi":"10.1145/3388440.3412462","DOIUrl":"https://doi.org/10.1145/3388440.3412462","url":null,"abstract":"Single cell RNA sequencing is a powerful technique that measures the gene expression of individual cells in a high throughput fashion. However, due to sequencing inefficiency, the data is unreliable due to dropout events, or technical artifacts where genes erroneously appear to have zero expression. Many data imputation methods have been proposed to alleviate this issue. Yet, effective imputation can be difficult and biased because the data is sparse and high-dimensional, resulting in major distortions in downstream analyses. In this paper, we propose a completely novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, graphical model estimation.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3388440.3412462","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39197526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

SAU-Net: A Universal Deep Network for Cell Counting. SAU-Net：一个用于细胞计数的通用深度网络。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3342153

Yue Guo, Guorong Wu, Jason Stein, Ashok Krishnamurthy

Image-based cell counting is a fundamental yet challenging task with wide applications in biological research. In this paper, we propose a novel Deep Network designed to universally solve this problem for various cell types. Specifically, we first extend the segmentation network, U-Net with a Self-Attention module, named SAU-Net, for cell counting. Second, we design an online version of Batch Normalization to mitigate the generalization gap caused by data augmentation in small datasets. We evaluate the proposed method on four public cell counting benchmarks - synthetic fluorescence microscopy (VGG) dataset, Modified Bone Marrow (MBM) dataset, human subcutaneous adipose tissue (ADI) dataset, and Dublin Cell Counting (DCC) dataset. Our method surpasses the current state-of-the-art performance in the three real datasets (MBM, ADI and DCC) and achieves competitive results in the synthetic dataset (VGG). The source code is available at https://github.com/mzlr/sau-net.

基于图像的细胞计数是一项基础性但具有挑战性的任务，在生物学研究中有着广泛的应用。在本文中，我们提出了一种新的深度网络，旨在普遍解决各种细胞类型的这个问题。具体来说，我们首先扩展了分割网络U-Net，并添加了一个名为SAU-Net的自注意模块，用于细胞计数。其次，我们设计了一个在线版本的Batch Normalization，以缓解小数据集中数据扩充造成的泛化差距。我们在四个公共细胞计数基准上评估了所提出的方法——合成荧光显微镜（VGG）数据集、改良骨髓（MBM）数据集，人体皮下脂肪组织（ADI）数据集和都柏林细胞计数（DCC）数据集中。我们的方法在三个真实数据集（MBM、ADI和DCC）中超越了当前最先进的性能，并在合成数据集（VGG）中取得了有竞争力的结果。源代码位于https://github.com/mzlr/sau-net.

引用次数: 29

Integration of Heterogeneous Experimental Data Improves Global Map of Human Protein Complexes. 异质实验数据的整合改进了人类蛋白质复合物的全球图谱。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3342150

Jose Lugo-Martinez, Ziv Bar-Joseph, Jörn Dengjel, Robert F Murphy

Protein complexes play a significant role in the core functionality of cells. These complexes are typically identified by detecting densely connected subgraphs in protein-protein interaction (PPI) networks. Recently, multiple large-scale mass spectrometry-based experiments have significantly increased the availability of PPI data in order to further expand the set of known complexes. However, high-throughput experimental data generally are incomplete, show limited agreement between experiments, and show frequent false positive interactions. There is a need for computational approaches that can address these limitations in order to improve the coverage and accuracy of human protein complexes. Here, we present a new method that integrates data from multiple heterogeneous experiments and sources in order to increase the reliability and coverage of predicted protein complexes. We first fused the heterogeneous data into a feature matrix and trained classifiers to score pairwise protein interactions. We next used graph based methods to combine pairwise interactions into predicted protein complexes. Our approach improves the accuracy and coverage of protein pairwise interactions, accurately identifies known complexes, and suggests both novel additions to known complexes and entirely new complexes. Our results suggest that integration of heterogeneous experimental data helps improve the reliability and coverage of diverse high-throughput mass-spectrometry experiments, leading to an improved global map of human protein complexes.

蛋白质复合物在细胞的核心功能中起着重要的作用。这些复合物通常通过检测蛋白质-蛋白质相互作用(PPI)网络中的紧密连接子图来识别。最近，多个基于质谱的大规模实验显著增加了PPI数据的可用性，以进一步扩大已知配合物的集合。然而，高通量实验数据通常是不完整的，实验之间的一致性有限，并且经常出现假阳性相互作用。为了提高人类蛋白质复合物的覆盖范围和准确性，需要能够解决这些限制的计算方法。在这里，我们提出了一种新的方法，该方法集成了来自多个异构实验和来源的数据，以提高预测蛋白质复合物的可靠性和覆盖率。我们首先将异构数据融合到一个特征矩阵中，并训练分类器对成对的蛋白质相互作用进行评分。接下来，我们使用基于图的方法将成对相互作用组合到预测的蛋白质复合物中。我们的方法提高了蛋白质成对相互作用的准确性和覆盖率，准确地识别了已知的复合物，并提出了已知复合物的新添加物和全新的复合物。我们的研究结果表明，整合异质实验数据有助于提高各种高通量质谱实验的可靠性和覆盖率，从而改进人类蛋白质复合物的全球图谱。

{"title":"Integration of Heterogeneous Experimental Data Improves Global Map of Human Protein Complexes.","authors":"Jose Lugo-Martinez, Ziv Bar-Joseph, Jörn Dengjel, Robert F Murphy","doi":"10.1145/3307339.3342150","DOIUrl":"https://doi.org/10.1145/3307339.3342150","url":null,"abstract":"Protein complexes play a significant role in the core functionality of cells. These complexes are typically identified by detecting densely connected subgraphs in protein-protein interaction (PPI) networks. Recently, multiple large-scale mass spectrometry-based experiments have significantly increased the availability of PPI data in order to further expand the set of known complexes. However, high-throughput experimental data generally are incomplete, show limited agreement between experiments, and show frequent false positive interactions. There is a need for computational approaches that can address these limitations in order to improve the coverage and accuracy of human protein complexes. Here, we present a new method that integrates data from multiple heterogeneous experiments and sources in order to increase the reliability and coverage of predicted protein complexes. We first fused the heterogeneous data into a feature matrix and trained classifiers to score pairwise protein interactions. We next used graph based methods to combine pairwise interactions into predicted protein complexes. Our approach improves the accuracy and coverage of protein pairwise interactions, accurately identifies known complexes, and suggests both novel additions to known complexes and entirely new complexes. Our results suggest that integration of heterogeneous experimental data helps improve the reliability and coverage of diverse high-throughput mass-spectrometry experiments, leading to an improved global map of human protein complexes.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3307339.3342150","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37979688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Copy Number Variation Detection Using Total Variation. 使用总变异检测拷贝数变异。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3342181

Fatima Zare, Sheida Nabavi

Next-generation sequencing (NGS) technologies offer new opportunities for precise and accurate identification of genomic aberrations, including copy number variations (CNVs). For high-throughput NGS data, using depth of coverage has become a major approach to identify CNVs, especially for whole exome sequencing (WES) data. Due to the high level of noise and biases of read-count data and complexity of the WES data, existing CNV detection tools identify many false CNV segments. Besides, NGS generates a huge amount of data, requiring to use effective and efficient methods. In this work, we propose a novel segmentation algorithm based on the total variation approach to detect CNVs more precisely and efficiently using WES data. The proposed method also filters out outlier read-counts and identifies significant change points to reduce false positives. We used real and simulated data to evaluate the performance of the proposed method and compare its performance with those of other commonly used CNV detection methods. Using simulated and real data, we show that the proposed method outperforms the existing CNV detection methods in terms of accuracy and false discovery rate and has a faster runtime compared to the circular binary segmentation method.

下一代测序(NGS)技术为精确和准确地鉴定基因组畸变(包括拷贝数变异(CNVs))提供了新的机会。对于高通量NGS数据，使用覆盖深度已成为鉴定CNVs的主要方法，特别是对于全外显子组测序(WES)数据。由于读计数数据的高噪声和偏差以及WES数据的复杂性，现有的CNV检测工具识别出许多虚假的CNV片段。此外，NGS产生了大量的数据，需要使用有效和高效的方法。在这项工作中，我们提出了一种新的基于总变分方法的分割算法，以更精确和有效地利用WES数据检测CNVs。该方法还可以过滤掉异常的读取计数，并识别重要的变化点，以减少误报。我们使用真实数据和模拟数据来评估该方法的性能，并将其与其他常用的CNV检测方法的性能进行比较。通过仿真和真实数据，我们证明了该方法在准确率和错误发现率方面优于现有的CNV检测方法，并且与圆形二值分割方法相比具有更快的运行时间。

{"title":"Copy Number Variation Detection Using Total Variation.","authors":"Fatima Zare, Sheida Nabavi","doi":"10.1145/3307339.3342181","DOIUrl":"https://doi.org/10.1145/3307339.3342181","url":null,"abstract":"Next-generation sequencing (NGS) technologies offer new opportunities for precise and accurate identification of genomic aberrations, including copy number variations (CNVs). For high-throughput NGS data, using depth of coverage has become a major approach to identify CNVs, especially for whole exome sequencing (WES) data. Due to the high level of noise and biases of read-count data and complexity of the WES data, existing CNV detection tools identify many false CNV segments. Besides, NGS generates a huge amount of data, requiring to use effective and efficient methods. In this work, we propose a novel segmentation algorithm based on the total variation approach to detect CNVs more precisely and efficiently using WES data. The proposed method also filters out outlier read-counts and identifies significant change points to reduce false positives. We used real and simulated data to evaluate the performance of the proposed method and compare its performance with those of other commonly used CNV detection methods. Using simulated and real data, we show that the proposed method outperforms the existing CNV detection methods in terms of accuracy and false discovery rate and has a faster runtime compared to the circular binary segmentation method.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3307339.3342181","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38028752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Learning to Evaluate Color Similarity for Histopathology Images using Triplet Networks. 使用三重网络学习评估组织病理学图像的颜色相似性。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3342170

Anirudh Choudhary, Hang Wu, Li Tong, May D Wang

Stain normalization is a crucial pre-processing step for histopathological image processing, and can help improve the accuracy of downstream tasks such as segmentation and classification. To evaluate the effectiveness of stain normalization methods, various metrics based on color-perceptual similarity and stain color evaluation have been proposed. However, there still exists a huge gap between metric evaluation and human perception, given the limited explainability power of existing metrics and inability to combine color and semantic information efficiently. Inspired by the effectiveness of deep neural networks in evaluating perceptual similarity of natural images, in this paper, we propose TriNet-P, a color-perceptual similarity metric for whole slide images, based on deep metric embeddings. We evaluate the proposed approach using four publicly available breast cancer histological datasets. The benefit of our approach is its representation efficiency of the perceptual factors associated with H&E stained images with minimal human intervention. We show that our metric can capture the semantic similarities, both at subject (patient) and laboratory levels, and leads to better performance in image retrieval and clustering tasks.

染色归一化是组织病理图像处理的关键预处理步骤，有助于提高下游任务(如分割和分类)的准确性。为了评估染色归一化方法的有效性，人们提出了各种基于颜色感知相似性和染色颜色评价的度量。然而，由于现有度量的解释能力有限，并且无法有效地将颜色和语义信息结合起来，度量评价与人类感知之间仍然存在巨大差距。受深度神经网络在评估自然图像感知相似性方面有效性的启发，本文提出了基于深度度量嵌入的全幻灯片图像颜色感知相似性度量TriNet-P。我们使用四个公开可用的乳腺癌组织学数据集来评估所提出的方法。我们的方法的好处是它的表示效率与H&E染色图像相关的感知因素与最小的人为干预。我们表明，我们的度量可以捕获受试者(患者)和实验室级别的语义相似性，并在图像检索和聚类任务中获得更好的性能。

{"title":"Learning to Evaluate Color Similarity for Histopathology Images using Triplet Networks.","authors":"Anirudh Choudhary, Hang Wu, Li Tong, May D Wang","doi":"10.1145/3307339.3342170","DOIUrl":"https://doi.org/10.1145/3307339.3342170","url":null,"abstract":"Stain normalization is a crucial pre-processing step for histopathological image processing, and can help improve the accuracy of downstream tasks such as segmentation and classification. To evaluate the effectiveness of stain normalization methods, various metrics based on color-perceptual similarity and stain color evaluation have been proposed. However, there still exists a huge gap between metric evaluation and human perception, given the limited explainability power of existing metrics and inability to combine color and semantic information efficiently. Inspired by the effectiveness of deep neural networks in evaluating perceptual similarity of natural images, in this paper, we propose TriNet-P, a color-perceptual similarity metric for whole slide images, based on deep metric embeddings. We evaluate the proposed approach using four publicly available breast cancer histological datasets. The benefit of our approach is its representation efficiency of the perceptual factors associated with H&E stained images with minimal human intervention. We show that our metric can capture the semantic similarities, both at subject (patient) and laboratory levels, and leads to better performance in image retrieval and clustering tasks.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3307339.3342170","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38067063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Unexpected Predictors of Antibiotic Resistance in Housekeeping Genes of Staphylococcus Aureus. 金黄色葡萄球菌管家基因中抗生素耐药性的意外预测因子。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3342138

Mattia Prosperi, Marco Salemi, Taj Azarian, Franco Milicchio, Judith A Johnson, Marco Oliva

Methicillin-resistant Staphylococcus aureus (MRSA) is currently the most commonly identified antibiotic-resistant pathogen in US hospitals. Resistance to methicillin is carried by SCCmec genetic elements. Multilocus sequence typing (MLST) covers internal fragments of seven housekeeping genes of S. aureus. In conjunction with mec typing, MLST has been used to create an international nomenclature for S. aureus. MLST sequence types with a single nucleotide polymorphism (SNP) considered distinct. In this work, relationships among MLST SNPs and methicillin/oxacillin resistance or susceptibility were studied, using a public data base, by means of cross-tabulation tests, multivariable (phylogenetic) logistic regression (LR), decision trees, rule bases, and random forests (RF). Model performances were assessed through multiple cross-validation. Hierarchical clustering of SNPs was also employed to analyze mutational covariation. The number of instances with a known methicillin (oxacillin) antibiogram result was 1526 (649), where 63% (54%) was resistant to methicillin (oxacillin). In univariable analysis, several MLST SNPs were found strongly associated with antibiotic resistance/susceptibility. A RF model predicted correctly the resistance/susceptibility to methicillin and oxacillin in 75% and 63% of cases (cross-validated). Results were similar for LR. Hierarchical clustering of the aforementioned SNPs yielded a high level of covariation both within the same and different genes; this suggests strong genetic linkage between SNPs of housekeeping genes and antibiotic resistant associated genes. This finding provides a basis for rapid identification of antibiotic resistant S. arues lineages using a small number of genomic markers. The number of sites could subsequently be increased moderately to increase the sensitivity and specificity of genotypic tests for resistance that do not rely on the direct detection of the resistance marker itself.

耐甲氧西林金黄色葡萄球菌（MRSA）是目前美国医院中最常见的抗生素耐药性病原体。对甲氧西林的耐药性是由短链氯化石蜡基因元件携带的。多基因座序列分型（MLST）涵盖了金黄色葡萄球菌7个持家基因的内部片段。结合mec分型，MLST已被用于创建金黄色葡萄球菌的国际命名法。具有单核苷酸多态性（SNP）的MLST序列类型被认为是不同的。在这项工作中，使用公共数据库，通过交叉表测试、多变量（系统发育）逻辑回归（LR）、决策树、规则库和随机森林（RF），研究了MLST SNPs与甲氧西林/苯唑西林耐药性或易感性之间的关系。模型性能通过多次交叉验证进行评估。SNPs的层次聚类也被用来分析变异协变量。已知的甲氧西林（苯唑西林）抗体图谱结果为1526例（649例），其中63%（54%）对甲氧西林有耐药性。在单变量分析中，发现几个MLST SNPs与抗生素耐药性/易感性密切相关。RF模型正确预测了75%和63%的病例对甲氧西林和苯唑西林的耐药性/易感性（交叉验证）。LR的结果相似。上述SNPs的层次聚类在相同和不同的基因内产生了高水平的协变；这表明管家基因的SNPs和抗生素抗性相关基因之间存在强烈的遗传联系。这一发现为使用少量基因组标记快速鉴定具有抗生素耐药性的铜绿假单胞菌谱系提供了基础。位点的数量随后可以适度增加，以提高不依赖于抗性标记物本身的直接检测的抗性基因型检测的敏感性和特异性。

{"title":"Unexpected Predictors of Antibiotic Resistance in Housekeeping Genes of Staphylococcus Aureus.","authors":"Mattia Prosperi, Marco Salemi, Taj Azarian, Franco Milicchio, Judith A Johnson, Marco Oliva","doi":"10.1145/3307339.3342138","DOIUrl":"https://doi.org/10.1145/3307339.3342138","url":null,"abstract":"Methicillin-resistant Staphylococcus aureus (MRSA) is currently the most commonly identified antibiotic-resistant pathogen in US hospitals. Resistance to methicillin is carried by SCCmec genetic elements. Multilocus sequence typing (MLST) covers internal fragments of seven housekeeping genes of S. aureus. In conjunction with mec typing, MLST has been used to create an international nomenclature for S. aureus. MLST sequence types with a single nucleotide polymorphism (SNP) considered distinct. In this work, relationships among MLST SNPs and methicillin/oxacillin resistance or susceptibility were studied, using a public data base, by means of cross-tabulation tests, multivariable (phylogenetic) logistic regression (LR), decision trees, rule bases, and random forests (RF). Model performances were assessed through multiple cross-validation. Hierarchical clustering of SNPs was also employed to analyze mutational covariation. The number of instances with a known methicillin (oxacillin) antibiogram result was 1526 (649), where 63% (54%) was resistant to methicillin (oxacillin). In univariable analysis, several MLST SNPs were found strongly associated with antibiotic resistance/susceptibility. A RF model predicted correctly the resistance/susceptibility to methicillin and oxacillin in 75% and 63% of cases (cross-validated). Results were similar for LR. Hierarchical clustering of the aforementioned SNPs yielded a high level of covariation both within the same and different genes; this suggests strong genetic linkage between SNPs of housekeeping genes and antibiotic resistant associated genes. This finding provides a basis for rapid identification of antibiotic resistant S. arues lineages using a small number of genomic markers. The number of sites could subsequently be increased moderately to increase the sensitivity and specificity of genotypic tests for resistance that do not rely on the direct detection of the resistance marker itself.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3307339.3342138","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41221536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Fusion in Breast Cancer Histology Classification. 融合在乳腺癌组织学分类中的应用。

ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2019-09-01 DOI: 10.1145/3307339.3342166

Juan Vizcarra, Ryan Place, Li Tong, David Gutman, May D Wang

Breast cancer is a deadly disease that affects millions of women worldwide. The International Conference on Image Analysis and Recognition in 2018 presents the BreAst Cancer Histology (ICIAR2018 BACH) image data challenge that calls for computer tools to assist pathologists and doctors in the clinical diagnosis of breast cancer subtypes. Using the BACH dataset, we have developed an image classification pipeline that combines both a shallow learner (support vector machine) and a deep learner (convolutional neural network). The shallow learner and deep learners achieved moderate accuracies of 79% and 81% individually. When being integrated by fusion algorithms, the system outperformed any individual learner with the highest accuracy as 92%. The fusion presents big potential for improving clinical design support.

乳腺癌是一种致命的疾病，影响着全世界数百万妇女。2018年图像分析与识别国际会议提出了乳腺癌组织学(ICIAR2018 BACH)图像数据挑战，要求计算机工具协助病理学家和医生进行乳腺癌亚型的临床诊断。使用BACH数据集，我们开发了一个图像分类管道，该管道结合了浅学习器(支持向量机)和深度学习器(卷积神经网络)。浅层学习器和深度学习器分别达到了79%和81%的中等准确率。当通过融合算法集成时，该系统以92%的最高准确率优于任何单个学习者。这种融合为改善临床设计支持提供了巨大的潜力。

引用次数: 19