ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine: latest articles

SAU-Net: A Universal Deep Network for Cell Counting.
Yue Guo, Guorong Wu, Jason Stein, Ashok Krishnamurthy

Image-based cell counting is a fundamental yet challenging task with wide applications in biological research. In this paper, we propose a novel deep network designed to solve this problem universally across cell types. Specifically, we first extend the segmentation network U-Net with a Self-Attention module, yielding SAU-Net, for cell counting. Second, we design an online version of Batch Normalization to mitigate the generalization gap caused by data augmentation in small datasets. We evaluate the proposed method on four public cell counting benchmarks: the synthetic fluorescence microscopy (VGG) dataset, the Modified Bone Marrow (MBM) dataset, the human subcutaneous adipose tissue (ADI) dataset, and the Dublin Cell Counting (DCC) dataset. Our method surpasses the current state-of-the-art performance on the three real datasets (MBM, ADI and DCC) and achieves competitive results on the synthetic dataset (VGG). The source code is available at https://github.com/mzlr/sau-net.

DOI: 10.1145/3307339.3342153 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2019, pp. 299-306 | Published 2019-09-01 | Citations: 29
Integration of Heterogeneous Experimental Data Improves Global Map of Human Protein Complexes.
Jose Lugo-Martinez, Ziv Bar-Joseph, Jörn Dengjel, Robert F Murphy

Protein complexes play a significant role in the core functionality of cells. These complexes are typically identified by detecting densely connected subgraphs in protein-protein interaction (PPI) networks. Recently, multiple large-scale mass spectrometry-based experiments have significantly increased the availability of PPI data in order to further expand the set of known complexes. However, high-throughput experimental data generally are incomplete, show limited agreement between experiments, and show frequent false positive interactions. There is a need for computational approaches that can address these limitations in order to improve the coverage and accuracy of human protein complexes. Here, we present a new method that integrates data from multiple heterogeneous experiments and sources in order to increase the reliability and coverage of predicted protein complexes. We first fused the heterogeneous data into a feature matrix and trained classifiers to score pairwise protein interactions. We next used graph based methods to combine pairwise interactions into predicted protein complexes. Our approach improves the accuracy and coverage of protein pairwise interactions, accurately identifies known complexes, and suggests both novel additions to known complexes and entirely new complexes. Our results suggest that integration of heterogeneous experimental data helps improve the reliability and coverage of diverse high-throughput mass-spectrometry experiments, leading to an improved global map of human protein complexes.

DOI: 10.1145/3307339.3342150 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2019, pp. 144-153 | Published 2019-09-01 | Citations: 1
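The abstract above describes scoring pairwise interactions and then combining them into complexes with graph-based methods, without naming the specific method. As a minimal sketch only, the following Python assumes the simplest possible stand-in: threshold the classifier scores and report connected components. The function name, the threshold, the minimum size, and the scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def predict_complexes(pair_scores, threshold=0.5, min_size=3):
    """Group proteins into candidate complexes (illustrative stand-in).

    pair_scores: dict mapping (protein_a, protein_b) -> score in [0, 1].
    Edges scoring below `threshold` are dropped; each remaining connected
    component with at least `min_size` proteins is reported as one complex.
    """
    graph = defaultdict(set)
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, complexes = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search over the thresholded interaction graph.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            complexes.append(sorted(component))
    return complexes

# Hypothetical scores, for illustration only.
scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7,
          ("C", "D"): 0.2, ("D", "E"): 0.95}
print(predict_complexes(scores))  # [['A', 'B', 'C']]
```

Connected components ignore edge density, so a dense-subgraph or clustering method would be closer to the "densely connected subgraphs" the abstract mentions; the sketch only shows where pairwise scores feed into the graph step.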
Copy Number Variation Detection Using Total Variation.
Fatima Zare, Sheida Nabavi

Next-generation sequencing (NGS) technologies offer new opportunities for precise and accurate identification of genomic aberrations, including copy number variations (CNVs). For high-throughput NGS data, using depth of coverage has become a major approach to identify CNVs, especially for whole exome sequencing (WES) data. Because read-count data are noisy and biased and WES data are complex, existing CNV detection tools report many false CNV segments. In addition, NGS generates huge amounts of data, which calls for effective and efficient methods. In this work, we propose a novel segmentation algorithm based on the total variation approach to detect CNVs more precisely and efficiently using WES data. The proposed method also filters out outlier read-counts and identifies significant change points to reduce false positives. We used real and simulated data to evaluate the performance of the proposed method and compare it with other commonly used CNV detection methods. The results show that the proposed method outperforms the existing CNV detection methods in terms of accuracy and false discovery rate and has a faster runtime than the circular binary segmentation method.

DOI: 10.1145/3307339.3342181 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2019, pp. 423-428 | Published 2019-09-01 | Citations: 1
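To make the total variation idea in the abstract above concrete, here is a small sketch of generic 1-D total variation denoising followed by change-point extraction. It is not the authors' algorithm: the solver (projected gradient on the dual problem), the function names, the regularisation weight `lam`, and the toy signal are all assumptions chosen for illustration.

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=2000):
    """Minimise 0.5*||x - y||^2 + lam * sum|x[i+1] - x[i]| (generic sketch).

    Uses projected gradient ascent on the dual variable z, one entry per
    adjacent pair of samples, with the constraint |z| <= lam.
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)

    def primal(z):
        # x = y - D^T z, where D is the forward-difference operator.
        return y + np.diff(np.concatenate(([0.0], z, [0.0])))

    for _ in range(n_iter):
        # Step size 0.25 is safe because ||D D^T|| <= 4.
        z = np.clip(z + 0.25 * np.diff(primal(z)), -lam, lam)
    return primal(z)

def change_points(x, tol=1e-3):
    """Indices where the denoised signal jumps by more than `tol`."""
    return [i + 1 for i, d in enumerate(np.diff(x)) if abs(d) > tol]

# Hypothetical read-depth signal with a single copy-number change.
signal = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
smooth = tv_denoise_1d(signal, lam=1.0)
print(change_points(smooth))  # [3]
```

The penalty shrinks the jump (each side moves by `lam / 3` toward the other here) but keeps the segments flat, which is why the breakpoints can be read directly from the non-zero differences of the result.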
Unexpected Predictors of Antibiotic Resistance in Housekeeping Genes of Staphylococcus Aureus.
Mattia Prosperi, Marco Salemi, Taj Azarian, Franco Milicchio, Judith A Johnson, Marco Oliva

Methicillin-resistant Staphylococcus aureus (MRSA) is currently the most commonly identified antibiotic-resistant pathogen in US hospitals. Resistance to methicillin is carried by SCCmec genetic elements. Multilocus sequence typing (MLST) covers internal fragments of seven housekeeping genes of S. aureus. In conjunction with mec typing, MLST has been used to create an international nomenclature for S. aureus. MLST sequence types that differ by a single nucleotide polymorphism (SNP) are considered distinct. In this work, relationships among MLST SNPs and methicillin/oxacillin resistance or susceptibility were studied, using a public database, by means of cross-tabulation tests, multivariable (phylogenetic) logistic regression (LR), decision trees, rule bases, and random forests (RF). Model performance was assessed through multiple cross-validation. Hierarchical clustering of SNPs was also employed to analyze mutational covariation. The number of instances with a known methicillin (oxacillin) antibiogram result was 1526 (649), of which 63% (54%) were resistant to methicillin (oxacillin). In univariable analysis, several MLST SNPs were found to be strongly associated with antibiotic resistance/susceptibility. An RF model correctly predicted resistance/susceptibility to methicillin and oxacillin in 75% and 63% of cases, respectively (cross-validated). Results were similar for LR. Hierarchical clustering of the aforementioned SNPs yielded a high level of covariation both within the same gene and across different genes; this suggests strong genetic linkage between SNPs of housekeeping genes and antibiotic resistance-associated genes. This finding provides a basis for rapid identification of antibiotic-resistant S. aureus lineages using a small number of genomic markers. The number of sites could subsequently be increased moderately to increase the sensitivity and specificity of genotypic tests for resistance that do not rely on the direct detection of the resistance marker itself.

DOI: 10.1145/3307339.3342138 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2019, pp. 259-268 | Published 2019-09-01 | Citations: 9
Learning to Evaluate Color Similarity for Histopathology Images using Triplet Networks.
Anirudh Choudhary, Hang Wu, Li Tong, May D Wang

Stain normalization is a crucial pre-processing step for histopathological image processing, and can help improve the accuracy of downstream tasks such as segmentation and classification. To evaluate the effectiveness of stain normalization methods, various metrics based on color-perceptual similarity and stain color evaluation have been proposed. However, there still exists a huge gap between metric evaluation and human perception, given the limited explainability power of existing metrics and inability to combine color and semantic information efficiently. Inspired by the effectiveness of deep neural networks in evaluating perceptual similarity of natural images, in this paper, we propose TriNet-P, a color-perceptual similarity metric for whole slide images, based on deep metric embeddings. We evaluate the proposed approach using four publicly available breast cancer histological datasets. The benefit of our approach is its representation efficiency of the perceptual factors associated with H&E stained images with minimal human intervention. We show that our metric can capture the semantic similarities, both at subject (patient) and laboratory levels, and leads to better performance in image retrieval and clustering tasks.

DOI: 10.1145/3307339.3342170 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2019, pp. 466-474 | Published 2019-09-01 | Citations: 4
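The triplet-network abstract above relies on deep metric embeddings. The standard triplet margin loss that such networks are usually trained with can be sketched as follows; the abstract does not state the exact loss or distance used for TriNet-P, so the squared Euclidean distance, the margin value, and the toy embeddings here are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch of embedding vectors.

    Pushes each anchor closer to its positive than to its negative by at
    least `margin`, measured with squared Euclidean distance (assumed).
    """
    anchor, positive, negative = (np.asarray(a, dtype=float)
                                  for a in (anchor, positive, negative))
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())

# Hypothetical 2-D embeddings of three image patches.
a, p, n = [[0.0, 0.0]], [[0.0, 1.0]], [[3.0, 0.0]]
print(triplet_loss(a, p, n))  # 0.0: the positive is already much closer
print(triplet_loss(a, n, p))  # 9.0: the roles are swapped, so the loss is large
```

Once trained this way, distance in the embedding space can serve directly as a similarity score, which is what makes the retrieval and clustering evaluations mentioned in the abstract possible.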
Fusion in Breast Cancer Histology Classification.
Juan Vizcarra, Ryan Place, Li Tong, David Gutman, May D Wang

Breast cancer is a deadly disease that affects millions of women worldwide. The 2018 International Conference on Image Analysis and Recognition presented the BreAst Cancer Histology (ICIAR2018 BACH) image data challenge, which calls for computer tools to assist pathologists and doctors in the clinical diagnosis of breast cancer subtypes. Using the BACH dataset, we have developed an image classification pipeline that combines a shallow learner (support vector machine) and a deep learner (convolutional neural network). Individually, the shallow and deep learners achieved moderate accuracies of 79% and 81%, respectively. When integrated by fusion algorithms, the system outperformed either individual learner, with a highest accuracy of 92%. Fusion shows considerable potential for improving clinical design support.

DOI: 10.1145/3307339.3342166 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, vol. 2019, pp. 485-493 | Published 2019-09-01 | Citations: 19
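The fusion abstract above does not say which fusion algorithms were used. One common and simple option is late fusion by weighted averaging of the per-class probabilities from the two learners, sketched below. The function name, the weight, the four-column layout, and the probability values are hypothetical.

```python
import numpy as np

def fuse_probabilities(prob_svm, prob_cnn, weight_cnn=0.5):
    """Late fusion by weighted averaging of class probabilities (assumed scheme).

    prob_svm, prob_cnn: arrays of shape (n_samples, n_classes).
    Returns the fused class labels and the fused probability matrix.
    """
    prob_svm = np.asarray(prob_svm, dtype=float)
    prob_cnn = np.asarray(prob_cnn, dtype=float)
    fused = (1.0 - weight_cnn) * prob_svm + weight_cnn * prob_cnn
    return fused.argmax(axis=1), fused

# Hypothetical outputs for two images and four classes.
svm = [[0.6, 0.4, 0.0, 0.0], [0.1, 0.5, 0.4, 0.0]]
cnn = [[0.3, 0.7, 0.0, 0.0], [0.1, 0.2, 0.7, 0.0]]
labels, fused = fuse_probabilities(svm, cnn)
print(labels.tolist())  # [1, 2]
```

In this toy case the two learners disagree on both images and the fused decision follows whichever model is more confident, which is the usual intuition for why a fused system can beat each learner alone.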
A Histogram-based Outlier Profile for Atomic Structures Derived from Cryo-Electron Microscopy.
Lin Chen, Jing He

As more atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validation of such structures is an important task. We report findings from a comparison of cryo-EM structures released by December 2016 with those released between 2017 and 2019. The cryo-EM models created from density maps with resolution better than 6 Å were divided into six data sets. A histogram-based outlier score (HBOS) was implemented and validation reports were collected from the Protein Data Bank. The results suggest that the overall quality of EM structures released after December 2016 is better than that of structures released before 2017. The conformation qualities of most residue types might have been improved, except for Leucine, Phenylalanine, and Serine in high-resolution datasets (higher than 4 Å). We observe that structures solved from 0-4 Å resolution density maps have an HBOS profile almost identical to that of structures derived from density maps with 4-6 Å resolution.

DOI: 10.1145/3307339.3343865 | ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 586-591 | Published 2019-09-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279010/pdf/nihms-1662219.pdf | Citations: 0
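The histogram-based outlier score (HBOS) named in the cryo-EM abstract above can be sketched in a few lines. This is the generic static-bin form of HBOS, not the authors' implementation: the bin count and the toy feature matrix are assumptions, and the abstract does not say which structural features were scored.

```python
import numpy as np

def hbos_scores(X, n_bins=10):
    """Generic HBOS: sum over features of log(1 / normalised bin height).

    X: array of shape (n_samples, n_features). Higher scores mean the
    sample sits in sparsely populated histogram bins, i.e. is more unusual.
    """
    X = np.asarray(X, dtype=float)
    scores = np.zeros(X.shape[0])
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        heights = counts / counts.max()  # tallest bin gets height 1
        # Map each value to its bin; interior edges give indices 0..n_bins-1.
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        scores += np.log(1.0 / heights[idx])
    return scores

# Hypothetical data: 20 identical inliers and one far-away sample.
X = np.vstack([np.zeros((20, 2)), [[10.0, 10.0]]])
scores = hbos_scores(X)
print(int(scores.argmax()))  # 20, the index of the far-away sample
```

Because every feature is scored independently, HBOS is fast but blind to correlations between features, which is worth keeping in mind when reading a per-residue outlier profile.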
Improving Validity of Cause of Death on Death Certificates.
Ryan A Hoffman, Janani Venugopalan, Li Qu, Hang Wu, May D Wang
Accurate reporting of causes of death on death certificates is essential for national health-protection institutions such as the Centers for Disease Control and Prevention (CDC) to formulate appropriate disease control, prevention and emergency response. In this study, we utilize knowledge from publicly available expert-formulated rules for the cause of death to determine the extent to which death certificates in national mortality data are discordant with the expert knowledge base. We also report the most commonly occurring invalid causal pairs that physicians enter on death certificates. We use sequence rule mining to find the patterns that are most frequent on death certificates and compare them with the rules from the expert knowledge base. Based on our results, 20.1% of the common patterns derived from entries on death certificates were discordant. The most probable causes of these discordant or invalid rules are missing steps and non-specific ICD-10 codes on the death certificates.
ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB), pp. 178-183. Published 2018-08-01. DOI: 10.1145/3233547.3233581
Citations: 13
Target Gene Prediction of Transcription Factor Using a New Neighborhood-regularized Tri-factorization One-class Collaborative Filtering Algorithm.
Hansaim Lim, Lei Xie

Identifying the target genes of transcription factors (TFs) is key to understanding transcriptional regulation. However, our understanding of genome-wide TF targeting profiles is limited by the cost of large-scale experiments and by their intrinsic complexity. Computational methods are therefore useful for predicting unobserved associations. Here, we developed a new one-class collaborative filtering algorithm, tREMAP, that is based on regularized, weighted nonnegative matrix tri-factorization. The algorithm predicts unobserved target genes for TFs using known gene-TF associations and a protein-protein interaction network. Our benchmark study shows that tREMAP significantly outperforms its counterpart REMAP, a bi-factorization-based algorithm, for TF target gene prediction on all four performance metrics: AUC, MAP, MPR, and HLU. When evaluated on independent data sets, the prediction accuracy is 37.8% on the top 495 predicted associations, an enrichment factor of 4.19 relative to random guessing. Furthermore, many of the novel associations predicted by tREMAP are supported by evidence from the literature. Although we only use canonical TF-target gene interaction data in this study, tREMAP can be directly applied to tissue-specific data sets. tREMAP provides a framework for integrating multiple omics data to further improve TF target gene prediction. Thus, tREMAP is a potentially useful tool for studying gene regulatory networks. The benchmark data set and the source code of tREMAP are freely available at https://github.com/hansaimlim/REMAP/tree/master/TriFacREMAP.
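To make the term "nonnegative matrix tri-factorization" concrete, here is a minimal NumPy sketch that approximates a nonnegative TF-by-gene matrix R as F S Gᵀ using plain multiplicative updates. It is not tREMAP: the weighting, the regularization, and the protein-protein interaction neighborhood term described in the abstract are all omitted, and the function names, the ranks, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def nmtf(R, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative R (m x n) as F @ S @ G.T.

    F is m x k_rows, S is k_rows x k_cols, G is n x k_cols. Every update
    multiplies by a nonnegative ratio, so the factors stay nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    F = rng.random((m, k_rows)) + 0.1
    S = rng.random((k_rows, k_cols)) + 0.1
    G = rng.random((n, k_cols)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates for the unweighted squared error ||R - F S G^T||^2.
        F *= (R @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (R.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ R @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def reconstruction_error(R, F, S, G):
    """Frobenius norm of the residual R - F S G^T."""
    return np.linalg.norm(R - F @ S @ G.T)


# Toy binary association matrix (rows: TFs, columns: genes); hypothetical data.
R = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

F, S, G = nmtf(R, k_rows=2, k_cols=2)
scores = F @ S @ G.T  # larger scores at unobserved entries suggest candidate targets
print(reconstruction_error(R, F, S, G))
```

In a one-class setting, the zeros in R are unobserved rather than negative, so the reconstructed scores at those positions are what would be ranked as candidate predictions.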

ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB), 2018, pp. 1-10. Published 2018-08-01. DOI: 10.1145/3233547.3233551
Citations: 6
Using Combined Features to Analyze Atomic Structures Derived from Cryo-EM Density Maps.
Lin Chen, Jing He

Cryo-electron microscopy (cryo-EM) has become a major technique for protein structure determination. Many atomic structures have been derived from cryo-EM density maps at about 3 Å resolution. Side-chain conformations are well determined in super-resolution density maps, for example at 1-2 Å. A statistical method that detects anomalous side-chains without a super-resolution density map is therefore desirable. In this study, we analyzed structures derived from X-ray density maps at resolutions better than 1.5 Å and structures derived from cryo-EM density maps at 2-4 Å and 4-6 Å resolution. We introduce a histogram-based outlier score (HBOS) for anomaly detection in protein models built from cryo-EM density maps. The method uses statistics derived from the X-ray dataset (<1.5 Å) as the reference and combines five features: the distal block distance, side-chain length, phi, psi, and first chi angle of the residue. Higher percentages of anomalies were observed in the cryo-EM models than in the super-resolution X-ray models. Lower percentages of anomalies were observed in cryo-EM models derived after January 2017 than in those derived before 2017.
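A histogram-based outlier score can be sketched in a few lines of NumPy. The version below is a generic HBOS with equal-width bins, offered as an illustration only and not as the authors' implementation: the bin count, the `floor` value for empty or out-of-range bins, the function names, and the synthetic reference data are all assumptions, and the periodic nature of the angle features is ignored.

```python
import numpy as np


def fit_histograms(reference, n_bins=10):
    """Build one histogram per feature column of the reference set.

    Heights are scaled so the tallest bin of each feature equals 1.
    """
    models = []
    for j in range(reference.shape[1]):
        counts, edges = np.histogram(reference[:, j], bins=n_bins)
        models.append((edges, counts / counts.max()))
    return models


def hbos_scores(samples, models, floor=1e-3):
    """Sum of log(1 / bin height) over features; higher means more anomalous."""
    scores = np.zeros(samples.shape[0])
    for j, (edges, heights) in enumerate(models):
        x = samples[:, j]
        idx = np.searchsorted(edges, x, side="right") - 1
        idx[x == edges[-1]] = len(heights) - 1  # the last bin includes its right edge
        inside = (idx >= 0) & (idx < len(heights))
        # Values outside the reference range, or in empty bins, get the floor height.
        h = np.full(samples.shape[0], floor)
        h[inside] = np.maximum(heights[idx[inside]], floor)
        scores += np.log(1.0 / h)
    return scores


# Synthetic stand-in for five residue features (not real structural data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 5))
models = fit_histograms(reference)

typical = np.zeros((1, 5))       # near the centre of every reference histogram
unusual = np.full((1, 5), 8.0)   # far outside the reference range
print(hbos_scores(np.vstack([typical, unusual]), models))
```

Because each feature contributes independently, a residue is flagged when one or more of its feature values falls in a sparsely populated region of the reference distribution.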

ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB), 2018, pp. 651-655. Published 2018-08-01. DOI: 10.1145/3233547.3233709
Citations: 3