Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology.

Microarrays, pp. 339-69. Pub Date: 2015-08-12. DOI: 10.3390/microarrays4030339

Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park


Citations: 21

Abstract

DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs), however, remains a challenging task because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying breast cancer CNAs associated with specific molecular subtypes. CNAs associated with each subtype were identified by analyzing each subtype separately and taking the remaining subtypes as the control. We found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed. In the basal-like subtype, all aberrations were confirmed except those in chromosome arms 8q and 12q; these two arms were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were validated on an independent dataset, using GISTIC, or both. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.
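The abstract does not say how the 2D point cloud is built or which topological property is interrogated, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a sliding-window embedding that pairs consecutive log2 ratios into points, and it counts connected components (the zeroth Betti number) of a neighborhood graph at several distance scales, which stands in for the sequence of simplicial complexes. All function names and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import dist


def point_cloud(profile):
    """Assumed embedding: pair consecutive log2 ratios into 2D points."""
    return [(profile[i], profile[i + 1]) for i in range(len(profile) - 1)]


def betti0(points, eps):
    """Count connected components when points within eps are joined."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = len(points)
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return components


def betti0_curve(profile, scales):
    """Component counts across scales: a crude multiresolution summary."""
    pts = point_cloud(profile)
    return [betti0(pts, eps) for eps in scales]


# Toy profiles with made-up values (not data from the paper).
flat = [0.0, 0.05, -0.05, 0.0, 0.05, 0.0, -0.05, 0.0]
gain = [0.0, 0.05, -0.05, 1.0, 1.05, 0.95, 0.0, 0.05]

if __name__ == "__main__":
    scales = [0.2, 2.0]
    print("flat:", betti0_curve(flat, scales))
    print("gain:", betti0_curve(gain, scales))
```

On these toy inputs the profile with a simulated gain splits into more components at the small scale than the flat profile, while both collapse to a single component at the large scale. A supervised comparison would contrast such curves between a test group and a control group; the statistical test is left out here because the abstract does not describe it.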

Source journal: Microarrays

Self-citation rate: 0.00%

Review time: 11 weeks

About the journal: High-Throughput (formerly Microarrays, ISSN 2076-3905) is a multidisciplinary peer-reviewed scientific journal that provides an advanced forum for the publication of studies reporting high-dimensional approaches and developments in Life Sciences, Chemistry and related fields. Our aim is to encourage scientists to publish their experimental and theoretical results based on high-throughput techniques as well as computational and statistical tools for data analysis and interpretation. The full experimental or methodological details must be provided so that the results can be reproduced. There is no restriction on the length of the papers. High-Throughput invites submissions covering several topics, including, but not limited to: Microarrays, DNA Sequencing, RNA Sequencing, Protein Identification and Quantification, Cell-based Approaches, Omics Technologies, Imaging, Bioinformatics, Computational Biology/Chemistry, Statistics, Integrative Omics, Drug Discovery and Development, Microfluidics, Lab-on-a-chip, Data Mining, Databases, Multiplex Assays.