USNAP:快速独特致密区域检测及其在肺癌中的应用。

IF 4.4 3区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS Bioinformatics Pub Date : 2023-08-01 DOI:10.1093/bioinformatics/btad477
Serene W H Wong, Chiara Pastrello, Max Kotlyar, Christos Faloutsos, Igor Jurisica
{"title":"USNAP:快速独特致密区域检测及其在肺癌中的应用。","authors":"Serene W H Wong, Chiara Pastrello, Max Kotlyar, Christos Faloutsos, Igor Jurisica","doi":"10.1093/bioinformatics/btad477","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Many real-world problems can be modeled as annotated graphs. Scalable graph algorithms that extract actionable information from such data are in demand since these graphs are large, varying in topology, and have diverse node/edge annotations. When these graphs change over time they create dynamic graphs, and open the possibility to find patterns across different time points. In this article, we introduce a scalable algorithm that finds unique dense regions across time points in dynamic graphs. Such algorithms have applications in many different areas, including the biological, financial, and social domains.</p><p><strong>Results: </strong>There are three important contributions to this manuscript. First, we designed a scalable algorithm, USNAP, to effectively identify dense subgraphs that are unique to a time stamp given a dynamic graph. Importantly, USNAP provides a lower bound of the density measure in each step of the greedy algorithm. Second, insights and understanding obtained from validating USNAP on real data show its effectiveness. While USNAP is domain independent, we applied it to four non-small cell lung cancer gene expression datasets. Stages in non-small cell lung cancer were modeled as dynamic graphs, and input to USNAP. Pathway enrichment analyses and comprehensive interpretations from literature show that USNAP identified biologically relevant mechanisms for different stages of cancer progression. Third, USNAP is scalable, and has a time complexity of O(m+mc log nc+nc log nc), where m is the number of edges, and n is the number of vertices in the dynamic graph; mc is the number of edges, and nc is the number of vertices in the collapsed graph.</p><p><strong>Availability and implementation: </strong>The code of USNAP is available at https://www.cs.utoronto.ca/~juris/data/USNAP22.</p>","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"39 8","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425186/pdf/","citationCount":"0","resultStr":"{\"title\":\"USNAP: fast unique dense region detection and its application to lung cancer.\",\"authors\":\"Serene W H Wong, Chiara Pastrello, Max Kotlyar, Christos Faloutsos, Igor Jurisica\",\"doi\":\"10.1093/bioinformatics/btad477\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Many real-world problems can be modeled as annotated graphs. Scalable graph algorithms that extract actionable information from such data are in demand since these graphs are large, varying in topology, and have diverse node/edge annotations. When these graphs change over time they create dynamic graphs, and open the possibility to find patterns across different time points. In this article, we introduce a scalable algorithm that finds unique dense regions across time points in dynamic graphs. Such algorithms have applications in many different areas, including the biological, financial, and social domains.</p><p><strong>Results: </strong>There are three important contributions to this manuscript. First, we designed a scalable algorithm, USNAP, to effectively identify dense subgraphs that are unique to a time stamp given a dynamic graph. Importantly, USNAP provides a lower bound of the density measure in each step of the greedy algorithm. Second, insights and understanding obtained from validating USNAP on real data show its effectiveness. While USNAP is domain independent, we applied it to four non-small cell lung cancer gene expression datasets. Stages in non-small cell lung cancer were modeled as dynamic graphs, and input to USNAP. Pathway enrichment analyses and comprehensive interpretations from literature show that USNAP identified biologically relevant mechanisms for different stages of cancer progression. Third, USNAP is scalable, and has a time complexity of O(m+mc log nc+nc log nc), where m is the number of edges, and n is the number of vertices in the dynamic graph; mc is the number of edges, and nc is the number of vertices in the collapsed graph.</p><p><strong>Availability and implementation: </strong>The code of USNAP is available at https://www.cs.utoronto.ca/~juris/data/USNAP22.</p>\",\"PeriodicalId\":8903,\"journal\":{\"name\":\"Bioinformatics\",\"volume\":\"39 8\",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425186/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btad477\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btad477","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

动机现实世界中的许多问题都可以用带注释的图来建模。由于这些图规模庞大、拓扑结构各异、节点/边注释多样,因此需要能从此类数据中提取可操作信息的可扩展图算法。当这些图随时间发生变化时,它们就会形成动态图,为发现不同时间点的模式提供了可能性。在本文中,我们介绍了一种可扩展的算法,它能在动态图中找到跨时间点的独特密集区域。这种算法可应用于许多不同领域,包括生物、金融和社会领域:本手稿有三个重要贡献。首先,我们设计了一种可扩展算法 USNAP,可有效识别动态图中时间戳唯一的密集子图。重要的是,USNAP 在贪婪算法的每个步骤中都提供了密度度量的下限。其次,在真实数据上验证 USNAP 所获得的洞察力和理解力表明了它的有效性。虽然 USNAP 不受领域限制,但我们将其应用于四个非小细胞肺癌基因表达数据集。非小细胞肺癌的分期被建模为动态图,并输入 USNAP。通路富集分析和文献综合解释表明,USNAP 发现了癌症进展不同阶段的生物学相关机制。第三,USNAP具有可扩展性,其时间复杂度为O(m+mc log nc+nc log nc),其中m为动态图中的边数,n为顶点数;mc为折叠图中的边数,nc为顶点数:USNAP 的代码见 https://www.cs.utoronto.ca/~juris/data/USNAP22。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
USNAP: fast unique dense region detection and its application to lung cancer.

Motivation: Many real-world problems can be modeled as annotated graphs. Scalable graph algorithms that extract actionable information from such data are in demand since these graphs are large, varying in topology, and have diverse node/edge annotations. When these graphs change over time they create dynamic graphs, and open the possibility to find patterns across different time points. In this article, we introduce a scalable algorithm that finds unique dense regions across time points in dynamic graphs. Such algorithms have applications in many different areas, including the biological, financial, and social domains.

Results: There are three important contributions to this manuscript. First, we designed a scalable algorithm, USNAP, to effectively identify dense subgraphs that are unique to a time stamp given a dynamic graph. Importantly, USNAP provides a lower bound of the density measure in each step of the greedy algorithm. Second, insights and understanding obtained from validating USNAP on real data show its effectiveness. While USNAP is domain independent, we applied it to four non-small cell lung cancer gene expression datasets. Stages in non-small cell lung cancer were modeled as dynamic graphs, and input to USNAP. Pathway enrichment analyses and comprehensive interpretations from literature show that USNAP identified biologically relevant mechanisms for different stages of cancer progression. Third, USNAP is scalable, and has a time complexity of O(m+mc log nc+nc log nc), where m is the number of edges, and n is the number of vertices in the dynamic graph; mc is the number of edges, and nc is the number of vertices in the collapsed graph.

Availability and implementation: The code of USNAP is available at https://www.cs.utoronto.ca/~juris/data/USNAP22.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Bioinformatics
Bioinformatics 生物-生化研究方法
CiteScore
11.20
自引率
5.20%
发文量
753
审稿时长
2.1 months
期刊介绍: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal - Discovery Notes and Application Notes- focus on shorter papers; the former reporting biologically interesting discoveries using computational methods, the latter exploring the applications used for experiments.
期刊最新文献
MEHunter: Transformer-based mobile element variant detection from long reads Metabolic syndrome may be more frequent in treatment-naive sarcoidosis patients. Coracle—A Machine Learning Framework to Identify Bacteria Associated with Continuous Variables CoSIA: an R Bioconductor package for CrOss Species Investigation and Analysis LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1