Francisco Sena , Eliel Ingervo , Shahbaz Khan , Andrey Prjibelski , Sebastian Schmidt , Alexandru Tomescu
{"title":"Flowtigs: Safety in flow decompositions for assembly graphs","authors":"Francisco Sena , Eliel Ingervo , Shahbaz Khan , Andrey Prjibelski , Sebastian Schmidt , Alexandru Tomescu","doi":"10.1016/j.isci.2024.111208","DOIUrl":null,"url":null,"abstract":"<div><div>A <em>decomposition</em> of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (<em>flowtigs</em>) of walks that are <em>safe</em> in such general flow decompositions, i.e., that are subwalks of any possible flow decomposition. We provide an <em>O</em>(<em>mn</em>)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, compared to reporting unitigs or maximal safe walks based only on the graph structure, reporting flowtigs results in a notably more contiguous assembly. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.</div></div>","PeriodicalId":342,"journal":{"name":"iScience","volume":"27 12","pages":"Article 111208"},"PeriodicalIF":4.6000,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"iScience","FirstCategoryId":"103","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2589004224024337","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that are subwalks of any possible flow decomposition. We provide an O(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, compared to reporting unitigs or maximal safe walks based only on the graph structure, reporting flowtigs results in a notably more contiguous assembly. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.
期刊介绍:
Science has many big remaining questions. To address them, we will need to work collaboratively and across disciplines. The goal of iScience is to help fuel that type of interdisciplinary thinking. iScience is a new open-access journal from Cell Press that provides a platform for original research in the life, physical, and earth sciences. The primary criterion for publication in iScience is a significant contribution to a relevant field combined with robust results and underlying methodology. The advances appearing in iScience include both fundamental and applied investigations across this interdisciplinary range of topic areas. To support transparency in scientific investigation, we are happy to consider replication studies and papers that describe negative results.
We know you want your work to be published quickly and to be widely visible within your community and beyond. With the strong international reputation of Cell Press behind it, publication in iScience will help your work garner the attention and recognition it merits. Like all Cell Press journals, iScience prioritizes rapid publication. Our editorial team pays special attention to high-quality author service and to efficient, clear-cut decisions based on the information available within the manuscript. iScience taps into the expertise across Cell Press journals and selected partners to inform our editorial decisions and help publish your science in a timely and seamless way.