To determine the gene content of an organism, the reads generated by sequencing must be assembled, using either a reference-guided or a de novo strategy. Assembly, however, usually yields multiple sequences called contigs, which are grouped into scaffolds after ordering steps. The completion stage aims to obtain a single genomic sequence, called a complete genome, and this is not a trivial task. Various analytical strategies have been developed to support this process, and many have been implemented in computational tools that produce complete genomes or, failing that, assemblies as close to complete as possible, the so-called drafts. This manuscript presents ContigPolishing, a computational tool with a simple and intuitive graphical interface, developed to improve the assembly of prokaryotic genomes, including bacterial genomes and metagenomes. Although software for this purpose already exists, there is still a gap for solutions that combine simplicity and robustness. ContigPolishing addresses this need and features an integrated database that allows processing to be resumed at any time. The tool was validated with 90 NCBI datasets from taxa such as Escherichia coli, Corynebacterium, and Nocardia, as well as with raw reads from the SRA database to simulate real-world situations. By extending contigs based on the similarity between their flanks, the tool improved the contiguity of the assemblies: N50 increased, L50 decreased, and the number of contigs was reduced. In some cases, the software was able to elevate the status of genomes from draft to complete, demonstrating its effectiveness. ContigPolishing is available at: https://github.com/allanverasce/contigpolishing.
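The contiguity metrics mentioned above can be illustrated with a minimal Python sketch. It is not taken from ContigPolishing: the function names `n50_l50` and `merge_on_flank_overlap`, the `min_overlap` parameter, and the toy sequences are assumptions made for illustration only. The merge step uses exact suffix-prefix matching as a simplified stand-in for flank similarity; a real tool would presumably rely on alignment and also consider reverse-complement orientations.

```python
from typing import List, Optional, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running sum of
    lengths, taken from longest to shortest, first reaches half of the
    total assembly size. L50 is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: running sum always reaches half")


def merge_on_flank_overlap(left: str, right: str,
                           min_overlap: int = 4) -> Optional[str]:
    """Join two contigs when the end of `left` exactly matches the start
    of `right`. Hypothetical simplification: exact matching only, and the
    longest overlap of at least `min_overlap` bases wins.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Toy assembly: five contigs, then the two longest joined through a
# 10-base overlap (100 + 80 - 10 = 170).
before = [100, 80, 60, 40, 20]
after = [170, 60, 40, 20]
print(n50_l50(before))
print(n50_l50(after))
print(merge_on_flank_overlap("ACGTTGCA", "TGCAGGTC"))
```

In this toy case, joining the two longest contigs raises N50 and lowers L50 while reducing the contig count by one, which is the direction of change the abstract reports.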
