
Source Code for Biology and Medicine: latest articles

plot2groups: an R package to plot scatter points for two groups of values
Q2 Decision Sciences Pub Date : 2014-03-15 DOI: 10.1186/1751-0473-9-23
Yong Xu, Fuquan Zhang, Guoqiang Wang, Hongbao Cao, Y. Shugart, Zaohuo Cheng
Citations: 1
PFClust: an optimised implementation of a parameter-free clustering algorithm.
Q2 Decision Sciences Pub Date : 2014-02-04 DOI: 10.1186/1751-0473-9-5
Khadija Musayeva, Tristan Henderson, John Bo Mitchell, Lazaros Mavridis

Background: A well-known problem in cluster analysis is finding an optimal number of clusters reflecting the inherent structure of the data. PFClust is a partitioning-based clustering algorithm capable, unlike many widely-used clustering algorithms, of automatically proposing an optimal number of clusters for the data.

Results: The results of tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been successfully used to cluster large macromolecular structures and small druglike compounds. We have greatly improved the algorithm by a more efficient implementation, which enables PFClust to process large data sets acceptably fast.

Conclusions: In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.

Citations: 8
Correction: Dispelling myths about rare disease registry system development.
Q2 Decision Sciences Pub Date : 2014-01-31 DOI: 10.1186/1751-0473-9-4
Matthew Bellgard, Christophe Beroud, Kay Parkinson, Tess Harris, Segolene Ayme, Gareth Baynam, Tarun Weeramanthri, Hugh Dawkins, Adam Hunter
Citations: 12
ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets.
Q2 Decision Sciences Pub Date : 2014-01-24 DOI: 10.1186/1751-0473-9-3
Bernard J Pope, Tú Nguyen-Dumont, Fleur Hammet, Daniel J Park

Background: We recently described Hi-Plex, a highly multiplexed PCR-based target-enrichment system for massively parallel sequencing (MPS), which allows the uniform definition of library size so that subsequent paired-end sequencing can achieve complete overlap of read pairs. Variant calling from Hi-Plex-derived datasets can thus rely on the identification of variants appearing in both reads of read-pairs, permitting stringent filtering of sequencing chemistry-induced errors. These principles underlie the ROVER software (derived from Read Overlap PCR-MPS variant caller), which we have recently used to report the screening for genetic mutations in the breast cancer predisposition gene PALB2. Here, we describe the algorithms underlying ROVER and its usage.

Results: ROVER enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and the tailorability of its thresholds enable broad access for a range of PCR-MPS users.

Methods: ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). The software accepts a tab-delimited text file listing the coordinates of the target-specific primers used for targeted enrichment based on a specified genome-build. It also accepts aligned sequence files resulting from mapping to the same genome-build. ROVER identifies the amplicon a given read-pair represents and removes the primer sequences by using the mapping coordinates and primer coordinates. It considers overlapping read-pairs with respect to the primer-intervening sequence. Only when a variant is observed in both reads of a read-pair does the signal contribute to a tally of read-pairs containing or not containing the variant. User-defined thresholds set the minimum number and the minimum proportion of read-pairs in which a variant must be observed for a 'call' to be made. ROVER also reports the depth of coverage across amplicons to facilitate the identification of any regions that may require further screening.

Conclusions: ROVER can facilitate rapid and accurate genetic variant calling for a broad range of PCR-MPS users.
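
The tallying rule described in the Methods can be illustrated with a minimal Python sketch. This is not ROVER's own code: the function name `call_variants`, the input representation (one pair of variant sets per read-pair, already restricted to the primer-intervening sequence of a single amplicon), and the default threshold values are all assumptions made for illustration.

```python
from collections import Counter


def call_variants(read_pairs, min_pairs=2, min_fraction=0.15):
    """Call variants seen in BOTH reads of enough read-pairs.

    read_pairs: iterable of (variants_in_read1, variants_in_read2), where
    each element is a set of hashable variant keys (hypothetical format).
    min_pairs / min_fraction: illustrative thresholds, not ROVER defaults.
    Returns {variant: (supporting_pairs, total_pairs)}.
    """
    total = 0
    tally = Counter()
    for read1, read2 in read_pairs:
        total += 1
        # A variant counts only if it is observed in both reads of the pair.
        for variant in set(read1) & set(read2):
            tally[variant] += 1

    calls = {}
    for variant, support in tally.items():
        # Both the absolute count and the proportion must pass.
        if support >= min_pairs and support / total >= min_fraction:
            calls[variant] = (support, total)
    return calls


pairs = [
    ({"A>T@10"}, {"A>T@10"}),
    ({"A>T@10"}, {"A>T@10"}),
    ({"G>C@5"}, set()),      # seen in one read only: ignored
    (set(), set()),
]
print(call_variants(pairs))
```

Requiring agreement between the two reads before tallying is what filters out errors that arise independently in one read of a pair; the two thresholds then guard against low-depth and low-frequency artefacts.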

Citations: 13
MOSAL: software tools for multiobjective sequence alignment.
Q2 Decision Sciences Pub Date : 2014-01-08 DOI: 10.1186/1751-0473-9-2
Luís Paquete, Pedro Matias, Maryam Abbasi, Miguel Pinheiro

Multiobjective sequence alignment brings the advantage of providing a set of alignments that represent the trade-off between performing insertions/deletions and matching symbols from both sequences. Each of these alignments provides a potential explanation of the relationship between the sequences. We introduce MOSAL, a software tool that provides an open-source implementation and an on-line application for multiobjective pairwise sequence alignment.
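
The trade-off between matches and insertions/deletions can be made concrete with a small Python sketch that computes the set of nondominated (matches, indels) score pairs for two sequences. It is an illustration only and is not taken from MOSAL: the function names, the choice of exactly these two objectives (maximise matches, minimise indels), and the simple dynamic program over Pareto sets are assumptions.

```python
def pareto_filter(points):
    """Keep (matches, indels) pairs not dominated by any other pair.

    A pair q dominates p if it has at least as many matches and at most
    as many indels, and q != p.
    """
    pts = set(points)
    return {
        p for p in pts
        if not any(q != p and q[0] >= p[0] and q[1] <= p[1] for q in pts)
    }


def pareto_alignment_scores(a, b):
    """Return the sorted nondominated (matches, indels) pairs for a and b."""
    n, m = len(a), len(b)
    # front[i][j] holds the Pareto set for aligning a[:i] with b[:j].
    front = [[set() for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                front[i][j] = {(0, 0)}
                continue
            cand = set()
            if i > 0 and j > 0:
                # Align a[i-1] with b[j-1]; a match adds one to the first score.
                gain = 1 if a[i - 1] == b[j - 1] else 0
                cand |= {(mt + gain, gp) for mt, gp in front[i - 1][j - 1]}
            if i > 0:
                # Gap in b: one more indel.
                cand |= {(mt, gp + 1) for mt, gp in front[i - 1][j]}
            if j > 0:
                # Gap in a: one more indel.
                cand |= {(mt, gp + 1) for mt, gp in front[i][j - 1]}
            front[i][j] = pareto_filter(cand)
    return sorted(front[n][m])


print(pareto_alignment_scores("AB", "BA"))
```

For "AB" and "BA" the sketch reports two trade-off points: no indels with no matches, or two indels with one match. Each point corresponds to a different explanation of how the sequences relate, which is the idea the abstract describes.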

Citations: 3
TrigNER: automatically optimized biomedical event trigger recognition on scientific documents.
Q2 Decision Sciences Pub Date : 2014-01-08 DOI: 10.1186/1751-0473-9-1
David Campos, Quoc-Chinh Bui, Sérgio Matos, José Luís Oliveira

Background: Cellular events play a central role in the understanding of biological processes and functions, providing insight into both physiological and pathogenic mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of the biomedical domain, allowing faster updating of existing knowledge. The identification of trigger words indicating an event is a very important step in the event extraction pipeline, since the following tasks rely on its output. This step presents various complex and unsolved challenges, namely the selection of informative features, the representation of the textual context, and the selection of a specific event type for a trigger word given this context.

Results: We propose TrigNER, a machine learning-based solution for biomedical event trigger recognition, which takes advantage of Conditional Random Fields (CRFs) with a high-end feature set, including linguistic-based, orthographic, morphological, local context and dependency parsing features. Additionally, a completely configurable algorithm is used to automatically optimize the feature set and training parameters for each event type. Thus, it automatically selects the features that have a positive contribution and automatically optimizes the CRF model order, n-gram sizes, vertex information and maximum hops for dependency parsing features. The final output consists of various CRF models, each one optimized to the linguistic characteristics of each event type.

Conclusions: TrigNER was tested on the BioNLP 2009 shared task corpus, achieving a total F-measure of 62.7 and outperforming existing solutions on various event trigger types, namely gene expression, transcription, protein catabolism, phosphorylation and binding. The proposed solution allows researchers to easily apply complex and optimized techniques in the recognition of biomedical event triggers, making its application a simple routine task. We believe this work is an important contribution to the biomedical text mining community, contributing to improved and faster event recognition in scientific articles, and consequent hypothesis generation and knowledge discovery. This solution is freely available as open source at http://bioinformatics.ua.pt/trigner.

Citations: 30
Combining de novo and reference-guided assembly with scaffold_builder.
Q2 Decision Sciences Pub Date : 2013-11-22 DOI: 10.1186/1751-0473-8-23
Genivaldo Gz Silva, Bas E Dutilh, T David Matthews, Keri Elkins, Robert Schmieder, Elizabeth A Dinsdale, Robert A Edwards

Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds - super contigs of sequences joined by N-bases - based on the similarity to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes from Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291 and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded.
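
The idea of a scaffold as contigs joined by N-bases according to their position on a reference can be sketched in a few lines of Python. This is a simplified illustration, not Scaffold_builder's implementation: the function name `build_scaffold`, the input format (contigs already placed on the reference with 0-based, half-open coordinates), and the decision to reject overlapping placements rather than resolve them are all assumptions.

```python
def build_scaffold(placements):
    """Join contigs into one scaffold, filling reference gaps with N.

    placements: list of (ref_start, ref_end, sequence) tuples, one per
    contig, giving where the contig maps on the reference (hypothetical
    format; 0-based, half-open). Overlaps are not handled in this sketch.
    """
    ordered = sorted(placements, key=lambda p: p[0])
    parts = []
    prev_end = None
    for start, end, seq in ordered:
        if prev_end is not None:
            gap = start - prev_end
            if gap < 0:
                raise ValueError("overlapping placements are not handled in this sketch")
            # The unknown stretch between two contigs becomes a run of N.
            parts.append("N" * gap)
        parts.append(seq)
        prev_end = end
    return "".join(parts)


print(build_scaffold([(10, 14, "GGCC"), (0, 4, "ACGT")]))
```

Because the ordering and gap sizes come from the reference alone, no mate-pair information is needed, which matches the complementary role described in the abstract.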

Citations: 67
[COMMODE] a large-scale database of molecular descriptors using compounds from PubChem.
Q2 Decision Sciences Pub Date : 2013-11-13 DOI: 10.1186/1751-0473-8-22
Andreas Dander, Laurin Aj Mueller, Ralf Gallasch, Stephan Pabinger, Frank Emmert-Streib, Armin Graber, Matthias Dehmer

Background: Molecular descriptors have been extensively used in the field of structure-oriented drug design and structural chemistry. They have been applied in QSPR and QSAR models to predict ADME-Tox properties, which specify essential features for drugs. Molecular descriptors capture chemical and structural information, but investigating their interpretation and meaning remains very challenging.

Results: This paper introduces COMMODE, a large-scale database of molecular descriptors containing more than 25 million compounds originating from PubChem. About 2500 DRAGON descriptors have been calculated for all compounds and integrated into this database, which is accessible through a web interface at http://commode.i-med.ac.at.

Citations: 4
Dispelling myths about rare disease registry system development.
Q2 Decision Sciences Pub Date : 2013-10-16 DOI: 10.1186/1751-0473-8-21
Matthew Bellgard, Christophe Beroud, Kay Parkinson, Tess Harris, Segolene Ayme, Gareth Baynam, Tarun Weeramanthri, Hugh Dawkins, Adam Hunter

Rare disease registries (RDRs) are an essential tool to improve knowledge and monitor interventions for rare diseases. If designed appropriately, patient and disease related information captured within them can become the cornerstone for effective diagnosis and new therapies. Surprisingly, however, registries possess a diverse range of functionality, operate in different, oftentimes incompatible, software environments and serve various, and sometimes incongruous, purposes. Given the ambitious goals of the International Rare Diseases Research Consortium (IRDiRC) by 2020 and beyond, RDRs must be designed with the agility to evolve and efficiently interoperate in an ever-changing rare disease landscape, as well as to cater for rapid changes in Information Communication Technologies. In this paper, we contend that RDR requirements will also evolve in response to a number of factors such as changing disease definitions and diagnostic criteria, the requirement to integrate patient/disease information from advances in biotechnology and/or phenotyping approaches, as well as the need to adapt dynamically to security and privacy concerns. We dispel a number of myths in RDR development, outline key criteria for robust and sustainable RDR implementation and introduce the concept of an RDR Checklist to guide future RDR development.

Citations: 51
MIA - A free and open source software for gray scale medical image analysis.
Q2 Decision Sciences Pub Date : 2013-10-11 DOI: 10.1186/1751-0473-8-20
Gert Wollny, Peter Kellman, María-Jesus Ledesma-Carbayo, Matthew M Skinner, Jean-Jaques Hublin, Thomas Hierl

Background: Gray scale images make up the bulk of data in bio-medical image analysis, and hence the main focus of many image processing tasks lies in the processing of these monochrome images. With ever-improving acquisition devices, spatial and temporal image resolution increases, and data sets become very large. Various image processing frameworks exist that make the development of new algorithms easy by using high-level programming languages or visual programming. These frameworks are also accessible to researchers with little or no background in software development because they take care of otherwise complex tasks. Specifically, the management of working memory is taken care of automatically, usually at the price of requiring more of it. As a result, processing large data sets with these tools becomes increasingly difficult on workstation-class computers. One alternative to using these high-level processing tools is to develop new algorithms in a language like C++, which gives the developer full control over how memory is handled; however, the resulting workflow for prototyping new algorithms is rather time-intensive and is not appropriate for a researcher with little or no knowledge of software development. Another alternative is to use command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation through shell scripts. Although not as convenient as, e.g., visual programming, this approach is still accessible to researchers without a background in computer science. However, only a few tools exist that provide this kind of processing interface; they are usually quite task-specific and do not provide a clear approach when one wants to shape a new command line tool from a prototype shell script.

Results: The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that make it possible to run image processing tasks interactively in a command shell and to prototype by using the corresponding shell scripting language. Since the hard disk serves as temporary storage, memory management is usually a non-issue in the prototyping phase. By using string-based descriptions for filters, optimizers, and the like, the transition from shell scripts to full-fledged programs implemented in C++ is also made easy. In addition, its design based on atomic plug-ins and single-task command line tools makes it easy to extend MIA, usually without the need to touch or recompile existing code.

Conclusion: In this article, we describe the general design of MIA, a general-purpose framework for gray scale image processing. We demonstrated the applicability of the software with example applications from three different research scenarios, namely motion compensation in myocardial perfusion imaging, the processing of high resolution image data that arises in virtual anthropology, and retrospective analysis of treatment outcomes in orthognathic surgery. Prototyping algorithms with MIA by using shell scripts that combine small, single-task command line tools is a viable alternative to the use of high-level languages, an approach that is especially useful when large data sets need to be processed.
Citations: 37