Pub Date: 2016-07-11. eCollection Date: 2016-01-01. DOI: 10.1186/s13029-016-0057-7
Lindsay V Clark, Erik J Sacks
Background: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously-mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult.
Results: We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output are in CSV format so that files can be opened with spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file listing the names of markers to retain can be imported. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files.
Conclusions: TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.
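The core counting idea described above, matching a read's barcode and then tallying known tag sequences, can be sketched in a few lines of Python. This is a toy illustration only, not TagDigger's actual implementation; the function name and the exact-prefix matching strategy are invented for the example.

```python
# Toy sketch of tag counting in FASTQ data. NOT TagDigger's implementation;
# names and matching strategy are invented for illustration.

def count_tags(fastq_lines, barcode, tags):
    """Count reads that begin with `barcode` followed by a known tag."""
    counts = {tag: 0 for tag in tags}
    # FASTQ records are 4 lines each; the sequence is the 2nd line.
    for i in range(1, len(fastq_lines), 4):
        seq = fastq_lines[i]
        if not seq.startswith(barcode):
            continue
        rest = seq[len(barcode):]
        for tag in tags:
            if rest.startswith(tag):
                counts[tag] += 1
                break
    return counts
```

The resulting dictionary of per-tag read counts maps naturally onto one row of a CSV read-depth table per sample.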
TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data. Source Code for Biology and Medicine, article 11.
Background: Whole exome sequencing (WES) has given researchers access to a highly enriched subset of the human genome in which to search for variants that are likely to be pathogenic and may provide important insights into disease mechanisms. In developing countries, bioinformatics capacity and expertise are severely limited, and wet-bench scientists are required to take on the challenging task of understanding and implementing the barrage of bioinformatics tools available to them.
Results: We designed a novel method for the filtration of WES data called TAPER™ (Tool for Automated selection and Prioritization for Efficient Retrieval of sequence variants).
Conclusions: TAPER™ implements a set of logical steps to prioritize candidate variants that could be associated with disease, and is aimed at biomedical laboratories with limited bioinformatics capacity. TAPER™ is free, can be set up on a Windows operating system (Windows 7 and above), and does not require any programming knowledge. In summary, we have developed a freely available tool that simplifies variant prioritization from WES data to facilitate discovery of disease-causing genes.
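The kind of rule-based filtering and ranking TAPER™ performs can be illustrated with a minimal sketch. The thresholds, field names, and consequence categories below are invented for illustration and are not TAPER™'s published rule set.

```python
# Hedged sketch of rule-based variant prioritization. Thresholds, field
# names, and categories are invented; this is not TAPER's actual logic.

DAMAGING = frozenset({"stopgain", "frameshift", "missense"})

def prioritize(variants, max_pop_freq=0.01, damaging=DAMAGING):
    """Keep rare variants with a potentially damaging consequence,
    ranked by ascending population allele frequency."""
    kept = [v for v in variants
            if v["pop_freq"] <= max_pop_freq and v["consequence"] in damaging]
    return sorted(kept, key=lambda v: v["pop_freq"])
```

A fixed, transparent rule chain like this is what makes such a tool usable without programming expertise: every exclusion can be explained by pointing at a single rule.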
A new tool for prioritization of sequence variants from whole exome sequencing data. Source Code for Biology and Medicine, article 10. (Pub Date: 2016-07-01. DOI: 10.1186/s13029-016-0056-8)
Pub Date: 2016-06-24. eCollection Date: 2016-01-01. DOI: 10.1186/s13029-016-0055-9
John M Macdonald, Paul C Boutros
Background: To reproduce and report a bioinformatics analysis, it is important to be able to determine the environment in which a program was run. That information can also be valuable when debugging why different executions give unexpectedly different results.
Results: Log::ProgramInfo is a Perl module that writes a log file at the termination of execution of the enclosing program, to document useful execution characteristics. This log file can be used to re-create the environment in order to reproduce an earlier execution. It can also be used to compare the environments of two executions to determine whether there were any differences that might affect (or explain) their operation.
Availability: The source is available on CPAN (Macdonald and Boutros, Log-ProgramInfo. http://search.cpan.org/~boutroslb/Log-ProgramInfo/).
Conclusion: Using Log::ProgramInfo in programs that create result data for publishable research, and including its output log as part of the publication of that research, helps others duplicate the programming environment as a precursor to validating and extending that research.
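The write-a-log-at-termination idea is easy to illustrate. The sketch below is a Python analogue of the concept, not the Perl module itself; the field set and function names are invented for the example.

```python
# Python analogue of the Log::ProgramInfo concept: collect run
# characteristics and write them to a log file when the program exits.
# Field names are invented, not the module's actual output format.

import atexit
import json
import os
import platform
import sys
import time

def collect_run_info(start_time):
    """Gather execution characteristics useful for reproducing a run."""
    return {
        "program": sys.argv[0],
        "args": sys.argv[1:],
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "cwd": os.getcwd(),
        "elapsed_sec": round(time.time() - start_time, 3),
    }

def enable_run_log(path="program-info.log"):
    """Register an exit hook that writes the run log at termination,
    mirroring the write-at-termination behavior described above."""
    start = time.time()
    atexit.register(
        lambda: open(path, "w").write(
            json.dumps(collect_run_info(start), indent=2)))
```

Comparing two such logs side by side is how one would detect an environment difference that explains divergent results.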
Log::ProgramInfo: A Perl module to collect and log data for bioinformatics pipelines. Source Code for Biology and Medicine, article 9.
Pub Date: 2016-06-18. eCollection Date: 2016-01-01. DOI: 10.1186/s13029-016-0051-0
Caleb F Davis, Deborah I Ritter, David A Wheeler, Hongmei Wang, Yan Ding, Shannon P Dugan, Matthew N Bainbridge, Donna M Muzny, Pulivarthi H Rao, Tsz-Kwong Man, Sharon E Plon, Richard A Gibbs, Ching C Lau
Background: Genomic deletions, inversions, and other rearrangements known collectively as structural variations (SVs) are implicated in many human disorders. Technologies for sequencing DNA provide a potentially rich source of information in which to detect breakpoints of structural variations at base-pair resolution. However, accurate prediction of SVs remains challenging, and existing informatics tools predict rearrangements with significant rates of false positives or negatives.
Results: To address this challenge, we developed 'Structural Variation detection by STAck and Tail' (SV-STAT), which implements a novel scoring metric. The software uses this statistic to quantify evidence for structural variation in genomic regions suspected of harboring rearrangements. To demonstrate SV-STAT, we used targeted and genome-wide approaches. First, we applied a custom capture array followed by Roche/454 sequencing and SV-STAT to three pediatric B-lineage acute lymphoblastic leukemias, identifying five structural variations joining known and novel breakpoint regions. Next, we detected SVs genome-wide in paired-end Illumina data collected from additional tumor samples. SV-STAT showed predictive accuracy as high as or higher than leading alternatives. The software is freely available under the terms of the GNU General Public License version 3 at https://gitorious.org/svstat/svstat.
Conclusions: SV-STAT works across multiple sequencing chemistries, paired and single-end technologies, targeted or whole-genome strategies, and it complements existing SV-detection software. The method is a significant advance towards accurate detection and genotyping of genomic rearrangements from DNA sequencing data.
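The "stack" intuition, that independent reads whose clipped tails pile up at the same coordinate constitute evidence for a breakpoint, can be illustrated with a toy score. This sketch is loosely inspired by the description above and is not SV-STAT's published statistic; a simple support count stands in for it.

```python
# Toy illustration of breakpoint evidence from clipped-read "stacks".
# NOT SV-STAT's statistic: a plain support count is used for clarity.

from collections import Counter

def breakpoint_scores(clip_positions, min_support=2):
    """clip_positions: genomic coordinates where individual reads are
    soft-clipped. Returns {position: support} for positions where at
    least `min_support` reads agree, i.e. candidate breakpoints."""
    stacks = Counter(clip_positions)
    return {pos: n for pos, n in stacks.items() if n >= min_support}
```

Requiring agreement across independent reads is what separates a genuine breakpoint signal from scattered alignment noise; a real caller would additionally weigh base quality and the clipped "tail" sequences themselves.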
SV-STAT accurately detects structural variation via alignment to reference-based assemblies. Source Code for Biology and Medicine, article 8.
Pub Date: 2016-04-15. eCollection Date: 2016-01-01. DOI: 10.1186/s13029-016-0054-x
Ana Gabriella de Oliveira Sardinha, Ceres Nunes de Resende Oyama, Armando de Mendonça Maroja, Ivan F Costa
Background: The aim of this paper is to provide a general discussion, an algorithm, and working programs for a deformation method that rapidly simulates biological tissue formed by fibers and fluid. To demonstrate the benefit of the software in clinical applications, we successfully used our program to deform 3D breast images acquired from patients with a 3D scanner in a real hospital environment.
Results: The method implements a quasi-static solution for elastic global deformations of objects. Each pair of surface vertices is connected and defines an elastic fiber. The set of all elastic fibers defines a mesh smaller than typical volumetric meshes, allowing simulation of complex objects with less computational effort. Behavior similar to that of the stress tensor is obtained from a volume-conservation equation that couples the 3D coordinates. Step by step, we show the computational implementation of this approach.
Conclusions: As an example, a 2D rectangle formed by only 4 vertices is solved and, for this simple geometry, all intermediate results are shown. Working computer routines implementing these ideas for general 3D objects, including a clinical application, are also provided.
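The fiber model can be sketched for the 4-vertex example as a plain quasi-static spring relaxation: every vertex pair is a fiber with a rest length, and vertices are iteratively nudged until all fibers are unstrained. This sketch omits the paper's volume-conservation equation, and the relaxation constants are illustrative, not taken from the paper.

```python
# Minimal 2D sketch of the "every vertex pair is an elastic fiber" idea.
# Omits the paper's volume-conservation step; constants are illustrative.

import itertools
import math

def relax(vertices, rest_lengths, k=0.5, steps=200):
    """Quasi-statically move vertices until each fiber (vertex pair)
    approaches its rest length. rest_lengths is keyed by (i, j), i < j."""
    pts = [list(p) for p in vertices]
    pairs = list(itertools.combinations(range(len(pts)), 2))
    for _ in range(steps):
        for (i, j) in pairs:
            dx = pts[j][0] - pts[i][0]
            dy = pts[j][1] - pts[i][1]
            d = math.hypot(dx, dy)
            if d == 0:
                continue
            # Displacement proportional to strain, split between endpoints.
            f = k * (d - rest_lengths[(i, j)]) / d
            pts[i][0] += 0.5 * f * dx; pts[i][1] += 0.5 * f * dy
            pts[j][0] -= 0.5 * f * dx; pts[j][1] -= 0.5 * f * dy
    return pts
```

Because all 6 pairwise fibers of the 4-vertex example are constrained, the relaxed shape is rigid up to translation and rotation, which is why so small a mesh suffices for this geometry.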
Implementation and clinical application of a deformation method for fast simulation of biological tissue formed by fibers and fluid. Source Code for Biology and Medicine, article 7.
Pub Date: 2016-04-11. eCollection Date: 2016-01-01. DOI: 10.1186/s13029-016-0053-y
Deena M A Gendoo, Benjamin Haibe-Kains
Background: Medulloblastoma (MB) is a highly malignant and heterogeneous brain tumour and the most common cause of cancer-related deaths in children. Increasing availability of genomic data over the last decade has resulted in improved human subtype classification methods and the parallel development of MB mouse models aimed at identifying subtype-specific disease origins and signaling pathways. Despite these advances, MB classification schemes remain inadequate for personalized prediction of MB subtypes for individual patient samples and across model systems. To address this issue, we developed the Medullo-Model to Subtypes (MM2S) classifier, a new method enabling classification of individual gene expression profiles from MB samples (patient samples, mouse models, and cell lines) against well-established molecular subtypes [Genomics 106:96-106, 2015]. We demonstrated the accuracy and flexibility of MM2S in the largest meta-analysis of human patients and mouse models to date. Here, we present a new functional package that provides an easy-to-use, fully documented implementation of the MM2S method, with additional functionality that allows users to obtain graphical and tabular summaries of MB subtype predictions for single samples and across sample replicates. The flexibility of the MM2S package promotes incorporation of MB predictions into large medulloblastoma-driven analysis pipelines, making this tool suitable for use by researchers.
Results: The MM2S package is applied in two case studies involving primary human patient samples as well as sample replicates of the GTML mouse model. We highlight functions useful for species-specific MB classification across individual samples and sample replicates, and emphasize the range of functions for deriving both single-sample and meta-centric views of MB predictions across samples and across MB subtypes.
Conclusions: Our MM2S package can be used to generate predictions without relying on an external web server or additional sources. Our open-source package facilitates and extends the MM2S algorithm in diverse computational and bioinformatics contexts. The package is available on CRAN at https://cran.r-project.org/web/packages/MM2S/, as well as on GitHub at https://github.com/DGendoo and https://github.com/bhklab.
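To illustrate the kind of single-sample subtype call such a classifier makes, here is a generic nearest-centroid sketch. MM2S itself is an R package with its own published method; the subtype names, gene names, and distance choice below are invented for the example and do not reproduce MM2S's algorithm.

```python
# Generic nearest-centroid subtype call, shown only to illustrate
# single-sample classification. NOT the MM2S algorithm; all names
# and values are invented.

def call_subtype(sample, centroids):
    """sample: {gene: expression}; centroids: {subtype: {gene: expression}}.
    Returns the subtype whose centroid is closest in Euclidean distance
    over the genes shared with the sample."""
    def dist(centroid):
        shared = sample.keys() & centroid.keys()
        return sum((sample[g] - centroid[g]) ** 2 for g in shared) ** 0.5
    return min(centroids, key=lambda s: dist(centroids[s]))
```

The key property this shares with the package described above is that each sample is classified on its own, with no need to renormalize against a cohort or call out to a web server.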
MM2S: personalized diagnosis of medulloblastoma patients and model systems. Source Code for Biology and Medicine, article 6.
Pub Date: 2016-04-02. DOI: 10.1186/s13029-016-0052-z
Fuquan Zhang
A flexible tool to plot a genomic map for single nucleotide polymorphisms. Source Code for Biology and Medicine.
Pub Date: 2016-03-28. eCollection Date: 2016-01-01. DOI: 10.1186/s13029-016-0050-1
Samson S Kiware, Tanya L Russell, Zacharia J Mtema, Alpha D Malishee, Prosper Chaki, Dickson Lwetoijera, Javan Chanda, Dingani Chinula, Silas Majambere, John E Gimnig, Thomas A Smith, Gerry F Killeen
Background: Standardized schemas, databases, and public data repositories are needed for studies of malaria vectors, which encompass a remarkably diverse array of designs and rapidly generate large data volumes, often in resource-limited tropical settings lacking specialized software or informatics support.
Results: Data from the majority of mosquito studies conformed to a generic schema, with data collection forms recording the experimental design, sorting of collections, details of sample pooling or subdivision, and additional observations. Generically applicable forms with standardized attribute definitions enabled rigorous, consistent data and sample management with generic software and minimal expertise. The forms are now used in 20 experiments and 8 projects by 15 users at 3 research and control institutes in 3 African countries, resulting in 11 peer-reviewed publications.
Conclusion: We have designed a generic data schema that can be used to develop paper-based or electronic data collection forms, depending on the availability of resources. We have developed paper-based forms that can collect data from the majority of entomological studies across multiple study areas using standardized data formats. Data recorded on these forms can be entered into, and linked within, any relational database software. These informatics tools are recommended because they save medical entomologists time, improve data quality, and ensure that data collected and shared across multiple studies are in standardized formats, thereby increasing research outputs.
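As a sketch of how such a generic schema might be realized in relational database software, the snippet below builds a small SQLite schema in which experiments contain collections and collections are sorted into samples. The table and column names are invented for illustration and are not the paper's actual schema.

```python
# Hypothetical relational rendering of a generic entomological schema:
# experiments contain collections; collections are sorted into samples.
# Table and column names are invented, not the paper's schema.

import sqlite3

def build_schema(conn):
    conn.executescript("""
        CREATE TABLE experiment (
            experiment_id INTEGER PRIMARY KEY,
            design        TEXT NOT NULL          -- experimental design notes
        );
        CREATE TABLE collection (
            collection_id INTEGER PRIMARY KEY,
            experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
            collected_on  TEXT,                  -- ISO date of collection
            trap_type     TEXT
        );
        CREATE TABLE sample (
            sample_id      INTEGER PRIMARY KEY,
            collection_id  INTEGER NOT NULL REFERENCES collection(collection_id),
            species        TEXT,
            num_mosquitoes INTEGER CHECK (num_mosquitoes >= 0)
        );
    """)
```

Because the forms share standardized attribute definitions, rows from any study can be loaded into the same few tables and linked through the foreign keys, which is what makes cross-study sharing straightforward.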
A generic schema and data collection forms applicable to diverse entomological studies of mosquitoes. Source Code for Biology and Medicine, article 4.
Pub Date: 2016-03-09. DOI: 10.1186/s13029-016-0049-7
Refat Sharmin, A. B. Islam
Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis. Source Code for Biology and Medicine.
Pub Date: 2016-02-16. DOI: 10.1186/s13029-016-0048-8
Vasanth R. Singan, J. Simpson
Implementation of the Rank-Weighted Co-localization (RWC) algorithm in multiple image analysis platforms for quantitative analysis of microscopy images. Source Code for Biology and Medicine.