{"title":"plot2groups: an R package to plot scatter points for two groups of values","authors":"Yong Xu, Fuquan Zhang, Guoqiang Wang, Hongbao Cao, Y. Shugart, Zaohuo Cheng","doi":"10.1186/1751-0473-9-23","DOIUrl":"https://doi.org/10.1186/1751-0473-9-23","url":null,"abstract":"","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2014-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-23","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65725332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PFClust: an optimised implementation of a parameter-free clustering algorithm
Khadija Musayeva, Tristan Henderson, John Bo Mitchell, Lazaros Mavridis
Background: A well-known problem in cluster analysis is finding an optimal number of clusters reflecting the inherent structure of the data. PFClust is a partitioning-based clustering algorithm capable, unlike many widely-used clustering algorithms, of automatically proposing an optimal number of clusters for the data.
Results: Tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been used successfully to cluster large macromolecular structures and small drug-like compounds. We have greatly improved the algorithm through a more efficient implementation, which enables PFClust to process large data sets acceptably fast.
Conclusions: In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.
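PFClust derives the number of clusters from the data itself. As a loose, hypothetical illustration of automatic cluster-number selection (not PFClust's actual criterion), the following Python sketch scores candidate values of k by mean silhouette width; note that the KMeans stand-in, unlike PFClust, cannot recover clusters of arbitrary shape:

```python
# Hypothetical stand-in: pick a cluster count automatically by maximising
# the mean silhouette width. This is NOT PFClust's criterion; it only
# illustrates the idea of proposing k without user input.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def propose_k(X, k_max=10):
    best_k, best_score = None, -1.0
    for k in range(2, min(k_max, len(X) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three well-separated synthetic blobs; we expect k = 3 to be proposed.
    X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
    print(propose_k(X))
```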
{"title":"PFClust: an optimised implementation of a parameter-free clustering algorithm.","authors":"Khadija Musayeva, Tristan Henderson, John Bo Mitchell, Lazaros Mavridis","doi":"10.1186/1751-0473-9-5","DOIUrl":"https://doi.org/10.1186/1751-0473-9-5","url":null,"abstract":"<p><strong>Background: </strong>A well-known problem in cluster analysis is finding an optimal number of clusters reflecting the inherent structure of the data. PFClust is a partitioning-based clustering algorithm capable, unlike many widely-used clustering algorithms, of automatically proposing an optimal number of clusters for the data.</p><p><strong>Results: </strong>The results of tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been successfully used to cluster large macromolecular structures and small druglike compounds. We have greatly improved the algorithm by a more efficient implementation, which enables PFClust to process large data sets acceptably fast.</p><p><strong>Conclusions: </strong>In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2014-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32084362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets
Bernard J Pope, Tú Nguyen-Dumont, Fleur Hammet, Daniel J Park
Background: We recently described Hi-Plex, a highly multiplexed PCR-based target-enrichment system for massively parallel sequencing (MPS), which allows the uniform definition of library size so that subsequent paired-end sequencing can achieve complete overlap of read pairs. Variant calling from Hi-Plex-derived datasets can thus rely on the identification of variants appearing in both reads of a read-pair, permitting stringent filtering of sequencing chemistry-induced errors. These principles underlie the ROVER software (derived from Read Overlap PCR-MPS variant caller), which we recently used to screen for genetic mutations in the breast cancer predisposition gene PALB2. Here, we describe the algorithms underlying ROVER and its usage.
Results: ROVER enables users to identify genetic variants quickly and accurately from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and its tailorable thresholds enable broad access for a range of PCR-MPS users.
Methods: ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). The software accepts a tab-delimited text file listing the coordinates of the target-specific primers used for targeted enrichment, based on a specified genome build. It also accepts aligned sequence files resulting from mapping to the same genome build. ROVER identifies the amplicon a given read-pair represents and removes the primer sequences using the mapping and primer coordinates. It considers overlapping read-pairs with respect to the primer-intervening sequence. Only when a variant is observed in both reads of a read-pair does the signal contribute to a tally of read-pairs containing or not containing the variant. A user-defined threshold sets the minimum number, and minimum proportion, of read-pairs in which a variant must be observed for a 'call' to be made. ROVER also reports the depth of coverage across amplicons to facilitate the identification of any regions that may require further screening.
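The both-reads requirement and the dual threshold can be captured in a few lines. The sketch below is a simplified, hypothetical rendering of that tallying logic (function and parameter names are ours, not ROVER's; the real tool additionally locates amplicons and trims primers using the mapping and primer coordinates):

```python
from collections import defaultdict

def call_variants(read_pairs, min_pairs=2, min_proportion=0.05):
    """Tally variants seen in BOTH reads of each pair, then apply a dual threshold.

    read_pairs: iterable of (variants_in_read1, variants_in_read2) sets for one
    amplicon, already trimmed of primer sequence. Thresholds are illustrative.
    """
    support = defaultdict(int)
    total_pairs = 0
    for variants_r1, variants_r2 in read_pairs:
        total_pairs += 1
        for variant in variants_r1 & variants_r2:  # must appear in both reads
            support[variant] += 1
    return sorted(
        v for v, n in support.items()
        if n >= min_pairs and n / total_pairs >= min_proportion
    )

# Example: "chr1:100A>T" is supported by both reads of two pairs and is called;
# "chr1:200G>C" appears in only one read of one pair (chemistry error) and is not.
pairs = [({"chr1:100A>T"}, {"chr1:100A>T"}),
         ({"chr1:100A>T"}, {"chr1:100A>T"}),
         ({"chr1:200G>C"}, set())]
print(call_variants(pairs))  # ['chr1:100A>T']
```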
Conclusions: ROVER can facilitate rapid and accurate genetic variant calling for a broad range of PCR-MPS users.
{"title":"ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets.","authors":"Bernard J Pope, Tú Nguyen-Dumont, Fleur Hammet, Daniel J Park","doi":"10.1186/1751-0473-9-3","DOIUrl":"https://doi.org/10.1186/1751-0473-9-3","url":null,"abstract":"<p><strong>Background: </strong>We recently described Hi-Plex, a highly multiplexed PCR-based target-enrichment system for massively parallel sequencing (MPS), which allows the uniform definition of library size so that subsequent paired-end sequencing can achieve complete overlap of read pairs. Variant calling from Hi-Plex-derived datasets can thus rely on the identification of variants appearing in both reads of read-pairs, permitting stringent filtering of sequencing chemistry-induced errors. These principles underly ROVER software (derived from Read Overlap PCR-MPS variant caller), which we have recently used to report the screening for genetic mutations in the breast cancer predisposition gene PALB2. Here, we describe the algorithms underlying ROVER and its usage.</p><p><strong>Results: </strong>ROVER enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and threshold tailorability enables broad access for a range of PCR-MPS users.</p><p><strong>Methods: </strong>ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). The software accepts a tab-delimited text file listing the coordinates of the target-specific primers used for targeted enrichment based on a specified genome-build. It also accepts aligned sequence files resulting from mapping to the same genome-build. ROVER identifies the amplicon a given read-pair represents and removes the primer sequences by using the mapping co-ordinates and primer co-ordinates. It considers overlapping read-pairs with respect to primer-intervening sequence. Only when a variant is observed in both reads of a read-pair does the signal contribute to a tally of read-pairs containing or not containing the variant. A user-defined threshold informs the minimum number of, and proportion of, read-pairs a variant must be observed in for a 'call' to be made. ROVER also reports the depth of coverage across amplicons to facilitate the identification of any regions that may require further screening.</p><p><strong>Conclusions: </strong>ROVER can facilitate rapid and accurate genetic variant calling for a broad range of PCR-MPS users.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2014-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32060349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MOSAL: software tools for multiobjective sequence alignment
Luís Paquete, Pedro Matias, Maryam Abbasi, Miguel Pinheiro
Multiobjective sequence alignment brings the advantage of providing a set of alignments that represent the trade-off between performing insertions/deletions and matching symbols from both sequences. Each of these alignments provides a potential explanation of the relationship between the sequences. We introduce MOSAL, a software tool that provides an open-source implementation and an on-line application for multiobjective pairwise sequence alignment.
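As a minimal illustration of the underlying idea (not MOSAL's own, far more efficient implementation), the following sketch computes the Pareto front of (matches, indels) trade-offs for two sequences with a dynamic program whose cells hold sets of non-dominated objective vectors:

```python
def pareto_front(pairs):
    """Keep the (matches, indels) pairs not dominated by another pair,
    i.e. no other pair has >= matches and <= indels with one strict."""
    front, best_indels = [], None
    for m, g in sorted(set(pairs), key=lambda t: (-t[0], t[1])):
        if best_indels is None or g < best_indels:
            front.append((m, g))
            best_indels = g
    return front

def pareto_align(a, b):
    """Naive multiobjective DP: cell (i, j) holds the non-dominated
    (matches, indels) vectors over all alignments of a[:i] and b[:j]."""
    table = [[[] for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    table[0][0] = [(0, 0)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            candidates = list(table[i][j])  # keeps (0, 0) at the origin
            if i > 0:                       # gap in b
                candidates += [(m, g + 1) for m, g in table[i - 1][j]]
            if j > 0:                       # gap in a
                candidates += [(m, g + 1) for m, g in table[i][j - 1]]
            if i > 0 and j > 0:             # aligned column
                inc = 1 if a[i - 1] == b[j - 1] else 0
                candidates += [(m + inc, g) for m, g in table[i - 1][j - 1]]
            table[i][j] = pareto_front(candidates)
    return table[len(a)][len(b)]

# Non-dominated (matches, indels) trade-offs, e.g. (0, 0), (1, 2), (2, 4).
print(pareto_align("AATT", "TTAA"))
```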
{"title":"MOSAL: software tools for multiobjective sequence alignment.","authors":"Luís Paquete, Pedro Matias, Maryam Abbasi, Miguel Pinheiro","doi":"10.1186/1751-0473-9-2","DOIUrl":"10.1186/1751-0473-9-2","url":null,"abstract":"<p><p>: Multiobjective sequence alignment brings the advantage of providing a set of alignments that represent the trade-off between performing insertion/deletions and matching symbols from both sequences. Each of these alignments provide a potential explanation of the relationship between the sequences. We introduce MOSAL, a software tool that provides an open-source implementation and an on-line application for multiobjective pairwise sequence alignment. </p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2014-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32009792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TrigNER: automatically optimized biomedical event trigger recognition on scientific documents
David Campos, Quoc-Chinh Bui, Sérgio Matos, José Luís Oliveira
Background: Cellular events play a central role in the understanding of biological processes and functions, providing insight into both physiological and pathogenic mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of the biomedical domain, allowing faster updating of existing knowledge. The identification of trigger words indicating an event is a crucial step in the event extraction pipeline, since the downstream tasks rely on its output. This step presents various complex and unsolved challenges, namely the selection of informative features, the representation of the textual context, and the selection of a specific event type for a trigger word given this context.
Results: We propose TrigNER, a machine learning-based solution for biomedical event trigger recognition, which takes advantage of Conditional Random Fields (CRFs) with a rich feature set, including linguistic, orthographic, morphological, local-context and dependency-parsing features. Additionally, a completely configurable algorithm is used to automatically optimize the feature set and training parameters for each event type. Thus, it automatically selects the features that make a positive contribution and automatically optimizes the CRF model order, n-gram sizes, vertex information and maximum hops for dependency-parsing features. The final output consists of various CRF models, each one optimized for the linguistic characteristics of one event type.
Conclusions: TrigNER was tested on the BioNLP 2009 shared task corpus, achieving a total F-measure of 62.7 and outperforming existing solutions on various event trigger types, namely gene expression, transcription, protein catabolism, phosphorylation and binding. The proposed solution allows researchers to easily apply complex and optimized techniques to the recognition of biomedical event triggers, making its application a simple routine task. We believe this work is an important contribution to the biomedical text mining community, contributing to improved and faster event recognition in scientific articles, and consequent hypothesis generation and knowledge discovery. This solution is freely available as open source at http://bioinformatics.ua.pt/trigner.
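To make the CRF setup concrete, here is a minimal, hypothetical sketch of trigger tagging with the sklearn-crfsuite package, using a hand-picked feature set. TrigNER optimizes its features and model order automatically and ships as its own tool; the sentence, labels and feature choices below are purely illustrative:

```python
import sklearn_crfsuite

def token_features(tokens, i):
    """Orthographic and local-context features for one token (illustrative subset)."""
    w = tokens[i]
    return {
        "lower": w.lower(),
        "prefix3": w[:3],
        "suffix3": w[-3:],
        "is_title": w.istitle(),
        "is_upper": w.isupper(),
        "has_digit": any(c.isdigit() for c in w),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# One toy training sentence with a gene-expression trigger ("expression").
sentences = [["Interleukin-2", "expression", "is", "induced", "in", "T", "cells"]]
labels = [["O", "B-Gene_expression", "O", "O", "O", "O", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))  # memorises the toy example; real training needs a corpus
```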
{"title":"TrigNER: automatically optimized biomedical event trigger recognition on scientific documents.","authors":"David Campos, Quoc-Chinh Bui, Sérgio Matos, José Luís Oliveira","doi":"10.1186/1751-0473-9-1","DOIUrl":"https://doi.org/10.1186/1751-0473-9-1","url":null,"abstract":"<p><strong>Background: </strong>Cellular events play a central role in the understanding of biological processes and functions, providing insight on both physiological and pathogenesis mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of the biomedical domain, allowing faster updating of existing knowledge. The identification of trigger words indicating an event is a very important step in the event extraction pipeline, since the following task(s) rely on its output. This step presents various complex and unsolved challenges, namely the selection of informative features, the representation of the textual context, and the selection of a specific event type for a trigger word given this context.</p><p><strong>Results: </strong>We propose TrigNER, a machine learning-based solution for biomedical event trigger recognition, which takes advantage of Conditional Random Fields (CRFs) with a high-end feature set, including linguistic-based, orthographic, morphological, local context and dependency parsing features. Additionally, a completely configurable algorithm is used to automatically optimize the feature set and training parameters for each event type. Thus, it automatically selects the features that have a positive contribution and automatically optimizes the CRF model order, n-grams sizes, vertex information and maximum hops for dependency parsing features. The final output consists of various CRF models, each one optimized to the linguistic characteristics of each event type.</p><p><strong>Conclusions: </strong>TrigNER was tested in the BioNLP 2009 shared task corpus, achieving a total F-measure of 62.7 and outperforming existing solutions on various event trigger types, namely gene expression, transcription, protein catabolism, phosphorylation and binding. The proposed solution allows researchers to easily apply complex and optimized techniques in the recognition of biomedical event triggers, making its application a simple routine task. We believe this work is an important contribution to the biomedical text mining community, contributing to improved and faster event recognition on scientific articles, and consequent hypothesis generation and knowledge discovery. This solution is freely available as open source at http://bioinformatics.ua.pt/trigner.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2014-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32010174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining de novo and reference-guided assembly with scaffold_builder
Genivaldo Gz Silva, Bas E Dutilh, T David Matthews, Keri Elkins, Robert Schmieder, Elizabeth A Dinsdale, Robert A Edwards
Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds - super-contigs of sequences joined by N-bases - based on similarity to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes, Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291, and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded.
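The core scaffolding step, ordering contigs by their mapping position on the reference and padding the gaps between them with N-bases, can be sketched in a few lines of Python. This is a simplified, hypothetical rendering, not Scaffold_builder's actual code (which must also resolve overlapping and ambiguous placements):

```python
def build_scaffold(placements, min_gap=1):
    """Join contigs into one scaffold, ordered by reference position.

    placements: (ref_start, ref_end, contig_sequence) tuples, e.g. from
    aligning contigs against a closely related reference genome.
    Assumes non-overlapping placements; all names are illustrative.
    """
    scaffold = []
    prev_end = None
    for ref_start, ref_end, seq in sorted(placements):
        if prev_end is not None:
            # Pad with N's sized by the gap on the reference (at least min_gap).
            scaffold.append("N" * max(ref_start - prev_end, min_gap))
        scaffold.append(seq)
        prev_end = ref_end
    return "".join(scaffold)

contigs = [(0, 4, "ACGT"), (10, 14, "TTGA"), (20, 23, "CCC")]
print(build_scaffold(contigs))  # ACGTNNNNNNTTGANNNNNNCCC
```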
{"title":"Combining de novo and reference-guided assembly with scaffold_builder.","authors":"Genivaldo Gz Silva, Bas E Dutilh, T David Matthews, Keri Elkins, Robert Schmieder, Elizabeth A Dinsdale, Robert A Edwards","doi":"10.1186/1751-0473-8-23","DOIUrl":"https://doi.org/10.1186/1751-0473-8-23","url":null,"abstract":"<p><p>Genome sequencing has become routine, however genome assembly still remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds - super contigs of sequences joined by N-bases - based on the similarity to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes from Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291 and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded. </p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"8 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2013-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-8-23","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31895068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[COMMODE] a large-scale database of molecular descriptors using compounds from PubChem
Andreas Dander, Laurin Aj Mueller, Ralf Gallasch, Stephan Pabinger, Frank Emmert-Streib, Armin Graber, Matthias Dehmer
Background: Molecular descriptors have been extensively used in the field of structure-oriented drug design and structural chemistry. They have been applied in QSPR and QSAR models to predict ADME-Tox properties, which specify essential features for drugs. Molecular descriptors capture chemical and structural information, but investigating their interpretation and meaning remains very challenging.
Results: This paper introduces COMMODE, a large-scale database of molecular descriptors containing more than 25 million compounds originating from PubChem. About 2500 DRAGON descriptors have been calculated for all compounds and integrated into the database, which is accessible through a web interface at http://commode.i-med.ac.at.
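COMMODE stores DRAGON descriptors, which are computed by commercial software. As a freely reproducible illustration of what a molecular descriptor is (not affiliated with COMMODE), the following sketch computes a few classic descriptors with the open-source RDKit; the SMILES string is aspirin:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Aspirin as a SMILES string; descriptors condense its structure into numbers.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
print("MolWt:", Descriptors.MolWt(mol))    # molecular weight
print("LogP :", Descriptors.MolLogP(mol))  # octanol-water partition estimate
print("TPSA :", Descriptors.TPSA(mol))     # topological polar surface area
```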
{"title":"[COMMODE] a large-scale database of molecular descriptors using compounds from PubChem.","authors":"Andreas Dander, Laurin Aj Mueller, Ralf Gallasch, Stephan Pabinger, Frank Emmert-Streib, Armin Graber, Matthias Dehmer","doi":"10.1186/1751-0473-8-22","DOIUrl":"https://doi.org/10.1186/1751-0473-8-22","url":null,"abstract":"<p><strong>Background: </strong>Molecular descriptors have been extensively used in the field of structure-oriented drug design and structural chemistry. They have been applied in QSPR and QSAR models to predict ADME-Tox properties, which specify essential features for drugs. Molecular descriptors capture chemical and structural information, but investigating their interpretation and meaning remains very challenging.</p><p><strong>Results: </strong>This paper introduces a large-scale database of molecular descriptors called COMMODE containing more than 25 million compounds originated from PubChem. About 2500 DRAGON-descriptors have been calculated for all compounds and integrated into this database, which is accessible through a web interface at http://commode.i-med.ac.at.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"8 1","pages":"22"},"PeriodicalIF":0.0,"publicationDate":"2013-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-8-22","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31860791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dispelling myths about rare disease registry system development
Matthew Bellgard, Christophe Beroud, Kay Parkinson, Tess Harris, Segolene Ayme, Gareth Baynam, Tarun Weeramanthri, Hugh Dawkins, Adam Hunter
Rare disease registries (RDRs) are an essential tool to improve knowledge and monitor interventions for rare diseases. If designed appropriately, the patient- and disease-related information captured within them can become the cornerstone for effective diagnosis and new therapies. Surprisingly, however, registries possess a diverse range of functionality, operate in different, often incompatible, software environments and serve various, and sometimes incongruous, purposes. Given the ambitious goals of the International Rare Diseases Research Consortium (IRDiRC) for 2020 and beyond, RDRs must be designed with the agility to evolve and interoperate efficiently in an ever-changing rare disease landscape, as well as to cater for rapid changes in information and communication technologies. In this paper, we contend that RDR requirements will also evolve in response to a number of factors, such as changing disease definitions and diagnostic criteria, the need to integrate patient/disease information arising from advances in biotechnology and/or phenotyping approaches, and the need to adapt dynamically to security and privacy concerns. We dispel a number of myths in RDR development, outline key criteria for robust and sustainable RDR implementation and introduce the concept of an RDR Checklist to guide future RDR development.
{"title":"Dispelling myths about rare disease registry system development.","authors":"Matthew Bellgard, Christophe Beroud, Kay Parkinson, Tess Harris, Segolene Ayme, Gareth Baynam, Tarun Weeramanthri, Hugh Dawkins, Adam Hunter","doi":"10.1186/1751-0473-8-21","DOIUrl":"https://doi.org/10.1186/1751-0473-8-21","url":null,"abstract":"<p><p>Rare disease registries (RDRs) are an essential tool to improve knowledge and monitor interventions for rare diseases. If designed appropriately, patient and disease related information captured within them can become the cornerstone for effective diagnosis and new therapies. Surprisingly however, registries possess a diverse range of functionality, operate in different, often-times incompatible, software environments and serve various, and sometimes incongruous, purposes. Given the ambitious goals of the International Rare Diseases Research Consortium (IRDiRC) by 2020 and beyond, RDRs must be designed with the agility to evolve and efficiently interoperate in an ever changing rare disease landscape, as well as to cater for rapid changes in Information Communication Technologies. In this paper, we contend that RDR requirements will also evolve in response to a number of factors such as changing disease definitions and diagnostic criteria, the requirement to integrate patient/disease information from advances in either biotechnology and/or phenotypying approaches, as well as the need to adapt dynamically to security and privacy concerns. We dispel a number of myths in RDR development, outline key criteria for robust and sustainable RDR implementation and introduce the concept of a RDR Checklist to guide future RDR development.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"8 1","pages":"21"},"PeriodicalIF":0.0,"publicationDate":"2013-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-8-21","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31811774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIA - A free and open source software for gray scale medical image analysis
Gert Wollny, Peter Kellman, María-Jesus Ledesma-Carbayo, Matthew M Skinner, Jean-Jaques Hublin, Thomas Hierl
Background: Gray scale images make up the bulk of data in bio-medical image analysis, and hence the main focus of many image processing tasks lies in the processing of these monochrome images. With ever-improving acquisition devices, spatial and temporal image resolution increases and data sets become very large.

Various image processing frameworks exist that make the development of new algorithms easy by using high-level programming languages or visual programming. These frameworks are also accessible to researchers who have little or no background in software development, because they take care of otherwise complex tasks. Specifically, the management of working memory is handled automatically, usually at the price of requiring more of it. As a result, processing large data sets with these tools becomes increasingly difficult on workstation-class computers.

One alternative to using these high-level processing tools is the development of new algorithms in a language like C++, which gives the developer full control over how memory is handled, but the resulting workflow for prototyping new algorithms is rather time-intensive and not appropriate for a researcher with little or no knowledge of software development.

Another alternative is to use command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation through shell scripts. Although not as convenient as, e.g., visual programming, this approach is still accessible to researchers without a background in computer science. However, only few tools exist that provide this kind of processing interface; they are usually quite task-specific and do not provide a clear path when one wants to shape a new command line tool from a prototype shell script.
Results: The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that make it possible to run image processing tasks interactively in a command shell and to prototype using the according shell scripting language. Since the hard disk serves as temporary storage, memory management is usually a non-issue in the prototyping phase. By using string-based descriptions for filters, optimizers, and the like, the transition from shell scripts to full-fledged programs implemented in C++ is also made easy. In addition, its design based on atomic plug-ins and single-task command line tools makes it easy to extend MIA, usually without the need to touch or recompile existing code.
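The string-based descriptions can be pictured as a tiny plug-in registry: a description such as "gauss:w=2" names a plug-in and its parameters, and a factory turns it into a configured object. The sketch below is a conceptual Python illustration of that design only; MIA itself is written in C++, and its actual description syntax and plug-in names may differ:

```python
def parse_description(desc):
    """Split 'name:key=value,key=value' into a plug-in name and its parameters."""
    name, _, rest = desc.partition(":")
    params = dict(kv.split("=", 1) for kv in rest.split(",")) if rest else {}
    return name, params

# Hypothetical filter factories keyed by plug-in name (not MIA's real registry).
REGISTRY = {
    "gauss":  lambda w="1": ("gaussian blur, width", int(w)),
    "median": lambda w="1": ("median filter, width", int(w)),
}

def create_filter(desc):
    name, params = parse_description(desc)
    return REGISTRY[name](**params)

# A shell-script-like pipeline becomes a list of configured filter objects.
pipeline = [create_filter(d) for d in ("gauss:w=2", "median:w=3")]
print(pipeline)  # [('gaussian blur, width', 2), ('median filter, width', 3)]
```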
Conclusion: In this article, we describe the general design of MIA, a general-purpose framework for gray scale image processing. We demonstrated the applicability of the software with example applications from three different research scenarios, namely motion compensation in myocardial perfusion imaging, the processing of high-resolution image data that arises in virtual anthropology, and retrospe…
{"title":"MIA - A free and open source software for gray scale medical image analysis.","authors":"Gert Wollny, Peter Kellman, María-Jesus Ledesma-Carbayo, Matthew M Skinner, Jean-Jaques Hublin, Thomas Hierl","doi":"10.1186/1751-0473-8-20","DOIUrl":"10.1186/1751-0473-8-20","url":null,"abstract":"<p><strong>Background: </strong>Gray scale images make the bulk of data in bio-medical image analysis, and hence, the main focus of many image processing tasks lies in the processing of these monochrome images. With ever improving acquisition devices, spatial and temporal image resolution increases, and data sets become very large.Various image processing frameworks exists that make the development of new algorithms easy by using high level programming languages or visual programming. These frameworks are also accessable to researchers that have no background or little in software development because they take care of otherwise complex tasks. Specifically, the management of working memory is taken care of automatically, usually at the price of requiring more it. As a result, processing large data sets with these tools becomes increasingly difficult on work station class computers.One alternative to using these high level processing tools is the development of new algorithms in a languages like C++, that gives the developer full control over how memory is handled, but the resulting workflow for the prototyping of new algorithms is rather time intensive, and also not appropriate for a researcher with little or no knowledge in software development.Another alternative is in using command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation by using shell scripts. Although not as convenient as, e.g. visual programming, this approach is still accessable to researchers without a background in computer science. However, only few tools exist that provide this kind of processing interface, they are usually quite task specific, and don't provide an clear approach when one wants to shape a new command line tool from a prototype shell script.</p><p><strong>Results: </strong>The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that make it possible to run image processing tasks interactively in a command shell and to prototype by using the according shell scripting language. Since the hard disk becomes the temporal storage memory management is usually a non-issue in the prototyping phase. By using string-based descriptions for filters, optimizers, and the likes, the transition from shell scripts to full fledged programs implemented in C++ is also made easy. In addition, its design based on atomic plug-ins and single tasks command line tools makes it easy to extend MIA, usually without the requirement to touch or recompile existing code.</p><p><strong>Conclusion: </strong>In this article, we describe the general design of MIA, a general purpouse framework for gray scale image processing. 
We demonstrated the applicability of the software with example applications from three different research scenarios, namely motion compensation in myocardial perfusion imaging, the processing of high resolution image data that arises in virtual anthropology, and retrospe","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"8 1","pages":"20"},"PeriodicalIF":0.0,"publicationDate":"2013-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-8-20","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31800990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}