GENOMA: a multilevel platform for marine biology
Pub Date: 2018-11-14 | DOI: 10.7287/peerj.preprints.27347v1
C. Colantuono, Marco Miralto, Mara Sangiovanni, Luca Ambrosino, M. Chiusano
Next-generation sequencing (NGS) technologies are greatly facilitating whole-genome sequencing, leading to the production of multiple gene annotations, often released both by reference resources (such as NCBI or Ensembl) and by specific consortia. These annotations are generally heterogeneous and not cross-linked, providing ambiguous knowledge to users. To give a quick view of what is available, and to centralize the genomic information of reference marine species, we set up GENOMA (GENOmes for MArine biology). GENOMA is a multilevel platform that includes the available genome assemblies and gene annotations for 12 species (including Acanthaster planci, Branchiostoma floridae, Ciona robusta, Ciona savignyi, Gasterosteus aculeatus, Octopus bimaculoides, Patiria miniata, Phaeodactylum tricornutum, Ptychodera flava and Saccoglossus kowalevskii). Each species has a dedicated JBrowse instance and web page, which summarizes the comparison between the different genome versions and gene annotations available and allows all the information to be downloaded directly. Moreover, an interactive table containing the union of the different gene annotations can be consulted online. Finally, a query system that allows users to search for specific features in one or more annotations simultaneously is embedded in the platform. GENOMA is publicly available at http://bioinfo.szn.it/genoma/.
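As a rough illustration of the cross-annotation query the abstract describes, the following Python sketch scans several GFF3 annotations of the same genome and returns the union of matching features, tagged by source. It is not GENOMA's actual code; the file names and the attribute-matching heuristic are assumptions.

```python
# Hypothetical sketch: search a feature type across multiple GFF3 annotations
# and return the union of hits, each tagged with its source annotation.
from pathlib import Path

def query_annotations(gff_paths, feature_type="gene", name_contains=None):
    """Return records of `feature_type` from one or more GFF3 files."""
    hits = []
    for path in gff_paths:
        source = Path(path).stem  # e.g. "ciona_robusta_ncbi"
        with open(path) as fh:
            for line in fh:
                if line.startswith("#"):
                    continue
                cols = line.rstrip("\n").split("\t")
                if len(cols) != 9 or cols[2] != feature_type:
                    continue
                attrs = cols[8]  # GFF3 attribute column, e.g. "ID=...;Name=..."
                if name_contains and name_contains not in attrs:
                    continue
                hits.append({"annotation": source, "seqid": cols[0],
                             "start": int(cols[3]), "end": int(cols[4]),
                             "attributes": attrs})
    return hits

# Example: query two annotations of the same genome simultaneously.
# results = query_annotations(["ncbi.gff3", "ensembl.gff3"], name_contains="Pax6")
```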
{"title":"GENOMA: a multilevel platform for marine biology","authors":"C. Colantuono, Marco Miralto, Mara Sangiovanni, Luca Ambrosino, M. Chiusano","doi":"10.7287/peerj.preprints.27347v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27347v1","url":null,"abstract":"Next-generation sequencing (NGS) technologies are greatly facilitating the sequencing of whole genomes leading to the production of different gene annotations, released often from both reference resources (such as NCBI or Ensembl) and specific consortia. All these annotations are in general very heterogeneous and not cross-linked, providing ambiguous knowledge to the users. In order to give a quick view of what is available, and trying to centralize all the genomic information of reference marine species, we set up GENOMA (GENOmes for MArine biology). GENOMA is a multilevel platform that includes all the available genome assemblies and gene annotations about 12 species (Acanthaster planci, Branchiostoma floridae, Ciona robusta, Ciona savignyi, Gasterosteus aculeatus, Octopus bimaculoides, Patiria miniata, Phaeodactylum tricornutum, Ptychodera flava and Saccoglossus kowalevskii). Each species has a dedicated JBroswe and web page, where is summarized the comparison between the different genome versions and gene annotations available, together with the possibility to directly download all the information. Moreover, an interactive table including the union of different gene annotations is also consultable on-line. Finally, a query page system that allows to search specific features in one or more annotations simultaneously, is embedded in the platform. GENOMA is publicly available at http://bioinfo.szn.it/genoma/.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"112 1","pages":"e27347"},"PeriodicalIF":0.0,"publicationDate":"2018-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85341730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data sharing and interoperability from multi-source long term observations: challenges and opportunities in marine biology
Pub Date: 2018-11-13 | DOI: 10.7287/peerj.preprints.27344v1
Mara Sangiovanni, R. Piredda, Marco Miralto, M. Tangherlini, M. Chiusano
Long-term observatories are widely used in marine sciences to monitor marine ecosystems and investigate their evolution. Recently, data from innovative technologies as well as 'omics-based approaches have been collected alongside physical, biogeochemical and taxonomic information. Their integration represents a challenging opportunity, calling for suitable computational approaches for data retrieval, storage, interoperability, reusability and sharing. Several initiatives are addressing these issues, suggesting the most appropriate and sensible strategies and protocols. Ensuring interoperability among different sources and providing seamless data access is essential when designing tools to store and share the collected information. Here we present our effort in the development of web-accessible resources for Long-Term Ecosystem Research (LTER), taking into account available protocols and adopting appropriate software solutions for: i) collecting and integrating real-time environmental and biological observations with -omics data; ii) exploiting internationally established data formats and protocols to expose the collected data through RESTful APIs; iii) accessing the collections through an interactive, web-accessible resource that permits aggregated views. The aim of this effort is to reinforce the leadership of the Stazione Zoologica "Anton Dohrn" as a Mediterranean Sea marine observatory and to be ready for the challenges of the next era in marine biology.
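A minimal sketch of point (ii), exposing observations through a RESTful API, is shown below using Flask. The endpoint path, query parameters and in-memory records are illustrative assumptions, not the observatory's actual service.

```python
# Minimal RESTful sketch: serve observation records filtered by station and
# variable via query parameters. Data rows here are invented placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the integrated store of environmental/biological observations.
OBSERVATIONS = [
    {"station": "MC", "date": "2018-06-01", "variable": "temperature", "value": 22.4},
    {"station": "MC", "date": "2018-06-01", "variable": "chlorophyll_a", "value": 0.31},
]

@app.route("/api/observations")
def observations():
    """Return observations, optionally filtered by ?station= and ?variable=."""
    station = request.args.get("station")
    variable = request.args.get("variable")
    rows = [o for o in OBSERVATIONS
            if (station is None or o["station"] == station)
            and (variable is None or o["variable"] == variable)]
    return jsonify(rows)

if __name__ == "__main__":
    app.run()  # e.g. GET /api/observations?station=MC&variable=temperature
```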
{"title":"Data sharing and interoperability from multi-source long term observations: challenges and opportunities in marine biology","authors":"Mara Sangiovanni, R. Piredda, Marco Miralto, M. Tangherlini, M. Chiusano","doi":"10.7287/peerj.preprints.27344v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27344v1","url":null,"abstract":"Long-term observatories are widely used in marine sciences to monitor marine ecosystems and investigate their evolution. Recently, data from innovative technologies as well as ‘omics-based' approaches is being collected alongside physical, biogeochemical and taxonomic information. Their integration represents a challenging opportunity, pushing for suitable computational approaches to for data retrieval, storage, interoperability, reusability and sharing. Several initiatives are addressing these issues, suggesting the most appropriate and sensitive strategies and protocols. Ensuring interoperability among different sources and providing seamless data access is essential when designing tools to store and share the collected information.Here we present our effort in the development of web-accessible resources for Long-Term Ecosystem Research (LTER), taking into account available protocols and approaching appropriate software solutions for: i) collecting and integrating real-time environmental and biological observations with -omics data; ii) exploiting international established data formats and protocols to expose through RESTful APIs the collected data; iii) accessing the collections through an interactive, web-accessible resource to permit aggregated views.The aim of this effort is to reinforce the leadership of the Stazione Zoologica “Anton Dohrn” as a Mediterranean Sea marine observatory, and to be ready for the next era challenges in marine biology.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"1 1","pages":"e27344"},"PeriodicalIF":0.0,"publicationDate":"2018-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90880958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A false negative study of the steganalysis tool: Stegdetect
Pub Date: 2018-11-12 | DOI: 10.7287/peerj.preprints.27339v1
B. Aziz, Jeyong Jung
Steganography and steganalysis have in recent years become an important area of research with many applications. Steganography is the process of hiding secret data in digital media without any significant, noticeable changes to the cover object, while steganalysis is the process of detecting hidden content in a cover object. In this study, we evaluated one of the modern automated steganalysis tools, Stegdetect, to study its false negative rates when analysing a bulk of images. To do so, we used the JPHide method to embed randomly generated messages into 2000 JPEG images. The aim of this study is to help digital forensics analysts during their investigations by giving them an idea of the false negative rates of Stegdetect. This study found that (1) the false negative rates depended largely on the tool's sensitivity values, (2) the tool had a high false negative rate for sensitivity values between 0.1 and 3.4, and (3) the best sensitivity value for detecting the JPHide method was 6.2. It is recommended that, when analysing a large bulk of images, forensic analysts take sensitivity values into consideration to reduce the false negative rates of Stegdetect.
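The experimental loop can be sketched as a thin Python driver around the stegdetect CLI. This is a hedged reconstruction, not the authors' code: the `-s` (sensitivity) and `-t` (test selection) flags follow stegdetect's documented usage, but the directory layout and the output-parsing heuristic (treating lines containing "negative" as misses) are assumptions.

```python
# Sketch: estimate the false negative rate of stegdetect at a given
# sensitivity over a directory of JPHide-embedded JPEGs.
import subprocess
from pathlib import Path

def false_negative_rate(image_dir, sensitivity):
    """All images in image_dir are assumed to contain embedded messages."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    misses = 0
    for img in images:
        # -s sets detection sensitivity; -tp restricts tests to jphide.
        out = subprocess.run(
            ["stegdetect", "-s", str(sensitivity), "-tp", str(img)],
            capture_output=True, text=True).stdout
        if "negative" in out:  # tool reported no hidden content: a miss
            misses += 1
    return misses / len(images)

# for s in (0.1, 1.0, 3.4, 6.2):
#     print(s, false_negative_rate("embedded_jpegs/", s))
```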
{"title":"A false negative study of the steganalysis tool: Stegdetect","authors":"B. Aziz, Jeyong Jung","doi":"10.7287/peerj.preprints.27339v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27339v1","url":null,"abstract":"Steganography and Steganalysis in recent years have become an important area of research involving dierent applications. Steganography is the process of hiding secret data into any digital media without any signicant notable changes in a cover object, while steganalysis is the process of detecting hiding content in the cover object. In this study, we evaluated one of the modern automated steganalysis tools, Stegdetect, to study its false negative rates when analysing a bulk of images. In so doing, we used JPHide method to embed a randomly generated messages into 2000 JPEG images. The aim of this study is to help digital forensics analysts during their investigations by means of providing an idea of the false negative rates of Stegdetect. This study found that (1) the false negative rates depended largely on the tool's sensitivity values, (2) the tool had a high false negative rate between the sensitivity values from 0.1 to 3.4 and (3) the best sensitivity value for detection of JPHide method was 6.2. It is recommended that when analysing a huge bulk of images forensic analysts need to take into consideration sensitivity values to reduce the false negative rates of Stegdetect.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"9 1","pages":"e27339"},"PeriodicalIF":0.0,"publicationDate":"2018-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75090839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eclipse CDT code analysis and unit testing
Pub Date: 2018-11-10 | DOI: 10.7287/peerj.preprints.27350v1
Shaun C. D'Souza
In this paper we look at the Eclipse IDE and its support for the CDT (C/C++ Development Tools). Eclipse is an open-source IDE that supports a variety of programming languages through plugin functionality. Eclipse supports the standard GNU environment for compiling, building and debugging applications. The CDT is a plugin which enables development of C/C++ applications in Eclipse, providing functionality including code browsing, syntax highlighting and code completion. We verify a 50X improvement in LOC automation for Fake class .cpp/.h and class test .cpp code generation.
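To make the measured task concrete, the following Python sketch generates the kind of boilerplate the abstract counts: a Fake class header/source pair and a test stub for a given class name. The templates are illustrative assumptions; the paper's actual generator is not shown here.

```python
# Sketch of fake-class and test-stub generation for a class named `name`.
FAKE_H = """#ifndef FAKE_{u}_H
#define FAKE_{u}_H
#include "{name}.h"

class Fake{name} : public {name} {{
public:
    Fake{name}();
    ~Fake{name}();
}};
#endif
"""

FAKE_CPP = """#include "Fake{name}.h"

Fake{name}::Fake{name}() {{}}
Fake{name}::~Fake{name}() {{}}
"""

TEST_CPP = """#include "Fake{name}.h"
#include <cassert>

int main() {{
    Fake{name} fake;   // construct the fake in place of the real {name}
    assert(true);      // real assertions would exercise {name}'s interface
    return 0;
}}
"""

def generate(name):
    """Write Fake<name>.h, Fake<name>.cpp and <name>Test.cpp to disk."""
    files = {f"Fake{name}.h": FAKE_H, f"Fake{name}.cpp": FAKE_CPP,
             f"{name}Test.cpp": TEST_CPP}
    for fname, template in files.items():
        with open(fname, "w") as fh:
            fh.write(template.format(name=name, u=name.upper()))

# generate("Sensor")  # writes FakeSensor.h, FakeSensor.cpp, SensorTest.cpp
```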
{"title":"Eclipse CDT code analysis and unit testing","authors":"Shaun C. D'Souza","doi":"10.7287/peerj.preprints.27350v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27350v1","url":null,"abstract":"In this paper we look at the Eclipse IDE and its support for CDT (C/C++ Development Tools). Eclipse is an open source IDE and supports a variety of programming languages including plugin functionality. Eclipse supports the standard GNU environment for compiling, building and debugging applications. The CDT is a plugin which enables development of C/C++ applications in eclipse. It enables functionality including code browsing, syntax highlighting and code completion. We verify a 50X improvement in LOC automation for Fake class .cpp / .h and class test .cpp code generation.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"56 1","pages":"e27350"},"PeriodicalIF":0.0,"publicationDate":"2018-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76613942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whole yeast model: what and why
Pub Date: 2018-11-07 | DOI: 10.7287/peerj.preprints.27327v1
P. Palumbo, M. Vanoni, F. Papa, S. Busti, L. Alberghina
One of the most challenging fields in life science research is to deeply understand how complex cellular functions arise from the interactions of molecules in living cells. Mathematical and computational methods in systems biology are fundamental for studying the complex molecular interactions within biological systems and for accelerating discoveries. Within this framework, there is a need to integrate different mathematical tools in order to develop quantitative models of entire organisms, i.e. whole-cell models. This note presents a first attempt to show the feasibility of such a task for the budding yeast Saccharomyces cerevisiae, a model organism for eukaryotic cells: the proposed model describes the main cellular activities, such as metabolism, growth and the cell cycle, in a modular fashion, allowing them to be treated separately as single input/output modules as well as interconnected to build the backbone of a coarse-grain whole-cell model. The model's modularity allows a low-granularity module to be substituted with a finer-grained one whenever molecular details are required to correctly reproduce specific experiments. Furthermore, by properly setting the cellular division, simulations of cell populations are achieved that can deal with protein distributions. Whole-cell modeling will help in understanding the logic of cell resilience.
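The modular input/output architecture can be illustrated with a toy Python sketch: each cellular activity is a black box with inputs and outputs, and the whole-cell backbone wires them together. The rate constants and division threshold are invented for illustration and have no relation to the authors' calibrated model.

```python
# Toy backbone: three coarse-grain modules wired input-to-output.
# Any module can later be swapped for a finer-grained (e.g. ODE-based) one.

def metabolism(nutrients):
    """Module 1: nutrients in, biosynthetic precursors out."""
    return 0.8 * nutrients

def growth(precursors, mass):
    """Module 2: mass accrues from precursors; returns the new cell mass."""
    return mass + 0.1 * precursors

def cycle(mass, threshold=2.0):
    """Module 3: size-gated division; above threshold the cell divides."""
    return (mass / 2.0, True) if mass >= threshold else (mass, False)

mass, nutrients = 1.0, 1.0
for step in range(30):
    precursors = metabolism(nutrients)   # module 1 output feeds module 2
    mass = growth(precursors, mass)      # module 2 output feeds module 3
    mass, divided = cycle(mass)
    if divided:
        print(f"division at step {step}, daughter mass {mass:.2f}")
```

Tracking each daughter cell separately after division is what turns this single-cell loop into the population simulation the abstract mentions.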
{"title":"Whole yeast model: what and why","authors":"P. Palumbo, M. Vanoni, F. Papa, S. Busti, L. Alberghina","doi":"10.7287/peerj.preprints.27327v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27327v1","url":null,"abstract":"One of the most challenging fields in Life Science research is to deeply understand how complex cellular functions arise from the interactions of molecules in living cells. Mathematical and computational methods in Systems Biology are fundamental to study the complex molecular interactions within biological systems and to accelerate discoveries. Within this framework, a need exists to integrate different mathematical tools in order to develop quantitative models of entire organisms, i.e. whole-cell models. This note presents a first attempt to show the feasibility of such a task for the budding yeast Saccharomyces cerevisiae, a model organism for eukaryotic cells: the proposed model refers to the main cellular activities like metabolism, growth and cycle in a modular fashion, therefore allowing to treat them separately as single input/output modules, as well as to interconnect them in order to build the backbone of a coarse-grain whole cell model. The model modularity allows to substitute a low granularity module with one with a finer grain, whenever molecular details are required to correctly reproduce specific experiments. Furthermore, by properly setting the cellular division, simulations of cell populations are achieved, able to deal with protein distributions. Whole cell modeling will help understanding logic of cell resilience.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"59 1","pages":"e27327"},"PeriodicalIF":0.0,"publicationDate":"2018-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84261659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive RNA-Seq pipeline includes meta-analysis, interactivity and automatic reporting
Pub Date: 2018-11-05 | DOI: 10.7287/peerj.preprints.27317v2
G. Spinozzi, V. Tini, Laura Mincarelli, B. Falini, M. Martelli
There are many methods available for each phase of an RNA-Seq analysis, and each of them uses different algorithms. It is therefore useful to identify a pipeline that combines the best tools in terms of time and results. For this purpose, we compared five different pipelines, obtained by combining the most used tools in RNA-Seq analysis. Using RNA-Seq data on samples of different Acute Myeloid Leukemia (AML) cell lines, we compared the five pipelines from alignment through to differential expression analysis (DEA). For each one we evaluated peak RAM usage and run time, and then compared the differentially expressed genes identified by each pipeline. It emerged that the pipeline with the shortest times, lowest RAM consumption and most reliable results is the one that uses HISAT2 for alignment, featureCounts for quantification and edgeR for differential analysis. Finally, we developed an automated pipeline that defaults to these tools but also allows choosing between different ones. In addition, the pipeline performs a final meta-analysis that includes a Gene Ontology and pathway analysis. The results can be viewed in an interactive Shiny App and exported in a report (pdf, word or html formats).
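The backbone of the winning combination can be sketched as a thin Python driver around the real command-line tools. The index, file and annotation names are placeholders, and the edgeR step (which runs in R) is only stubbed via an assumed helper script.

```python
# Sketch: HISAT2 alignment -> featureCounts quantification -> edgeR DEA.
import subprocess

def run(cmd):
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

sample = "AML_sample1"
# 1) Alignment with HISAT2 (paired-end reads against a prebuilt index).
run(["hisat2", "-x", "grch38_index",
     "-1", f"{sample}_R1.fastq.gz", "-2", f"{sample}_R2.fastq.gz",
     "-S", f"{sample}.sam"])
# 2) Quantification with featureCounts against a GTF gene annotation.
run(["featureCounts", "-a", "genes.gtf", "-o", "counts.txt", f"{sample}.sam"])
# 3) Differential expression with edgeR, delegated to an R script
#    (edgeR_dea.R is an assumed wrapper, not part of this sketch).
run(["Rscript", "edgeR_dea.R", "counts.txt"])
```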
{"title":"A comprehensive RNA-Seq pipeline includes meta-analysis, interactivity and automatic reporting","authors":"G. Spinozzi, V. Tini, Laura Mincarelli, B. Falini, M. Martelli","doi":"10.7287/peerj.preprints.27317v2","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27317v2","url":null,"abstract":"There are many methods available for each phase of the RNA-Seq analysis and each of them uses different algorithms. It is therefore useful to identify a pipeline that combines the best tools in terms of time and results. For this purpose, we compared five different pipelines, obtained by combining the most used tools in RNA-Seq analysis. Using RNA-Seq data on samples of different Acute Myeloid Leukemia (AML) cell lines, we compared five pipelines from the alignment to the differential expression analysis (DEA). For each one we evaluated the peak of RAM and time and then compared the differentially expressed genes identified by each pipeline. It emerged that the pipeline with shorter times, lower consumption of RAM and more reliable results, is that which involves the use ofHISAT2for alignment, featureCountsfor quantification and edgeRfor differential analysis. Finally, we developed an automated pipeline that recurs by default to the cited pipeline, but it also allows to choose between different tools. In addition, the pipeline makes a final meta-analysis that includes a Gene Ontology and Pathway analysis. The results can be viewed in an interactive Shiny Appand exported in a report (pdf, word or html formats).","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"7 1","pages":"e27317"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79726102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identification of protein pockets and cavities by Euclidean Distance Transform
Pub Date: 2018-11-01 | DOI: 10.7287/peerj.preprints.27314v1
Sebastian Daberdaku
Protein pockets and cavities usually coincide with the active sites of biological processes, and their identification is significant since it constitutes an important step for structure-based drug design and protein-ligand docking applications. This research presents PoCavEDT, an automated, purely geometric technique for the identification of binding pockets and occluded cavities in proteins based on the 3D Euclidean Distance Transform. Candidate protein pocket regions are identified between two solvent-excluded surfaces generated with the Euclidean Distance Transform using different probe spheres, whose radii depend on the size of the binding ligand. The application of simple yet effective geometric heuristics ensures that the proposed method obtains very good ligand binding site prediction results. The method was applied to a representative set of protein-ligand complexes and their corresponding unbound protein structures to evaluate its ligand binding site prediction capabilities. Its performance was compared to the results achieved with several purely geometric pocket and cavity prediction methods, namely SURFNET, PASS, CAST, LIGSITE, LIGSITECS, PocketPicker and POCASA. Success rates of PoCavEDT were comparable to those of POCASA and outperformed the other software.
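The two-probe idea can be sketched on a voxel grid with SciPy's Euclidean Distance Transform: approximate the solvent-excluded volume for each probe radius by a distance-based morphological closing, then take the region covered by the large probe's surface but not the small probe's. This is a schematic of the principle under stated simplifications (voxelized masks, probe-center accessibility), not the PoCavEDT implementation.

```python
# Sketch: pocket candidates as the region between two solvent-excluded
# volumes computed with the Euclidean Distance Transform (EDT).
import numpy as np
from scipy.ndimage import distance_transform_edt

def ses_volume(protein_mask, r, voxel=1.0):
    """Approximate solvent-excluded volume for a probe of radius r."""
    d_protein = distance_transform_edt(~protein_mask, sampling=voxel)
    accessible = d_protein >= r                  # where the probe centre fits
    d_access = distance_transform_edt(~accessible, sampling=voxel)
    return d_access >= r                         # not swept by any probe ball

def candidate_pockets(protein_mask, r_small=1.4, r_large=5.0, voxel=1.0):
    small = ses_volume(protein_mask, r_small, voxel)
    large = ses_volume(protein_mask, r_large, voxel)
    return large & ~small                        # between the two surfaces

# Toy example: a solid cube of "protein" with a carved channel as a pocket.
grid = np.zeros((40, 40, 40), dtype=bool)
grid[10:30, 10:30, 10:30] = True
grid[18:22, 18:22, 25:30] = False                # pocket opening at one face
print(candidate_pockets(grid).sum(), "candidate pocket voxels")
```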
{"title":"Identification of protein pockets and cavities by Euclidean Distance Transform","authors":"Sebastian Daberdaku","doi":"10.7287/peerj.preprints.27314v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27314v1","url":null,"abstract":"Protein pockets and cavities usually coincide with the active sites of biological processes, and their identification is significant since it constitutes an important step for structure-based drug design and protein-ligand docking applications. This research presents PoCavEDT, an automated purely geometric technique for the identification of binding pockets and occluded cavities in proteins based on the 3D Euclidean Distance Transform. Candidate protein pocket regions are identified between two Solvent-Excluded surfaces generated with the Euclidean Distance Transform using different probe spheres, which depend on the size of the binding ligand. The application of simple, yet effective geometrical heuristics ensures that the proposed method obtains very good ligand binding site prediction results. The method was applied to a representative set of protein-ligand complexes and their corresponding unbound protein structures to evaluate its ligand binding site prediction capabilities. Its performance was compared to the results achieved with several purely geometric pocket and cavity prediction methods, namely SURFNET, PASS, CAST, LIGSITE, LIGSITECS, PocketPicker and POCASA. Success rates PoCavEDT were comparable to those of POCASA and outperformed the other software.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"13 1","pages":"e27314"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84801515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the correlation between testing effort and software complexity metrics
Pub Date: 2018-10-31 | DOI: 10.7287/peerj.preprints.27312v1
Adnan Muslija, Eduard Paul Enoiu
Software complexity metrics, such as code size and cyclomatic complexity, have been used in the software engineering community for predicting quality attributes such as maintainability, bug proneness and robustness. However, not many studies have addressed the relationship between complexity metrics and software testing, and there is little experimental evidence to support the use of these code metrics in the estimation of test effort. We have investigated and evaluated the relationship between test effort (i.e., number of test cases and test execution time) and software complexity metrics for industrial control software used in an embedded system. We show how to measure different software complexity metrics, such as number of elements, cyclomatic complexity, and information flow, for Function Block Diagram (FBD), a popular programming language in the safety-critical domain. In addition, we use test data and test suites created by experienced test engineers working at Bombardier Transportation Sweden AB to evaluate the correlation between several complexity measures and the testing effort. We found that there is a moderate correlation between software complexity metrics and test effort. In addition, the results show that software size (i.e., the number of elements in the FBD program) provides the highest correlation with the number of test cases created and the test execution time. Our results suggest that software size and structure metrics, while useful for identifying parts of the system that are more complicated, should not be solely used for identifying parts of the system for which test engineers might need to create more test cases. A potential explanation of this result concerns the nature of testing, since other attributes, such as the level of thorough testing required and the size of the specifications, can influence the creation of test cases. In addition, we used a linear regression model to estimate the test effort from the software complexity measurement results.
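The style of analysis described (rank correlation plus a linear model relating size to test effort) can be sketched in a few lines of Python. The numbers below are invented stand-ins, not the Bombardier data.

```python
# Sketch: correlate a complexity measure with test effort, then fit a
# simple linear regression to estimate test cases from program size.
import numpy as np
from scipy import stats

elements = np.array([12, 30, 45, 60, 88, 120, 150])   # FBD program sizes
test_cases = np.array([3, 5, 6, 9, 10, 15, 16])       # test cases created

# Rank correlation is robust to the non-normal, ordinal nature of effort data.
rho, p = stats.spearmanr(elements, test_cases)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Linear model: estimated test cases as a function of program size.
slope, intercept, r, p_lin, stderr = stats.linregress(elements, test_cases)
print(f"test cases ~ {slope:.3f} * elements + {intercept:.2f}")
```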
{"title":"On the correlation between testing effort and software complexity metrics","authors":"Adnan Muslija, Eduard Paul Enoiu","doi":"10.7287/peerj.preprints.27312v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27312v1","url":null,"abstract":"Software complexity metrics, such as code size and cyclomatic complexity, have been used in the software engineering community for predicting quality metrics such as maintainability, bug proneness and robustness. However, not many studies have addressed the relationship between complexity metrics and software testing and there is little experimental evidence to support the use of these code metrics in the estimation of test effort. We have investigated and evaluated the relationship between test effort (i.e, number of test cases and test execution time) and software complexity metrics for industrial control software used in an embedded system. We show how to measure different software complexity metrics such as number of elements, cyclomatic complexity, and information flow for a popular programming language named FBD used in the safety critical domain. In addition, we use test data and test suites created by experienced test engineers working at Bombardier Transportation Sweden AB to evaluate the correlation between several complexity measures and the testing effort. We found that there is a moderate correlation between software complexity metrics and test effort. In addition, the results show that the software size (i.e., number of elements in the FBD program) provides the highest correlation level with the number of test cases created and test execution time. Our results suggest that software size and structure metrics, while useful for identifying parts of the system that are more complicated, should not be solely used for identifying parts of the system for which test engineers might need to create more test cases. A potential explanation of this result concerns the nature of testing, since other attributes such as the level of thorough testing required and the size of the specifications can influence the creation of test cases. In addition, we used a linear regression model to estimate the test effort using the software complexity measurement results.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"26 1","pages":"e27312"},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83040043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical challenges for biomedical modeling using HPC
Pub Date: 2018-10-26 | DOI: 10.7287/peerj.preprints.27299v1
D. Wright, R. Richardson, P. Coveney
The concept underlying precision medicine is that prevention, diagnosis and treatment of pathologies such as cancer can be improved through an understanding of the influence of individual patient characteristics. Predictive medicine seeks to derive this understanding through mechanistic models of the causes and (potential) progression of diseases within a given individual. This represents a grand challenge for computational biomedicine, as it requires the integration of highly varied (and potentially vast) quantitative experimental datasets into models of complex biological systems. It is becoming increasingly clear that this challenge can only be answered through the use of complex workflows that combine diverse analyses and whose design is informed by an understanding of how predictions must be accompanied by estimates of uncertainty. Each stage in such a workflow can, in general, have very different computational requirements. If funding bodies and the HPC community are serious about the desire to support such approaches, they must consider the need for portable, persistent and stable tools designed to promote extensive long-term development and testing of these workflows. From the perspective of model developers (and with even greater relevance to potential clinical or experimental collaborators), the enormous diversity of interfaces and supercomputer policies, frequently designed with monolithic applications in mind, can represent a serious barrier to innovation. Here we use experiences from work on two very different biomedical modeling scenarios - brain blood flow and small molecule drug selection - to highlight issues with the current programming and execution environments and suggest potential solutions.
{"title":"Practical challenges for biomedical modeling using HPC","authors":"D. Wright, R. Richardson, P. Coveney","doi":"10.7287/peerj.preprints.27299v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27299v1","url":null,"abstract":"The concept underlying precision medicine is that prevention, diagnosis and treatment of pathologies such as cancer can be improved through an understanding of the influence of individual patient characteristics. Predictive medicine seeks to derive this understanding through mechanistic models of the causes and (potential) progression of diseases within a given individual. This represents a grand challenge for computational biomedicine as it requires the integration of highly varied (and potentially vast) quantitative experimental datasets into models of complex biological systems. It is becoming increasingly clear that this challenge can only be answered through the use of complex workflows that combine diverse analyses and whose design is informed by an understanding of how predictions must be accompanied by estimates of uncertainty. Each stage in such a workflow can, in general, have very different computational requirements. If funding bodies and the HPC community are serious about the desire to support such approaches, they must consider the need for portable, persistent and stable tools designed to promote extensive long term development and testing of these workflows. From the perspective of model developers (and with even greater relevance to potential clinical or experimental collaborators) the enormous diversity of interfaces and supercomputer policies, frequently designed with monolithic applications in mind, can represent a serious barrier to innovation. Here we use experiences from work on two very different biomedical modeling scenarios - brain bloodflow and small molecule drug selection - to highlight issues with the current programming and execution environments and suggest potential solutions.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"23 1","pages":"e27299"},"PeriodicalIF":0.0,"publicationDate":"2018-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85007659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hear and See: End-to-end sound classification and visualization of classified sounds
Pub Date: 2018-10-15 | DOI: 10.7287/peerj.preprints.27280v1
Thomas Miano
Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through a wide variety of methods, including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate via visualization those sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can be successful in mapping an image classification task to a sound classification task. Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.
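The transfer-learning recipe the abstract describes (an image CNN repurposed for sound) is commonly implemented by rendering audio as a spectrogram image. The following Python sketch shows that pattern under stated assumptions: the file name and class count are placeholders, the paper's exact architecture is not specified here, and the `weights=` argument requires a recent torchvision.

```python
# Sketch: audio -> log-mel spectrogram -> pretrained image CNN with a new head.
import librosa
import numpy as np
import torch
import torchvision

# 1) Load audio and render a log-mel spectrogram, scaled to [0, 1].
y, sr = librosa.load("field_recording.wav", sr=22050)   # placeholder file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=224)
spec = librosa.power_to_db(mel)
spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-8)
if spec.shape[1] < 224:                                  # pad short clips
    spec = np.pad(spec, ((0, 0), (0, 224 - spec.shape[1])))
img = np.stack([spec[:, :224]] * 3)                      # fake RGB (3,224,224)

# 2) ImageNet-pretrained CNN (out-of-domain data) with a new classifier head.
N_CLASSES = 10                                           # assumed class count
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, N_CLASSES)

# 3) Forward pass; in practice the head (at least) would be fine-tuned.
logits = model(torch.tensor(img, dtype=torch.float32).unsqueeze(0))
print(logits.argmax(dim=1))                              # predicted class id
```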
{"title":"Hear and See: End-to-end sound classification and visualization of classified sounds","authors":"Thomas Miano","doi":"10.7287/peerj.preprints.27280v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27280v1","url":null,"abstract":"Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is a relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through through a wide variety of methods including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate via visualization those sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can be successful in mapping an image classification task to a sound classification task. Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"44 1","pages":"e27280"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87007210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}