GENOMA: a multilevel platform for marine biology
Pub Date: 2018-11-14 | DOI: 10.7287/peerj.preprints.27347v1
C. Colantuono, Marco Miralto, Mara Sangiovanni, Luca Ambrosino, M. Chiusano
Next-generation sequencing (NGS) technologies are greatly facilitating whole-genome sequencing, leading to the production of multiple gene annotations, often released both by reference resources (such as NCBI or Ensembl) and by specific consortia. These annotations are generally heterogeneous and not cross-linked, providing ambiguous knowledge to users. To give a quick view of what is available, and to centralize the genomic information of reference marine species, we set up GENOMA (GENOmes for MArine biology). GENOMA is a multilevel platform that includes the available genome assemblies and gene annotations for 12 species (including Acanthaster planci, Branchiostoma floridae, Ciona robusta, Ciona savignyi, Gasterosteus aculeatus, Octopus bimaculoides, Patiria miniata, Phaeodactylum tricornutum, Ptychodera flava and Saccoglossus kowalevskii). Each species has a dedicated JBrowse instance and web page, which summarizes the comparison between the different genome versions and gene annotations available and allows all the information to be downloaded directly. Moreover, an interactive table containing the union of the different gene annotations can be consulted online. Finally, a query system that allows users to search for specific features in one or more annotations simultaneously is embedded in the platform. GENOMA is publicly available at http://bioinfo.szn.it/genoma/.
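As a rough illustration of the cross-annotation query the abstract describes, the following Python sketch scans several GFF3 annotations of the same genome and returns the union of matching features, tagged by source. It is not GENOMA's actual code; the file names and the attribute-matching heuristic are assumptions.

```python
# Hypothetical sketch: search a feature type across multiple GFF3 annotations
# and return the union of hits, each tagged with its source annotation.
from pathlib import Path

def query_annotations(gff_paths, feature_type="gene", name_contains=None):
    """Return records of `feature_type` from one or more GFF3 files."""
    hits = []
    for path in gff_paths:
        source = Path(path).stem  # e.g. "ciona_robusta_ncbi"
        with open(path) as fh:
            for line in fh:
                if line.startswith("#"):
                    continue
                cols = line.rstrip("\n").split("\t")
                if len(cols) != 9 or cols[2] != feature_type:
                    continue
                attrs = cols[8]  # GFF3 attribute column, e.g. "ID=...;Name=..."
                if name_contains and name_contains not in attrs:
                    continue
                hits.append({"annotation": source, "seqid": cols[0],
                             "start": int(cols[3]), "end": int(cols[4]),
                             "attributes": attrs})
    return hits

# Example: query two annotations of the same genome simultaneously.
# results = query_annotations(["ncbi.gff3", "ensembl.gff3"], name_contains="Pax6")
```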
{"title":"GENOMA: a multilevel platform for marine biology","authors":"C. Colantuono, Marco Miralto, Mara Sangiovanni, Luca Ambrosino, M. Chiusano","doi":"10.7287/peerj.preprints.27347v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27347v1","url":null,"abstract":"Next-generation sequencing (NGS) technologies are greatly facilitating the sequencing of whole genomes leading to the production of different gene annotations, released often from both reference resources (such as NCBI or Ensembl) and specific consortia. All these annotations are in general very heterogeneous and not cross-linked, providing ambiguous knowledge to the users. In order to give a quick view of what is available, and trying to centralize all the genomic information of reference marine species, we set up GENOMA (GENOmes for MArine biology). GENOMA is a multilevel platform that includes all the available genome assemblies and gene annotations about 12 species (Acanthaster planci, Branchiostoma floridae, Ciona robusta, Ciona savignyi, Gasterosteus aculeatus, Octopus bimaculoides, Patiria miniata, Phaeodactylum tricornutum, Ptychodera flava and Saccoglossus kowalevskii). Each species has a dedicated JBroswe and web page, where is summarized the comparison between the different genome versions and gene annotations available, together with the possibility to directly download all the information. Moreover, an interactive table including the union of different gene annotations is also consultable on-line. Finally, a query page system that allows to search specific features in one or more annotations simultaneously, is embedded in the platform. GENOMA is publicly available at http://bioinfo.szn.it/genoma/.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"112 1","pages":"e27347"},"PeriodicalIF":0.0,"publicationDate":"2018-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85341730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data sharing and interoperability from multi-source long term observations: challenges and opportunities in marine biology
Pub Date: 2018-11-13 | DOI: 10.7287/peerj.preprints.27344v1
Mara Sangiovanni, R. Piredda, Marco Miralto, M. Tangherlini, M. Chiusano
Long-term observatories are widely used in marine sciences to monitor marine ecosystems and investigate their evolution. Recently, data from innovative technologies as well as 'omics-based approaches have been collected alongside physical, biogeochemical and taxonomic information. Their integration represents a challenging opportunity, calling for suitable computational approaches for data retrieval, storage, interoperability, reusability and sharing. Several initiatives are addressing these issues, suggesting the most appropriate and sensible strategies and protocols. Ensuring interoperability among different sources and providing seamless data access is essential when designing tools to store and share the collected information. Here we present our effort in the development of web-accessible resources for Long-Term Ecosystem Research (LTER), taking into account available protocols and adopting appropriate software solutions for: i) collecting and integrating real-time environmental and biological observations with -omics data; ii) exploiting internationally established data formats and protocols to expose the collected data through RESTful APIs; iii) accessing the collections through an interactive, web-accessible resource that permits aggregated views. The aim of this effort is to reinforce the leadership of the Stazione Zoologica "Anton Dohrn" as a Mediterranean Sea marine observatory and to be ready for the challenges of the next era in marine biology.
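A minimal sketch of point (ii), exposing observations through a RESTful API, is shown below using Flask. The endpoint path, query parameters and in-memory records are illustrative assumptions, not the observatory's actual service.

```python
# Minimal RESTful sketch: serve observation records filtered by station and
# variable via query parameters. Data rows here are invented placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the integrated store of environmental/biological observations.
OBSERVATIONS = [
    {"station": "MC", "date": "2018-06-01", "variable": "temperature", "value": 22.4},
    {"station": "MC", "date": "2018-06-01", "variable": "chlorophyll_a", "value": 0.31},
]

@app.route("/api/observations")
def observations():
    """Return observations, optionally filtered by ?station= and ?variable=."""
    station = request.args.get("station")
    variable = request.args.get("variable")
    rows = [o for o in OBSERVATIONS
            if (station is None or o["station"] == station)
            and (variable is None or o["variable"] == variable)]
    return jsonify(rows)

if __name__ == "__main__":
    app.run()  # e.g. GET /api/observations?station=MC&variable=temperature
```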
{"title":"Data sharing and interoperability from multi-source long term observations: challenges and opportunities in marine biology","authors":"Mara Sangiovanni, R. Piredda, Marco Miralto, M. Tangherlini, M. Chiusano","doi":"10.7287/peerj.preprints.27344v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27344v1","url":null,"abstract":"Long-term observatories are widely used in marine sciences to monitor marine ecosystems and investigate their evolution. Recently, data from innovative technologies as well as ‘omics-based' approaches is being collected alongside physical, biogeochemical and taxonomic information. Their integration represents a challenging opportunity, pushing for suitable computational approaches to for data retrieval, storage, interoperability, reusability and sharing. Several initiatives are addressing these issues, suggesting the most appropriate and sensitive strategies and protocols. Ensuring interoperability among different sources and providing seamless data access is essential when designing tools to store and share the collected information.Here we present our effort in the development of web-accessible resources for Long-Term Ecosystem Research (LTER), taking into account available protocols and approaching appropriate software solutions for: i) collecting and integrating real-time environmental and biological observations with -omics data; ii) exploiting international established data formats and protocols to expose through RESTful APIs the collected data; iii) accessing the collections through an interactive, web-accessible resource to permit aggregated views.The aim of this effort is to reinforce the leadership of the Stazione Zoologica “Anton Dohrn” as a Mediterranean Sea marine observatory, and to be ready for the next era challenges in marine biology.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"1 1","pages":"e27344"},"PeriodicalIF":0.0,"publicationDate":"2018-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90880958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A false negative study of the steganalysis tool: Stegdetect
Pub Date: 2018-11-12 | DOI: 10.7287/peerj.preprints.27339v1
B. Aziz, Jeyong Jung
Steganography and steganalysis have in recent years become an important area of research with many applications. Steganography is the process of hiding secret data in digital media without any significant, noticeable changes to the cover object, while steganalysis is the process of detecting hidden content in a cover object. In this study, we evaluated one of the modern automated steganalysis tools, Stegdetect, to study its false negative rates when analysing a bulk of images. To do so, we used the JPHide method to embed randomly generated messages into 2000 JPEG images. The aim of this study is to help digital forensics analysts during their investigations by giving them an idea of the false negative rates of Stegdetect. This study found that (1) the false negative rates depended largely on the tool's sensitivity values, (2) the tool had a high false negative rate for sensitivity values between 0.1 and 3.4, and (3) the best sensitivity value for detecting the JPHide method was 6.2. It is recommended that, when analysing a large bulk of images, forensic analysts take sensitivity values into consideration to reduce the false negative rates of Stegdetect.
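The experimental loop can be sketched as a thin Python driver around the stegdetect CLI. This is a hedged reconstruction, not the authors' code: the `-s` (sensitivity) and `-t` (test selection) flags follow stegdetect's documented usage, but the directory layout and the output-parsing heuristic (treating lines containing "negative" as misses) are assumptions.

```python
# Sketch: estimate the false negative rate of stegdetect at a given
# sensitivity over a directory of JPHide-embedded JPEGs.
import subprocess
from pathlib import Path

def false_negative_rate(image_dir, sensitivity):
    """All images in image_dir are assumed to contain embedded messages."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    misses = 0
    for img in images:
        # -s sets detection sensitivity; -tp restricts tests to jphide.
        out = subprocess.run(
            ["stegdetect", "-s", str(sensitivity), "-tp", str(img)],
            capture_output=True, text=True).stdout
        if "negative" in out:  # tool reported no hidden content: a miss
            misses += 1
    return misses / len(images)

# for s in (0.1, 1.0, 3.4, 6.2):
#     print(s, false_negative_rate("embedded_jpegs/", s))
```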
{"title":"A false negative study of the steganalysis tool: Stegdetect","authors":"B. Aziz, Jeyong Jung","doi":"10.7287/peerj.preprints.27339v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27339v1","url":null,"abstract":"Steganography and Steganalysis in recent years have become an important area of research involving dierent applications. Steganography is the process of hiding secret data into any digital media without any signicant notable changes in a cover object, while steganalysis is the process of detecting hiding content in the cover object. In this study, we evaluated one of the modern automated steganalysis tools, Stegdetect, to study its false negative rates when analysing a bulk of images. In so doing, we used JPHide method to embed a randomly generated messages into 2000 JPEG images. The aim of this study is to help digital forensics analysts during their investigations by means of providing an idea of the false negative rates of Stegdetect. This study found that (1) the false negative rates depended largely on the tool's sensitivity values, (2) the tool had a high false negative rate between the sensitivity values from 0.1 to 3.4 and (3) the best sensitivity value for detection of JPHide method was 6.2. It is recommended that when analysing a huge bulk of images forensic analysts need to take into consideration sensitivity values to reduce the false negative rates of Stegdetect.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"9 1","pages":"e27339"},"PeriodicalIF":0.0,"publicationDate":"2018-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75090839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eclipse CDT code analysis and unit testing
Pub Date: 2018-11-10 | DOI: 10.7287/peerj.preprints.27350v1
Shaun C. D'Souza
In this paper we look at the Eclipse IDE and its support for the CDT (C/C++ Development Tools). Eclipse is an open-source IDE that supports a variety of programming languages through plugin functionality. Eclipse supports the standard GNU environment for compiling, building and debugging applications. The CDT is a plugin which enables development of C/C++ applications in Eclipse, providing functionality including code browsing, syntax highlighting and code completion. We verify a 50X improvement in LOC automation for Fake class .cpp/.h and class test .cpp code generation.
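To make the measured task concrete, the following Python sketch generates the kind of boilerplate the abstract counts: a Fake class header/source pair and a test stub for a given class name. The templates are illustrative assumptions; the paper's actual generator is not shown here.

```python
# Sketch of fake-class and test-stub generation for a class named `name`.
FAKE_H = """#ifndef FAKE_{u}_H
#define FAKE_{u}_H
#include "{name}.h"

class Fake{name} : public {name} {{
public:
    Fake{name}();
    ~Fake{name}();
}};
#endif
"""

FAKE_CPP = """#include "Fake{name}.h"

Fake{name}::Fake{name}() {{}}
Fake{name}::~Fake{name}() {{}}
"""

TEST_CPP = """#include "Fake{name}.h"
#include <cassert>

int main() {{
    Fake{name} fake;   // construct the fake in place of the real {name}
    assert(true);      // real assertions would exercise {name}'s interface
    return 0;
}}
"""

def generate(name):
    """Write Fake<name>.h, Fake<name>.cpp and <name>Test.cpp to disk."""
    files = {f"Fake{name}.h": FAKE_H, f"Fake{name}.cpp": FAKE_CPP,
             f"{name}Test.cpp": TEST_CPP}
    for fname, template in files.items():
        with open(fname, "w") as fh:
            fh.write(template.format(name=name, u=name.upper()))

# generate("Sensor")  # writes FakeSensor.h, FakeSensor.cpp, SensorTest.cpp
```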
{"title":"Eclipse CDT code analysis and unit testing","authors":"Shaun C. D'Souza","doi":"10.7287/peerj.preprints.27350v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27350v1","url":null,"abstract":"In this paper we look at the Eclipse IDE and its support for CDT (C/C++ Development Tools). Eclipse is an open source IDE and supports a variety of programming languages including plugin functionality. Eclipse supports the standard GNU environment for compiling, building and debugging applications. The CDT is a plugin which enables development of C/C++ applications in eclipse. It enables functionality including code browsing, syntax highlighting and code completion. We verify a 50X improvement in LOC automation for Fake class .cpp / .h and class test .cpp code generation.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"56 1","pages":"e27350"},"PeriodicalIF":0.0,"publicationDate":"2018-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76613942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whole yeast model: what and why
Pub Date: 2018-11-07 | DOI: 10.7287/peerj.preprints.27327v1
P. Palumbo, M. Vanoni, F. Papa, S. Busti, L. Alberghina
One of the most challenging fields in life science research is to deeply understand how complex cellular functions arise from the interactions of molecules in living cells. Mathematical and computational methods in systems biology are fundamental for studying the complex molecular interactions within biological systems and for accelerating discoveries. Within this framework, there is a need to integrate different mathematical tools in order to develop quantitative models of entire organisms, i.e. whole-cell models. This note presents a first attempt to show the feasibility of such a task for the budding yeast Saccharomyces cerevisiae, a model organism for eukaryotic cells: the proposed model describes the main cellular activities, such as metabolism, growth and the cell cycle, in a modular fashion, allowing them to be treated separately as single input/output modules as well as interconnected to build the backbone of a coarse-grain whole-cell model. The model's modularity allows a low-granularity module to be substituted with a finer-grained one whenever molecular details are required to correctly reproduce specific experiments. Furthermore, by properly setting the cellular division, simulations of cell populations are achieved that can deal with protein distributions. Whole-cell modeling will help in understanding the logic of cell resilience.
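The modular input/output architecture can be illustrated with a toy Python sketch: each cellular activity is a black box with inputs and outputs, and the whole-cell backbone wires them together. The rate constants and division threshold are invented for illustration and have no relation to the authors' calibrated model.

```python
# Toy backbone: three coarse-grain modules wired input-to-output.
# Any module can later be swapped for a finer-grained (e.g. ODE-based) one.

def metabolism(nutrients):
    """Module 1: nutrients in, biosynthetic precursors out."""
    return 0.8 * nutrients

def growth(precursors, mass):
    """Module 2: mass accrues from precursors; returns the new cell mass."""
    return mass + 0.1 * precursors

def cycle(mass, threshold=2.0):
    """Module 3: size-gated division; above threshold the cell divides."""
    return (mass / 2.0, True) if mass >= threshold else (mass, False)

mass, nutrients = 1.0, 1.0
for step in range(30):
    precursors = metabolism(nutrients)   # module 1 output feeds module 2
    mass = growth(precursors, mass)      # module 2 output feeds module 3
    mass, divided = cycle(mass)
    if divided:
        print(f"division at step {step}, daughter mass {mass:.2f}")
```

Tracking each daughter cell separately after division is what turns this single-cell loop into the population simulation the abstract mentions.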
{"title":"Whole yeast model: what and why","authors":"P. Palumbo, M. Vanoni, F. Papa, S. Busti, L. Alberghina","doi":"10.7287/peerj.preprints.27327v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27327v1","url":null,"abstract":"One of the most challenging fields in Life Science research is to deeply understand how complex cellular functions arise from the interactions of molecules in living cells. Mathematical and computational methods in Systems Biology are fundamental to study the complex molecular interactions within biological systems and to accelerate discoveries. Within this framework, a need exists to integrate different mathematical tools in order to develop quantitative models of entire organisms, i.e. whole-cell models. This note presents a first attempt to show the feasibility of such a task for the budding yeast Saccharomyces cerevisiae, a model organism for eukaryotic cells: the proposed model refers to the main cellular activities like metabolism, growth and cycle in a modular fashion, therefore allowing to treat them separately as single input/output modules, as well as to interconnect them in order to build the backbone of a coarse-grain whole cell model. The model modularity allows to substitute a low granularity module with one with a finer grain, whenever molecular details are required to correctly reproduce specific experiments. Furthermore, by properly setting the cellular division, simulations of cell populations are achieved, able to deal with protein distributions. Whole cell modeling will help understanding logic of cell resilience.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"59 1","pages":"e27327"},"PeriodicalIF":0.0,"publicationDate":"2018-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84261659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive RNA-Seq pipeline includes meta-analysis, interactivity and automatic reporting
Pub Date: 2018-11-05 | DOI: 10.7287/peerj.preprints.27317v2
G. Spinozzi, V. Tini, Laura Mincarelli, B. Falini, M. Martelli
There are many methods available for each phase of an RNA-Seq analysis, and each of them uses different algorithms. It is therefore useful to identify a pipeline that combines the best tools in terms of time and results. For this purpose, we compared five different pipelines, obtained by combining the most used tools in RNA-Seq analysis. Using RNA-Seq data on samples of different Acute Myeloid Leukemia (AML) cell lines, we compared the five pipelines from alignment through to differential expression analysis (DEA). For each one we evaluated peak RAM usage and run time, and then compared the differentially expressed genes identified by each pipeline. It emerged that the pipeline with the shortest times, lowest RAM consumption and most reliable results is the one that uses HISAT2 for alignment, featureCounts for quantification and edgeR for differential analysis. Finally, we developed an automated pipeline that defaults to these tools but also allows choosing between different ones. In addition, the pipeline performs a final meta-analysis that includes a Gene Ontology and pathway analysis. The results can be viewed in an interactive Shiny App and exported in a report (pdf, word or html formats).
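The backbone of the winning combination can be sketched as a thin Python driver around the real command-line tools. The index, file and annotation names are placeholders, and the edgeR step (which runs in R) is only stubbed via an assumed helper script.

```python
# Sketch: HISAT2 alignment -> featureCounts quantification -> edgeR DEA.
import subprocess

def run(cmd):
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

sample = "AML_sample1"
# 1) Alignment with HISAT2 (paired-end reads against a prebuilt index).
run(["hisat2", "-x", "grch38_index",
     "-1", f"{sample}_R1.fastq.gz", "-2", f"{sample}_R2.fastq.gz",
     "-S", f"{sample}.sam"])
# 2) Quantification with featureCounts against a GTF gene annotation.
run(["featureCounts", "-a", "genes.gtf", "-o", "counts.txt", f"{sample}.sam"])
# 3) Differential expression with edgeR, delegated to an R script
#    (edgeR_dea.R is an assumed wrapper, not part of this sketch).
run(["Rscript", "edgeR_dea.R", "counts.txt"])
```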
{"title":"A comprehensive RNA-Seq pipeline includes meta-analysis, interactivity and automatic reporting","authors":"G. Spinozzi, V. Tini, Laura Mincarelli, B. Falini, M. Martelli","doi":"10.7287/peerj.preprints.27317v2","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27317v2","url":null,"abstract":"There are many methods available for each phase of the RNA-Seq analysis and each of them uses different algorithms. It is therefore useful to identify a pipeline that combines the best tools in terms of time and results. For this purpose, we compared five different pipelines, obtained by combining the most used tools in RNA-Seq analysis. Using RNA-Seq data on samples of different Acute Myeloid Leukemia (AML) cell lines, we compared five pipelines from the alignment to the differential expression analysis (DEA). For each one we evaluated the peak of RAM and time and then compared the differentially expressed genes identified by each pipeline. It emerged that the pipeline with shorter times, lower consumption of RAM and more reliable results, is that which involves the use ofHISAT2for alignment, featureCountsfor quantification and edgeRfor differential analysis. Finally, we developed an automated pipeline that recurs by default to the cited pipeline, but it also allows to choose between different tools. In addition, the pipeline makes a final meta-analysis that includes a Gene Ontology and Pathway analysis. The results can be viewed in an interactive Shiny Appand exported in a report (pdf, word or html formats).","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"7 1","pages":"e27317"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79726102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identification of protein pockets and cavities by Euclidean Distance Transform
Pub Date: 2018-11-01 | DOI: 10.7287/peerj.preprints.27314v1
Sebastian Daberdaku
Protein pockets and cavities usually coincide with the active sites of biological processes, and their identification is significant since it constitutes an important step for structure-based drug design and protein-ligand docking applications. This research presents PoCavEDT, an automated, purely geometric technique for the identification of binding pockets and occluded cavities in proteins based on the 3D Euclidean Distance Transform. Candidate protein pocket regions are identified between two solvent-excluded surfaces generated with the Euclidean Distance Transform using different probe spheres, whose radii depend on the size of the binding ligand. The application of simple yet effective geometric heuristics ensures that the proposed method obtains very good ligand binding site prediction results. The method was applied to a representative set of protein-ligand complexes and their corresponding unbound protein structures to evaluate its ligand binding site prediction capabilities. Its performance was compared to the results achieved with several purely geometric pocket and cavity prediction methods, namely SURFNET, PASS, CAST, LIGSITE, LIGSITECS, PocketPicker and POCASA. Success rates of PoCavEDT were comparable to those of POCASA and outperformed the other software.
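The two-probe idea can be sketched on a voxel grid with SciPy's Euclidean Distance Transform: approximate the solvent-excluded volume for each probe radius by a distance-based morphological closing, then take the region covered by the large probe's surface but not the small probe's. This is a schematic of the principle under stated simplifications (voxelized masks, probe-center accessibility), not the PoCavEDT implementation.

```python
# Sketch: pocket candidates as the region between two solvent-excluded
# volumes computed with the Euclidean Distance Transform (EDT).
import numpy as np
from scipy.ndimage import distance_transform_edt

def ses_volume(protein_mask, r, voxel=1.0):
    """Approximate solvent-excluded volume for a probe of radius r."""
    d_protein = distance_transform_edt(~protein_mask, sampling=voxel)
    accessible = d_protein >= r                  # where the probe centre fits
    d_access = distance_transform_edt(~accessible, sampling=voxel)
    return d_access >= r                         # not swept by any probe ball

def candidate_pockets(protein_mask, r_small=1.4, r_large=5.0, voxel=1.0):
    small = ses_volume(protein_mask, r_small, voxel)
    large = ses_volume(protein_mask, r_large, voxel)
    return large & ~small                        # between the two surfaces

# Toy example: a solid cube of "protein" with a carved channel as a pocket.
grid = np.zeros((40, 40, 40), dtype=bool)
grid[10:30, 10:30, 10:30] = True
grid[18:22, 18:22, 25:30] = False                # pocket opening at one face
print(candidate_pockets(grid).sum(), "candidate pocket voxels")
```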
{"title":"Identification of protein pockets and cavities by Euclidean Distance Transform","authors":"Sebastian Daberdaku","doi":"10.7287/peerj.preprints.27314v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27314v1","url":null,"abstract":"Protein pockets and cavities usually coincide with the active sites of biological processes, and their identification is significant since it constitutes an important step for structure-based drug design and protein-ligand docking applications. This research presents PoCavEDT, an automated purely geometric technique for the identification of binding pockets and occluded cavities in proteins based on the 3D Euclidean Distance Transform. Candidate protein pocket regions are identified between two Solvent-Excluded surfaces generated with the Euclidean Distance Transform using different probe spheres, which depend on the size of the binding ligand. The application of simple, yet effective geometrical heuristics ensures that the proposed method obtains very good ligand binding site prediction results. The method was applied to a representative set of protein-ligand complexes and their corresponding unbound protein structures to evaluate its ligand binding site prediction capabilities. Its performance was compared to the results achieved with several purely geometric pocket and cavity prediction methods, namely SURFNET, PASS, CAST, LIGSITE, LIGSITECS, PocketPicker and POCASA. Success rates PoCavEDT were comparable to those of POCASA and outperformed the other software.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"13 1","pages":"e27314"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84801515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the correlation between testing effort and software complexity metrics
Pub Date: 2018-10-31 | DOI: 10.7287/peerj.preprints.27312v1
Adnan Muslija, Eduard Paul Enoiu
Software complexity metrics, such as code size and cyclomatic complexity, have been used in the software engineering community for predicting quality attributes such as maintainability, bug proneness and robustness. However, not many studies have addressed the relationship between complexity metrics and software testing, and there is little experimental evidence to support the use of these code metrics in the estimation of test effort. We have investigated and evaluated the relationship between test effort (i.e., number of test cases and test execution time) and software complexity metrics for industrial control software used in an embedded system. We show how to measure different software complexity metrics, such as number of elements, cyclomatic complexity, and information flow, for Function Block Diagram (FBD), a popular programming language in the safety-critical domain. In addition, we use test data and test suites created by experienced test engineers working at Bombardier Transportation Sweden AB to evaluate the correlation between several complexity measures and the testing effort. We found that there is a moderate correlation between software complexity metrics and test effort. In addition, the results show that software size (i.e., the number of elements in the FBD program) provides the highest correlation with the number of test cases created and the test execution time. Our results suggest that software size and structure metrics, while useful for identifying parts of the system that are more complicated, should not be solely used for identifying parts of the system for which test engineers might need to create more test cases. A potential explanation of this result concerns the nature of testing, since other attributes, such as the level of thorough testing required and the size of the specifications, can influence the creation of test cases. In addition, we used a linear regression model to estimate the test effort from the software complexity measurement results.
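The style of analysis described (rank correlation plus a linear model relating size to test effort) can be sketched in a few lines of Python. The numbers below are invented stand-ins, not the Bombardier data.

```python
# Sketch: correlate a complexity measure with test effort, then fit a
# simple linear regression to estimate test cases from program size.
import numpy as np
from scipy import stats

elements = np.array([12, 30, 45, 60, 88, 120, 150])   # FBD program sizes
test_cases = np.array([3, 5, 6, 9, 10, 15, 16])       # test cases created

# Rank correlation is robust to the non-normal, ordinal nature of effort data.
rho, p = stats.spearmanr(elements, test_cases)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Linear model: estimated test cases as a function of program size.
slope, intercept, r, p_lin, stderr = stats.linregress(elements, test_cases)
print(f"test cases ~ {slope:.3f} * elements + {intercept:.2f}")
```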
{"title":"On the correlation between testing effort and software complexity metrics","authors":"Adnan Muslija, Eduard Paul Enoiu","doi":"10.7287/peerj.preprints.27312v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27312v1","url":null,"abstract":"Software complexity metrics, such as code size and cyclomatic complexity, have been used in the software engineering community for predicting quality metrics such as maintainability, bug proneness and robustness. However, not many studies have addressed the relationship between complexity metrics and software testing and there is little experimental evidence to support the use of these code metrics in the estimation of test effort. We have investigated and evaluated the relationship between test effort (i.e, number of test cases and test execution time) and software complexity metrics for industrial control software used in an embedded system. We show how to measure different software complexity metrics such as number of elements, cyclomatic complexity, and information flow for a popular programming language named FBD used in the safety critical domain. In addition, we use test data and test suites created by experienced test engineers working at Bombardier Transportation Sweden AB to evaluate the correlation between several complexity measures and the testing effort. We found that there is a moderate correlation between software complexity metrics and test effort. In addition, the results show that the software size (i.e., number of elements in the FBD program) provides the highest correlation level with the number of test cases created and test execution time. Our results suggest that software size and structure metrics, while useful for identifying parts of the system that are more complicated, should not be solely used for identifying parts of the system for which test engineers might need to create more test cases. A potential explanation of this result concerns the nature of testing, since other attributes such as the level of thorough testing required and the size of the specifications can influence the creation of test cases. In addition, we used a linear regression model to estimate the test effort using the software complexity measurement results.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"26 1","pages":"e27312"},"PeriodicalIF":0.0,"publicationDate":"2018-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83040043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical challenges for biomedical modeling using HPC
Pub Date: 2018-10-26 | DOI: 10.7287/peerj.preprints.27299v1
D. Wright, R. Richardson, P. Coveney
The concept underlying precision medicine is that prevention, diagnosis and treatment of pathologies such as cancer can be improved through an understanding of the influence of individual patient characteristics. Predictive medicine seeks to derive this understanding through mechanistic models of the causes and (potential) progression of diseases within a given individual. This represents a grand challenge for computational biomedicine, as it requires the integration of highly varied (and potentially vast) quantitative experimental datasets into models of complex biological systems. It is becoming increasingly clear that this challenge can only be answered through the use of complex workflows that combine diverse analyses and whose design is informed by an understanding of how predictions must be accompanied by estimates of uncertainty. Each stage in such a workflow can, in general, have very different computational requirements. If funding bodies and the HPC community are serious about the desire to support such approaches, they must consider the need for portable, persistent and stable tools designed to promote extensive long-term development and testing of these workflows. From the perspective of model developers (and with even greater relevance to potential clinical or experimental collaborators), the enormous diversity of interfaces and supercomputer policies, frequently designed with monolithic applications in mind, can represent a serious barrier to innovation. Here we use experiences from work on two very different biomedical modeling scenarios - brain blood flow and small molecule drug selection - to highlight issues with the current programming and execution environments and suggest potential solutions.
{"title":"Practical challenges for biomedical modeling using HPC","authors":"D. Wright, R. Richardson, P. Coveney","doi":"10.7287/peerj.preprints.27299v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27299v1","url":null,"abstract":"The concept underlying precision medicine is that prevention, diagnosis and treatment of pathologies such as cancer can be improved through an understanding of the influence of individual patient characteristics. Predictive medicine seeks to derive this understanding through mechanistic models of the causes and (potential) progression of diseases within a given individual. This represents a grand challenge for computational biomedicine as it requires the integration of highly varied (and potentially vast) quantitative experimental datasets into models of complex biological systems. It is becoming increasingly clear that this challenge can only be answered through the use of complex workflows that combine diverse analyses and whose design is informed by an understanding of how predictions must be accompanied by estimates of uncertainty. Each stage in such a workflow can, in general, have very different computational requirements. If funding bodies and the HPC community are serious about the desire to support such approaches, they must consider the need for portable, persistent and stable tools designed to promote extensive long term development and testing of these workflows. From the perspective of model developers (and with even greater relevance to potential clinical or experimental collaborators) the enormous diversity of interfaces and supercomputer policies, frequently designed with monolithic applications in mind, can represent a serious barrier to innovation. Here we use experiences from work on two very different biomedical modeling scenarios - brain bloodflow and small molecule drug selection - to highlight issues with the current programming and execution environments and suggest potential solutions.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"23 1","pages":"e27299"},"PeriodicalIF":0.0,"publicationDate":"2018-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85007659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hear and See: End-to-end sound classification and visualization of classified sounds
Pub Date: 2018-10-15 | DOI: 10.7287/peerj.preprints.27280v1
Thomas Miano
Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through a wide variety of methods, including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate via visualization those sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can be successful in mapping an image classification task to a sound classification task. Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.
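The transfer-learning recipe the abstract describes (an image CNN repurposed for sound) is commonly implemented by rendering audio as a spectrogram image. The following Python sketch shows that pattern under stated assumptions: the file name and class count are placeholders, the paper's exact architecture is not specified here, and the `weights=` argument requires a recent torchvision.

```python
# Sketch: audio -> log-mel spectrogram -> pretrained image CNN with a new head.
import librosa
import numpy as np
import torch
import torchvision

# 1) Load audio and render a log-mel spectrogram, scaled to [0, 1].
y, sr = librosa.load("field_recording.wav", sr=22050)   # placeholder file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=224)
spec = librosa.power_to_db(mel)
spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-8)
if spec.shape[1] < 224:                                  # pad short clips
    spec = np.pad(spec, ((0, 0), (0, 224 - spec.shape[1])))
img = np.stack([spec[:, :224]] * 3)                      # fake RGB (3,224,224)

# 2) ImageNet-pretrained CNN (out-of-domain data) with a new classifier head.
N_CLASSES = 10                                           # assumed class count
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, N_CLASSES)

# 3) Forward pass; in practice the head (at least) would be fine-tuned.
logits = model(torch.tensor(img, dtype=torch.float32).unsqueeze(0))
print(logits.argmax(dim=1))                              # predicted class id
```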
{"title":"Hear and See: End-to-end sound classification and visualization of classified sounds","authors":"Thomas Miano","doi":"10.7287/peerj.preprints.27280v1","DOIUrl":"https://doi.org/10.7287/peerj.preprints.27280v1","url":null,"abstract":"Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is a relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through through a wide variety of methods including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate via visualization those sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can be successful in mapping an image classification task to a sound classification task. Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"44 1","pages":"e27280"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87007210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}