Journal of Integrative Bioinformatics: Latest Articles

Data literacy in genome research.
IF 1.9 Q1 Medicine Pub Date : 2023-12-05 eCollection Date: 2023-12-01 DOI: 10.1515/jib-2023-0033
Katharina Wolff, Ronja Friedhoff, Friderieke Schwarzer, Boas Pucker

With an ever-increasing amount of research data available, data literacy skills are becoming steadily more important for benefiting from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students planned the experiment, extracted DNA, performed nanopore sequencing, assembled the genome sequence, predicted genes in the assembled sequence, and assigned functional annotation terms to the predicted genes. Students learned how to communicate science by writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings as part of an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.

Citations: 0
Accurate noise-robust classification of Bacillus species from MALDI-TOF MS spectra using a denoising autoencoder.
IF 1.9 Q1 Medicine Pub Date : 2023-11-20 eCollection Date: 2023-09-01 DOI: 10.1515/jib-2023-0017
Yulia E Uvarova, Pavel S Demenkov, Irina N Kuzmicheva, Artur S Venzel, Elena L Mischenko, Timofey V Ivanisenko, Vadim M Efimov, Svetlana V Bannikova, Asya R Vasilieva, Vladimir A Ivanisenko, Sergey E Peltek

Bacillus strains are ubiquitous in the environment and are widely used in the microbiological industry as valuable enzyme sources, as well as in agriculture to stimulate plant growth. The Bacillus genus comprises several closely related groups of species. Rapid classification of these groups remains challenging with existing methods. Techniques based on MALDI-TOF MS data analysis hold significant promise for fast and precise classification of microbial strains at both the genus and species levels. In previous work, we proposed a geometric approach to Bacillus strain classification based on mass spectra analysis via the centroid method (CM). One limitation of such methods is the noise in MS spectra. In this study, we used a denoising autoencoder (DAE) to improve bacteria classification accuracy under noisy MS spectra conditions. The DAE converts noisy MS spectra into latent variables representing molecular patterns in the original MS data, and the Random Forest (RF) method then classifies bacterial strains by these latent variables. Comparison of the DAE-RF with the CM method using artificially noisy test samples showed that DAE-RF offers higher noise robustness. Hence, the DAE-RF method could be utilized for noise-robust, fast, and accurate classification of Bacillus species according to MALDI-TOF MS data.
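
The evaluation idea in this abstract (score a classifier on test spectra with artificially added noise) can be illustrated with a minimal sketch. It uses a generic nearest-centroid classifier as a stand-in for the centroid baseline; it is not the authors' CM or DAE-RF implementation, and no autoencoder is trained here. The toy spectra, the Gaussian noise model, and all function names are assumptions made for illustration.

```python
import math
import random


def centroid(spectra):
    """Component-wise mean of equal-length intensity vectors."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def euclidean(a, b):
    """Euclidean distance between two intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centroids(train):
    """train maps a label to a list of spectra; returns label -> centroid."""
    return {label: centroid(spectra) for label, spectra in train.items()}


def predict(centroids, spectrum):
    """Assign the label of the closest centroid."""
    return min(centroids, key=lambda label: euclidean(centroids[label], spectrum))


def add_noise(spectrum, sigma, rng):
    """Add Gaussian noise (an assumed noise model), clipping intensities at zero."""
    return [max(0.0, x + rng.gauss(0.0, sigma)) for x in spectrum]


def accuracy(centroids, samples):
    """Fraction of (label, spectrum) pairs that are classified correctly."""
    hits = sum(predict(centroids, spectrum) == label for label, spectrum in samples)
    return hits / len(samples)


# Hypothetical binned spectra: four m/z bins per spectrum, two species.
TRAIN = {
    "species_A": [[9.0, 1.0, 0.0, 4.0], [8.0, 2.0, 0.0, 5.0]],
    "species_B": [[1.0, 9.0, 5.0, 0.0], [2.0, 8.0, 4.0, 0.0]],
}
CLEAN_TEST = [
    ("species_A", [8.5, 1.5, 0.0, 4.5]),
    ("species_B", [1.5, 8.5, 4.5, 0.0]),
]

centroids = fit_centroids(TRAIN)
rng = random.Random(0)
for sigma in (0.0, 2.0, 8.0):
    noisy = [
        (label, add_noise(spectrum, sigma, rng))
        for label, spectrum in CLEAN_TEST
        for _ in range(200)
    ]
    print(f"sigma={sigma}: accuracy={accuracy(centroids, noisy):.3f}")
```

Replacing `predict` with a pipeline that first maps each spectrum to latent variables and then applies a Random Forest would give the kind of comparison the abstract describes; that part is deliberately left out because its details are not given in this listing.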

Citations: 0
Reconstruction of the regulatory hypermethylation network controlling hepatocellular carcinoma development during hepatitis C viral infection.
IF 1.9 Q1 Medicine Pub Date : 2023-11-20 eCollection Date: 2023-09-01 DOI: 10.1515/jib-2023-0013
Evgeniya A Antropova, Tamara M Khlebodarova, Pavel S Demenkov, Anastasiia R Volianskaia, Artur S Venzel, Nikita V Ivanisenko, Alexandr D Gavrilenko, Timofey V Ivanisenko, Anna V Adamovskaya, Polina M Revva, Nikolay A Kolchanov, Inna N Lavrik, Vladimir A Ivanisenko

Hepatocellular carcinoma (HCC) has been associated with hepatitis C viral (HCV) infection as a potential risk factor. Nonetheless, the precise genetic regulatory mechanisms triggered by the virus, leading to virus-induced hepatocarcinogenesis, remain unclear. We hypothesized that HCV proteins might modulate the activity of aberrantly methylated HCC genes through regulatory pathways. Virus-host regulatory pathways, covering interactions between proteins and the regulation of gene expression, transport, and stability, were reconstructed using the ANDSystem. Gene expression regulation was statistically significant. Gene network analysis identified four out of 70 HCC marker genes whose expression regulation by viral proteins may be associated with HCC: DNA-binding protein inhibitor ID-1 (ID1), flap endonuclease 1 (FEN1), cyclin-dependent kinase inhibitor 2A (CDKN2A), and telomerase reverse transcriptase (TERT). The analysis suggested the following viral protein effects in HCV/human protein heterocomplexes: HCV NS3(p70) protein activates human STAT3 and NOTC1; NS2-3(p23), NS5B(p68), NS1(E2), and core(p21) activate SETD2; NS5A inhibits SMYD3; and NS3 inhibits CCN2. Interestingly, NS3 and E1(gp32) activate c-Jun when it positively regulates CDKN2A and inhibit it when it represses TERT. The discovered regulatory mechanisms might be key areas of focus for creating medications and preventative therapies to decrease the likelihood of HCC development during HCV infection.

Citations: 0
BGRS: bioinformatics of genome regulation and data integration.
IF 1.9 Q1 Medicine Pub Date : 2023-11-16 eCollection Date: 2023-09-01 DOI: 10.1515/jib-2023-0032
Yuriy L Orlov, Ming Chen, Nikolay A Kolchanov, Ralf Hofestädt
Citations: 0
STARGATE-X: a Python package for statistical analysis on the REACTOME network.
IF 1.9 Q1 Medicine Pub Date : 2023-09-21 eCollection Date: 2023-09-01 DOI: 10.1515/jib-2022-0029
Andrea Marino, Blerina Sinaimeri, Enrico Tronci, Tiziana Calamoneri

Many important aspects of biological knowledge at the molecular level can be represented by pathways. Through their analysis, we gain mechanistic insights and interpret lists of interesting genes from experiments (usually omics and functional genomic experiments). As a result, pathways play a central role in the development of bioinformatics methods and tools for computing predictions from known molecular-level mechanisms. Qualitative as well as quantitative knowledge about pathways can be effectively represented through biochemical networks linking the biochemical reactions and the compounds (e.g., proteins) occurring in the considered pathways. So, repositories providing biochemical networks for known pathways play a central role in bioinformatics and in systems biology. Here we focus on Reactome, a free, comprehensive, and widely used repository for biochemical networks and pathways. In this paper, we: (1) introduce a tool StARGate-X (STatistical Analysis of the Reactome multi-GrAph Through nEtworkX) to carry out an automated analysis of the connectivity properties of Reactome biochemical reaction network and of its biological hierarchy (i.e., cell compartments, namely, the closed parts within the cytosol, usually surrounded by a membrane); the code is freely available at https://github.com/marinoandrea/stargate-x; (2) show the effectiveness of our tool by providing an analysis of the Reactome network, in terms of centrality measures, with respect to in- and out-degree. As an example of usage of StARGate-X, we provide a detailed automated analysis of the Reactome network, in terms of centrality measures. We focus both on the subgraphs induced by single compartments and on the graph whose nodes are the strongly connected components. To the best of our knowledge, this is the first freely available tool that enables automatic analysis of the large biochemical network within Reactome through easy-to-use APIs (Application Programming Interfaces).
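
The graph quantities named in this abstract (in- and out-degree, strongly connected components, and the graph whose nodes are those components) can be sketched on a toy directed graph. This is a plain-Python illustration using Kosaraju's algorithm, not the StARGate-X API and not NetworkX code; the node names, the edge list, and every function name below are hypothetical.

```python
from collections import defaultdict


def degrees(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return dict(indeg), dict(outdeg)


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm; returns a list of frozensets of nodes."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion (iterative DFS).
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in decreasing completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = [], [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in pred[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(frozenset(component))
    return components


def condensation(components, edges):
    """Edges between distinct components, as pairs of component indices."""
    comp_of = {node: i for i, comp in enumerate(components) for node in comp}
    return {(comp_of[u], comp_of[v]) for u, v in edges if comp_of[u] != comp_of[v]}


# Hypothetical toy network: a three-node cycle feeding a two-node chain.
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]

in_degree, out_degree = degrees(EDGES)
comps = strongly_connected_components(NODES, EDGES)
print(in_degree, out_degree)
print(comps, condensation(comps, EDGES))
```

On a network the size of Reactome, a library implementation (NetworkX provides equivalent routines) would be the practical choice; the sketch only shows what is being computed.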

Citations: 0
RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions.
IF 1.9 Q1 Medicine Pub Date : 2023-08-25 eCollection Date: 2023-09-01 DOI: 10.1515/jib-2022-0046
John Anders, Peter F Stadler

The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious, and sometimes difficult, to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments that are needed as input for RNAcode. The service automates the entire pre- and postprocessing and thus makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.

Citations: 0
SnakeLines: integrated set of computational pipelines for sequencing reads.
IF 1.9 Q1 Medicine Pub Date : 2023-08-21 eCollection Date: 2023-09-01 DOI: 10.1515/jib-2022-0059
Jaroslav Budiš, Werner Krampl, Marcel Kucharík, Rastislav Hekel, Adrián Goga, Jozef Sitarčík, Michal Lichvár, Dávid Smol'ak, Miroslav Böhmer, Andrej Baláž, František Ďuriš, Juraj Gazdarica, Katarína Šoltys, Ján Turňa, Ján Radvánszky, Tomáš Szemes

With the rapid growth of massively parallel sequencing technologies, ever more laboratories are utilising sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.

Citations: 2
Concentration of inverted repeats along human DNA.
IF 1.9 Q1 Medicine Pub Date : 2023-07-25 eCollection Date: 2023-06-01 DOI: 10.1515/jib-2022-0052
Carlos A C Bastos, Vera Afreixo, João M O S Rodrigues, Armando J Pinho

This work aims to describe the observed enrichment of inverted repeats in the human genome; and to identify and describe, with detailed length profiles, the regions with significant and relevant enriched occurrence of inverted repeats. The enrichment is assessed and tested with a recently proposed measure (z-scores based measure). We simulate a genome using an order 7 Markov model trained with the data from the real genome. The simulated genome is used to establish the critical values which are used as decision thresholds to identify the regions with significant enriched concentrations. Several human genome regions are highly enriched in the occurrence of inverted repeats. This is observed in all the human chromosomes. The distribution of inverted repeat lengths varies along the genome. The majority of the regions with severely exaggerated enrichment contain mainly short length inverted repeats. There are also regions with regular peaks along the inverted repeats lengths distribution (periodic regularities) and other regions with exaggerated enrichment for long lengths (less frequent). However, adjacent regions tend to have similar distributions.
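
The procedure outlined in this abstract (train a Markov model on the real sequence, simulate a sequence from it, and use the simulated windows to set a decision threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: an inverted repeat is taken to be a word of length `k` whose reverse complement starts within `max_gap` bases downstream, the threshold is an empirical quantile rather than the z-score based measure the abstract mentions, and the toy example uses order 2 instead of order 7. All names and parameter values are hypothetical.

```python
import random
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Reverse complement of a DNA word over the alphabet ACGT."""
    return word.translate(COMPLEMENT)[::-1]


def count_inverted_repeats(seq, k, max_gap):
    """Count pairs (word at i, its reverse complement at j), 0 <= j - (i + k) <= max_gap."""
    count = 0
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        last_start = min(i + k + max_gap, len(seq) - k)
        for j in range(i + k, last_start + 1):
            if seq[j:j + k] == target:
                count += 1
    return count


def train_markov(seq, order):
    """Count next-symbol frequencies for every context of length `order` (order >= 1)."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return counts


def simulate(counts, order, length, seed_context, rng):
    """Generate a sequence from the trained model, starting from seed_context."""
    out = list(seed_context)
    while len(out) < length:
        nxt = counts.get("".join(out[-order:]))
        if not nxt:
            # Context never seen in training: fall back to a uniform draw.
            out.append(rng.choice("ACGT"))
            continue
        symbols, weights = zip(*nxt.items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)


def window_counts(seq, window, k, max_gap):
    """Inverted-repeat counts in consecutive non-overlapping windows."""
    return [
        count_inverted_repeats(seq[start:start + window], k, max_gap)
        for start in range(0, len(seq) - window + 1, window)
    ]


def critical_value(simulated_counts, quantile=0.99):
    """Empirical quantile of the simulated window counts (an assumed threshold rule)."""
    ordered = sorted(simulated_counts)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]


rng = random.Random(42)
# Random stand-in for a real chromosome sequence (hypothetical data).
real = "".join(rng.choice("ACGT") for _ in range(2000))
model = train_markov(real, order=2)
simulated = simulate(model, 2, len(real), real[:2], rng)

threshold = critical_value(window_counts(simulated, 200, 4, 20))
enriched = [i for i, c in enumerate(window_counts(real, 200, 4, 20)) if c > threshold]
print("threshold:", threshold, "enriched windows:", enriched)
```

Windows whose count exceeds the simulated threshold are the candidates for enrichment; repeating the count for several values of `k` per window would give a length profile of the kind the abstract describes.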

Citations: 0
Integrating omics databases for enhanced crop breeding.
IF 1.9 Q1 Medicine Pub Date : 2023-07-25 eCollection Date: 2023-12-01 DOI: 10.1515/jib-2023-0012
Haoyu Chao, Shilong Zhang, Yueming Hu, Qingyang Ni, Saige Xin, Liang Zhao, Vladimir A Ivanisenko, Yuriy L Orlov, Ming Chen

Crop plant breeding involves selecting and developing new plant varieties with desirable traits such as increased yield, improved disease resistance, and enhanced nutritional value. With the development of high-throughput technologies, such as genomics, transcriptomics, and metabolomics, crop breeding has entered a new era. However, to effectively use these technologies, integration of multi-omics data from different databases is required. Integration of omics data provides a comprehensive understanding of the biological processes underlying plant traits and their interactions. This review highlights the importance of integrating omics databases in crop plant breeding, discusses available omics data and databases, describes integration challenges, and highlights recent developments and potential benefits. Taken together, the integration of omics databases is a critical step towards enhancing crop plant breeding and improving global food security.

Citations: 0
Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks.
IF 1.9 Q1 Medicine Pub Date : 2023-06-01 DOI: 10.1515/jib-2022-0055
Hamed Ghazikhani, Gregory Butler

Transmembrane transport proteins (transporters) play a crucial role in the fundamental cellular processes of all organisms by facilitating the transport of hydrophilic substrates across hydrophobic membranes. Despite the availability of numerous membrane protein sequences, their structures and functions remain largely elusive. Recently, natural language processing (NLP) techniques have shown promise in the analysis of protein sequences. Bidirectional Encoder Representations from Transformers (BERT) is an NLP technique adapted for proteins to learn contextual embeddings of individual amino acids within a protein sequence. Our previous strategy, TooT-BERT-T, differentiated transporters from non-transporters by employing a logistic regression classifier with fine-tuned representations from ProtBERT-BFD. In this study, we expand upon this approach by utilizing representations from ProtBERT, ProtBERT-BFD, and MembraneBERT in combination with classical classifiers. Additionally, we introduce TooT-BERT-CNN-T, a novel method that fine-tunes ProtBERT-BFD and discriminates transporters using a Convolutional Neural Network (CNN). Our experimental results reveal that the CNN surpasses traditional classifiers in discriminating transporters from non-transporters, achieving an MCC of 0.89 and an accuracy of 95.1% on the independent test set. Compared to TooT-BERT-T, this is an improvement of 0.03 in MCC and 1.11 percentage points in accuracy.
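For readers unfamiliar with the two metrics reported above, the following Python sketch shows how the Matthews correlation coefficient (MCC) and accuracy are computed from a binary confusion matrix. The function names and the confusion-matrix counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts: 45 true positives, 45 true negatives,
    # 5 false positives, 5 false negatives.
    print(mcc(45, 45, 5, 5))       # (2025 - 25) / 2500
    print(accuracy(45, 45, 5, 5))  # 90 / 100
```

MCC uses all four cells of the confusion matrix, so it stays informative when transporters and non-transporters are unevenly represented, which is one reason it is commonly reported alongside accuracy.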

Citations: 0