Pub Date: 2024-08-21 | DOI: 10.1109/TCBB.2024.3447273
Yufei Li;Xiaoyong Ma;Xiangyu Zhou;Penghzhen Cheng;Kai He;Tieliang Gong;Chen Li
Biomedical coreference resolution focuses on identifying coreferences in biomedical texts and normally consists of two parts: (i) mention detection, which identifies textual representations of biological entities, and (ii) finding the coreference links between them. Recently, a popular way to enhance the task is to embed a knowledge base into deep neural networks. However, the way these methods integrate knowledge leads to a shortcoming: such knowledge may play a larger role in mention detection than in coreference resolution. Specifically, they tend to integrate knowledge prior to mention detection, as part of the embeddings. Moreover, they primarily focus on mention-dependent knowledge (KBase), i.e., knowledge entities directly related to mentions, while ignoring the correlated knowledge (K+) between the mentions in a mention pair. For mentions with significant differences in word form, this may limit their ability to extract potential correlations between those mentions. This paper therefore develops a novel model that integrates both KBase and K+ entities and achieves state-of-the-art performance on the BioNLP and CRAFT-CR datasets. Empirical studies on mention detection with different mention lengths reveal the effectiveness of the KBase entities. The evaluation on cross-sentence and match/mismatch coreference further demonstrates the superiority of the K+ entities in extracting potential background correlations between mentions.
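The distinction between KBase and K+ knowledge can be illustrated with a toy mention-pair scorer. This is a minimal sketch under assumed names, not the paper's model: `jaccard` measures direct KBase overlap, while a K+ bonus fires when a correlated entity links to both mentions, even if their surface forms differ.

```python
# Toy sketch (not the paper's architecture): scoring a mention pair using
# mention-dependent knowledge (KBase) plus correlated knowledge (K+).
# All entity sets and weights below are hypothetical illustrations.

def jaccard(a, b):
    """Overlap between two KBase entity sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pair_score(kbase_i, kbase_j, k_plus, w_base=0.5, w_plus=0.5):
    """Score a mention pair: KBase overlap captures direct entity links;
    the K+ term rewards pairs whose mentions share a correlated entity,
    helping mentions with very different word forms."""
    base = jaccard(kbase_i, kbase_j)
    plus = 1.0 if (k_plus & kbase_i) and (k_plus & kbase_j) else 0.0
    return w_base * base + w_plus * plus

# Two mentions with different surface forms but shared correlated knowledge:
m1 = {"TP53", "tumor_suppressor"}
m2 = {"p53_protein", "tumor_suppressor"}
k_plus = {"tumor_suppressor"}
score = pair_score(m1, m2, k_plus)
```

Without the K+ term this pair would score only the weak KBase overlap; the shared correlated entity lifts it, mirroring the abstract's argument for mentions with dissimilar word forms.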
Title: Integrating K+ Entities Into Coreference Resolution on Biomedical Texts. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2145-2155.
Pub Date: 2024-08-19 | DOI: 10.1109/TCBB.2024.3433378
Jeremie S Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Nastaran Hajinazar, Mohammed Alser, Can Alkan, Onur Mutlu
AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set that was previously mapped to one reference genome onto another, similar reference. Users can then quickly run downstream analyses of read sets for each new reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4×. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground-truth SNP/INDEL variants.
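The core intuition behind remapping rather than full mapping can be sketched in a few lines. This is a hedged toy, not AirLift's implementation: reads that fall in regions unchanged between the two references are moved by a simple coordinate shift, and only reads in updated regions fall back to a full aligner. The region table below is hypothetical.

```python
# Toy sketch of the remapping idea (not AirLift's actual algorithm):
# constant regions of the old reference translate directly to the new
# reference via an offset; reads elsewhere need full realignment.

# Hypothetical table: (old_start, old_end, offset_to_new_reference).
CONSTANT_REGIONS = [(0, 1000, 0), (1500, 4000, 120)]

def remap(read_pos):
    """Return the read's position on the new reference, or None when the
    read lies in an updated region and must go to the full mapper."""
    for start, end, offset in CONSTANT_REGIONS:
        if start <= read_pos < end:
            return read_pos + offset
    return None  # fall back to full mapping for this read
```

Because most of a genome is typically unchanged between consecutive reference releases, the cheap translation path handles the bulk of the reads, which is where the large end-to-end speedup over full mapping comes from.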
Title: AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, early access.
A transcription factor (TF) is a sequence-specific DNA-binding protein that plays key roles in cell-fate decisions by regulating gene expression. Predicting TFs is important for the tea plant research community, as TFs regulate gene expression and thereby influence plant growth, development, and stress responses. Identifying them through wet-lab experimental validation is challenging because of their rarity and the high cost and time required. As a result, computational methods are an increasingly popular choice. The pre-training strategy has been applied to many tasks in natural language processing (NLP) and has achieved impressive performance. In this paper, we present a novel recognition algorithm named TeaTFactor that utilizes pre-training for TF prediction. The model is built upon the BERT architecture and is initially pre-trained on protein data from UniProt. Subsequently, the model is fine-tuned on the collected TF data of tea plants. We evaluated four different word segmentation methods and the existing state-of-the-art prediction tools. According to the comprehensive experimental results and a case study, our model is superior to existing models and achieves the goal of accurate identification. In addition, we have developed a web server at http://teatfactor.tlds.cc
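The abstract mentions evaluating four word segmentation methods for protein sequences. One common family of such schemes, sketched below as an illustrative assumption (the abstract does not specify TeaTFactor's tokenizer), splits an amino-acid sequence into k-mer "words" for a BERT-style vocabulary, with the stride controlling whether the k-mers overlap.

```python
# Hedged sketch of k-mer word segmentation for protein sequences, one
# plausible scheme a BERT-style model might use; TeaTFactor's actual
# tokenization is not specified here.

def kmer_tokens(seq, k=3, stride=1):
    """Split an amino-acid sequence into k-mer 'words'.
    stride == k gives non-overlapping words; stride == 1 gives a
    sliding window with overlapping words."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

# Non-overlapping 3-mers over a short (hypothetical) peptide:
toks = kmer_tokens("MKTAYIAK", k=3, stride=3)
```

Choices like k and stride trade vocabulary size against sequence length, which is exactly the kind of difference a comparison of segmentation methods would measure.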