Pub Date: 2024-06-01. Epub Date: 2024-02-27. DOI: 10.1007/s12539-024-00620-3
Rashid Khan, Chuda Xiao, Yang Liu, Jinyu Tian, Zhuo Chen, Liyilei Su, Dan Li, Haseeb Hassan, Haoyu Li, Weiguo Xie, Wen Zhong, Bingding Huang
Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. One such application is renal localization and detection, which can be carried out by segmenting kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false-positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. Among the evaluated models, DeepLabv3+ performed well, achieving the best Dice, Hausdorff distance 95 (HD95), accuracy, specificity, average symmetric surface distance (ASSD), and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated public kidney US dataset will serve as a valuable baseline source for future medical image analysis research.
{"title":"Transformative Deep Neural Network Approaches in Kidney Ultrasound Segmentation: Empirical Validation with an Annotated Dataset.","authors":"Rashid Khan, Chuda Xiao, Yang Liu, Jinyu Tian, Zhuo Chen, Liyilei Su, Dan Li, Haseeb Hassan, Haoyu Li, Weiguo Xie, Wen Zhong, Bingding Huang","doi":"10.1007/s12539-024-00620-3","DOIUrl":"10.1007/s12539-024-00620-3","url":null,"abstract":"<p><p>Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. Among them, one is renal localization and detection, which can be carried out by segmenting the kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. Among the evaluated models, DeepLabv3+ performed well and achieved the highest dice, Hausdorff distance 95, accuracy, specificity, average symmetric surface distance, and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated kidney US public dataset will serve as a valuable baseline source for future medical image analysis research.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"439-454"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139982903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01. Epub Date: 2024-02-11. DOI: 10.1007/s12539-024-00604-3
Songyang Wu, Kui Jin, Mingjing Tang, Yuelong Xia, Wei Gao
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships and cannot model the many-to-many, high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose the multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple sources of heterogeneous biological information are integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher-order complex regulatory relationships. Meanwhile, a coupled information diffusion mechanism and a cross-domain messaging mechanism facilitate information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
{"title":"Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs.","authors":"Songyang Wu, Kui Jin, Mingjing Tang, Yuelong Xia, Wei Gao","doi":"10.1007/s12539-024-00604-3","DOIUrl":"10.1007/s12539-024-00604-3","url":null,"abstract":"<p><p>Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"318-332"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139717494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01. Epub Date: 2024-06-29. DOI: 10.1007/s12539-024-00630-1
Shaharyar Alam Ansari, Arun Prakash Agrawal, Mohd Anas Wajid, Mohammad Saif Wajid, Aasim Zafar
Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its complexity and reliance on substantial training data are a drawback. To surmount these challenges, this paper proposes an innovative approach, MetaV, which integrates meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms that utilize past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and Grid Mask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated on diverse datasets including Break His, ISIC 2019, SIPaKMed, and STARE. The achieved accuracies of 89.89%, 87.33%, 94.55%, and 80.22% on Break His, ISIC 2019, SIPaKMed, and STARE, respectively, validate the superior performance of the proposed model compared with conventional models, setting a new benchmark for meta-vision image classification models.
{"title":"MetaV: A Pioneer in feature Augmented Meta-Learning Based Vision Transformer for Medical Image Classification.","authors":"Shaharyar Alam Ansari, Arun Prakash Agrawal, Mohd Anas Wajid, Mohammad Saif Wajid, Aasim Zafar","doi":"10.1007/s12539-024-00630-1","DOIUrl":"10.1007/s12539-024-00630-1","url":null,"abstract":"<p><p>Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its reliance on substantial data for training poses a drawback due to its complexity and high data requirements. To surmount these challenges, this paper proposes an innovative approach, MetaV, integrating meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms utilizing past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and Grid Mask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated using diverse datasets including Break His, ISIC 2019, SIPaKMed, and STARE. The achieved performance accuracies of 89.89%, 87.33%, 94.55%, and 80.22% for Break His, ISIC 2019, SIPaKMed, and STARE, respectively, present evidence validating the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"469-488"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141476538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01. Epub Date: 2024-02-18. DOI: 10.1007/s12539-023-00601-y
Xianxian Cai, Wei Zhang, Xiaoying Zheng, Yaxin Xu, Yuanyuan Li
With the advent of single-cell RNA sequencing (scRNA-seq) technology, many scRNA-seq data have become available, providing an unprecedented opportunity to explore cellular composition and heterogeneity. Recently, many computational algorithms for predicting cell type composition have been developed, and these methods are typically evaluated on different datasets and performance metrics using diverse techniques. Consequently, the lack of comprehensive and standardized comparative analysis makes it difficult to gain a clear understanding of the strengths and weaknesses of these methods. To address this gap, we reviewed 20 cutting-edge unsupervised cell type identification methods and evaluated them comprehensively using 24 real scRNA-seq datasets of varying scales. In addition, we proposed a new ensemble cell-type identification method, named scEM, which learns a consensus similarity matrix by applying the entropy weight method to four selected representative methods. The Louvain algorithm is adopted to obtain the final classification of individual cells based on the consensus matrix. Extensive evaluation and comparison with 11 other similarity-based methods on real scRNA-seq datasets demonstrate that the newly developed ensemble algorithm scEM is effective in predicting cellular type composition.
{"title":"scEM: A New Ensemble Framework for Predicting Cell Type Composition Based on scRNA-Seq Data.","authors":"Xianxian Cai, Wei Zhang, Xiaoying Zheng, Yaxin Xu, Yuanyuan Li","doi":"10.1007/s12539-023-00601-y","DOIUrl":"10.1007/s12539-023-00601-y","url":null,"abstract":"<p><p>With the advent of single-cell RNA sequencing (scRNA-seq) technology, many scRNA-seq data have become available, providing an unprecedented opportunity to explore cellular composition and heterogeneity. Recently, many computational algorithms for predicting cell type composition have been developed, and these methods are typically evaluated on different datasets and performance metrics using diverse techniques. Consequently, the lack of comprehensive and standardized comparative analysis makes it difficult to gain a clear understanding of the strengths and weaknesses of these methods. To address this gap, we reviewed 20 cutting-edge unsupervised cell type identification methods and evaluated these methods comprehensively using 24 real scRNA-seq datasets of varying scales. In addition, we proposed a new ensemble cell-type identification method, named scEM, which learns the consensus similarity matrix by applying the entropy weight method to the four representative methods are selected. The Louvain algorithm is adopted to obtain the final classification of individual cells based on the consensus matrix. Extensive evaluation and comparison with 11 other similarity-based methods under real scRNA-seq datasets demonstrate that the newly developed ensemble algorithm scEM is effective in predicting cellular type composition.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"304-317"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139898012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression. Interactions with proteins are one of the ways lncRNAs play their roles. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In the LPI prediction problem, there is commonly an imbalance in the distribution of positive and negative samples, yet few existing methods give specific consideration to this problem. In this paper, we proposed a new clustering-based LPI prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC), dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Then, multi-space clustering was applied: convolutional neural network (CNN)-based encoders were used to map different features of a sample to different spaces, so that multiple spaces jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated, and the sum of distances across all spaces was compared with the cluster radius to predict LPIs. We performed cross-validation on three public datasets, and LPI-SKMSC showed the best performance compared with other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.
{"title":"LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering.","authors":"Dian-Zheng Sun, Zhan-Li Sun, Mengya Liu, Shuang-Hao Yong","doi":"10.1007/s12539-023-00598-4","DOIUrl":"10.1007/s12539-023-00598-4","url":null,"abstract":"<p><p> Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression. Interactions with proteins are one of the ways lncRNAs play their roles. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In the LPIs prediction problem, there commonly exists the imbalance in the distribution of positive and negative samples. However, there are few existing methods that give specific consideration to this problem. In this paper, we proposed a new clustering-based LPIs prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC). It was dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Then, the multi-space clustering was applied to LPI-SKMSC. The convolutional neural network (CNN)-based encoders were used to map different features of a sample to different spaces. It used multiple spaces to jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated. The sum of distances in all spaces was compared with the cluster radius to predict the LPIs. We performed cross-validation on 3 public datasets and LPI-SKMSC showed the best performance compared to other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"378-391"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139416997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01. Epub Date: 2024-03-04. DOI: 10.1007/s12539-024-00607-0
Ruibin Chen, Guobo Xie, Zhiyi Lin, Guosheng Gu, Yi Yu, Junrui Yu, Zhenguo Liu
Computational approaches employed for predicting potential microbe-disease associations often rely on similarity information between microbes and diseases. Therefore, it is important to obtain reliable similarity information by integrating multiple types of similarity information. However, existing similarity fusion methods do not consider multi-order fusion of similarity networks. To address this problem, a novel method of linear neighborhood label propagation with multi-order similarity fusion learning (MOSFL-LNP) is proposed to predict potential microbe-disease associations. Multi-order fusion learning comprises two parts: low-order global learning and high-order feature learning. Low-order global learning is used to obtain common latent features from multiple similarity sources. High-order feature learning relies on the interactions between neighboring nodes to identify high-order similarities and learn deeper interactive network structures. Coefficients are assigned to different high-order feature learning modules to balance the similarities learned from different orders and enhance the robustness of the fusion network. Overall, by combining low-order global learning with high-order feature learning, multi-order fusion learning can capture both the shared and unique features of different similarity networks, leading to more accurate predictions of microbe-disease associations. In comparison to six other advanced methods, MOSFL-LNP exhibits superior prediction performance under leave-one-out and 5-fold cross-validation. In the case studies, the 10 predicted microbes associated with asthma and with type 1 diabetes achieve accuracy rates of 90% and 100%, respectively.
{"title":"Predicting Microbe-Disease Associations Based on a Linear Neighborhood Label Propagation Method with Multi-order Similarity Fusion Learning.","authors":"Ruibin Chen, Guobo Xie, Zhiyi Lin, Guosheng Gu, Yi Yu, Junrui Yu, Zhenguo Liu","doi":"10.1007/s12539-024-00607-0","DOIUrl":"10.1007/s12539-024-00607-0","url":null,"abstract":"<p><p>Computational approaches employed for predicting potential microbe-disease associations often rely on similarity information between microbes and diseases. Therefore, it is important to obtain reliable similarity information by integrating multiple types of similarity information. However, existing similarity fusion methods do not consider multi-order fusion of similarity networks. To address this problem, a novel method of linear neighborhood label propagation with multi-order similarity fusion learning (MOSFL-LNP) is proposed to predict potential microbe-disease associations. Multi-order fusion learning comprises two parts: low-order global learning and high-order feature learning. Low-order global learning is used to obtain common latent features from multiple similarity sources. High-order feature learning relies on the interactions between neighboring nodes to identify high-order similarities and learn deeper interactive network structures. Coefficients are assigned to different high-order feature learning modules to balance the similarities learned from different orders and enhance the robustness of the fusion network. Overall, by combining low-order global learning with high-order feature learning, multi-order fusion learning can capture both the shared and unique features of different similarity networks, leading to more accurate predictions of microbe-disease associations. In comparison to six other advanced methods, MOSFL-LNP exhibits superior prediction performance in the leave-one-out cross-validation and 5-fold validation frameworks. In the case study, the predicted 10 microbes associated with asthma and type 1 diabetes have an accuracy rate of up to 90% and 100%, respectively.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"345-360"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140021617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Survival analysis, a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in the medical field. Medical professionals utilize survival models to gain insight into the effects of patient covariates on the disease and their correlation with the effectiveness of different treatment strategies. This knowledge is essential for the development of treatment plans and the enhancement of treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to facilitate personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The proposed model simulates the distribution of survival time and the correlation between covariates and outcomes, but does not impose strict assumptions on the underlying distribution of the survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model in analyzing survival data with various risk functions is on par with or even superior to that of other existing survival analysis methods. Furthermore, we validate the superior performance of our model compared with currently existing methods on multiple publicly available clinical datasets. Through this study, we demonstrate the effectiveness of the proposed model in survival analysis, providing a promising alternative to traditional approaches. The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.
{"title":"ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism.","authors":"Yuchen Wang, Xianchun Kong, Xiao Bi, Lizhen Cui, Hong Yu, Hao Wu","doi":"10.1007/s12539-024-00617-y","DOIUrl":"10.1007/s12539-024-00617-y","url":null,"abstract":"<p><p>Survival analysis, as a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in the medicine field. Medical professionals utilize survival models to gain insight into the effects of patient covariates on the disease, and the correlation with the effectiveness of different treatment strategies. This knowledge is essential for the development of treatment plans and the enhancement of treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to facilitate personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The model proposed in our study simulates the distribution of survival time and the correlation between covariates and outcomes, but does not impose strict assumptions on the basic distribution of survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model in analyzing survival data with various risk functions is on par with or even superior to that of other existing survival analysis methods. Furthermore, we validate the superior performance of our model in comparison to currently existing methods by evaluating multiple publicly available clinical datasets. Through this study, we prove the effectiveness of our proposed model in survival analysis, providing a promising alternative to traditional approaches. The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"405-417"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140136680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We report a combined manual annotation and deep-learning natural language processing study for accurate entity extraction from hereditary disease related biomedical literature. A total of 400 full articles were manually annotated based on published guidelines by experienced genetic interpreters at Beijing Genomics Institute (BGI). The performance of our manual annotations was assessed by comparing our re-annotated results with those publicly available. The overall Jaccard index was calculated to be 0.866 for the four entity types: gene, variant, disease, and species. Both a BERT-based large named entity recognition (NER) model and a DistilBERT-based simplified NER model were trained, validated, and tested. Due to the limited manually annotated corpus, these NER models were fine-tuned in two phases. The F1-scores of the BERT-based NER for gene, variant, disease, and species are 97.28%, 93.52%, 92.54%, and 95.76%, respectively, while those of the DistilBERT-based NER are 95.14%, 86.26%, 91.37%, and 89.92%, respectively. Most importantly, the variant entity type has been extracted by a large language model for the first time, achieving an F1-score comparable to that of the state-of-the-art variant extraction model tmVar.
{"title":"A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.","authors":"Dao-Ling Huang, Quanlei Zeng, Yun Xiong, Shuixia Liu, Chaoqun Pang, Menglei Xia, Ting Fang, Yanli Ma, Cuicui Qiang, Yi Zhang, Yu Zhang, Hong Li, Yuying Yuan","doi":"10.1007/s12539-024-00605-2","DOIUrl":"10.1007/s12539-024-00605-2","url":null,"abstract":"<p><p>We report a combined manual annotation and deep-learning natural language processing study to make accurate entity extraction in hereditary disease related biomedical literature. A total of 400 full articles were manually annotated based on published guidelines by experienced genetic interpreters at Beijing Genomics Institute (BGI). The performance of our manual annotations was assessed by comparing our re-annotated results with those publicly available. The overall Jaccard index was calculated to be 0.866 for the four entity types-gene, variant, disease and species. Both a BERT-based large name entity recognition (NER) model and a DistilBERT-based simplified NER model were trained, validated and tested, respectively. Due to the limited manually annotated corpus, Such NER models were fine-tuned with two phases. The F1-scores of BERT-based NER for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, while those of DistilBERT-based NER are 95.14%, 86.26%, 91.37% and 89.92%, respectively. Most importantly, the entity type of variant has been extracted by a large language model for the first time and a comparable F1-score with the state-of-the-art variant extraction model tmVar has been achieved.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"333-344"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11289304/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139716027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01. Epub Date: 2024-07-02. DOI: 10.1007/s12539-024-00626-x
Nan Zhao, Tong Wu, Wenda Wang, Lunchuan Zhang, Xinqi Gong
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it is not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contacts, utilizing experimental data from cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics for protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure prediction and contribute to future advanced predictions.
{"title":"Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure.","authors":"Nan Zhao, Tong Wu, Wenda Wang, Lunchuan Zhang, Xinqi Gong","doi":"10.1007/s12539-024-00626-x","DOIUrl":"10.1007/s12539-024-00626-x","url":null,"abstract":"<p><p>Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"261-288"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141491789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}