
Methods: Latest Articles

SpaInGNN: Enhanced clustering and integration of spatial transcriptomics based on refined graph neural networks
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-13 DOI: 10.1016/j.ymeth.2024.11.006
Fangqin Zhang, Zhan Shen, Siyi Huang, Yuan Zhu, Ming Yi
Recent developments in spatial transcriptomics (ST) technology have markedly enhanced the capacity to comprehensively characterize gene expression patterns within tissue microenvironments while preserving spatial context. However, identifying spatial domains at the single-cell level remains a significant challenge in elucidating biological processes. To address this, this study presents SpaInGNN (spatial integration using graph neural network), a graph neural network (GNN) framework that delineates spatial domains by integrating spatial location data, histological information, and gene expression profiles into low-dimensional latent embeddings. To make full use of the spatial coordinates, SpaInGNN first pre-clusters the gene expression profiles and then refines the graph built from spatial locations by incorporating both tissue image distance and Euclidean distance. The refined graph is embedded with a self-supervised GNN that minimizes a self-reconfiguration loss. Applying SpaInGNN to refined graphs across multiple consecutive tissue slices mitigates the impact of batch effects in data analysis. The proposed method substantially improves the accuracy of spatial domain recognition and gives a more faithful representation of tissue organization in both mouse olfactory bulb and human lateral prefrontal cortex samples.
Methods, Volume 233, Pages 42-51
Citations: 0
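The graph-refinement step described in the SpaInGNN abstract above can be illustrated with a minimal sketch. The abstract does not state the exact rule, so everything here is an assumption made for illustration: the function name `refine_spatial_graph`, the weight `alpha`, the combined distance (coordinate distance plus `alpha` times image-feature distance), the k-nearest-neighbour construction, and the pruning of edges whose endpoints fall in different pre-clusters.

```python
import math


def refine_spatial_graph(coords, image_feats, pre_labels, k=2, alpha=1.0):
    """Return a set of undirected edges (i, j) with i < j.

    Hypothetical rule, not taken from the paper: each spot proposes its k
    nearest neighbours under
        d = euclidean(coords) + alpha * euclidean(image_feats),
    and a proposed edge is kept only if both spots share a pre-cluster label.
    """
    n = len(coords)
    edges = set()
    for i in range(n):
        scored = []
        for j in range(n):
            if j == i:
                continue
            d = math.dist(coords[i], coords[j])
            d += alpha * math.dist(image_feats[i], image_feats[j])
            scored.append((d, j))
        scored.sort()
        for _, j in scored[:k]:
            if pre_labels[i] == pre_labels[j]:
                edges.add((min(i, j), max(i, j)))
    return edges


# Toy data: three spots in one region, two in another (all values made up).
coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
image_feats = [(0.1,), (0.2,), (0.1,), (0.9,), (0.8,)]
pre_labels = [0, 0, 0, 1, 1]
edges = refine_spatial_graph(coords, image_feats, pre_labels, k=2)
print(sorted(edges))
```

In this toy case the cross-region neighbour proposals are pruned by the pre-cluster check, so the two regions end up disconnected. A real pipeline would pass the resulting adjacency to the GNN encoder.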
MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-13 DOI: 10.1016/j.ymeth.2024.11.001
Wei Peng, Zhihao Zhang, Wei Dai, Zhihao Ping, Xiaodong Fu, Li Liu, Lijun Liu, Ning Yu
Recent advancements in spatial transcriptomics sequencing technologies not only provide gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint the exact location of this expression and generate detailed images of stained tissue sections, offering invaluable insights into cell type identification and cell function exploration. However, effectively integrating the gene expression data, spatial location information, and tissue images from spatial transcriptomics data presents a significant challenge for computational methods in cell classification. In this work, we propose MVCLST, a multi-view comparative learning method that analyzes spatial transcriptomics data for accurate cell type classification. MVCLST constructs two views based on gene expression profiles, cell coordinates, and image features. The proposed multi-view method can significantly enhance the effectiveness of feature extraction while avoiding the impact of erroneous information in either the image or the gene expression data. The model employs four separate encoders to capture shared and unique features within each view. To ensure consistency and facilitate information exchange between the two views, MVCLST incorporates a contrastive learning loss function. The shared and private features extracted from both views are fused using corresponding decoders. Finally, the model uses the Leiden algorithm to cluster the learned features for cell type identification. Additionally, we establish a framework called MVCLST-CCFS for spatial transcriptomics data analysis based on MVCLST and consistent clustering. Our method achieves excellent clustering results on human dorsolateral prefrontal cortex data and mouse brain tissue data. It also outperforms state-of-the-art techniques in the subsequent search for highly variable genes across cell types on the mouse olfactory bulb data.
Methods, Volume 232, Pages 115-128
Citations: 0
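The MVCLST abstract above mentions a contrastive learning loss that keeps the two views consistent but does not specify its form. As a rough illustration only, the sketch below uses an InfoNCE-style loss, a common choice for this purpose and not necessarily the one used in the paper. The names `info_nce` and `temperature` are assumptions, and embeddings are plain Python lists so the example has no dependencies.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss where row i of view_a should match row i of view_b.

    For each cell i, the same cell in the other view is the positive and
    every other cell in that view is a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [_cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Two cells, two views: aligned embeddings give a lower loss than swapped ones.
aligned = info_nce([(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (0.0, 1.0)])
swapped = info_nce([(1.0, 0.0), (0.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)])
print(aligned < swapped)
```

The loss falls when each cell's two view embeddings agree more with each other than with other cells, which is the consistency property the abstract describes.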
Inferring causal relationships among histone modifications in exon skipping event
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-09 DOI: 10.1016/j.ymeth.2024.11.008
Pengmian Feng, Yuanfang Tian, Wei Chen
Alternative splicing is a crucial process in gene expression. Over 90% of multi-exonic genes in the human genome undergo alternative splicing. Although a splicing code has been proposed, it still cannot satisfactorily explain tissue-specific alternative splicing. Analyses of co-transcriptional RNA processing have shown that, in addition to trans- and cis-acting elements, histone modifications also play a role in alternative splicing. In the present work, we analyzed the associations among 27 kinds of histone modifications in H1 human embryonic stem cells. To illustrate the causal relationships between histone modifications and alternative splicing, we built a Bayesian network and validated its robustness by cross-validation. In addition to the combinatorial patterns, distinct histone modification patterns were observed in the alternatively spliced exons and surrounding intron regions, indicating that histone modifications could substantially mark alternative splicing.
Methods, Volume 232, Pages 89-95
Citations: 0
dsRNAPredictor-II: An improved predictor of identifying dsRNA and its silencing efficiency for Tribolium castaneum based on sequence length distribution
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-09 DOI: 10.1016/j.ymeth.2024.11.007
Liping Xu, Jia Zheng, Yetong Zhou, Cangzhi Jia
RNA interference (RNAi) has been widely used to investigate gene functions and has significant potential for the control of pest insects. However, recent studies have revealed that the target insect species, dsRNA molecule length, target genes, and other experimental factors can affect the efficiency of RNAi-mediated control, restricting the further development and application of this technology. Therefore, the aim of this study was to establish a deep learning model that helps researchers identify dsRNA fragments with the highest RNAi efficiency. We optimized an existing model, dsRNAPredictor, by designing sub-models based on different sequence lengths. Accordingly, the data were divided into two groups: sequences of 130–399 bp and sequences of 400–616 bp. One-hot encoding was then employed to extract sequence information. A convolutional neural network comprising three convolutional layers, three average pooling layers, a flatten layer, and three dense layers was employed as the classifier. By adjusting the parameters, we established two sub-models for the different sequence length distributions. Using multiple independent test datasets and hypothesis testing, we demonstrated that our model shows, respectively, superior performance and stronger robustness compared with dsRNAPredictor. Therefore, our model may serve as a pre-screening tool for dsRNA design and facilitate further research and applications.
Methods, Volume 232, Pages 129-138
Citations: 0
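The two preprocessing ideas in the dsRNAPredictor-II abstract above, one-hot encoding and routing by sequence length, can be sketched in a few lines. The length bounds (130–399 bp and 400–616 bp) come from the abstract. The function names, the A/C/G/T alphabet, the all-zero row for unknown bases, and the sub-model labels `"short"` and `"long"` are assumptions for illustration, not the authors' implementation.

```python
BASES = "ACGT"  # assumed alphabet; sequences written with U would need mapping first


def one_hot(seq):
    """Encode a sequence as a list of 4-element 0/1 rows in A, C, G, T order.

    Characters outside the alphabet (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def pick_submodel(seq):
    """Choose a length-specific sub-model label using the abstract's two ranges."""
    length = len(seq)
    if 130 <= length <= 399:
        return "short"
    if 400 <= length <= 616:
        return "long"
    raise ValueError(f"length {length} bp is outside the 130-616 bp range")


print(one_hot("ACGN"))
print(pick_submodel("A" * 250), pick_submodel("A" * 500))
```

Each encoded sequence would then be padded to its group's maximum length before being fed to the corresponding convolutional sub-model; the padding scheme is not described in the abstract.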
Prediction of YY1 loop anchor based on multi-omics features
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-07 DOI: 10.1016/j.ymeth.2024.11.004
Jun Ren, Zhiling Guo, Yixuan Qi, Zheng Zhang, Li Liu
The three-dimensional structure of chromatin is crucial for the regulation of gene expression. YY1 promotes enhancer-promoter interactions in a manner analogous to CTCF-mediated chromatin interactions. However, little is known about which YY1 binding sites can form loop anchors. In this study, a LightGBM model was used to predict YY1-loop anchors by integrating multi-omics data. Because the numbers of positive and negative samples are highly imbalanced, we used the area under the precision-recall curve (AUPRC) to assess classifier quality. The results show that the LightGBM model exhibits strong predictive performance (AUPRC ≥ 0.93). To verify the robustness of the model, the dataset was divided into training and test sets at a 4:1 ratio. The model performs well for YY1-loop anchor prediction on both the training and independent test sets. Additionally, we ranked the features by importance and found that the formation of YY1-loop anchors is primarily influenced by the co-binding of the transcription factors CTCF, SMC3, and RAD21, as well as by histone modifications and sequence context.
Methods, Volume 232, Pages 96-106
Citations: 0
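The YY1 abstract above relies on AUPRC because the classes are imbalanced. A minimal sketch of one common AUPRC estimator, average precision, is given below. The function name `average_precision` and the toy labels and scores are assumptions, the abstract does not say which estimator the authors used, and tied scores are not handled specially.

```python
def average_precision(y_true, scores):
    """Average precision: the mean of precision@rank taken at each positive.

    y_true holds 0/1 labels and scores holds the classifier outputs.
    Samples are ranked by descending score.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / n_pos


# Made-up imbalanced example: 2 positives among 10 candidate sites.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.15, 0.05, 0.25, 0.12]
print(round(average_precision(labels, scores), 4))
```

Unlike ROC AUC, the chance level of this metric is roughly the fraction of positives in the data, which is why it is informative when negatives dominate.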
HistoSPACE: Histology-inspired spatial transcriptome prediction and characterization engine
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-07 DOI: 10.1016/j.ymeth.2024.11.002
Shivam Kumar, Samrat Chatterjee
Spatial transcriptomics (ST) enables the visualization of gene expression within the context of tissue morphology. This emerging discipline has the potential to serve as a foundation for developing tools to design precision medicines. However, because such experiments are costly and demand considerable expertise, translating ST into regular clinical practice may be challenging. Although modern deep learning has been applied to enhance the information obtained from histological images, these efforts have been constrained by the limited diversity of information. In this paper, we developed a model, HistoSPACE, that explores the diversity of histological images available with ST data to extract molecular insights from tissue images. Further, our approach allows us to link the predicted expression with disease pathology. We built an image encoder derived from a universal image autoencoder, connected it to convolution blocks to form the final model, and fine-tuned the model on ST data. The model has few parameters and needs less system memory and relatively less training time, which makes it lightweight in comparison with traditional histological models. It performs efficiently compared with contemporary algorithms, reaching a correlation of 0.56 in leave-one-out cross-validation. Finally, its robustness was validated on an independent dataset, where its predictions were consistent with predefined disease pathology. Our code is available at https://github.com/samrat-lab/HistoSPACE.
Methods, Volume 232, Pages 107-114
Citations: 0
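The HistoSPACE abstract above reports a correlation from leave-one-out cross-validation. The sketch below shows how such an evaluation is typically organised. It assumes the correlation is Pearson's r and that one tissue sample is held out per fold; the abstract states neither, and the names `leave_one_out` and `pearson` are hypothetical rather than taken from the HistoSPACE repository.

```python
import math


def leave_one_out(samples):
    """Yield (training_samples, held_out_sample) pairs, one per sample."""
    for i, held_out in enumerate(samples):
        yield samples[:i] + samples[i + 1:], held_out


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Each fold would train on `train`, predict expression for `test`, and score
# the prediction against the measured expression with pearson().
for train, test in leave_one_out(["sample_A", "sample_B", "sample_C"]):
    print(train, "->", test)
```

Averaging the per-fold correlations gives a single summary figure comparable in kind to the 0.56 quoted in the abstract.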
Development and validation of a machine learning model for predicting drug-drug interactions with oral diabetes medications
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-01 DOI: 10.1016/j.ymeth.2024.10.012
Quang-Hien Kha, Ngan Thi Kim Nguyen, Nguyen Quoc Khanh Le, Jiunn-Horng Kang
Diabetes management is often complicated by comorbidities, requiring complex medication regimens that increase the risk of drug-drug interactions (DDIs), potentially compromising treatment outcomes or causing toxicity. Although machine learning (ML) models have made strides in DDI prediction, existing approaches lack specificity for oral diabetes medications and face challenges in interpretability. To address these limitations, we propose a novel ML-based framework utilizing the Simplified Molecular Input Line Entry System (SMILES) to encode structural information of oral diabetes drugs. Using this representation, we developed an XGBoost model, selecting molecular features through LASSO. Our dataset, sourced from DrugBank, included 42 oral diabetes drugs and 1,884 interacting drugs, divided into training, validation, and testing sets. The model identified 606 optimal features, achieving an F1-score of 0.8182. SHAP analysis was employed for feature interpretation, enhancing model transparency and clinical relevance. By predicting adverse DDIs, our model offers a valuable tool for clinical decision-making, aiding safer prescription practices. The 606 critical features provide insights into atomic-level interactions, linking computational predictions with biological experiments. We present a classification model specifically designed for predicting DDIs associated with oral diabetes medications, with an openly accessible web application to support diabetes management in multi-drug regimens and comorbidity settings.
Methods, Volume 232, Pages 81-88
Citations: 0
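The drug-drug interaction abstract above summarises model quality with an F1-score. For readers less familiar with the metric, here is a small sketch of how it follows from confusion-matrix counts. The counts in the example are invented for illustration and are not the paper's results, and `f1_score` is a local helper rather than a library call.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 interactions found, 2 false alarms, 2 missed.
print(round(f1_score(tp=8, fp=2, fn=2), 4))
```

Because F1 ignores true negatives, it stays informative when non-interacting drug pairs greatly outnumber interacting ones.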
Development of novel digital PCR assays for the rapid quantification of Gram-negative bacteria biomarkers using RUCS algorithm
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-30 DOI: 10.1016/j.ymeth.2024.10.011
Alexandra Bogožalec Košir, Špela Alič, Viktorija Tomič, Dane Lužnik, Tanja Dreo, Mojca Milavec
Rapid and accurate identification of bacterial pathogens is crucial for effective treatment and infection control, particularly in hospital settings. Conventional methods like culture techniques and MALDI-TOF mass spectrometry are often time-consuming and less sensitive. This study addresses the need for faster and more precise diagnostic methods by developing novel digital PCR (dPCR) assays for the rapid quantification of biomarkers from three Gram-negative bacteria: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa.
Using publicly available genomes and the RUCS (rapid identification of PCR primers for unique core sequences) algorithm, we designed highly specific dPCR assays. These assays were validated using synthetic DNA, bacterial genomic DNA, and DNA extracted from clinical samples. The developed dPCR methods demonstrated wide linearity, a low limit of detection (∼30 copies per reaction), and robust analytical performance with measurement uncertainty below 25%. The assays showed high repeatability and intermediate precision, with no cross-reactivity observed. Comparison with MALDI-TOF mass spectrometry revealed substantial concordance, highlighting the methods' suitability for clinical diagnostics.
This study underscores the potential of dPCR for rapid and precise quantification of Gram-negative bacterial biomarkers. The developed methods offer significant improvements over existing techniques, providing faster, more accurate, and SI-traceable measurements. These advancements could enhance clinical diagnostics and infection control practices.
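As background for the copy-number figures quoted above, dPCR quantification is generally based on a Poisson correction of the fraction of positive partitions. The sketch below illustrates that standard calculation only; it is not the authors' analysis pipeline, and the function names and the example partition counts are assumptions introduced for illustration.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per partition.

    Assumes targets are distributed randomly across partitions, so the
    fraction of negative partitions equals exp(-lambda).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive < total:
        # If every partition is positive, the reaction is saturated and
        # lambda cannot be estimated.
        raise ValueError("positive must satisfy 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_in_analyzed_partitions(positive: int, total: int) -> float:
    """Estimated number of target copies across all analyzed partitions."""
    return copies_per_partition(positive, total) * total


# Hypothetical run: 30 positive partitions out of 10,000 analyzed.
estimate = copies_in_analyzed_partitions(30, 10_000)
print(f"Estimated copies in analyzed partitions: {estimate:.2f}")
```

At low occupancy the estimate is close to the raw count of positive partitions, because few partitions hold more than one copy; the correction grows as more partitions turn positive.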
Methods, Volume 232, Pages 72-80
Citations: 0
MLFA-UNet: A multi-level feature assembly UNet for medical image segmentation
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-29 DOI: 10.1016/j.ymeth.2024.10.010
Anass Garbaz, Yassine Oukdach, Said Charfi, Mohamed El Ansari, Lahcen Koutti, Mouna Salihoun
Medical image segmentation is crucial for accurate diagnosis and treatment in medical image analysis. Among the various methods employed, fully convolutional networks (FCNs) have emerged as a prominent approach for segmenting medical images. Notably, the U-Net architecture and its variants have gained widespread adoption in this domain. This paper introduces MLFA-UNet, an architectural framework aimed at advancing medical image segmentation. MLFA-UNet adopts a U-shaped architecture and integrates two pivotal modules: multi-level feature assembly (MLFA) and multi-scale information attention (MSIA), complemented by a pixel-vanishing (PV) attention mechanism. Together, these modules improve both the robustness and the precision of segmentation. MLFA operates within both the network encoder and decoder, facilitating the extraction of local information crucial for accurately segmenting lesions. Furthermore, the bottleneck MSIA module replaces stacking modules, thereby expanding the receptive field and augmenting feature diversity, fortified by the PV attention mechanism. These integrated mechanisms boost segmentation performance by capturing both detailed local features and a broader range of contextual information, enhancing accuracy and resilience in identifying lesions. To assess the versatility of the network, we evaluated MLFA-UNet across a range of medical image segmentation datasets encompassing diverse imaging modalities such as wireless capsule endoscopy (WCE), colonoscopy, and dermoscopic images. Our results consistently demonstrate that MLFA-UNet outperforms state-of-the-art algorithms, achieving Dice coefficients of 91.42%, 82.43%, 90.8%, and 88.68% on the MICCAI 2017 (Red Lesion), ISIC 2017, PH2, and CVC-ClinicalDB datasets, respectively.
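The results above are reported as Dice coefficients. For readers unfamiliar with the metric, the following minimal sketch shows how it is commonly computed for a pair of binary masks. It illustrates the standard definition only and is not the authors' evaluation code; the function name and the convention of returning 1.0 for two empty masks are assumptions.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    foreground = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if foreground == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / foreground


# Toy example: predicted mask has two foreground pixels, ground truth has one.
predicted = [1, 1, 0, 0]
ground_truth = [1, 0, 0, 0]
print(f"Dice: {dice_coefficient(predicted, ground_truth):.4f}")
```

A score of 1.0 means the predicted and reference masks overlap exactly, while 0.0 means they share no foreground pixels.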
Methods, Volume 232, Pages 52-64
Citations: 0
Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-22 DOI: 10.1016/j.ymeth.2024.10.006
Van-Nui Nguyen, Thi-Xuan Tran, Thi-Tuyen Nguyen, Nguyen Quoc Khanh Le
Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, and species-specific variations in ubiquitination patterns remain poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species “Teacher model” to guide a more compact, species-specific “Student model”, with the “Teacher” generating pseudo-labels that enhance the “Student” model's learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3% and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3% and an AUC of 0.923. Comparative analysis with established predictors further highlights the model’s superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi.
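The teacher-student setup described above can be illustrated with a generic distillation objective. The sketch below is not taken from the authors' repository: it assumes the teacher's predicted probabilities act as soft pseudo-labels and that the student is trained on a weighted mix of the true labels and those pseudo-labels. The function names and the mixing weight `alpha` are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(prob: float, target: float, eps: float = 1e-12) -> float:
    """Cross-entropy between a predicted probability and a (possibly soft) target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(
    student_probs: Sequence[float],
    teacher_probs: Sequence[float],
    labels: Sequence[int],
    alpha: float = 0.5,  # hypothetical weight on the true labels
) -> float:
    """Mean of alpha * BCE(student, label) + (1 - alpha) * BCE(student, teacher)."""
    if not student_probs:
        raise ValueError("at least one example is required")
    if not len(student_probs) == len(teacher_probs) == len(labels):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, labels):
        hard = binary_cross_entropy(s, float(y))  # supervised term
        soft = binary_cross_entropy(s, t)         # teacher pseudo-label term
        total += alpha * hard + (1.0 - alpha) * soft
    return total / len(student_probs)


# Toy example: three candidate sites with teacher and student predictions.
loss = distillation_loss([0.8, 0.3, 0.6], [0.9, 0.2, 0.7], [1, 0, 1])
print(f"Distillation loss: {loss:.4f}")
```

Setting `alpha` to 1.0 reduces this to ordinary supervised training, while lower values let the teacher's pseudo-labels contribute more to the student's objective.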
Methods, Volume 232, Pages 65-71
Citations: 0