Pub Date: 2024-11-13; DOI: 10.1016/j.ymeth.2024.11.006
Fangqin Zhang, Zhan Shen, Siyi Huang, Yuan Zhu, Ming Yi
Recent developments in spatial transcriptomics (ST) technology have markedly enhanced the capacity to characterize gene expression patterns within tissue microenvironments while preserving spatial context. However, identifying spatial domains at the single-cell level remains a significant challenge in elucidating biological processes. To address this, SpaInGNN (spatial integration using graph neural network) was developed: a graph neural network (GNN) framework that delineates spatial domains by integrating spatial location data, histological information, and gene expression profiles into low-dimensional latent embeddings. To make full use of spatial coordinate data, SpaInGNN refines the graph constructed for spatial locations by incorporating both tissue image distance and Euclidean distance, following a pre-clustering of gene expression profiles. This refined graph is then embedded using a self-supervised GNN, which minimizes self-reconfiguration loss. By applying SpaInGNN to refined graphs across multiple consecutive tissue slices, this study mitigates the impact of batch effects in data analysis. The proposed method demonstrates substantial improvements in the accuracy of spatial domain recognition, providing a more faithful representation of tissue organization in both mouse olfactory bulb and human lateral prefrontal cortex samples.
SpaInGNN: Enhanced clustering and integration of spatial transcriptomics based on refined graph neural networks. Methods, volume 233, pages 42-51.
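The graph refinement described in the SpaInGNN abstract combines tissue image distance with Euclidean distance, but the abstract does not give the formula. The following is a minimal sketch under stated assumptions: the two distances are blended with a hypothetical weight `alpha`, each spot is linked to its `k` nearest neighbours, and the pre-clustering step is omitted. The function names and toy data are illustrative and are not taken from the paper.

```python
import math

def combined_distance(coord_a, coord_b, img_a, img_b, alpha=0.5):
    """Blend spatial and image-feature distance (alpha is a hypothetical weight)."""
    spatial = math.dist(coord_a, coord_b)
    image = math.dist(img_a, img_b)
    return alpha * spatial + (1.0 - alpha) * image

def knn_graph(coords, img_feats, k=2, alpha=0.5):
    """Return {spot index: sorted list of its k nearest neighbours}."""
    n = len(coords)
    graph = {}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: combined_distance(
                coords[i], coords[j], img_feats[i], img_feats[j], alpha
            ),
        )
        graph[i] = sorted(ranked[:k])
    return graph

# Toy data: two pairs of spots that are close in space and in image features.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
img_feats = [(0.1,), (0.2,), (0.9,), (0.8,)]
graph = knn_graph(coords, img_feats, k=1)
```

With this toy input each spot is linked to its partner in the same pair. In practice the two distances would need to be brought to a comparable scale before blending.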
Pub Date: 2024-11-13; DOI: 10.1016/j.ymeth.2024.11.001
Wei Peng, Zhihao Zhang, Wei Dai, Zhihao Ping, Xiaodong Fu, Li Liu, Lijun Liu, Ning Yu
Recent advances in spatial transcriptomics sequencing technologies not only measure gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint the location of this expression and generate detailed images of stained tissue sections, offering invaluable insights into cell type identification and cell function exploration. However, effectively integrating the gene expression data, spatial location information, and tissue images from spatial transcriptomics data presents a significant challenge for computational methods in cell classification. In this work, we propose MVCLST, a multi-view comparative learning method to analyze spatial transcriptomics data for accurate cell type classification. MVCLST constructs two views based on gene expression profiles, cell coordinates, and image features. The proposed multi-view method can significantly enhance the effectiveness of feature extraction while avoiding the impact of erroneous information in organizing image or gene expression data. The model employs four separate encoders to capture shared and unique features within each view. To ensure consistency and facilitate information exchange between the two views, MVCLST incorporates a contrastive learning loss function. The extracted shared and private features from both views are fused using corresponding decoders. Finally, the model uses the Leiden algorithm to cluster the learned features for cell type identification. Additionally, we establish a framework called MVCLST-CCFS for spatial transcriptomics data analysis based on MVCLST and consistent clustering. Our method achieves excellent clustering results on human dorsolateral prefrontal cortex data and mouse brain tissue data. It also outperforms state-of-the-art techniques in the subsequent search for highly variable genes across cell types on the mouse olfactory bulb data.
MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning. Methods, volume 232, pages 115-128.
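The MVCLST abstract mentions a contrastive learning loss that keeps the two views consistent but does not state its form. As one common choice, the sketch below implements an InfoNCE-style loss over cosine similarities; whether MVCLST uses this exact loss is an assumption, and the `temperature` value and toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: row i of view_a should match row i of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)

# Toy embeddings of the same two cells seen from two views.
view_a = [(1.0, 0.0), (0.0, 1.0)]
aligned = [(0.9, 0.1), (0.1, 0.9)]   # each cell agrees across views
swapped = [(0.1, 0.9), (0.9, 0.1)]   # cells mismatched across views
loss_aligned = info_nce(view_a, aligned)
loss_swapped = info_nce(view_a, swapped)
```

The loss is lower when the same cell has similar embeddings in both views, which is the consistency property the abstract describes.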
Pub Date: 2024-11-09; DOI: 10.1016/j.ymeth.2024.11.008
Pengmian Feng, Yuanfang Tian, Wei Chen
Alternative splicing is a crucial process of gene expression. Over 90% of multi-exonic genes in the human genome undergo alternative splicing. Although a splicing code has been proposed, it still cannot satisfactorily explain tissue-specific alternative splicing. Results of co-transcriptional RNA processing analysis demonstrated that, in addition to trans- and cis-acting elements, histone modifications also play a role in alternative splicing. In the present work, we analyzed the associations among 27 kinds of histone modifications in H1 human embryonic stem cells. To illustrate the causal relationships between histone modification and alternative splicing, we built a Bayesian network and validated its robustness using cross-validation. In addition to the combinatorial patterns, distinct histone modification patterns were also observed in the alternatively spliced exons and surrounding intron regions, indicating that histone modifications could substantially mark alternative splicing.
Inferring causal relationships among histone modifications in exon skipping event. Methods, volume 232, pages 89-95.
Pub Date: 2024-11-09; DOI: 10.1016/j.ymeth.2024.11.007
Liping Xu, Jia Zheng, Yetong Zhou, Cangzhi Jia
RNA interference (RNAi) has been widely used to investigate gene functions and has significant potential for the control of pest insects. However, recent studies have revealed that the target insect species, dsRNA molecule length, target genes, and other experimental factors can affect the efficiency of RNAi-mediated control, restricting the further development and application of this technology. Therefore, the aim of this study was to establish a deep learning model using bioinformatics to help researchers identify dsRNA fragments with the highest RNAi efficiency. In this study, we optimized an existing model, dsRNAPredictor, by designing sub-models based on different sequence lengths. Accordingly, the data were divided into two groups: sequences of 130–399 bp and sequences of 400–616 bp. One-hot encoding was then employed to extract sequence information. A convolutional neural network comprising three convolutional layers, three average pooling layers, a flatten layer, and three dense layers was employed as the classifier. By adjusting the parameters, we established two sub-models for the different sequence distributions. Using multiple independent test datasets and hypothesis testing, we demonstrated that our model exhibits superior performance and stronger robustness compared with dsRNAPredictor. Therefore, our model may help design dsRNAs with pre-screening potential and facilitate further research and applications.
dsRNAPredictor-II: An improved predictor of identifying dsRNA and its silencing efficiency for Tribolium castaneum based on sequence length distribution. Methods, volume 232, pages 129-138.
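The two preprocessing steps named in the dsRNAPredictor-II abstract, length-based routing to a sub-model and one-hot encoding, can be sketched as below. This is an illustration, not the authors' code: the DNA alphabet `ACGT` and the labels `short_model` and `long_model` are assumptions made for the example, while the length boundaries come from the abstract.

```python
NUCLEOTIDES = "ACGT"  # assumed alphabet; the abstract does not specify it

def one_hot(sequence):
    """Encode a sequence as a list of 4-element 0/1 rows in A, C, G, T order."""
    rows = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows

def pick_submodel(sequence):
    """Route a sequence to a length-specific sub-model (labels are hypothetical)."""
    length = len(sequence)
    if 130 <= length <= 399:
        return "short_model"
    if 400 <= length <= 616:
        return "long_model"
    raise ValueError(f"length {length} bp is outside the 130-616 bp range")

encoded = one_hot("ACGT")
route = pick_submodel("A" * 250)
```

Each encoded sequence is a length-by-4 matrix, which is the usual input shape for a one-dimensional convolutional classifier.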
Pub Date: 2024-11-07; DOI: 10.1016/j.ymeth.2024.11.004
Jun Ren, Zhiling Guo, Yixuan Qi, Zheng Zhang, Li Liu
The three-dimensional structure of chromatin is crucial for the regulation of gene expression. YY1 promotes enhancer-promoter interactions in a manner analogous to CTCF-mediated chromatin interactions. However, little is known about which YY1 binding sites can form loop anchors. In this study, the LightGBM model was used to predict YY1-loop anchors by integrating multi-omics data. Because of the large imbalance between the numbers of positive and negative samples, we use the area under the precision-recall curve (AUPRC) to reflect the quality of the classifier. The results show that the LightGBM model exhibits strong predictive performance (AUPRC ≥ 0.93). To verify the robustness of the model, the dataset was divided into training and test sets at a 4:1 ratio. The results show that the model performs well for YY1-loop anchor prediction on both the training and independent test sets. Additionally, we ranked the importance of the features and found that the formation of YY1-loop anchors is primarily influenced by the co-binding of transcription factors CTCF, SMC3, and RAD21, as well as histone modifications and sequence context.
Prediction of YY1 loop anchor based on multi-omics features. Methods, volume 232, pages 96-106.
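The YY1 abstract evaluates an imbalanced classifier with AUPRC and a 4:1 train/test split. The sketch below uses average precision, one common estimator of AUPRC, together with a deterministic 4:1 split. The paper's exact estimator and splitting procedure are not stated, so both functions are assumptions for illustration, and the labels and scores are toy values.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at the rank of each positive."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / positives

def split_4_to_1(items):
    """Deterministic 4:1 split: every fifth item goes to the test set."""
    train = [x for i, x in enumerate(items) if i % 5 != 4]
    test = [x for i, x in enumerate(items) if i % 5 == 4]
    return train, test

# Toy example: positives ranked first and third out of four predictions.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
train, test = split_4_to_1(list(range(10)))
```

Unlike accuracy, this metric ignores true negatives, which is why it remains informative when negatives vastly outnumber positives. A real split would normally be shuffled and stratified by label.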
Pub Date: 2024-11-07; DOI: 10.1016/j.ymeth.2024.11.002
Shivam Kumar, Samrat Chatterjee
Spatial transcriptomics (ST) enables the visualization of gene expression within the context of tissue morphology. This emerging discipline has the potential to serve as a foundation for developing tools to design precision medicines. However, because of the high costs and expertise required for such experiments, its translation into regular clinical practice might be challenging. Although modern deep learning has been applied to enhance the information obtained from histological images, efforts have been constrained by limitations in the diversity of information. In this paper, we developed a model, HistoSPACE, that explores the diversity of histological images available with ST data to extract molecular insights from tissue images. Further, our approach allows us to link the predicted expression with disease pathology. We built an image encoder derived from a universal image autoencoder. This image encoder was connected to convolution blocks to build the final model, which was further fine-tuned using ST data. The number of model parameters is small, so the model requires less system memory and relatively less training time, making it lightweight in comparison to traditional histological models. Our model demonstrates significant efficiency compared to contemporary algorithms, achieving a correlation of 0.56 in leave-one-out cross-validation. Finally, its robustness was validated on an independent dataset, showing predictions consistent with predefined disease pathology. Our code is available at https://github.com/samrat-lab/HistoSPACE.
HistoSPACE: Histology-inspired spatial transcriptome prediction and characterization engine. Methods, volume 232, pages 107-114.
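The HistoSPACE abstract reports a correlation of 0.56 in leave-one-out cross-validation. A minimal sketch of that evaluation pattern follows, assuming the reported correlation is a Pearson coefficient between predicted and measured expression and that one sample is held out per fold. The abstract confirms neither detail, and the helpers below are not from the HistoSPACE repository.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def leave_one_out(samples):
    """Yield (training samples, held-out sample) once per sample."""
    for i, held_out in enumerate(samples):
        yield samples[:i] + samples[i + 1:], held_out

folds = list(leave_one_out(["s1", "s2", "s3"]))
perfect = pearson([1, 2, 3], [2, 4, 6])
```

In each fold the model would be trained on the remaining samples and the correlation computed on the held-out one; the per-fold values are then summarized.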
Pub Date: 2024-11-01; DOI: 10.1016/j.ymeth.2024.10.012
Quang-Hien Kha, Ngan Thi Kim Nguyen, Nguyen Quoc Khanh Le, Jiunn-Horng Kang
Diabetes management is often complicated by comorbidities, requiring complex medication regimens that increase the risk of drug-drug interactions (DDIs), potentially compromising treatment outcomes or causing toxicity. Although machine learning (ML) models have made strides in DDI prediction, existing approaches lack specificity for oral diabetes medications and face challenges in interpretability. To address these limitations, we propose a novel ML-based framework utilizing the Simplified Molecular Input Line Entry System (SMILES) to encode structural information of oral diabetes drugs. Using this representation, we developed an XGBoost model, selecting molecular features through LASSO. Our dataset, sourced from DrugBank, included 42 oral diabetes drugs and 1,884 interacting drugs, divided into training, validation, and testing sets. The model identified 606 optimal features, achieving an F1-score of 0.8182. SHAP analysis was employed for feature interpretation, enhancing model transparency and clinical relevance. By predicting adverse DDIs, our model offers a valuable tool for clinical decision-making, aiding safer prescription practices. The 606 critical features provide insights into atomic-level interactions, linking computational predictions with biological experiments. We present a classification model specifically designed for predicting DDIs associated with oral diabetes medications, with an openly accessible web application to support diabetes management in multi-drug regimens and comorbidity settings.
Development and validation of a machine learning model for predicting drug-drug interactions with oral diabetes medications. Methods, volume 232, pages 81-88.
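The drug-drug interaction abstract summarizes model quality with an F1-score. For reference, the sketch below computes F1 from confusion-matrix counts. The counts in the example are illustrative only and are not the paper's results.

```python
def f1_score(true_positive, false_positive, false_negative):
    """F1 = harmonic mean of precision and recall; 0.0 if nothing is detected."""
    if true_positive == 0:
        return 0.0
    precision = true_positive / (true_positive + false_positive)
    recall = true_positive / (true_positive + false_negative)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts (not from the paper): 8 hits, 2 false alarms, 2 misses.
example_f1 = f1_score(8, 2, 2)
```

F1 balances missed interactions against false alarms, which matters when interacting pairs are a minority of all drug pairs.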
Pub Date: 2024-10-30; DOI: 10.1016/j.ymeth.2024.10.011
Alexandra Bogožalec Košir, Špela Alič, Viktorija Tomič, Dane Lužnik, Tanja Dreo, Mojca Milavec
Rapid and accurate identification of bacterial pathogens is crucial for effective treatment and infection control, particularly in hospital settings. Conventional methods like culture techniques and MALDI-TOF mass spectrometry are often time-consuming and less sensitive. This study addresses the need for faster and more precise diagnostic methods by developing novel digital PCR (dPCR) assays for the rapid quantification of biomarkers from three Gram-negative bacteria: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa.
Using publicly available genomes and the rapid identification of PCR primers for unique core sequences (RUCS) algorithm, we designed highly specific dPCR assays. These assays were validated using synthetic DNA, bacterial genomic DNA, and DNA extracted from clinical samples. The developed dPCR methods demonstrated wide linearity, a low limit of detection (∼30 copies per reaction), and robust analytical performance with measurement uncertainty below 25%. The assays showed high repeatability and intermediate precision, with no cross-reactivity observed. Comparison with MALDI-TOF mass spectrometry revealed substantial concordance, highlighting the methods’ suitability for clinical diagnostics.
This study underscores the potential of dPCR for rapid and precise quantification of Gram-negative bacterial biomarkers. The developed methods offer significant improvements over existing techniques, providing faster, more accurate, and SI-traceable measurements. These advancements could enhance clinical diagnostics and infection control practices.
Development of novel digital PCR assays for the rapid quantification of Gram-negative bacteria biomarkers using RUCS algorithm. Methods, volume 232, pages 72-80.
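Digital PCR quantifies targets by counting positive partitions and applying a Poisson correction, since one partition may hold more than one copy. The sketch below shows that standard calculation, with mean copies per partition λ = −ln(1 − p), where p is the fraction of positive partitions. The partition volume is platform-dependent, so the value used here is a hypothetical placeholder, and the function names are illustrative rather than taken from the study.

```python
import math

def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total partitions")
    return -math.log(1.0 - positive / total)

def copies_per_microliter(positive, total, partition_volume_ul):
    """Target concentration in the reaction, in copies per microliter."""
    return copies_per_partition(positive, total) / partition_volume_ul

# Hypothetical run: half of 10,000 partitions positive, 0.001 uL per partition.
lam = copies_per_partition(5000, 10000)
concentration = copies_per_microliter(5000, 10000, 0.001)
```

The calculation is undefined when every partition is positive, which is why saturated reactions are diluted and rerun.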
Pub Date : 2024-10-29 DOI: 10.1016/j.ymeth.2024.10.010
Anass Garbaz, Yassine Oukdach, Said Charfi, Mohamed El Ansari, Lahcen Koutti, Mouna Salihoun
Medical image segmentation is crucial for accurate diagnosis and treatment. Among the methods employed, fully convolutional networks (FCNs) have emerged as a prominent approach, and the U-Net architecture and its variants have been widely adopted. This paper introduces MLFA-UNet, an architecture aimed at advancing medical image segmentation. MLFA-UNet adopts a U-shaped design and integrates two modules, multi-level feature assembly (MLFA) and multi-scale information attention (MSIA), complemented by a pixel-vanishing (PV) attention mechanism. Together, these modules improve both the robustness and the precision of segmentation. MLFA operates within both the encoder and the decoder, facilitating the extraction of local information crucial for accurately segmenting lesions. The MSIA module, placed at the bottleneck, replaces stacked modules, thereby expanding the receptive field and increasing feature diversity, and is reinforced by the PV attention mechanism. By capturing both detailed local features and broader contextual information, these mechanisms improve accuracy and resilience in identifying lesions. To assess the versatility of the network, we evaluated MLFA-UNet on a range of medical image segmentation datasets spanning imaging modalities such as wireless capsule endoscopy (WCE), colonoscopy, and dermoscopy. Our results consistently show that MLFA-UNet outperforms state-of-the-art algorithms, achieving Dice coefficients of 91.42%, 82.43%, 90.8%, and 88.68% on the MICCAI 2017 (Red Lesion), ISIC 2017, PH2, and CVC-ClinicalDB datasets, respectively.
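For readers unfamiliar with the metric, the Dice coefficient reported for MLFA-UNet measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). The following is a minimal NumPy sketch of that definition, not the authors' evaluation code; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made purely for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice overlap, 2|A ∩ B| / (|A| + |B|), between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty. Treating this as perfect agreement is an
        # assumed convention, not something stated in the abstract.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 3 predicted pixels, 3 reference pixels, 2 shared.
pred_mask = np.array([[1, 1, 0],
                      [0, 1, 0]])
true_mask = np.array([[1, 0, 0],
                      [0, 1, 1]])
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

Multiplying the score by 100 puts it on the same percentage scale as the figures quoted in the abstract.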
{"title":"MLFA-UNet: A multi-level feature assembly UNet for medical image segmentation","authors":"Anass Garbaz , Yassine Oukdach , Said Charfi , Mohamed El Ansari , Lahcen Koutti , Mouna Salihoun","doi":"10.1016/j.ymeth.2024.10.010","DOIUrl":"10.1016/j.ymeth.2024.10.010","url":null,"abstract":"<div><div>Medical image segmentation is crucial for accurate diagnosis and treatment in medical image analysis. Among the various methods employed, fully convolutional networks (FCNs) have emerged as a prominent approach for segmenting medical images. Notably, the U-Net architecture and its variants have gained widespread adoption in this domain. This paper introduces MLFA-UNet, an innovative architectural framework aimed at advancing medical image segmentation. MLFA-UNet adopts a U-shaped architecture and integrates two pivotal modules: multi-level feature assembly (MLFA) and multi-scale information attention (MSIA), complemented by a pixel-vanishing (PV) attention mechanism. These modules synergistically contribute to the segmentation process enhancement, fostering both robustness and segmentation precision. MLFA operates within both the network encoder and decoder, facilitating the extraction of local information crucial for accurately segmenting lesions. Furthermore, the bottleneck MSIA module serves to replace stacking modules, thereby expanding the receptive field and augmenting feature diversity, fortified by the PV attention mechanism. These integrated mechanisms work together to boost segmentation performance by effectively capturing both detailed local features and a broader range of contextual information, enhancing both accuracy and resilience in identifying lesions. To assess the versatility of the network, we conducted evaluations of MLFA-UNet across a range of medical image segmentation datasets, encompassing diverse imaging modalities such as wireless capsule endoscopy (WCE), colonoscopy, and dermoscopic images. 
Our results consistently demonstrate that MLFA-UNet outperforms state-of-the-art algorithms, achieving dice coefficients of 91.42%, 82.43%, 90.8%, and 88.68% for the MICCAI 2017 (Red Lesion), ISIC 2017, PH2, and CVC-ClinicalDB datasets, respectively.</div></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"232 ","pages":"Pages 52-64"},"PeriodicalIF":4.2,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142556808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Protein ubiquitination is a critical post-translational modification (PTM) that is involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools rely mainly on predefined sequence features and machine learning algorithms, and species-specific variations in ubiquitination patterns remain poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species “Teacher model” to guide a more compact, species-specific “Student model”, with the “Teacher” generating pseudo-labels that improve the “Student” model's learning and the robustness of its predictions. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3% and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3% and an AUC of 0.923. Comparative analysis with established predictors further highlights the model’s superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi.
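The teacher-student idea described in this abstract can be illustrated with one common form of knowledge distillation, in which the student is trained against both the true site labels and the teacher's predicted probabilities (used here as soft pseudo-labels). The sketch below is a generic NumPy illustration under that assumption and is not taken from the KD_ArapUbi repository; the function names, the weighting parameter `alpha`, and the example probabilities are all hypothetical.

```python
import numpy as np


def binary_cross_entropy(prob, target, eps=1e-12):
    """Mean binary cross-entropy between predictions and (soft or hard) targets."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(prob)
                           + (1.0 - target) * np.log(1.0 - prob))))


def distillation_loss(student_prob, teacher_prob, hard_label, alpha=0.5):
    """Weighted sum of a hard-label term and a teacher pseudo-label term.

    alpha is an assumed hyperparameter: 1.0 ignores the teacher entirely,
    0.0 trains the student on the teacher's pseudo-labels only.
    """
    hard_term = binary_cross_entropy(student_prob, hard_label)
    soft_term = binary_cross_entropy(student_prob, teacher_prob)
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Hypothetical predictions for three candidate lysine sites.
student = np.array([0.9, 0.2, 0.6])   # student's predicted probabilities
teacher = np.array([0.8, 0.1, 0.7])   # teacher's pseudo-labels
labels = np.array([1, 0, 1])          # annotated ubiquitination status
loss = distillation_loss(student, teacher, labels, alpha=0.5)
```

In a real training loop this scalar would be minimized with respect to the student's parameters; how the authors actually combine the two signals is not specified in the abstract.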
{"title":"Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing","authors":"Van-Nui Nguyen , Thi-Xuan Tran , Thi-Tuyen Nguyen , Nguyen Quoc Khanh Le","doi":"10.1016/j.ymeth.2024.10.006","DOIUrl":"10.1016/j.ymeth.2024.10.006","url":null,"abstract":"<div><div>Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, with species-specific variations in ubiquitination patterns remaining poorly understood. This study introduces a novel approach for predicting <em>Arabidopsis thaliana</em> ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species “Teacher model” to guide a more compact, species-specific “Student model”, with the “Teacher” generating pseudo-labels that enhance the “Student” learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3 % and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3 % and an AUC of 0.923. Comparative analysis with established predictors further highlights the model’s superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. 
The code and resources are available on GitHub: <span><span>https://github.com/nuinvtnu/KD_ArapUbi</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"232 ","pages":"Pages 65-71"},"PeriodicalIF":4.2,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142492289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}