Methods: Latest Articles
Development and validation of a machine learning model for predicting drug-drug interactions with oral diabetes medications
IF 4.2 CAS Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-01 DOI: 10.1016/j.ymeth.2024.10.012
Quang-Hien Kha , Ngan Thi Kim Nguyen , Nguyen Quoc Khanh Le , Jiunn-Horng Kang
Diabetes management is often complicated by comorbidities, requiring complex medication regimens that increase the risk of drug-drug interactions (DDIs), potentially compromising treatment outcomes or causing toxicity. Although machine learning (ML) models have made strides in DDI prediction, existing approaches lack specificity for oral diabetes medications and face challenges in interpretability. To address these limitations, we propose a novel ML-based framework utilizing the Simplified Molecular Input Line Entry System (SMILES) to encode structural information of oral diabetes drugs. Using this representation, we developed an XGBoost model, selecting molecular features through LASSO. Our dataset, sourced from DrugBank, included 42 oral diabetes drugs and 1,884 interacting drugs, divided into training, validation, and testing sets. The model identified 606 optimal features, achieving an F1-score of 0.8182. SHAP analysis was employed for feature interpretation, enhancing model transparency and clinical relevance. By predicting adverse DDIs, our model offers a valuable tool for clinical decision-making, aiding safer prescription practices. The 606 critical features provide insights into atomic-level interactions, linking computational predictions with biological experiments. We present a classification model specifically designed for predicting DDIs associated with oral diabetes medications, with an openly accessible web application to support diabetes management in multi-drug regimens and comorbidity settings.
Methods, vol. 232, pp. 81-88
Citations: 0
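The F1-score reported in the abstract above is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the confusion counts are hypothetical, chosen only so that the result rounds to 0.8182, and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # With no true positives, precision and recall are zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not from the paper).
example_f1 = f1_score(tp=9, fp=2, fn=2)
print(round(example_f1, 4))
```

Because F1 ignores true negatives, it is a common choice when interacting drug pairs are much rarer than non-interacting ones.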
Development of novel digital PCR assays for the rapid quantification of Gram-negative bacteria biomarkers using RUCS algorithm
IF 4.2 CAS Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-30 DOI: 10.1016/j.ymeth.2024.10.011
Alexandra Bogožalec Košir , Špela Alič , Viktorija Tomič , Dane Lužnik , Tanja Dreo , Mojca Milavec
Rapid and accurate identification of bacterial pathogens is crucial for effective treatment and infection control, particularly in hospital settings. Conventional methods like culture techniques and MALDI-TOF mass spectrometry are often time-consuming and less sensitive. This study addresses the need for faster and more precise diagnostic methods by developing novel digital PCR (dPCR) assays for the rapid quantification of biomarkers from three Gram-negative bacteria: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa.
Utilizing publicly available genomes and the Rapid identification of PCR primers for Unique Core Sequences (RUCS) algorithm, we designed highly specific dPCR assays. These assays were validated using synthetic DNA, bacterial genomic DNA, and DNA extracted from clinical samples. The developed dPCR methods demonstrated wide linearity, a low limit of detection (∼30 copies per reaction), and robust analytical performance with measurement uncertainty below 25 %. The assays showed high repeatability and intermediate precision, with no cross-reactivity observed. Comparison with MALDI-TOF mass spectrometry revealed substantial concordance, highlighting the methods’ suitability for clinical diagnostics.
This study underscores the potential of dPCR for rapid and precise quantification of Gram-negative bacterial biomarkers. The developed methods offer significant improvements over existing techniques, providing faster, more accurate, and SI-traceable measurements. These advancements could enhance clinical diagnostics and infection control practices.
Methods, vol. 232, pp. 72-80
Citations: 0
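In digital PCR the reaction is split into many partitions, and the number of target copies is typically estimated from the fraction of positive partitions with a Poisson correction. The sketch below shows that standard calculation; the partition counts are hypothetical, and the abstract above does not state which instrument or partition number the authors used.

```python
import math


def copies_per_reaction(positive: int, total: int) -> float:
    """Estimate target copies in a dPCR reaction from partition counts.

    Uses the Poisson relation lambda = -ln(1 - p), where p is the fraction
    of positive partitions and lambda is the mean copies per partition.
    """
    if total <= 0 or not 0 <= positive < total:
        # If every partition is positive the estimate is unbounded.
        raise ValueError("need 0 <= positive < total and total > 0")
    mean_copies_per_partition = -math.log1p(-positive / total)
    return mean_copies_per_partition * total


# Hypothetical run: 30 positive partitions out of 20,000.
print(round(copies_per_reaction(positive=30, total=20000), 2))
```

At low occupancy the estimate stays close to the raw count of positive partitions; the correction matters more as a larger share of partitions contains multiple copies.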
Imaging flow cytometry reveals LPS-induced changes to intracellular intensity and distribution of α-synuclein in a TLR4-dependent manner in STC-1 cells.
IF 4.2 CAS Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-30 DOI: 10.1016/j.ymeth.2024.10.009
Anastazja M Gorecki, Chidozie C Anyaegbu, Melinda Fitzgerald, Kathryn A Fuller, Ryan S Anderton

Background: Parkinson's disease is a chronic neurodegenerative disorder, where pathological protein aggregates largely composed of phosphorylated α-synuclein are implicated in disease pathogenesis and progression. Emerging evidence suggests that the interaction between pro-inflammatory microbial factors and the gut epithelium contributes to α-synuclein aggregation in the enteric nervous system. However, the cellular sources and mechanisms for α-synuclein pathology in the gut are still unclear.

Methods: The STC-1 cell line, which models an enteroendocrine population capable of communicating with the microbiota, immune and nervous systems, was treated with a TLR4 inhibitor (TAK-242) prior to microbial lipopolysaccharide (LPS) exposure to investigate the role of TLR4 signalling in α-synuclein alterations. Antibodies targeting the full-length protein (α-synuclein) and the Serine-129 phosphorylated form (pS129) were used. Complex, multi-parametric image analysis was conducted through confocal microscopy (with Zen 3.8 analysis) and imaging flow cytometry (with IDEAS® analysis).

Results: Confocal microscopy revealed heterogeneous distribution of α-synuclein and pS129 in STC-1 cells, with prominent pS129 staining along cytoplasmic processes. Imaging flow cytometry further quantified the relationship between various α-synuclein morphometric features. Thereafter, imaging flow cytometry demonstrated a dose-specific effect of LPS, where the low dose (8 μg/mL), but not the high dose (32 μg/mL), significantly altered measures related to α-synuclein intensity, distribution, and localisation. Pre-treatment with the TLR4 inhibitor TAK-242 alleviated some of these significant alterations.

Conclusion: This study demonstrates that LPS-TLR4 signalling alters the intracellular localisation of α-synuclein in enteroendocrine cells in vitro and showcases the utility of combining imaging flow cytometry with confocal microscopy to investigate subtle protein changes that may not be apparent through confocal microscopy alone. Further investigation is required to understand the apparent dose-dependent effects of LPS on α-synuclein in the gut epithelium in healthy states as well as conditions such as Parkinson's disease.

Citations: 0
MLFA-UNet: A multi-level feature assembly UNet for medical image segmentation
IF 4.2 CAS Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-29 DOI: 10.1016/j.ymeth.2024.10.010
Anass Garbaz , Yassine Oukdach , Said Charfi , Mohamed El Ansari , Lahcen Koutti , Mouna Salihoun
Medical image segmentation is crucial for accurate diagnosis and treatment in medical image analysis. Among the various methods employed, fully convolutional networks (FCNs) have emerged as a prominent approach for segmenting medical images. Notably, the U-Net architecture and its variants have gained widespread adoption in this domain. This paper introduces MLFA-UNet, an innovative architectural framework aimed at advancing medical image segmentation. MLFA-UNet adopts a U-shaped architecture and integrates two pivotal modules: multi-level feature assembly (MLFA) and multi-scale information attention (MSIA), complemented by a pixel-vanishing (PV) attention mechanism. These modules synergistically contribute to the segmentation process enhancement, fostering both robustness and segmentation precision. MLFA operates within both the network encoder and decoder, facilitating the extraction of local information crucial for accurately segmenting lesions. Furthermore, the bottleneck MSIA module serves to replace stacking modules, thereby expanding the receptive field and augmenting feature diversity, fortified by the PV attention mechanism. These integrated mechanisms work together to boost segmentation performance by effectively capturing both detailed local features and a broader range of contextual information, enhancing both accuracy and resilience in identifying lesions. To assess the versatility of the network, we conducted evaluations of MLFA-UNet across a range of medical image segmentation datasets, encompassing diverse imaging modalities such as wireless capsule endoscopy (WCE), colonoscopy, and dermoscopic images. Our results consistently demonstrate that MLFA-UNet outperforms state-of-the-art algorithms, achieving Dice coefficients of 91.42%, 82.43%, 90.8%, and 88.68% for the MICCAI 2017 (Red Lesion), ISIC 2017, PH2, and CVC-ClinicalDB datasets, respectively.
Methods, vol. 232, pp. 52-64
Citations: 0
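The Dice coefficient used to score the segmentation results above measures the overlap between a predicted mask and a ground-truth mask. A minimal sketch for flattened binary masks follows; the function name and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    if pred_size + target_size == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2 * intersection / (pred_size + target_size)


# Toy 4-pixel masks, for illustration only.
print(round(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]), 4))
```

Multiplying the returned value by 100 gives the percentage form quoted in the abstract.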
Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing
IF 4.2 CAS Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-22 DOI: 10.1016/j.ymeth.2024.10.006
Van-Nui Nguyen , Thi-Xuan Tran , Thi-Tuyen Nguyen , Nguyen Quoc Khanh Le
Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, with species-specific variations in ubiquitination patterns remaining poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species “Teacher model” to guide a more compact, species-specific “Student model”, with the “Teacher” generating pseudo-labels that enhance the “Student” learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3 % and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3 % and an AUC of 0.923. Comparative analysis with established predictors further highlights the model’s superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi.
Methods, vol. 232, pp. 65-71
Citations: 0
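The teacher-student setup described in the abstract above can be illustrated with a generic distillation loss, in which the student is trained against both the true label and the teacher's pseudo-label. This is a sketch under stated assumptions: the blending weight `alpha` and the use of binary cross-entropy for both terms are hypothetical choices, and the abstract does not specify the loss the authors actually used.

```python
import math


def binary_cross_entropy(prob: float, label: float, eps: float = 1e-12) -> float:
    """Cross-entropy between a predicted probability and a (possibly soft) label."""
    prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def distillation_loss(student_prob: float, hard_label: int,
                      teacher_prob: float, alpha: float = 0.5) -> float:
    """Blend the loss on the true label with the loss on the teacher's pseudo-label.

    alpha is a hypothetical weighting: 1.0 ignores the teacher, 0.0 ignores
    the true label.
    """
    hard_term = binary_cross_entropy(student_prob, hard_label)
    soft_term = binary_cross_entropy(student_prob, teacher_prob)
    return alpha * hard_term + (1 - alpha) * soft_term


# A student that agrees with a confident teacher is penalised less.
print(distillation_loss(0.9, 1, 0.9) < distillation_loss(0.1, 1, 0.9))
```

The soft term lets the compact student inherit what the multi-species teacher has learned, even for sites where the hard label alone carries little signal.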
Development and validation of a new and rapid molecular diagnostic tool based on RT-LAMP for Hepatitis C virus detection at point-of-care
IF 4.2 CAS Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-22 DOI: 10.1016/j.ymeth.2024.10.008
Sonia Arca-Lafuente , Cristina Yépez-Notario , Pablo Cea-Callejo , Violeta Lara-Aguilar , Celia Crespo-Bermejo , Luz Martín-Carbonero , Ignacio de los Santos , Verónica Briz , Ricardo Madrid

Purpose

Globally, it is estimated that 1.0 million individuals are newly infected by Hepatitis C virus (HCV) every year, and nearly 50 million people live with a chronic infection, according to the World Health Organization. To overcome underdiagnosis of HCV infection among hard-to-reach populations, it is essential to develop new rapid and easy-to-use molecular diagnostic systems. In this work, we have developed a pangenotypic diagnostic tool based on Loop-Mediated Isothermal Amplification (LAMP), coupled to a direct sample lysis procedure for molecular detection of HCV at point-of-care (POC).

Methods

Procedure validation was performed using 129 different samples from HCV infected patients (116 serum samples, and 13 fresh blood samples), 27 individuals who tested negative for HCV but positive for HIV, and 11 healthy donors. Serum was collected, lysed for 10 min at room temperature, and assayed by RT-LAMP. To achieve this, a set of 9 LAMP-primers was used for the first time. Parallel RT-qPCR assays were conducted for HCV to both validate the procedure and quantify viral loads.

Results

HCV was detected by RT-LAMP in 109/116 HCV positive serum samples, and in 11/13 positive blood samples in less than 40 min. Compared to RT-qPCR results, our RT-LAMP procedure showed a sensitivity of 94 %, 100 % specificity, and a limit of detection of 3.26 log10 IU/mL (10–20 copies per reaction).

Conclusions

We have developed an accurate system, more affordable than the currently available rapid tests for HCV. Since no prior RNA purification step from capillary blood is required, we strongly recommend our RT-LAMP system as a valuable and rapid tool for the molecular detection of HCV at POC.
Methods, vol. 232, pp. 43-51
Citations: 0
HLA-DR4Pred2: An improved method for predicting HLA-DRB1*04:01 binders
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-19 DOI: 10.1016/j.ymeth.2024.10.007
Sumeet Patiyal, Anjali Dhall, Nishant Kumar, Gajendra P.S. Raghava
HLA-DRB1*04:01 is associated with numerous diseases, including sclerosis, arthritis, diabetes, and COVID-19, which makes scanning antigens for binders important for developing immunotherapies and vaccines. Current prediction methods are often limited by their reliance on small datasets. This study presents HLA-DR4Pred2, developed on a large dataset of 12,676 binders and an equal number of non-binders. It is an improved version of HLA-DR4Pred, which was trained on a small dataset of 576 binders and an equal number of non-binders. All models were trained, optimized, and tested on 80% of the data using five-fold cross-validation and evaluated on the remaining 20%. A range of machine learning techniques was employed, achieving maximum AUROCs of 0.90 and 0.87 using composition and binary profile features, respectively. The performance of the composition-based model increased to 0.93 when combined with BLAST search. Additionally, models developed on a realistic dataset of 12,676 binders and 86,300 non-binders achieved a maximum AUROC of 0.99. On an independent dataset, our best model outperformed existing methods. Finally, we developed a standalone tool and a web server for HLA-DR4Pred2 that enable the prediction, design, and virtual scanning of HLA-DRB1*04:01 binding peptides, and we released a Python package on the Python Package Index (https://webs.iiitd.edu.in/raghava/hladr4pred2/; https://github.com/raghavagps/hladr4pred2; https://pypi.org/project/hladr4pred2/).
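The abstract names composition and binary profile features without defining them. The sketch below assumes the definitions commonly used for peptide classifiers: composition as the fraction of each of the 20 standard amino acids, and binary profile as a one-hot vector per position. It is not taken from the hladr4pred2 package, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(peptide: str) -> list:
    """Return 20 values: the fraction of the peptide made up by each residue type."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def binary_profile(peptide: str) -> list:
    """Return a flat one-hot encoding: 20 values per position, in sequence order."""
    profile = []
    for residue in peptide.upper():
        # Non-standard residues produce an all-zero block for that position.
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return profile


example = "AAC"
print(composition(example)[:2])      # fractions for A and C
print(len(binary_profile(example)))  # 20 values per residue
```

Composition gives a fixed-length vector for peptides of any length, whereas the binary profile grows with peptide length, so a real pipeline would need fixed-length windows or padding before using it as model input.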
Methods, Volume 232, Pages 18-28
Citations: 0
A heterogeneous graph transformer framework for accurate cancer driver gene prediction and downstream analysis
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-18 DOI: 10.1016/j.ymeth.2024.09.018
Shuwen Xiong, Junming Zhang, Hong Luo, Yongqing Zhang, Qinyin Xiao
Accurately predicting cancer driver genes remains a formidable challenge given the growing volume and complexity of cancer genomic data. We propose HGTDG, a heterogeneous graph transformer framework for predicting cancer driver genes and exploring downstream tasks. Central to the framework is a heterogeneous graph construction module, which assembles a gene-protein heterogeneous network from Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and protein-protein interactions sourced from the STRING (search tool for recurring instances of neighboring genes) database. The framework also introduces a heterogeneous graph transformer module that uses multi-head attention to compute node embeddings. This module captures distinct representations for both nodes and edges, enriching the model's predictive capacity. The generated node embeddings are then passed to a classification module that discriminates between driver and non-driver genes. Our experiments show that HGTDG outperforms existing methods in terms of the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Furthermore, downstream analysis using the newly identified cancer driver genes underscores the efficacy and versatility of the proposed framework.
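AUROC, one of the two metrics reported for HGTDG, equals the probability that a randomly chosen positive (driver gene) receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are made-up illustrative values, not results from the paper, and the quadratic loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for six genes (1 = driver, 0 = non-driver).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(auroc(labels, scores))
```

In this toy case eight of the nine positive-negative pairs are ordered correctly, so the value is 8/9.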
Methods, Volume 232, Pages 9-17
Citations: 0
Multi-view contrastive clustering for cancer subtyping using fully and weakly paired multi-omics data
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-17 DOI: 10.1016/j.ymeth.2024.09.016
Yabin Kuang, Minzhu Xie, Zhanhong Zhao, Dongze Deng, Ergude Bao
The identification of cancer subtypes is crucial for advancing precision medicine, as it facilitates the development of more effective and personalized treatment and prevention strategies. With the development of high-throughput sequencing technologies, researchers now have access to a wealth of multi-omics data from cancer patients, making computational cancer subtyping increasingly feasible. One of the main challenges in integrating multi-omics data is handling missing data, since not all biomolecules are consistently measured across all samples. Current computational models based on multi-omics data for cancer subtyping often struggle with the challenge of weakly paired omics data. To address this challenge, we propose a novel unsupervised cancer subtyping model named Subtype-MVCC. This model leverages graph convolutional networks to extract and represent low-dimensional features from each omics data type, using intra-view and inter-view contrastive learning approaches. By incorporating a weighted average fusion strategy to unify the dimension of each sample, Subtype-MVCC effectively handles weakly paired multi-omics datasets. Comprehensive evaluations on established benchmark datasets demonstrate that Subtype-MVCC outperforms nine leading models in this domain. Additionally, simulations with varying levels of missing data highlight the model's robust performance in handling weakly paired omics data. The clinical relevance and survival outcomes associated with the identified subtypes further validate the interpretability and reliability of the clustering results produced by Subtype-MVCC.
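The abstract says a weighted average fusion strategy lets Subtype-MVCC handle weakly paired data, but it does not give the weights or the rule for missing views. The sketch below assumes one plausible reading: average the per-view embeddings that exist for a sample and renormalize the weights over those views, so every sample ends up with a vector of the same dimension. The view names, weights, and the function `fuse_views` are hypothetical.

```python
def fuse_views(embeddings, weights):
    """Weighted average of the available per-view embeddings for one sample.

    embeddings: dict mapping view name -> list of floats, or None if the view is missing.
    weights:    dict mapping view name -> non-negative weight.
    """
    available = {view: vec for view, vec in embeddings.items() if vec is not None}
    if not available:
        raise ValueError("sample has no observed views")
    dims = {len(vec) for vec in available.values()}
    if len(dims) != 1:
        raise ValueError("all view embeddings must share one dimension")
    total = sum(weights[view] for view in available)
    if total <= 0:
        raise ValueError("weights of observed views must sum to a positive value")
    dim = dims.pop()
    # Renormalizing by `total` keeps the scale comparable across samples
    # that have different numbers of observed views.
    return [
        sum(weights[view] * available[view][i] for view in available) / total
        for i in range(dim)
    ]


# One weakly paired sample: the methylation view was not measured.
sample = {"mrna": [1.0, 3.0], "methylation": None, "mirna": [3.0, 1.0]}
view_weights = {"mrna": 0.5, "methylation": 0.3, "mirna": 0.5}
fused = fuse_views(sample, view_weights)
print(fused)
```

A fully paired sample goes through the same function unchanged, which is what makes this kind of fusion convenient for mixed fully and weakly paired cohorts.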
Methods, Volume 232, Pages 1-8
Citations: 0
DGSIST: Clustering spatial transcriptome data based on deep graph structure Infomax
IF 4.2 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-15 DOI: 10.1016/j.ymeth.2024.10.002
Yu-Han Xiu, Si-Lin Sun, Bing-Wei Zhou, Ying Wan, Hua Tang, Hai-Xia Long
Although spatial transcriptomics data provide valuable insights into gene expression profiles and the spatial structure of tissues, most studies rely solely on gene expression information and underuse the spatial data. To fully leverage the potential of spatial transcriptomics and graph neural networks, we propose the DGSI (Deep Graph Structure Infomax) model. This graph data processing model uses graph convolutional neural networks and an unsupervised learning approach. It maximizes the mutual information between graph-level and node-level representations, emphasizing flexible sampling and aggregation of nodes and their neighbors, which captures local information from nodes and incorporates it into the overall graph structure. We also developed the DGSIST framework, an unsupervised cell clustering method that integrates the DGSI model, the SVD dimensionality reduction algorithm, and the k-means++ clustering algorithm, with the aim of identifying cell types accurately. DGSIST makes full use of spatial transcriptomics data and outperforms existing methods in accuracy. Demonstrations across various tissue types and technological platforms show that it accurately identifies spatial domains in multiple tissue sections. Compared with other spatial clustering methods, DGSIST excels in cell clustering and effectively eliminates batch effects without batch correction. DGSIST also performs well in spatial clustering analysis, spatial variation identification, and differential gene expression detection, and it applies directly to graph analysis tasks such as node classification, link prediction, and graph clustering. We anticipate that the DGSIST framework will contribute to a deeper understanding of the spatial organization of diseases such as cancer.
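The final stage of the pipeline described above is k-means++ clustering. The sketch below shows only the k-means++ seeding step, which picks each new center with probability proportional to its squared distance from the nearest center already chosen. It is a generic illustration on made-up 2-D points, not the DGSIST implementation, and it omits the DGSI embedding and SVD steps; the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers from `points` using k-means++ seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            raise ValueError("fewer than k distinct points")
        # Far-away points are proportionally more likely to be picked;
        # points that are already centers have weight zero.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Toy stand-in for low-dimensional spot embeddings: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans_pp_init(points, k=2, seed=0)
print(centers)
```

After seeding, standard Lloyd iterations (assign each point to its nearest center, then recompute the centers) would complete the clustering.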
Methods, Volume 231, Pages 226-236
Citations: 0