Biodata Mining: Latest Articles (Page 6)

QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition.

IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-09-04 DOI: 10.1186/s13040-024-00386-w

Bolin Chen, Jinlei Zhang, Ci Shao, Jun Bian, Ruiming Kang, Xuequn Shang

Background: Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare changes in biomolecules between normal and disease samples, or detect important vertices in a single static biomolecular network, and so often overlook the dynamic changes that occur between different disease stages. However, investigating temporal changes in biomolecular networks and identifying critical genes are essential for understanding the occurrence and development of diseases.

Methods: This study proposes a novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD). It first constructs a time series network by integrating both intra- and inter-temporal network information, preserving connections between networks at adjacent stages according to their local similarities. A tensor describes the connections of this time series network, and a third-order tensor decomposition method captures both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also learning-free and efficient, and can be applied to datasets with a small number of samples.
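To make the tensor idea concrete, the sketch below stacks one adjacency matrix per disease stage into a genes × genes × stages array and scores genes by the leading mode-1 singular vector of that array (the first factor of a truncated higher-order SVD). This is not the QIGTD algorithm, whose construction and decomposition details are not given in the abstract; the toy edges, the added self-loops, and the names `build_tensor` and `gene_scores` are assumptions made purely for illustration.

```python
import numpy as np

def build_tensor(n_genes, stage_edges):
    """Stack one symmetric adjacency matrix per stage into a 3-way array.

    Self-loops are added (an assumption of this sketch) so that the score
    reflects direct neighbours as well as two-step paths.
    """
    tensor = np.zeros((n_genes, n_genes, len(stage_edges)))
    for k, edges in enumerate(stage_edges):
        tensor[:, :, k] = np.eye(n_genes)
        for i, j in edges:
            tensor[i, j, k] = tensor[j, i, k] = 1.0
    return tensor

def gene_scores(tensor):
    """Leading left singular vector of the mode-1 unfolding (genes x rest)."""
    unfolded = tensor.reshape(tensor.shape[0], -1)
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return np.abs(u[:, 0])  # the sign of a singular vector is arbitrary

# Hypothetical toy network: 4 genes observed at 3 stages.
# Gene 0 is linked to every other gene at every stage.
stage_edges = [
    [(0, 1), (0, 2), (0, 3)],
    [(0, 1), (0, 2), (0, 3), (1, 2)],
    [(0, 1), (0, 2), (0, 3), (2, 3)],
]
tensor = build_tensor(4, stage_edges)
scores = gene_scores(tensor)
ranking = [int(i) for i in np.argsort(-scores)]
print(ranking)
```

On this toy input the hub gene 0 ranks first and gene 2, which gains an extra edge at two stages, ranks second. A real analysis would need the paper's own network construction and decomposition.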

Results: The effectiveness of QIGTD was evaluated on lung adenocarcinoma (LUAD) datasets, with three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) as benchmarks. Numerical experiments show that QIGTD outperforms these methods in both precision and mean average precision (mAP). Notably, of the top 50 genes, 29 have been verified to be highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation.
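The two ranking metrics named above can be illustrated with a short sketch. It assumes a ranked gene list and a reference set of known disease genes; the gene names are hypothetical, and average precision is normalised here by the size of the reference set, which is one common convention and may differ from the one used in the paper. mAP would be the mean of this value over several reference sets or queries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the reference set."""
    return sum(1 for gene in ranked[:k] if gene in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision values taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, gene in enumerate(ranked, start=1):
        if gene in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical ranking and reference set, for illustration only.
ranked_genes = ["G1", "G2", "G3", "G4", "G5"]
known_genes = {"G1", "G3", "G9"}

p_at_5 = precision_at_k(ranked_genes, known_genes, 5)
ap = average_precision(ranked_genes, known_genes)
print(p_at_5, ap)
```

By the same definition, 29 verified genes among the top 50 corresponds to a precision at 50 of 29/50 = 0.58.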

Conclusion: QIGTD effectively captures the temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.


Citations: 0

Seven quick tips for gene-focused computational pangenomic analysis.

IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-09-03 DOI: 10.1186/s13040-024-00380-2

Vincenzo Bonnici, Davide Chicco

Pangenomics is a relatively new scientific field that investigates the union of all the genomes of a clade. The word pan means everything in ancient Greek; the term pangenomics originally referred to genomes of bacteria and was later extended to human genomes as well. Modern bioinformatics offers several tools to analyze pangenomics data, paving the way to an emerging field that we can call computational pangenomics. The computational power currently available to the bioinformatics community has made computational pangenomic analyses easy to perform, but this greater accessibility also increases the chances of making mistakes and of producing misleading or inflated results, especially for beginners. To address this problem, we present a few quick tips for efficient and correct computational pangenomic analyses, with a focus on bacterial pangenomics, describing common mistakes to avoid and best practices to follow in this field. We believe our recommendations can help readers perform more robust and sound pangenomic analyses and generate more reliable results.
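In a gene-focused view, "the union of all the genomes" can be read as the union of the gene families found in each genome. The sketch below assumes each genome has already been reduced to a set of gene-family identifiers; the strain and gene names are hypothetical, and the core/accessory split is standard pangenomics terminology rather than something stated in the abstract.

```python
# Hypothetical gene-family content of three strains of one clade.
genomes = {
    "strain_a": {"geneA", "geneB", "geneC"},
    "strain_b": {"geneA", "geneB", "geneD"},
    "strain_c": {"geneA", "geneC", "geneE"},
}

# Pangenome: every gene family seen in at least one genome.
pangenome = set().union(*genomes.values())

# Core genome: gene families present in every genome.
core = set.intersection(*genomes.values())

# Accessory genome: gene families missing from at least one genome.
accessory = pangenome - core

print(sorted(pangenome), sorted(core), sorted(accessory))
```

In practice the hard part is deciding which genes belong to the same family, and a strict "present in every genome" rule for the core can be sensitive to incomplete assemblies.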


Citations: 0

Deep learning for automatic calcium detection in echocardiography.

IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-08-28 DOI: 10.1186/s13040-024-00381-1

Luís B Elvas, Sara Gomes, João C Ferreira, Luís Brás Rosário, Tomás Brandão

Cardiovascular diseases are the main cause of death in the world, and cardiovascular imaging techniques are the mainstay of noninvasive diagnosis. Aortic stenosis is a lethal cardiac disease preceded by several years of aortic valve calcification. Data-driven tools developed with Deep Learning (DL) algorithms can process and categorize medical image data, providing fast diagnoses with considerable reliability and improving healthcare effectiveness. A systematic review of DL applications on medical images for pathologic calcium detection concluded that there are established techniques in this field, primarily using CT scans, at the expense of radiation exposure. Echocardiography is an unexplored alternative for detecting calcium, but still needs technological development. In this article, a fully automated method based on Convolutional Neural Networks (CNNs) was developed to detect aortic calcification in echocardiography images. It consists of two essential processes: (1) an object detector to locate the aortic valve, which achieved 95% precision and 100% recall; and (2) a classifier to identify calcium structures in the valve, which achieved 92% precision and 100% recall. The outcome of this work is the possibility of automating the echocardiographic detection of aortic valve calcification, a lethal and prevalent disease.
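For readers less familiar with the two reported metrics, the sketch below shows how precision and recall follow from counts of true positives, false positives, and false negatives. The counts are hypothetical and chosen only so the arithmetic is easy to follow; the abstract does not report the underlying counts.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall) from confusion counts.

    precision = TP / (TP + FP): how many reported detections were correct.
    recall    = TP / (TP + FN): how many true cases were found.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 19 correct detections, 1 false alarm, no misses.
precision, recall = detection_metrics(tp=19, fp=1, fn=0)
print(precision, recall)
```

A recall of 100% with a precision below 100% means no true case was missed, at the cost of a few false alarms.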


Citations: 0

Integrating transcriptomics and proteomics to analyze the immune microenvironment of cytomegalovirus associated ulcerative colitis and identify relevant biomarkers.

IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-08-27 DOI: 10.1186/s13040-024-00382-0

Yang Chen, Qingqing Zheng, Hui Wang, Peiren Tang, Li Deng, Pu Li, Huan Li, Jianhong Hou, Jie Li, Li Wang, Jun Peng

Background: In recent years, significant morbidity and mortality in patients with severe inflammatory bowel disease (IBD) and cytomegalovirus (CMV) have drawn considerable attention to the status of CMV infection in the intestinal mucosa of IBD patients and its role in disease progression. However, there is currently no high-throughput sequencing data for ulcerative colitis patients with CMV infection (CMV + UC), and the immune microenvironment in CMV + UC patients has yet to be explored.

Method: The xCell algorithm was used to evaluate the immune microenvironment of CMV + UC patients. Then, WGCNA was used to obtain the co-expression modules between abnormal immune cells and the gene or protein level. Next, three machine learning approaches (Random Forest, SVM-RFE, and Lasso) were used to filter candidate biomarkers. Finally, the Best Subset Selection algorithm was used to construct the diagnostic model.
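The final step, best subset selection, amounts to fitting a model on every candidate feature subset and keeping the best one. The sketch below is a simplified illustration under stated assumptions: it uses ordinary least squares on synthetic data and a fixed subset size, whereas a diagnostic model would more plausibly be a classifier compared across sizes with an information criterion. The function name `best_subset` and the data are hypothetical.

```python
import itertools
import numpy as np

def best_subset(X, y, size):
    """Exhaustively search all column subsets of a given size.

    Returns the subset with the smallest residual sum of squares (RSS)
    under an ordinary least-squares fit without intercept.
    """
    best_cols, best_rss = None, float("inf")
    for cols in itertools.combinations(range(X.shape[1]), size):
        Xs = X[:, list(cols)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ coef
        rss = float(resid @ resid)
        if rss < best_rss:
            best_cols, best_rss = cols, rss
    return best_cols, best_rss

# Synthetic data: the response depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2]

best_cols, best_rss = best_subset(X, y, size=2)
print(best_cols)
```

The search cost grows combinatorially with the number of candidates, which is one reason a filtering stage usually precedes it.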

Results: In this study, we performed transcriptomic and proteomic sequencing on CMV + UC patients to establish a comprehensive immune microenvironment profile and found 11 specific abnormal immune cells in the CMV + UC group. After applying multi-omics integration algorithms, we identified seven co-expression gene modules and five co-expression protein modules. Subsequently, we used various machine learning algorithms to identify key biomarkers with diagnostic efficacy and constructed an early diagnostic model. We identified a total of eight biomarkers (PPP1R12B, CIRBP, CSNK2A2, DNAJB11, PIK3R4, RRBP1, STX5, TMEM214) that play crucial roles in the immune microenvironment of CMV + UC and exhibit superior diagnostic performance for CMV + UC.

Conclusion: This eight-biomarker model offers a new paradigm for the diagnosis and treatment of IBD patients after CMV infection. Further research into this model will be significant for understanding the changes in the host immune microenvironment following CMV infection.


Citations: 0

Understanding predictions of drug profiles using explainable machine learning models

IF 4.5 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-08-01 DOI: 10.1186/s13040-024-00378-w

Caroline König, Alfredo Vellido

The analysis of absorption, distribution, metabolism, and excretion (ADME) molecular properties is of relevance to drug design, as they directly influence the drug’s effectiveness at its target location. This study concerns their prediction, using explainable Machine Learning (ML) models. The aim of the study is to find which molecular features are relevant to the prediction of the different ADME properties and to measure their impact on the predictive model. The relative relevance of individual features for ADME activity is gauged by estimating feature importance in ML models’ predictions. Feature importance is calculated using feature permutation, and the individual impact of features is measured by SHapley Additive exPlanations (SHAP). The study reveals the relevance of specific molecular descriptors for each ADME property and quantifies their impact on the ADME property prediction. The reported research illustrates how explainable ML models can provide detailed insights about the individual contributions of molecular features to the final prediction of an ADME property, as an effort to support experts in the process of drug candidate selection through a better understanding of the impact of molecular features.
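Permutation feature importance, the first of the two techniques mentioned, can be sketched in a few lines: shuffle one feature at a time and record how much the model's error grows. The example below assumes a plain linear model fitted on synthetic data rather than the study's models or molecular descriptors; the function names are hypothetical, and the SHAP part of the analysis is not covered.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(float(np.mean(increases)))
    return importances

# Synthetic data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)

# Fit an ordinary least-squares model and wrap it as a predict function.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda M: M @ coef

importances = permutation_importance(predict, X, y)
print(importances)
```

Permutation importance describes the model as a whole, whereas SHAP values attribute each individual prediction to its features, which is why the two are often reported together.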


Citations: 0

Modelling the nicotine pharmacokinetic profile for e-cigarettes using real time monitoring of consumers' physiological measurements and mouth level exposure.

IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-07-17 DOI: 10.1186/s13040-024-00375-z

Krishna Prasad, Allen Griffiths, Kavya Agrawal, Michael McEwan, Flavio Macci, Marco Ghisoni, Matthew Stopher, Matthew Napleton, Joel Strickland, David Keating, Thomas Whitehead, Gareth Conduit, Stacey Murray, Lauren Edward

Pharmacokinetic (PK) studies can provide essential information on the abuse liability of nicotine and tobacco products but are intrusive and must be conducted in a clinical environment. The objective of the study was to explore whether changes in plasma nicotine levels following use of an e-cigarette can be predicted from real time monitoring of physiological parameters and mouth level exposure (MLE) to nicotine before, during, and after e-cigarette vaping, using wearable devices. Such an approach would allow an effective pre-screening process, reducing the number of clinical studies, the number of products to be tested, and the number of blood draws required in a clinical PK study. Establishing such a prediction model might facilitate the longitudinal collection of data on product use and nicotine expression among consumers using nicotine products in their normal environments, thereby reducing the need for intrusive clinical studies while generating PK data related to product use in the real world.

An exploratory machine learning model was developed to predict changes in plasma nicotine levels following the use of an e-cigarette from real time monitoring of physiological parameters and MLE to nicotine before, during, and after e-cigarette vaping. This preliminary study identified key parameters, such as heart rate (HR), heart rate variability (HRV), and physiological stress (PS), that may act as predictors for an individual's plasma nicotine response (PK curve). Relative to baseline measurements (per participant), HR showed a significant increase for nicotine-containing e-liquids and was consistent across sessions (intra-participant). Imputing missing values and training the model on all data resulted in a 57% improvement from the original 'learning' data and achieved a median validation R² of 0.70.

The study is in its exploratory phase, with limitations including a small and non-diverse sample and reliance on data from a single e-cigarette product. These findings necessitate further research for validation and to enhance the model's generalisability and applicability in real-world settings. This study serves as a foundational step towards developing non-intrusive PK models for nicotine product use.
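The headline figure is a median validation R², so it may help to see how R² is computed and summarised across validation folds. The sketch below uses the usual definition, one minus the ratio of residual to total sum of squares; all numbers are hypothetical and are not taken from the study.

```python
from statistics import median

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted plasma levels for one validation fold.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
fold_r2 = r_squared(observed, predicted)

# Hypothetical R² values from several folds, summarised by their median.
fold_scores = [0.5, 0.8, 0.6]
median_r2 = median(fold_scores)
print(fold_r2, median_r2)
```

Reporting the median rather than the mean makes the summary less sensitive to a single poorly predicted fold, which matters with a small sample.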


Citations: 0

Construction and application of medication reminder system: intelligent generation of universal medication schedule.

IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-07-15 DOI: 10.1186/s13040-024-00376-y

Hangxing Huang, Lu Zhang, Yongyu Yang, Ling Huang, Xikui Lu, Jingyang Li, Huimin Yu, Shuqiao Cheng, Jian Xiao

Background: Patients with chronic conditions need multiple medications daily to manage their condition. However, most patients have poor compliance, which affects the effectiveness of treatment. To address these challenges, we establish a medication reminder system for the intelligent generation of universal medication schedule (UMS) to remind patients with chronic diseases to take medication accurately and to improve safety of home medication.

Methods: A medication time constraint with one drug (MTCOD) was designed for each drug, and a medication time constraint with multi-drug (MTCMD) for each pair of drugs, in order to better regulate the interval and timing of patients' medication. A medication reminder system was then established, consisting of a cloud database of drug information, an operator terminal for medical staff, and a patient terminal.

Results: The cloud database holds a total of 153,916 pharmaceutical products, 496,708 drug interaction records, and 153,390 pharmaceutical product-ingredient pairs. There are 153,916 MTCOD records and 8,552,712 MTCMD records. An intelligent UMS medication reminder system was constructed. The system can read patients' prescription information and provide personalized medication guidance with a medication timeline for chronic patients. At the same time, patients can query medication information and get remote pharmacy guidance in real time.

Conclusions: Overall, the medication reminder system provides intelligent medication reminders, automatic drug interaction identification, and monitoring, which helps to track the entire treatment process of patients with chronic diseases.
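One way to picture how per-drug and pairwise time constraints could combine into a schedule is a small constraint search. The sketch below is an assumption-laden illustration: the abstract does not define the form of MTCOD or MTCMD, so here a per-drug constraint is modelled as a list of allowed hours and a pairwise constraint as a minimum gap in hours. The drug names, hours, and the function `find_schedule` are hypothetical.

```python
import itertools

def find_schedule(allowed_hours, min_gap_hours):
    """Return the first assignment of one dosing hour per drug that
    respects every pairwise minimum gap, or None if none exists."""
    drugs = sorted(allowed_hours)
    for hours in itertools.product(*(allowed_hours[d] for d in drugs)):
        plan = dict(zip(drugs, hours))
        if all(abs(plan[a] - plan[b]) >= gap
               for (a, b), gap in min_gap_hours.items()):
            return plan
    return None

# Hypothetical per-drug constraints: hours of the day each drug may be taken.
allowed_hours = {"drug_a": [8, 20], "drug_b": [8, 12]}

# Hypothetical pairwise constraint: the two drugs must be at least 2 hours apart.
min_gap_hours = {("drug_a", "drug_b"): 2}

schedule = find_schedule(allowed_hours, min_gap_hours)
print(schedule)
```

A production system would also need multiple daily doses, meal-relative timing, and gaps that wrap around midnight, none of which this sketch attempts.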


Citations: 0

Building RadiologyNET: an unsupervised approach to annotating a large-scale multimodal medical database.

IF 4 Zone 3 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-07-12 DOI: 10.1186/s13040-024-00373-1

Mateja Napravnik, Franko Hržić, Sebastian Tschauner, Ivan Štajduhar

Background: The use of machine learning in medical diagnosis and treatment has grown significantly in recent years with the development of computer-aided diagnosis systems, often based on annotated medical radiology images. However, the lack of large annotated image datasets remains a major obstacle, as the annotation process is time-consuming and costly. This study aims to overcome this challenge by proposing an automated method for annotating a large database of medical radiology images based on their semantic similarity.

Results: An automated, unsupervised approach is used to create a large annotated dataset of medical radiology images originating from the Clinical Hospital Centre Rijeka, Croatia. The pipeline is built by data-mining three different types of medical data: images, DICOM metadata and narrative diagnoses. The optimal feature extractors are then integrated into a multimodal representation, which is then clustered to create an automated pipeline for labelling a precursor dataset of 1,337,926 medical images into 50 clusters of visually similar images. The quality of the clusters is assessed by examining their homogeneity and mutual information, taking into account the anatomical region and modality representation.

Conclusions: The results indicate that fusing the embeddings of all three data sources together provides the best results for the task of unsupervised clustering of large-scale medical data and leads to the most concise clusters. Hence, this work marks the initial step towards building a much larger and more fine-grained annotated dataset of medical radiology images.
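The two ingredients named in this abstract, fusing per-modality embeddings and scoring clusters by homogeneity and mutual information, can be sketched in a few lines. This is an illustration under assumptions, not the RadiologyNET pipeline: the fusion rule (L2-normalize each modality, then concatenate) and all function names are hypothetical, and the labels are toy values.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates the fused vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(*modalities):
    """Assumed fusion rule: concatenate the L2-normalized embeddings."""
    return [x for vec in modalities for x in l2_normalize(vec)]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(labels, clusters):
    """I(labels; clusters) in nats, from the empirical joint distribution."""
    n = len(labels)
    joint = Counter(zip(labels, clusters))
    p_label, p_cluster = Counter(labels), Counter(clusters)
    return sum(
        (c / n) * math.log((c * n) / (p_label[l] * p_cluster[k]))
        for (l, k), c in joint.items()
    )


def homogeneity(labels, clusters):
    """1 - H(labels | clusters) / H(labels), which equals I(labels; clusters) / H(labels)."""
    h = entropy(labels)
    return 1.0 if h == 0 else mutual_information(labels, clusters) / h


# Toy example: anatomical region per image versus the cluster it was assigned to.
regions = ["chest", "chest", "knee", "knee"]
print(homogeneity(regions, [0, 0, 1, 1]))  # each cluster holds one region
print(homogeneity(regions, [0, 1, 0, 1]))  # clusters mix the regions evenly
```

Homogeneity is high when each cluster contains images from a single region or modality, which is the property the abstract uses to judge cluster quality.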


Biodata Mining, vol. 17, no. 1, art. 22. Published 2024-07-12. DOI: 10.1186/s13040-024-00373-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11245804/pdf/

Citations: 0

Transcriptome- and DNA methylation-based cell-type deconvolutions produce similar estimates of differential gene expression and differential methylation.

IF 4 Zone 3 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-07-11 DOI: 10.1186/s13040-024-00374-0

Emily R Hannon, Carmen J Marsit, Arlene E Dent, Paula Embury, Sidney Ogolla, David Midem, Scott M Williams, James W Kazura

Background: Changing cell-type proportions can confound studies of differential gene expression or DNA methylation (DNAm) from peripheral blood mononuclear cells (PBMCs). We examined how cell-type proportions derived from the transcriptome versus the methylome (DNAm) influence estimates of differentially expressed genes (DEGs) and differentially methylated positions (DMPs).

Methods: Transcriptome and DNAm data were obtained from PBMC RNA and DNA of Kenyan children (n = 8) before, during, and 6 weeks following uncomplicated malaria. DEGs and DMPs between time points were detected using cell-type adjusted modeling with Cibersortx or IDOL, respectively.

Results: Most major cell types and principal components had moderate to high correlation between the two deconvolution methods (r = 0.60-0.96). Estimates of cell-type proportions and DEGs or DMPs were largely unaffected by the method, with the greatest discrepancy in the estimation of neutrophils.

Conclusion: Variation in cell-type proportions is captured similarly by both transcriptomic and methylome deconvolution methods for most major cell types.
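The agreement between the two deconvolution methods is reported as a Pearson correlation per cell type. A minimal sketch of that comparison follows; the proportions are invented for illustration and are not data from the study, and the function and cell-type names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def compare_deconvolutions(rna_props, dnam_props):
    """Correlate per-sample proportions for every cell type both methods estimate.

    rna_props, dnam_props: {cell_type: [proportion per sample]}, samples in the same order.
    """
    shared = sorted(set(rna_props) & set(dnam_props))
    return {ct: pearson(rna_props[ct], dnam_props[ct]) for ct in shared}


# Invented proportions for four samples (illustration only).
rna = {"CD4T": [0.30, 0.35, 0.40, 0.45], "Mono": [0.20, 0.15, 0.10, 0.12]}
dnam = {"CD4T": [0.28, 0.36, 0.41, 0.47], "Mono": [0.22, 0.16, 0.09, 0.13]}
for cell_type, r in compare_deconvolutions(rna, dnam).items():
    print(f"{cell_type}: r = {r:.2f}")
```

With only a handful of samples, as in a cohort of n = 8, a single outlying sample can shift r noticeably, so per-cell-type correlations are best read alongside the raw proportions.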


Biodata Mining, vol. 17, no. 1, art. 21. Published 2024-07-11. DOI: 10.1186/s13040-024-00374-0. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11241886/pdf/

Citations: 0

Identification of immune-associated biomarkers of diabetes nephropathy tubulointerstitial injury based on machine learning: a bioinformatics multi-chip integrated analysis.

IF 4 Zone 3 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-07-01 DOI: 10.1186/s13040-024-00369-x

Lin Wang, Jiaming Su, Zhongjie Liu, Shaowei Ding, Yaotan Li, Baoluo Hou, Yuxin Hu, Zhaoxi Dong, Jingyi Tang, Hongfang Liu, Weijing Liu

Background: Diabetic nephropathy (DN) is a major microvascular complication of diabetes and has become the leading cause of end-stage renal disease worldwide. A considerable number of DN patients progress irreversibly to end-stage renal disease because the disease cannot be diagnosed early. Reliable biomarkers that support early diagnosis and treatment are therefore needed. The migration of immune cells to the kidney is considered a key step in the progression of DN-related vascular injury, so markers of this process may be particularly helpful for early diagnosis and for predicting the progression of DN.

Methods: Gene chip data were retrieved from the GEO database using the search term 'diabetic nephropathy'. The 'limma' package was used to identify differentially expressed genes (DEGs) between DN and control samples. Gene set enrichment analysis (GSEA) was performed on genes obtained from the Molecular Signatures Database (MSigDB). The R package 'WGCNA' was used to identify gene modules associated with tubulointerstitial injury in DN, and these were intersected with immune-related DEGs to identify target genes. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the differentially expressed genes using the 'clusterProfiler' package in R. Three methods, least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE) and random forest (RF), were used to select immune-related biomarkers for diagnosis. The tubulointerstitial dataset from the Nephroseq database was retrieved to construct an external validation dataset. Unsupervised clustering analysis of the expression levels of immune-related biomarkers was performed using the 'ConsensusClusterPlus' R package. Urine was collected from patients who visited Dongzhimen Hospital of Beijing University of Chinese Medicine from September 2021 to March 2023, and ELISA was used to detect the mRNA expression level of immune-related biomarkers in urine. Pearson correlation analysis was used to assess the effect of immune-related biomarker expression on renal function in DN patients.

Results: Four microarray datasets from the GEO database were included in the analysis: GSE30122, GSE47185, GSE99340 and GSE104954. These datasets included 63 DN patients and 55 healthy controls. A total of 9415 genes were detected in the data set. We found 153 differentially expressed immune-related genes, of which 112 were up-regulated and 41 were down-regulated, and 119 overlapping genes were identified. GO analysis showed that they were involved in various biological processes, including leukocyte-mediated immunity. KEGG analysis showed that these target genes were mainly involved in the formation of phagosomes in Staphylococcus aureus infection. Among these 119 overlapping genes, machine learning identified AGR2, CCR2, CEBPD, CISH, CX3CR1, DEFB1 and FSTL1 as potential tubulointerstitial immune-related biomarkers. External validation indicated that these markers have diagnostic value in distinguishing DN patients from healthy controls. In the clinical study, urinary expression of AGR2, CX3CR1 and FSTL1 in DN patients correlated negatively with GFR; urinary CX3CR1 and FSTL1 correlated positively with serum creatinine, whereas urinary DEFB1 correlated negatively with serum creatinine. In addition, urinary CX3CR1 correlated positively with proteinuria, while urinary DEFB1 correlated negatively with proteinuria. Finally, DN patients were divided by degree of proteinuria into a nephrotic proteinuria group (n = 24) and a subnephrotic proteinuria group. An unpaired t-test showed significant differences in urinary AGR2, CCR2 and DEFB1 between the two groups (P …).

Conclusions: Our study provides new insights into the role of immune-related biomarkers in DN tubulointerstitial injury and offers potential targets for the early diagnosis and treatment of DN patients. Seven genes (AGR2, CCR2, CEBPD, CISH, CX3CR1, DEFB1, FSTL1) are promising sensitive biomarkers that may influence the progression of DN by regulating immune-inflammatory responses. However, further comprehensive studies are needed to fully understand their exact molecular mechanisms and functional pathways in DN.
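When several feature selectors are run on the same candidate genes, one common way to combine them is to keep the genes chosen by all (or most) of the methods. The abstract does not state how the LASSO, SVM-RFE and RF selections were combined, so the sketch below is an assumption; the function name and the placeholder gene identifiers are hypothetical.

```python
from collections import Counter


def consensus_features(selections, min_votes=None):
    """Return the features picked by at least `min_votes` selectors.

    selections: list of iterables, one per selection method.
    min_votes:  defaults to len(selections), i.e. a strict intersection.
    """
    if min_votes is None:
        min_votes = len(selections)
    # Count each feature once per method, even if a method lists it twice.
    votes = Counter(feature for sel in selections for feature in set(sel))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Placeholder gene identifiers, not results from the study.
lasso = ["G1", "G2", "G3"]
svm_rfe = ["G2", "G3", "G4"]
random_forest = ["G2", "G3", "G4"]

print(consensus_features([lasso, svm_rfe, random_forest]))               # strict intersection
print(consensus_features([lasso, svm_rfe, random_forest], min_votes=2))  # majority vote
```

The same set-overlap step describes intersecting WGCNA module genes with immune-related DEGs: pass the two gene lists as `selections` and keep the default strict intersection.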

Biodata Mining, vol. 17, no. 1, art. 20. Published 2024-07-01. DOI: 10.1186/s13040-024-00369-x. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218417/pdf/

Citations: 0