
Biodata Mining: latest articles

Knowledge-slanted random forest method for high-dimensional data and small sample size with a feature selection application for gene expression data
IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-10 DOI: 10.1186/s13040-024-00388-8
Erika Cantor, Sandra Guauque-Olarte, Roberto León, Steren Chabert, Rodrigo Salas
The use of prior knowledge in the machine learning framework has been considered a potential tool to handle the curse of dimensionality in genetic and genomics data. Although random forest (RF) is a flexible non-parametric approach with several advantages, it can provide poor accuracy in high-dimensional settings, mainly in scenarios with small sample sizes. We propose a knowledge-slanted RF that integrates biological networks as prior knowledge into the model to improve its performance and explainability, exemplifying its use for selecting and identifying relevant genes. Knowledge-slanted RF combines two stages. First, prior knowledge represented by graphs is translated by running a random walk with restart algorithm to determine the relevance of each gene based on its connections and location in a protein-protein interaction network. Then, each relevance is used to modify the probability of drawing that gene as a candidate split feature in the conventional RF. Experiments on simulated datasets with very small sample sizes $$(n \le 30)$$, comparing knowledge-slanted RF against conventional RF and logistic lasso regression, suggest improved precision in outcome prediction. The knowledge-slanted RF was completed with the introduction of a modified version of the Boruta feature selection algorithm. Finally, knowledge-slanted RF identified more biologically relevant genes, offering a higher level of explainability for users than conventional RF. These findings were corroborated in one real case study identifying genes relevant to calcific aortic valve stenosis.
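The two stages described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy network, the gene names `G1` to `G5`, the seed set, the restart probability of 0.3, and the helper names `random_walk_with_restart` and `draw_candidate_features` are all hypothetical, the graph is assumed to be connected, and the abstract does not say how relevance is mapped to selection probability (here it is used directly as a sampling weight).

```python
import random


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker on an undirected graph.

    adj   : dict mapping each node to a non-empty list of neighbours
    seeds : set of nodes the walker jumps back to when it restarts
    """
    nodes = sorted(adj)
    start = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed node...
        nxt = {v: restart * start[v] for v in nodes}
        # ...otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * prob[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[v] - prob[v]) for v in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def draw_candidate_features(weights, mtry, rng):
    """Draw `mtry` distinct features, favouring those with larger weights."""
    pool = dict(weights)
    chosen = []
    for _ in range(min(mtry, len(pool))):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


# Hypothetical protein-protein interaction network (undirected, connected).
network = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2", "G4"],
    "G4": ["G3", "G5"],
    "G5": ["G4"],
}
relevance = random_walk_with_restart(network, seeds={"G1"})
candidates = draw_candidate_features(relevance, mtry=3, rng=random.Random(0))
```

In a conventional RF every gene would have the same chance of entering the candidate set at each split; here genes close to the seed in the network are drawn more often, which is the "slant" the abstract refers to.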
Citations: 0
Enhanced labor pain monitoring using machine learning and ECG waveform analysis for uterine contraction-induced pain.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-07 DOI: 10.1186/s13040-024-00383-z
Yuan-Chia Chu, Saint Shiou-Sheng Chen, Kuen-Bao Chen, Jui-Sheng Sun, Tzu-Kuei Shen, Li-Kuei Chen

Objectives: This study aims to develop an innovative approach for monitoring and assessing labor pain through ECG waveform analysis, utilizing machine learning techniques to monitor pain resulting from uterine contractions.

Methods: The study was conducted at National Taiwan University Hospital between January and July 2020. We collected a dataset of 6010 ECG samples from women preparing for natural spontaneous delivery (NSD). The ECG data was used to develop an ECG waveform-based Nociception Monitoring Index (NoM). The dataset was divided into training (80%) and validation (20%) sets. Multiple machine learning models, including LightGBM, XGBoost, SnapLogisticRegression, and SnapDecisionTree, were developed and evaluated. Hyperparameter optimization was performed using grid search and five-fold cross-validation to enhance model performance.

Results: The LightGBM model demonstrated superior performance with an AUC of 0.96 and an accuracy of 90%, making it the optimal model for monitoring labor pain based on ECG data. Other models, such as XGBoost and SnapLogisticRegression, also showed strong performance, with AUC values ranging from 0.88 to 0.95.

Conclusions: This study demonstrates that the integration of machine learning algorithms with ECG data significantly enhances the accuracy and reliability of labor pain monitoring. Specifically, the LightGBM model exhibits exceptional precision and robustness in continuous pain monitoring during labor, with potential applicability extending to broader healthcare settings.

Trial registration: ClinicalTrials.gov Identifier: NCT04461704.
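The data partitioning described in the Methods (an 80%/20% split, then grid search with five-fold cross-validation on the training part) can be sketched as below. This is a minimal illustration using only index bookkeeping: the helper names, the random seed, and the grid values are assumptions, and although `num_leaves` and `learning_rate` are real LightGBM parameters, the abstract does not say which hyperparameters were actually tuned.

```python
import itertools
import random


def train_validation_split(n_samples, train_fraction, rng):
    """Shuffle sample indices and cut them into training and validation parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k):
    """Yield (fit, held_out) index lists; each sample is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


rng = random.Random(0)
train_idx, val_idx = train_validation_split(6010, 0.8, rng)  # 6010 ECG samples
folds = list(k_fold_indices(train_idx, 5))

# Hypothetical grid: every combination is scored on all five folds.
param_grid = {"num_leaves": [15, 31], "learning_rate": [0.05, 0.1]}
grid = [dict(zip(param_grid, values))
        for values in itertools.product(*param_grid.values())]
n_fits = len(grid) * len(folds)  # number of model fits the search would need
```

The validation indices stay untouched during the search, so the reported AUC and accuracy can be computed on data that played no part in choosing the hyperparameters.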

Citations: 0
The goldmine of GWAS summary statistics: a systematic review of methods and tools.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-05 DOI: 10.1186/s13040-024-00385-x
Panagiota I Kontou, Pantelis G Bagos

Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.

Citations: 0
Processing imbalanced medical data at the data level with assisted-reproduction data as an example.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-04 DOI: 10.1186/s13040-024-00384-y
Junliang Zhu, Shaowei Pu, Jiaji He, Dongchao Su, Weijie Cai, Xueying Xu, Hongbo Liu

Objective: Data imbalance is a pervasive issue in medical data mining, often leading to biased and unreliable predictive models. This study aims to address the urgent need for effective strategies to mitigate the impact of data imbalance on classification models. We focus on quantifying the effects of different imbalance degrees and sample sizes on model performance, identifying optimal cut-off values, and evaluating the efficacy of various methods to enhance model accuracy in highly imbalanced and small sample size scenarios.

Methods: We collected medical records of patients receiving assisted reproductive treatment in a reproductive medicine center. Random forest was used to screen the key variables for the prediction target. Various datasets with different imbalance degrees and sample sizes were constructed to compare the classification performance of logistic regression models. Metrics such as AUC, G-mean, F1-Score, Accuracy, Recall, and Precision were used for evaluation. Four imbalance treatment methods (SMOTE, ADASYN, OSS, and CNN) were applied to datasets with low positive rates and small sample sizes to assess their effectiveness.

Results: The logistic model's performance was low when the positive rate was below 10% but stabilized beyond this threshold. Similarly, sample sizes below 1200 yielded poor results, with improvement seen above this threshold. For robustness, the optimal cut-offs for positive rate and sample size were identified as 15% and 1500, respectively. SMOTE and ADASYN oversampling significantly improved classification performance in datasets with low positive rates and small sample sizes.

Conclusions: The study identifies a positive rate of 15% and a sample size of 1500 as optimal cut-offs for stable logistic model performance. For datasets with low positive rates and small sample sizes, SMOTE and ADASYN are recommended to improve balance and model accuracy.
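To make the recommended oversampling concrete, here is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. The function name `smote`, the toy coordinates, and the class counts are hypothetical, and this is not the study's pipeline; production work would normally rely on a maintained library. Note also that "CNN" in the list of methods presumably denotes the condensed nearest neighbour undersampling rule rather than a convolutional network.

```python
import random


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points from a list of minority-class tuples."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: 4 positive cases among 24 records.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
n_majority = 20
new_points = smote(minority, n_new=6, k=2, rng=random.Random(0))

rate_before = len(minority) / (len(minority) + n_majority)
rate_after = (len(minority) + len(new_points)) / (
    len(minority) + len(new_points) + n_majority
)
```

Because the new points are interpolations, they stay inside the region already occupied by the minority class; ADASYN follows the same idea but generates more points around minority samples that are harder to classify.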

Citations: 0
QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-04 DOI: 10.1186/s13040-024-00386-w
Bolin Chen, Jinlei Zhang, Ci Shao, Jun Bian, Ruiming Kang, Xuequn Shang

Background: Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare the changes of biomolecules between normal and disease samples, or detect important vertices in a single static biomolecular network, and so often overlook the dynamic changes that occur between different disease stages. However, investigating temporal changes in biomolecular networks and identifying critical genes is essential for understanding the occurrence and development of diseases.

Methods: This study proposes a novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD). It first constructs a time series network by integrating both intra- and inter-temporal network information, preserving connections between networks at adjacent stages according to their local similarities. A tensor is employed to describe the connections of this time series network, and a 3-order tensor decomposition method is proposed to capture both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also a learning-free and efficient method that can be applied to datasets with a small number of samples.

Results: The effectiveness of QIGTD was evaluated on lung adenocarcinoma (LUAD) datasets, with three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) employed as benchmarks. Numerical experimental results demonstrate that QIGTD outperforms these methods in terms of both precision and mAP. Notably, of the top 50 genes, 29 have been verified to be highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation.

Conclusion: QIGTD effectively captures the temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.
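The abstract does not specify which 3-order decomposition QIGTD uses, so the following is only a generic illustration of the underlying idea: stack the adjacency matrices of the network snapshots into a tensor indexed by (gene, gene, stage), extract its leading rank-1 term by alternating power iteration, and read the gene-mode factor as an importance score. The gene names, the three toy snapshots, and the function `rank_one_factors` are hypothetical, and the inter-temporal connections mentioned in the Methods are omitted here.

```python
def rank_one_factors(tensor, n_iter=100):
    """Leading rank-1 term of a non-zero 3-way tensor by power iteration.

    tensor[i][j][t] holds the weight of edge (i, j) in snapshot t.
    Returns unit-length factor vectors for the two gene modes and the time mode.
    """
    n, m, s = len(tensor), len(tensor[0]), len(tensor[0][0])

    def normalise(vec):
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec]

    u, v, w = normalise([1.0] * n), normalise([1.0] * m), normalise([1.0] * s)
    for _ in range(n_iter):
        # Update each factor in turn while holding the other two fixed.
        u = normalise([sum(tensor[i][j][t] * v[j] * w[t]
                           for j in range(m) for t in range(s)) for i in range(n)])
        v = normalise([sum(tensor[i][j][t] * u[i] * w[t]
                           for i in range(n) for t in range(s)) for j in range(m)])
        w = normalise([sum(tensor[i][j][t] * u[i] * v[j]
                           for i in range(n) for j in range(m)) for t in range(s)])
    return u, v, w


# Hypothetical gene network observed at three disease stages.
genes = ["A", "B", "C", "D"]
index = {g: i for i, g in enumerate(genes)}
snapshots = [
    [("A", "B"), ("A", "C")],
    [("A", "B"), ("A", "C"), ("A", "D")],
    [("A", "D"), ("B", "C")],
]
tensor = [[[0.0] * len(snapshots) for _ in genes] for _ in genes]
for t, edges in enumerate(snapshots):
    for a, b in edges:
        tensor[index[a]][index[b]][t] = 1.0
        tensor[index[b]][index[a]][t] = 1.0  # undirected edge

gene_factor, _, stage_factor = rank_one_factors(tensor)
scores = dict(zip(genes, gene_factor))
ranking = sorted(genes, key=lambda g: -scores[g])
```

No training labels are involved, which matches the "learning-free" property claimed in the abstract; the stage factor additionally indicates how strongly each snapshot contributes to the dominant pattern.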

Citations: 0
Seven quick tips for gene-focused computational pangenomic analysis.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-03 DOI: 10.1186/s13040-024-00380-2
Vincenzo Bonnici, Davide Chicco

Pangenomics is a relatively new scientific field which investigates the union of all the genomes of a clade. The word pan means everything in ancient Greek; the term pangenomics originally concerned bacterial genomes and was later extended to human genomes as well. Modern bioinformatics offers several tools to analyze pangenomics data, paving the way to an emerging field that we can call computational pangenomics. The computational power currently available to the bioinformatics community has made computational pangenomic analyses easy to perform, but this greater accessibility also increases the chances of making mistakes and producing misleading or inflated results, especially for beginners. To handle this problem, we present here a few quick tips for efficient and correct computational pangenomic analyses with a focus on bacterial pangenomics, describing common mistakes to avoid and best practices, drawn from experience, to follow in this field. We believe our recommendations can help readers perform more robust and sound pangenomic analyses and generate more reliable results.
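In gene-focused terms, "the union of all the genomes of a clade" can be illustrated with plain set operations on per-genome gene sets. The strain and gene names below are hypothetical, and the sketch assumes genes have already been grouped into comparable families, which in real analyses is a substantial step of its own.

```python
# Hypothetical gene content of three strains of the same clade.
genomes = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneC", "geneD", "geneE"},
}

# Pangenome: every gene seen in at least one genome.
pangenome = set().union(*genomes.values())

# Core genome: genes shared by all genomes.
core = set.intersection(*genomes.values())

# Accessory genome: genes present in some genomes but not all.
accessory = pangenome - core

# Genes found in exactly one genome.
unique = {
    gene for gene in pangenome
    if sum(gene in content for content in genomes.values()) == 1
}
```

With these toy sets the core shrinks to a single gene while the pangenome keeps growing as strains are added, which is why conclusions drawn from few genomes can be misleading.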

Citations: 0
Deep learning for automatic calcium detection in echocardiography.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-08-28 DOI: 10.1186/s13040-024-00381-1
Luís B Elvas, Sara Gomes, João C Ferreira, Luís Brás Rosário, Tomás Brandão

Cardiovascular diseases are the main cause of death in the world, and cardiovascular imaging techniques are the mainstay of noninvasive diagnosis. Aortic stenosis is a lethal cardiac disease preceded for several years by aortic valve calcification. Data-driven tools developed with Deep Learning (DL) algorithms can process and categorize medical image data, providing fast diagnoses with considerable reliability to improve healthcare effectiveness. A systematic review of DL applications on medical images for pathologic calcium detection concluded that there are established techniques in this field, primarily using CT scans, at the expense of radiation exposure. Echocardiography is a largely unexplored alternative for detecting calcium, but it still needs technological development. In this article, a fully automated method based on Convolutional Neural Networks (CNNs) was developed to detect aortic calcification in echocardiography images. It consists of two essential processes: (1) an object detector to locate the aortic valve, which achieved 95% precision and 100% recall; and (2) a classifier to identify calcium structures in the valve, which achieved 92% precision and 100% recall. The outcome of this work is the possibility of automating the echocardiographic detection of aortic valve calcification, a lethal and prevalent disease.
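As a reminder of what the reported figures mean, precision and recall follow directly from counts of true positives, false positives, and false negatives. The counts below are hypothetical, chosen only so that the ratios reproduce the percentages quoted in the abstract; they are not the study's data.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts (not taken from the paper).
detector_precision, detector_recall = precision_recall(tp=19, fp=1, fn=0)
classifier_precision, classifier_recall = precision_recall(tp=23, fp=2, fn=0)
```

A recall of 100% combined with a precision below 100% means that no true case was missed but some flagged cases were false alarms, which is often the preferred trade-off for a screening tool.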

Citations: 0
Integrating transcriptomics and proteomics to analyze the immune microenvironment of cytomegalovirus associated ulcerative colitis and identify relevant biomarkers.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-27 DOI: 10.1186/s13040-024-00382-0
Yang Chen, Qingqing Zheng, Hui Wang, Peiren Tang, Li Deng, Pu Li, Huan Li, Jianhong Hou, Jie Li, Li Wang, Jun Peng

Background: In recent years, significant morbidity and mortality in patients with severe inflammatory bowel disease (IBD) and cytomegalovirus (CMV) have drawn considerable attention to the status of CMV infection in the intestinal mucosa of IBD patients and its role in disease progression. However, there are currently no high-throughput sequencing data for ulcerative colitis patients with CMV infection (CMV + UC), and the immune microenvironment in CMV + UC patients has yet to be explored.

Method: The xCell algorithm was used to evaluate the immune microenvironment of CMV + UC patients. Then, WGCNA analysis was performed to obtain the co-expression modules between abnormal immune cells and gene-level or protein-level data. Next, three machine learning approaches (Random Forest, SVM-RFE, and Lasso) were used to filter candidate biomarkers. Finally, the Best Subset Selection algorithm was used to construct the diagnostic model.

Results: In this study, we performed transcriptomic and proteomic sequencing on CMV + UC patients to establish a comprehensive immune microenvironment profile and found 11 specific abnormal immune cells in the CMV + UC group. After using multi-omics integration algorithms, we identified seven co-expression gene modules and five co-expression protein modules. Subsequently, we used various machine learning algorithms to identify key biomarkers with diagnostic efficacy and constructed an early diagnostic model. We identified a total of eight biomarkers (PPP1R12B, CIRBP, CSNK2A2, DNAJB11, PIK3R4, RRBP1, STX5, TMEM214) that play crucial roles in the immune microenvironment of CMV + UC and exhibit superior diagnostic performance for CMV + UC.

Conclusion: This eight-biomarker model offers a new paradigm for the diagnosis and treatment of IBD patients after CMV infection. Further research into this model will be significant for understanding the changes in the host immune microenvironment following CMV infection.

Citations: 0
Understanding predictions of drug profiles using explainable machine learning models
IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 DOI: 10.1186/s13040-024-00378-w
Caroline König, Alfredo Vellido
The analysis of absorption, distribution, metabolism, and excretion (ADME) molecular properties is of relevance to drug design, as they directly influence the drug’s effectiveness at its target location. This study concerns their prediction, using explainable Machine Learning (ML) models. The aim of the study is to find which molecular features are relevant to the prediction of the different ADME properties and measure their impact on the predictive model. The relative relevance of individual features for ADME activity is gauged by estimating feature importance in ML models’ predictions. Feature importance is calculated using feature permutation, and the individual impact of features is measured by SHapley Additive exPlanations (SHAP). The study reveals the relevance of specific molecular descriptors for each ADME property and quantifies their impact on the ADME property prediction. The reported research illustrates how explainable ML models can provide detailed insights about the individual contributions of molecular features to the final prediction of an ADME property, as an effort to support experts in the process of drug candidate selection through a better understanding of the impact of molecular features.
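The feature-permutation idea mentioned in this abstract can be sketched in a few lines: shuffle one feature column at a time and record how much the prediction error grows. This is a generic illustration using only the Python standard library, not the authors' pipeline; the toy data, the `predict` function, and the helper names `mse` and `permutation_importance` are all assumptions made for the example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other features intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (hypothetical): the target depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def predict(row):
    """Stand-in for a trained model; it ignores feature 1 entirely."""
    return 3.0 * row[0]


importances = permutation_importance(predict, X, y)
print(importances)
```

In this toy setup, shuffling feature 0 degrades the predictions while shuffling feature 1 leaves them unchanged, so feature 0 receives the larger importance. With a real model, the same procedure is normally run on held-out data.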
Citations: 0
Modelling the nicotine pharmacokinetic profile for e-cigarettes using real time monitoring of consumers' physiological measurements and mouth level exposure.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-17 DOI: 10.1186/s13040-024-00375-z
Krishna Prasad, Allen Griffiths, Kavya Agrawal, Michael McEwan, Flavio Macci, Marco Ghisoni, Matthew Stopher, Matthew Napleton, Joel Strickland, David Keating, Thomas Whitehead, Gareth Conduit, Stacey Murray, Lauren Edward

Pharmacokinetic (PK) studies can provide essential information on the abuse liability of nicotine and tobacco products, but they are intrusive and must be conducted in a clinical environment. The objective of the study was to explore whether changes in plasma nicotine levels following use of an e-cigarette can be predicted from real-time monitoring, with wearable devices, of physiological parameters and mouth level exposure (MLE) to nicotine before, during, and after e-cigarette vaping. Such an approach would allow an effective pre-screening process, reducing the number of clinical studies, the number of products to be tested, and the number of blood draws required in a clinical PK study. Establishing such a prediction model might facilitate the longitudinal collection of data on product use and nicotine expression among consumers using nicotine products in their normal environments, thereby reducing the need for intrusive clinical studies while generating PK data related to product use in the real world.

An exploratory machine learning model was developed to predict changes in plasma nicotine levels following the use of an e-cigarette from real-time monitoring of physiological parameters and MLE to nicotine before, during, and after e-cigarette vaping. This preliminary study identified key parameters, such as heart rate (HR), heart rate variability (HRV), and physiological stress (PS), that may act as predictors of an individual's plasma nicotine response (PK curve). Relative to per-participant baseline measurements, HR showed a significant increase for nicotine-containing e-liquids, and this increase was consistent across sessions (intra-participant). Imputing missing values and training the model on all data resulted in a 57% improvement over the original 'learning' data and achieved a median validation R² of 0.70.

The study is in its exploratory phase, with limitations including a small and non-diverse sample and reliance on data from a single e-cigarette product. These findings necessitate further research for validation and to enhance the model's generalisability and applicability in real-world settings. This study serves as a foundational step towards developing non-intrusive PK models for nicotine product use.

Citations: 0