Biodata Mining: Latest Articles

circGPAcorr: an integrative tool for functional annotation of circular RNAs using expression data.
IF 6.1; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-08-01; DOI: 10.1186/s13040-025-00468-3
Petr Ryšavý, Alikhan Anuarbekov, Michaela Dostálová Merkerová, Jiří Kléma

Circular RNAs play a crucial role in cell development and serve as biomarkers in many diseases. Nevertheless, the function of many circular RNAs remains unknown. This function can be inferred from sponging and silencing interactions with micro RNAs and messenger RNAs. We recently proposed a network-based circRNA functional annotation tool, circGPA. However, validation data for RNA interactions are often sparse and predicted interactions contain many false positives. To address this issue, we propose an extended algorithm named circGPAcorr, which uses expression data to weight the interactions, resulting in more precise functional annotation. To assess the significance of the results, the p-value is calculated using reduction to circGPA, a generating-polynomial-based method. We show that the problem is #P-hard, and thus computationally difficult. The circGPAcorr algorithm is tested on publicly available myelodysplastic syndromes expression data, providing gene ontology annotations that align with the literature on myelodysplastic syndromes. At the same time, we demonstrate its performance in the circRNA-disease annotation task.
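
The abstract says the p-value is obtained through a generating-polynomial-based method but gives no details. As a rough illustration of the general idea only, the sketch below computes an exact p-value for a toy statistic: the sum of nonnegative integer scores over a subset of fixed size k, under the null that every size-k subset is equally likely. The number of subsets with total t is the coefficient of x^t y^k in the product of (1 + y x^(s_i)) over all items. The statistic, the null model, and the names `subset_sum_counts` and `exact_p_value` are assumptions made for this example, not the circGPA or circGPAcorr definitions.

```python
from math import comb


def subset_sum_counts(scores, k):
    """Return counts where counts[t] is the number of size-k subsets with total score t.

    This expands prod_i (1 + y * x**s_i) and keeps the coefficients of y**k.
    Scores must be nonnegative integers (a simplifying assumption of this sketch).
    """
    total = sum(scores)
    # poly[j][t] = number of subsets of size j with total t among the items seen so far
    poly = [[0] * (total + 1) for _ in range(k + 1)]
    poly[0][0] = 1
    for s in scores:
        # Go through subset sizes in descending order so each item is used at most once.
        for j in range(k, 0, -1):
            smaller, current = poly[j - 1], poly[j]
            for t in range(total, s - 1, -1):
                current[t] += smaller[t - s]
    return poly[k]


def exact_p_value(scores, k, observed):
    """P(total >= observed) when a size-k subset is drawn uniformly at random."""
    counts = subset_sum_counts(scores, k)
    at_least = sum(c for t, c in enumerate(counts) if t >= observed)
    return at_least / comb(len(scores), k)
```

For scores [1, 2, 3] and k = 2 the subset totals are 3, 4, and 5, so `exact_p_value([1, 2, 3], 2, 4)` is 2/3. Real-valued weights, such as expression correlations, would have to be discretized before this kind of polynomial expansion applies; how the paper handles that is not stated in the abstract.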

Biodata Mining 18(1):50. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12317645/pdf/
Citations: 0
BioAug-Net: a bioimage sensor-driven attention-augmented segmentation framework with physiological coupling for early prostate cancer detection in T2-weighted MRI.
IF 6.1; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-07-29; DOI: 10.1186/s13040-025-00467-4
Muhammad Arshad, Chengliang Wang, Muhammad Wajeeh Us Sima, Jamshed Ali Shaikh, Hanen Karamti, Raed Alharthi, Julius Selecky

Accurate segmentation of the prostate peripheral zone (PZ) in T2-weighted MRI is critical for the early detection of prostate cancer. Existing segmentation methods are hindered by significant inter-observer variability (37.4 ± 5.6%), poor boundary localization, and the presence of motion artifacts, along with challenges in clinical integration. In this study, we propose BioAug-Net, a novel framework that integrates real-time physiological signal feedback with MRI data, leveraging transformer-based attention mechanisms and a probabilistic clinical decision support system (PCDSS). BioAug-Net features a dual-branch asymmetric attention mechanism: one branch processes spatial MRI features, while the other incorporates temporal sensor signals through a BiGRU-driven adaptive masking module. Additionally, a Markov Decision Process-based PCDSS maps segmentation outputs to clinical PI-RADS scores, with uncertainty quantification. We validated BioAug-Net on a multi-institutional dataset (n = 1,542) and demonstrated state-of-the-art performance, achieving a Dice Similarity Coefficient of 89.7% (p < 0.001), sensitivity of 91.2% (p < 0.001), specificity of 88.4% (p < 0.001), and HD95 of 2.14 mm (p < 0.001), outperforming U-Net, Attention U-Net, and TransUNet. Sensor integration improved segmentation accuracy by 12.6% (p < 0.001) and reduced inter-observer variation by 48.3% (p < 0.001). Radiologist evaluations (n = 3) confirmed a 15.0% reduction in diagnosis time (p = 0.003) and an increase in inter-reader agreement from κ = 0.68 to κ = 0.82 (p = 0.001). Our results show that BioAug-Net offers a clinically viable solution for early prostate cancer detection through enhanced physiological coupling and explainable AI diagnostics.
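
For readers unfamiliar with the overlap metrics quoted above, the following minimal sketch shows how the Dice Similarity Coefficient, sensitivity, and specificity are commonly computed from a predicted and a reference binary mask. It assumes flattened masks given as sequences of 0/1 values; the function name `segmentation_metrics` is hypothetical, and the paper's own evaluation code is not part of this listing. HD95, a boundary-distance metric, is not covered here.

```python
def segmentation_metrics(pred, truth):
    """Compute Dice, sensitivity, and specificity for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # predicted foreground, truly foreground
        elif p:
            fp += 1          # predicted foreground, truly background
        elif t:
            fn += 1          # missed foreground
        else:
            tn += 1          # correctly predicted background

    def ratio(num, den):
        # An undefined ratio (empty denominator) is reported as NaN rather than guessed.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

With `pred = [1, 1, 0, 0, 0]` and `truth = [1, 0, 1, 0, 0]` there is one true positive, one false positive, one false negative, and two true negatives, giving Dice 0.5, sensitivity 0.5, and specificity 2/3.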

Biodata Mining 18(1):49. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309236/pdf/
Citations: 0
Can open source large language models be used for tumor documentation in Germany? An evaluation on urological doctors' notes.
IF 6.1; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-07-24; DOI: 10.1186/s13040-025-00463-8
Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer

Background: Tumor documentation in Germany is currently a largely manual process. It involves reading the textual patient documentation and filling in forms in dedicated databases to obtain structured data. Advances in information extraction techniques that build on large language models (LLMs) could have the potential for enhancing the efficiency and reliability of this process. Evaluating LLMs in the German medical domain, especially their ability to interpret specialized language, is essential to determine their suitability for the use in clinical documentation. Due to data protection regulations, only locally deployed open source LLMs are generally suitable for this application.

Methods: The evaluation employs eleven different open source LLMs with sizes ranging from 1 to 70 billion model parameters. Three basic tasks were selected as representative examples for the tumor documentation process: identifying tumor diagnoses, assigning ICD-10 codes, and extracting the date of first diagnosis. For evaluating the LLMs on these tasks, a dataset of annotated text snippets based on anonymized doctors' notes from urology was prepared. Different prompting strategies were used to investigate the effect of the number of examples in few-shot prompting and to explore the capabilities of the LLMs in general.

Results: The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12B performed comparably well in the tasks. Models with less extensive training data or fewer than 7 billion parameters showed notably lower performance, while larger models did not display performance gains. Examples from a different medical domain than urology could also improve the outcome in few-shot prompting, which demonstrates the ability of LLMs to handle tasks needed for tumor documentation.

Conclusions: Open source LLMs show strong potential for automating tumor documentation. Models with 7 to 12 billion parameters could offer an optimal balance between performance and resource efficiency. With tailored fine-tuning and well-designed prompting, these models might become important tools for clinical documentation in the future. The code for the evaluation is available from https://github.com/stefan-m-lenz/UroLlmEval . We also release the data set at https://huggingface.co/datasets/stefan-m-lenz/UroLlmEvalSet , providing a valuable resource that addresses the shortage of authentic and easily accessible benchmarks in German-language medical NLP.
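
To make the few-shot setup described in the Methods more concrete, here is a minimal sketch of how a prompt with a variable number of examples could be assembled and how an ICD-10 code could be pulled from a model's free-text answer. Everything in it is an assumption for illustration: the prompt wording, the function names, the regular expression, and the two made-up example notes. It does not reproduce the prompts or code in the UroLlmEval repository, and it makes no model call.

```python
import re

# Letter, two digits, optional decimal part (e.g. C61 or C67.9). This is a
# simplified pattern for illustration, not a full ICD-10 validator.
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b")


def build_few_shot_prompt(examples, snippet, n_shots):
    """Assemble an instruction, the first n_shots (note, code) examples, and the query note."""
    blocks = ["Assign the ICD-10 code for the tumor diagnosis in the note."]
    for note, code in examples[:n_shots]:
        blocks.append(f"Note: {note}\nICD-10: {code}")
    blocks.append(f"Note: {snippet}\nICD-10:")
    return "\n\n".join(blocks)


def extract_icd10(model_output):
    """Return the first ICD-10-like code found in the model's answer, or None."""
    match = ICD10_PATTERN.search(model_output)
    return match.group(0) if match else None


# Invented example notes, not taken from the released data set.
EXAMPLES = [
    ("Prostatakarzinom, ED 03/2021", "C61"),
    ("Urothelkarzinom der Harnblase", "C67.9"),
]
```

Calling `build_few_shot_prompt(EXAMPLES, "Nierenzellkarzinom rechts", n_shots=1)` yields a prompt with one worked example followed by the query; varying `n_shots` from zero upward is one way to study the effect of the number of examples.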

Biodata Mining 18(1):48. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12291363/pdf/
Citations: 0
The ethics of data mining in healthcare: challenges, frameworks, and future directions.
IF 4; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-07-11; DOI: 10.1186/s13040-025-00461-w
Mohamed Mustaf Ahmed, Olalekan John Okesanya, Majd Oweidat, Zhinya Kawa Othman, Shuaibu Saidu Musa, Don Eliseo Lucero-Prisno III

Data mining in healthcare offers transformative insights yet surfaces multilayered ethical and governance challenges that extend beyond privacy alone. Privacy and consent concerns remain paramount when handling sensitive medical data, particularly as healthcare organizations increasingly share patient information with large digital platforms. The risks of data breaches and unauthorized access are stark: 725 reportable incidents in 2023 alone exposed more than 133 million patient records, and hacking-related breaches have surged by 239% since 2018. Algorithmic bias further threatens equity; models trained on historically prejudiced data can reinforce health disparities across protected groups. Therefore, transparency must span three levels (dataset documentation, model interpretability, and post-deployment audit logging) to make algorithmic reasoning and failures traceable. Security vulnerabilities in the Internet of Medical Things (IoMT) and cloud-based health platforms amplify these risks, while corporate data-sharing deals complicate questions of data ownership and patient autonomy. A comprehensive response requires (i) dataset-level artifacts such as "datasheets," (ii) model cards that disclose fairness metrics, and (iii) continuous logging of predictions and LIME/SHAP explanations for independent audits. Technical safeguards must blend differential privacy (with empirically validated noise budgets), homomorphic encryption for high-value queries, and federated learning to maintain the locality of raw data. Governance frameworks must also mandate routine bias and robust audits and harmonized penalties for non-compliance. Regular reassessments, thorough documentation, and active engagement with clinicians, patients, and regulators are critical to accountability. This paper synthesizes current evidence, from a 2019 European re-identification study demonstrating 99.98% uniqueness with 15 quasi-identifiers to recent clinical audits that trimmed false-negative rates via threshold recalibration, and proposes an integrated set of fairness, privacy, and security controls aligned with SPIRIT-AI, CONSORT-AI, and emerging PROBAST-AI guidelines. Implementing these solutions will help healthcare systems harness the benefits of data mining while safeguarding patient rights and sustaining public trust.
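
As one concrete instance of the differential-privacy safeguard mentioned above, the sketch below applies the standard Laplace mechanism to a counting query. A count changes by at most 1 when one record is added or removed, so Laplace noise with scale 1/epsilon gives epsilon-differential privacy for that single query. The function names and the toy record layout are assumptions for this example; the paper does not prescribe an implementation, and choosing epsilon (the "noise budget") remains a policy decision.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    The sensitivity of a counting query is 1, so the noise scale is 1 / epsilon.
    Smaller epsilon means stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, `private_count(patients, lambda r: r["age"] >= 65, epsilon=0.5)` would release the number of patients aged 65 or over with noise of scale 2. Each released query spends part of the overall privacy budget, so repeated queries need their epsilons accounted for together.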

Biodata Mining 18(1):47. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255135/pdf/
Citations: 0
Vibe coding: a new paradigm for biomedical software development.
IF 4; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-07-01; DOI: 10.1186/s13040-025-00462-9
Jason H Moore, Nicholas Tatonetti
Biodata Mining 18(1):46. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217882/pdf/
Citations: 0
Heart rate transition patterns reveal autonomic dysfunction in heart failure with renal function decline: a symbolic and Markov model approach.
IF 4; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-06-20; DOI: 10.1186/s13040-025-00460-x
Namareq Widatalla, Sona Al Younis, Ahsan Khandoker

Around half of heart failure (HF) patients develop chronic kidney disease (CKD) and early detection of renal impairment in HF remains a clinical challenge. Both HF and CKD are characterized by autonomic dysfunction, suggesting that early identification of autonomic dysregulation may assist in early diagnosis and intervention. Conventional heart rate variability (HRV) metrics serve as non-invasive markers of autonomic nervous system (ANS) function; however, they are limited in their ability to capture directional and nonlinear dynamics associated with autonomic impairment during renal function decline. In this study, we digitized heart rate (HR) changes from 5-minute electrocardiogram (ECG) recordings in 358 patients with chronic HF (CHF). We applied a first-order Markov model and motif pattern analyses to compare HR transition dynamics between patients with normal and reduced estimated glomerular filtration rate (eGFR). The results revealed decreased monotonic HR transitions and increased tonic fluctuations in patients with reduced eGFR. Building on these findings, we introduced a transition stability index (TSI), which was significantly lower in patients with reduced eGFR compared to those with normal eGFR (p < 0.05). These results suggest that TSI may serve as a novel indicator of autonomic dysfunction associated with renal decline. Motif analysis further supported these findings by identifying distinctive HR transition patterns in patients with low eGFR.
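
The symbolic and first-order Markov steps described above can be sketched as follows. Successive heart-rate values are turned into symbols for an increase, a decrease, or no change, and the transition matrix holds the estimated probability of each symbol following each other symbol. The three-letter alphabet (`U`, `D`, `S`), the `tolerance` parameter, and the function names are assumptions for this illustration; the abstract does not give the authors' digitization scheme, and the transition stability index (TSI) is not defined there, so it is not reproduced.

```python
from collections import Counter


def symbolize(hr, tolerance=0.0):
    """Map successive HR differences to 'U' (up), 'D' (down), or 'S' (steady)."""
    symbols = []
    for prev, curr in zip(hr, hr[1:]):
        delta = curr - prev
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return symbols


def transition_matrix(symbols, states=("U", "D", "S")):
    """Estimate first-order transition probabilities P(next | current) from a symbol sequence."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    matrix = {}
    for current in states:
        row_total = sum(pair_counts[(current, nxt)] for nxt in states)
        matrix[current] = {
            # Rows with no observed transitions are left as all zeros.
            nxt: (pair_counts[(current, nxt)] / row_total if row_total else 0.0)
            for nxt in states
        }
    return matrix
```

For the series `[60, 62, 64, 63, 63, 65]` the symbols are `U, U, D, S, U`, so the row for `U` assigns probability 0.5 to `U` and 0.5 to `D`. Monotonic runs show up as large `U` to `U` and `D` to `D` entries, which is the kind of pattern the abstract compares between eGFR groups.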

Biodata Mining 18(1):45. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12180264/pdf/
Citations: 0
A compact encoding of the genome suitable for machine learning prediction of traits and genetic risk scores.
IF 4; Zone 3, Biology; Q1, Mathematical & Computational Biology; Pub Date: 2025-06-19; DOI: 10.1186/s13040-025-00459-4
Yasaman Fatapour, James P Brody

Genotype to phenotype prediction is a central problem in biology and medicine. Machine learning is a natural tool to address this problem. However, a person's genotype is usually represented by a few million single-nucleotide polymorphisms and most datasets only have a few thousand patients. Thus, this problem typically has many more predictors than the number of samples (patients), making it unsuitable for machine learning. The objective of this paper is to examine the efficacy of a compact genotype representation, which employs a limited number of predictors, in predicting a person's phenotype through the application of machine learning. We characterized a person's genotype using chromosome-scale length variation, a measure that is computed as the average value of reported log R ratios across a portion of a chromosome. We computed these numbers from data collected by the NIH All of Us program. We used the AutoML function (h2o.ai) in binary classification mode to identify the best models to differentiate between male/female, Black/white, white/Asian, and Black/Asian. We also used the AutoML function in regression mode to predict the height of people based on their age and genotype. Our results showed that we could effectively classify a person, using only information from chromosomes 1-22, as Male/Female (AUC = 0.9988 ± 0.0001), White/Black (AUC = 0.970 ± 0.002), Asian/White (AUC = 0.877 ± 0.002), and Black/Asian (AUC = 0.966 ± 0.002). This approach also effectively predicted height. In conclusion, we have shown that this compact representation of a person's genotype, along with machine learning, can effectively predict a person's phenotype.
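
The compact representation described above, the average of reported log R ratios across a portion of a chromosome, can be sketched in a few lines. The sketch assumes probes arrive as `(chromosome, position, log_r_ratio)` tuples and that each chromosome is cut into equal-width portions; the function name, the input layout, and the equal-width partitioning are assumptions for this example, since the abstract does not say how portions are defined.

```python
from collections import defaultdict
from statistics import fmean


def chromosome_scale_features(probes, chrom_lengths, parts_per_chrom=4):
    """Average log R ratio per (chromosome, portion).

    probes: iterable of (chrom, position, log_r_ratio) tuples
    chrom_lengths: dict mapping chromosome name to its length in base pairs
    """
    bins = defaultdict(list)
    for chrom, position, log_r_ratio in probes:
        length = chrom_lengths[chrom]
        # Index of the equal-width portion containing this probe; the last
        # portion also absorbs a probe sitting exactly at the chromosome end.
        part = min(position * parts_per_chrom // length, parts_per_chrom - 1)
        bins[(chrom, part)].append(log_r_ratio)
    return {key: fmean(values) for key, values in sorted(bins.items())}
```

With 22 autosomes and four portions each, a person is described by at most 88 numbers instead of millions of SNPs, which is what makes the predictor count small relative to the number of samples. The resulting feature dictionary can be flattened into one row per person for any tabular learner.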

Biodata Mining 18(1):44. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12180147/pdf/
引用次数: 0
Recent advances in deep learning for protein-protein interaction: a review.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-06-16 DOI: 10.1186/s13040-025-00457-6
Jiafu Cui, Siqi Yang, Litai Yi, Qilemuge Xi, Dezhi Yang, Yongchun Zuo

Deep learning, a cornerstone of artificial intelligence, is driving rapid advances in computational biology. Protein-protein interactions (PPIs) are fundamental regulators of biological functions, and the adoption of deep learning in PPI research is transforming the field. There is therefore an urgent need for a comprehensive review and assessment of recent developments to improve analytical methods and open up a wider range of biomedical applications. This review assesses deep learning progress in PPI prediction from 2021 to 2025. We evaluate core architectures (GNNs, CNNs, RNNs) and pioneering approaches: attention-driven Transformers, multi-task frameworks, multimodal integration of sequence and structural data, transfer learning via BERT and ESM, and autoencoders for interaction characterization. We also examine enhanced algorithms for dealing with data imbalances, variations, and high-dimensional feature sparsity, as well as industry challenges (including shifting protein interactions, interactions with non-model organisms, and rare or unannotated protein interactions), and offer perspectives on the future of the field. In summary, this review covers the latest advances and existing challenges in deep learning for protein interaction analysis, providing a reference for researchers in computational biology and deep learning.

Biodata Mining 18(1): 43. Published 2025-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12168265/pdf/
Citations: 0
Enhancing hepatopathy clinical trial efficiency: a secure, large language model-powered pre-screening pipeline.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-06-14 DOI: 10.1186/s13040-025-00458-5
Xiongbin Gui, Hanlin Lv, Xiao Wang, Longting Lv, Yi Xiao, Lei Wang

Background: Recruitment for cohorts involving complex liver diseases, such as hepatocellular carcinoma and liver cirrhosis, often requires interpreting semantically complex criteria. Traditional manual screening methods are time-consuming and prone to errors. While AI-powered pre-screening offers potential solutions, challenges remain regarding accuracy, efficiency, and data privacy.

Methods: We developed a novel patient pre-screening pipeline that leverages clinical expertise to guide the precise, safe, and efficient application of large language models. The pipeline breaks down complex criteria into a series of composite questions and then employs two strategies to perform semantic question-answering over electronic health records: (1) Pathway A, an Anthropomorphized Experts' Chain of Thought strategy; and (2) Pathway B, a Preset Stances within an Agent Collaboration strategy, aimed particularly at complex clinical reasoning scenarios. The pipeline is evaluated at both the question and criterion levels on key metrics including precision, recall, time consumption, and counterfactual inference.

Results: Our pipeline balanced high precision (e.g., 0.921 at the criterion level) with good overall recall (e.g., ~0.82 at the criterion level) and high efficiency (0.44 s per task). Pathway B excelled in high-precision complex reasoning (while exhibiting a specific recall profile conducive to accuracy), whereas Pathway A was particularly effective for tasks requiring both robust precision and recall (e.g., direct data extraction), often with faster processing times. Both pathways achieved comparable overall precision while offering different strengths in the precision-recall trade-off. The pipeline showed promising precision-focused results in hepatocellular carcinoma (0.878) and cirrhosis trials (0.843).

Conclusions: This data-secure and time-efficient pipeline shows high precision and achieves good recall in hepatopathy trials, providing promising solutions for streamlining clinical trial workflows. Its efficiency, adaptability, and balanced performance profile make it suitable for improving patient recruitment. Its capability to function in resource-constrained environments further enhances its utility in clinical settings.
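
The precision and recall figures quoted in this abstract follow the standard definitions. As a minimal sketch, assuming each pre-screening decision is a boolean that can be compared against a reference label (the function name and the example labels are hypothetical and not taken from the paper), the two metrics could be computed as:

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans.

    predicted: the pipeline's eligibility decisions (hypothetical input).
    actual:    the reference (e.g. manually screened) decisions.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for illustration only.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A pipeline tuned to flag a patient only when it is confident tends to raise precision at the cost of recall, which is the trade-off the abstract describes between the two pathways.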

Biodata Mining 18(1): 42. Published 2025-06-14. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12167571/pdf/
Citations: 0
Geospatial analysis of short sleep duration and cognitive disability in US adults: a multi-state study using machine learning techniques.
IF 4 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-06-13 DOI: 10.1186/s13040-025-00456-7
Tue T Te, Alex A T Bui, Constance H Fung, Mary Regina Boland

Background: There is evidence of increased risk of cognitive disability due to short sleep duration and adverse Social Determinants of Health (SDoH). We sought to determine whether spatial associations (correlation between spatially distributed variables within a given geographic area) exist between neighborhoods with short sleep duration and cognitive disability across the United States (US) after adjusting for other factors. We conducted a spatial analysis using a spatial lag model at the neighborhood level, with the census tract as the unit of analysis within each state in the US. We aggregated our results nationally using a weighted analysis to adjust for the number of census tracts per state. This study used Centers for Disease Control and Prevention (CDC) data on short sleep duration, cognitive disability, and other health factors. We used 2021-2022 neighborhood-level data from the CDC and US Census Bureau, adjusting for SDoH and demographics, and excluded Florida due to inconsistencies in data availability. Our exposure variable was self-reported short sleep as defined by the CDC ("sleep less than 7 hours per 24 hour period"). Our outcome was self-reported cognitive disability as defined by the CDC ("difficulty concentrating, remembering, or making decisions"). We adjusted for other factors including 'health outcomes', 'preventive practices', and the CDC's Social Vulnerability Index.

Results: The spatial analysis revealed a significant association between short sleep duration and an increased risk of cognitive disability across the US (estimate range [0.29; 1.27], p < 0.005) after adjustment. Notably, six Western states (New Mexico, Alaska, Arizona, Nevada, Idaho, and Oregon) were at increased risk of cognitive disability due to short sleep duration and this pattern was significant (p = 0.007).

Conclusions: Our study highlights the importance of short sleep duration as a significant predictor of cognitive disability across the US after adjusting for other confounders. The association between short sleep and cognitive disability was especially strong in the Western region of the US, providing a deeper understanding of how geographic context and local factors can shape health outcomes.
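
The abstract states that per-state results were aggregated nationally with weights that adjust for the number of census tracts per state, but it does not give the formula. One plausible reading, shown purely as an illustration and not as the authors' method, is a tract-count-weighted mean of the state-level estimates. The state names and numbers below are made up.

```python
def weighted_mean(estimates, weights):
    """Weighted mean of per-state estimates.

    estimates: per-state coefficients (hypothetical values here).
    weights:   number of census tracts in each state (assumed weighting).
    """
    if len(estimates) != len(weights):
        raise ValueError("estimates and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(e * w for e, w in zip(estimates, weights)) / total_weight


# Hypothetical per-state results: (estimate, number of census tracts).
state_results = {
    "State A": (0.5, 1000),
    "State B": (1.0, 3000),
}
estimates = [est for est, _ in state_results.values()]
tract_counts = [n for _, n in state_results.values()]

national = weighted_mean(estimates, tract_counts)
print(f"national estimate: {national:.3f}")
```

Weighting by tract count keeps a state with few tracts from influencing the national figure as much as a state with many, which matches the adjustment the abstract describes.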

Biodata Mining 18(1): 41. Published 2025-06-13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166631/pdf/
Citations: 0