Pub Date: 2025-04-04. DOI: 10.1186/s13040-025-00443-y
Di Zhao, Wenxuan Mu, Xiangxing Jia, Shuang Liu, Yonghe Chu, Jiana Meng, Hongfei Lin
Named Entity Recognition (NER) is a fundamental task in processing biomedical text. Due to the limited availability of labeled data, researchers have investigated few-shot learning methods to tackle this challenge. However, matching the performance of fully supervised methods remains difficult in few-shot scenarios. This paper addresses two main issues. In terms of data augmentation, existing methods primarily focus on replacing content in the original text, which can potentially distort the semantics. Furthermore, current approaches often neglect sentence features at multiple scales. To overcome these challenges, we utilize ChatGPT to generate enriched data with distinct semantics for the same entities, thereby reducing noisy data. Simultaneously, we employ dynamic convolution to capture multi-scale semantic information in sentences and enhance feature representation based on PubMedBERT. We evaluated the method on four biomedical NER datasets (BC5CDR-Disease, NCBI, BioNLP11EPI, BioNLP13GE), and it outperformed current state-of-the-art models in most few-shot scenarios, including mainstream large language models such as ChatGPT. The results confirm the effectiveness of the proposed method in data augmentation and model generalization.
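To make the multi-scale idea concrete, here is a minimal PyTorch sketch (not the authors' architecture) that extracts features from token embeddings, such as PubMedBERT outputs, with parallel 1D convolutions of different kernel sizes; the module name, kernel sizes, and channel counts are illustrative assumptions, and fixed kernels stand in for the paper's dynamic convolution.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Parallel 1D convolutions with different kernel sizes over token embeddings."""
    def __init__(self, hidden_size=768, out_channels=128, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden_size, out_channels, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, token_embeddings):              # (batch, seq_len, hidden)
        x = token_embeddings.transpose(1, 2)          # Conv1d expects (batch, hidden, seq_len)
        feats = [torch.relu(conv(x)) for conv in self.convs]
        return torch.cat(feats, dim=1).transpose(1, 2)  # (batch, seq_len, out_channels * n_kernels)

# Random embeddings standing in for PubMedBERT token outputs
emb = torch.randn(2, 16, 768)
print(MultiScaleConv()(emb).shape)                    # torch.Size([2, 16, 384])
```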
{"title":"Few-shot biomedical NER empowered by LLMs-assisted data augmentation and multi-scale feature extraction.","authors":"Di Zhao, Wenxuan Mu, Xiangxing Jia, Shuang Liu, Yonghe Chu, Jiana Meng, Hongfei Lin","doi":"10.1186/s13040-025-00443-y","DOIUrl":"10.1186/s13040-025-00443-y","url":null,"abstract":"<p><p>Named Entity Recognition (NER) is a fundamental task in processing biomedical text. Due to the limited availability of labeled data, researchers have investigated few-shot learning methods to tackle this challenge. However, replicating the performance of fully supervised methods remains difficult in few-shot scenarios. This paper addresses two main issues. In terms of data augmentation, existing methods primarily focus on replacing content in the original text, which can potentially distort the semantics. Furthermore, current approaches often neglect sentence features at multiple scales. To overcome these challenges, we utilize ChatGPT to generate enriched data with distinct semantics for the same entities, thereby reducing noisy data. Simultaneously, we employ dynamic convolution to capture multi-scale semantic information in sentences and enhance feature representation based on PubMedBERT. We evaluated the experiments on four biomedical NER datasets (BC5CDR-Disease, NCBI, BioNLP11EPI, BioNLP13GE), and the results exceeded the current state-of-the-art models in most few-shot scenarios, including mainstream large language models like ChatGPT. The results confirm the effectiveness of the proposed method in data augmentation and model generalization.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"28"},"PeriodicalIF":4.0,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11969866/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143781479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-28. DOI: 10.1186/s13040-025-00441-0
Patrizia Ribino, Claudia Di Napoli, Giovanni Paragliola, Davide Chicco, Francesca Gasparini
Dementia due to Alzheimer's disease (AD) is a multifaceted neurodegenerative disorder characterized by various cognitive and behavioral decline factors. In this work, we propose an extension of traditional k-means clustering to multivariate time series data, clustering joint trajectories of different features that describe progression over time. The algorithm we propose here enables the joint analysis of various longitudinal features to explore co-occurring trajectory factors among markers indicative of cognitive decline in individuals participating in an AD progression study. By examining how multiple variables co-vary and evolve together, we identify distinct subgroups within the cohort based on their longitudinal trajectories. Our clustering method enhances the understanding of individual development across multiple dimensions and provides deeper medical insights into the trajectories of cognitive decline. In addition, the proposed algorithm selects the features that contribute most to separating the clusters, based on their trajectories over time. This process, together with preliminary pre-processing of the OASIS-3 dataset, reveals an important role for some neuropsychological factors. In particular, the proposed method identified a significant profile compatible with a syndrome known as Mild Behavioral Impairment (MBI), displaying behavioral manifestations in individuals that may precede the cognitive symptoms typically observed in AD patients. The findings underscore the importance of considering multiple longitudinal features in clinical modeling, ultimately supporting more effective and individualized patient management strategies.
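As a simplified illustration of joint trajectory clustering (not the authors' extended algorithm, which also performs feature selection), the sketch below standardizes each feature, flattens every subject's multivariate trajectory into one vector, and applies standard k-means from scikit-learn; the toy data and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy longitudinal data: 100 subjects x 5 visits x 3 features (e.g., cognitive scores)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5, 3))
X[:50, :, 0] += np.linspace(0, 2, 5)              # give one subgroup a drifting trajectory on feature 0

# Standardize each feature, then flatten each subject's joint trajectory into one vector
Xz = (X - X.mean(axis=(0, 1))) / X.std(axis=(0, 1))
flat = Xz.reshape(len(X), -1)                     # shape (100, 15)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flat)
print(np.bincount(labels))                        # subjects per cluster
```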
{"title":"Multivariate longitudinal clustering reveals neuropsychological factors as dementia predictors in an Alzheimer's disease progression study.","authors":"Patrizia Ribino, Claudia Di Napoli, Giovanni Paragliola, Davide Chicco, Francesca Gasparini","doi":"10.1186/s13040-025-00441-0","DOIUrl":"https://doi.org/10.1186/s13040-025-00441-0","url":null,"abstract":"<p><p>Dementia due to Alzheimer's disease (AD) is a multifaceted neurodegenerative disorder characterized by various cognitive and behavioral decline factors. In this work, we propose an extension of the traditional k-means clustering for multivariate time series data to cluster joint trajectories of different features describing progression over time. The algorithm we propose here enables the joint analysis of various longitudinal features to explore co-occurring trajectory factors among markers indicative of cognitive decline in individuals participating in an AD progression study. By examining how multiple variables co-vary and evolve together, we identify distinct subgroups within the cohort based on their longitudinal trajectories. Our clustering method enhances the understanding of individual development across multiple dimensions and provides deeper medical insights into the trajectories of cognitive decline. In addition, the proposed algorithm is also able to make a selection of the most significant features in separating clusters by considering trajectories over time. This process, together with a preliminary pre-processing on the OASIS-3 dataset, reveals an important role of some neuropsychological factors. In particular, the proposed method has identified a significant profile compatible with a syndrome known as Mild Behavioral Impairment (MBI), displaying behavioral manifestations of individuals that may precede the cognitive symptoms typically observed in AD patients. The findings underscore the importance of considering multiple longitudinal features in clinical modeling, ultimately supporting more effective and individualized patient management strategies.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"26"},"PeriodicalIF":4.0,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951806/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143744332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-28. DOI: 10.1186/s13040-025-00442-z
Wei Jiang, Weicai Ye, Xiaoming Tan, Yun-Juan Bao
The integration of multi-omics data from diverse high-throughput technologies has revolutionized drug discovery. While various network-based methods have been developed to integrate multi-omics data, systematic evaluation and comparison of these methods remain challenging. This review aims to analyze network-based approaches for multi-omics integration and evaluate their applications in drug discovery. We conducted a comprehensive review of the literature (2015-2024) on network-based multi-omics integration methods in drug discovery and categorized the methods into four primary types: network propagation/diffusion, similarity-based approaches, graph neural networks, and network inference models. We also discussed the applications of these methods in three drug discovery scenarios (drug target identification, drug response prediction, and drug repurposing), and finally evaluated their performance by highlighting their advantages and limitations in specific applications. While network-based multi-omics integration has shown promise in drug discovery, challenges remain in computational scalability, data integration, and biological interpretation. Future developments should focus on incorporating temporal and spatial dynamics, improving model interpretability, and establishing standardized evaluation frameworks.
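Among the four method families, network propagation is the easiest to illustrate compactly. The sketch below implements a generic random walk with restart over a toy graph (the karate-club network standing in for a molecular interaction network); the restart probability and seed set are illustrative assumptions, not taken from any reviewed method.

```python
import networkx as nx
import numpy as np

def random_walk_with_restart(G, seeds, restart=0.3, tol=1e-8, max_iter=1000):
    """Propagate seed scores (e.g., known drug targets or omics hits) over a network."""
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes)
    W = A / A.sum(axis=0, keepdims=True)          # column-normalized transition matrix
    p0 = np.array([1.0 if n in seeds else 0.0 for n in nodes])
    p0 /= p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return dict(zip(nodes, p))

G = nx.karate_club_graph()                        # stand-in for a protein-protein interaction network
scores = random_walk_with_restart(G, seeds={0, 33})
print(sorted(scores, key=scores.get, reverse=True)[:5])   # top-ranked candidate nodes
```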
{"title":"Network-based multi-omics integrative analysis methods in drug discovery: a systematic review.","authors":"Wei Jiang, Weicai Ye, Xiaoming Tan, Yun-Juan Bao","doi":"10.1186/s13040-025-00442-z","DOIUrl":"https://doi.org/10.1186/s13040-025-00442-z","url":null,"abstract":"<p><p>The integration of multi-omics data from diverse high-throughput technologies has revolutionized drug discovery. While various network-based methods have been developed to integrate multi-omics data, systematic evaluation and comparison of these methods remain challenging. This review aims to analyze network-based approaches for multi-omics integration and evaluate their applications in drug discovery. We conducted a comprehensive review of literature (2015-2024) on network-based multi-omics integration methods in drug discovery, and categorized methods into four primary types: network propagation/diffusion, similarity-based approaches, graph neural networks, and network inference models. We also discussed the applications of the methods in three scenario of drug discovery, including drug target identification, drug response prediction, and drug repurposing, and finally evaluated the performance of the methods by highlighting their advantages and limitations in specific applications. While network-based multi-omics integration has shown promise in drug discovery, challenges remain in computational scalability, data integration, and biological interpretation. Future developments should focus on incorporating temporal and spatial dynamics, improving model interpretability, and establishing standardized evaluation frameworks.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"27"},"PeriodicalIF":4.0,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11954193/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143744334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-24. DOI: 10.1186/s13040-025-00440-1
Yinyao Ma, Hanlin Lv, Yanhua Ma, Xiao Wang, Longting Lv, Xuxia Liang, Lei Wang
Background: Constructing a predictive model is challenging for imbalanced medical datasets (such as those for preeclampsia), particularly when employing ensemble machine learning algorithms.
Objective: This study aims to develop a robust pipeline that enhances the predictive performance of ensemble machine learning models for the early prediction of preeclampsia in an imbalanced dataset.
Methods: Our research establishes a comprehensive pipeline optimized for early preeclampsia prediction in imbalanced medical datasets. We gathered electronic health records from pregnant women at the People's Hospital of Guangxi from 2015 to 2020, with additional external validation using three public datasets. This extensive data collection facilitated the systematic assessment of various resampling techniques, varied minority-to-majority ratios, and ensemble machine learning algorithms through a structured evaluation process. We analyzed 4,608 combinations of model settings against performance metrics such as G-mean, MCC, AP, and AUC to determine the most effective configurations. Advanced statistical analyses including OLS regression, ANOVA, and Kruskal-Wallis tests were utilized to fine-tune these settings, enhancing model performance and robustness for clinical application.
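For reference, the evaluation metrics named above (G-mean, MCC, AP, AUC) can be computed with scikit-learn as in the sketch below; the predictions are synthetic and the 0.5 decision threshold is an illustrative assumption.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             matthews_corrcoef, roc_auc_score)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))

# Synthetic imbalanced predictions (1 = minority class, e.g., preeclampsia)
rng = np.random.default_rng(0)
y_true = np.array([0] * 90 + [1] * 10)
y_prob = np.clip(np.concatenate([rng.normal(0.2, 0.1, 90), rng.normal(0.6, 0.2, 10)]), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

print("G-mean:", round(g_mean(y_true, y_pred), 3))
print("MCC:   ", round(matthews_corrcoef(y_true, y_pred), 3))
print("AP:    ", round(average_precision_score(y_true, y_prob), 3))
print("AUC:   ", round(roc_auc_score(y_true, y_prob), 3))
```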
Results: Our analysis confirmed the significant impact of systematic sequential optimization of variables on the predictive performance of our models. The most effective configuration used the Inverse Weighted Gaussian Mixture Model for resampling, combined with the Gradient Boosting Decision Trees algorithm and an optimized minority-to-majority ratio of 0.09, achieving a Geometric Mean of 0.6694 (95% confidence interval: 0.5855-0.7557). This configuration significantly outperformed the baseline across all evaluated metrics, demonstrating substantial improvements in model performance.
Conclusions: This study establishes a robust pipeline that significantly enhances the predictive performance of models for preeclampsia within imbalanced datasets. Our findings underscore the importance of a strategic approach to variable optimization in medical diagnostics, offering potential for broad application in various medical contexts where class imbalance is a concern.
{"title":"Advancing preeclampsia prediction: a tailored machine learning pipeline integrating resampling and ensemble models for handling imbalanced medical data.","authors":"Yinyao Ma, Hanlin Lv, Yanhua Ma, Xiao Wang, Longting Lv, Xuxia Liang, Lei Wang","doi":"10.1186/s13040-025-00440-1","DOIUrl":"10.1186/s13040-025-00440-1","url":null,"abstract":"<p><strong>Background: </strong>Constructing a predictive model is challenging in imbalanced medical dataset (such as preeclampsia), particularly when employing ensemble machine learning algorithms.</p><p><strong>Objective: </strong>This study aims to develop a robust pipeline that enhances the predictive performance of ensemble machine learning models for the early prediction of preeclampsia in an imbalanced dataset.</p><p><strong>Methods: </strong>Our research establishes a comprehensive pipeline optimized for early preeclampsia prediction in imbalanced medical datasets. We gathered electronic health records from pregnant women at the People's Hospital of Guangxi from 2015 to 2020, with additional external validation using three public datasets. This extensive data collection facilitated the systematic assessment of various resampling techniques, varied minority-to-majority ratios, and ensemble machine learning algorithms through a structured evaluation process. We analyzed 4,608 combinations of model settings against performance metrics such as G-mean, MCC, AP, and AUC to determine the most effective configurations. Advanced statistical analyses including OLS regression, ANOVA, and Kruskal-Wallis tests were utilized to fine-tune these settings, enhancing model performance and robustness for clinical application.</p><p><strong>Results: </strong>Our analysis confirmed the significant impact of systematic sequential optimization of variables on the predictive performance of our models. The most effective configuration utilized the Inverse Weighted Gaussian Mixture Model for resampling, combined with Gradient Boosting Decision Trees algorithm, and an optimized minority-to-majority ratio of 0.09, achieving a Geometric Mean of 0.6694 (95% confidence interval: 0.5855-0.7557). This configuration significantly outperformed the baseline across all evaluated metrics, demonstrating substantial improvements in model performance.</p><p><strong>Conclusions: </strong>This study establishes a robust pipeline that significantly enhances the predictive performance of models for preeclampsia within imbalanced datasets. Our findings underscore the importance of a strategic approach to variable optimization in medical diagnostics, offering potential for broad application in various medical contexts where class imbalance is a concern.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"25"},"PeriodicalIF":4.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934807/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-24. DOI: 10.1186/s13040-025-00432-1
Hanxiang Xu, Shizhuo Mu, Jingxuan Bao, Christos Davatzikos, Haochang Shou, Li Shen
Background: Alzheimer's disease (AD) is a complex disorder that affects multiple biological systems, including cognition, behavior and physical health. Unfortunately, the pathogenic mechanisms behind AD are not yet clear and the treatment options are still limited. Despite the increasing number of studies examining the pairwise relationships between genetic factors, physical activity (PA), and AD, few have successfully integrated all three domains of data, which may help reveal the mechanisms and impact of these genomic and phenomic factors on AD. We use high-dimensional mediation analysis as an integrative framework to study the relationships among genetic factors, PA and AD-like brain atrophy quantified by spatial patterns of brain atrophy.
Results: We integrate data from genetics, PA and neuroimaging measures collected from 13,425 UK Biobank samples to unveil the complex relationship among genetic risk factors, behavior and brain signatures in the contexts of aging and AD. Specifically, we used a composite imaging marker, Spatial Pattern of Abnormality for Recognition of Early AD (SPARE-AD), which characterizes AD-like brain atrophy, as an outcome variable to represent AD risk. Through GWAS, we identified single nucleotide polymorphisms (SNPs) that are significantly associated with SPARE-AD as exposure variables. We employed conventional summary statistics and functional principal component analysis to extract patterns of PA as mediators. After constructing these variables, we utilized a high-dimensional mediation analysis method, Bayesian Mediation Analysis (BAMA), to estimate potential mediating pathways between SNPs, multivariate PA signatures and SPARE-AD. BAMA incorporates a Bayesian continuous shrinkage prior to select the active mediators from a large pool of candidates. We identified a total of 22 mediation pathways, indicating how genetic variants can influence SPARE-AD by altering physical activity. By comparing the results with those obtained using univariate mediation analysis, we demonstrate the advantages of high-dimensional mediation analysis methods over univariate mediation analysis.
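For contrast with BAMA, a minimal sketch of the univariate (single-mediator) baseline is shown below using ordinary least squares and the product-of-coefficients estimate; the simulated SNP, activity, and SPARE-AD-like variables and their effect sizes are illustrative assumptions, not study data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
snp = rng.binomial(2, 0.3, n)                          # exposure: genotype coded 0/1/2
pa = 0.4 * snp + rng.normal(size=n)                    # mediator: a physical-activity summary
spare_ad = 0.3 * pa + 0.1 * snp + rng.normal(size=n)   # outcome: AD-like atrophy score

# Path a: exposure -> mediator;  path b: mediator -> outcome (adjusting for exposure)
a = sm.OLS(pa, sm.add_constant(snp)).fit().params[1]
fit_b = sm.OLS(spare_ad, sm.add_constant(np.column_stack([pa, snp]))).fit()
b, direct = fit_b.params[1], fit_b.params[2]

print(f"indirect (a*b) = {a * b:.3f}, direct = {direct:.3f}")
```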
Conclusion: Through integrative analysis of multi-omics data, we identified several mediation pathways of physical activity between genetic factors and SPARE-AD. These findings contribute to a better understanding of the pathogenic mechanisms of AD. Moreover, our research demonstrates the potential of the high-dimensional mediation analysis method in revealing the mechanisms of disease.
{"title":"High-dimensional mediation analysis reveals the mediating role of physical activity patterns in genetic pathways leading to AD-like brain atrophy.","authors":"Hanxiang Xu, Shizhuo Mu, Jingxuan Bao, Christos Davatzikos, Haochang Shou, Li Shen","doi":"10.1186/s13040-025-00432-1","DOIUrl":"10.1186/s13040-025-00432-1","url":null,"abstract":"<p><strong>Background: </strong>Alzheimer's disease (AD) is a complex disorder that affects multiple biological systems including cognition, behavior and physical health. Unfortunately, the pathogenic mechanisms behind AD are not yet clear and the treatment options are still limited. Despite the increasing number of studies examining the pairwise relationships between genetic factors, physical activity (PA), and AD, few have successfully integrated all three domains of data, which may help reveal mechanisms and impact of these genomic and phenomic factors on AD. We use high-dimensional mediation analysis as an integrative framework to study the relationships among genetic factors, PA and AD-like brain atrophy quantified by spatial patterns of brain atrophy.</p><p><strong>Results: </strong>We integrate data from genetics, PA and neuroimaging measures collected from 13,425 UK Biobank samples to unveil the complex relationship among genetic risk factors, behavior and brain signatures in the contexts of aging and AD. Specifically, we used a composite imaging marker, Spatial Pattern of Abnormality for Recognition of Early AD (SPARE-AD) that characterizes AD-like brain atrophy, as an outcome variable to represent AD risk. Through GWAS, we identified single nucleotide polymorphisms (SNPs) that are significantly associated with SPARE-AD as exposure variables. We employed conventional summary statistics and functional principal component analysis to extract patterns of PA as mediators. After constructing these variables, we utilized a high-dimensional mediation analysis method, Bayesian Mediation Analysis (BAMA), to estimate potential mediating pathways between SNPs, multivariate PA signatures and SPARE-AD. BAMA incorporates Bayesian continuous shrinkage prior to select the active mediators from a large pool of candidates. We identified a total of 22 mediation pathways, indicating how genetic variants can influence SPARE-AD by altering physical activity. By comparing the results with those obtained using univariate mediation analysis, we demonstrate the advantages of high-dimensional mediation analysis methods over univariate mediation analysis.</p><p><strong>Conclusion: </strong>Through integrative analysis of multi-omics data, we identified several mediation pathways of physical activity between genetic factors and SPARE-AD. These findings contribute to a better understanding of the pathogenic mechanisms of AD. 
Moreover, our research demonstrates the potential of the high-dimensional mediation analysis method in revealing the mechanisms of disease.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"24"},"PeriodicalIF":4.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11931790/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-20. DOI: 10.1186/s13040-025-00438-9
Ibrahim Burak Ozyurt, Anita Bandrowski
Background: Tables are useful information artifacts that allow easy detection of missing data and have been deployed by several publishers to improve the amount of information present for key resources and reagents such as antibodies, cell lines, and other tools that constitute the inputs to a study. STAR*Methods key resource tables have increased the "findability" of these key resources, improving transparency of the paper by warning authors (before publication) about any problems, such as key resources that cannot be uniquely identified or those that are known to be problematic, but they have not been commonly available outside of the Cell Press journal family. We believe that processing preprints and adding these 'resource table candidates' automatically will improve the availability of structured and linked information about research resources in a broader swath of the scientific literature. However, if the authors have already added a key resource table, that table must be detected, and each entity must be correctly identified and faithfully restructured into a standard format.
Methods: We introduce four end-to-end table extraction pipelines to extract and faithfully reconstruct key resource tables from biomedical papers in PDF format. The pipelines employ machine learning approaches for key resource table page identification, "Table Transformer" models for table detection, and table structure recognition. We also introduce a character-level generative pre-trained transformer (GPT) language model for scientific tables, pre-trained on over 11 million scientific tables. We fine-tuned our table-specific language model with synthetic training data generated with a novel approach to alleviate row over-segmentation, significantly improving key resource extraction performance.
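As a hedged sketch of the table-detection stage only (not the authors' full pipeline or their fine-tuned models), the code below runs the publicly released Table Transformer detection checkpoint from Hugging Face on a rendered page image; the file path and score threshold are assumptions.

```python
# Requires transformers, torch, Pillow (and possibly timm for the DETR backbone).
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

image = Image.open("paper_page.png").convert("RGB")    # a rendered PDF page (hypothetical path)

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes to table bounding boxes in image coordinates
target_sizes = torch.tensor([image.size[::-1]])        # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 2), [round(v) for v in box.tolist()])
```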
Results: The extraction of key resource tables in PDF files by the popular GROBID tool resulted in a Grid Table Similarity (GriTS) score of 0.12. All of our pipelines have outperformed GROBID by a large margin. Our best pipeline with table-specific language model-based row merger achieved a GriTS score of 0.90.
Conclusions: Our pipelines allow the detection and extraction of key resources from tables with much higher accuracy, enabling the deployment of automated research resource extraction tools on BioRxiv to help authors correct unidentifiable key resources detected in their articles and improve the reproducibility of their findings. The code, table-specific language model, annotated training and evaluation data are publicly available.
{"title":"Automatic detection and extraction of key resources from tables in biomedical papers.","authors":"Ibrahim Burak Ozyurt, Anita Bandrowski","doi":"10.1186/s13040-025-00438-9","DOIUrl":"10.1186/s13040-025-00438-9","url":null,"abstract":"<p><strong>Background: </strong>Tables are useful information artifacts that allow easy detection of missing data and have been deployed by several publishers to improve the amount of information present for key resources and reagents such as antibodies, cell lines, and other tools that constitute the inputs to a study. STAR*Methods key resource tables have increased the \"findability\" of these key resources, improving transparency of the paper by warning authors (before publication) about any problems, such as key resources that cannot be uniquely identified or those that are known to be problematic, but they have not been commonly available outside of the Cell Press journal family. We believe that processing preprints and adding these 'resource table candidates' automatically will improve the availability of structured and linked information about research resources in a broader swath of the scientific literature. However, if the authors have already added a key resource table, that table must be detected, and each entity must be correctly identified and faithfully restructured into a standard format.</p><p><strong>Methods: </strong>We introduce four end-to-end table extraction pipelines to extract and faithfully reconstruct key resource tables from biomedical papers in PDF format. The pipelines employ machine learning approaches for key resource table page identification, \"Table Transformer\" models for table detection, and table structure recognition. We also introduce a character-level generative pre-trained transformer (GPT) language model for scientific tables pre-trained on over 11 million scientific tables. We fine-tuned our table-specific language model with synthetic training data generated with a novel approach to alleviate row over-segmentation significantly improving key resource extraction performance.</p><p><strong>Results: </strong>The extraction of key resource tables in PDF files by the popular GROBID tool resulted in a Grid Table Similarity (GriTS) score of 0.12. All of our pipelines have outperformed GROBID by a large margin. Our best pipeline with table-specific language model-based row merger achieved a GriTS score of 0.90.</p><p><strong>Conclusions: </strong>Our pipelines allow the detection and extraction of key resources from tables with much higher accuracy, enabling the deployment of automated research resource extraction tools on BioRxiv to help authors correct unidentifiable key resources detected in their articles and improve the reproducibility of their findings. The code, table-specific language model, annotated training and evaluation data are publicly available.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"23"},"PeriodicalIF":4.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-19. DOI: 10.1186/s13040-025-00437-w
Mina Jahangiri, Anoshirvan Kazemnejad, Keith S Goldfeld, Maryam S Daneshpour, Mehdi Momen, Shayan Mostafaei, Davood Khalili, Mahdi Akbarzadeh
Background: The linear mixed-effects model (LME) is a conventional parametric method mainly used for analyzing longitudinal and clustered data in genetic studies. Previous studies have shown that this model can be sensitive to parametric assumptions and provides lower predictive performance than non-parametric methods such as random effects-expectation maximization (RE-EM) and unbiased RE-EM regression tree algorithms. These longitudinal regression trees utilize classification and regression trees (CART) and conditional inference trees (Ctree) to estimate the fixed-effects components of the mixed-effects model. While CART is a well-known tree algorithm, it suffers from greediness. To mitigate this issue, we used the Evtree algorithm to estimate the fixed-effects part of the LME for handling longitudinal and clustered data in genome association studies.
Methods: In this study, we propose a new non-parametric longitudinal-based algorithm called "Ev-RE-EM" for modeling a continuous response variable, using the Evtree algorithm to estimate the fixed-effects part of the LME. We compared its predictive performance with that of other tree algorithms, such as RE-EM and unbiased RE-EM, with and without modeling the autocorrelation structure of within-subject errors, to analyze the longitudinal data in the genetic study. The autocorrelation structures include a first-order autoregressive process, a compound symmetric structure with a constant correlation, and a general correlation matrix. The real data were obtained from the longitudinal Tehran cardiometabolic genetic study (TCGS). The data modeling used body mass index (BMI) as the phenotype and included predictor variables such as age, sex, and 25,640 single nucleotide polymorphisms (SNPs).
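A minimal sketch of the RE-EM-style alternation is shown below: a regression tree is fit to the response minus the current random effects, and per-subject random intercepts are re-estimated from the residuals. It uses scikit-learn's CART-style tree rather than Evtree, omits shrinkage and autocorrelation structures, and the toy data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

def simple_re_em(X, y, groups, n_iter=10, max_depth=3):
    """Toy RE-EM-style loop: alternate a regression tree for the fixed effects
    with per-subject random intercepts taken as mean residuals."""
    b = pd.Series(0.0, index=pd.unique(groups))        # random intercept per subject
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    for _ in range(n_iter):
        tree.fit(X, y - b.loc[groups].to_numpy())      # fixed effects on the adjusted response
        resid = y - tree.predict(X)
        b = pd.Series(resid).groupby(groups).mean()    # update random intercepts
    return tree, b

# Toy longitudinal data: 50 subjects x 6 visits, 5 SNP-like predictors
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(50), 6)
X = rng.integers(0, 3, size=(300, 5)).astype(float)
y = 1.5 * (X[:, 0] > 1) + rng.normal(0, 1, 50)[groups] + rng.normal(0, 0.5, 300)

tree, b = simple_re_em(X, y, groups)
print("estimated random-intercept SD:", round(b.std(), 2))
```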
Results: The results demonstrated that the predictive performance of Ev-RE-EM and unbiased RE-EM was nearly similar. Additionally, the Ev-RE-EM algorithm generated smaller trees than the unbiased RE-EM algorithm, enhancing tree interpretability.
Conclusion: The results showed that the unbiased RE-EM and Ev-RE-EM algorithms outperformed the RE-EM algorithm. Since algorithm performance varies across datasets, researchers should test different algorithms on the dataset of interest and select the best-performing one. Accurately predicting and diagnosing an individual's genetic profile is crucial in medical studies. The model with the highest accuracy should be used to enhance understanding of the genetics of complex traits, improve disease prevention and diagnosis, and aid in treating complex human diseases.
{"title":"Leveraging mixed-effects regression trees for the analysis of high-dimensional longitudinal data to identify the low and high-risk subgroups: simulation study with application to genetic study.","authors":"Mina Jahangiri, Anoshirvan Kazemnejad, Keith S Goldfeld, Maryam S Daneshpour, Mehdi Momen, Shayan Mostafaei, Davood Khalili, Mahdi Akbarzadeh","doi":"10.1186/s13040-025-00437-w","DOIUrl":"10.1186/s13040-025-00437-w","url":null,"abstract":"<p><strong>Background: </strong>The linear mixed-effects model (LME) is a conventional parametric method mainly used for analyzing longitudinal and clustered data in genetic studies. Previous studies have shown that this model can be sensitive to parametric assumptions and provides less predictive performance than non-parametric methods such as random effects-expectation maximization (RE-EM) and unbiased RE-EM regression tree algorithms. These longitudinal regression trees utilize classification and regression trees (CART) and conditional inference trees (Ctree) to estimate the fixed-effects components of the mixed-effects model. While CART is a well-known tree algorithm, it suffers from greediness. To mitigate this issue, we used the Evtree algorithm to estimate the fixed-effects part of the LME for handling longitudinal and clustered data in genome association studies.</p><p><strong>Methods: </strong>In this study, we propose a new non-parametric longitudinal-based algorithm called \"Ev-RE-EM\" for modeling a continuous response variable using the Evtree algorithm to estimate the fixed-effects part of the LME. We compared its predictive performance with other tree algorithms, such as RE-EM and unbiased RE-EM, with and without considering the structure for autocorrelation between errors within subjects to analyze the longitudinal data in the genetic study. The autocorrelation structures include a first-order autoregressive process, a compound symmetric structure with a constant correlation, and a general correlation matrix. The real data was obtained from the longitudinal Tehran cardiometabolic genetic study (TCGS). The data modeling used body mass index (BMI) as the phenotype and included predictor variables such as age, sex, and 25,640 single nucleotide polymorphisms (SNPs).</p><p><strong>Results: </strong>The results demonstrated that the predictive performance of Ev-RE-EM and unbiased RE-EM was nearly similar. Additionally, the Ev-RE-EM algorithm generated smaller trees than the unbiased RE-EM algorithm, enhancing tree interpretability.</p><p><strong>Conclusion: </strong>The results showed that the unbiased RE-EM and Ev-RE-EM algorithms outperformed the RE-EM algorithm. Since algorithm performance varies across datasets, researchers should test different algorithms on the dataset of interest and select the best-performing one. Accurately predicting and diagnosing an individual's genetic profile is crucial in medical studies. 
The model with the highest accuracy should be used to enhance understanding of the genetics of complex traits, improve disease prevention and diagnosis, and aid in treating complex human diseases.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"22"},"PeriodicalIF":4.0,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143665028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-07. DOI: 10.1186/s13040-025-00435-y
Belén Serrano-Antón, Manuel Insúa Villa, Santiago Pendón-Minguillón, Santiago Paramés-Estévez, Alberto Otero-Cacho, Diego López-Otero, Brais Díaz-Fernández, María Bastos-Fernández, José R González-Juanatey, Alberto P Muñuzuri
Background: The acquisition of 3D geometries of coronary arteries from computed tomography coronary angiography (CTCA) is crucial for clinicians, enabling visualization of lesions and supporting decision-making processes. Manual segmentation of coronary arteries is time-consuming and prone to errors. There is growing interest in automatic segmentation algorithms, particularly those based on neural networks, which require large datasets and significant computational resources for training. This paper proposes an automatic segmentation methodology based on clustering algorithms and a graph structure, which integrates data from both the clustering process and the original images.
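As a hedged, two-dimensional sketch of the idea (not the published method, which also builds a graph over clusters and works on full 3D CTCA volumes), the code below clusters voxel intensities of a synthetic slice with k-means and takes the brightest cluster as a crude lumen candidate; the image, cluster count, and selection rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2D "CTCA slice": a bright vessel-like blob on a darker background
rng = np.random.default_rng(0)
slice_img = rng.normal(100, 10, size=(128, 128))
yy, xx = np.mgrid[:128, :128]
slice_img[(yy - 64) ** 2 + (xx - 64) ** 2 < 8 ** 2] += 150   # contrast-filled lumen

# Cluster voxel intensities; the brightest cluster serves as a crude lumen candidate
intensities = slice_img.reshape(-1, 1)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(intensities)
lumen_label = np.argmax(km.cluster_centers_.ravel())
mask = (km.labels_ == lumen_label).reshape(slice_img.shape)
print("candidate lumen voxels:", int(mask.sum()))
```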
Results: The study compares two approaches: a 2.5D version using axial, sagittal, and coronal slices (3Axis), and a perpendicular version (Perp), which uses the cross-section of each vessel. The methodology was tested on two patient groups: a test set of 10 patients and an additional set of 22 patients with clinically diagnosed lesions. The 3Axis method achieved a Dice score of 0.88 in the test set and 0.83 in the lesion set, while the Perp method obtained Dice scores of 0.81 in the test set and 0.82 in the lesion set, decreasing to 0.79 and 0.80 in the lesion region, respectively. These results are competitive with current state-of-the-art methods.
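The Dice score reported above is the standard overlap metric between a predicted and a reference mask; a minimal sketch of its computation on toy binary masks follows.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

# Toy 1D example: predicted and reference masks share 3 of their 4 foreground voxels
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred  = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(round(dice_score(pred, truth), 2))   # 0.75
```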
Conclusions: This clustering-based segmentation approach offers a robust framework that can be easily integrated into clinical workflows, improving both accuracy and efficiency in coronary artery analysis. Additionally, the ability to visualize clusters and graphs from any cross-section enhances the method's explainability, providing clinicians with deeper insights into vascular structures. The study demonstrates the potential of clustering algorithms for improving segmentation performance in coronary artery imaging.
{"title":"Unsupervised clustering based coronary artery segmentation.","authors":"Belén Serrano-Antón, Manuel Insúa Villa, Santiago Pendón-Minguillón, Santiago Paramés-Estévez, Alberto Otero-Cacho, Diego López-Otero, Brais Díaz-Fernández, María Bastos-Fernández, José R González-Juanatey, Alberto P Muñuzuri","doi":"10.1186/s13040-025-00435-y","DOIUrl":"10.1186/s13040-025-00435-y","url":null,"abstract":"<p><strong>Background: </strong>The acquisition of 3D geometries of coronary arteries from computed tomography coronary angiography (CTCA) is crucial for clinicians, enabling visualization of lesions and supporting decision-making processes. Manual segmentation of coronary arteries is time-consuming and prone to errors. There is growing interest in automatic segmentation algorithms, particularly those based on neural networks, which require large datasets and significant computational resources for training. This paper proposes an automatic segmentation methodology based on clustering algorithms and a graph structure, which integrates data from both the clustering process and the original images.</p><p><strong>Results: </strong>The study compares two approaches: a 2.5D version using axial, sagittal, and coronal slices (3Axis), and a perpendicular version (Perp), which uses the cross-section of each vessel. The methodology was tested on two patient groups: a test set of 10 patients and an additional set of 22 patients with clinically diagnosed lesions. The 3Axis method achieved a Dice score of 0.88 in the test set and 0.83 in the lesion set, while the Perp method obtained Dice scores of 0.81 in the test set and 0.82 in the lesion set, decreasing to 0.79 and 0.80 in the lesion region, respectively. These results are competitive with current state-of-the-art methods.</p><p><strong>Conclusions: </strong>This clustering-based segmentation approach offers a robust framework that can be easily integrated into clinical workflows, improving both accuracy and efficiency in coronary artery analysis. Additionally, the ability to visualize clusters and graphs from any cross-section enhances the method's explainability, providing clinicians with deeper insights into vascular structures. The study demonstrates the potential of clustering algorithms for improving segmentation performance in coronary artery imaging.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"21"},"PeriodicalIF":4.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11887207/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143587591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-04. DOI: 10.1186/s13040-025-00436-x
Onur Erdogan, Cem Iyigun, Yeşim Aydın Son
Late-onset Alzheimer's disease (LOAD) is a progressive and complex neurodegenerative disorder of the aging population. LOAD is characterized by cognitive decline, such as deterioration of memory, loss of intellectual abilities, and impairment in other cognitive domains not attributable to traumatic brain injury. Alzheimer's Disease (AD) presents a complex genetic etiology that is still unclear, which limits its early or differential diagnosis. Genome-Wide Association Studies (GWAS) enable the exploration of statistical associations of individual variants at candidate loci, but univariate analysis overlooks interactions between variants. Machine learning (ML) algorithms can capture hidden, novel, and significant patterns while considering nonlinear interactions between variants to understand the genetic predisposition for complex genetic disorders. When datasets come from different genotyping platforms, majority voting cannot be applied because the attributes differ. Hence, a new post-ML ensemble approach was developed to select significant SNVs across multiple genotyping platforms. We propose the EnSCAN framework, which uses a new algorithm to combine selected variants, even from different platforms, to prioritize candidate causative loci, which consequently helps improve ML results by combining the prior information captured from each dataset. The proposed ensemble algorithm utilizes the chromosomal locations of SNVs by mapping them to cytogenetic bands, along with the proximities between pairs and multimodel Random Forest (RF) validations, to prioritize SNVs and candidate causative genes for LOAD. The scoring method is scalable and can be applied to any multiplatform genotyping study. We present how the proposed EnSCAN scoring algorithm prioritizes candidate causative variants related to LOAD across three GWAS datasets.
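As a simplified illustration of multiplatform ensemble scoring (not EnSCAN itself, which additionally maps SNVs to cytogenetic bands and uses proximity information), the sketch below fits a random forest per platform, ranks SNVs by feature importance, and averages ranks across the platforms on which each SNV appears; the platforms, SNV sets, and simulated phenotype are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def platform_importances(snv_names, n_samples=300):
    """Fit a random forest on one platform's simulated genotypes and return per-SNV importances."""
    X = rng.integers(0, 3, size=(n_samples, len(snv_names)))
    y = (X[:, 0] + X[:, 1] + rng.normal(0, 1, n_samples) > 2).astype(int)  # first two SNVs drive the phenotype
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    return pd.Series(rf.feature_importances_, index=snv_names)

# Three platforms genotype overlapping but non-identical SNV sets
platforms = {
    "chipA": [f"rs{i}" for i in range(0, 30)],
    "chipB": [f"rs{i}" for i in range(0, 40, 2)],
    "chipC": [f"rs{i}" for i in range(0, 60, 3)],
}
ranks = {name: platform_importances(snvs).rank(ascending=False)
         for name, snvs in platforms.items()}

# Aggregate: average rank across the platforms on which each SNV appears
combined = pd.concat(ranks, axis=1).mean(axis=1).sort_values()
print(combined.head())                         # top-prioritized SNVs
```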
{"title":"EnSCAN: ENsemble Scoring for prioritizing CAusative variaNts across multiplatform GWASs for late-onset alzheimer's disease.","authors":"Onur Erdogan, Cem Iyigun, Yeşim Aydın Son","doi":"10.1186/s13040-025-00436-x","DOIUrl":"10.1186/s13040-025-00436-x","url":null,"abstract":"<p><p>Late-onset Alzheimer's disease (LOAD) is a progressive and complex neurodegenerative disorder of the aging population. LOAD is characterized by cognitive decline, such as deterioration of memory, loss of intellectual abilities, and other cognitive domains resulting from due to traumatic brain injuries. Alzheimer's Disease (AD) presents a complex genetic etiology that is still unclear, which limits its early or differential diagnosis. The Genome-Wide Association Studies (GWAS) enable the exploration of individual variants' statistical interactions at candidate loci, but univariate analysis overlooks interactions between variants. Machine learning (ML) algorithms can capture hidden, novel, and significant patterns while considering nonlinear interactions between variants to understand the genetic predisposition for complex genetic disorders. When working on different platforms, majority voting cannot be applied because the attributes differ. Hence, a new post-ML ensemble approach was developed to select significant SNVs via multiple genotyping platforms. We proposed the EnSCAN framework using a new algorithm to ensemble selected variants even from different platforms to prioritize candidate causative loci, which consequently helps improve ML results by combining the prior information captured from each dataset. The proposed ensemble algorithm utilizes the chromosomal locations of SNVs by mapping to cytogenetic bands, along with the proximities between pairs and multimodel Random Forest (RF) validations to prioritize SNVs and candidate causative genes for LOAD. The scoring method is scalable and can be applied to any multiplatform genotyping study. We present how the proposed EnSCAN scoring algorithm prioritizes candidate causative variants related to LOAD among three GWAS datasets.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"20"},"PeriodicalIF":4.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881353/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-03. DOI: 10.1186/s13040-025-00433-0
Lanfang Zhang, Yuan Cai, Lin Li, Jie Hu, Changsha Jia, Xu Kuang, Yi Zhou, Zhiai Lan, Chunyan Liu, Feng Jiang, Nana Sun, Ni Zeng
Background: Acne is a chronic inflammatory condition affecting the hair follicles and sebaceous glands. Recent research has yielded significant advances in the study of the acne skin microbiome, but a systematic analysis of research trends and hotspots in this area is lacking. This study used bibliometric methods to examine the knowledge structure of acne skin microbiome research and to identify hot trends and emerging topics.
Methods: We performed a topic search to retrieve articles about skin microbiome in acne from the Web of Science Core Collection. Bibliometric research was conducted using CiteSpace, VOSviewer, and R language.
Results: This study analyzed 757 articles from 1362 institutions in 68 countries, with the United States leading the research efforts. Notably, Brigitte Dréno from the University of Nantes emerged as the most prolific author in this field, with 19 papers and 334 co-citations. The research output on the skin microbiome of acne continues to increase, with Experimental Dermatology being the journal with the highest number of published articles. The primary focus is investigating the skin microbiome's mechanisms in acne development and exploring treatment strategies. These findings have important implications for developing microbiome-targeted therapies, which could provide new, personalized treatment options for patients with acne. Emerging research hotspots include skincare, the gut microbiome, and treatment.
Conclusion: The study's findings indicate a thriving research interest in the skin microbiome and its relationship to acne, focusing on acne treatment through the regulation of the skin microbiome balance. Currently, the development of skincare products targeting the regulation of the skin microbiome represents a research hotspot, reflecting the transition from basic scientific research to clinical practice.
{"title":"Analysis of global trends and hotspots of skin microbiome in acne: a bibliometric perspective.","authors":"Lanfang Zhang, Yuan Cai, Lin Li, Jie Hu, Changsha Jia, Xu Kuang, Yi Zhou, Zhiai Lan, Chunyan Liu, Feng Jiang, Nana Sun, Ni Zeng","doi":"10.1186/s13040-025-00433-0","DOIUrl":"10.1186/s13040-025-00433-0","url":null,"abstract":"<p><strong>Background: </strong>Acne is a chronic inflammatory condition affecting the hair follicles and sebaceous glands. Recent research has revealed significant advances in the study of the acne skin microbiome. Systematic analysis of research trends and hotspots in the acne skin microbiome is lacking. This study utilized bibliometric methods to conduct in-depth research on the recognition structure of the acne skin microbiome, identifying hot trends and emerging topics.</p><p><strong>Methods: </strong>We performed a topic search to retrieve articles about skin microbiome in acne from the Web of Science Core Collection. Bibliometric research was conducted using CiteSpace, VOSviewer, and R language.</p><p><strong>Results: </strong>This study analyzed 757 articles from 1362 institutions in 68 countries, the United States leading the research efforts. Notably, Brigitte Dréno from the University of Nantes emerged as the most prolific author in this field, with 19 papers and 334 co-citations. The research output on the skin microbiome of acne continues to increase, with Experimental Dermatology being the journal with the highest number of published articles. The primary focus is investigating the skin microbiome's mechanisms in acne development and exploring treatment strategies. These findings have important implications for developing microbiome-targeted therapies, which could provide new, personalized treatment options for patients with acne. Emerging research hotspots include skincare, gut microbiome, and treatment.</p><p><strong>Conclusion: </strong>The study's findings indicate a thriving research interest in the skin microbiome and its relationship to acne, focusing on acne treatment through the regulation of the skin microbiome balance. Currently, the development of skincare products targeting the regulation of the skin microbiome represents a research hotspot, reflecting the transition from basic scientific research to clinical practice.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"19"},"PeriodicalIF":4.0,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143544184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}