Biodata Mining: Latest Articles


Optimizing large language models for ontology-based annotation: a study on gene ontology in biomedical texts.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-12-09 DOI: 10.1186/s13040-025-00507-z

Pratik Devkota, Somya D Mohanty, Prashanti Manda

Automated ontology annotation of scientific literature plays a critical role in knowledge management, particularly in fields like biology and biomedicine, where accurate concept tagging can enhance information retrieval, semantic search, and knowledge integration. Traditional models for ontology annotation, such as Recurrent Neural Networks (RNNs) and Bidirectional Gated Recurrent Units (Bi-GRUs), have been effective but limited in handling complex biomedical terminologies and semantic nuances. This study explores the potential of large language models (LLMs), including MPT-7B, Phi, BiomedLM, and Meditron, for improving ontology annotation, specifically with Gene Ontology (GO) concepts. We fine-tuned these models on the CRAFT dataset, assessing their performance in terms of F1 score, semantic similarity, memory usage, and inference speed. Our results show that while Bi-GRU baselines remain competitive in raw accuracy, LLMs offer complementary strengths. LLMs exhibit qualitatively higher semantic consistency in some cases, particularly when handling complex or multi-word ontology terms. However, these observations are exploratory and not statistically verified across all model types. Resource requirements for LLMs are also notably high, raising considerations about computational efficiency. Techniques like parameter-efficient fine-tuning (PEFT) and advanced prompting were explored to address these challenges, demonstrating potential in reducing computational demands while maintaining performance. Our findings suggest that while LLMs offer advantages in annotation accuracy, practical deployment should balance these benefits with resource costs. This research highlights the need for further optimization and domain-specific training to make LLMs a feasible choice for real-world biomedical ontology annotation tasks.
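The abstract reports F1 score as one of its evaluation measures. As a minimal sketch of how F1 is typically computed for concept annotation, the Python below compares a set of gold annotations with a set of predicted ones. The annotation format (token index paired with a GO identifier), the function name, and the example data are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations.

    Each annotation is any hashable item; here we assume (token_index, GO_id)
    pairs, which is an illustrative format and not the paper's.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three gold annotations, one of which is mislabelled
# by the (imaginary) model.
gold = {(3, "GO:0006915"), (7, "GO:0008283"), (12, "GO:0005634")}
pred = {(3, "GO:0006915"), (7, "GO:0016049"), (12, "GO:0005634")}

p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Exact-match scoring like this gives no credit for a prediction that is a close relative of the gold concept in the ontology, which is presumably why the study also reports semantic similarity.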


Journal Article. Pages: 5. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12781589/pdf/

Citations: 0

Clinically aligned multi-modal image-text model for pan-cancer prognosis prediction.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-12-06 DOI: 10.1186/s13040-025-00495-0

Jonghyun Lee, Jacob S Leiby, Lina Takemaru, Yidi Huang, Myung-Giun Noh, Jaesik Kim, Byonghan Lee, Mattew E Lee, Derek A Oldridge, Young-Gyu Eun, Hyun Jee Lee, Young Chan Lee, Dokyoon Kim

Citations: 0

A fairness-aware machine learning framework for maternal health in Ghana: integrating explainability, bias mitigation, and causal inference for ethical AI deployment.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-12-05 DOI: 10.1186/s13040-025-00505-1

Augustus Osborne, Kobloobase Usani

Citations: 0

Radio Frequency Tagging-enabled patient monitoring: integrating mobility tracking with early warning systems for enhanced safety.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-12-04 DOI: 10.1186/s13040-025-00504-2

Ahed Abugabah, Prashant Kumar Shukla, Piyush Kumar Shukla, Abhishek Dwivedi

Citations: 0

Tree-based ensemble learning models for protein-protein interactions detection: a review and experimental evaluation.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-11-29 DOI: 10.1186/s13040-025-00501-5

Kamal Taha

Motivation: Protein-protein interactions (PPIs) are central to many biological processes, influencing cellular functions and offering potential therapeutic insights. The increasing availability of genomic and proteomic data has propelled computational approaches, particularly machine learning (ML), to the forefront of PPI prediction, addressing limitations of traditional experimental methods such as scalability and cost. This paper presents an in-depth review of modern ensemble learning models for PPI prediction, specifically focusing on XGBoost, Gradient Boosting, LightGBM, and Random Forest. Ensemble learning models are particularly noteworthy for their ability to surpass traditional approaches by leveraging the combined strengths of multiple base learners. Unlike previous reviews that often concentrate on theoretical comparisons or a limited selection of methods, this study offers a balanced analysis that includes both empirical findings and experimental evaluations. The review examines recent advancements in the field and assesses these models based on critical factors such as scalability, interpretability, accuracy, and efficiency, providing a structured framework to highlight nuanced differences among the techniques.

Scientific contribution: The article evaluates several ML techniques for PPI detection, with LightGBM and XGBoost emerging as the most effective methods due to their high accuracy and computational efficiency, particularly for large and complex datasets. Gradient Boosting and Random Forest also demonstrate strong performance, with Gradient Boosting excelling in capturing non-linear relationships and Random Forest offering robust interpretability. LightGBM stands out for its scalability and efficiency, while XGBoost benefits from built-in regularization to reduce overfitting.

Results: Experimental evaluations across three benchmark datasets (DIP, HPRD, and STRING) demonstrated that LightGBM consistently achieved the highest performance among all models, with average accuracy reaching up to 86%, and top sensitivity, specificity, and precision scores. XGBoost followed closely, showing strong generalization and robustness due to its regularization capabilities. Gradient Boosting provided competitive accuracy but lagged slightly in computational efficiency. Random Forest, while the most interpretable, showed comparatively lower sensitivity but maintained solid precision, highlighting its strength in minimizing false positives. Overall, LightGBM and XGBoost outperformed other methods in handling large, complex, and imbalanced PPI datasets. These results affirm the suitability of advanced ensemble learning techniques for enhancing PPI prediction and offer practical guidance on selecting models based on performance priorities and application constraints.
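The results compare models on accuracy, sensitivity, specificity, and precision. The sketch below shows how these four measures follow from a binary confusion matrix, which helps explain how a model can have lower sensitivity yet solid precision. The function name and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive standard binary-classification metrics from confusion counts."""

    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no positive predictions).
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on interacting pairs
        "specificity": ratio(tn, tn + fp),   # recall on non-interacting pairs
        "precision": ratio(tp, tp + fp),     # share of predicted pairs that are real
    }


# Hypothetical counts for 200 protein pairs (100 interacting, 100 not).
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity is 0.80 while precision is about 0.89: the classifier misses a fifth of the true interactions but raises few false alarms, the same pattern the abstract attributes to Random Forest.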


Journal Article. Pages: 1. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12771752/pdf/

Citations: 0

Bioinformatics analysis of Rickettsia typhi autoimmune associations and screening of Streptomyces-derived inhibitors.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-11-28 DOI: 10.1186/s13040-025-00499-w

Zarrin Basharat, Hanan A Ogaly, Fatimah A M Al-Zahrani, Calvin R Wei, Syed Shah Hassan, Muhammad Salman, Madiha Islam, Ibrar Ahmed, Yasir Waheed, Seil Kim

Background: Rickettsia typhi is the causative agent of epidemic murine typhus and Rocky Mountain spotted fever. The infection can affect multiple vital organs, including the heart, lungs, kidneys, and brain. Doxycycline is the recommended treatment but inflammation, mal-response, and drug resistance may arise. No natural product inhibitors have been reported against this bacterium.

Aim: The objective of this study was to establish a potential connection between autoimmune disorders triggered by R. typhi, identify therapeutic targets within its core proteome, and explore novel natural product inhibitors from Streptomyces spp. that could potentially inhibit it.

Methodology: Complete proteomes of four publicly available R. typhi strains were used for pan-proteomic analysis. The fni gene product (Isopentenyl pyrophosphate isomerase) was selected as the potential drug target. Molecular docking of 607 Streptomyces-derived metabolites was performed, with top hits validated using DiffDock and Vinardo scoring. Additionally, the Absorption, Distribution, Metabolism, Excretion, and Toxicity properties of the leading compounds were assessed via pkCSM, and formulation characteristics optimized using FormulationAI.

Results: Among the 803 core proteins, 14 were mined for associations with autoimmune diseases (including psoriasis, rheumatoid arthritis, optic atrophy, uveitis, even-plus syndrome, Sjogren syndrome, inflammatory bowel disease, allergic rhinitis, systemic lupus erythematosus, sclerosis, Stevens-Johnson syndrome, toxic epidermal necrolysis, colitis, etc.). Seventeen core proteins were predicted to be druggable. ZINC01482946 demonstrated the strongest inhibitory potential, as confirmed by DiffDock scoring, convolutional neural network-based ranking, and Vinardo scoring. It demonstrated a stable configuration and exhibited a favorable pharmacokinetic profile, with bioavailability enhanced through cyclodextrin complexation.

Conclusion: To the best of our knowledge, this is the first report identifying human autoimmune associations with R. typhi and natural product inhibitors targeting the pathogen. ZINC01482946 shows potential as an effective inhibitor of R. typhi, while SBE-β-CD appears to be a promising cyclodextrin for improving its solubility and bioavailability.

Clinical trial number: Not applicable.
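The methodology starts from a pan-proteomic analysis of four strains and then works within the core proteome. As a rough illustration of the idea only, the sketch below treats the core proteome as the set of protein families present in every strain. It assumes proteins have already been grouped into families by some ortholog-clustering step; the strain names and family labels are invented, and the abstract does not say which tool or threshold the authors used.

```python
def core_proteome(families_by_strain):
    """Return the protein families shared by every strain.

    families_by_strain maps a strain name to an iterable of family labels
    (hypothetical identifiers produced by a prior clustering step).
    """
    family_sets = [set(families) for families in families_by_strain.values()]
    if not family_sets:
        return set()
    return set.intersection(*family_sets)


# Invented data for four strains; real analyses involve hundreds of families.
families_by_strain = {
    "strain_1": ["famA", "famB", "famC", "famD"],
    "strain_2": ["famA", "famC", "famD"],
    "strain_3": ["famA", "famB", "famC"],
    "strain_4": ["famA", "famC", "famE"],
}

core = core_proteome(families_by_strain)
print(sorted(core))
```

Families found in only some strains (here famB, famD, and famE) would belong to the accessory proteome and are excluded from target selection under this simplified definition.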


Journal Article. Biodata Mining 18(1). Pages: 83. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12661794/pdf/

Citations: 0

FARFOOD: a database of potential interactions between food compounds and drugs.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-11-25 DOI: 10.1186/s13040-025-00493-2

Ana María Nevado-Bulnes, Dixan Agustin Benitez, Guadalupe Cumplido-Laso, Jose María Carvajal-González, Sonia Mulero-Navarro, Angel-Carlos Román

Citations: 0

Deep vision in agriculture: assessing the function of YOLO in the classification of plant leaf diseases (PLDs).

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-11-24 DOI: 10.1186/s13040-025-00497-y

Chhaya Gupta, Nasib Singh Gill, Preeti Gulia, Sangeeta Duhan, Noha Alduaiji, Piyush Kumar Shukla, Abhishek Dwivedi

Plant leaf diseases (PLDs) continue to be a significant problem in the agricultural sector, leading to significant losses in production and jeopardizing food security. Early detection is essential, and recent achievements in the domain of deep learning (DL) have made automated high-accuracy solutions possible. The most popular and commonly used of these is the You Only Look Once (YOLO) family of object detection models, which have been proposed to detect plant diseases in real time. This review presents a new and in-depth synthesis of YOLO-based methods, including YOLOv1 to YOLOv10 and domain-specific variants such as CTB-YOLO (coriander), BED-YOLO (YOLOv10n), and RAG-augmented YOLOv8 (coffee). This work differs from previous surveys in that (i) it presents a structured dataset catalog containing information on size, resolution, disease classes, and limitations (such as imbalance and annotation problems); (ii) it provides comparative benchmarking analysis of performance measures (accuracy, precision, recall, F1-score, mean Average Precision, and frames per second) across versions of YOLO to illustrate trade-offs between speed and accuracy; and (iii) it gives a forward-looking discussion of open challenges and future research directions, including lightweight YOLO models that run on mobile devices. This review presents a summative reference and a new contribution to the progress of the YOLO-based PLD detection approach to sustainable agriculture.
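Mean Average Precision, one of the detection metrics listed above, depends on deciding whether a predicted box matches a ground-truth box, and that decision is normally made with intersection over union (IoU). The sketch below computes IoU for two axis-aligned boxes. The box format (x_min, y_min, x_max, y_max) and the example coordinates are assumptions for illustration and do not come from the review.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    inter_x_min = max(box_a[0], box_b[0])
    inter_y_min = max(box_a[1], box_b[1])
    inter_x_max = min(box_a[2], box_b[2])
    inter_y_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, inter_x_max - inter_x_min)
    inter_h = max(0.0, inter_y_max - inter_y_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth lesion box and a shifted prediction.
ground_truth = (0, 0, 10, 10)
prediction = (5, 5, 15, 15)
score = iou(ground_truth, prediction)
print(f"IoU = {score:.3f}")
```

A detection is usually counted as correct only when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice), so the overlap in this example would count as a miss at that threshold.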


Journal Article. Pages: 91. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12750877/pdf/

Citations: 0

An explainable machine learning model for predicting postoperative cholangitis in pediatric surgical patients with pancreaticobiliary maljunction.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-11-19 DOI: 10.1186/s13040-025-00491-4

Hui-Min Mao, Jia Geng, Bin Zhu, Shun-Gen Huang, Wan-Liang Guo

Citations: 0

Optimizing accuracy and dimensionality: a swarm intelligence strategy for robust cancer genomics classification.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2025-11-18 DOI: 10.1186/s13040-025-00503-3

Abrar Yaqoob, Mushtaq Ahmad Mir, R Vijaya Lakshmi, Tejaswini Pradhan, G V V Jagannadha Rao, Ghanshyam G Tejani, Mohd Asif Shah

High-dimensional gene expression datasets pose a major challenge in cancer classification due to redundancy, noise, and the risk of overfitting. To address these issues, this study proposes a hybrid framework that integrates the Dung Beetle Optimizer (DBO) for feature selection with Support Vector Machines (SVM) for classification. DBO, a recently developed nature-inspired algorithm, effectively identifies informative and non-redundant subsets of genes by simulating dung beetles' foraging, rolling, obstacle avoidance, stealing, and breeding behaviors. The selected features are then classified using SVM with Radial Basis Function (RBF) kernels, which provide robust decision boundaries even in high-dimensional spaces. Extensive experiments were conducted on publicly available cancer-related gene expression datasets, covering binary, ternary, and quaternary classification tasks. Results show that the proposed DBO-SVM framework achieves 97.4-98.0% accuracy on binary datasets and 84-88% accuracy on multiclass datasets, with balanced Precision, Recall, and F1-scores. These findings highlight the method's ability to enhance classification performance while reducing computational cost and improving biological interpretability. The proposed hybrid model demonstrates strong potential as an efficient and reliable tool for precision medicine and biomedical data analysis.
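The framework pairs a metaheuristic search over gene subsets with a classifier that scores each subset. The abstract does not state the objective the optimizer minimizes, so the sketch below uses a weighted fitness that is common in wrapper-style feature selection: a blend of classification error and the fraction of features kept. The function name, the weight `alpha`, and the example numbers are all assumptions; in a real pipeline the error rate would come from cross-validating the SVM on the selected genes.

```python
def selection_fitness(mask, error_rate, alpha=0.9):
    """Lower is better: trade classification error against subset size.

    mask       -- sequence of 0/1 flags, one per gene (1 = selected)
    error_rate -- classifier error on the selected genes, in [0, 1]
    alpha      -- assumed weight on error; (1 - alpha) weights subset size
    """
    n_selected = sum(mask)
    if n_selected == 0:
        # An empty subset cannot be classified, so make it the worst candidate.
        return float("inf")
    size_penalty = n_selected / len(mask)
    return alpha * error_rate + (1 - alpha) * size_penalty


# Two hypothetical candidates over 10 genes.
compact = selection_fitness([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], error_rate=0.05)
bloated = selection_fitness([1, 1, 1, 1, 1, 1, 1, 1, 0, 0], error_rate=0.04)
print(f"compact={compact:.3f} bloated={bloated:.3f}")
```

Under this assumed objective the two-gene subset wins despite its slightly higher error, which illustrates how a search can pursue accuracy and dimensionality reduction at the same time.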


Journal Article. Biodata Mining 18(1). Pages: 79. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625619/pdf/

Citations: 0
