Interpretable multi-morphology and multi-scale microalgae classification based on machine learning

Impact factor: 4.5; Zone 2, Biology; JCR Q1, Biotechnology & Applied Microbiology; Journal: Algal Research-Biomass Biofuels and Bioproducts; Pub Date: 2024-12-01; DOI: 10.1016/j.algal.2024.103812
Huchao Yan, Xinggan Peng, Chao Wang, Ao Xia, Yun Huang, Xianqing Zhu, Jingmiao Zhang, Xun Zhu, Qiang Liao
Algal Research-Biomass Biofuels and Bioproducts, volume 84, Article 103812. Journal Article. https://www.sciencedirect.com/science/article/pii/S2211926424004247

Abstract

Mixed microalgae of multiple morphologies and scales are widely distributed in natural and artificial systems. An efficient approach to classifying mixed microalgae is urgently needed for natural water system monitoring and for microalgae bioprocesses such as wastewater treatment, carbon dioxide capture, and the prevention of harmful algal blooms. This study establishes numerical feature datasets for pure and mixed cultures of multi-morphic microalgae ranging in size from 5 to 500 μm. Because a large number of input features increases model complexity and computational cost, the feature space was reduced from 24 to 11 dimensions using the Pearson coefficient matrix and principal component analysis, which limits the impact of unimportant factors. The results indicate that the ensemble model classifies significantly better than the linear and nonlinear models. The random forest optimized by grid search achieves average F1 scores of 0.952 and 0.943 on pure and mixed microalgae, respectively, which are 2.2 % and 1.0 % higher than those without optimization. Shapley Additive exPlanations (SHAP) theory is combined with the ensemble model to analyze the critical factors for microalgae classification, and texture features are found to play a crucial role among all the numerical features of microalgae images.
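The Pearson-based feature screening and the averaged F1 score mentioned in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline: the greedy filtering rule, the 0.9 threshold, the feature names, and the choice of a macro average for F1 are all assumptions made for illustration, and the principal component analysis, random forest, grid search, and SHAP steps are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature only if its |r| with every feature
    kept so far is at most `threshold` (hypothetical value).

    `columns` maps a feature name to its list of values.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro average, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy features: "perimeter" is almost a multiple of "area".
features = {
    "area": [1.0, 2.0, 3.0, 4.0],
    "perimeter": [2.0, 4.0, 6.0, 8.1],
    "contrast": [4.0, 1.0, 3.0, 2.0],
}
kept_features = drop_correlated(features)
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In this toy data the near-duplicate feature is discarded while the weakly correlated one is kept; a real study would choose the threshold and the averaging scheme to suit its own data.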

Source journal
Algal Research-Biomass Biofuels and Bioproducts (Biotechnology & Applied Microbiology)
CiteScore: 9.40
Self-citation rate: 7.80%
Articles published: 332
About the journal: Algal Research is an international phycology journal covering all areas of emerging technologies in algae biology, biomass production, cultivation, harvesting, extraction, bioproducts, biorefinery, engineering, and econometrics. Algae are defined to include cyanobacteria, microalgae, and protists and symbionts of interest in biotechnology. The journal publishes original research and reviews within the following scope: algal biology, including but not limited to phylogeny, biodiversity, molecular traits, metabolic regulation, and genetic engineering; algal cultivation, e.g., phototrophic, heterotrophic, and mixotrophic systems; algal harvesting and extraction systems; biotechnology to convert algal biomass and components into biofuels and bioproducts, e.g., nutraceuticals, pharmaceuticals, animal feed, and plastics; and algal products and their economic assessment.