Harnessing multimodal data fusion to advance accurate identification of fish feeding intensity

Biosystems Engineering · IF 4.4 · JCR Q1 (Agricultural Engineering) · CAS Tier 1 (Agricultural & Forestry Sciences) · Pub Date: 2024-08-06 · DOI: 10.1016/j.biosystemseng.2024.08.001
{"title":"Harnessing multimodal data fusion to advance accurate identification of fish feeding intensity","authors":"","doi":"10.1016/j.biosystemseng.2024.08.001","DOIUrl":null,"url":null,"abstract":"<div><p>Accurately identifying the fish feeding intensity plays a vital role in aquaculture. While traditional methods are limited by single modality (e.g., water quality, vision, audio), they often lack comprehensive representation, leading to low identification accuracy. In contrast, the multimodal fusion methods leverage the fusion of features from different modalities to obtain richer target features, thereby significantly enhancing the performance of fish feeding intensity assessment (FFIA). In this work a multimodal dataset called MRS-FFIA was introduced. The MRS-FFIA dataset consists of 7611 labelled audio, video and acoustic dataset, and divided the dataset into four different feeding intensity (strong, medium, weak, and none). To address the limitations of single modality methods, a Multimodal Fusion of Fish Feeding Intensity fusion (MFFFI) model was proposed. The MFFFI model is first extracting deep features from three modal data audio (Mel), video (RGB), Acoustic (SI). Then, image stitching techniques are employed to fuse these extracted features. Finally, the fused features are passed through a classifier to obtain the results. The test results show that the accuracy of the fused multimodal information is 99.26%, which improves the accuracy by 12.80%, 13.77%, and 2.86%, respectively, compared to the best results for single-modality (audio, video and acoustic dataset). This result demonstrates that the method proposed in this paper is better at classifying the feeding intensity of fish and can achieve higher accuracy. In addition, compared with the mainstream single-modality approach, the model improves 1.5%–10.8% in accuracy, and the lightweight effect is more obvious. 
Based on the multimodal fusion method, the feeding decision can be optimised effectively, which provides technical support for the development of intelligent feeding systems.</p></div>","PeriodicalId":9173,"journal":{"name":"Biosystems Engineering","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biosystems Engineering","FirstCategoryId":"97","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1537511024001739","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AGRICULTURAL ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Accurately identifying fish feeding intensity plays a vital role in aquaculture. Traditional methods rely on a single modality (e.g., water quality, vision, or audio), so they often lack a comprehensive representation of feeding behaviour, leading to low identification accuracy. In contrast, multimodal fusion methods combine features from different modalities to obtain richer target features, thereby significantly improving the performance of fish feeding intensity assessment (FFIA). This work introduces a multimodal dataset called MRS-FFIA, consisting of 7611 labelled audio, video, and acoustic samples divided into four feeding-intensity classes (strong, medium, weak, and none). To address the limitations of single-modality methods, a Multimodal Fusion of Fish Feeding Intensity (MFFFI) model is proposed. The MFFFI model first extracts deep features from the three modalities: audio (Mel spectrograms), video (RGB frames), and acoustic (SI) data. Image stitching techniques are then employed to fuse the extracted features, and finally the fused features are passed through a classifier to obtain the result. Test results show that the fused multimodal model reaches an accuracy of 99.26%, improving on the best single-modality results by 12.80% (audio), 13.77% (video), and 2.86% (acoustic). This demonstrates that the proposed method classifies fish feeding intensity more effectively and achieves higher accuracy. In addition, compared with mainstream single-modality approaches, the model improves accuracy by 1.5%–10.8% while being markedly more lightweight. Based on the multimodal fusion method, feeding decisions can be optimised effectively, providing technical support for the development of intelligent feeding systems.
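The pipeline described in the abstract (per-modality deep features fused by image stitching, then classified) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the feature shapes, the `stitch_features` helper, and the nearest-centroid classifier standing in for the paper's trained classifier are all hypothetical.

```python
import numpy as np

def stitch_features(mel, rgb, si):
    """Fuse 2-D feature maps from three modalities by horizontal stitching.

    mel, rgb, si: 2-D arrays (H x W_i) of features from the audio (Mel),
    video (RGB), and acoustic (SI) branches. Heights are assumed equal;
    widths may differ, as each branch may produce a different feature size.
    """
    assert mel.shape[0] == rgb.shape[0] == si.shape[0], "heights must match"
    return np.hstack([mel, rgb, si])  # shape (H, W_mel + W_rgb + W_si)

def classify(fused, centroids):
    """Toy nearest-centroid classifier over the flattened fused features."""
    v = fused.ravel()
    dists = {label: np.linalg.norm(v - c) for label, c in centroids.items()}
    return min(dists, key=dists.get)

# Example with random stand-in features (H = 8 rows per modality).
rng = np.random.default_rng(0)
mel = rng.random((8, 16))   # audio branch: Mel-spectrogram features
rgb = rng.random((8, 24))   # video branch: RGB frame features
si  = rng.random((8, 12))   # acoustic branch: SI features

fused = stitch_features(mel, rgb, si)
print(fused.shape)  # (8, 52): the three maps placed side by side

# Hypothetical class centroids for the four feeding-intensity labels.
centroids = {label: rng.random(fused.size)
             for label in ("strong", "medium", "weak", "none")}
print(classify(fused, centroids))
```

The key idea the sketch captures is that stitching preserves each modality's spatial feature layout while presenting a single composite "image" to the downstream classifier, rather than averaging or summing the modalities together.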

Source journal: Biosystems Engineering (Agricultural & Forestry Sciences — Agricultural Engineering)
CiteScore: 10.60
Self-citation rate: 7.80%
Articles per year: 239
Review time: 53 days
Journal description: Biosystems Engineering publishes research in engineering and the physical sciences that represents advances in understanding or modelling of the performance of biological systems for sustainable developments in land use and the environment, agriculture and amenity, bioproduction processes and the food chain. The subject matter of the journal reflects the wide range and interdisciplinary nature of research in engineering for biological systems.
Latest articles in this journal:
Effects of polyphenol-rich extracts and compounds on methane and ammonia emissions from pig slurry during 28-day incubation
Optimising maize threshing process with temporal proximity soft actor-critic deep reinforcement learning algorithm
Scaled experimental study of a ventilation system featuring partition jet and pit exhaust
Simulation and experimental study on frictional wear of plough blades in soil cultivation process based on the Archard model
Harvest motion planning for mango picking robot based on improved RRT-Connect