基于可视语言大模型的农业智能分析框架

Q1 Mathematics Applied Sciences Pub Date : 2024-09-17 DOI:10.3390/app14188350

Piaofang Yu, Bo Lin

{"title":"基于可视语言大模型的农业智能分析框架","authors":"Piaofang Yu, Bo Lin","doi":"10.3390/app14188350","DOIUrl":null,"url":null,"abstract":"Smart agriculture has become an inevitable trend in the development of modern agriculture, especially promoted by the continuous progress of large language models like chat generative pre-trained transformer (ChatGPT) and general language model (ChatGLM). Although these large models perform well in general knowledge learning, they still have certain limitations and errors when facing agricultural professional knowledge about crop disease identification, growth stage judgment, and so on. Agricultural data involves images and texts and other modalities, which play an important role in agricultural production and management. In order to better learn the characteristics of different modal data in agriculture, realize cross-modal data fusion, and thus understand complex application scenarios, we propose a framework AgriVLM that uses a large amount of agricultural data to fine-tune the visual language model to analyze agricultural data. It can fuse multimodal data and provide more comprehensive agricultural decision support. Specifically, it utilizes Q-former as a bridge between an image encoder and a language model to achieve a cross-modal fusion of agricultural images and text data. Then, we apply a Low-Rank adaptive to fine-tune the language model to achieve an alignment between agricultural image features and a pre-trained language model. The experimental results prove that AgriVLM demonstrates great performance in crop disease recognition and growth stage recognition, with recognition accuracy exceeding 90%, demonstrating its capability to analyze different modalities of agricultural data.","PeriodicalId":8224,"journal":{"name":"Applied Sciences","volume":"15 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Framework for Agricultural Intelligent Analysis Based on a Visual Language Large Model\",\"authors\":\"Piaofang Yu, Bo Lin\",\"doi\":\"10.3390/app14188350\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Smart agriculture has become an inevitable trend in the development of modern agriculture, especially promoted by the continuous progress of large language models like chat generative pre-trained transformer (ChatGPT) and general language model (ChatGLM). Although these large models perform well in general knowledge learning, they still have certain limitations and errors when facing agricultural professional knowledge about crop disease identification, growth stage judgment, and so on. Agricultural data involves images and texts and other modalities, which play an important role in agricultural production and management. In order to better learn the characteristics of different modal data in agriculture, realize cross-modal data fusion, and thus understand complex application scenarios, we propose a framework AgriVLM that uses a large amount of agricultural data to fine-tune the visual language model to analyze agricultural data. It can fuse multimodal data and provide more comprehensive agricultural decision support. Specifically, it utilizes Q-former as a bridge between an image encoder and a language model to achieve a cross-modal fusion of agricultural images and text data. Then, we apply a Low-Rank adaptive to fine-tune the language model to achieve an alignment between agricultural image features and a pre-trained language model. The experimental results prove that AgriVLM demonstrates great performance in crop disease recognition and growth stage recognition, with recognition accuracy exceeding 90%, demonstrating its capability to analyze different modalities of agricultural data.\",\"PeriodicalId\":8224,\"journal\":{\"name\":\"Applied Sciences\",\"volume\":\"15 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/app14188350\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Mathematics\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/app14188350","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Mathematics","Score":null,"Total":0}

引用次数: 0

摘要

智慧农业已成为现代农业发展的必然趋势，尤其是聊天生成预训练变换器（ChatGPT）和通用语言模型（ChatGLM）等大型语言模型的不断进步推动了智慧农业的发展。虽然这些大型模型在常识学习方面表现出色，但在面对农作物病害识别、生长阶段判断等农业专业知识时，仍存在一定的局限性和误差。农业数据涉及图像和文本等多种方式，在农业生产和管理中发挥着重要作用。为了更好地学习农业中不同模态数据的特点，实现跨模态数据融合，从而理解复杂的应用场景，我们提出了一个框架 AgriVLM，利用大量农业数据来微调视觉语言模型，分析农业数据。它可以融合多模态数据，提供更全面的农业决策支持。具体来说，它利用 Q-former 作为图像编码器和语言模型之间的桥梁，实现农业图像和文本数据的跨模态融合。然后，我们采用低强自适应技术对语言模型进行微调，以实现农业图像特征与预训练语言模型之间的匹配。实验结果证明，AgriVLM 在作物病害识别和生长阶段识别方面表现出色，识别准确率超过 90%，证明了其分析不同模态农业数据的能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Framework for Agricultural Intelligent Analysis Based on a Visual Language Large Model

Smart agriculture has become an inevitable trend in the development of modern agriculture, especially promoted by the continuous progress of large language models like chat generative pre-trained transformer (ChatGPT) and general language model (ChatGLM). Although these large models perform well in general knowledge learning, they still have certain limitations and errors when facing agricultural professional knowledge about crop disease identification, growth stage judgment, and so on. Agricultural data involves images and texts and other modalities, which play an important role in agricultural production and management. In order to better learn the characteristics of different modal data in agriculture, realize cross-modal data fusion, and thus understand complex application scenarios, we propose a framework AgriVLM that uses a large amount of agricultural data to fine-tune the visual language model to analyze agricultural data. It can fuse multimodal data and provide more comprehensive agricultural decision support. Specifically, it utilizes Q-former as a bridge between an image encoder and a language model to achieve a cross-modal fusion of agricultural images and text data. Then, we apply a Low-Rank adaptive to fine-tune the language model to achieve an alignment between agricultural image features and a pre-trained language model. The experimental results prove that AgriVLM demonstrates great performance in crop disease recognition and growth stage recognition, with recognition accuracy exceeding 90%, demonstrating its capability to analyze different modalities of agricultural data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Applied Sciences Mathematics-Applied Mathematics

CiteScore

6.40

自引率

0.00%

发文量

审稿时长

11 weeks

期刊介绍： APPS is an international journal. APPS covers a wide spectrum of pure and applied mathematics in science and technology, promoting especially papers presented at Carpato-Balkan meetings. The Editorial Board of APPS takes a very active role in selecting and refereeing papers, ensuring the best quality of contemporary mathematics and its applications. APPS is abstracted in Zentralblatt für Mathematik. The APPS journal uses Double blind peer review.