LiGu-LVM: Linguistic-Guided Generative Large Vision Model for IoMT Clinical Ocular Disease Screening via Morphology Dissection

IF 8.9 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Information Systems) | IEEE Internet of Things Journal | Pub Date: 2024-11-04 | DOI: 10.1109/JIOT.2024.3490595

Xingru Huang;Tianyun Zhang;Jian Huang;Yihao Guo;Gaopeng Huang;Han Yang;Zhiwen Zheng;Lou Zhao;Shaowei Jiang;Jin Liu;Guan Gui;Xiaoshuai Zhang

Volume 12, Issue 10, pp. 13194-13207. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10742080/

Citations: 0


LiGu-LVM: Linguistic-Guided Generative Large Vision Model for IoMT Clinical Ocular Disease Screening via Morphology Dissection

The early detection of ocular disorders such as Graves’ disease, myasthenia gravis, conjunctival hyperemia, conjunctivitis, and keratitis, which critically impair the vision of millions worldwide, requires large-scale screening in which ocular appearance measurements are a crucial diagnostic component. The emerging Internet of Medical Things (IoMT) gives local clinics new avenues for portable, wide-reaching diagnostics. However, the inherent heterogeneity and blurriness of ocular images, compounded by environmental noise and constrained computational resources, hinder high-precision diagnostics on IoMT devices. In response to these challenges, a linguistic-guided generative large vision model (LiGu-LVM) is proposed to assist and enhance the diagnostic capability of IoMT-enabled ocular scanners. It integrates four components: a dynamically allocated high-speed quantization system (DAHSQS), a linguistic-guided generative local-isolation module (LiGu), an oculo visio transformatrix segmentum-analytica modulorum (OVT-SAM), and a multiscale recursive attention segmentation engine (MuRASE). DAHSQS enables flexible aggregation and transmission of patient imagery, shifting heavy diagnostic tasks from IoMT-enabled mobile ocular scanners to computational clusters and facilitating rapid facial measurements and preliminary screening through dynamic task allocation and scalable server clusters. The LiGu module uses natural language guidance to generate key image locations, drawing on the extensive prior knowledge embedded in linguistic models for precise semantic isolation. OVT-SAM synthesizes multilevel features from the large vision model, extracting intermediate characteristic information and addressing global features alongside deep semantic understanding in natural images collected from IoMT-enabled ocular scanners. MuRASE achieves high-fidelity segmentation of ocular images by incorporating contextual recursive attention mechanisms and skip connections with layer-wise reverse connectivity. Extensive experiments show that the proposed method reaches an intersection over union (IoU) of 82.9% in ocular semantic segmentation on the CelebA-HQ dataset, exceeding the performance of existing models by 4.9%.
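The headline result in the abstract is reported as intersection over union (IoU). As a rough illustration of how that metric is commonly computed for one binary segmentation mask, here is a minimal pure-Python sketch. The function name `iou`, the nested-list mask representation, and the convention for two empty masks are assumptions made for this example; this is not the paper's evaluation code, and the abstract does not say how per-class or per-image scores are averaged.

```python
def iou(pred, target):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (a hypothetical representation
    chosen to keep the sketch dependency-free).
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    union = 0         # pixels set in at least one mask
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 overlapping pixels out of 4 pixels covered by either mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [1, 1, 0]]
print(iou(pred, target))  # 0.5
```

On this scale, the reported 82.9% would correspond to a value of 0.829 aggregated over the evaluation set.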


Source journal: IEEE Internet of Things Journal (Computer Science, Information Systems)

CiteScore: 17.60

Self-citation rate: 13.20%

Articles published: 1982

Journal description: The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impacts on sensor technologies, big data management, and future internet design for applications like smart cities and smart homes. Fields of interest include IoT architecture such as things-centric, data-centric, service-oriented IoT architecture; IoT enabling technologies and systematic integration such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds such as IoT service middleware, IoT application programming interface (API), IoT application design, and IoT trials/experiments; IoT standardization activities and technology development in different standard development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, ETSI, etc.