LiGu-LVM: Linguistic-Guided Generative Large Vision Model for IoMT Clinical Ocular Disease Screening via Morphology Dissection
Authors: Xingru Huang; Tianyun Zhang; Jian Huang; Yihao Guo; Gaopeng Huang; Han Yang; Zhiwen Zheng; Lou Zhao; Shaowei Jiang; Jin Liu; Guan Gui; Xiaoshuai Zhang
Journal: IEEE Internet of Things Journal, vol. 12, no. 10, pp. 13194-13207 (JCR Q1, Computer Science, Information Systems; impact factor 8.9)
Published: 2024-11-04
DOI: 10.1109/JIOT.2024.3490595
URL: https://ieeexplore.ieee.org/document/10742080/
The early detection of ocular disorders, including Graves’ disease, myasthenia gravis, conjunctival hyperemia, conjunctivitis, and keratitis, which critically impair the vision of millions worldwide, requires large-scale screening in which ocular appearance measurements are a crucial diagnostic component. The emerging Internet of Medical Things (IoMT) opens new avenues for local clinics to adopt portable, large-scale diagnostics. However, the inherent heterogeneity and blurriness of ocular images, compounded by environmental noise, together with limited computational resources, hinder high-precision diagnostics on IoMT devices. In response to these challenges, a linguistic-guided generative large vision model (LiGu-LVM) is proposed to assist and enhance the diagnostic capability of IoMT-enabled ocular scanners. It integrates a dynamically allocated high-speed quantization system (DAHSQS), a linguistic-guided generative local-isolation module (LiGu), an oculo visio transformatrix segmentum-analytica modulorum (OVT-SAM), and a multiscale recursive attention segmentation engine (MuRASE). DAHSQS enables the flexible aggregation and transmission of patient imagery, shifting heavy diagnostic tasks from IoMT-enabled mobile ocular scanners to computational clusters and facilitating rapid facial measurements and preliminary screening via dynamic task allocation and scalable server clusters. The LiGu module employs natural language guidance to generate key image locations, using the extensive prior knowledge embedded within linguistic models for precise semantic isolation. OVT-SAM synthesizes multilevel features from the large vision model, extracting intermediate characteristic information and addressing global features alongside deep semantic understanding in natural images collected from IoMT-enabled ocular scanners. MuRASE achieves high-fidelity segmentation of ocular images by incorporating contextual recursive attention mechanisms and skip connections with layer-wise reverse connectivity. Extensive experiments show that the proposed method surpasses 80% Intersection over Union (IoU) in ocular semantic segmentation on the CelebA-HQ dataset, achieving an IoU of 82.9% and exceeding the performance of existing models by 4.9%.
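For readers unfamiliar with the reported metric, the following is a minimal sketch of Intersection over Union for a pair of binary segmentation masks. It uses the standard definition (overlap area divided by combined area); the abstract does not say how the authors aggregate IoU across classes or images, so the function name `iou`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made purely for illustration.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over Union of two binary masks with identical shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked in both masks
    union = 0         # pixels marked in at least one mask
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            union += p or t

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [1, 1, 0]]
print(iou(predicted, ground_truth))  # 0.5
```

Under this definition, an IoU of 82.9% means the overlap between predicted and reference regions covers 82.9% of their combined area.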
About the journal:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass the impact of IoT on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architectures, such as things-centric, data-centric, and service-oriented architectures; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials and experiments; and IoT standardization activities and technology development in standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.