{"title":"[基于共注意力网络的医学视觉问题解答方法]。","authors":"Wencheng Cui, Wentao Shi, Hong Shao","doi":"10.7507/1001-5515.202307057","DOIUrl":null,"url":null,"abstract":"<p><p>Recent studies have introduced attention models for medical visual question answering (MVQA). In medical research, not only is the modeling of \"visual attention\" crucial, but the modeling of \"question attention\" is equally significant. To facilitate bidirectional reasoning in the attention processes involving medical images and questions, a new MVQA architecture, named MCAN, has been proposed. This architecture incorporated a cross-modal co-attention network, FCAF, which identifies key words in questions and principal parts in images. Through a meta-learning channel attention module (MLCA), weights were adaptively assigned to each word and region, reflecting the model's focus on specific words and regions during reasoning. Additionally, this study specially designed and developed a medical domain-specific word embedding model, Med-GloVe, to further enhance the model's accuracy and practical value. 
Experimental results indicated that MCAN proposed in this study improved the accuracy by 7.7% on free-form questions in the Path-VQA dataset, and by 4.4% on closed-form questions in the VQA-RAD dataset, which effectively improves the accuracy of the medical vision question answer.</p>","PeriodicalId":39324,"journal":{"name":"生物医学工程学杂志","volume":"41 3","pages":"560-568"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11208638/pdf/","citationCount":"0","resultStr":"{\"title\":\"[A medical visual question answering approach based on co-attention networks].\",\"authors\":\"Wencheng Cui, Wentao Shi, Hong Shao\",\"doi\":\"10.7507/1001-5515.202307057\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Recent studies have introduced attention models for medical visual question answering (MVQA). In medical research, not only is the modeling of \\\"visual attention\\\" crucial, but the modeling of \\\"question attention\\\" is equally significant. To facilitate bidirectional reasoning in the attention processes involving medical images and questions, a new MVQA architecture, named MCAN, has been proposed. This architecture incorporated a cross-modal co-attention network, FCAF, which identifies key words in questions and principal parts in images. Through a meta-learning channel attention module (MLCA), weights were adaptively assigned to each word and region, reflecting the model's focus on specific words and regions during reasoning. Additionally, this study specially designed and developed a medical domain-specific word embedding model, Med-GloVe, to further enhance the model's accuracy and practical value. 
Experimental results indicated that MCAN proposed in this study improved the accuracy by 7.7% on free-form questions in the Path-VQA dataset, and by 4.4% on closed-form questions in the VQA-RAD dataset, which effectively improves the accuracy of the medical vision question answer.</p>\",\"PeriodicalId\":39324,\"journal\":{\"name\":\"生物医学工程学杂志\",\"volume\":\"41 3\",\"pages\":\"560-568\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11208638/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"生物医学工程学杂志\",\"FirstCategoryId\":\"1087\",\"ListUrlMain\":\"https://doi.org/10.7507/1001-5515.202307057\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"生物医学工程学杂志","FirstCategoryId":"1087","ListUrlMain":"https://doi.org/10.7507/1001-5515.202307057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Medicine","Score":null,"Total":0}
[A medical visual question answering approach based on co-attention networks].
Recent studies have introduced attention models for medical visual question answering (MVQA). In MVQA, modeling "visual attention" over the image is crucial, but modeling "question attention" over the text is equally important. To support bidirectional reasoning between the attention processes for medical images and questions, a new MVQA architecture named MCAN is proposed. The architecture incorporates a cross-modal co-attention network, FCAF, which identifies key words in questions and salient regions in images. A meta-learning channel attention module (MLCA) adaptively assigns a weight to each word and region, reflecting the model's focus on specific words and regions during reasoning. In addition, a medical domain-specific word embedding model, Med-GloVe, was designed and developed to further improve the model's accuracy and practical value. Experimental results indicate that the proposed MCAN improves accuracy by 7.7% on free-form questions in the Path-VQA dataset and by 4.4% on closed-form questions in the VQA-RAD dataset, effectively improving the accuracy of medical visual question answering.
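The bidirectional co-attention idea described above can be illustrated with a minimal sketch: an affinity matrix between word features and region features drives two softmax weightings, one over words (image-guided) and one over regions (question-guided). This is a generic co-attention illustration, not the paper's actual FCAF or MLCA modules, whose internals are not given here; the function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(Q, V):
    """Bidirectional co-attention between question words and image regions.

    Q: (n_words, d) word feature matrix; V: (n_regions, d) region feature matrix.
    Returns an attended question vector and an attended image vector, each (d,).
    Illustrative sketch only -- not the paper's FCAF/MLCA implementation.
    """
    # Affinity between every word and every region.
    C = Q @ V.T                       # (n_words, n_regions)
    # Image-guided word weights: how strongly each word matches its best region.
    a_q = softmax(C.max(axis=1))      # (n_words,)
    # Question-guided region weights: how strongly each region matches its best word.
    a_v = softmax(C.max(axis=0))      # (n_regions,)
    # Weighted sums give the attended summaries of each modality.
    q_att = a_q @ Q                   # (d,)
    v_att = a_v @ V                   # (d,)
    return q_att, v_att
```

In a full model the two attended vectors would be fused (e.g. concatenated or multiplied elementwise) and passed to an answer classifier; the adaptive per-word and per-region weights `a_q` and `a_v` play the role the abstract ascribes to the attention weighting, here in the simplest possible form.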