Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing: Latest Articles, Page 2

Transformer based Generative Adversarial Network for Liver Segmentation

Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing

Pub Date : 2022-05-01 DOI: 10.48550/arXiv.2205.10663

Ugur Demir, Zheyuan Zhang, Bin Wang, M. Antalek, Elif Keles, Debesh Jha, A. Borhani, D. Ladner, Ulas Bagci

Automated liver segmentation from radiology scans (CT, MRI) can improve surgery and therapy planning and follow-up assessment, in addition to its conventional use for diagnosis and prognosis. Although convolutional neural networks (CNNs) have become the standard for image segmentation tasks, the field has recently started to shift towards Transformer-based architectures, because the attention mechanism of Transformers can model long-range dependencies in signals. In this study, we propose a new segmentation approach that combines a Transformer with a Generative Adversarial Network (GAN). The premise behind this choice is that the self-attention mechanism of the Transformer allows the network to aggregate high-dimensional features and model global information. This mechanism provides better segmentation performance than traditional methods. Furthermore, we embed this generator in a GAN-based architecture so that the discriminator network can classify the credibility of the generated segmentation masks against the real masks coming from human (expert) annotations. This allows us to extract the high-dimensional topology information in the mask for biomedical image segmentation and provide more reliable segmentation results. Our model achieved a Dice coefficient of 0.9433, recall of 0.9515, and precision of 0.9376, and outperformed other Transformer-based approaches. The implementation details of the proposed architecture can be found at https://github.com/UgurDemir/tranformer_liver_segmentation.
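The three scores quoted in this abstract have standard definitions on binary masks, and a small worked example may make them easier to compare. The sketch below is not the authors' evaluation code: the function name `segmentation_scores` and the flat 0/1 mask representation are assumptions made purely for illustration.

```python
def segmentation_scores(pred, target):
    """Return (dice, precision, recall) for two equally sized binary masks.

    Hypothetical helper: masks are flat sequences of 0/1 values,
    where 1 marks a pixel labelled as liver.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, precision, recall


# Toy masks: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0]
target = [1, 1, 0, 1, 0, 0]
dice, precision, recall = segmentation_scores(pred, target)
print(dice, precision, recall)
```

Dice is the harmonic mean of precision and recall, which is why the reported Dice (0.9433) sits between the reported precision (0.9376) and recall (0.9515).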

Citations: 4

Transformer based Generative Adversarial Network for Liver Segmentation.

Pub Date : 2022-05-01 Epub Date: 2022-08-04 DOI: 10.1007/978-3-031-13324-4_29

Ugur Demir, Zheyuan Zhang, Bin Wang, Matthew Antalek, Elif Keles, Debesh Jha, Amir Borhani, Daniela Ladner, Ulas Bagci

Citations: 0

FasterVideo: Efficient Online Joint Object Detection And Tracking

Pub Date : 2022-04-15 DOI: 10.1007/978-3-031-06433-3_32

Issa Mouawad, F. Odone

Citations: 3

Egocentric Human-Object Interaction Detection Exploiting Synthetic Data

Pub Date : 2022-04-14 DOI: 10.48550/arXiv.2204.07061

Rosario Leonardi, F. Ragusa, Antonino Furnari, G. Farinella

We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts. Since collecting and labeling large amounts of real images is challenging, we propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images that are automatically labeled for EHOI detection in a specific industrial scenario. To tackle the problem of EHOI detection, we propose a method that detects the hands and the objects in the scene, and determines which objects are currently involved in an interaction. We compare the performance of our method with a set of state-of-the-art baselines. Results show that using a synthetic dataset improves the performance of an EHOI detection system, especially when little real data is available. To encourage research on this topic, we publicly release the proposed dataset at the following URL: https://iplab.dmi.unict.it/EHOI_SYNTH/.

Citations: 11

Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Pub Date : 2022-04-14 DOI: 10.48550/arXiv.2204.07090

Michele Mazzamuto, F. Ragusa, Antonino Furnari, G. Signorello, G. Farinella

We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of cost and time, we propose a weakly supervised version of the task which relies only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We then compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/WS_OBJ_DET/
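The "standard approach" mentioned in this abstract (detect all objects, then keep the one that best matches the gaze) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not define the overlap measure, so the sketch assumes that a box must contain the gaze point and that ties between containing boxes are broken by distance to the box centre. The function name `select_attended` and the tuple layout are hypothetical.

```python
def select_attended(boxes, gaze):
    """Pick the detected object that best matches a 2D gaze point.

    boxes: list of (label, x_min, y_min, x_max, y_max) in pixels.
    gaze:  (x, y) in the same pixel coordinates.
    Returns the label of the chosen box, or None if no box contains the gaze.
    Assumption: "best overlap" = contains the gaze, closest centre wins.
    """
    gx, gy = gaze
    best_label, best_dist = None, float("inf")
    for label, x0, y0, x1, y1 in boxes:
        # Skip detections that do not contain the gaze point at all.
        if not (x0 <= gx <= x1 and y0 <= gy <= y1):
            continue
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        dist = (cx - gx) ** 2 + (cy - gy) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


detections = [
    ("statue", 0, 0, 100, 100),
    ("painting", 60, 60, 100, 100),
]
print(select_attended(detections, (75, 80)))
```

The weakly supervised setting described in the abstract removes the need for the box annotations that such a detector would normally be trained on.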

Citations: 3

Underwater Image Enhancement Using Pre-trained Transformer

Pub Date : 2022-04-08 DOI: 10.48550/arXiv.2204.04199

Abderrahmene Boudiaf, Yu Guo, Adarsh Ghimire, N. Werghi, G. Masi, S. Javed, J. Dias

The goal of this work is to apply a denoising image transformer to remove distortion from underwater images and to compare it with other similar approaches. Automatic restoration of underwater images plays an important role because it improves image quality without the need for more expensive equipment. This is a clear example of how machine learning algorithms can support marine exploration and monitoring, reducing the need for human intervention such as manual image processing, and thus saving time, effort, and cost. This paper is the first application of the image transformer-based approach called "Pre-Trained Image Processing Transformer" to underwater images. The approach is tested on the UFO-120 dataset, which contains 1500 images with corresponding clean images.

Citations: 2

Engagement Detection with Multi-Task Training in E-Learning Environments

Pub Date : 2022-04-08 DOI: 10.48550/arXiv.2204.04020

Onur Çopur, Mert Nakıp, Simone Scardapane, Jürgen Slowack

Recognition of user interaction, in particular engagement detection, has become crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system, which jointly minimizes mean squared error and triplet loss to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state of the art on a publicly available dataset as well as on videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance, with highly acceptable training time and lightweight feature extraction. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
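To make "minimizes mean squared error and triplet loss together" concrete, the sketch below adds the two terms into one objective. It is a minimal illustration, not the ED-MTT implementation: the squared Euclidean distance, the `margin` and `weight` parameters, and all function names are assumptions, since the abstract does not specify how the two losses are combined.

```python
def mse(pred, target):
    """Mean squared error between predicted and true engagement levels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def multitask_loss(pred, target, anchor, positive, negative,
                   weight=1.0, margin=1.0):
    """Hypothetical joint objective: MSE plus a weighted triplet term."""
    return mse(pred, target) + weight * triplet_loss(anchor, positive, negative, margin)


# The regression term contributes 0.125; the triplet term is exactly at the margin.
loss = multitask_loss(
    pred=[0.5, 1.0], target=[0.0, 1.0],
    anchor=[0.0, 0.0], positive=[0.0, 1.0], negative=[1.0, 0.0],
)
print(loss)
```

In this formulation the regression term fits the engagement score directly, while the triplet term shapes the embedding space so that samples with similar engagement sit closer together than dissimilar ones.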

Citations: 4

Online panoptic 3D reconstruction as a Linear Assignment Problem

Pub Date : 2022-04-01 DOI: 10.48550/arXiv.2204.00231

Leevi Raivio, Esa Rahtu

Real-time holistic scene understanding would allow machines to interpret their surroundings in a much more detailed manner than is currently possible. While panoptic image segmentation methods have brought image segmentation closer to this goal, this information has to be described relative to the 3D environment for the machine to be able to utilise it effectively. In this paper, we investigate methods for sequentially reconstructing static environments from panoptic image segmentations in 3D. We specifically target real-time operation: the algorithm must process data strictly online and be able to run at relatively fast frame rates. Additionally, the method should be scalable to environments large enough for practical applications. By applying a simple but powerful data-association algorithm, we outperform earlier similar works when operating purely online. Our method is also capable of reaching frame rates high enough for real-time applications and is scalable to larger environments as well. Source code and further demonstrations are released to the public at: https://tutvision.github.io/Online-Panoptic-3D/
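The title frames data association as a linear assignment problem: given a cost for pairing each new segment with each existing map instance, choose the one-to-one pairing with the lowest total cost. The sketch below illustrates that problem only. The brute-force solver, the function name `solve_assignment`, and the toy cost matrix are assumptions for illustration; a practical system would more likely use a polynomial-time solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`), and the cost definition used by the authors is not given in the abstract.

```python
from itertools import permutations


def solve_assignment(cost):
    """Brute-force minimum-cost one-to-one assignment for a square matrix.

    cost[i][j] is the (hypothetical) cost of matching new segment i
    to existing map instance j. Returns (columns, total_cost), where
    columns[i] is the instance assigned to segment i.
    Only suitable for tiny matrices: runtime grows as n!.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Toy dissimilarities between 3 new segments and 3 tracked instances.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = solve_assignment(cost)
print(assignment, total)
```

Note that the optimal pairing is not the greedy one: segment 1 gives up its zero-cost match so that the total across all three pairs is minimal.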

Citations: 0

Medicinal Boxes Recognition on a Deep Transfer Learning Augmented Reality Mobile Application

Pub Date : 2022-03-26 DOI: 10.48550/arXiv.2203.14031

D. Avola, L. Cinque, Alessio Fagioli, G. Foresti, Marco Raoul Marini, Alessio Mecca, D. Pannone

Taking medicine is fundamental to curing illness. However, studies have shown that patients can find it hard to remember the correct posology. Worse, a wrong dosage generally causes the disease to worsen. Although all relevant instructions for a medicine are summarized in the corresponding patient information leaflet, the leaflet is generally difficult to navigate and understand. To address this problem and help patients with their medication, in this paper we introduce an augmented reality mobile application that presents the user with important details on the framed medicine. In particular, the app implements an inference engine based on a deep neural network, namely a DenseNet, fine-tuned to recognize a medicine from its package. Subsequently, relevant information, such as posology or a simplified leaflet, is overlaid on the camera feed to help a patient when taking a medicine. Extensive experiments to select the best hyperparameters were performed on a dataset collected specifically for this task, ultimately obtaining up to 91.30% accuracy as well as real-time capability.

Citations: 0

Deepfake Style Transfer Mixture: a First Forensic Ballistics Study on Synthetic Images

Pub Date : 2022-03-18 DOI: 10.1007/978-3-031-06430-2_13

Luca Guarnera, O. Giudice, S. Battiato

Citations: 3