
Latest publications in 中国图象图形学报 (Journal of Image and Graphics)

Residual Neural Networks for Human Action Recognition from RGB-D Videos
Q3 Computer Science Pub Date : 2023-12-01 DOI: 10.18178/joig.11.4.343-352
K. V. Subbareddy, B. P. Pavani, G. Sowmya, N. Ramadevi
Recently, RGB-D based Human Action Recognition (HAR) has gained significant research attention because the different data modalities provide complementary information. However, current models still produce unsatisfactory results due to several problems, including noise and viewpoint variations between different actions. To address these problems, this paper proposes two new action descriptors: the Modified Depth Motion Map (MDMM) and the Spherical Redundant Joint Descriptor (SRJD). MDMM removes noise from the depth maps and preserves only action-related information, while SRJD provides resilience against viewpoint variations and reduces misclassifications between different actions with similar view properties. To maximize recognition accuracy, a standard deep learning model, the Residual Neural Network (ResNet), is trained on features extracted from MDMM and SRJD. Simulation experiments show that multiple data modalities outperform a single modality. The proposed approach was tested on two public datasets, NTU RGB+D and UTD-MHAD, and the results show it to be superior to earlier HAR methods. On average, the proposed system achieved accuracies of 90.0442% and 92.3850% under cross-subject and cross-view validation, respectively.
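As a rough illustration of the depth-motion-map idea that MDMM modifies, the sketch below (not the authors' code) accumulates thresholded frame-to-frame depth differences so that only motion-related pixels survive; the threshold-based noise suppression is an assumption for illustration.

```python
# A minimal sketch, assuming a simple Depth Motion Map formulation (not the
# authors' MDMM code): accumulate thresholded frame-to-frame depth differences
# so that only motion-related pixels survive. The noise threshold is an
# illustrative assumption.
import numpy as np

def depth_motion_map(depth_frames, noise_threshold=10.0):
    """depth_frames: array of shape (T, H, W) holding one depth-video clip."""
    dmm = np.zeros(depth_frames.shape[1:], dtype=np.float64)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        diff[diff < noise_threshold] = 0.0   # crude suppression of sensor noise
        dmm += diff
    return dmm

clip = np.random.randint(0, 256, size=(30, 240, 320)).astype(np.uint8)  # synthetic clip
print(depth_motion_map(clip).shape)          # (240, 320)
```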
Citations: 0
Spatial Pyramid Attention Enhanced Visual Descriptors for Landmark Retrieval
Q3 Computer Science Pub Date : 2023-12-01 DOI: 10.18178/joig.11.4.359-366
Luepol Pipanmekaporn, Suwatchai Kamonsantiroj, Chiabwoot Ratanavilisagul, Sathit Prasomphan
Landmark retrieval, which aims to search a massive image database for landmark images similar to a query photo, has received considerable attention for many years. Nevertheless, finding landmarks quickly and accurately still presents unique challenges. To tackle these challenges, we present a deep learning model called the Spatial-Pyramid Attention network (SPA). This end-to-end convolutional network incorporates a spatial-pyramid attention layer that encodes the input image, leveraging the spatial pyramid structure to highlight regional features based on their relative spatial distinctiveness. An image descriptor is then generated by aggregating these regional features. In experiments on the benchmark datasets Oxford5k, Paris6k, and Landmark-100, the proposed SPA model achieves mean Average Precision (mAP) of 85.3% on the Oxford dataset, 89.6% on the Paris dataset, and 80.4% on the Landmark-100 dataset, outperforming existing state-of-the-art deep image retrieval models.
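The PyTorch sketch below shows one plausible shape of such a spatial-pyramid attention layer: pool the feature map at several pyramid levels, predict a per-pixel weight map, and aggregate the reweighted features into an image descriptor. The pooling levels, 1x1 scoring convolution, and sum aggregation are assumptions, not the published SPA architecture.

```python
# A PyTorch sketch of one plausible spatial-pyramid attention layer; pooling
# levels, the 1x1 scoring convolution, and sum aggregation are assumptions,
# not the published SPA architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidAttention(nn.Module):
    def __init__(self, channels, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels
        self.score = nn.Conv2d(channels * len(levels), 1, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        h, w = x.shape[2:]
        pooled = [F.interpolate(F.adaptive_avg_pool2d(x, lvl), size=(h, w),
                                mode="bilinear", align_corners=False)
                  for lvl in self.levels]
        attn = torch.sigmoid(self.score(torch.cat(pooled, dim=1)))   # (B, 1, H, W)
        return (x * attn).flatten(2).sum(-1)                # (B, C) image descriptor

descriptor = SpatialPyramidAttention(512)(torch.randn(2, 512, 14, 14))
print(descriptor.shape)                                     # torch.Size([2, 512])
```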
Citations: 0
Evaluating Performances of Attention-Based Merge Architecture Models for Image Captioning in Indian Languages
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.294-301
Rahul Tangsali, Swapnil Chhatre, Soham Naik, Pranav Bhagwat, Geetanjali Kale
Image captioning is a growing research topic in which numerous advances have been made in the past few years. Deep learning methods have been used extensively for generating textual descriptions of image data, and attention-based image captioning mechanisms have been proposed that give state-of-the-art results. However, these methodologies have rarely been applied to or analyzed for languages of the Indian subcontinent. This paper presents attention-based merge-architecture models for generating accurate captions of images in four Indian languages: Marathi, Kannada, Malayalam, and Tamil. The widely known Flickr8K dataset was used for this work. Pre-trained Convolutional Neural Network (CNN) models and attention-based language decoders were implemented as the components of the merge architecture proposed here. Finally, the accuracy of the generated captions was compared against the gold captions using Bilingual Evaluation Understudy (BLEU) as an evaluation metric. The merge architectures built on InceptionV3 gave the best results for the languages tested, with the scores discussed in the paper. The highest BLEU-1 scores obtained for each language were 0.4939 for Marathi, 0.4557 for Kannada, 0.5082 for Malayalam, and 0.5201 for Tamil. The proposed architectures scored much higher than other architectures implemented for these languages.
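BLEU-1 scoring of generated captions against gold captions can be computed, for example, with NLTK; the snippet below is an illustrative choice of tooling with toy captions, since the paper only states that BLEU was used.

```python
# An illustrative BLEU-1 computation with NLTK; the paper states only that BLEU
# was used, so the tooling and the toy captions here are assumptions.
from nltk.translate.bleu_score import corpus_bleu

references = [[["a", "dog", "runs", "on", "the", "grass"]],
              [["two", "children", "play", "in", "water"]]]      # tokenized gold captions
hypotheses = [["a", "dog", "is", "running", "on", "grass"],
              ["children", "playing", "in", "the", "water"]]     # generated captions

bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
print(f"BLEU-1 = {bleu1:.4f}")
```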
Citations: 0
DeepEar: A Deep Convolutional Network without Deformation for Ear Segmentation
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.242-247
Yuhan Chen, Wende Ke, Qingfeng Li, Dongxin Lu, Yani Bai, Zhen Wang
With the cross-application of robotics in various fields, machine vision has gradually received attention. Image segmentation, an important part of machine vision, has been widely applied, especially in biomedical imaging, and many segmentation algorithms have been proposed in recent years. Traditional Chinese medicine has also gradually received attention, and ear diagnosis plays an important role in it, so the demand for automated ear diagnosis has grown. This paper proposes a deep convolutional network for ear segmentation (DeepEar) that combines a spatial pyramid block with an encoder-decoder architecture and applies atrous convolutional layers throughout the network. Notably, the output ear image from DeepEar has the same size as the input image. Experiments show that DeepEar is highly capable of ear segmentation and obtains complete ears with little excess region. Segmentation with the proposed network achieved Accuracy = 0.9915, Precision = 0.9762, Recall = 0.9723, Harmonic measure = 0.9738, and Specificity = 0.9955, performing much better than other Convolutional Neural Network (CNN)-based methods in quantitative evaluation. The proposed network also largely completed ear-armor segmentation, further validating its capability.
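The reported pixel-wise metrics follow their standard definitions and can be reproduced from binary masks as in the short sketch below (an illustration, not the authors' evaluation code; random masks stand in for real data).

```python
# Standard pixel-wise definitions of the metrics reported above, computed from a
# predicted binary ear mask and its ground truth (illustrative, not the authors'
# evaluation code; random masks stand in for real data).
import numpy as np

def segmentation_metrics(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt);  tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt); fn = np.sum(~pred & gt)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)
    harmonic    = 2 * precision * recall / (precision + recall)   # F1 measure
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, harmonic, specificity

pred = np.random.rand(256, 256) > 0.5
gt   = np.random.rand(256, 256) > 0.5
print(segmentation_metrics(pred, gt))
```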
Citations: 0
Solar Radiation and Weather Analysis of Meteorological Satellite Data by Tensor Decomposition
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.271-281
N. Watanabe, A. Ishida, J. Murakami, N. Yamamoto
In this study, data obtained from meteorological satellites were analyzed using tensor decomposition. The data are meteorological image data observed by the Himawari-8 satellite and solar radiation data generated from Himawari Standard Data. First, we applied Higher-Order Singular Value Decomposition (HOSVD), a type of tensor decomposition, to the original image data and analyzed the features obtained from the decomposition, called the core tensor. We found that the maximum value of the core tensor elements is related to the cloud cover in the observed area. We then applied Multidimensional Principal Component Analysis (MPCA), an extension of principal component analysis computed using HOSVD, to the solar radiation data and analyzed the resulting Principal Components (PCs). The PC with the highest contribution rate is related to the solar radiation over the entire observation area. The resulting PC scores were compared to actual weather data, confirming that the temporal transition of the amount of solar radiation in this area can be expressed almost correctly using the PC scores.
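A compact NumPy sketch of HOSVD as described, projecting the data tensor onto the leading left singular vectors of each mode unfolding to obtain the core tensor, is given below; the input shape and the absence of rank truncation are assumptions for illustration.

```python
# A compact NumPy sketch of HOSVD: take the left singular vectors of each mode
# unfolding and project the tensor onto them to obtain the core tensor. The input
# shape and the absence of rank truncation are illustrative assumptions.
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)

def hosvd(tensor):
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, U in enumerate(factors):
        core = mode_dot(core, U.T, m)
    return core, factors

stack = np.random.rand(10, 32, 32)        # e.g. a small time x height x width image stack
core, factors = hosvd(stack)
print(core.shape, np.abs(core).max())      # maximum-magnitude core element
```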
Citations: 0
Web-Based Application for Malaria Parasite Detection Using Thin-Blood Smear Images
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.288-293
W. Swastika, B. J. Pradana, R. B. Widodo, Rehmadanta Sitepu, G. G. Putra
Malaria is an infectious disease caused by the Plasmodium parasite. In 2019, there were 229 million cases of malaria with a death toll of 400,900. Malaria cases increased in 2020 to 241 million, with the death toll reaching 627,000. Malaria diagnosis, carried out by observing a patient's blood sample, requires experts, and if it is not done correctly, misdiagnosis can occur. Deep learning can help diagnose malaria by classifying thin blood smear images. In this study, transfer learning was applied to a Convolutional Neural Network to speed up model training and obtain high accuracy, using the EfficientNetB0 architecture. The trained model is embedded in a Python-based web application deployed on the Google App Engine platform, so that experts can use it to assist diagnosis. The model achieved a training accuracy of 0.9664, a training loss of 0.0937, a validation accuracy of 0.9734, and a validation loss of 0.0816. Predictions on the test data reached an accuracy of 96.8% and an F1-score of 0.968.
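A minimal Keras sketch of EfficientNetB0 transfer learning for the binary parasitized/uninfected task might look as follows; the frozen base, classification head, and hyperparameters are assumptions, not the authors' exact configuration.

```python
# A minimal Keras transfer-learning sketch with EfficientNetB0 for the binary
# parasitized/uninfected task; the frozen base, classification head, and
# hyperparameters are assumptions rather than the authors' exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

base = EfficientNetB0(include_top=False, weights="imagenet",
                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                              # reuse ImageNet features

model = models.Sequential([
    base,
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),          # parasitized vs. uninfected
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # image datasets not shown here
```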
Citations: 0
Generation of High-Resolution Facial Expression Images Using a Super-Resolution Technique and Self-Supervised Guidance
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.302-308
Tatsuya Hanano
The recent spread of smartphones and social networking services has multiplied the ways people view images of human faces. In the face image field in particular, the generation of face images via facial expression transformation has already been realized with deep learning-based approaches. However, existing deep learning-based models can generate only low-resolution images because of limited computational resources, so the generated images are blurry or aliased. To address this problem, our previous work proposed a two-step method that enhances the resolution of generated facial images by appending a super-resolution network to the generative model, which can be considered a serial model. We further proposed a parallel model that trains a generative adversarial network and a super-resolution network through multitask learning. In this paper, we propose a new model that integrates self-supervised guidance encoders into the parallel model to further improve the quality of the generated results. Using the peak signal-to-noise ratio as an evaluation index, image quality improved by 0.25 dB for the male test data and 0.28 dB for the female test data compared with our previous multitask-based parallel model.
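One plausible shape of such a parallel multitask objective, jointly optimizing an adversarial loss for the expression generator and a reconstruction loss for the super-resolution branch, is sketched below in PyTorch; the tiny networks, dummy critic, and loss weighting are placeholders, not the paper's model.

```python
# A schematic PyTorch sketch of a parallel multitask objective: an adversarial
# loss for the expression generator plus a reconstruction loss for the
# super-resolution branch, optimized jointly. The tiny networks, dummy critic,
# and loss weight are placeholders, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1))            # stand-in generator
sr_branch = nn.Sequential(nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                          nn.Conv2d(3, 3, 3, padding=1))             # stand-in SR network
critic = lambda img: torch.sigmoid(img.mean(dim=(1, 2, 3)))          # dummy discriminator score

optimizer = torch.optim.Adam(list(generator.parameters()) + list(sr_branch.parameters()), lr=1e-4)
lr_face = torch.randn(4, 3, 32, 32)       # low-resolution inputs
hr_face = torch.randn(4, 3, 64, 64)       # high-resolution targets

fake = generator(lr_face)
adv_loss = -torch.log(critic(fake) + 1e-8).mean()                    # fool the critic
sr_loss = F.l1_loss(sr_branch(fake), hr_face)                        # reconstruction term
loss = adv_loss + 10.0 * sr_loss                                     # weighting is assumed
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```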
Citations: 0
Improvement of Presence in Live Music Videos and Alleviation of Discomfort of Viewers by Zooming Operation
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.233-241
Ai Oishi, Eiji Kamioka, Phan Xuan Tan, Manami Kanamaru
Thanks to the development of live music streaming services on the Internet, people can enjoy live performances without visiting the venues. However, such live music videos, especially those recorded by amateur band members, lack a sense of presence. In a previous study, the authors proposed a method to improve the sense of presence in live music videos by zooming in on the video frames. It enhanced the sense of presence but also increased viewer discomfort, because the zooming was centered on the screen rather than on a performer, resulting in an unnatural viewing experience. This paper therefore proposes a new zooming method that effectively emphasizes the performer with the most intense movement, introducing the concept of the "Main Spot". Experimental evaluation verified that the proposed method improves the sense of presence in live music videos and alleviates viewer discomfort.
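The zooming operation itself can be illustrated with OpenCV as below: crop a window around a chosen point (standing in for the "Main Spot") and rescale it to the original frame size; the Main Spot selection logic of the paper is not reproduced.

```python
# An OpenCV sketch of the zooming operation only: crop a window around a chosen
# point (standing in for the "Main Spot") and rescale it to the original frame
# size. The Main Spot selection itself is not reproduced here.
import cv2
import numpy as np

def zoom_on_spot(frame, center, zoom=1.5):
    h, w = frame.shape[:2]
    cw, ch = int(w / zoom), int(h / zoom)
    x0 = int(np.clip(center[0] - cw // 2, 0, w - cw))
    y0 = int(np.clip(center[1] - ch // 2, 0, h - ch))
    crop = frame[y0:y0 + ch, x0:x0 + cw]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)        # stand-in video frame
print(zoom_on_spot(frame, center=(640, 360)).shape)      # (720, 1280, 3)
```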
Citations: 0
A Method for Enhancing PET Scan Images Using Nonlocal Mean Filter
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.282-287
Raghad Hazim Hamid, Nagham Saeed, H. M. Ahmed
Medical images are an important source of information for both diagnosing and treating diseases. In many cases, the images produced by a Positron Emission Tomography (PET) scan are used to assess the effectiveness of a particular treatment. This paper presents a method for whole-body PET image denoising using a spatially guided non-local means filter. The proposed method starts by clustering the image into regions. To estimate the noise, a Bayesian approach with automatic parameter settings is used; then only the patches belonging to those regions are collected and processed. The performance was compared to two methods, Gaussian filtering and conventional Non-Local Means (NLM), using the Jaszczak phantom and whole-body PET/Computed Tomography (CT) data for benchmarking. The results show that the Signal-to-Noise Ratio (SNR) of the Jaszczak phantom was significantly improved, and that the proposed method improved contrast and SNR compared with conventional NLM and Gaussian filtering. The proposed method can thus be considered another option for post-reconstruction filtering in clinical whole-body PET.
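Conventional non-local means denoising, the baseline that the spatially guided filter builds on, can be run with scikit-image as in the sketch below; the synthetic slice and parameter choices are assumptions, and the paper's region clustering and Bayesian noise estimation are not reproduced.

```python
# Conventional non-local means with scikit-image, the baseline that the
# spatially guided filter builds on; the synthetic slice and parameters are
# assumptions, and the paper's region clustering and Bayesian noise estimation
# are not reproduced.
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

pet_slice = np.random.rand(128, 128).astype(np.float32)    # stand-in for one PET slice
sigma = float(estimate_sigma(pet_slice))                    # rough noise estimate
denoised = denoise_nl_means(pet_slice, h=1.15 * sigma, sigma=sigma,
                            patch_size=5, patch_distance=6, fast_mode=True)
print(denoised.shape)
```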
Citations: 0
Development of Vehicle Detection and Counting Systems with UAV Cameras: Deep Learning and Darknet Algorithms
Q3 Computer Science Pub Date : 2023-09-01 DOI: 10.18178/joig.11.3.248-262
A. H. Rangkuti, Varyl Hasbi Athala, Farrel Haridhi Indallah
This study focuses on identifying and detecting several types of vehicles whose positions are captured by drone (Unmanned Aerial Vehicle, UAV) cameras from a height of 350 to 400 meters above the ground. The aim is to identify the classes of vehicles traveling on a highway. The experiments employ several convolutional neural network models, including YOLOv4, YOLOv3, YOLOv7, DenseNet201-YOLOv3, and CSResNext50-Panet-SPP, to identify the vehicles, while the Darknet framework supports the training process and makes it easier to recognize the vehicle types appearing in MP4 videos. Several other Convolutional Neural Network (CNN) models were also tried, but due to hardware limitations only these five models could produce an optimal accuracy of up to 70%. After several experiments, the CSResNext50-Panet-SPP model produced the highest accuracy, detecting 100% of the vehicles in the UAV video data, including the volume of vehicles counted while crossing the road. Other CNN models also achieved high accuracy; the DenseNet201-YOLOv3 and YOLOv4 models detected 98% to 99%. This research could be extended by detecting other classes that UAV imagery can cover, although this requires hardware and supporting infrastructure for the training process.
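Inference with a Darknet-trained YOLO model can be run through OpenCV's DNN module as sketched below to count vehicle detections in a single UAV frame; the file names are placeholders and the class indices assume COCO ordering, so this is illustrative rather than the authors' pipeline.

```python
# An OpenCV-DNN sketch of running a Darknet-trained YOLO model on one UAV frame
# and counting vehicle detections. File names are placeholders and the class
# indices assume COCO ordering, so this is illustrative, not the authors' pipeline.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")     # hypothetical files
out_layers = net.getUnconnectedOutLayersNames()
vehicle_ids = {2, 3, 5, 7}            # car, motorcycle, bus, truck (COCO indices)

frame = cv2.imread("uav_frame.jpg")                                   # hypothetical frame
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (608, 608), swapRB=True, crop=False)
net.setInput(blob)

count = 0
for output in net.forward(out_layers):
    for det in output:                # det = [cx, cy, w, h, objectness, class scores...]
        scores = det[5:]
        cls, conf = int(np.argmax(scores)), float(np.max(scores))
        if conf > 0.5 and cls in vehicle_ids:
            count += 1
print("vehicles detected:", count)
```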
Citations: 0