Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1288
Young-Dan Noh, Bo-Seung Kwon, Sang-Won Jung, Young-Shin Han, Jong Sik Lee
An enterprise seeks profit and aims to maximize it. Inventory costs, the various costs incurred by holding inventory, are among the costs an enterprise can incur. Inventory management aims to reduce inventory costs while satisfying customers
Title: Modeling and Simulation of Periodic Review Inventory Policy in the Supply Chain Using DEVS (Journal of Korea Multimedia Society)
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1261
Jeonghyun Noh, Jinsun Park
The shape of a cell is an important factor in cell examinations that diagnose cancer or certain diseases. However, due to the limitations and nature of the microscope, the cell images obtained can be low-resolution (LR). LR images limit the analysis of the phenotype or morphological characteristics of cells, so they need to be restored to high-resolution (HR) images. In this paper, we propose a zero-shot super-resolution (ZSSR) algorithm to reconstruct cell shape information. Specifically, a high-frequency filtering module (HFM) is adopted to calculate the difference between HR and LR by extracting high-frequency information in an image, such as the edges and corners of cells. In addition, channel attention blocks (CAB), which suppress and emphasize feature information, are used so that SR is not confused by similar cell shapes in an image. Sharing the network’s parameters also improves the generalization performance of the network. As a result, PSNR is improved by 0.04 dB compared to that of the previous ZSSR. The source code will be made available at: https://github.com/JJeong-Gari/Cell-ZSSR/
Title: Zero-Shot Cell Image Super-Resolution (Journal of Korea Multimedia Society)
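The abstract above reports its gain in PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR definition for 8-bit grayscale images. The tiny `hr` and `sr` arrays are made-up illustration values, not data from the paper, and the function name `psnr` is our own.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    flat_ref = [p for row in reference for p in row]
    flat_out = [p for row in restored for p in row]
    if len(flat_ref) != len(flat_out):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 images, for illustration only.
hr = [[52, 55], [61, 59]]
sr = [[54, 55], [60, 58]]
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a difference of a few hundredths of a dB corresponds to a very small change in average pixel error.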
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1353
Sungwook Yoon
This research primarily focused on the development of an LLM response system tailored for university information, leveraging the capabilities and efficiencies of the LoRA technique. LoRA presents a methodology for efficiently fine-tuning large language models for specific tasks, and its effectiveness and efficiency were substantiated through this study. Consequently, a high-accuracy university information response system was established even under constrained resources. Especially with the utilization of LoRA
Title: Design and Implementation of LoRA-Based College Entrance Examination and Related Information System (Journal of Korea Multimedia Society)
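To make concrete why LoRA suits constrained resources, the following sketch shows the general LoRA idea: a frozen weight matrix plus a scaled low-rank update. It is not the system described in the abstract. The dimensions, the `alpha` value, and all function names are assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w0, lora_a, lora_b, alpha):
    """Return W0 + (alpha / r) * B @ A, where r is the adapter rank."""
    rank = len(lora_a)                 # A has shape (r, d_in)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)     # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

# Hypothetical sizes; real models use far larger matrices.
d_out, d_in, rank = 4, 6, 2
rng = random.Random(0)
w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]        # frozen
lora_a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
lora_b = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero init

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"trainable parameters: {lora_params} (LoRA) vs {full_params} (full fine-tuning)")
```

With `B` initialized to zero, the adapted layer starts out identical to the pretrained one. The savings grow with layer size, since the adapter has `r * (d_in + d_out)` parameters against `d_in * d_out` for the full matrix.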
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1303
YoungSeo Ji, DongWhan Kim, JaeHong Park, Soon-Bum Lim
Font usage is effective in accentuating meaning and establishing the overall tone of a message. Nevertheless, selecting an appropriate font can be burdensome for users because it requires examining all available fonts. Furthermore, users with limited font experience might inadvertently choose an inappropriate font. To address this, we developed a system that recommends fonts by evaluating the similarity between font keyword values and emotions extracted from content through deep-learning emotion analysis. Because content emotions and font keywords are classified by different criteria, a mapping model was needed to evaluate the similarity between these two sets of criteria. Accordingly, we designed our mapping model based on the PAD model, a framework that represents emotions along three axes in a coordinate space. We formulated two distinct methods to assess similarity: the first converts content and font characteristics into single PAD values and then measures the distance between them; the second analyzes the Pearson correlation coefficient between the criteria for emotional classification to determine the similarity. A comparative evaluation of the two methods showed that the model reflecting the correlation coefficient was more effective. As a result, we adopted this mapping model as the approach for calculating similarity between content and font.
Title: Design and Application of Mapping Model for Emotion-Based Font Recommendation System (Journal of Korea Multimedia Society)
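The two similarity measures named in the abstract, distance between PAD points and the Pearson correlation coefficient, can be sketched as follows. This is only an illustration of the two formulas. The PAD coordinates and font names are hypothetical, and the paper's actual mapping from font keywords to these values is not reproduced here.

```python
import math

def pad_distance(p, q):
    """Euclidean distance between two (pleasure, arousal, dominance) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pearson(x, y):
    """Pearson correlation coefficient of two equally long score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (spread_x * spread_y)

# Hypothetical PAD values for one piece of content and two fonts.
content_pad = (0.6, 0.3, 0.1)
fonts = {"font_a": (0.5, 0.4, 0.0), "font_b": (-0.4, 0.2, 0.3)}

# Method 1 (distance): the closest font in PAD space is recommended.
nearest = min(fonts, key=lambda name: pad_distance(content_pad, fonts[name]))
print("nearest font by PAD distance:", nearest)
```

A distance ranks fonts by how close they sit to the content in PAD space, while a correlation compares the shape of two score profiles regardless of their overall magnitude. That difference is one reason the two methods can rank fonts differently.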
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1344
Dongha Shim, Wonsik Jung
This study examines the nostalgic cinematic storytelling characteristics of Top Gun: Maverick, a film that evokes nostalgia, focusing on the expression and use of intertextuality. To this end, we compare and analyze Top Gun: Maverick and Top Gun in terms of story structure and characterization, which are key elements of storytelling. The study finds that the story structure of Top Gun: Maverick is very similar to that of the preceding text, Top Gun, and that strong intertextuality emerges from this similarity. In addition, Top Gun: Maverick enhances intertextuality through the three-dimensional use of characters from the previous work and the nostalgic cinematic transformation of the characters
Title: A Study on the Nostalgic Cinematic Storytelling Characteristics of <Top Gun: Maverick> - Focusing on Intertextuality (Journal of Korea Multimedia Society)
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1231
Juhyeon Oh, Kyujoong Lee
Random erasing offers various levels of occlusion for data augmentation. However, because it selects regions from a uniform distribution, it sometimes occludes regions that are unrelated to the object of interest. In this paper, we propose a novel method that uses Gradient-weighted Class Activation Mapping (Grad-CAM) to estimate the location of the object of interest and selectively erase the surrounding areas. By using Grad-CAM, we improve random erasing for CNN models without requiring additional modules or architectural changes. We generate Grad-CAM after the intermediate epochs, when the CNN model has sufficient representational power for the training data. The hyperparameter that restricts the erasing to the vicinity of the object is set based on Grad-CAM, and experiments were conducted accordingly. In our experiments, we observed a 0.33% decrease in error rate for image classification using ResNet-20 on the CIFAR-10 dataset.
Title: Class Activation Map based Random Erasing for Data Augmentation (Journal of Korea Multimedia Society)
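One possible reading of "restricting the erasing to the vicinity of the object" is sketched below: the erased patch is sampled near the peak of a class activation map instead of uniformly over the image. This is an assumption-laden illustration, not the authors' implementation. The function name, the `max_offset` parameter, the fixed patch size, and the toy 8x8 inputs are all hypothetical, and the activation map is supplied directly rather than computed by Grad-CAM.

```python
import random

def cam_guided_erase(image, cam, max_offset=2, erase_h=2, erase_w=2, rng=None):
    """Erase one erase_h x erase_w patch whose top-left corner lies within
    max_offset pixels of the CAM peak. Returns (new_image, (top, left));
    the input image is left untouched."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Location of the strongest activation, used as the object estimate.
    peak_r, peak_c = max(
        ((r, c) for r in range(height) for c in range(width)),
        key=lambda rc: cam[rc[0]][rc[1]],
    )
    # Sample the patch origin near the peak, then clamp it inside the image.
    top = peak_r + rng.randint(-max_offset, max_offset)
    left = peak_c + rng.randint(-max_offset, max_offset)
    top = min(max(top, 0), height - erase_h)
    left = min(max(left, 0), width - erase_w)
    erased = [row[:] for row in image]
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            erased[r][c] = rng.random()  # random fill, as in plain random erasing
    return erased, (top, left)

# Toy example: a flat 8x8 "image" and a CAM with a single peak at (4, 4).
image = [[2.0] * 8 for _ in range(8)]
cam = [[0.0] * 8 for _ in range(8)]
cam[4][4] = 1.0
augmented, origin = cam_guided_erase(image, cam, rng=random.Random(0))
print("patch erased at", origin)
```

In a training loop, the activation map would come from the partially trained model, which is why the abstract generates Grad-CAM only after the intermediate epochs.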
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1321
JiHeun Kong, JiaJun Xu, CheeYong Kim
Interactive movies are multimedia content characterized by the addition of nonlinear narratives to the traditional narrative structure of movies and the interaction mechanism of video games. The nonlinear structure of interactive movies enables a unique viewing experience by allowing audience members to participate in the narrative and compose their own plot. In this study, we investigated the effect of audience experience on audience satisfaction with interactive movies. The audience experience was divided into Flow Experience, Emotional Experience, Relational Experience, and Marketing Communication Experience; together with Audience Satisfaction, a total of five variables were defined, and the correlations among them were verified. Reliability, validity, and the hypotheses were verified using SPSS 26.0 and AMOS 24.0 based on a total of 272 questionnaires. The results show that Flow Experience has a significant positive effect on customers
Title: A Study on the Effect of Audience Experience on Audience Satisfaction with Interactive Movies (Journal of Korea Multimedia Society)
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1238
Yeongjae Kwon, Cheolhee Lee
In this paper, an advanced backbone structure for YOLOX is proposed to obtain better accuracy in detecting small objects such as hornets by replacing CSPLayer with ShuffleLayer. With this replacement, the number of convolution operations is reduced in each layer of the backbone. This conserves the spatial information of small objects within each layer and across the layers of the backbone, while reducing processing time. To evaluate the proposed method, four experiments were conducted: an mAP comparison on our hornet dataset, an mAP comparison on the standard VEDAI small-object dataset, a generalization test with RTMDet, and a detection speed comparison between the default and proposed YOLOX models. On the hornet dataset, mAP at an IoU threshold of 50% was 86.21% for the default model and 87.35% for the proposed model. On the standard VEDAI dataset, the two models scored 47% and 41.7%, with the proposed model again more accurate, by 5.3 percentage points. In the generalization test with RTMDet, the proposed model showed similar or higher accuracy depending on the IoU threshold. In addition, the proposed ShuffleLayer-based backbone was 1.35 times faster than the default owing to the reduced number of convolution parameters. These experiments verify that the proposed backbone structure for YOLOX can be used effectively to enhance accuracy and inference speed in real-time detection of small objects.
Title: Proposal of an Advanced Structure of YOLOX for Hornet Detection Accuracy Improvement (Journal of Korea Multimedia Society)
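The mAP figures above are reported at a 50% IoU threshold. As a brief illustration of what that threshold means, the sketch below computes intersection over union for two axis-aligned boxes and applies the 0.5 cut-off. The `(x1, y1, x2, y2)` box format, the function names, and the sample boxes are assumptions for illustration. Full mAP computation (ranking by confidence, precision-recall integration) is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches a ground-truth box when IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold

# Hypothetical small-object boxes, a few pixels apart.
ground_truth = (0, 0, 10, 10)
prediction = (2, 0, 12, 10)
print("IoU:", round(iou(prediction, ground_truth), 3),
      "match:", is_true_positive(prediction, ground_truth))
```

For small objects, a shift of only a couple of pixels changes IoU substantially, which is part of why small-object detection is sensitive to lost spatial detail.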
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1333
Jeehae Park, Wonsik Jung
‘Along with the Gods’ can be regarded as Korea’s representative transmedia franchise. This paper aims to study the statistical significance between ‘Along with the Gods’ and Vroom’s Expectancy Theory and to derive meaningful implications. For this purpose, a survey was conducted on 300 people from their teens to their 30s, and the analysis was based on the responses. Audiences who use transmedia are considered not only participatory users but also producers or members of transmedia. Assuming that audience members are members of transmedia, they form expectations about the fictional world through the film ‘Along with the Gods’. Additionally, they believe that they can sustain their desire for that world through secondary creative content. Furthermore, they gain a sense of belonging as a reward through user-created content.
Title: Reward and Sense of Belonging, A study of Transmedia Contents by Expectancy Theory of Motivation - Focused on the Movie ‘Along with the Gods’ (Journal of Korea Multimedia Society)
Pub Date: 2023-10-31 DOI: 10.9717/kmms.2023.26.10.1251
Donghyun Ku, Hanhoon Park
This paper proposes an effective method to improve the performance of SwinIR, a vision Transformer-based super-resolution neural network model, by introducing a Transformer decoder with learnable category queries. The decoder extracts semantic information from each dataset belonging to a different category (e.g., text and face); this semantic information can improve category-specific texture reconstruction during super-resolution. Experiments were conducted using decoders of different architectures to analyze the performance of the proposed method. The experimental results confirm that using the decoder can improve the quality of super-resolution images produced by SwinIR both qualitatively and quantitatively, although the improvement may vary depending on the depth of the decoder and how the semantic information is applied.
Title: Semantic Super-Resolution Using a Transformer Model (Journal of Korea Multimedia Society)
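The core operation a Transformer decoder uses to let a query gather information from image features is cross-attention. The sketch below shows a single-head, projection-free version with one hypothetical "category query" attending over a few feature tokens. It illustrates the mechanism only: the query values, the feature tokens, and the omission of learned key/value projections are simplifying assumptions and do not reflect the architecture evaluated in the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, features):
    """Scaled dot-product attention of one query over feature tokens.
    Keys and values are the features themselves (no learned projections)."""
    dim = len(query)
    scores = [sum(q * f for q, f in zip(query, token)) / math.sqrt(dim)
              for token in features]
    weights = softmax(scores)
    # Weighted sum of the feature tokens: the query's semantic summary.
    summary = [sum(w * token[i] for w, token in zip(weights, features))
               for i in range(dim)]
    return summary, weights

# Hypothetical learned queries, one per category, and three feature tokens.
category_queries = {"text": [2.0, 0.0], "face": [0.0, 2.0]}
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

summary, weights = cross_attend(category_queries["text"], features)
print("attention weights:", [round(w, 3) for w in weights])
```

Each category query produces a different weighting of the same features, which is how a per-category summary could then be fed back to guide texture reconstruction.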