Yifan Zhou, Bing Yang, Xiaolu Lin, Risa Higashita, Jiang Liu
Deep learning methods have been shown to be effective in medical image segmentation tasks, but their results are affected by data imbalance. Inter-class imbalance is often considered, whereas intra-class imbalance is not. Intra-class imbalance commonly occurs in medical images because of external influences such as noise interference and changes in camera angle, which leave the representations within a class insufficiently discriminative. Deep learning methods easily segment regions without complex textures or varied appearances, but they are susceptible to intra-class imbalance in medical images. In this paper, we propose a two-stage global-local framework to address the intra-class imbalance problem and improve segmentation accuracy. The framework consists of (1) an auxiliary task network (ATN), (2) a local patch network (LPN), and (3) a fusion module. The ATN has a shared encoder and two separate decoders that perform global segmentation and key point localization. The key points guide the generation of fuzzy patches for the LPN, which focuses on these challenging patches to obtain more accurate results. The fusion module generates the final output from the global and local segmentation results. We performed experiments on a private iris dataset with 290 images and the public CAMUS dataset with 1800 images. Our method achieves an IoU of 0.9280 on the iris dataset and 0.8511 on the CAMUS dataset, outperforming U-Net, CE-Net, and U-Net++ on both datasets.
{"title":"Global-Local Framework for Medical Image Segmentation with Intra-class Imbalance Problem","authors":"Yifan Zhou, Bing Yang, Xiaolu Lin, Risa Higashita, Jiang Liu","doi":"10.1145/3590003.3590071","DOIUrl":"https://doi.org/10.1145/3590003.3590071","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
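The abstract does not say how the fusion module combines the global and local branches. The following is therefore only a minimal Python sketch under an assumed rule: inside a square window centred on each key point, the LPN's patch prediction replaces the global prediction. The fused mask is then scored with IoU (intersection over union), the metric the abstract reports. The names `fuse_patches` and `iou`, the window half-width `half`, and the toy masks are all hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += p & t
            union += p | t
    return 1.0 if union == 0 else inter / union  # two empty masks agree


def fuse_patches(global_mask: Mask,
                 key_points: Sequence[Tuple[int, int]],
                 local_preds: Sequence[Mask],
                 half: int) -> Mask:
    """Hypothetical fusion rule: in the (2*half+1)-square window centred on
    each (row, col) key point, the local patch prediction overrides the
    global prediction. Windows are clipped at the image border."""
    h, w = len(global_mask), len(global_mask[0])
    fused = [row[:] for row in global_mask]  # copy; do not mutate the input
    for (r, c), patch in zip(key_points, local_preds):
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                y, x = r + dr, c + dc
                if 0 <= y < h and 0 <= x < w:
                    fused[y][x] = patch[dr + half][dc + half]
    return fused


# Toy example: the global prediction misses pixel (1, 2); the local patch fixes it.
gt = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
coarse = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
local = [[0, 0, 0], [1, 1, 0], [1, 1, 0]]  # 3x3 patch centred on (1, 2)
refined = fuse_patches(coarse, [(1, 2)], [local], half=1)
print(iou(coarse, gt), iou(refined, gt))  # 0.75 1.0
```

A learned or confidence-weighted blend would be a natural alternative to the hard override. The override is used here only because it makes the effect of a correct local patch easy to see.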
Rapid urbanization in modern China has been accompanied by an increasingly serious problem of urban shrinkage. To provide an effective analytical model for urban shrinkage, this paper takes as a case study Liaoning Province, a typical Chinese province with severe urban shrinkage. Using recent data from 30 cities in Liaoning Province, it constructs a multi-index system for shrinking cities to evaluate and classify the degree of shrinkage of these 30 cities. A grey relational analysis model is used to quantify the influence of various factors on the population of shrinking cities. A back-propagation (BP) neural network optimized with particle swarm optimization (PSO) is applied to predict their development trends. The results describe the shrinkage characteristics of the 30 cities, the correlations among city indicators, and the predicted development trends of shrinking cities.
{"title":"Multi-dimensional analysis of urban shrinkage problem in Liaoning Province based on multi-index system, grey correlation analysis and BP neural network with particle swarm optimization","authors":"Zhenyu Fang, Jun Yu Li, Junyu Xiong, Xin Wang","doi":"10.1145/3590003.3590016","DOIUrl":"https://doi.org/10.1145/3590003.3590016","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
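Grey relational analysis (GRA) is usually computed in four steps:

1. Normalize the reference series x0 (here, city population) and each factor series xi.
2. Form Δi(k) = |x0(k) - xi(k)|.
3. Compute the relational coefficient ξi(k) = (Δmin + ρΔmax) / (Δi(k) + ρΔmax), where the distinguishing coefficient ρ is commonly 0.5.
4. Take the grade of factor i as the mean of its ξi(k).

The Python sketch below implements that textbook form. The mean normalization, the function names, and the toy series are assumptions, not the paper's data or exact procedure.

```python
from typing import Dict, List, Sequence


def _mean_normalize(series: Sequence[float]) -> List[float]:
    """Divide by the series mean so indicators with different units are comparable."""
    mean = sum(series) / len(series)
    return [v / mean for v in series]


def grey_relational_grades(reference: Sequence[float],
                           factors: Dict[str, Sequence[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each factor series against the reference series."""
    x0 = _mean_normalize(reference)
    deltas = {
        name: [abs(a - b) for a, b in zip(x0, _mean_normalize(series))]
        for name, series in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every factor matches the reference exactly
        return {name: 1.0 for name in factors}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Toy series (not data from the paper): population vs. two candidate factors.
population = [100, 98, 95, 91]
factors = {"gdp": [200, 196, 190, 182], "jobs": [50, 52, 49, 40]}
grades = grey_relational_grades(population, factors)
print(sorted(grades, key=grades.get, reverse=True))  # ['gdp', 'jobs']
```

A factor whose shape tracks the reference series, as "gdp" does here, gets a grade near 1. This is how GRA ranks which indicators move most closely with population.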
The interactivity of social media platforms allows large numbers of users to express their views on political and social issues, and identifying users' stances from online comments helps governments monitor public opinion more effectively. Automatic recognition of stance in comment text has become a research hotspot in natural language processing. Most existing stance analysis corpora focus on political topics in European and American countries, while high-quality corpora on political topics in Southeast Asian countries remain scarce. To stimulate research in this direction, this paper provides a dataset on the 2022 Philippine presidential election. The dataset annotates stances toward the two most popular presidential candidates and supports future research on stance analysis models. We then build a hybrid deep neural network for stance detection based on BiLSTM, CNN, and attention. We demonstrate its effectiveness on multiple datasets and obtain the best results on the SemEval-2016 dataset. We also compare two pre-trained word embeddings, FastText and Word2Vec, and discuss which is preferable for stance detection. The results show that the proposed model can be effectively applied to stance detection on Twitter text.
{"title":"Twitter stance detection using deep learning model with FastText Embedding","authors":"Yongqing Deng, Yongzhong Huang","doi":"10.1145/3590003.3590102","DOIUrl":"https://doi.org/10.1145/3590003.3590102","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
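The abstract does not specify which attention variant sits on top of the BiLSTM and CNN layers. As an assumption, the sketch below uses additive attention pooling, a common way to collapse per-token states into one sentence vector for classification. Each token gets the score u · tanh(W h_t + b), a softmax over tokens turns the scores into weights, and the weighted sum of states is the pooled vector. In a real model W, b, and u are learned. Here they are plain lists so the arithmetic is visible, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]],
                   W: Sequence[Sequence[float]],
                   b: Sequence[float],
                   u: Sequence[float]) -> Tuple[Vector, Vector]:
    """Additive attention pooling over per-token states (e.g. BiLSTM/CNN outputs).

    score_t = u . tanh(W h_t + b); weights = softmax(scores);
    pooled = sum_t weight_t * h_t. Returns (weights, pooled).
    """
    scores = []
    for h in hidden:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                for row, bias in zip(W, b)]
        scores.append(sum(ui * pi for ui, pi in zip(u, proj)))
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled


# Toy 2-d token states; W = identity, b = 0, u favours the first dimension.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, sentence_vec = attention_pool(states, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [5.0, 0.0])
print([round(a, 3) for a in weights])
```

The pooled vector would then feed a small classifier over the stance labels.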
As power systems have gradually opened up, serious security problems have emerged. Intrusion detection, an important security defense measure, can detect potential intrusions in a timely manner. In the power-sector big data environment, information silos exist between different power data owners. To obtain better-performing intrusion detection models, traditional methods must fuse data from all parties, which often creates difficulties for information security and data privacy protection. In this paper, we propose a distributed intrusion detection framework based on federated learning and apply it to network traffic data analysis. The framework jointly trains models on multiple decentralized data sources while keeping each party's local power data secure. Experimental results show that the scheme achieves 98.1% accuracy on a simulated dataset, outperforming other commonly used intrusion detection algorithms. In addition, the method protects data security and privacy because, under the federated learning mechanism, data are not shared among participants.
{"title":"Federated Learning-Based Intrusion Detection Method for Smart Grid","authors":"Dong Bin, Xin Li, Chunyan Yang, Songming Han, Ying Ling","doi":"10.1145/3590003.3590060","DOIUrl":"https://doi.org/10.1145/3590003.3590060","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
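The abstract does not name the aggregation rule. A common choice is federated averaging (FedAvg): each participant trains locally, and the server averages the returned parameters weighted by local sample counts. Only parameters, never raw traffic records, leave a participant. The sketch below shows that aggregation step in plain Python, assuming parameters are flat lists keyed by layer name. The names and numbers are illustrative.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # layer name -> flattened weights


def fed_avg(client_params: Sequence[Params], client_sizes: Sequence[int]) -> Params:
    """FedAvg aggregation: average client parameters weighted by local sample counts.

    Only these parameter vectors are sent to the server; each participant's
    raw traffic records stay local.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    aggregated: Params = {}
    for name, first in client_params[0].items():
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(first))
        ]
    return aggregated


# Two hypothetical utilities: one with 1,000 local records, one with 3,000.
global_params = fed_avg([{"w": [0.0, 4.0]}, {"w": [4.0, 0.0]}], [1000, 3000])
print(global_params)  # {'w': [3.0, 1.0]}
```

In a full training loop, the server would broadcast `global_params` back to the participants for the next round of local training.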
We present a comprehensive overview of the use of the Raspberry Pi for health monitoring of elderly people with disabilities. First, we discuss the advantages of artificial intelligence for monitoring the health of elderly people, and the importance of achieving such monitoring with low-cost information technology devices. We then examine the development of the Raspberry Pi and its advantages for this application, such as low cost, portability, and ease of use. Next, we outline methods of collecting health data from elderly people, such as using sensors to measure heart rate, blood oxygen levels, and blood pressure, and integrating these sensors into a single device. We also discuss the implementation of a Raspberry Pi-based health monitoring system for elderly people and how the collected health data can be used to optimize the system's performance. This work offers useful insights for anyone interested in using the Raspberry Pi for health monitoring of elderly people with disabilities.
{"title":"Health monitoring system for elderly people based on Raspberry Pi","authors":"Qingsong Peng","doi":"10.1145/3590003.3590057","DOIUrl":"https://doi.org/10.1145/3590003.3590057","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
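To make "integrating these sensors into a single device" concrete, here is a minimal Python sketch. It polls a set of sensor callables into one record and flags readings outside caller-supplied ranges. The sensor functions are stand-ins: on a Raspberry Pi they would wrap whatever driver each sensor uses. The ranges in the usage lines are placeholders, not clinical thresholds, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class VitalsRecord:
    readings: Dict[str, float]
    alerts: List[str] = field(default_factory=list)


def collect_vitals(sensors: Dict[str, Callable[[], float]],
                   limits: Dict[str, Tuple[float, float]]) -> VitalsRecord:
    """Poll every sensor once and flag readings outside their (low, high) range.

    Sensors without an entry in `limits` are recorded but never flagged.
    """
    readings = {name: read() for name, read in sensors.items()}
    alerts = []
    for name, value in readings.items():
        low, high = limits.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            alerts.append(f"{name}={value} outside [{low}, {high}]")
    return VitalsRecord(readings, alerts)


# Stub sensors and placeholder ranges (not clinical guidance).
record = collect_vitals(
    {"heart_rate": lambda: 72.0, "spo2": lambda: 88.0, "systolic_bp": lambda: 125.0},
    {"heart_rate": (50, 110), "spo2": (92, 100)},
)
print(record.alerts)  # ['spo2=88.0 outside [92, 100]']
```

Keeping each sensor behind a zero-argument callable means a new sensor can be added without changing the collection logic.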
Digital pathology images covering a wide range of histological tissue types (HTTs) contain more complex contours than natural images. In recent years, deep learning algorithms have been widely applied to assist HTT analysis in a weakly supervised manner by exploiting class activation maps (CAMs). However, previous methods tend to activate only the most discriminative regions of the feature maps, resulting in incomplete segmentation contours. This paper proposes the Histo-Puzzle network, which improves HTT classification and segmentation through patch-level self-supervised learning. Specifically, a puzzle module splits each HTT image into tiled patches. A classifier is then trained under the simultaneous supervision of reconstructed CAMs and image-level labels. Experiments are conducted on a digital pathology database with 51 hierarchical HTTs. The results show that the proposed method outperforms previous state-of-the-art methods on the segmentation of both morphological and functional tissue types.
{"title":"A Histo-Puzzle Network for Weakly Supervised Semantic Segmentation of Histological Tissue Type","authors":"Tengyun Ma, Guotian He, Lin Chen, Yuanchang Lin","doi":"10.1145/3590003.3590095","DOIUrl":"https://doi.org/10.1145/3590003.3590095","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
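One way to picture the puzzle idea is as a consistency check with three steps. Compute a CAM for the whole image. Compute CAMs for the tiles and reassemble them. Penalize the difference between the two. The Python sketch below follows that Puzzle-CAM-style reading of the abstract. The n × n tile layout, the L1 loss, and the function names are assumptions, and `cam_fn` stands in for the classifier's CAM output.

```python
from typing import Callable, List

Grid = List[List[float]]


def split_tiles(img: Grid, n: int) -> List[Grid]:
    """Cut an H x W grid into n x n equal tiles, returned in row-major order."""
    h, w = len(img), len(img[0])
    if h % n or w % n:
        raise ValueError("height and width must be divisible by n")
    th, tw = h // n, w // n
    return [[row[c * tw:(c + 1) * tw] for row in img[r * th:(r + 1) * th]]
            for r in range(n) for c in range(n)]


def merge_tiles(tiles: List[Grid], n: int) -> Grid:
    """Inverse of split_tiles: stitch row-major tiles back into one grid."""
    th = len(tiles[0])
    merged: Grid = []
    for r in range(n):
        for i in range(th):
            merged.append([v for c in range(n) for v in tiles[r * n + c][i]])
    return merged


def puzzle_loss(img: Grid, cam_fn: Callable[[Grid], Grid], n: int = 2) -> float:
    """Mean absolute difference between the whole-image CAM and the CAM
    reassembled from per-tile CAMs (a Puzzle-CAM-style consistency term)."""
    full = cam_fn(img)
    merged = merge_tiles([cam_fn(t) for t in split_tiles(img, n)], n)
    diffs = [abs(a - b) for f_row, m_row in zip(full, merged)
             for a, b in zip(f_row, m_row)]
    return sum(diffs) / len(diffs)


def max_normalized_cam(g: Grid) -> Grid:
    """Toy stand-in for a CAM: activations scaled by their own maximum."""
    peak = max(max(row) for row in g) or 1.0
    return [[v / peak for v in row] for row in g]


image = [[1.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 4.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
print(puzzle_loss(image, max_normalized_cam))  # 0.046875
```

The toy CAM depends on image-wide context, so the tile CAMs disagree with the full CAM and the loss is non-zero. A purely pointwise CAM would give zero.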
Because emotional disorders are involved in many kinds of human mental and cardiac problems, accurate emotion recognition is in high demand. Deep learning methods based on physiological signals have been widely applied to emotion recognition. However, many existing methods rely solely on deep features, which can be difficult to interpret and may not provide a comprehensive understanding of physiological signals. To address this issue, we propose a novel emotion recognition method based on feature fusion and self-supervised learning. The method combines shallow features with deep features, yielding a more holistic and interpretable analysis of physiological signals. In addition, we transfer a self-supervised learning method from images to signals so that the model learns informative features from unlabeled signal data.
Experiments on WESAD, a publicly available dataset, show that the proposed model significantly improves performance over state-of-the-art methods.
{"title":"An Emotion Recognition Method Based On Feature Fusion and Self-Supervised Learning","authors":"Xuan-Nam Cao, Ming Sun","doi":"10.1145/3590003.3590041","DOIUrl":"https://doi.org/10.1145/3590003.3590041","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
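A simple form of this feature fusion is concatenation: compute a few shallow statistics of each signal window and append the deep embedding produced by the network. The statistics chosen below are mean, standard deviation, minimum, maximum, and mean absolute first difference. They, the `deep_weight` scale, and the function names are illustrative assumptions: the abstract lists neither the paper's shallow features nor its fusion operator.

```python
import statistics
from typing import List, Sequence


def shallow_features(window: Sequence[float]) -> List[float]:
    """Hand-picked statistics of one physiological-signal window (illustrative)."""
    if len(window) < 2:
        raise ValueError("window needs at least two samples")
    diffs = [b - a for a, b in zip(window, window[1:])]
    return [
        statistics.mean(window),
        statistics.pstdev(window),
        min(window),
        max(window),
        sum(abs(d) for d in diffs) / len(diffs),  # mean absolute first difference
    ]


def fuse_features(window: Sequence[float],
                  deep_embedding: Sequence[float],
                  deep_weight: float = 1.0) -> List[float]:
    """Concatenate shallow statistics with a (scaled) deep embedding."""
    return shallow_features(window) + [deep_weight * v for v in deep_embedding]


# `embedding` stands in for the output of a pretrained (e.g. self-supervised) encoder.
window = [1.0, 2.0, 3.0, 4.0]
embedding = [0.1, -0.3, 0.7]
fused = fuse_features(window, embedding)
print(len(fused))  # 8
```

Keeping the shallow statistics as named, fixed positions in the fused vector is what makes this part of the representation easy to interpret.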
Arbitrary-oriented object detection in remote sensing images has been a hot topic in recent years. Currently, most arbitrary-oriented object detectors use oriented bounding boxes (OBBs) to represent targets in remote sensing imagery. However, the OBB representation suffers from suboptimal regression caused by the ambiguity of the angle definition. In this paper, we propose a novel framework, Learning Segmentation-aware Mask for arbitrary-oriented object Detection (LSM-Det), for remote sensing imagery. LSM-Det predicts the mask of each object and then converts the predicted mask into its minimum enclosing OBB to achieve arbitrary-oriented object detection. Moreover, we design a segmentation-aware branch that selects high-quality predictions using its output matching score. Our method achieves superior performance on multiple remote sensing datasets. Code and models are available to facilitate related research.
{"title":"Detecting Arbitrary-oriented Objects in Remote Sensing Imagery with Segmentation-Aware Mask","authors":"Jiali Wei, Bo Hua, Fei Gao, Huan Zhang, Jiangwei Fan, Shuran Zhang","doi":"10.1145/3590003.3590032","DOIUrl":"https://doi.org/10.1145/3590003.3590032","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
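The mask-to-OBB conversion can be done with a minimum-area enclosing rectangle, which always has one side collinear with an edge of the convex hull of the mask pixels. The pure-Python sketch below builds the hull with Andrew's monotone chain and tries every hull edge as the rectangle's orientation. In practice, OpenCV's `cv2.minAreaRect` computes this kind of rectangle from contour points. Whether LSM-Det uses exactly this procedure is an assumption, and the function names are hypothetical. Pixel centres are used as points, so the box ignores the half-pixel border.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mask_points(mask: Sequence[Sequence[int]]) -> List[Point]:
    """(x, y) centres of all foreground pixels in a binary mask."""
    return [(float(x), float(y))
            for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_obb(points: Sequence[Point]) -> Tuple[float, float, float, float, float]:
    """(cx, cy, w, h, angle_deg) of the minimum-area rectangle enclosing `points`.

    Tries every hull edge as the rectangle's orientation; w is measured along
    the chosen edge, h along its normal, and angle is that edge's direction.
    """
    hull = convex_hull(points)
    if not hull:
        raise ValueError("mask has no foreground pixels")
    if len(hull) == 1:
        return (hull[0][0], hull[0][1], 0.0, 0.0, 0.0)
    best_area, best_box = math.inf, None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit edge direction
        along = [px * ux + py * uy for px, py in hull]    # projection onto edge
        across = [-px * uy + py * ux for px, py in hull]  # projection onto normal
        w, h = max(along) - min(along), max(across) - min(across)
        if w * h < best_area:
            ca = (max(along) + min(along)) / 2
            cb = (max(across) + min(across)) / 2
            best_area = w * h
            # map the centre from (edge, normal) coordinates back to (x, y)
            best_box = (ca * ux - cb * uy, ca * uy + cb * ux, w, h,
                        math.degrees(math.atan2(uy, ux)))
    return best_box


print(min_area_obb(mask_points([[1, 1, 1], [1, 1, 1]])))  # (1.0, 0.5, 2.0, 1.0, 0.0)
```

Because the box is derived from the mask rather than regressed directly, the angle comes out of the geometry, which sidesteps the angle-definition ambiguity the abstract mentions.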
Dyslexia was first described in 1877, yet this century-old problem still affects many people today [1]. Dyslexia is marked by difficulty in reading despite an adequate environment and normal or superior intellectual ability. It can be addressed with multi-sensory learning, which involves providing audio stimuli, sometimes generated by expressive text-to-speech (TTS). However, such synthesized audio lacks rhythm, most visibly through inadequate insertion of pauses. In response, this paper proposes RhySpeech, which models rhythm using feed-forward Transformer networks and a Latent Rhythm Vector (LRV). The LRV takes as input pitch, energy, and duration features encoded by a Transformer network, together with a numeric encoding of the previous 16 phonemes, which jointly provide strong context for pause prediction. The LRV is trained to generate appropriate pause lengths and positions, giving the synthesized audio more accurate pausing.
{"title":"RhySpeech: A Deployable Rhythmic Text-to-Speech Based on Feed-Forward Transformer for Reading Disabilities","authors":"Yi-Hsien Lin","doi":"10.1145/3590003.3590062","DOIUrl":"https://doi.org/10.1145/3590003.3590062","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
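The LRV input can be pictured as one vector per phoneme: that phoneme's pitch, energy, and duration features, followed by the numeric IDs of the 16 phonemes before it. The Python sketch below assembles such rows. It left-pads near the start of an utterance with an assumed padding ID of 0. In RhySpeech the prosodic features are Transformer encodings rather than the raw scalars used here, and the function name and padding convention are hypothetical.

```python
from typing import List, Sequence

CONTEXT = 16  # previous-phoneme window size stated in the abstract
PAD_ID = 0    # assumption: ID 0 is reserved for padding


def lrv_inputs(phoneme_ids: Sequence[int],
               pitch: Sequence[float],
               energy: Sequence[float],
               duration: Sequence[float],
               context: int = CONTEXT) -> List[List[float]]:
    """One row per phoneme: [pitch, energy, duration, id_{i-context}, ..., id_{i-1}]."""
    n = len(phoneme_ids)
    if not len(pitch) == len(energy) == len(duration) == n:
        raise ValueError("all per-phoneme sequences must have the same length")
    rows = []
    for i in range(n):
        prev = list(phoneme_ids[max(0, i - context):i])  # strictly earlier phonemes
        prev = [PAD_ID] * (context - len(prev)) + prev   # left-pad at utterance start
        rows.append([pitch[i], energy[i], duration[i]] + [float(p) for p in prev])
    return rows


rows = lrv_inputs([5, 7, 9], pitch=[0.2, 0.5, 0.1], energy=[0.9, 0.4, 0.3], duration=[3.0, 5.0, 2.0])
print(len(rows), len(rows[0]))  # 3 19
```

A fixed-length window keeps every row the same size, so the rows can be stacked into a batch for a pause-prediction head.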
Applications of scene text erasure in privacy protection, camera-based virtual reality translation, and image editing have attracted growing research interest, and recent work on scene text erasing has shown promising results. We use text removal as a component of an industrial character generation procedure that produces large-scale synthetic character images, mitigating the shortage of samples for industrial character recognition. Existing character erasure models perform well in natural scenes. In industrial scenes, however, these networks are easily distracted by salient non-character regions, which shifts their attention away from the characters. To overcome this limitation, we propose an attention-based character erasure network that embeds an additional region-awareness layer to guide attention to the correct character regions. We also devise a Gaussian heat map supervision method for learning this region-awareness layer.
Experiments show that the proposed method performs favourably on four industrial character datasets.
{"title":"Gaussian-guided character erasure for data augment of industrial characters","authors":"Hongchao Gao, Chao Yao, Zhennan Wang","doi":"10.1145/3590003.3590077","DOIUrl":"https://doi.org/10.1145/3590003.3590077","journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning"},"publicationDate":"2023-03-17"}
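Gaussian heat map supervision is typically built in two steps. First, place a 2-D Gaussian at the centre of each annotated character box, with spread proportional to the box size. Then regress the predicted region map onto that target. The Python sketch below generates such a target and a mean-squared-error loss. The `sigma_scale` of 0.25, the max-merge of overlapping boxes, and the MSE loss are assumptions, since the abstract gives no formula.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixel coordinates
Grid = List[List[float]]


def gaussian_heatmap(height: int, width: int, boxes: Sequence[Box],
                     sigma_scale: float = 0.25) -> Grid:
    """Target map with a 2-D Gaussian (peak 1.0) centred on each character box.

    The standard deviation along each axis is sigma_scale * box size on that
    axis; where Gaussians overlap, the maximum value is kept.
    """
    heat = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        sx = max((x1 - x0) * sigma_scale, 1e-6)  # guard against zero-size boxes
        sy = max((y1 - y0) * sigma_scale, 1e-6)
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - cx) ** 2 / (2 * sx ** 2)
                               + (y - cy) ** 2 / (2 * sy ** 2)))
                heat[y][x] = max(heat[y][x], g)
    return heat


def heatmap_mse(pred: Grid, target: Grid) -> float:
    """Mean squared error between a predicted region map and the Gaussian target."""
    errs = [(p - t) ** 2 for p_row, t_row in zip(pred, target)
            for p, t in zip(p_row, t_row)]
    return sum(errs) / len(errs)


target = gaussian_heatmap(5, 5, [(1, 1, 3, 3)])
print(round(target[2][2], 3), round(target[0][0], 3))
```

Because the target peaks at character centres and decays smoothly, salient non-character regions receive near-zero targets. This is the behaviour the region-awareness layer is meant to learn.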