Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034610
UCL: Unsupervised Curriculum Learning for Utility Pole Classification from Aerial Imagery
This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of these approaches are supervised, requiring a large amount of labeled data, which is time-consuming and labor-intensive to produce. Unsupervised deep learning approaches have the potential to overcome the need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole classification. The framework combines a Convolutional Neural Network (CNN) and a clustering algorithm with a selection operation: the CNN extracts meaningful features from aerial imagery, the clustering algorithm generates pseudo-labels for the resulting features, and the selection operation selects reliable samples with which to fine-tune the CNN further. The fine-tuned version then replaces the initial CNN model, thus improving the framework, and we repeat this process iteratively so that the model progressively learns the prominent patterns in the data. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and yields promising classification scores on the utility pole dataset.
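The cluster/select/fine-tune loop described above can be pictured with a minimal sketch. This is an assumption-laden illustration, not the authors' code: it presumes a PyTorch backbone exposing a `features` method, uses scikit-learn k-means for pseudo-labels, and invents a distance-to-centroid selection rule with an arbitrary keep fraction.

```python
# Hedged sketch of one curriculum round (not the authors' implementation).
# Assumes a PyTorch CNN with a `.features(x)` method and scikit-learn k-means;
# the distance-to-centroid selection rule and keep fraction are assumptions.
import numpy as np
import torch
from sklearn.cluster import KMeans

def curriculum_round(model, loader, n_clusters=2, keep_frac=0.5, device="cpu"):
    model.eval()
    feats, images = [], []
    with torch.no_grad():
        for x in loader:                          # unlabeled aerial image batches
            images.append(x)
            feats.append(model.features(x.to(device)).flatten(1).cpu())
    feats = torch.cat(feats).numpy()
    images = torch.cat(images)

    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)
    pseudo = km.labels_                           # pseudo-label for every sample
    # Selection: keep only the samples closest to their cluster centroid.
    dist = np.linalg.norm(feats - km.cluster_centers_[pseudo], axis=1)
    keep = torch.from_numpy(dist < np.quantile(dist, keep_frac))
    return images[keep], torch.from_numpy(pseudo).long()[keep]
```

Each round, the caller would fine-tune the CNN on the returned (image, pseudo-label) pairs and feed the updated model back into the next round, matching the iterative replacement the abstract describes.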
{"title":"UCL: Unsupervised Curriculum Learning for Utility Pole Classification from Aerial Imagery","authors":"","doi":"10.1109/DICTA56598.2022.10034610","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034610","url":null,"abstract":"This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of these approaches are supervised, requiring a large amount of labeled data, which is time-consuming and labor-intensive. Unsupervised deep learning approaches have the potential to overcome the need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole classification. The framework combines Convolutional Neural Network (CNN) and clustering algorithms with a selection operation. The CNN architecture for extracting meaningful features from aerial imagery, a clustering algorithm for generating pseudo labels for the resulting features, and a selection operation to filter out reliable samples to fine-tune the CNN architecture further. The fine-tuned version then replaces the initial CNN model, thus improving the framework, and we iteratively repeat this process so that the model learns the prominent patterns in the data progressively. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and results in promising classification scores on the utility pole dataset.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126882359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034576
Transformer with Enhanced Encoder and Monotonic Decoder for Automatic Speech Recognition
Automatic speech recognition (ASR) systems map input speech signals to the corresponding texts at the output. Such a system mainly consists of two stages: encoding the speech signal into an intermediate feature representation, which is then decoded to obtain the corresponding characters. It is therefore essential to extract features and temporal dependencies from the speech signal effectively. In addition, it is necessary to implement a decoding strategy that can adequately leverage the monotonic property of speech transcription. In this paper, we propose Transformer-based speech transcription in which we enhance the encoder by leveraging the strength of convolutional neural networks in conjunction with self-attention. In addition, we investigate possibilities for incorporating monotonicity at the decoder side. Experimental results show the effectiveness of our proposals for ASR.
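As a rough illustration of pairing convolution with self-attention in an encoder, consider the sketch below; the block ordering, layer sizes, and kernel width are assumptions (in the spirit of Conformer-style encoders), not the paper's actual architecture.

```python
# Hedged sketch of an encoder block combining self-attention (global context)
# with a depthwise convolution (local context). All hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, kernel_size=15):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, time, d_model)
        a, _ = self.attn(x, x, x)              # global dependencies via attention
        x = self.norm1(x + a)
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local patterns via conv
        return self.norm2(x + c)
```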
{"title":"Transformer with enhanced encoder and monotonic decoder for Automatic Speech recognition","authors":"","doi":"10.1109/DICTA56598.2022.10034576","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034576","url":null,"abstract":"Automatic speech recognition (ASR) systems map the input speech signals to corresponding texts in the output. It mainly consists of two stages: encoding the speech signals to an intermediate feature representation which are to be decoded to obtain the corresponding characters. Therefore, it is essential to extract features and temporal dependencies effectively from the speech signals. In addition, it is also necessary to implement a decoding strategy which adequately can leverage the monotonic property of speech transcription. In this paper, we have proposed speech transcription with Transformer where we aimed to enhance the encoder effectiveness by leveraging the strength of convolution neural network in conjunction with self-attention. In addition, we have investigated the possibilities of incorporating monotonicity at the decoder side. Experimental results show the effectiveness of our proposals in ASR.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123743016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034640
SARFish: Space-Based Maritime Surveillance Using Complex Synthetic Aperture Radar Imagery
T. Cao, Connor Luckett, Jerome Williams, T. Cooke, Ben Yip, Arvind Rajagopalan, S. Wong
Maritime Surveillance (MS) involves the detection and classification of maritime vessels using imaging modalities ranging from radio frequencies, such as radar imaging, to visible frequencies, such as electro-optic (EO) imaging. Within the radar imaging category, Synthetic Aperture Radar (SAR) imagery plays an essential role in MS, since SAR can operate in most weather conditions, day and night, while providing the required spatial resolution [1]. An important task of MS is monitoring illegal, unreported, and unregulated (IUU) fishing activities, which damage natural ecosystems at an estimated cost of billions of dollars to the fisheries industry and governments worldwide [2]. Space-borne SAR imagery is especially suitable for monitoring IUU fishing activities, since it can provide worldwide sensing and large image coverage, on the order of hundreds of kilometers per image scene.
{"title":"SARFish: Space-Based Maritime Surveillance Using Complex Synthetic Aperture Radar Imagery","authors":"T. Cao, Connor Luckett, Jerome Williams, T. Cooke, Ben Yip, Arvind Rajagopalan, S. Wong","doi":"10.1109/DICTA56598.2022.10034640","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034640","url":null,"abstract":"Maritime Surveillance (MS) involves the detection and classification of maritime vessels using a range of imaging modalities, ranging from radio frequencies such as radar imaging to visible frequencies such as electro-optic (EO) imaging. Among the radar imaging category, Synthetic Aperture Radar (SAR) imagery plays an essential role in MS since SAR can operate in most weather conditions, day and night while providing the sufficient spatial resolution required [1]. An important task of MS is to monitor illegal, unreported, and unregulated (IUU) fishing activities which have caused damage to natural ecosystems with an estimated cost of billions of dollars to fisheries industry and governments worldwide [2]. Space-borne SAR imagery is especially suitable for monitoring IUU fishing activities since it can provide worldwide sensing and large image coverage in the order of hundred of kilometers per image scene.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127792670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034635
GAN-Uplift: 2D to 3D Uplift with Generative Adversarial Networks
Human pose estimation and prediction have many applications, from autonomous vehicles to video game development, animation and security. In many instances humans are recorded on video in two dimensions, and this two-dimensional representation must be uplifted into three dimensions before it can be fully utilised. This paper proposes lifting two-dimensional skeleton representations of human movement into three-dimensional skeleton representations over a sequence of human action: 2D-to-3D uplift. The proposed approach builds on the work in HP-GAN [1], utilising a generative adversarial network (GAN) with a recurrent neural network encoder-decoder as the generator and a multilayer fully connected neural network as the critic. As a novel element, random noise from a normal distribution is added to the z dimension of each joint, and a custom loss function combines the joint position in 3D space with bone length. The proposed algorithm, GAN-Uplift, successfully uplifts 2D motion sequences into their respective 3D motion sequences with a sequence mean joint accuracy of 30.9 mm, outperforming several state-of-the-art methods and coming within 0.4 mm of the best state-of-the-art models on the Human3.6M skeleton dataset. In addition, whereas state-of-the-art methods uplift a single pose from a sequence of pose inputs, GAN-Uplift uplifts a full sequence of human poses rather than a single pose.
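A loss of the kind described, combining a 3D joint-position term with a bone-length term, could look like the following sketch; the relative weighting and the skeleton edge list are assumptions, not the paper's values.

```python
# Hedged sketch of a joint-position + bone-length loss in the spirit of the
# paper's custom loss (weighting and skeleton topology are assumptions).
import torch

def uplift_loss(pred, target, bones, bone_weight=0.1):
    """pred, target: (batch, time, joints, 3); bones: list of (parent, child)."""
    pos_loss = torch.mean((pred - target) ** 2)   # 3D joint position error
    bone_loss = 0.0
    for p, c in bones:                            # penalise bone-length mismatch
        pred_len = torch.norm(pred[..., c, :] - pred[..., p, :], dim=-1)
        tgt_len = torch.norm(target[..., c, :] - target[..., p, :], dim=-1)
        bone_loss = bone_loss + torch.mean((pred_len - tgt_len) ** 2)
    return pos_loss + bone_weight * bone_loss / len(bones)
```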
{"title":"GAN-Uplift: 2D to 3D Uplift with Generative Adversarial Networks","authors":"","doi":"10.1109/DICTA56598.2022.10034635","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034635","url":null,"abstract":"Human pose estimation and prediction has many applications from autonomous vehicles to video games development, animation and security. In many instances humans are recorded by video in two dimensions and this two-dimensional representation requires uplifting into three dimensions before being fully utilised. This paper proposes lifting two-dimensional skeleton representations of human movement into three dimensional skeleton representations over a sequence of human action, 2D to 3D uplift. The proposed approach builds on the work in HP-GAN [1] utilising a generative adversarial network (GAN) with a recurrent neural network encoder decoder as the generator and a multilayer fully connected neural network as the critic. A novel approach adds random noise from a normal distribution to the z dimension of each joint and a custom loss function consisting of the joint position in 3D space and bone length. The proposed algorithm GAN-Uplift successfully uplifts 2D motion sequences into their respective 3D motion sequences, with a sequence mean joint accuracy of 30.9mm and outperforms several state-of-theart methods and is within 0.4mm of the best state-of-the-art models on the Human3.6M skeleton dataset. In addition, stateof-the-art methods uplift a single pose from a sequence of pose input. GAN-Uplift uplifts a sequence of human poses rather than a single pose.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125554964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034633
Dual Image QR Codes: The Best of Both Worlds
Due to the high adoption rate of QR codes across the world, researchers have been attempting to improve classical QR codes either by making their appearance more meaningful to human perception or by increasing the amount of information they can store. In this work, we propose dual image QR codes that aim to improve both aspects while remaining scannable by standard QR code readers. We improve the appearance of the QR code using the halftone QR principle and increase its capacity with the lenticular imaging technique. To test the robustness of the proposed QR code, we evaluated six important parameters and searched for appropriate conditions through 24,000 combinations. From the experiments, we found 3,714 appropriate conditions that achieved a 100% successful scanning rate. Lastly, we also list example use cases for the proposed dual image QR codes in real-world situations.
{"title":"Dual Image QR Codes: The Best of Both Worlds","authors":"","doi":"10.1109/DICTA56598.2022.10034633","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034633","url":null,"abstract":"Due to the high adoption rate of QR codes across the world, researchers have been attempting to improve classical QR codes by either improving their appearance to be more meaningful to human perception or improving their capability to be able to store more messages. In this work, we propose dual image QR codes that aim to improve both aspects while preserving the ability to be able to scan by standard QR code readers. We improve the appearance of the QR code using the halftone QR principle and increase the capacity of the QR code with the lenticular imaging technique. To test the robustness of the proposed QR code, we evaluated six important parameters and searched for appropriate conditions through 24, 000 combinations. From the experiments, we found 3, 714 appropriate conditions which achieved 100% successful scanning rate. Lastly, we also list of examples use cases to use in real-world situations for the proposed dual image QR codes.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131498421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034601
A Robust Approach for Small-Scale Object Detection From Aerial-View
In computer vision, object detection is an important task that has seen significant progress, but object detection in aerial images remains challenging for researchers. Small target size, low resolution, occlusion, and attitude and scale variations are the major concerns in aerial images that prevent many state-of-the-art object detectors from performing well. In our experimentation, we modified CenterNet and compared the results achieved with nine different CNN-based backbones, i.e., ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, Res2Net50, Res2Net101, DLA34 and Hourglass104, finding the most promising results with the CenterNet variant using Hourglass104 as the backbone. We used three challenging datasets to validate our approach: VisDrone, Stanford and AU-AIR. Using the standard mAP metric, we achieved validation scores of 91.62, 75.62 and 34.85 on the AU-AIR, Stanford and VisDrone datasets, respectively. We also compared the mAP achieved at IoU@0.5 and IoU@0.75 across the different backbones. Our approach achieves promising results compared to those reported in the latest research.
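For readers unfamiliar with the IoU@0.5 / IoU@0.75 notation used above, the thresholds refer to the minimum intersection-over-union between a predicted and a ground-truth box for the prediction to count as correct; the box coordinates in the sketch below are invented, not from the paper.

```python
# Hedged illustration of the IoU criterion behind mAP@0.5 and mAP@0.75:
# a detection counts as a true positive only when its overlap with a
# ground-truth box reaches the threshold. Boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (10, 10, 50, 50), (15, 12, 55, 52)
print(round(iou(pred, gt), 2))   # 0.71: a hit at IoU@0.5 but a miss at IoU@0.75
```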
{"title":"A Robust Approach for Small-Scale Object Detection From Aerial-View","authors":"","doi":"10.1109/DICTA56598.2022.10034601","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034601","url":null,"abstract":"In computer vision, object detection is an important task and gained significant progress but object detection using aerial images is still a challenging task for researchers. Small target size, low resolution, occlusion, attitude, and scale variations are the big concerns with aerial images that prevent many state-of-the-art object detectors performing well. In our experimentation, we have modified CenterNet and provided comparison of results achieved using nine different CNN-based backbones i.e., resNet18, resNet34, resNet50, resNet101, resNet152, res2Net50, res2Net101, DLA34 and hourglass104 and found promising results using invariant of centerNet and hourglass104 as a backbone. We used three challenging datasets to validate our approach i.e., VisDrone, Stanford and AU-AIR. By keeping the standard mAP, we achieved 91.62, 75.62 and 34.85 validation results using AU-AIR, Stanford and VisDrone datasets respectively. We have also compared the achieved mAP using IoU@0.5 and IoU@0.75 against different backbones. Our approach has achieved the promising results as compared to results provided in latest research.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120983393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034615
Disentangling Convolutional Neural Network towards an Explainable Vehicle Classifier
Vehicle category classification is an integral part of intelligent transportation systems (ITS). In this context, vision-based approaches are of increasing interest due to recent progress in camera hardware and machine learning algorithms. Currently, the state of the art for vision-based classification is an end-to-end approach based on Convolutional Neural Networks (CNNs). However, their inherent black-box nature and the difficulty of modifying existing categories or adding new ones currently limit their application in ITS. Here, we present an alternative classification approach that partially removes these limitations. It consists of three parts: 1) a CNN-based detector for semantically strong vehicle parts provides the basis for 2) a feature construction step, followed by 3) the final classification based on a decision tree. Ultimately, this approach will allow the training-intensive part detector to be kept fixed once a sufficiently large set of vehicle parts has been trained; existing categories can be modified and new ones added through changes to the feature construction and classification steps only. We illustrate the effectiveness of this approach by extending the vehicle classifier from 11 to 16 categories through the addition of an “articulate” feature. In addition, the vehicle parts provide clear interpretability, and the conceptually simple feature construction and decision tree classifier make the approach explainable. Nevertheless, the part-based classifier achieves accuracy comparable to an end-to-end CNN model trained on all 16 classes.
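A minimal sketch of such a parts-to-features-to-tree pipeline follows; the part names, confidence threshold, feature construction, and category labels are all invented for illustration and are not the paper's definitions.

```python
# Hedged sketch of the three-stage pipeline: CNN part detections are turned
# into an interpretable feature vector and classified by a decision tree.
from sklearn.tree import DecisionTreeClassifier

PARTS = ["wheel", "cab", "trailer_coupling", "container"]   # illustrative parts

def to_features(detections):
    """detections: list of (part_name, confidence) from the CNN part detector."""
    counts = {p: 0 for p in PARTS}
    for name, conf in detections:
        if conf > 0.5 and name in counts:        # assumed confidence threshold
            counts[name] += 1
    feats = [counts[p] for p in PARTS]
    feats.append(1 if counts["trailer_coupling"] > 0 else 0)  # articulation flag
    return feats

# Toy training data: feature vectors and vehicle-category labels.
X = [to_features([("wheel", 0.9)] * 4 + [("cab", 0.8)]),
     to_features([("wheel", 0.9)] * 6 + [("cab", 0.8), ("trailer_coupling", 0.7)])]
y = ["rigid_truck", "articulated_truck"]
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
```

Because only `to_features` and the tree encode category knowledge, a new category can be added without retraining the part detector, which is the design point the abstract emphasizes.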
{"title":"Disentangling Convolutional Neural Network towards an explainable Vehicle Classifier","authors":"","doi":"10.1109/DICTA56598.2022.10034615","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034615","url":null,"abstract":"Vehicle category classification is an integral part of intelligent transportation systems (ITS). In this context, vision-based approaches are of increasing interest due to recent progress in camera hardware and machine learning algorithms. Currently, for vision-based classification an end-to-end approach based on Convolutional Neural Networks (CNNs) is the state-of-the-art. However, their inherent black-box approach and the difficulty of modifying existing or adding new categories currently limit their application in ITS. Here, we present an alternative classification approach that partially removes these limitations. It consists of three parts: 1) a CNN-based detector for semantically strong vehicle parts provides the basis for 2) a feature construction step, followed by 3) the final classification based on a decision tree. Ultimately this approach will allow to keep the training-intensive part-detector fixed, once a sufficiently large set of vehicle parts has been trained. Modification of existing categories and addition of new ones are possible by changes to the feature construction and classification steps only. We illustrate the effectiveness of this approach through the extension of the vehicle classifier from 11 to 16 categories by adding an “articulate” feature. In addition, the vehicle parts provide clear interpretability and the conceptually simple feature construction and decision tree classifier provide explainability of the approach. Nevertheless, the part-based classifier achieves comparable accuracy to an end-to-end CNN model trained on all 16 classes.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114772468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034565
A Region Adaptive Motion Estimation Strategy Leveraging on the Edge Position Difference Measure
To capture motion homogeneity between successive frames, edge position difference (EPD) measure-based motion modeling (EPD-MM) has shown good motion compensation capabilities. The EPD-MM technique is underpinned by the fact that, from one frame to the next, edges map to edges, and such mapping can be captured by an appropriate motion model. However, the EPD-MM approach may produce an inferior-quality motion model in regions of the current frame where moving edges are few in number. For such regions, traditional pixel intensity difference (PID) measure-based motion modeling (PID-MM) may yield superior motion compensation. Therefore, in this paper, the entire current frame is first partitioned into two regions (an edge-dominant region and an edge-sparse region) based on the frequency of moving edge pixels. This segmentation is carried out over the EPD image, since it carries information about the distance of every pixel from its nearest edge. For motion modeling, the EPD-MM technique is then adopted in the edge-dominant region, while the PID-MM approach is chosen for the remaining regions of the current frame. Experimental results show a prediction PSNR improvement of 1.90 dB from the proposed approach over the baseline EPD-MM approach, which does not differentiate between edge-dominant and edge-sparse regions. Moreover, if this predicted frame is employed as an additional reference frame for encoding current frames, bit rate savings of up to 7.84% are achievable over an HEVC reference codec.
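The block-wise partition into edge-dominant and edge-sparse regions might be sketched as follows; the block size and density threshold are assumptions, and a binary moving-edge map stands in for the EPD image the paper actually segments.

```python
# Hedged sketch of the edge-dominant / edge-sparse partition (block size and
# density threshold are assumptions, not the paper's parameters).
import numpy as np

def partition_regions(edge_map, block=16, density_thresh=0.05):
    """edge_map: 2D binary array of moving-edge pixels. Returns a per-block
    boolean mask: True = edge-dominant (use EPD-MM), False = edge-sparse
    (fall back to PID-MM)."""
    h, w = edge_map.shape
    hb, wb = h // block, w // block
    blocks = edge_map[:hb * block, :wb * block].reshape(hb, block, wb, block)
    density = blocks.mean(axis=(1, 3))        # fraction of edge pixels per block
    return density >= density_thresh
```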
{"title":"A Region Adaptive Motion Estimation Strategy Leveraging on the Edge Position Difference Measure: Anonymous ICME submission","authors":"","doi":"10.1109/DICTA56598.2022.10034565","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034565","url":null,"abstract":"To capture motion homogeneity between successive frames, the edge position difference (EPD) measure based motion modeling (EPD-MM) has shown good motion compensation capabilities. The EPD-MM technique is underpinned by the fact that from one frame to next, edges map to edges and such mapping can be captured by an appropriate motion model. However, the EPD-MM approach may produce inferior quality motion model in those regions of the current frame where moving edges are few in number. For such regions, traditional pixel intensity difference (PID) measure based motion modeling (PID-MM) may yield superior motion compensation. Therefore, in this paper, the entire current frame is at first partitioned into two regions (edge dominant region and edge sparse region) based on the frequency of moving edge pixels. This segmentation is carried out over the EPD image since it possesses information pertinent to the distance of every pixel from its nearest edge. After that for motion modeling, in the edge dominant region, the EPD-MM technique is adopted and for the rest of the current frame regions, the PID-MM approach is chosen. Experimental results show an improved prediction PSNR of 1.90 dB from the proposed approach compared to that of the baseline EPD-MM approach that does not differentiate between edge dominant and edge sparse regions. Moreover, if this predicted frame is employed as an additional reference frame to encode current frames, bit rate savings of up to 7.84% is achievable over a HEVC reference codec.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121773806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034574
FaceCook: Attribute-Controllable Face Generation Based on Linear Scaling Factors
With the excellent disentanglement properties of state-of-the-art generative models, image editing has been the dominant approach to controlling the attributes of synthesized face images. However, the edited results often suffer from artifacts or incorrect feature rendering, especially when there is a large discrepancy between the image to be edited and the desired feature set. We therefore propose a new approach that maps the latent vectors of the generative model to scaling factors by solving a set of multivariate linear equations. The coefficients of the equations are the eigenvectors of the weight parameters of the pre-trained model, which form the basis of a hyper coordinate system. Both the qualitative and quantitative results show that the proposed method outperforms the baseline in terms of image diversity. In addition, the method is much more time-efficient, since synthesized images with the desired features can be obtained directly from the latent vectors rather than through the former process of editing randomly generated images in redundant steps.
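The core linear-algebra step, expressing a latent vector as scaling factors in an eigenvector basis, can be pictured with the sketch below; the random symmetric matrix merely stands in for the pre-trained model's weight parameters, and the dimension is arbitrary.

```python
# Hedged sketch: solve a linear system whose coefficients are eigenvectors,
# recovering scaling factors for a latent vector. The weight matrix is random,
# purely for illustration; it is not the paper's pre-trained model.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
W = W @ W.T                                  # symmetric, so eigenvectors are real
_, eigvecs = np.linalg.eigh(W)               # columns form an orthonormal basis

z = rng.standard_normal(512)                 # a latent vector
scales = np.linalg.solve(eigvecs, z)         # scaling factors in the eigenbasis
assert np.allclose(eigvecs @ scales, z)      # z is recovered from the factors
```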
{"title":"FaceCook: Attribute-Controllable Face Generation Based on Linear Scaling Factors","authors":"","doi":"10.1109/DICTA56598.2022.10034574","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034574","url":null,"abstract":"With the excellent disentanglement properties of state-of-the-art generative models, image editing has been the dominant approach to controlling the attributes of synthesized face images. However, these edited results often suffer from artifacts or incorrect feature rendering, especially when there is a large discrepancy between the image to be edited and the desired feature set. Therefore, we propose a new approach to mapping the latent vectors of the generative model to the scaling factors through solving a set of multivariate linear equations. The coefficients of the equations are the eigenvectors of the weight parameters of the pre-trained model, which form the basis of a hyper coordinate system. The qualitative and quantitative results both show that the proposed method outperforms the baseline in terms of image diversity. In addition, the method is much more time-efficient since the synthesized images with desirable features can be obtained directly from the latent vectors, rather than the former process of editing randomly generated images with redundant steps.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123827867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-30  DOI: 10.1109/DICTA56598.2022.10034560
Convolution-mix-Transformer Generator Model to Synthesize PET Images from CT Scans
T. Thanh
PET/CT is a type of medical imaging that has been shown to be useful for both disease diagnosis and treatment monitoring. However, PET/CT imaging systems are still not widely used in hospitals, because an injection of radioactive material is required and few PET scanners are available, whereas CT scanners are common in health care facilities. In this paper, we introduce a new Convolution-mix-Transformer generator network for translating CT images to PET images. The method has been tested on PET/CT images of 791 patients. CT images are trained over their full intensity range rather than just a small window, as imaging systems often apply to make specific parts of the body easier to see. We generate PET images and analyze standardized uptake values (SUVs) based on the CT images. We use five common measures to evaluate the results: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), mean structural similarity index measure (MSSIM), mean absolute error (MAE) and Fréchet inception distance (FID); our results are significantly better than those of other models. Our suggested model successfully converts CT images to PET images through a trained image-to-image translation approach. The generated PET images maintain both the regional and global features of medical imaging.
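Three of the five reported measures are straightforward to compute with standard libraries, as the hedged sketch below shows on synthetic data; MSSIM and FID are omitted (FID needs a pre-trained Inception network), and the image pair here is random, not real PET data.

```python
# Hedged sketch of PSNR, SSIM and MAE on a synthetic image pair using
# scikit-image and NumPy; the data is a random stand-in for PET slices.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
real = rng.random((256, 256)).astype(np.float32)          # stand-in "real" PET
fake = np.clip(real + 0.05 * rng.standard_normal((256, 256)).astype(np.float32),
               0, 1)                                      # stand-in "generated" PET

psnr = peak_signal_noise_ratio(real, fake, data_range=1.0)
ssim = structural_similarity(real, fake, data_range=1.0)
mae = np.mean(np.abs(real - fake))
print(f"PSNR={psnr:.2f} dB  SSIM={ssim:.4f}  MAE={mae:.4f}")
```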
{"title":"Convolution-mix-Transformer Generator model to synthesize PET images from CT scans","authors":"T. Thanh","doi":"10.1109/DICTA56598.2022.10034560","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034560","url":null,"abstract":"PET/CT is a type of medical imaging that has been shown to be useful for both disease diagnosis and treatment monitoring. However, PET/CT imaging systems are still not widely used in hospitals because radioactive material injection is required and there aren't many PET scanners available, but CT scanners are widely used in health care facilities. In this paper, we introduce a new Convolution-mix-Transformer as a Generator network for translating CT images to PET images. The method has been tested on PET/CT images of 791 patients, CT images are trained as a full range image instead of just a small window, as systems often do to make it easier to see specific parts of the body. We generate PET images and analyze SUV values based on CT images. We use five common measures to evaluate results: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), mean structural similarity index measure (MSSIM), mean absolute error (MAE) and Fréchet inception distance (FID), and our result which are significantly higher than other models. Our suggested model successfully converts CT images to PET images by including a trained image-to-image translation approach. The generated PET images maintain both the regional and global features of medical imaging.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124417430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}