Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576805
Kin Wai Lee, R. Chin
Medical imaging modalities have shown great potential for faster and more efficient control and containment of disease transmission. In this paper, we propose a cost-effective COVID-19 and pneumonia detection framework using CT scans acquired from several hospitals. To this end, we incorporate a novel data processing framework that uses both 3D and 2D CT scans to diversify the trainable inputs in a resource-limited setting. Moreover, we empirically demonstrate the significance of several data processing schemes for our COVID-19 and pneumonia detection network. Experimental results show that our proposed pneumonia detection network is comparable to other imaging-based pneumonia detection approaches, with a mean AUC of 93% and a mean accuracy of 85.22% on generalized datasets. Additionally, our proposed data processing framework can be easily adapted to other applications of the CT modality, especially in cost-effective and resource-limited scenarios such as breast cancer detection and pulmonary nodule diagnosis.
{"title":"An Adaptive Data Processing Framework for Cost-Effective COVID-19 and Pneumonia Detection","authors":"Kin Wai Lee, R. Chin","doi":"10.1109/ICSIPA52582.2021.9576805","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576805","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576770
N. Malik, S. Abu-Bakar, U. U. Sheikh
The growing technological development in computer vision in general, and in human action recognition (HAR) in particular, has attracted an increasing number of researchers from various disciplines. Among the many challenges in human action recognition, one major issue is complex modelling, which requires many parameters; this makes training troublesome and in turn demands high-specification machines for real-time recognition. Therefore, there is a need for a simplified method that reduces complexity without compromising accuracy. To address this issue, this paper proposes an approach that extracts the mean, variance and median from the skeleton joint locations and uses them directly in the classification process. The system uses the MCAD dataset and extracts 2D skeleton features with OpenPose, which is suited to extracting skeleton features from 2D images rather than from 3D images or a depth sensor. Hence, we avoid using either the RGB images or the skeleton images in the recognition process. The method shows promising performance, with an accuracy of 73.8% when tested on the MCAD dataset.
{"title":"A Simplified Skeleton Joints Based Approach For Human Action Recognition","authors":"N. Malik, S. Abu-Bakar, U. U. Sheikh","doi":"10.1109/ICSIPA52582.2021.9576770","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576770","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
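The feature extraction described in this abstract (mean, variance and median of the skeleton joint locations) could be sketched roughly as follows. This is only an illustration: the array layout (frames, joints, 2), the function name `skeleton_statistics` and the choice to pool over frames are assumptions, not details given in the abstract.

```python
import numpy as np

def skeleton_statistics(joints: np.ndarray) -> np.ndarray:
    """Summarise a clip of 2D joint coordinates as one feature vector.

    joints: array of shape (frames, joints, 2), an assumed layout.
    Returns per-coordinate mean, variance and median over frames,
    concatenated into a single 1D vector.
    """
    if joints.ndim != 3 or joints.shape[2] != 2:
        raise ValueError("expected shape (frames, joints, 2)")
    mean = joints.mean(axis=0)
    var = joints.var(axis=0)  # population variance
    median = np.median(joints, axis=0)
    return np.concatenate([mean.ravel(), var.ravel(), median.ravel()])

# Toy clip: 3 frames, 2 joints (hypothetical coordinates).
clip = np.array([
    [[0.0, 0.0], [1.0, 1.0]],
    [[2.0, 0.0], [1.0, 3.0]],
    [[4.0, 0.0], [1.0, 5.0]],
])
features = skeleton_statistics(clip)
print(features.shape)
```

A vector like `features` could then be passed to any standard classifier; the abstract does not say which classifier the authors used.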
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576815
Saleh Altakrouri, S. Usman, N. Ahmad, Taghreed Justinia, N. Noor
Image-to-image translation based on deep learning models is a subject of great importance in Artificial Intelligence (AI) and Computer Vision (CV). A variety of traditional tasks, such as image colorization, image denoising and image inpainting, are categorized as typical paired image translation tasks. In computer vision, super-resolution reconstruction is a particularly important field. We propose an improved algorithm to mitigate the issues that arise during super-resolution reconstruction based on generative adversarial networks (GANs). Such networks are difficult to train for reconstruction. The generated images and the corresponding ground-truth images should share the same fundamental structure in order to produce the required output images. In practice, the basic structure shared between the input and the corresponding output image is not as optimal as assumed for paired image translation tasks, which can greatly affect the performance of the generative model. Traditional GAN-based models for image-to-image translation use a pre-trained classification network. Pre-trained networks perform better on classification tasks than on image translation tasks because they were trained on features that contribute to better classification. We propose the perceptual loss based efficient net Generative Adversarial Network (PL-E-GAN) for super-resolution tasks. Unlike other state-of-the-art image translation models, PL-E-GAN offers a generic architecture for image super-resolution tasks. PL-E-GAN consists of two convolutional neural networks (CNNs): the generative network Gn and the discriminator network Dn. PL-E-GAN employs both the generative adversarial loss and the perceptual adversarial loss as the objective function of the network. With these loss functions combined, the networks Gn and Dn are trained adversarially and alternately. The feasibility and benefits of PL-E-GAN over several image translation models are demonstrated experimentally on many image-to-image translation tasks.
{"title":"Image to Image Translation Networks using Perceptual Adversarial Loss Function","authors":"Saleh Altakrouri, S. Usman, N. Ahmad, Taghreed Justinia, N. Noor","doi":"10.1109/ICSIPA52582.2021.9576815","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576815","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576799
T. A. Aris, A. Nasir, Z. Mohamed
Image segmentation is a crucial stage in image analysis since it is the first step towards extracting important information from an image. This paper presents several clustering approaches for obtaining fully segmented images of malaria parasite cells of the Plasmodium falciparum and Plasmodium vivax species on thick smear images. Although k-means is a renowned clustering approach, its effectiveness remains unreliable because of several weaknesses, which creates the need for a better approach. Specifically, fast k-means and enhanced k-means are adaptations of the existing k-means algorithm. Fast k-means eliminates the need to retrain cluster centres, thus reducing the time taken to train image cluster centres. Enhanced k-means, meanwhile, introduces the idea of variance and a revised method for transferring cluster members, which helps assign data to the appropriate centre throughout the clustering process. Hence, the goal of this study is to explore the efficacy of the k-means, fast k-means and enhanced k-means algorithms in achieving a clean segmented image that correctly captures the whole region of the parasites on thick smear images. About 100 thick blood smear images were analyzed, and the results demonstrate that segmentation via the fast k-means clustering algorithm has excellent segmentation performance, with an accuracy of 99.91%, a sensitivity of 75.75%, and a specificity of 99.93%.
{"title":"A Robust Segmentation of Malaria Parasites Detection using Fast k-Means and Enhanced k-Means Clustering Algorithms","authors":"T. A. Aris, A. Nasir, Z. Mohamed","doi":"10.1109/ICSIPA52582.2021.9576799","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576799","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
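As a point of reference for the clustering-based segmentation discussed in this abstract, the sketch below shows plain k-means on pixel intensities. It is the conventional baseline only, not the fast or enhanced variants the paper evaluates; the function name `kmeans_1d`, the evenly spaced initial centres and the toy image are assumptions made for illustration.

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Plain k-means on scalar intensities (baseline sketch only)."""
    values = np.asarray(values, dtype=float)
    # Assumed deterministic initialisation: centres evenly spaced over the range.
    centres = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old centre if a cluster is empty
                new_centres[j] = members.mean()
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment against the converged centres.
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    return labels, centres

# Toy grayscale image: dark background, bright "parasite" pixels (made-up values).
image = np.array([[10, 12, 200],
                  [11, 210, 205]])
labels, centres = kmeans_1d(image.ravel(), k=2)
mask = labels.reshape(image.shape) == int(np.argmax(centres))
print(mask)
```

Here the brighter cluster is treated as foreground; on real smear images the choice of colour space and of the foreground cluster would depend on the staining and is not specified in the abstract.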
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576814
Chiang Kang Tan, C. M. Goh, S. Aluwee, Siak Wang Khor, C. M. Tyng
Malaria is a life-threatening disease caused by Plasmodium parasites and remains a serious health concern worldwide. However, it is curable if diagnosed early. Because access to diagnostic expertise is often lacking in poorly developed and remote areas, an automated yet accurate diagnostic solution is sought. In Malaysia, there are five types of malaria parasites. As an initial proof of concept, automated segmentation of one of these types, Plasmodium falciparum, on thin blood smears was tested using our proposed Residual Attention U-Net, a type of convolutional neural network used in deep learning systems. Results showed an accuracy of 0.9687 and a precision of 0.9691 when the trained system was applied to verified test data.
{"title":"Malaria Parasite Detection using Residual Attention U-Net","authors":"Chiang Kang Tan, C. M. Goh, S. Aluwee, Siak Wang Khor, C. M. Tyng","doi":"10.1109/ICSIPA52582.2021.9576814","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576814","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
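Accuracy and precision, the two metrics reported in this abstract, can be computed from confusion counts between a predicted and a reference mask. The sketch below assumes pixel-wise evaluation of binary masks, which the abstract does not state; the helper names and the toy masks are hypothetical.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Return (tp, tn, fp, fn) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # Defined as 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy masks (made-up values): 1 marks a parasite pixel.
truth = [1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 0, 1, 0, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(pred, truth)
acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
print(acc, prec)
```

Because background pixels usually dominate smear images, accuracy alone can look high even when many parasite pixels are missed, which is why it is typically reported alongside precision or sensitivity.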
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576796
Sui Lyn Hor, H. A. Karim, Mohd Haris Lye Abdullah, Nouar Aldahoul, Sarina Mansor, M. F. A. Fauzi, John See, A. Wazir
Detection of pornographic and nudity content in videos is gaining importance as the Internet grows into a source of exposure to such content. Recent literature has addressed pornography recognition using deep learning techniques such as convolutional neural networks, object detection models and recurrent neural networks, as well as combinations of these methods. In this paper, three pretrained object detection models (YOLOv3, EfficientDet-d7x and Faster R-CNN with a ResNet50 backbone) were tested to compare their performance in detecting pornographic content. Video frames containing real humans from the public NPDI dataset were used to form four categories of target content (female breast, female lower body, male lower body and nude human) by cropping the specific image regions and augmenting them. Results demonstrated that the COCO-pretrained EfficientDet-d7x model achieved the highest overall detection accuracy of 75.61%. Interestingly, human detection by YOLOv3 may depend on image quality and/or the presence of external body parts that belong only to humans.
{"title":"An Evaluation of State-of-the-Art Object Detectors for Pornography Detection","authors":"Sui Lyn Hor, H. A. Karim, Mohd Haris Lye Abdullah, Nouar Aldahoul, Sarina Mansor, M. F. A. Fauzi, John See, A. Wazir","doi":"10.1109/ICSIPA52582.2021.9576796","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576796","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576779
B. Sweely, Ziming Liu, T. Wyatt, Katherine Newnam, H. Qi, Xiaopeng Zhao
The ability to measure heart rate (HR) noninvasively is important in both hospital and home settings because of the role this vital sign plays in health and wellbeing. Despite great advancements in recent years, safety remains a challenging issue in the neonatal intensive care unit (NICU). Traditional sensors found in NICU incubators require adhesives and wires. The objective of this work was to develop a wireless, noncontact monitoring system that measures multiple physiological parameters from human faces at a distance using a camera and a single-board computer. Experiments were conducted to estimate heart rate. Current practice for measuring HR involves collecting electrocardiogram (ECG) signals from adhesive electrodes placed on various parts of the body, or using a pulse oximeter (PO) typically placed on the ear lobe or finger. We developed a monitoring system and compared its results with those from a PO. The monitoring system is low-cost, at less than $200. To our knowledge, such a system has not been reported in the literature, making it a novel implementation. In conclusion, we were able to estimate HR from a distance using a camera-based system. The developed system may have many useful applications in both clinical and home health settings.
{"title":"Camera-Based Remote Photoplethysmography for Physiological Monitoring in Neonatal Intensive Care","authors":"B. Sweely, Ziming Liu, T. Wyatt, Katherine Newnam, H. Qi, Xiaopeng Zhao","doi":"10.1109/ICSIPA52582.2021.9576779","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576779","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
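The abstract does not say how heart rate is derived from the camera signal. One common approach in remote photoplethysmography, shown here purely as an assumed illustration, is to average a skin region per frame and take the dominant frequency of that trace within a plausible heart-rate band. The function name, the 0.7 to 4.0 Hz band and the synthetic trace are all hypothetical.

```python
import numpy as np

def estimate_heart_rate_bpm(signal, fps, low_hz=0.7, high_hz=4.0):
    """Estimate heart rate as the strongest spectral peak in [low_hz, high_hz].

    signal: 1D trace, e.g. mean pixel intensity of a face region per frame
            (an assumed input; the abstract does not specify it).
    fps:    camera frame rate in frames per second.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq

# Synthetic 10 s trace at 30 fps with a 1.2 Hz pulse component.
fps = 30.0
t = np.arange(300) / fps
trace = 0.5 * np.sin(2 * np.pi * 1.2 * t) + 100.0
bpm = estimate_heart_rate_bpm(trace, fps)
print(round(bpm, 1))
```

A real system would also need face tracking, detrending and motion-artifact handling before the spectral step; none of that is sketched here.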
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576791
A. Sii, Simying Ong, M. Wee, Koksheik Wong
Nowadays, most images are stored and transmitted in compressed form based on some coding standard. Usually, the image is transformed, e.g., by the discrete cosine transform, and hence the coefficients make up a large proportion of the compressed bit stream. However, these coefficients might be corrupted or completely lost due to transmission errors or damage to the storage device. Therefore, in this work, we aim to improve a conventional coefficient recovery method. Specifically, instead of Otsu’s method adopted in the conventional method, an adaptive segmentation method is used to split the image into background and foreground regions, forming non-overlapping patches. Missing coefficients in these non-overlapping patches are recovered independently. In addition, a rewritable data embedding method is put forward that judiciously selects patches in which to embed data. Experiments are carried out to verify the basic performance of the proposed methods. In the best-case scenario, an improvement of 31.32% in CPU time is observed, while up to 7149 bits of external data can be embedded into the image.
{"title":"Rewritable Data Embedding in Image based on Improved Coefficient Recovery","authors":"A. Sii, Simying Ong, M. Wee, Koksheik Wong","doi":"10.1109/ICSIPA52582.2021.9576791","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576791","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
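For context, Otsu's method, which this abstract names as the segmentation step of the conventional approach, picks the gray level that maximises the between-class variance of the histogram. The sketch below illustrates that baseline only; it is not the adaptive segmentation the authors propose, and the function name and toy image are assumptions.

```python
import numpy as np

def otsu_threshold(image):
    """Return the 8-bit gray level that maximises between-class variance."""
    pixels = np.asarray(image, dtype=np.uint8).ravel()
    hist = np.bincount(pixels, minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)        # probability of the class at or below each level
    mu = np.cumsum(prob * levels)  # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0      # levels where one class is empty
    return int(np.argmax(sigma_b))

# Toy image (made-up values): dark background, bright foreground.
image = np.array([[10, 10, 200],
                  [10, 200, 200]], dtype=np.uint8)
threshold = otsu_threshold(image)
foreground = image > threshold
print(foreground)
```

Pixels above the threshold are treated as foreground here; the resulting background and foreground regions correspond to the split that the abstract says precedes coefficient recovery.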
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576811
Ng Wei Bhing, P. Sebastian
With the recent outbreak and rapid transmission of COVID-19, medical personal protective equipment (PPE) detection has gained significant importance in computer vision and deep learning. The need for people to wear face masks in public is ever increasing. Research has shown that proper usage of face masks and PPE can significantly reduce transmission of COVID-19. In this paper, a computer vision approach based on deep learning is proposed to develop a medical PPE detection algorithm that works on real-time video feeds. This paper uses the YOLO object detection algorithm to perform one-stage object detection and classification, identifying three different states of face mask usage and detecting the presence of medical PPE. At present, there is no publicly available PPE dataset for object detection. Thus, this paper also aims to establish a medical PPE dataset for future applications and development. The YOLO model achieved 84.5% accuracy on our established PPE dataset, which comprises seven classes in more than 1300 images and is the largest dataset for evaluating medical PPE detection in the wild.
{"title":"Personal Protective Equipment Detection with Live Camera","authors":"Ng Wei Bhing, P. Sebastian","doi":"10.1109/ICSIPA52582.2021.9576811","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576811","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
Pub Date: 2021-09-13 DOI: 10.1109/ICSIPA52582.2021.9576789
N. Jamal, N. Fuad, Shahnoor Shanta, M. N. A. Sha'abani
Deep Neural Network (DNN)-based mask estimation is an emerging approach in monaural speech enhancement. It enhances speech signals against a noisy background by determining whether speech or noise is dominant in a particular frame of the noisy speech signal. It can construct complex models for nonlinear processing. However, a limitation of the DNN-based mask algorithm is its generalization beyond the targeted population. Past research focused on a single target dataset because audio recording sessions are time-consuming. Thus, in this work, different recording conditions were used to study the performance of the DNN-based mask estimation approach. The findings revealed that a test dataset in a different language, as well as different recording conditions, may not have a large impact on speech enhancement performance, since the algorithm only learns the noise information. However, speech enhancement performance is promising when the trained model has been designed properly, especially given the smaller sample variation in the input dataset used during training.
{"title":"Monaural Speech Enhancement using Deep Neural Network with Cross-Speech Dataset","authors":"N. Jamal, N. Fuad, Shahnoor Shanta, M. N. A. Sha'abani","doi":"10.1109/ICSIPA52582.2021.9576789","DOIUrl":"https://doi.org/10.1109/ICSIPA52582.2021.9576789","journal":{"name":"2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)"},"publicationDate":"2021-09-13"}
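The idea of deciding whether speech or noise dominates each time-frequency unit can be illustrated with an ideal binary mask, a common training target for mask-estimation networks. Whether the authors use a binary or a ratio mask is not stated in the abstract, so this is an assumed sketch; the function name, the 0 dB threshold and the toy spectrogram values are hypothetical.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, threshold_db=0.0):
    """Return 1.0 where the local SNR exceeds threshold_db, otherwise 0.0.

    speech_mag, noise_mag: magnitude spectrograms of equal shape, available
    only when clean speech and noise are known separately (training time).
    """
    eps = 1e-12  # avoids division by zero and log of zero
    snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (snr_db > threshold_db).astype(float)

# Toy 2x2 "spectrograms" (made-up magnitudes).
speech = np.array([[1.0, 0.1],
                   [0.25, 2.0]])
noise = np.full((2, 2), 0.5)
mask = ideal_binary_mask(speech, noise)

# Applying the mask to the noisy magnitude keeps speech-dominant units only.
noisy = speech + noise
enhanced = mask * noisy
print(mask)
```

In a DNN-based system the network would be trained to predict such a mask from features of the noisy signal alone, since the clean speech and noise are not available at test time.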