"Snake Image Classification using Siamese Networks" — C. Abeysinghe, A. Welivita, I. Perera. DOI: 10.1145/3338472.3338476

Research into deep learning models suited to small data sets remains immature, having received comparatively little attention from the machine learning community. Identifying a snake species from an image is a classification problem with medical, educational, and safety-related importance, but no large data set exists for it. Because such data are scarce and difficult to collect, deep learning algorithms have not previously been applied to this problem. In this paper, we explore the applicability of one-shot learning techniques combined with deep neural networks to snake image classification. Using a convolutional Siamese architecture, we achieve strong results and present a comparative analysis against human performance.
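The core idea behind one-shot classification with a Siamese network can be sketched in a few lines: a shared encoder embeds both the query image and one labeled exemplar per class, and the query takes the class of the nearest exemplar in embedding space. The sketch below stands in for the paper's convolutional encoder with a hypothetical random-projection embedding; the class names and dimensions are purely illustrative.

```python
import numpy as np

def embed(x, W):
    """Placeholder for one branch of the Siamese network.
    In the paper this would be a shared convolutional encoder."""
    return np.tanh(W @ x)

def one_shot_classify(query, support_set, W):
    """Assign the query to the class of the nearest support embedding.
    support_set maps class label -> one example feature vector."""
    q = embed(query, W)
    dists = {label: np.linalg.norm(q - embed(x, W))
             for label, x in support_set.items()}
    return min(dists, key=dists.get)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
support = {"cobra": rng.standard_normal(16), "viper": rng.standard_normal(16)}
# A query identical to the "cobra" exemplar should map back to "cobra".
prediction = one_shot_classify(support["cobra"], support, W)
```

Because classification reduces to comparing embeddings, new species can be added at test time with a single exemplar and no retraining, which is what makes the approach attractive for small data sets.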
"Two-Dimensional Discriminative Feature Selection for Image Recognition" — Yong Zhao, Yongjie Chu, Lindu Zhao. DOI: 10.1145/3338472.3338478

In many computer vision tasks, the original data come in matrix form. Traditional methods often convert each matrix into a vector before processing; this not only discards the location information of matrix elements but also forces the methods to handle high-dimensional vectors. Two-dimensional linear discriminant analysis (2DLDA) is a widely used approach in image recognition that works on data matrices directly and computes efficiently. When mapping the original data onto a low-dimensional space, however, the two projection matrices of 2DLDA cannot remove features carrying little or no information, leaving redundant features in the projected space. To address this problem, we propose two-dimensional discriminative feature selection (2DDFS), an algorithm that performs bidirectional feature selection directly on matrix data. Building on 2DLDA, it applies the l2,1 norm to regularize the two transformation matrices while learning them. We design an effective optimization algorithm to obtain the optimal solutions, and we evaluate the method on two image databases against related methods. The promising results demonstrate its effectiveness.
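The l2,1 norm that regularizes the two transformation matrices is the sum of the l2 norms of the matrix rows. Penalizing it drives entire rows toward zero, which is what turns a projection into a feature selector. A minimal sketch:

```python
import numpy as np

def l21_norm(M):
    """l2,1 norm: sum of the l2 norms of the rows of M.
    Minimizing it zeroes out whole rows of a projection matrix,
    discarding the corresponding (uninformative) features."""
    return np.sum(np.linalg.norm(M, axis=1))

M = np.array([[3.0, 4.0],   # row norm 5
              [0.0, 0.0],   # a zeroed-out row contributes nothing
              [1.0, 0.0]])  # row norm 1
value = l21_norm(M)
```

Unlike the Frobenius norm, which spreads shrinkage over all entries, the l2,1 penalty is non-smooth at zero row-wise, so optimal solutions tend to have exactly-zero rows.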
"Designing the Knowledge Management System: A Case Study Approach in IT Consultant Company" — Jeffrey Johannes Austen Bongku, Yohannes Kurniawan. DOI: 10.1145/3338472.3338473

The goal of this research is to identify, analyze, and design an effective knowledge management system for company XYZ in Indonesia. As consulting companies in Indonesia grow, XYZ needs to retain its valuable knowledge so that it can be reused in the future. The analysis and design methods used in this paper are the Wiig knowledge management cycle, the Nonaka and Takeuchi SECI model, and object-oriented analysis and design. For data collection, the authors interviewed the enterprise solution manager and observed the company's daily activities. The result is a knowledge management system designed to capture, manage, and use company knowledge more effectively. We conclude that such a system can help the company maintain its knowledge.
"Deep Learning for Gastric Pathology Detection in Endoscopic Images" — V. Khryashchev, O. Stepanova, A. Lebedev, S. Kashin, R. Kuvaev. DOI: 10.1145/3338472.3338492

Computer-aided diagnosis of cancer based on endoscopic image analysis is a promising area in computer vision and machine learning, and convolutional neural networks are among the most popular approaches to it. This paper presents an algorithm for detecting pathology in endoscopic images of gastric lesions based on a convolutional neural network. The algorithm was trained and tested on an NVIDIA DGX-1 supercomputer using an endoscopic image database assembled together with the Yaroslavl Regional Cancer Hospital. In the experiments, the algorithm reached an mAP of 0.875, a high result for object detection in images.
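The mAP (mean average precision) figure reported above is the mean over classes of the average precision, i.e. the area under each class's precision-recall curve. One common all-point formulation averages the precision at each newly recalled positive; a sketch (the paper's exact interpolation scheme is not stated, so this is illustrative):

```python
def average_precision(ranked_hits, num_positives):
    """AP for one class: detections sorted by descending confidence,
    True where a detection matches a ground-truth object."""
    tp, ap = 0, 0.0
    for i, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            ap += tp / i          # precision at each newly recalled positive
    return ap / num_positives

# Perfect ranking: every detection correct -> AP = 1.0
ap_perfect = average_precision([True, True, True], 3)
# A false positive ranked first: precisions 1/2 and 2/3 at the two hits
ap_mixed = average_precision([False, True, True], 2)
```

mAP is then simply the mean of these per-class AP values, so a single poorly detected class pulls the overall score down regardless of how common it is.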
"3D Human Pose Dataset Augmentation Using Generative Adversarial Network" — Huyuan ShangGuan, R. Mukundan. DOI: 10.1145/3338472.3338475

Convolutional-network methods for 3D human pose estimation from monocular images require large amounts of well-annotated pose-image training pairs. Although many 3D human pose datasets have been created, training data with accurate 3D annotation remain in short supply. Recently, techniques based on generative adversarial networks have shown the potential to generate realistic images from user-supplied pose data. In this paper, we propose a neural-network-based method for augmenting 3D human pose datasets. An autoencoder learns a latent representation of the existing pose data and produces new poses of similar style. The generated poses are fed into a generative adversarial network that synthesizes mask images of an actor performing in the same style, which are finally transformed into colored images. To evaluate the proposed method, we use it to augment a small amount of labeled data. Experimental analysis shows that the method generates additional valid labeled data from a small labeled set, which can boost the training of neural-network pose estimators.
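The autoencoder step can be illustrated with a linear stand-in: encode poses into a low-dimensional latent space, perturb the codes slightly, and decode to obtain new poses of similar style. The sketch below uses PCA (via SVD) in place of the paper's learned autoencoder, with invented dimensions (17 joints, 8 latent components) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
poses = rng.standard_normal((50, 34))        # 50 poses, 17 joints * (x, y)

# Linear "autoencoder": PCA via SVD, a stand-in for the learned model.
mean = poses.mean(axis=0)
_, _, Vt = np.linalg.svd(poses - mean, full_matrices=False)
components = Vt[:8]                          # 8-dim latent space

codes = (poses - mean) @ components.T        # encode
reconstruction = codes @ components + mean   # decode unperturbed codes

# Jitter the latent codes to produce new poses of similar style.
jitter = 0.1 * rng.standard_normal(codes.shape)
new_poses = (codes + jitter) @ components + mean
```

Because the decoder rows are orthonormal here, each augmented pose deviates from its reconstruction by exactly the latent jitter magnitude, so the perturbation scale directly controls how far the new poses stray in style.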
"A Deep Learning Based Multi-color Space Approach for Pedestrian Attribute Recognition" — Imran N. Junejo. DOI: 10.1145/3338472.3338493

Understanding and identifying pedestrian behavior in surveillance scenarios has attracted a tremendous amount of attention in recent years. An integral part of this problem is identifying the visual attributes of the humans in the scene. Over the years, researchers have proposed various solutions and explored many features; however, they have focused either on engineered features or on plain RGB images. In this paper, we address crowd attribute recognition using the RGB (red, green, blue), HSV (hue, saturation, value), and L*a*b* color models, and propose a three-branch Siamese network to solve the problem. We present a unique approach that combines these three color models and fine-tunes a pre-trained VGG-19 network for our task. We perform extensive experiments on the challenging public PETA dataset, by far the largest and most diverse dataset of its kind, and show an improvement over the state of the art.
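Feeding the same image into the three branches requires converting it between color spaces. RGB-to-HSV is available in the Python standard library; a per-pixel sketch (L*a*b* conversion goes through the XYZ space and is omitted here; libraries such as scikit-image provide `rgb2lab` for whole images):

```python
import colorsys

def rgb_pixel_to_hsv(r, g, b):
    """Convert one RGB pixel (0-255 integers) to HSV,
    with hue in degrees and saturation/value in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

hue, sat, val = rgb_pixel_to_hsv(255, 0, 0)   # pure red
```

The appeal of HSV and L*a*b* for attribute recognition is that they decouple chromatic information from intensity, so a branch trained on them sees clothing color more independently of lighting than the RGB branch does.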
"Three-Dimensional Shape Information Acquisition Using Binocular Stereo Vision for Reflective Steel Plate" — Xin Wen, Kechen Song. DOI: 10.1145/3338472.3338491

Acquiring three-dimensional (3D) shape information for reflective steel plate is a difficult challenge. In this work, a measurement framework based on three-step phase shifting and binocular stereo vision is proposed for 3D shape measurement of reflective steel plate. The three-step phase-shifting method recovers the phase, and the binocular stereo vision method then yields the 3D points. To verify the proposed framework, a 3D measurement hardware system was built from a DLP LightCrafter 4500 projector and two USB cameras. Experiments on a reflective steel plate sample confirmed that the framework can acquire the plate's 3D shape information.
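In the standard three-step scheme, three fringe images are captured with phase shifts of -120°, 0°, and +120°, giving intensities I_k = A + B*cos(phi + delta_k) at each pixel; the wrapped phase follows in closed form. A minimal sketch, assuming those standard shifts (the paper does not spell out its shift values):

```python
import math

def three_step_phase(i1, i2, i3):
    """Wrapped phase from three fringe intensities captured with
    phase shifts of -120, 0, +120 degrees:
        I_k = A + B*cos(phi + delta_k)
    """
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Synthetic check: generate the three intensities for a known phase.
A, B, phi = 0.5, 0.4, 0.7
i1 = A + B * math.cos(phi - 2.0 * math.pi / 3.0)
i2 = A + B * math.cos(phi)
i3 = A + B * math.cos(phi + 2.0 * math.pi / 3.0)
recovered = three_step_phase(i1, i2, i3)
```

Note that the ambient term A and modulation B cancel out of the ratio, which is what makes phase shifting comparatively robust on surfaces with uneven reflectivity such as steel plate.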
"Atrial Fibrillation Detection Based on the Combination of Depth and Statistical Features of ECG" — Mingchun Li, Gary He, Baofeng Zhu. DOI: 10.1145/3338472.3338485

Atrial fibrillation is a common chronic arrhythmia whose incidence increases with age; accurate detection, especially in the elderly, can effectively help prevent stroke. In this paper, we propose a strategy that combines a deep-learning heartbeat model with statistical heart rate features, using a classifier such as a multi-layer perceptron to identify atrial fibrillation rhythms. Notably, the heartbeat model originally trained for heartbeat classification is reused as a feature extractor: through this transfer learning approach, features of each heartbeat in a rhythm are extracted one by one for the atrial fibrillation identification task. We evaluated the proposed method on the MIT-BIH AF dataset. With the attention mechanism, it achieves an accuracy of 98.91%, a sensitivity of 99.41%, and a specificity of 98.50%, outperforming most current algorithms.
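Statistical heart rate features for atrial fibrillation typically quantify the irregularity of successive RR intervals. The paper does not list its exact feature set, so the sketch below shows two classic candidates, RMSSD and pNN50, as an illustration of the idea:

```python
import math

def rr_features(rr_ms):
    """RMSSD and pNN50 over a list of successive RR intervals (ms):
    root-mean-square of successive differences, and the fraction of
    successive differences exceeding 50 ms. Both rise sharply under
    the irregular rhythm of atrial fibrillation."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = sum(abs(d) > 50 for d in diffs) / len(diffs)
    return rmssd, pnn50

regular = [800, 805, 798, 802, 801]        # steady sinus-like rhythm
irregular = [800, 620, 950, 700, 1100]     # AF-like irregularity
rmssd_reg, pnn50_reg = rr_features(regular)
rmssd_irr, pnn50_irr = rr_features(irregular)
```

Concatenating such rhythm-level statistics with the per-beat deep features gives the multi-layer perceptron both a global irregularity signal and local waveform morphology.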
"Medicine Recognition from Colors and Text" — Tanjina Piash, Zakir Hossan, Ashraful Amin. DOI: 10.1145/3338472.3338484

People with hazy or blurred vision, and elderly people in general, find it very challenging to identify pills once they are out of their box or packet. Pills of various shapes, sizes, textures, and colors carry diverse medicinal components, and pills of the same color and shape are easily confused when only a specific texture distinguishes them. Even if visually impaired people can make out a pill's shape, its color and the text imprinted on it remain unknown to them. This paper describes how a dataset is split according to the number of colors on a pill and the text imprinted on it. First, color information is extracted by segmenting the pill region from the pill image; statistical measurements, namely kurtosis and skewness, are then computed on probability distributions derived from the image histograms to determine how many colors the pill surface contains. For text recognition, the probable text region is detected first to keep the text detection error-free. Reference images from the NLM RxIMAGE database are used as high-quality image data. The proposed system determines the number of colors with an accuracy of 95.6% and recognizes text with an accuracy of 81.32%.
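The skewness and kurtosis step treats the normalized intensity histogram as a probability distribution and computes its third and fourth standardized moments. A minimal sketch of that computation:

```python
import numpy as np

def skew_kurtosis(hist):
    """Skewness and excess kurtosis of the distribution given by an
    intensity histogram; multi-colored surfaces tend to produce
    multi-modal histograms with distinctive moment values."""
    bins = np.arange(len(hist))
    p = hist / hist.sum()                  # normalize to a probability mass
    mu = np.sum(bins * p)
    var = np.sum((bins - mu) ** 2 * p)
    skew = np.sum((bins - mu) ** 3 * p) / var ** 1.5
    kurt = np.sum((bins - mu) ** 4 * p) / var ** 2 - 3.0
    return skew, kurt

# A symmetric histogram has zero skewness.
sym = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
skew_sym, kurt_sym = skew_kurtosis(sym)
```

A single-color surface yields a roughly unimodal, near-symmetric histogram, while two or more colors shift these moments markedly, which is what lets the system count colors from the statistics alone.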
"Iterative Broad Null Steering" — Burak Ors, R. Suleesathira. DOI: 10.1145/3338472.3338487

Forming a broad null around the directions of interference while keeping the main beam toward the desired direction is desirable for suppressing interference in multipath environments. In this paper, a null-broadening beamforming method is proposed to enhance the iterative optimal beamformer. The broadened null is placed through a projection and diagonal loading approach. MUltiple SIgnal Classification (MUSIC) is used to estimate the directions of arrival and the angular spread, and the null width is set to the estimated angular spread. A criterion based on the convergence of the output signal-to-interference-plus-noise ratio (SINR) is proposed to terminate the iteration. Simulation results illustrate that the proposed iterative broad-null beamformer can steer broad nulls toward interference signals, thereby increasing the output SINR and reducing the number of iterations.
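The projection step can be sketched for a uniform linear array: project the look-direction steering vector onto the orthogonal complement of the interference steering subspace, which forces the pattern response to zero in the interference directions. This shows a single sharp null; null broadening as in the paper would additionally stack virtual steering vectors clustered across the estimated angular spread (and add diagonal loading), which this sketch omits.

```python
import numpy as np

def steering(theta_deg, n=8, d=0.5):
    """Steering vector of an n-element uniform linear array
    with half-wavelength spacing."""
    k = np.arange(n)
    return np.exp(2j * np.pi * d * k * np.sin(np.deg2rad(theta_deg)))

def null_steer_weights(look_deg, null_degs, n=8):
    """Project the look-direction steering vector onto the orthogonal
    complement of the interference subspace spanned by null_degs."""
    A = np.column_stack([steering(t, n) for t in null_degs])
    P = np.eye(n) - A @ np.linalg.pinv(A)   # orthogonal-complement projector
    return P @ steering(look_deg, n)

w = null_steer_weights(look_deg=0.0, null_degs=[30.0])
resp_null = abs(np.vdot(w, steering(30.0)))   # response toward interference
resp_look = abs(np.vdot(w, steering(0.0)))    # response toward desired signal
```

By construction the weights are orthogonal to every interference steering vector, so the response at 30° vanishes while the main-beam response at 0° is preserved.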