Interactive reconstruction of the 3D-models using single-view images and user markup
A. Fedorov, I. Ivashnev, V. Afanasiev, Valery Krivtsov, Aleksandr Zatolokin, S. Zyrin
doi:10.1145/3313950.3313953

This paper presents a method for reconstructing a 3D model from a single-view image of an object. To do so, the method detects the object's edges in the image and takes a small amount of markup from the user. In addition, the method recovers the texture of the 3D model.
Speech recognition and Filipino sign language E-tutor system: an assistive multimodal learning approach
M. Samonte
doi:10.1145/3313950.3313970

Speech recognition technology facilitates student learning. It has potential benefits for students with physical disabilities, and it has been adopted in classrooms over the years to make learning more efficient. This study provides deaf students with various methods of studying, learning, and remembering new information. Besides speech-to-text, the developed system also provides a speech-to-visual approach, which presents information associated with objects. Filipino Sign Language is also used as an alternative way of presenting the Statistics lessons included in the K-12 curriculum. A practical, real-world approach to presenting Statistics lessons is used to enhance delivery in a face-to-face class setup or in self-paced learning. These learning strategies are combined into a balanced approach so that practice and recall are more successful, especially for the target users. Initial results show a significant advantage of using speech recognition and Filipino Sign Language in learning basic Statistics lessons compared with the traditional method.
Korean sign language recognition based on image and convolution neural network
Hyojoo Shin, Woo-Je Kim, Kyoung-ae Jang
doi:10.1145/3313950.3313967

The purpose of this paper is to develop a convolutional neural network (CNN) based model for Korean sign language recognition. Sign language videos were collected for 10 selected words of Korean sign language, and each video was converted into a sequence of 9 frames. These 9-frame images were used as input data for the CNN model developed in this study. To develop the model, experiments were first performed to determine the number of convolution layers. Second, experiments were performed on pooling, which deliberately reduces the size of the feature map. Third, an experiment was conducted to reduce overfitting during training. Based on these experiments, a CNN-based model for Korean sign language recognition was developed; its accuracy was about 84.5% on the 10 selected Korean sign words.
Development of the virtual museum of nonthaburi
Kemmanat Mingsiritham, Gan Chanyawudhiwan
doi:10.1145/3313950.3313972

Nonthaburi is a city for learning, with history, local wisdom, art and culture, and distinctive ways of life. It is necessary to establish a learning resource that creates a feeling of bonding with, and protection of, the city. The problem, however, is that visitors must travel to the learning resource. Technology enables easy and convenient access to the learning resource, supporting preservation of the culture. This research aims to develop the Virtual Museum of Nonthaburi through interviews with curators and staff of the Nonthaburi Museum and with experts in virtual learning resources, and it studies how general visitors use the resulting museum. Data were analyzed using means, standard deviations, and content analysis. Results: 1) the model of the Virtual Museum of Nonthaburi consists of six components: information; media and tools used; interaction; design; decision support system; and supporting factors. Overall quality was rated at the highest level (mean = 4.51, S.D. = 0.57). 2) Most visitors found that Nonthaburi has many tourist attractions of historical significance, cultural value, and everyday-life interest; in particular, the pottery, which traces development from the past to the present, is beautiful local artwork that should be preserved. Overall satisfaction was rated at a high level (mean = 4.42, S.D. = 0.65).
Polarization image fusion algorithm based on global information correction
Xia Wang, Jing Sun, Ziyan Xu, Jun Chang
doi:10.1145/3313950.3313955

This paper proposes a fusion framework for extracting more information from multi-dimensional polarization images. The challenge lies in overcoming the information loss arising from reflection/irradiation interference of polarizers, the inherent defects of intensity images, and the improper distribution of fusion weights in most fusion processes. We therefore introduce a modified front-polarizer system model, the Tiansi mask operator, and comprehensive weights. The methodology starts with the modified front-polarizer system model, which corrects the polarization information. We then exploit the Tiansi operator's enhancement of high-frequency information and preservation of low-frequency information, combined with adaptive histogram equalization (AHE), to enhance intensity. Finally, the contrast, saliency, and exposedness weights of the source images are computed with Laplace filtering, the IG algorithm, and a Gaussian model, respectively, and combined into comprehensive weights. The final image is obtained by fusing the processed images with the corresponding weight coefficients. Experimental results show that the method produces good visual quality and benefits target detection.
Extraction of features from video files using different image algebraic point operations
P. Dutta, M. Nachamai
doi:10.1145/3313950.3313951

In human-computer interaction (HCI), facial feature analysis and extraction are the decisive stages behind robust and efficient classification systems such as facial expression recognition and emotion classification. This paper presents an approach to automatic facial feature extraction from videos using several image algebraic operations. These operations act on pixel intensity values individually, drawing on the mathematical theory of image analysis and transformation. Eleven operations (point subtraction, point addition, point multiplication, point division, edge detection, average neighborhood filtering, image stretching, the log operation, the exponential operation, inverse filtering, and image thresholding) are implemented and tested on frames extracted from three self-recorded videos, named video1, video2, and video3, in .avi, .mp4, and .wmv format, respectively. The work is tested on two types of data: grayscale and RGB (red, green, blue). The efficiency of each operation is assessed by three factors: processing time, frames per second (FPS), and the sharpness of feature-point edges based on image gradients. The implementation was done in MATLAB R2017a.
Three-coordinate gravimeter with exhibition of axis sensitivity based on digital videoimages
I. Korobiichuk, Yuriy Podchashinskiy, O. Bezvesilna, S. Nechay, Yuriy Shavurskiy
doi:10.1145/3313950.3314187

The paper presents a new design for the three-axis gravimeter of an aviation gravimetric system, which compensates for errors caused by vertical accelerations of the mobile base in the measurement of the full gravity acceleration vector. The inclination angle of a mark applied to the gravimeter body, coinciding with the direction of the vertical sensitive axis, is determined by linear approximation in digital video images. These data are used to align the sensitive axes and improve the gravimeter's accuracy.
A fast and efficient correction technique for slant images
Wenjia Ding, Yi Xie, Yulin Wang
doi:10.1145/3313950.3313971

In many image processing applications, such as OCR and object analysis, the input image is often inclined. For subsequent processing and analysis, geometric correction is the major part of the pre-processing phase. For a rectangular or near-rectangular image or scanned document whose corners are missing or folded, finding the true inclination angle is time-consuming with previous techniques. This paper proposes a fast and efficient algorithm to find the inclination angle of such an image or document. Experiments show that the proposed technique corrects inclined images and scanned documents accurately. Compared with previous algorithms, the amount of computation is greatly reduced, making the method suitable for real-time correction of slant images such as scanned financial notes, vehicle license plates, and text documents.
Component recognition method based on deep learning and machine vision
Haozhan Tang, Jie Chen, Xuesong Zhen
doi:10.1145/3313950.3313962

In the electronic component testing and screening industry, traditional component coding recognition relies on manual inspection or primitive machine vision, which suffers from low testing efficiency and a high recognition error rate. We therefore propose a novel component coding recognition method that combines machine vision with deep learning. A machine vision imaging system was developed to capture images of components, and preprocessing operators such as grayscale conversion, mean filtering, and slant correction are applied. Component codings of different types and materials are then recognized by a deep convolutional neural network. Extensive experiments in a component testing center, and comparisons with traditional recognition, demonstrate that the method achieves high recognition accuracy across a wide range of components.
Age invariant face recognition using Frangi2D binary pattern
Sabah Afroze, M. Beham, Tamilselvi Rajendran, S. M. A. Maraikkayar, K. Rajakumar
doi:10.1145/3313950.3313961

The field of computer vision is devoted to discovering the algorithms, data representations, and computer architectures that embody the principles underlying visual capabilities. It is an interdisciplinary field concerned with how computers can gain high-level understanding from digital images or videos. While very promising results have been shown on face recognition problems, age-invariant face recognition remains a challenge: a person's facial appearance varies over time, producing substantial intra-class variation. To address this problem, we propose the Frangi2D method for normalization, local binary patterns (LBP) for feature extraction, and a sparse representation classifier (SRC). Extensive experiments were carried out on MORPH, a well-known public-domain face aging dataset. The results show the superiority of the proposed method for age-invariant face recognition.