Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126251
S. V, S. Koolagudi
Human beings can identify a particular event occurring in their surroundings from sound cues alone, even when no visual scene is presented. Sound events are the auditory cues present in a surrounding. Sound event detection (SED) is the process of determining the beginning and end of sound events as well as assigning a textual label to each event. Sound source localization (SSL) refers to identifying the spatial location of a sound occurrence in addition to the SED. The integrated task of SED and SSL is known as Sound Event Localization and Detection (SELD). In this work, three deep learning architectures are explored to perform SELD: SELDNet, D-SELDNet (Depthwise Convolution), and T-SELDNet (Transpose Convolution). Two sets of features are used to perform the SED and Direction-of-Arrival (DOA) estimation tasks. D-SELDNet uses a depthwise convolution layer, which reduces the model's complexity in terms of computation time. T-SELDNet uses transpose convolution, which helps learn more discriminative features by retaining the input size and not losing necessary information from the input. The proposed method is evaluated on the First-Order Ambisonic (FOA) array format of the TAU-NIGENS Spatial Sound Events 2020 dataset. The proposed T-SELDNet shows an improvement over existing SELD systems.
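The complexity saving attributed to depthwise convolution can be seen from a simple parameter count (a generic sketch with hypothetical channel and kernel sizes, not the paper's exact layers):

```python
def standard_conv_params(c_in, c_out, k):
    # weights of a standard 2-D convolution: one k x k filter per (in, out) channel pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise step: one k x k filter per input channel,
    # followed by a 1x1 pointwise convolution that mixes channels
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 64, 3)        # 36864 weights
dws = depthwise_separable_params(64, 64, 3)  # 4672 weights
print(std, dws, round(std / dws, 1))         # roughly 7.9x fewer parameters
```

For these illustrative sizes the depthwise-separable variant needs roughly an eighth of the weights, which is the kind of reduction in computation the D-SELDNet design aims at.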
Title: "A Transpose-SELDNet for Polyphonic Sound Event Localization and Detection". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126432
Carlos Vicente Niño Rondón, Diego Andrés Castellano Carvajal, B. M. Delgado, Sergio Alexander Castro Casadiego, Dinael Guevara Ibarra
Skin cancer ranks as the most common malignant tumor among all types of cancer. Melanoma accounts for 1% of all cancer cases, yet it is responsible for the majority of deaths from this type of cancer. According to the American Cancer Society, 99,780 new cases of melanoma are expected to be diagnosed, and about 7,650 people are expected to die from it. This work presents an architecture executable on single-board computer systems for skin cancer classification, with image enhancement and feature enhancement stages, feature extraction using the VGG16 network architecture, feature reduction applying Principal Component Analysis, and a classification stage using gradient-boosted decision trees (XGBoost). The architecture was tested on a Raspberry Pi 4B single-board computer and developed with the Python programming language and open-source libraries. The images processed are part of the ISIC Challenge Dataset. An average power draw of 2.93 W out of a maximum of 3.6 W was measured while executing the diagnostic tool, and the minimum software response time was 0.09 seconds. Average Central Processing Unit load during execution was 20.63%, against a maximum of 24.5%. Compared with results reported in the scientific literature, the architecture improved skin cancer classification accuracy by about 9%. The diagnostic tool is replicable and affordable owing to its reduced hardware requirements and implementation cost.
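The Principal Component Analysis stage of a pipeline like this can be sketched in pure Python using power iteration for the first component (toy two-feature data; a real pipeline would apply a library implementation to the VGG16 feature vectors):

```python
def first_principal_component(data, iters=200):
    # center the data
    n = len(data)
    d = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    x = [[row[j] - means[j] for j in range(d)] for row in data]
    # covariance matrix (d x d)
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]
    # power iteration converges to the dominant eigenvector of cov
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# toy features stretched along the first axis: the first principal
# component should point (up to sign) along the direction (1, 0.1)
pts = [(x, 0.1 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
v = first_principal_component(pts)
```

Projecting features onto the top few such components is what shrinks the VGG16 feature vectors before the XGBoost classifier sees them.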
Title: "An Architecture for Microprocessor-Executable Skin Cancer Classification". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Neural audio codecs are the most recent development in the field of audio compression. Traditional audio codecs rely on fixed signal processing pipelines and require domain-specific expertise to produce high-quality audio across low to high bit rates; however, their performance usually degrades at low bit rates. Neural audio codecs perform enhancement and compression with no added latency. This paper further enhances the quality of neural audio codecs by integrating a psychoacoustic model with the existing structure, which contains a convolutional encoder, a decoder, and a residual vector quantizer. The model is trained with a combination of reconstruction and adversarial losses to generate high-quality audio content. Audio quality evaluations such as PEAQ and MUSHRA show that the proposed model performs better than the existing neural audio codec.
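The residual vector quantizer mentioned above can be illustrated with a scalar toy version (a sketch only: real codecs quantize vectors of latent features, and the three-level codebooks below are invented for illustration):

```python
def nearest(codebook, x):
    # index of the codeword closest to x
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))

def rvq_encode(x, codebooks):
    # each stage quantizes the residual left over by the previous stage,
    # so later stages refine the approximation at finer scales
    indices, residual = [], x
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual -= cb[i]
    return indices, residual

def rvq_decode(indices, codebooks):
    # the decoder sums the selected codewords from every stage
    return sum(cb[i] for cb, i in zip(codebooks, indices))

codebooks = [[-1.0, 0.0, 1.0], [-0.25, 0.0, 0.25], [-0.0625, 0.0, 0.0625]]
idx, err = rvq_encode(0.8, codebooks)
approx = rvq_decode(idx, codebooks)   # 1.0 - 0.25 + 0.0625 = 0.8125
```

Each extra stage costs only a few bits (one index per stage) while shrinking the reconstruction error, which is why RVQ suits low-bitrate codecs.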
Title: "Design of Medium to Low Bitrate Neural Audio Codec". Authors: Samarpreet Singh, Saurabh Singh Raghuvanshi, Vinal Patel. Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126323. Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126429
S. Saravanan, G. Prakash, B. Uma Maheswari
Nowadays, Peer-to-Peer (P2P) bots play a significant role in launching attacks such as phishing, distributed denial-of-service (DDoS), email spam, click fraud, and cryptocurrency mining. Analyzing statistical network traffic features of hosts is one of the commonly used methods to detect P2P bots. Modern P2P bot detection systems need to extract features from massive streaming network traffic as the Internet keeps growing every day. However, traditional detection systems struggle to detect bots in real time in large-scale networks because they are not implemented on big-data streaming platforms. Hence, this work proposes a network flow-based P2P bot detection system implemented on the Apache Spark Structured Streaming platform to detect P2P bots in real time by analyzing massive streaming network traffic generated from large-scale networks. Detection is based on three statistical network traffic features: destination diversity ratio, control packets ratio, and total source bytes sent in a flow. The proposed system has two components: the first detects potential P2P hosts using the Destination Diversity Ratio (DDR), and the second identifies P2P bot hosts among the P2P hosts found by the first. Since the performance of the detection components depends on the time window over which statistical features are extracted, this work also conducts experiments to study the effect of different time windows. The proposed system is evaluated using real-world datasets and achieves a True Positive Rate (TPR) of 99.87%.
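The Destination Diversity Ratio feature can be sketched in plain Python (one plausible formulation, counting distinct destination /16 networks over distinct destination IPs per source; the paper's exact definition may differ, and the IP addresses below are made up):

```python
from collections import defaultdict

def destination_diversity_ratio(flows):
    # flows: iterable of (src_ip, dst_ip) pairs seen in one time window.
    # P2P hosts tend to contact peers spread over many networks, so a
    # ratio near 1.0 suggests high destination diversity.
    dsts = defaultdict(set)   # distinct destination IPs per source
    nets = defaultdict(set)   # distinct destination /16 networks per source
    for src, dst in flows:
        dsts[src].add(dst)
        nets[src].add(".".join(dst.split(".")[:2]))
    return {src: len(nets[src]) / len(dsts[src]) for src in dsts}

flows = [
    ("10.0.0.5", "93.184.216.34"), ("10.0.0.5", "93.184.216.40"),  # same /16
    ("10.0.0.9", "1.2.3.4"), ("10.0.0.9", "8.8.8.8"), ("10.0.0.9", "101.45.7.7"),
]
ddr = destination_diversity_ratio(flows)  # 10.0.0.5 -> 0.5, 10.0.0.9 -> 1.0
```

On a streaming platform the same aggregation would be expressed as a windowed group-by over the flow stream rather than an in-memory dictionary.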
Title: "A Real-Time P2P Bot Host Detection in a Large-Scale Network Using Statistical Network Traffic Features and Apache Spark Streaming Platform". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126355
Prateeksha Khare, Shailendra Kumar Sharma
In this paper, an artificial neural network (ANN) technique is used for tracking the maximum power point (TMPP) of a solar photovoltaic panel based power supply (SPPBPS). A boost converter is interfaced between the photovoltaic (PV) panel and the DC link of a single-phase voltage source inverter (VSI). A synchronous reference frame (SRF) phase-locked loop (PLL) is used to synchronize the VSI with the grid supply. The VSI control maintains a constant DC bus voltage. An ANN-based TMPP is proposed for a double-stage single-phase VSI with an LCL filter to ensure stable overall system performance and controllability. A proportional resonant controller with a harmonic compensator (HC) controls the VSI current. The proposed system's behavior is observed under realistic dynamic conditions and found satisfactory.
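For context, the classical perturb-and-observe loop that ANN-based trackers aim to improve on can be sketched against a toy PV curve (the linear I-V model and the Voc/Isc values are hypothetical; this is a baseline illustration, not the paper's ANN method):

```python
def pv_power(v, v_oc=40.0, i_sc=8.0):
    # toy PV model with a linear I-V curve: P(v) = v * i_sc * (1 - v / v_oc);
    # its maximum power point sits at v_oc / 2 (real panels are nonlinear)
    return v * i_sc * (1.0 - v / v_oc)

def perturb_and_observe(v=5.0, step=0.5, iters=200):
    # repeatedly nudge the operating voltage; if power drops,
    # reverse the direction of the perturbation
    p = pv_power(v)
    direction = 1.0
    for _ in range(iters):
        v_new = v + direction * step
        p_new = pv_power(v_new)
        if p_new < p:
            direction = -direction
        v, p = v_new, p_new
    return v, p

v_mpp, p_mpp = perturb_and_observe()  # oscillates around v = 20 V, p = 80 W
```

The perpetual oscillation around the peak is exactly the drawback that learned trackers such as the ANN-based TMPP try to avoid.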
Title: "Artificial Neural Network Based Double Stage Grid Connected Solar Photovoltaic Supply System". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126197
Aishwarya Agawane, R. Mudhalwadkar
Snoring is one of the most common disorders, and yet there is no proper solution to the problem. A polysomnography test is required before using any kind of anti-snoring device. This work therefore focuses on a system that detects snoring and helps manage it by alerting the patient. A multi-sensor system is designed to sense, record, and alert through an Internet of Things platform. The system helps patients get good-quality sleep: it has a built-in music player that plays relaxing music when required, and when the patient snores, the system issues an alert. Since neck pain is one effect of snoring, an attempt is also made to build neck-pain relief into the pillow by considering some biological aspects. The system additionally supports the patient's neck, back, and head.
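A minimal sketch of how snore detection from a microphone channel might work, using a fixed frame-energy threshold (the frame length, threshold, and sample values are invented; the actual system is multi-sensor and IoT-based):

```python
def frame_energies(signal, frame_len):
    # mean squared amplitude per non-overlapping frame
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def snore_frames(signal, frame_len=4, threshold=0.25):
    # flag frames whose energy exceeds a fixed threshold; a flagged
    # frame would trigger the alert (or the relaxing-music response)
    return [e > threshold for e in frame_energies(signal, frame_len)]

quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.8, -0.7, 0.9, -0.6]
flags = snore_frames(quiet + loud + quiet)  # only the middle frame fires
```

A deployed device would stream such flags to the IoT platform and debounce them over several seconds before waking the patient.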
Title: "Design A Smart Pillow for Detection and Management of Snoring". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126216
Ridho Nobelino Sabililah, D. Adytia
Sea level prediction is essential information for people who live in coastal areas and plan to build structures, especially during the construction stage at inshore and offshore locations. Statistical methods and tidal harmonic analysis have been used to predict sea level, but they require long-term historical sea level data to achieve reasonable accuracy. This paper uses a Transformer deep learning approach to predict sea levels using only four months of data from Pangandaran, Indonesia. We use the sea level dataset obtained from the Inexpensive Device for Sea Level measurement (IDSL). The model is trained to predict 1, 7, and 14 days ahead, and we also study the model's sensitivity to the lookback length. The Transformer's performance was compared with two other popular deep learning methods, RNN and LSTM. For the 14-day forecast, the Transformer model achieves a higher correlation coefficient (CC) of 0.993 and a lower root mean squared error (RMSE) of 0.055 than the other two models. Moreover, the Transformer has faster computing performance than the other two models.
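The lookback sensitivity studied here comes down to how supervised windows are cut from the series; a minimal sketch (the toy series and window sizes are arbitrary):

```python
def make_windows(series, lookback, horizon):
    # build supervised pairs: `lookback` past values as input,
    # the next `horizon` values as the forecasting target
    xs, ys = [], []
    for i in range(len(series) - lookback - horizon + 1):
        xs.append(series[i:i + lookback])
        ys.append(series[i + lookback:i + lookback + horizon])
    return xs, ys

series = [1, 2, 3, 4, 5, 6, 7]
xs, ys = make_windows(series, lookback=3, horizon=2)
# xs = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# ys = [[4, 5],    [5, 6],    [6, 7]]
```

A longer lookback gives the Transformer more context per sample but yields fewer training windows from the same four months of data, which is the trade-off the sensitivity study probes.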
Title: "Time Series Forecasting of Sea Level by Using Transformer Approach, with a Case Study in Pangandaran, Indonesia". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
A learning impairment is a dysfunction in one or more fundamental psychological functions that might show up as a lack of proficiency in some area of learning, such as reading, writing, performing mathematical calculations, or coordinating movements. Learning disabilities are typically not identified until the child is of school age, although they can also develop in very young infants. We aim to develop a machine learning model that analyzes EEG (electroencephalogram) signals from people with learning difficulties and provides results in minutes with a high level of accuracy. Here we consider two learning disabilities, dyslexia and ADHD (Attention Deficit Hyperactivity Disorder). For the early detection of these disabilities, machine learning algorithms such as support vector machines, k-nearest neighbors, random forests, decision trees, and convolutional neural networks were used. To determine which combination of brain lobes provides the maximum accuracy, we tested the ADHD model using a variety of lobe combinations. The findings indicate that EEG signals produce high classification accuracy and that machine learning has strong potential for identifying ADHD and dyslexia.
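Of the classifiers listed, k-nearest neighbors is simple enough to sketch end to end (the two-dimensional features and labels below are invented placeholders, not real EEG band powers):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs; classify the query
    # by majority vote among its k nearest neighbours (Euclidean distance)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0.1, 0.2), "control"), ((0.2, 0.1), "control"),
         ((0.9, 0.8), "adhd"), ((0.8, 0.9), "adhd"), ((0.85, 0.85), "adhd")]
label = knn_predict(train, (0.9, 0.9))  # falls in the "adhd" cluster
```

In practice the features per electrode/lobe combination would be extracted from EEG recordings first, and k would be chosen by cross-validation.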
Title: "Early detection of ADHD and Dyslexia from EEG Signals". Authors: Nupur Gupte, Mitali Patel, Tanvi Pen, Swapnali Kurhade. Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126272. Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Given the cursive structure of the writing and the similarity in shape of the letters, Telugu handwritten character identification is an interesting problem. The lack of Telugu handwritten datasets has slowed the development of handwritten word recognizers and forced researchers to compare various approaches. Modern deep neural networks struggle here because they often need hundreds or thousands of images per class, and learning important features can be computationally expensive and challenging when only limited data is available. This work proposes a use case on the pre-existing EfficientNet model, adding a custom pooling layer on top, to observe the accuracy trend as the size of the Telugu character dataset increases. The dataset is divided into three categories: a vowels-only dataset, a consonants-only dataset, and an all-characters dataset. The proposed model was trained on a dataset of about five hundred handwritten Telugu characters and produced noteworthy results, with the accuracies following a consistent trend. The model was tested on the collected dataset, filtered to record any performance change, and an improvement was observed: average accuracy rose from 55% to 92%.
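The paper does not specify its custom pooling layer; global average pooling is one common choice for a pooling head on a backbone such as EfficientNet, and can be sketched as:

```python
def global_average_pool(feature_maps):
    # feature_maps: list of 2-D channel maps (lists of rows);
    # collapse each channel to a single scalar, turning an
    # H x W x C feature volume into a C-dimensional vector
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

maps = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0 -> mean 4.0
        [[0.0, 0.0], [2.0, 2.0]]]   # channel 1 -> mean 1.0
pooled = global_average_pool(maps)
```

Because the pooled vector has no spatial parameters, such a head keeps the model small, which matters when only a few hundred training images are available.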
Title: "Handwritten Character Recognition of Telugu Characters". Authors: Yash Prashant Wasalwar, Kishan Singh Bagga, Pvrr Bhogendra Rao, S. Dongre. Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126377. Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126314
Rohini K Katti, S. C, Padmashri Desai, Shankar G
Communication is essential to humans because it allows the dissemination of knowledge and the formation of interpersonal connections. We communicate through speaking, facial expressions, hand gestures, reading, writing, and sketching, among other things; however, speaking is the most often used means of communication. People with speech and hearing disabilities can only communicate using hand gestures, making them extremely reliant on nonverbal modes of communication. Hearing-impaired persons can communicate via sign language; around 1 percent of the Indian population (about 5 million people) falls into this group. Indian Sign Language (ISL) is a complete language with its own vocabulary, semantics, lexicon, and a variety of other distinctive linguistic features. In this work, we present methods for Indian Sign Language recognition at the character and word levels. The Bag of Visual Words (BoVW) technique recognizes ISL at the character level (A-Z, 0-9) with an accuracy of 99 percent. The Indian Lexicon Sign Language Dataset (INCLUDE-50) is used for word-level sign language recognition. The Inception model, a deep convolutional neural network (CNN), is used to learn spatial features, and an LSTM recurrent neural network (RNN) is used to learn the temporal features of the video. Using CNN predictions as input to the RNN, we achieved an accuracy of 86.7%. To optimize the training process, only 60% of the dataset was trained using a meta-learning model along with the LSTM RNN, obtaining an accuracy of 84.4%, reducing training time by 70% while reaching nearly the accuracy of the previous pre-trained model.
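The Bag of Visual Words step can be sketched as nearest-centroid assignment plus a normalized histogram (the two-word vocabulary and descriptors are toy values; real pipelines cluster many local descriptors into a much larger vocabulary):

```python
def bovw_histogram(descriptors, vocabulary):
    # assign each local descriptor to its nearest visual word and count
    # occurrences; the normalized histogram is the image-level feature
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[min(range(len(vocabulary)),
                 key=lambda i: dist(vocabulary[i], d))] += 1
    total = len(descriptors)
    return [h / total for h in hist]

vocab = [(0.0, 0.0), (1.0, 1.0)]   # toy 2-word visual vocabulary
desc = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.2, 0.1)]
hist = bovw_histogram(desc, vocab)  # half the descriptors hit each word
```

The resulting fixed-length histogram is what a conventional classifier (e.g. an SVM) would consume for the character-level recognition stage.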
Title: "Character and Word Level Gesture Recognition of Indian Sign Language". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).