Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126251
S. V, S. Koolagudi
Human beings can identify a particular event occurring in their surroundings from sound cues alone, even when no visual scene is presented. Sound events are the auditory cues present in a surrounding. Sound event detection (SED) is the process of determining the beginning and end of sound events as well as assigning a textual label to each event. Sound source localization (SSL) refers to identifying the spatial location of a sound occurrence in addition to the SED. The integrated task of SED and SSL is known as Sound Event Localization and Detection (SELD). In this work, three deep learning architectures are explored to perform SELD: SELDNet, D-SELDNet (Depthwise Convolution), and T-SELDNet (Transpose Convolution). Two sets of features are used to perform the SED and Direction-of-Arrival (DOA) estimation tasks. D-SELDNet uses a depthwise convolution layer, which reduces the model's complexity in terms of computation time. T-SELDNet uses transpose convolution, which helps learn more discriminative features by retaining the input size and not losing necessary information from the input. The proposed method is evaluated on the First-Order Ambisonic (FOA) array format of the TAU-NIGENS Spatial Sound Events 2020 dataset. The proposed T-SELDNet shows an improvement over existing SELD systems.
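The complexity saving attributed to depthwise convolution can be seen from a simple parameter count (a generic sketch with hypothetical channel and kernel sizes, not the paper's exact layers):

```python
def standard_conv_params(c_in, c_out, k):
    # weights of a standard 2-D convolution: one k x k filter per (in, out) channel pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise step: one k x k filter per input channel,
    # followed by a 1x1 pointwise convolution that mixes channels
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 64, 3)        # 36864 weights
dws = depthwise_separable_params(64, 64, 3)  # 4672 weights
print(std, dws, round(std / dws, 1))         # roughly 7.9x fewer parameters
```

For these illustrative sizes the depthwise-separable variant needs roughly an eighth of the weights, which is the kind of reduction in computation the D-SELDNet design aims at.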
Title: "A Transpose-SELDNet for Polyphonic Sound Event Localization and Detection". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126432
Carlos Vicente Niño Rondón, Diego Andrés Castellano Carvajal, B. M. Delgado, Sergio Alexander Castro Casadiego, Dinael Guevara Ibarra
Skin cancer ranks as the most common malignant tumor among all types of cancer. Melanoma accounts for 1% of all cancer cases, yet it is responsible for the majority of deaths from this type of cancer. According to the American Cancer Society, 99,780 new cases of melanoma are expected to be diagnosed, and about 7,650 people are expected to die from it. This work presents an architecture executable on single-board computer systems for skin cancer classification, with image enhancement and feature enhancement stages, feature extraction using the VGG16 network architecture, feature reduction applying Principal Component Analysis, and a classification stage using gradient-boosted decision trees (XGBoost). The architecture was tested on a Raspberry Pi 4B single-board computer and developed with the Python programming language and open-source libraries. The images processed are part of the ISIC Challenge Dataset. An average power draw of 2.93 W out of a maximum of 3.6 W was measured while executing the diagnostic tool, and the minimum software response time was 0.09 seconds. Average Central Processing Unit load during execution was 20.63%, against a maximum of 24.5%. Compared with results reported in the scientific literature, the architecture improved skin cancer classification accuracy by about 9%. The diagnostic tool is replicable and affordable owing to its reduced hardware requirements and implementation cost.
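The Principal Component Analysis stage of a pipeline like this can be sketched in pure Python using power iteration for the first component (toy two-feature data; a real pipeline would apply a library implementation to the VGG16 feature vectors):

```python
def first_principal_component(data, iters=200):
    # center the data
    n = len(data)
    d = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    x = [[row[j] - means[j] for j in range(d)] for row in data]
    # covariance matrix (d x d)
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]
    # power iteration converges to the dominant eigenvector of cov
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# toy features stretched along the first axis: the first principal
# component should point (up to sign) along the direction (1, 0.1)
pts = [(x, 0.1 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
v = first_principal_component(pts)
```

Projecting features onto the top few such components is what shrinks the VGG16 feature vectors before the XGBoost classifier sees them.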
Title: "An Architecture for Microprocessor-Executable Skin Cancer Classification". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Neural audio codecs are the most recent development in the field of audio compression. Traditional audio codecs rely on fixed signal processing pipelines and require domain-specific expertise to produce high-quality audio across low to high bit rates; however, their performance usually degrades at low bit rates. Neural audio codecs perform enhancement and compression with no added latency. This paper further enhances the quality of neural audio codecs by integrating a psychoacoustic model with the existing structure, which contains a convolutional encoder, a decoder, and a residual vector quantizer. The model is trained with a combination of reconstruction and adversarial losses to generate high-quality audio content. Audio quality evaluations such as PEAQ and MUSHRA show that the proposed model performs better than the existing neural audio codec.
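The residual vector quantizer mentioned above can be illustrated with a scalar toy version (a sketch only: real codecs quantize vectors of latent features, and the three-level codebooks below are invented for illustration):

```python
def nearest(codebook, x):
    # index of the codeword closest to x
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))

def rvq_encode(x, codebooks):
    # each stage quantizes the residual left over by the previous stage,
    # so later stages refine the approximation at finer scales
    indices, residual = [], x
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual -= cb[i]
    return indices, residual

def rvq_decode(indices, codebooks):
    # the decoder sums the selected codewords from every stage
    return sum(cb[i] for cb, i in zip(codebooks, indices))

codebooks = [[-1.0, 0.0, 1.0], [-0.25, 0.0, 0.25], [-0.0625, 0.0, 0.0625]]
idx, err = rvq_encode(0.8, codebooks)
approx = rvq_decode(idx, codebooks)   # 1.0 - 0.25 + 0.0625 = 0.8125
```

Each extra stage costs only a few bits (one index per stage) while shrinking the reconstruction error, which is why RVQ suits low-bitrate codecs.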
Title: "Design of Medium to Low Bitrate Neural Audio Codec". Authors: Samarpreet Singh, Saurabh Singh Raghuvanshi, Vinal Patel. Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126323. Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126429
S. Saravanan, G. Prakash, B. Uma Maheswari
Nowadays, Peer-to-Peer (P2P) bots play a significant role in launching attacks such as phishing, distributed denial-of-service (DDoS), email spam, click fraud, and cryptocurrency mining. Analyzing statistical network traffic features of hosts is one of the commonly used methods to detect P2P bots. Modern P2P bot detection systems need to extract features from massive streaming network traffic as the Internet keeps growing every day. However, traditional detection systems struggle to detect bots in real time in large-scale networks because they are not implemented on big-data streaming platforms. Hence, this work proposes a network flow-based P2P bot detection system implemented on the Apache Spark Structured Streaming platform to detect P2P bots in real time by analyzing massive streaming network traffic generated from large-scale networks. Detection is based on three statistical network traffic features: destination diversity ratio, control packets ratio, and total source bytes sent in a flow. The proposed system has two components: the first detects potential P2P hosts using the Destination Diversity Ratio (DDR), and the second identifies P2P bot hosts among the P2P hosts found by the first. Since the performance of the detection components depends on the time window over which statistical features are extracted, this work also conducts experiments to study the effect of different time windows. The proposed system is evaluated using real-world datasets and achieves a True Positive Rate (TPR) of 99.87%.
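The Destination Diversity Ratio feature can be sketched in plain Python (one plausible formulation, counting distinct destination /16 networks over distinct destination IPs per source; the paper's exact definition may differ, and the IP addresses below are made up):

```python
from collections import defaultdict

def destination_diversity_ratio(flows):
    # flows: iterable of (src_ip, dst_ip) pairs seen in one time window.
    # P2P hosts tend to contact peers spread over many networks, so a
    # ratio near 1.0 suggests high destination diversity.
    dsts = defaultdict(set)   # distinct destination IPs per source
    nets = defaultdict(set)   # distinct destination /16 networks per source
    for src, dst in flows:
        dsts[src].add(dst)
        nets[src].add(".".join(dst.split(".")[:2]))
    return {src: len(nets[src]) / len(dsts[src]) for src in dsts}

flows = [
    ("10.0.0.5", "93.184.216.34"), ("10.0.0.5", "93.184.216.40"),  # same /16
    ("10.0.0.9", "1.2.3.4"), ("10.0.0.9", "8.8.8.8"), ("10.0.0.9", "101.45.7.7"),
]
ddr = destination_diversity_ratio(flows)  # 10.0.0.5 -> 0.5, 10.0.0.9 -> 1.0
```

On a streaming platform the same aggregation would be expressed as a windowed group-by over the flow stream rather than an in-memory dictionary.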
Title: "A Real-Time P2P Bot Host Detection in a Large-Scale Network Using Statistical Network Traffic Features and Apache Spark Streaming Platform". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126355
Prateeksha Khare, Shailendra Kumar Sharma
In this paper, an artificial neural network (ANN) technique is used for tracking the maximum power point (TMPP) of a solar photovoltaic panel based power supply (SPPBPS). A boost converter is interfaced between the photovoltaic (PV) panel and the DC link of a single-phase voltage source inverter (VSI). A synchronous reference frame (SRF) phase-locked loop (PLL) is used to synchronize the VSI with the grid supply. The VSI control maintains a constant DC bus voltage. An ANN-based TMPP is proposed for a double-stage single-phase VSI with an LCL filter to ensure stable overall system performance and controllability. A proportional resonant controller with a harmonic compensator (HC) controls the VSI current. The proposed system's behavior is observed under realistic dynamic conditions and found satisfactory.
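For context, the classical perturb-and-observe loop that ANN-based trackers aim to improve on can be sketched against a toy PV curve (the linear I-V model and the Voc/Isc values are hypothetical; this is a baseline illustration, not the paper's ANN method):

```python
def pv_power(v, v_oc=40.0, i_sc=8.0):
    # toy PV model with a linear I-V curve: P(v) = v * i_sc * (1 - v / v_oc);
    # its maximum power point sits at v_oc / 2 (real panels are nonlinear)
    return v * i_sc * (1.0 - v / v_oc)

def perturb_and_observe(v=5.0, step=0.5, iters=200):
    # repeatedly nudge the operating voltage; if power drops,
    # reverse the direction of the perturbation
    p = pv_power(v)
    direction = 1.0
    for _ in range(iters):
        v_new = v + direction * step
        p_new = pv_power(v_new)
        if p_new < p:
            direction = -direction
        v, p = v_new, p_new
    return v, p

v_mpp, p_mpp = perturb_and_observe()  # oscillates around v = 20 V, p = 80 W
```

The perpetual oscillation around the peak is exactly the drawback that learned trackers such as the ANN-based TMPP try to avoid.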
Title: "Artificial Neural Network Based Double Stage Grid Connected Solar Photovoltaic Supply System". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126197
Aishwarya Agawane, R. Mudhalwadkar
Snoring is one of the most common disorders, and yet there is no proper solution to the problem. A polysomnography test is required before using any kind of anti-snoring device. This work therefore focuses on a system that detects snoring and helps manage it by alerting the patient. A multi-sensor system is designed to sense, record, and alert through an Internet of Things platform. The system helps patients get good-quality sleep: it has a built-in music player that plays relaxing music when required, and when the patient snores, the system issues an alert. Since neck pain is one effect of snoring, an attempt is also made to build neck-pain relief into the pillow by considering some biological aspects. The system additionally supports the patient's neck, back, and head.
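A minimal sketch of how snore detection from a microphone channel might work, using a fixed frame-energy threshold (the frame length, threshold, and sample values are invented; the actual system is multi-sensor and IoT-based):

```python
def frame_energies(signal, frame_len):
    # mean squared amplitude per non-overlapping frame
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def snore_frames(signal, frame_len=4, threshold=0.25):
    # flag frames whose energy exceeds a fixed threshold; a flagged
    # frame would trigger the alert (or the relaxing-music response)
    return [e > threshold for e in frame_energies(signal, frame_len)]

quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.8, -0.7, 0.9, -0.6]
flags = snore_frames(quiet + loud + quiet)  # only the middle frame fires
```

A deployed device would stream such flags to the IoT platform and debounce them over several seconds before waking the patient.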
Title: "Design A Smart Pillow for Detection and Management of Snoring". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126216
Ridho Nobelino Sabililah, D. Adytia
Sea level prediction is essential information for people who live in coastal areas and plan to build structures, especially during the construction stage at inshore and offshore locations. Statistical methods and tidal harmonic analysis have been used to predict sea level, but they require long-term historical sea level data to achieve reasonable accuracy. This paper uses a Transformer deep learning approach to predict sea levels using only four months of data from Pangandaran, Indonesia. We use the sea level dataset obtained from the Inexpensive Device for Sea Level measurement (IDSL). The model is trained to predict 1, 7, and 14 days ahead, and we also study the model's sensitivity to the lookback length. The Transformer's performance was compared with two other popular deep learning methods, RNN and LSTM. For the 14-day forecast, the Transformer model achieves a higher correlation coefficient (CC) of 0.993 and a lower root mean squared error (RMSE) of 0.055 than the other two models. Moreover, the Transformer has faster computing performance than the other two models.
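The lookback sensitivity studied here comes down to how supervised windows are cut from the series; a minimal sketch (the toy series and window sizes are arbitrary):

```python
def make_windows(series, lookback, horizon):
    # build supervised pairs: `lookback` past values as input,
    # the next `horizon` values as the forecasting target
    xs, ys = [], []
    for i in range(len(series) - lookback - horizon + 1):
        xs.append(series[i:i + lookback])
        ys.append(series[i + lookback:i + lookback + horizon])
    return xs, ys

series = [1, 2, 3, 4, 5, 6, 7]
xs, ys = make_windows(series, lookback=3, horizon=2)
# xs = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# ys = [[4, 5],    [5, 6],    [6, 7]]
```

A longer lookback gives the Transformer more context per sample but yields fewer training windows from the same four months of data, which is the trade-off the sensitivity study probes.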
Title: "Time Series Forecasting of Sea Level by Using Transformer Approach, with a Case Study in Pangandaran, Indonesia". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
A learning impairment is a dysfunction in one or more fundamental psychological functions that might show up as a lack of proficiency in some area of learning, such as reading, writing, performing mathematical calculations, or coordinating movements. Learning disabilities are typically not identified until the child is of school age, although they can also develop in very young infants. We aim to develop a machine learning model that analyzes EEG (electroencephalogram) signals from people with learning difficulties and provides results in minutes with a high level of accuracy. Here we consider two learning disabilities, dyslexia and ADHD (Attention Deficit Hyperactivity Disorder). For the early detection of these disabilities, machine learning algorithms such as support vector machines, k-nearest neighbors, random forests, decision trees, and convolutional neural networks were used. To determine which combination of brain lobes provides the maximum accuracy, we tested the ADHD model using a variety of lobe combinations. The findings indicate that EEG signals produce high classification accuracy and that machine learning has strong potential for identifying ADHD and dyslexia.
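Of the classifiers listed, k-nearest neighbors is simple enough to sketch end to end (the two-dimensional features and labels below are invented placeholders, not real EEG band powers):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs; classify the query
    # by majority vote among its k nearest neighbours (Euclidean distance)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0.1, 0.2), "control"), ((0.2, 0.1), "control"),
         ((0.9, 0.8), "adhd"), ((0.8, 0.9), "adhd"), ((0.85, 0.85), "adhd")]
label = knn_predict(train, (0.9, 0.9))  # falls in the "adhd" cluster
```

In practice the features per electrode/lobe combination would be extracted from EEG recordings first, and k would be chosen by cross-validation.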
Title: "Early detection of ADHD and Dyslexia from EEG Signals". Authors: Nupur Gupte, Mitali Patel, Tanvi Pen, Swapnali Kurhade. Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126272. Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Given the cursive structure of the writing and the similarity in shape of the letters, Telugu handwritten character identification is an interesting problem. The lack of Telugu handwritten datasets has slowed the development of handwritten word recognizers and forced researchers to compare various approaches. Modern deep neural networks struggle here because they often need hundreds or thousands of images per class, and learning important features can be computationally expensive and challenging when only limited data is available. This work proposes a use case on the pre-existing EfficientNet model, adding a custom pooling layer on top, to observe the accuracy trend as the size of the Telugu character dataset increases. The dataset is divided into three categories: a vowels-only dataset, a consonants-only dataset, and an all-characters dataset. The proposed model was trained on a dataset of about five hundred handwritten Telugu characters and produced noteworthy results, with the accuracies following a consistent trend. The model was tested on the collected dataset, filtered to record any performance change, and an improvement was observed: average accuracy rose from 55% to 92%.
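The paper does not specify its custom pooling layer; global average pooling is one common choice for a pooling head on a backbone such as EfficientNet, and can be sketched as:

```python
def global_average_pool(feature_maps):
    # feature_maps: list of 2-D channel maps (lists of rows);
    # collapse each channel to a single scalar, turning an
    # H x W x C feature volume into a C-dimensional vector
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

maps = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0 -> mean 4.0
        [[0.0, 0.0], [2.0, 2.0]]]   # channel 1 -> mean 1.0
pooled = global_average_pool(maps)
```

Because the pooled vector has no spatial parameters, such a head keeps the model small, which matters when only a few hundred training images are available.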
Title: "Handwritten Character Recognition of Telugu Characters". Authors: Yash Prashant Wasalwar, Kishan Singh Bagga, Pvrr Bhogendra Rao, S. Dongre. Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126377. Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
Pub Date: 2023-04-07 | DOI: 10.1109/I2CT57861.2023.10126314
Rohini K Katti, S. C, Padmashri Desai, Shankar G
Communication is essential to humans because it allows the dissemination of knowledge and the formation of interpersonal connections. We communicate through speaking, facial expressions, hand gestures, reading, writing, and sketching, among other things; however, speaking is the most often used means of communication. People with speech and hearing disabilities can only communicate using hand gestures, making them extremely reliant on nonverbal modes of communication. Hearing-impaired persons can communicate via sign language; around 1 percent of the Indian population (about 5 million people) falls into this group. Indian Sign Language (ISL) is a complete language with its own vocabulary, semantics, lexicon, and a variety of other distinctive linguistic features. In this work, we present methods for Indian Sign Language recognition at the character and word levels. The Bag of Visual Words (BoVW) technique recognizes ISL at the character level (A-Z, 0-9) with an accuracy of 99 percent. The Indian Lexicon Sign Language Dataset (INCLUDE-50) is used for word-level sign language recognition. The Inception model, a deep convolutional neural network (CNN), is used to learn spatial features, and an LSTM recurrent neural network (RNN) is used to learn the temporal features of the video. Using CNN predictions as input to the RNN, we achieved an accuracy of 86.7%. To optimize the training process, only 60% of the dataset was trained using a meta-learning model along with the LSTM RNN, obtaining an accuracy of 84.4%, reducing training time by 70% while reaching nearly the accuracy of the previous pre-trained model.
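The Bag of Visual Words step can be sketched as nearest-centroid assignment plus a normalized histogram (the two-word vocabulary and descriptors are toy values; real pipelines cluster many local descriptors into a much larger vocabulary):

```python
def bovw_histogram(descriptors, vocabulary):
    # assign each local descriptor to its nearest visual word and count
    # occurrences; the normalized histogram is the image-level feature
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[min(range(len(vocabulary)),
                 key=lambda i: dist(vocabulary[i], d))] += 1
    total = len(descriptors)
    return [h / total for h in hist]

vocab = [(0.0, 0.0), (1.0, 1.0)]   # toy 2-word visual vocabulary
desc = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.2, 0.1)]
hist = bovw_histogram(desc, vocab)  # half the descriptors hit each word
```

The resulting fixed-length histogram is what a conventional classifier (e.g. an SVM) would consume for the character-level recognition stage.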
Title: "Character and Word Level Gesture Recognition of Indian Sign Language". Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).