This paper proposes an efficient digital image compression scheme tailored to the characteristics of financial image data. First, the discrete cosine transform (DCT) is applied to decompose the financial image into DC and AC coefficients. Second, based on the characteristics of the DCT coefficients, a fuzzy method classifies DCT subblocks as smooth, texture, or edge blocks, so that each class can be quantized with its own strategy. Third, to remove spatial and statistical redundancy, the scheme exploits features and structures common to financial images and uses a dedicated scanning order that groups the important coefficients together. Finally, the scanned DCT coefficients are encoded with differential prediction and entropy coding, which improves compression efficiency. At bit rates of 0.25 and 0.5, the objective evaluation metrics of the algorithm are approximately 2 dB higher than those of existing algorithms. At bit rates of 0.75, 1.5, 2.5, and 3.5, the method still outperforms the comparison algorithms, which indicates that it can efficiently store and transmit massive volumes of financial image data and support data processing in the financial sector.
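The transform, scanning, and differential-prediction steps described above can be illustrated with a minimal sketch. It assumes 8 x 8 blocks, an orthonormal DCT-II, and the JPEG-style zigzag order; the abstract does not state the block size or the actual scanning order, and the function names `dct_2d`, `zigzag`, and `dc_differences` are hypothetical. The fuzzy block classification and the entropy coder are not shown.

```python
import math

N = 8  # block size: an assumption, not stated in the abstract


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out  # out[0][0] is the DC coefficient, the rest are AC


def zigzag(coeffs):
    """Read an N x N coefficient block in JPEG-style zigzag order, DC first."""
    order = sorted(
        ((u, v) for u in range(N) for v in range(N)),
        # Walk anti-diagonals; alternate direction on odd and even diagonals.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [coeffs[u][v] for u, v in order]


def dc_differences(dc_values):
    """Differential prediction: keep the first DC value, then successive differences."""
    return [dc_values[0]] + [
        dc_values[i] - dc_values[i - 1] for i in range(1, len(dc_values))
    ]
```

For a flat block all energy lands in the DC coefficient, which is why smooth blocks can be quantized more coarsely in their AC terms than edge blocks.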
{"title":"Financial Digital Images Compression Method Based on Discrete Cosine Transform","authors":"Wenjin Wang, Miaomiao Lu, Xuanling Dai, Ping Jiang","doi":"10.3103/S014641162470069X","DOIUrl":"10.3103/S014641162470069X","url":null,"abstract":"<p>In response to the characteristics of financial image data, this paper proposes an efficient digital image compression scheme. Firstly, discrete cosine transform (DCT) is applied to divide the financial image into DC and AC coefficients. Secondly, based on the characteristics of DCT coefficients, a fuzzy method is employed to categorize DCT subblocks into smooth, texture, and edge classes, enabling distinct quantization strategies. Subsequently, to eliminate spatial and statistical redundancies in financial images, common features and structures are utilized, and a specific scanning approach is employed to optimize the arrangement of important coefficients. Finally, differential prediction and entropy coding are employed for DCT coefficient scanning encoding, enhancing compression efficiency. The objective evaluation metrics of this algorithm are approximately 2 dB higher than existing algorithms at bit rates of 0.25 and 0.5. 
Even at bit rates of 0.75, 1.5, 2.5, and 3.5, the performance of this method still outperforms the comparative algorithms, demonstrating its capability to efficiently store and transmit massive financial image data, thereby providing robust support for data processing in the financial sector.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 5","pages":"592 - 601"},"PeriodicalIF":0.6,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-06 DOI: 10.3103/S0146411624700639
Walid Fakhet, Salim El Khediri, Salah Zidi
Arabic handwritten character recognition (AHCR) is the process of automatically identifying and recognizing handwritten Arabic characters. The task is challenging because of the complexity of the Arabic script, which includes a large number of characters with complex shapes and ligatures. In this paper, we present a novel approach based on Levenshtein distance that recognizes Arabic handwritten characters by combining the classification and postprocessing phases. To train the proposed model, we created an Arabic optical character recognition (OCR) context database divided into multiple text files. Each file in the database belongs to one of five well-defined contexts: sport, economy, religion, politics, and culture. Each file contains 15 000 words. The experimental results show that the new method outperforms the state-of-the-art approach. The error rate achieved using 15 000 words was 1.2%.
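As an illustration of how Levenshtein distance can drive a postprocessing step, the sketch below computes the edit distance with the standard dynamic-programming recurrence and snaps a recognized word to its closest entry in a context lexicon. The helper `correct` and the tiny Latin-script lexicon are hypothetical stand-ins; the abstract does not describe how the authors couple the distance with the classifier.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word, lexicon):
    """Return the lexicon entry closest to the recognized word (hypothetical helper)."""
    return min(lexicon, key=lambda w: levenshtein(word, w))


# Placeholder lexicon; a real one would hold words from the chosen context file.
context_lexicon = ["sport", "economy", "religion", "politics", "culture"]
print(correct("economi", context_lexicon))
```

Restricting the lexicon to one context keeps the candidate set small, which is one plausible reason for splitting the database by topic.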
{"title":"A Novel Arabic Optical Character Recognition Approach Based on Levenshtein Distance","authors":"Walid Fakhet, Salim El Khediri, Salah Zidi","doi":"10.3103/S0146411624700639","DOIUrl":"10.3103/S0146411624700639","url":null,"abstract":"<p>Arabic handwritten character recognition (AHCR) is the process of automatically identifying and recognizing handwritten Arabic characters. This is a challenging task due to the complexity of the Arabic script, which includes a large number of characters with complex shapes and ligatures. In this paper, we present a novel approach based on Levenshtein distance to recognize Arabic handwritten characters by combining the classification and the postprocessing phases. To train the proposed model, we created an Arabic optical character recognition (OCR) context database divided into multiple text files. Each file in the database belongs to one of five well-defined contexts: sport, economy, religion, politics, and culture. The total number of words in each file is 15 000. The experiment results show that the new method outperforms the state-of-the-art approach. The error rate achieved by using 15 000 words was 1.2%.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 5","pages":"519 - 529"},"PeriodicalIF":0.6,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-06 DOI: 10.3103/S0146411624700664
Haiyan Kang, Congming Zhang, Hongling Jiang
To enhance public safety and address challenges in driver behavior recognition, an intelligent recognition and detection method of driver behavior based on ResNet (IRDMDB-ResNet) is proposed. The approach aims to identify instances of distracted driving that result from abnormal behavior. Three models (IRDMDB-1, IRDMDB-2, and IRDMDB-3) are presented to implement the method, which adapts deep learning behavior recognition to driving scenarios. First, the study uses two well-tested real datasets, the Driver Drowsiness Dataset and the State Farm dataset, which are preprocessed to meet the input requirements of the model. Second, a lightweight convolutional neural network is designed to extract features, helping the warning system deliver precise information and reduce traffic collisions as far as possible. Finally, the model is evaluated using the confusion matrix, accuracy, precision, recall, and F1-score. The results show that the proposed IRDMDB-3 model recognizes and detects driver behavior effectively and stably. It achieves 99.79% accuracy in classifying distracted drivers looking elsewhere in the State Farm dataset, and 99.68% detection accuracy on the Driver Drowsiness Dataset. This advance represents a significant improvement for traffic safety and shows adaptability to diverse behaviors as well as strong recognition and detection capabilities.
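The evaluation criteria named above follow directly from the entries of a confusion matrix. The sketch below uses the standard binary definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts, for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```

For a multi-class task such as distraction categories, the same formulas are usually applied per class (one class against the rest) and then averaged.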
{"title":"Advancing Driver Behavior Recognition: An Intelligent Approach Utilizing ResNet","authors":"Haiyan Kang, Congming Zhang, Hongling Jiang","doi":"10.3103/S0146411624700664","DOIUrl":"10.3103/S0146411624700664","url":null,"abstract":"<p>In pursuit of enhancing public safety and addressing challenges in driver behavior recognition, an intelligent recognition and detection method of driver behavior based on ResNet (IRDMDB-ResNet) is proposed. The approach aims to identify instances of distracted driving resulting from abnormal behavior. Three models (IRDMDB-1, IRDMDB-2, and IRDMDB-3) are presented to implement this method, which is adapted to a deep learning behavior recognition in driving scenarios. Firstly, this study utilizes two well-tested real datasets: Driver Drowsiness Dataset and The State Farm. These datasets undergo preprocessing to meet the input requirements of the model. Secondly, a lightweight convolutional neural network model has been designed to extract features, aiding the warning system in delivering precise information and minimizing traffic collisions to the maximum extent possible. Finally, the model is evaluated based on the confusion metrics, accuracy, precision, recall, and F1-score criterion. As a result, the IRDMDB-3 model proposed in this paper can recognize and detect driver behavior effectively and stably. And it achieves 99.79% of accuracy in the classification of distracted drivers looking elsewhere in The State Farm dataset. Similarly, the detection at Driver Drowsiness Dataset is 99.68%. 
This advancement represents a significant improvement in traffic safety, showcasing adaptability to diverse behaviors and remarkable recognition and detection capabilities.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 5","pages":"555 - 568"},"PeriodicalIF":0.6,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.3103/S0146411624700597
Shan Li, Diyuan Tan, Binbin Yao, Zhe Wang
For the elderly, falls can be fatal, and because of physical decline they are difficult to avoid. To reduce the harm caused by falls as far as possible, and to ensure that a fall is detected as soon as it happens, this study proposes a fall monitoring system based on wearable devices and an improved K-nearest neighbor algorithm. The improved fuzzy K-nearest neighbor algorithm is combined with a support vector machine to improve efficiency and accuracy and to reduce the false positive and false negative rates as much as possible. In the simulation experiment, the proposed model achieved an average precision of 97.5%, a specificity of 97.6%, and a sensitivity of 97.5%. Convergence is also good: the optimum is reached within 24 iterations. In the real-world experiment, the average accuracy reached 98.7%, the false alarm rate was only 0.7%, and the negative rate was 2.5%, outperforming the two other algorithms compared. These results show that the proposed method performs well in accuracy, false positive rate, and false negative rate in practical application, which is important for the health and safety of the elderly.
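For readers unfamiliar with the fuzzy variant of K-nearest neighbors, the sketch below shows the textbook form with crisp training labels: each of the k nearest samples votes with a weight that decays with distance, and the votes are normalized into class memberships. The authors' specific improvements and the coupling with a support vector machine are not reproduced here; the function name, the fuzzifier `m`, and the toy feature vectors are assumptions.

```python
import math


def fuzzy_knn(train, query, k=3, m=2.0):
    """Return {label: membership} for query.

    train is a list of (feature_tuple, label) pairs; m > 1 is the fuzzifier.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = {}
    for features, label in nearest:
        d = math.dist(features, query)
        # Inverse-distance weight; the small constant avoids division by zero.
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-9)
        votes[label] = votes.get(label, 0.0) + w
    total = sum(votes.values())
    return {label: w / total for label, w in votes.items()}


# Toy two-feature samples (for example, acceleration peak and tilt), made up.
samples = [((0, 0), "normal"), ((0, 1), "normal"),
           ((5, 5), "fall"), ((5, 6), "fall"), ((6, 5), "fall")]
print(fuzzy_knn(samples, (1, 1)))
```

Unlike a plain majority vote, the memberships expose how confident the decision is, which a downstream stage can use to handle borderline cases.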
{"title":"Fall Monitoring System Based on Wearable Device and Improved KNN","authors":"Shan Li, Diyuan Tan, Binbin Yao, Zhe Wang","doi":"10.3103/S0146411624700597","DOIUrl":"10.3103/S0146411624700597","url":null,"abstract":"<p>For the elderly, falls can be extremely fatal. However, due to the physical decline of the elderly, it is difficult to avoid falls. Therefore, to the greatest extent feasible lessen the harm that falls on the elderly inflict, so that they can be found in the first time of falls, this study based on wearable devices, proposed a fall monitoring system using an improved K-nearest neighbor algorithm. The improved fuzzy K-nearest neighbor algorithm combined with support vector machine algorithm is applied to improve the efficiency and accuracy of the algorithm, and reduce the false positive rate and false negative rate as much as possible. The suggested model’s average precision in the simulation experiment is 97.5%. The specificity was 97.6%. The sensitivity was 97.5%. The convergence performance is also good, 24 iterations can reach the optimal. In the actual experiment, the average accuracy reached 98.7%; The false alarm rate is only 0.7%; The negative rate was 2.5%; Its performance is superior to other two algorithms. 
This shows that the proposed method has excellent accuracy, false positive rate and false negative rate in practical application, which has important significance for the health and safety of the elderly.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"366 - 378"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.3103/S0146411624700500
Saeid Haidari, Alireza Hosseinpour
This paper presents a new method for localizing a radio frequency (RF) source under non-line-of-sight (NLOS) conditions using data collected with a single anchor and a map. The observation measured at the unmanned aerial vehicle (UAV) is assumed to be the received signal strength indicator (RSSI). Based on the RSSI of the reflected signal sent from the anchor, a method is presented to estimate the location of the reflecting obstacle; the overall procedure is a two-step method of map estimation followed by localization. A map of obstacle locations is also assumed to be available, so the location of the reflecting obstacle can be obtained from the map with some error. Finally, these data are combined in a weighted, improved particle filter that makes efficient use of the number of particles over a wide area, so that the location of the unknown RF source is estimated more accurately. The results show that the proposed method improves localization and achieves good precision.
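To make the role of RSSI in a particle filter concrete, the sketch below weights candidate source positions by how well a log-distance path-loss model explains one measurement. This is a generic line-of-sight illustration under assumed parameters (`rssi_at_1m`, `path_loss_exp`, `sigma_db`); it does not model the reflection geometry, the map, or the improved filter described in the abstract.

```python
import math


def expected_rssi(distance_m, rssi_at_1m=-40.0, path_loss_exp=2.0):
    """Log-distance path-loss model; all parameter values are assumptions."""
    return rssi_at_1m - 10.0 * path_loss_exp * math.log10(max(distance_m, 1e-3))


def particle_weights(particles, uav_pos, measured_rssi, sigma_db=4.0):
    """Normalized Gaussian likelihood of each candidate source position."""
    raw = []
    for p in particles:
        err = measured_rssi - expected_rssi(math.dist(p, uav_pos))
        raw.append(math.exp(-0.5 * (err / sigma_db) ** 2))
    total = sum(raw)
    if total == 0.0:  # every likelihood underflowed: fall back to uniform
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]


# Two candidate positions in metres; the measurement favours the nearer one.
print(particle_weights([(10.0, 0.0), (100.0, 0.0)], (0.0, 0.0), -60.0))
```

In a full filter these weights would feed a resampling step, and measurements from several UAV positions would be fused because a single RSSI value only constrains distance, not direction.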
{"title":"RF Source Localization Method Based on a Single-Anchor and Map Using Reflection in an Improved Particle Filter","authors":"Saeid Haidari, Alireza Hosseinpour","doi":"10.3103/S0146411624700500","DOIUrl":"10.3103/S0146411624700500","url":null,"abstract":"<p>This paper presents a new method of localizing radio frequency (RF) source in non-line of sight (NLOS) using data collected using the anchor and map. The measurable observation in the unmanned aerial vehicle (UAV) is assumed to be the received signal strength indicator (RSSI), and a method is presented based on the RSSI observation of the reflected signal sent from the anchor to estimate the location of the reflecting obstacle, which is a two-step method for map estimation and localization. It is also assumed that the map of the obstacle location is also available; the location of the reflective obstacle can be obtained using the map with an error. And finally, by combining this data in a weighted and improved particle filter for the optimal use of the number of particles in a wide area, the location of the unknown RF source is estimated more accurately. It was revealed that the proposed method improved localization and had good precision.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"379 - 391"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, forest fire monitoring based on thermal infrared remote sensing was investigated for the Tamarix chinensis forest in the Changyi national marine ecological special protected area in Shandong province, China. We grouped the commonly used remote sensing methods for detecting forest fire points into two types: the fixed threshold method (including its deformation and extension models) and the adjacent pixel analysis method (also known as the background pixel correlation method), and we analyzed the advantages and disadvantages of both. The brightness temperature (BT) data retrieved from remote sensing images of the IRS sensor (HJ 1B satellite) and the TIRS sensor (Landsat-8 satellite) indicated that there was not enough thermal radiation to form a fire point in the protected zone during the above phases. The results and methods also confirm that thermal infrared remote sensing can be used for forest fire monitoring and for identifying fire points at the macro scale.
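The two families of detection methods summarized above can be contrasted in a few lines. The sketch applies a fixed brightness-temperature threshold to a single pixel, and a contextual test that compares a pixel with the mean and spread of its eight neighbors. The threshold values, the 3 x 3 window, and the function names are illustrative assumptions, not values from the study.

```python
from statistics import mean, pstdev


def fixed_threshold_fire(bt_kelvin, threshold=360.0):
    """Fixed threshold method: flag a pixel whose BT exceeds a constant."""
    return bt_kelvin > threshold


def contextual_fire(bt_grid, row, col, k=3.0, min_delta=10.0):
    """Adjacent pixel analysis: flag a pixel much hotter than its background."""
    rows, cols = len(bt_grid), len(bt_grid[0])
    neighbours = [
        bt_grid[r][c]
        for r in range(max(0, row - 1), min(rows, row + 2))
        for c in range(max(0, col - 1), min(cols, col + 2))
        if (r, c) != (row, col)
    ]
    background = mean(neighbours)
    spread = pstdev(neighbours)
    return bt_grid[row][col] > background + max(k * spread, min_delta)


# Toy 3 x 3 scene in kelvin with a warm centre pixel.
scene = [[300.0, 300.0, 300.0],
         [300.0, 330.0, 300.0],
         [300.0, 300.0, 300.0]]
print(fixed_threshold_fire(scene[1][1]), contextual_fire(scene, 1, 1))
```

The toy scene shows the trade-off: the centre pixel is far below the fixed threshold yet clearly anomalous against its background, so only the contextual test flags it.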
{"title":"Fire Risk Monitoring of Tamarix chinensis Forest Based on Infrared Remote Sensing Technology","authors":"Jin Wang, Ruiting Liu, Liming Liu, Xiaoxiang Cheng, Feiyong Chen, Xue Shen","doi":"10.3103/S0146411624700482","DOIUrl":"10.3103/S0146411624700482","url":null,"abstract":"<p>In this study, the <i>Tamarix</i> <i>chinensis</i> forest in Changyi national marine ecological special protected area in Shandong province, China, was researched for forest fire monitoring based on thermal infrared remote sensing technology. We summarized the commonly monitoring methods for forest fire point based on remote sensing technology into two types: fixed threshold method (including its deformation model and extension model) and adjacent pixel analysis method (also known as background pixel correlation method). And we analyzed the advantages and disadvantages of these two methods. The BT (brightness temperature) data inverted from the remote sensing images of IRS sensor (HJ 1B satellite) and TIRS sensor (Landsat-8 satellite) indicated that there not had enough thermal radiation to form a fire point during the above phases in the protected zone. The research results and methods also confirmed that thermal infrared remote sensing technology can be used for forest fire monitoring and identification of macro forest fire point.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"359 - 365"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.3103/S0146411624700585
Lei Yan
Building on in-depth research into the key technologies of binocular vision measurement, this paper develops a multidimensional online measurement system for image recognition. The Canny operator is used to detect the contour features of parts, and it is accelerated and improved through mathematical derivation and a Gaussian pyramid. A synchronous external trigger circuit for the binocular camera and light source is designed. Finally, the improved algorithms for the various stages of visual measurement are applied to the measurement system. The experimental results show that the online measurement system offers high measurement accuracy and small repeatability errors.
{"title":"Research on Binocular Vision Image Calibration Method Based on Canny Operator","authors":"Lei Yan","doi":"10.3103/S0146411624700585","DOIUrl":"10.3103/S0146411624700585","url":null,"abstract":"<p>In this paper, on the basis of in-depth research on the key technology of binocular vision measurement; a set of multidimension online measurement system for image recognition is built. Canny operator is used as a tool to detect the contour features of parts, and the Canny operator is accelerated and improved from the aspects of mathematical reasoning and Gaussian pyramid. A synchronous external trigger circuit for a binocular camera and light source was designed. Finally, the improved algorithms in various aspects of visual measurement in this paper are applied to the measurement system. The experimental results show that the online measurement system has the advantages of high measurement accuracy and small repeatability errors.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"472 - 480"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.3103/S0146411624700548
Lisha Yao
Facial features extracted by deep convolutional networks are susceptible to background, individual identity, and other factors, and such irrelevant features interfere with facial expression recognition. Because features at different scales carry rich semantic and texture information respectively, this paper takes VGG-16 as the base network and combines multiscale features to obtain richer feature information. In addition, an attention module enhances or suppresses elements of the input feature map so that salient features are extracted more accurately. The proposed method was validated on two commonly used expression datasets, CK+ and RAF-DB, with recognition rates of 98.77% and 82.83%, respectively. The experimental results show the superiority of the method.
{"title":"Facial Expression Recognition Based on Multiscale Features and Attention Mechanism","authors":"Lisha Yao","doi":"10.3103/S0146411624700548","DOIUrl":"10.3103/S0146411624700548","url":null,"abstract":"<p>Facial features extracted from deep convolutional networks are susceptible to background, individual identity and other factors. It interferes with facial expression recognition when mixed with useless features. Considering that different scale features have rich semantic and texture information respectively, this paper takes VGG-16 as the basic network structure and combines multiscale features to obtain richer feature information. In addition, the input feature map elements are enhanced or suppressed by the attention module in order to extract salient features more accurately. The proposed method was validated on two commonly used expression data sets CK+ and RAF-DB, and the recognition rates were 98.77 and 82.83%, respectively. Experimental results show the superiority of this method.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"429 - 440"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.3103/S0146411624700561
Eugene Zhmakin, Grach Mkrtchian
This paper addresses the problem of building a keyword spotting (KWS) system with real-world audio data. It describes the methods used to build KWS systems, including deep learning models such as convolutional neural networks (CNNs) and transformers, and discusses Google Speech Commands, the mainstream dataset for training and testing KWS models. We conduct experiments on the Google Speech Commands dataset and propose a method of creating a KWS dataset that helps neural networks achieve better results when trained on relatively small amounts of data. We also introduce the idea of a hybrid KWS inference architecture that uses voice detection and a lightweight speech recognition framework in an attempt to improve computational performance and accuracy. We conclude that KWS is an important challenge in speech recognition and suggest that our method can improve the performance of KWS systems when little training data is available. Future research could focus on improving the model evaluation process and the overall performance of KWS systems.
{"title":"Building a Production-Ready Keyword Detection System on a Real-World Audio","authors":"Eugene Zhmakin, Grach Mkrtchian","doi":"10.3103/S0146411624700561","DOIUrl":"10.3103/S0146411624700561","url":null,"abstract":"<p>This paper deals with the problem of creating a keyword spotting (KWS) system with real-world audio data. The paper describes the different methods used to build KWS systems, deep learning models such as convolutional neural networks (CNN), transformers, etc. The paper also discusses the mainstream dataset for training and testing KWS models, Google Speech Commands. We conduct experiments on Google Speech Commands dataset and propose our method of creating a KWS dataset and that helps neural networks achieve better results in training on relatively small amounts of data. We also introduce an idea of a hybrid KWS inference system architecture that uses voice detection and light-weight speech recognition framework in attempt to boost its computational performance and accuracy. We conclude by noting that KWS is an important challenge in the field of speech recognition, and suggest that their method can be used to improve the performance of KWS systems in the circumstances of low amounts of training data. 
We also note that future research could focus on bettering the process of evaluating the models and improving the overall performance of KWS systems.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"454 - 458"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.3103/S0146411624700512
Wang Hao, Li Hui, Song Duanzheng, Zhu Jintao
In recent years, the proliferation of Internet of Things (IoT) applications has generated huge amounts of data, putting enormous pressure on infrastructure such as the network cloud. In response, scholars have proposed a "cloud-fog" computing architecture, in which one obstacle for fog computing is how to allocate computing resources so as to minimize network resource usage. A heuristic TDCC (time, distance, cost, and computing power) algorithm is proposed to optimize task scheduling in this heterogeneous, genetic algorithm-based "cloud-fog" computing system, taking into account execution time, operational cost, distance, and total computing power. The algorithm uses evolutionary genetic algorithms to combine the advantages of cloud computing and fog computing and to balance latency, cost, link length, and computing power. In hybrid computing task scheduling, the algorithm achieves a better balance than the TCaS algorithm, which considers only a single metric, and its fitness value is better than that of the traditional MPSO algorithm by 2.61%, the BLA algorithm by 6.92%, and the RR algorithm by 33.39%. The algorithm is also flexible enough to match user requirements for high-performance distance, cost, and computing power trade-offs, improving the effectiveness of the system.
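A genetic algorithm for this kind of scheduling needs a fitness function that folds the four criteria into one score. The sketch below is one plausible form, assuming a chromosome that maps each task to a node and a weighted sum of makespan, distance, cost, and engaged computing power. The node fields, the weights, and the function name are hypothetical; the paper's actual TDCC formulation is not given in the abstract.

```python
def schedule_fitness(assignment, task_lengths, nodes,
                     weights=(0.4, 0.2, 0.3, 0.1)):
    """Score a schedule; higher is better.

    assignment[i] is the index of the node that runs task i. Each node is a
    dict with 'speed', 'cost_per_s', and 'distance' (all assumed fields).
    """
    busy = [0.0] * len(nodes)
    cost = 0.0
    distance = 0.0
    for task, node_idx in enumerate(assignment):
        node = nodes[node_idx]
        run_time = task_lengths[task] / node["speed"]
        busy[node_idx] += run_time
        cost += run_time * node["cost_per_s"]
        distance += node["distance"]
    makespan = max(busy)  # completion time of the busiest node
    power = sum(nodes[i]["speed"] for i in set(assignment))  # capacity engaged
    w_time, w_dist, w_cost, w_power = weights
    penalty = (w_time * makespan + w_dist * distance
               + w_cost * cost + w_power * power)
    return 1.0 / (1.0 + penalty)


# One slow, cheap, nearby fog node and one fast, costly, distant cloud node.
nodes = [{"speed": 10.0, "cost_per_s": 1.0, "distance": 1.0},
         {"speed": 100.0, "cost_per_s": 5.0, "distance": 10.0}]
print(schedule_fitness([0, 0], [100.0, 100.0], nodes))
```

In practice each term would be normalized to a common scale before weighting, since raw seconds, distance units, and currency are not directly comparable.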
{"title":"A Research on Genetic Algorithm-Based Task Scheduling in Cloud-Fog Computing Systems","authors":"Wang Hao, Li Hui, Song Duanzheng, Zhu Jintao","doi":"10.3103/S0146411624700512","DOIUrl":"10.3103/S0146411624700512","url":null,"abstract":"<p>In recent years, the proliferating of IoT (Internet of things)-originated applications have generated huge amounts of data, which has put enormous pressure on infrastructures such as the network cloud. In this regard, scholars have proposed an architectural model for “cloud-fog” computing, where one of the obstacles to fog computing is how to allocate computing resources to minimize network resources. A heuristic-based TDCC (Time, distance, cost and computing-power) algorithm is proposed to optimize the task scheduling problem in this heterogeneous system for genetic algorithm-based “cloud-fog” computing, including execution time, operational cost, distance and total computing power resources. The algorithm uses evolutionary genetic algorithms as a research tool to combine the advantages of cloud computing, fog computing and genetic algorithms to achieve a balance between latency, cost, link length and computing power. In the hybrid computing task scheduling, this algorithm has a better balance than TCaS algorithm which only considers a single metric; this algorithm has a better adaptation value than traditional MPSO algorithm by 2.61%, BLA algorithm by 6.92% and RR algorithm by 33.39%, respectively. 
The algorithm is also flexible enough to match the user’s needs for high performance distance-cost-computing power, enhancing the effectiveness of the system.</p>","PeriodicalId":46238,"journal":{"name":"AUTOMATIC CONTROL AND COMPUTER SCIENCES","volume":"58 4","pages":"392 - 407"},"PeriodicalIF":0.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}