Pub Date : 2023-10-01 DOI: 10.17586/2226-1494-2023-23-5-946-954
A.B. Menisov, A.G. Lomako, T.R. Sabirov
The interpretability of Natural Language Processing (NLP) models remains unsatisfactory because the scientific and methodological apparatus for describing how individual elements, and models as a whole, function is still imperfect. One consequence of poor interpretability is the low reliability of neural networks that process natural-language text: small perturbations in text data are known to affect their stability. The paper presents a method for testing NLP models against the threat of evasion attacks. The method generates text adversarial examples in two ways: random text modification and a modification generation network. Random text modification uses homoglyphs, text rearrangement, insertion of invisible characters and random character removal. The modification generation network is based on a generative adversarial neural network architecture. The experiments demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The developed method has two advantages: first, it can generate more natural and diverse adversarial examples that are subject to fewer restrictions; second, it does not require multiple requests to the model under test. This may make it applicable in more complex test scenarios where interaction with the model is limited. The experiments, which included tests of the GigaChat and YaGPT models, showed that the developed method achieved a relatively better balance between the effectiveness and stealth of text adversarial examples. The results show the need to test NLP models for defects and vulnerabilities that attackers can exploit to degrade their performance, and indicate considerable potential for ensuring the reliability of machine learning models. A promising direction for further work is restoring the level of security (confidentiality, availability and integrity) of NLP models.
Title : Method for testing NLP models with text adversarial examples
Journal : Scientific and Technical Journal of Information Technologies, Mechanics and Optics
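The random text modification step named in the abstract (homoglyphs, rearranging text, invisible characters, random character removal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the homoglyph table, the choice of a zero-width space as the invisible character, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical homoglyph table: Latin letters mapped to look-alike Cyrillic letters.
HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435",
    "o": "\u043e", "p": "\u0440", "x": "\u0445",
}
ZERO_WIDTH_SPACE = "\u200b"  # assumed choice of "invisible character"


def swap_homoglyph(text, rng):
    """Replace one randomly chosen character that has a known look-alike."""
    positions = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + HOMOGLYPHS[text[i]] + text[i + 1:]


def swap_adjacent(text, rng):
    """Rearrange the text by swapping two neighbouring characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def insert_invisible(text, rng):
    """Insert a zero-width space at a random position."""
    i = rng.randrange(len(text) + 1)
    return text[:i] + ZERO_WIDTH_SPACE + text[i:]


def delete_char(text, rng):
    """Remove one randomly chosen character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]


PERTURBATIONS = [swap_homoglyph, swap_adjacent, insert_invisible, delete_char]


def perturb(text, n_edits=1, seed=None):
    """Apply n_edits randomly chosen perturbations to text."""
    rng = random.Random(seed)
    for _ in range(n_edits):
        text = rng.choice(PERTURBATIONS)(text, rng)
    return text
```

Calling `perturb("transfer the funds", n_edits=2, seed=7)` returns a string that looks almost unchanged to a reader but differs at the character level, which is the kind of input a model under test would be probed with.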
Pub Date : 2023-10-01 DOI: 10.17586/2226-1494-2023-23-5-955-966
S.A. Shaker, A.S. Arif, Y. Fazea
Motion estimation plays a crucial role in video coding, and the Adaptive Rood Pattern Search (ARPS) algorithm is a well-known fast motion estimation algorithm. However, ARPS has certain limitations: it lacks an accurate starting motion vector, its fixed Zero Motion Prejudgment (ZMP) threshold is unsuitable for fast-motion video sequences, and its repeated use of the Unit Rood Pattern (URP) increases computational complexity. To address these issues, this paper proposes a novel algorithm called Efficient Adaptive Rood Pattern Search (EARPS). EARPS overcomes these limitations by employing the Full Search algorithm to obtain optimal motion vectors for the first column in each frame, adopting a dynamic ZMP threshold that adapts to varying motion speeds in video sequences, and using the URP only once to reduce computational overhead. The performance of the proposed EARPS algorithm is evaluated and compared with that of ARPS on video sequences with different motion speeds. The number of search points and the Peak Signal-to-Noise Ratio (PSNR) are used to quantify computational complexity and accuracy, respectively. The experimental findings show that EARPS has lower computational complexity than ARPS while retaining a reasonable PSNR. The main contribution of the proposed EARPS algorithm is high speed with reasonable accuracy regardless of the motion speed in the video frames. EARPS thus offers a more efficient motion estimation method than ARPS, with broader applicability in video processing.
Title : A new efficient adaptive rood pattern search motion estimation algorithm
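To make the terms in the abstract concrete, the following Python sketch shows a simplified ARPS-style search for one block: a zero-motion prejudgment test, one adaptive rood step sized from a predicted motion vector, and a unit rood pattern repeated until the centre is the best point. It illustrates the baseline search that EARPS modifies, not the EARPS algorithm itself. The function names, the sum-of-absolute-differences cost, and the frame representation (lists of rows) are assumptions, and the block at `(bx, by)` is assumed to lie inside both frames.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def rood_search(cur, ref, bx, by, n, predicted=(0, 0), zmp_threshold=0):
    """Return (motion_vector, number_of_search_points) for one block."""
    height, width = len(ref), len(ref[0])
    costs = {}  # displacement -> SAD; its size is the search-point count

    def cost(mv):
        dx, dy = mv
        if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
            return None  # candidate block falls outside the reference frame
        if mv not in costs:
            costs[mv] = sad(cur, ref, bx, by, dx, dy, n)
        return costs[mv]

    best, best_cost = (0, 0), cost((0, 0))

    # Zero-motion prejudgment: stop at once if the block is (nearly) static.
    if best_cost <= zmp_threshold:
        return best, len(costs)

    # Adaptive rood step: arm length taken from the predicted motion vector.
    arm = max(abs(predicted[0]), abs(predicted[1]))
    first_step = [(arm, 0), (-arm, 0), (0, arm), (0, -arm), tuple(predicted)]
    for mv in first_step:
        c = cost(mv)
        if c is not None and c < best_cost:
            best, best_cost = mv, c

    # Unit rood pattern, repeated until no neighbour improves on the centre.
    while True:
        cx, cy = best
        moved = False
        for mv in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            c = cost(mv)
            if c is not None and c < best_cost:
                best, best_cost, moved = mv, c, True
        if not moved:
            return best, len(costs)
```

The returned search-point count corresponds to the complexity measure mentioned in the abstract. In this sketch a good `predicted` vector lowers the count, and a fixed `zmp_threshold` decides when the search is skipped altogether, which is why the abstract singles out both as targets for improvement.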
Pub Date : 2023-10-01 DOI: 10.17586/2226-1494-2023-23-5-1065-1072
V.E. Fomin, A.V. Novotelnova, G.A. Bolkunov, F.Yu. Bochkanov, D.Yu. Karpenkov
Solid-state synthesis in reaction crucibles is used in the search for new magnetically ordered phases of materials. The outcome of the synthesis depends, in particular, on technological factors: the mode of current flow and the current density, the temperature reached in the reaction zone, the exposure time, the geometry of the crucible and the reaction zone, and so on. The paper investigates how the degree to which the reaction volume is filled with tin melt affects heat and mass transfer during electrothermal treatment. A model is proposed that describes diffusion processes in the reaction zone during the synthesis of iron-tin intermetallic compounds under electrothermal treatment. The diffusion process in reaction crucibles of the iron-tin system was investigated by the finite element method in the COMSOL Multiphysics software environment. It is shown that a lower degree of filling of the reaction crucible with synthesis components changes the current density distribution and lowers the temperature in the reaction zone, which affects the mass transfer processes. The results can be used to analyze experimental data on the production of intermetallic compounds by reaction synthesis and to determine the technological parameters required for the synthesis of new materials.
Title : Study of heat and mass transfer processes in the Fe-Sn reaction crucible in the presence of high-density electric current
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-757-766
L. Tagirova, T. Zubkova
Title : Intelligent adaptive testing system
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-776-785
A. Tiple, A. Kakade
Title : Brain tumour segmentation in MRI using fuzzy deformable fusion model with Dolphin-SCA
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-676-684
K. Matveeva, A. Kundalevich, A. Kapitunova, A. Zozulya, S. Sukhikh, A. Tsibulnikova, A. Zyubin, I. Samusev
Title : Application of Raman spectroscopy to study the inactivation process of bacterial microorganisms
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-820-827
Dinara E. Kurmanova, Nurbolat Zh. Jaichibekov, L. Gumilyov
Heating of oil and oil products is widely used to reduce energy losses during transportation. The flow in the annular space of the heat exchanger is complex and depends on many factors. The use of thin tubes in helicoid-type heat exchangers makes it necessary to take into account the transition of the flow regime from laminar to turbulent. The semi-empirical turbulence models traditionally used in numerical calculations do not take into account the laminar-turbulent transition.
Title : Modeling and simulation of heat exchanger with strong dependence of oil viscosity on temperature
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-803-811
M. A. Tit, Sergey N. Belyaev, Alexandr G. Shcherbak, O. Yulmetova
Improving the manufacturing technology of gyroscopic devices, which autonomously generate the motion parameters of moving objects, is of strategic importance and a priority for various industries. The object of this research is the spherical rotor of an electrostatically suspended gyroscope, whose geometric parameters determine the accuracy characteristics of the device. The paper presents the results of modeling the correction of the spherical form of such rotors at the manufacturing stage, during the coating deposition process. The proposed mathematical model of the deposition process is based on placing a movable screen with a hole between the rotor and the spray source. The axis of the hole lies on the dynamic axis of the rotor, so that a spherical segment is formed on the coated rotor surface. During deposition of an additional layer, the screen or the rotor moves along the dynamic axis of the rotor, changing the distance between the rotor and the screen, while the rotor additionally rotates around its dynamic axis. This makes it possible to adjust the curvature of the coating formed on the rotor surface. An analytical model of the technological process for correcting the shape of spherical rotors of electrostatically suspended gyroscopes has been developed. A mathematical description, control factors and significant parameters of the process are given, and the results of practical testing of the model are presented. The model makes it possible to correct the shape of the rotors during deposition of a functional coating, which expands the technological possibilities and increases the accuracy of the rotors.
Title : Modeling of the process of spherical form correction for rotors of electrostatically suspended gyros
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-850-853
A. Bobtsov, Nikolay A. Nikolaev, O. Kozachek, O. Oskina
The problem of estimating unknown constant parameters of a nonlinear time-varying system with delayed measurements is considered. The objective of this work is to design an adaptive observer for such a system. The observer must provide asymptotic convergence of the estimates of the unknown constant parameters to their true values. The main idea of the method is to parametrize the initial dynamical system using the GPEBO (Generalized Parameter Estimation Based Observer) technique and to build a linear regression model. The unknown parameters of the linear regression model are identified by the least-squares method with a forgetting factor. This work extends a previously published approach to the case of nonlinear time-varying systems with delayed measurements. The new parameter estimation algorithm can be applied to engineering tasks such as technical condition monitoring and the design of automatic control systems.
Title : Adaptive observer for state variables of a time-varying nonlinear system with unknown constant parameters and delayed measurements
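The identification step mentioned in the abstract, least squares with a forgetting factor applied to a linear regression model, can be sketched in Python as below. The sketch covers only the textbook recursive least-squares update for a regression y = φᵀθ. It does not reproduce the GPEBO parametrization or the handling of delayed measurements, and the regressor signals, true parameter values and forgetting factor in `demo` are assumptions chosen for illustration.

```python
import math


def rls_update(theta, P, phi, y, lam=0.98):
    """One recursive least-squares step with forgetting factor `lam`.

    theta : current parameter estimate (list of n floats)
    P     : current n x n covariance-like matrix (list of rows, symmetric)
    phi   : regressor vector for this sample
    y     : measured output, modelled as y = phi^T theta
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P_new = [
        [(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
        for i in range(n)
    ]
    return theta_new, P_new


def demo(n_steps=200):
    """Identify two constant parameters from noise-free synthetic data."""
    true_theta = [2.0, -1.0]  # assumed values, for illustration only
    theta = [0.0, 0.0]
    P = [[1000.0, 0.0], [0.0, 1000.0]]
    for k in range(n_steps):
        phi = [math.sin(0.1 * k), math.cos(0.3 * k)]
        y = true_theta[0] * phi[0] + true_theta[1] * phi[1]
        theta, P = rls_update(theta, P, phi, y)
    return theta
```

With a forgetting factor below 1, older samples are discounted exponentially, so the estimate can track slow changes at the price of higher sensitivity to noise; convergence also relies on the regressor being sufficiently exciting, which the synthetic sinusoids in `demo` provide.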
Pub Date : 2023-08-01 DOI: 10.17586/2226-1494-2023-23-4-767-775
Alexandr A. Axyonov, Elena V. Ryumina, Dmitry A. Ryumin, Denis V. Ivanko, Alexey Karpov
Visual speech recognition, or automated lip-reading, is actively applied to speech-to-text translation. Video data is useful in multimodal speech recognition systems, particularly when acoustic data is difficult to use or unavailable. The main purpose of this study is to improve driver command recognition by analyzing visual information, thereby reducing touch interaction with vehicle systems (multimedia and navigation systems, phone calls, etc.) while driving. We propose a method for automated lip-reading of the driver's speech while driving, based on a deep neural network with the 3DResNet18 architecture. Using a neural network architecture with a bi-directional LSTM model and an attention mechanism yields higher recognition accuracy with a slight decrease in performance. Two variants of neural network architecture for visual speech recognition are proposed and investigated. The first architecture recognized the driver's voice commands with an accuracy of 77.68 %, which is 5.78 percentage points lower than the 83.46 % achieved by the second. System performance, measured by the real-time factor (RTF), is 0.076 for the first architecture and 0.183 for the second, which is more than two times higher. The proposed method was tested on the multimodal corpus RUSAVIC, recorded in a car. The results can be used in audio-visual speech recognition systems, which are recommended in high-noise conditions such as driving. In addition, the analysis makes it possible to choose the optimal visual speech recognition model for subsequent incorporation into an assistive system based on a mobile device.
Title : Neural network-based method for visual recognition of driver's voice commands using attention mechanism
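The figures quoted in the abstract can be checked with a few lines of Python. The sketch assumes the usual definition of the real-time factor (processing time divided by the duration of the processed recording); the abstract itself does not spell the definition out, and the function names are illustrative.

```python
def real_time_factor(processing_seconds, media_seconds):
    """RTF under the usual definition: values below 1 mean faster than real time."""
    return processing_seconds / media_seconds


def percentage_point_gap(accuracy_a, accuracy_b):
    """Absolute difference between two accuracies given in percent."""
    return abs(accuracy_a - accuracy_b)


# Values reported in the abstract.
rtf_first, rtf_second = 0.076, 0.183
acc_first, acc_second = 77.68, 83.46

rtf_ratio = rtf_second / rtf_first                      # about 2.41
accuracy_gap = percentage_point_gap(acc_first, acc_second)  # about 5.78
```

Both architectures have an RTF well below 1, so under this definition either would run faster than real time; the trade-off is roughly 2.4 times more processing per second of video in exchange for 5.78 percentage points of accuracy.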