An Audio-Visual System for Sound Noise Reduction Based on Deep Neural Networks
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729351
Seyedeh Sogand Hashemi, M. Asadi, M. Aghabozorgi
Audio noise has no unique definition, but in general it includes background and environmental sounds such as object movements and animal sounds. These sounds distract listeners and lead to loss of the main content. Noise reduction is the process of removing such unwanted sounds and extracting the clear, noise-free sound of an audio source. Existing methods for this problem face challenges such as residual noise, slow performance, and ambiguity in separation. In this paper, an automated system is proposed to eliminate the noise signal from the noisy audio of audio-visual data. The system utilizes audio and visual features of the main sound source (musical instruments) to feed its two internal DNN-based models: a) an object detection model and b) a sound separation model. First, an object detection model designed by transfer learning is used to identify the sound source in video frames. Then, based on the detected source, a source-specific sound separation model is applied to the noisy signal to extract the noise-free audio signal. Audio and visual features play a complementary role in the noise reduction process, and their positive effect is evident in the obtained results. The experimental results indicate that, under noisy conditions and especially in real-time applications, the proposed noise reduction scheme improves the quality of the extracted noise-free sound in comparison with other algorithms.
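As a rough illustration of the two-stage pipeline described above, the following PyTorch sketch wires a hypothetical instrument detector to a per-class separation model that masks the noisy magnitude spectrogram; both modules are placeholders for demonstration, not the authors' trained networks.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two DNN stages: an object detector that
# predicts the instrument class from video frames, and a per-class separation
# network that predicts a time-frequency mask for the clean source.
class InstrumentDetector(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes))

    def forward(self, frames):          # frames: (B, 3, H, W)
        return self.net(frames)

class SeparationModel(nn.Module):
    def __init__(self, n_freq=257):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_freq, 256), nn.ReLU(),
                                 nn.Linear(256, n_freq), nn.Sigmoid())

    def forward(self, mag_spec):        # mag_spec: (B, T, F)
        return self.net(mag_spec)       # mask values in [0, 1]

def denoise(frames, noisy_mag, detector, separators):
    """Two-stage pipeline: detect the source, then apply its separator's mask."""
    cls = detector(frames).argmax(dim=-1)[0].item()    # dominant instrument class
    mask = separators[cls](noisy_mag)                  # class-specific T-F mask
    return mask * noisy_mag                            # estimated clean magnitude

detector = InstrumentDetector()
separators = {c: SeparationModel() for c in range(10)}
clean = denoise(torch.rand(1, 3, 224, 224), torch.rand(1, 100, 257),
                detector, separators)
print(clean.shape)
```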
{"title":"An Audio-Visual System for Sound Noise Reduction Based on Deep Neural Networks","authors":"Seyedeh Sogand Hashemi, M. Asadi, M. Aghabozorgi","doi":"10.1109/ICSPIS54653.2021.9729351","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729351","url":null,"abstract":"Audio noise has no unique definition, but in general, it includes background and environmental sounds such as objects movements, animal sounds, and etc. These sounds distract listeners and lead to loss of main content. Noise reduction is a process for removing such these unwanted sounds and extracts clear noise-free sound of an audio source. All proposed methods for this problem deal with some challenges such as residual noise, low speed performance, ambiguity in separation. In this paper an automated system is proposed to eliminate noise signal from noisy audio of an audio-visual data. This system utilizes audio and visual features of main sound source (musical instruments) to feed its two internal DNN based models: a) object detection and b) sound separation model. First, an object detection model which is designed by transfer learning method is used to identify sound source in video frames. Then based on detected source, a specific sound separation model is applied to noisy signal and extracts the noise-free audio signal. Audio and visual features play a complementary role in noise reduction process and its positive effect is obvious in obtained results. The experimental results indicate that under the noisy environment, especially in real-time applications, the proposed noise reduction scheme improves the quality of the extracted noise-free sound in comparison with other algorithms.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127838824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Group Signature Based Federated Learning in Smart Grids
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729381
Sneha Kanchan, Ajit Kumar, A. Saqib, B. Choi
Smart grids are the backbone of today's energy distribution systems, maintaining systematic communication between suppliers and consumers. These grids often need to report their findings on customer demand and availability to a Human Machine Interface (HMI) server. However, external entities might compromise the HMI server and misuse the smart grids' personal information. Hence, the grids should not reveal their own or their customers' identities to the server. Federated Learning (FL) can address this situation, since data from various smart grids can be collected without disclosing each grid's identity. We propose a group signature-based federated learning scheme in which grid components sign with the group signature instead of their personal signatures. We have verified the security of our algorithm with the Automated Validation of Internet Security Protocols and Applications (AVISPA) simulator.
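To make the protocol flow concrete, the sketch below models one federated round in which members sign their updates under a group identity only, so the aggregator can check membership without learning who signed. The GroupSignature and GroupManager classes are illustrative stubs, not a real cryptographic group signature scheme, and the names are assumptions rather than the paper's construction.

```python
from dataclasses import dataclass
from typing import List

# Illustrative stand-in for a group signature scheme: the manager issues member
# keys, members sign under the group identity, and the verifier learns only
# that some valid member signed, not which one. A real deployment would use an
# actual group signature construction; this stub only models the API shape.
@dataclass
class GroupSignature:
    group_id: str
    payload_digest: int

class GroupManager:
    def __init__(self, group_id: str):
        self.group_id = group_id
        self.members = set()

    def issue_key(self, member_id: str) -> str:
        self.members.add(member_id)
        return f"{self.group_id}:{member_id}:key"      # placeholder member key

def sign(member_key: str, update: List[float], group_id: str) -> GroupSignature:
    # Members sign the model update with the *group* identity only.
    return GroupSignature(group_id, hash(tuple(update)))

def verify(sig: GroupSignature, update: List[float], group_id: str) -> bool:
    return sig.group_id == group_id and sig.payload_digest == hash(tuple(update))

def aggregate(updates: List[List[float]]) -> List[float]:
    # Plain federated averaging over the anonymously signed updates.
    return [sum(col) / len(col) for col in zip(*updates)]

manager = GroupManager("grid-A")
keys = [manager.issue_key(f"meter-{i}") for i in range(3)]
updates = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
signed = [(sign(k, u, "grid-A"), u) for k, u in zip(keys, updates)]
accepted = [u for s, u in signed if verify(s, u, "grid-A")]
print(aggregate(accepted))
```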
{"title":"Group Signature Based Federated Learning in Smart Grids","authors":"Sneha Kanchan, Ajit Kumar, A. Saqib, B. Choi","doi":"10.1109/ICSPIS54653.2021.9729381","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729381","url":null,"abstract":"Smart Grids are the need of today's energy distribution system, which maintains a systematic communication between suppliers and consumers. Often these grids need to communicate to the Human Machine Interface (HMI) server regarding their findings of the customer needs and availability. However, some external entities might compromise the HMI server, which tends to misuse smart grids' personal information. Hence, the grids should not reveal their or their customer's identity to the server. Federated Learning (FL) can solve this situation where the data from various smart grids can be collected without disclosing the grid's identity. We have proposed a group signature-based federated signature-based in which grid components sign with the group signature instead of their personal signatures. We have verified the security of our algorithm with the Automated Validation of Internet Security Protocols and Applications (AVISPA) simulator.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130811864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WhisperNet: Deep Siamese Network For Emotion and Speech Tempo Invariant Visual-Only Lip-Based Biometric
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729394
Abdollah Zakeri, H. Hassanpour
In the recent decade, the field of biometrics has been revolutionized by the rise of deep learning. Many improvements to older biometric methods have reduced security concerns. Before biometric person verification methods such as facial recognition, an imposter could access a person's vital information simply by obtaining their password, for example via a key-logger installed on their system. Thanks to deep learning, safer biometric approaches to person verification and person re-identification, such as visual and audio-visual authentication, have become possible and applicable on many devices such as smartphones and laptops. Unfortunately, facial recognition is considered by some to be a threat to personal privacy. Additionally, biometric methods that use the audio modality are not always applicable, for instance because of audio noise in the environment. Lip-based biometric authentication (LBBA) is the process of authenticating a person using a video of their lip movements while talking. To address the above concerns about other biometric authentication methods, a visual-only LBBA method can be used. Since people may be in different emotional states that affect their utterance and speech tempo, a visual-only LBBA method must produce an emotion- and speech-tempo-invariant embedding of the input utterance video. In this article, we propose a network inspired by the Siamese architecture that learns to produce emotion- and speech-tempo-invariant representations of input utterance videos. We trained and tested the proposed network on the CREMA-D dataset and achieved 95.41% accuracy on the validation set.
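A minimal Siamese setup in this spirit, with an assumed 3-D convolutional lip encoder and a standard contrastive loss (not the paper's exact WhisperNet configuration), might look like the following PyTorch sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal Siamese sketch: a shared encoder maps a lip-movement clip (T frames)
# to an embedding, and a contrastive loss pulls same-speaker pairs together and
# pushes different speakers apart, regardless of emotion or speech tempo.
class LipEncoder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.fc = nn.Linear(16, embed_dim)

    def forward(self, clip):            # clip: (B, 1, T, H, W)
        return F.normalize(self.fc(self.conv(clip)), dim=-1)

def contrastive_loss(z1, z2, same_speaker, margin=1.0):
    # same_speaker: 1.0 for genuine pairs, 0.0 for impostor pairs.
    d = F.pairwise_distance(z1, z2)
    return (same_speaker * d.pow(2) +
            (1 - same_speaker) * F.relu(margin - d).pow(2)).mean()

encoder = LipEncoder()
clip_a, clip_b = torch.rand(4, 1, 16, 64, 64), torch.rand(4, 1, 16, 64, 64)
label = torch.tensor([1., 0., 1., 0.])
loss = contrastive_loss(encoder(clip_a), encoder(clip_b), label)
loss.backward()
print(float(loss))
```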
{"title":"WhisperNet: Deep Siamese Network For Emotion and Speech Tempo Invariant Visual-Only Lip-Based Biometric","authors":"Abdollah Zakeri, H. Hassanpour","doi":"10.1109/ICSPIS54653.2021.9729394","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729394","url":null,"abstract":"In the recent decade, the field of biometrics was revolutionized thanks to the rise of deep learning. Many improvements were done on old biometric methods which reduced the security concerns. Before biometric people verification methods like facial recognition, an imposter could access people's vital information simply by finding out their password via installing a key-logger on their system. Thanks to deep learning, safer biometric approaches to person verification and person re-identification like visual authentication and audio-visual authentication were made possible and applicable on many devices like smartphones and laptops. Unfortunately, facial recognition is considered to be a threat to personal privacy by some people. Additionally, biometric methods that use the audio modality are not always applicable due to reasons like audio noise present in the environment. Lip-based biometric authentication (LBBA) is the process of authenticating a person using a video of their lips' movement while talking. In order to solve the mentioned concerns about other biometric authentication methods, we can use a visual-only LBBA method. Since people might have different emotional states that could potentially affect their utterance and speech tempo, the audio-only LBBA method must be able to produce an emotional and speech tempo invariant embedding of the input utterance video. In this article, we proposed a network inspired by the Siamese architecture that learned to produce emotion and speech tempo invariant representations of the input utterance videos. In order to train and test our proposed network, we used the CREMA-D dataset and achieved 95.41 % accuracy on the validation set.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130595483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intelligent Filtering of Graph Shells in the Problem of Influence Maximization Based on the Independent Cascade Model
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729376
F. Kazemzadeh, Amir Karian, A. Safaei, M. Mirzarezaee
In social networks, the influence maximization problem seeks to find individuals or nodes in different communities that can diffuse influence among a wide range of other nodes. Existing algorithms for the influence maximization problem have several drawbacks: the computational overhead is very high, and the seed nodes are not selected optimally, so the influence does not spread fully through the social network. To address this problem, this paper presents the SFIM algorithm, which uses the idea of layering community nodes and identifying valuable layers to limit the search space. The computation continues only on the nodes of the valuable layers, which significantly reduces the algorithm's runtime. Then, the best set of influential nodes is found with high accuracy by considering the main topological centrality criteria, such as harmonic and degree centrality. Accurate node selection is one of the most important requirements of the problem and is well addressed by the proposed method. Experiments on different datasets indicate that this algorithm provides better efficiency in solving the problem than competing algorithms.
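A simplified seed-selection sketch in the same spirit, using NetworkX k-shell indices as the "layers" and a harmonic-plus-degree centrality ranking (the exact SFIM scoring and layer-selection rules may differ), is shown below.

```python
import networkx as nx

# Illustrative shell-filtering sketch: keep only nodes in the upper k-shells
# (the "valuable layers"), then rank the survivors by a combined harmonic and
# degree centrality score and return the top-k as seed nodes.
def select_seeds(G, k_seeds=5, shell_quantile=0.5):
    core = nx.core_number(G)                       # k-shell index of every node
    threshold = sorted(core.values())[int(len(core) * shell_quantile)]
    candidates = [n for n, c in core.items() if c >= threshold]
    sub = G.subgraph(candidates)
    harmonic = nx.harmonic_centrality(sub)
    degree = nx.degree_centrality(sub)
    # Scale degree centrality roughly to the harmonic range before combining.
    score = {n: harmonic[n] + len(sub) * degree[n] for n in sub}
    return sorted(score, key=score.get, reverse=True)[:k_seeds]

G = nx.karate_club_graph()
print(select_seeds(G, k_seeds=3))
```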
{"title":"Intelligent Filtering of Graph Shells in the Problem of Influence Maximization Based on the Independent Cascade Model","authors":"F. Kazemzadeh, Amir Karian, A. Safaei, M. Mirzarezaee","doi":"10.1109/ICSPIS54653.2021.9729376","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729376","url":null,"abstract":"In social networks, the problem of influence maximization seeks for a solution to find individuals or nodes in different communities so that they can diffuse information influence among a wide range of other nodes. The proposed algorithms for influence maximization problem have many drawbacks. For example, the computational overhead is very high and also the seed nodes is not selected optimally. For this reason, the influence does not spread totally in the social network.for solving the problem, This paper provides the SFIM algorithm and uses the idea of layering community nodes and identifying valuable layers to limit the search space. The operation is continued only on nodes of valuable layers, which significantly reduces the algorithm's runtime. Then, the best set of influential nodes with the highest accuracy is found by considering the main criteria of centrality topology such as harmonic and degree. Accuracy in selecting a node is one of the most important needs of the problem that is best answered. Moreover, different experiments and datasets indicate that this algorithm can provide the best efficiency required to solve the problem compared to other algorithms.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"67 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124247341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ascertainment of Appropriate GRC Structure for Two Area Thermal System under Seagull Optimization based 2DOF-PID Controller
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729332
C. S. Kalyan, Khamruddin Syed, B. S. Goud, Ch. Rami Reddy, Hossein Shahinzadeh, G. Gharehpetian
This paper assesses the performance of two simulation models of the generation rate constraint (GRC) for a two-area thermal system in order to achieve optimal load frequency control (LFC). The GRC models investigated in this work, referred to as open-loop and closed-loop GRC, are widely used by researchers without a specific analysis of their selection and suitability. This paper facilitates the selection of the most appropriate and effective GRC structure based on a dynamical analysis of the thermal system for optimal LFC. The two-area thermal system is examined with the different GRC models in MATLAB/Simulink under the supervision of a two-degree-of-freedom PID (2DOF-PID) controller optimized with the seagull optimization algorithm (SOA). Simulation results identify the most suitable GRC model for the thermal system to obtain optimal LFC.
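For readers unfamiliar with the two building blocks, the following Python sketch shows a generation rate constraint implemented as a simple rate limiter together with a discrete 2DOF-PID control law; the gains, set-point weights, and toy first-order plant are illustrative assumptions, not the paper's tuned Simulink models.

```python
import numpy as np

# GRC as a rate limiter plus a discrete 2DOF-PID:
#   u = Kp*(b*r - y) + Ki*integral(r - y) + Kd*d/dt(c*r - y)
def rate_limit(u_prev, u_cmd, rate, dt):
    """Clamp the change in generation output to +/- rate per second (GRC)."""
    return u_prev + np.clip(u_cmd - u_prev, -rate * dt, rate * dt)

class TwoDofPid:
    def __init__(self, kp, ki, kd, b=0.8, c=0.3, dt=0.01):
        self.kp, self.ki, self.kd, self.b, self.c, self.dt = kp, ki, kd, b, c, dt
        self.integ, self.prev_e_d = 0.0, 0.0

    def step(self, r, y):
        e_i = r - y                      # integral acts on the raw error
        e_d = self.c * r - y             # derivative uses its own set-point weight
        self.integ += e_i * self.dt
        d = (e_d - self.prev_e_d) / self.dt
        self.prev_e_d = e_d
        return self.kp * (self.b * r - y) + self.ki * self.integ + self.kd * d

pid = TwoDofPid(kp=2.0, ki=1.0, kd=0.1)
u, y = 0.0, 0.0
for _ in range(500):                      # crude first-order plant behind the GRC
    u = rate_limit(u, pid.step(1.0, y), rate=0.1, dt=0.01)
    y += (u - y) * 0.01 / 0.5             # plant time constant 0.5 s
print(round(y, 3))
```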
{"title":"Ascertainment of Appropriate GRC Structure for Two Area Thermal System under Seagull Optimization based 2DOF-PID Controller","authors":"C. S. Kalyan, Khamruddin Syed, B. S. Goud, Ch. Rami Reddy, Hossein Shahinzadeh, G. Gharehpetian","doi":"10.1109/ICSPIS54653.2021.9729332","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729332","url":null,"abstract":"This paper attempted to assess the performances of two simulation models of generation rate constraint (GRC) for two area thermal systems to achieve optimal load frequency control (LFC). The dynamical models of GRC investigated in this work are coined as open loop and closed loop GRC which are extensively utilized by the researchers without providing specific analysis for their selection and suitability. This paper facilitates the selection of appropriate and most effective GRC structures based on dynamical analysis about the thermal system to obtain LFC optimally. Two area thermal system has been examined with different GRC models in the platform of MATLAB/SIMULINK under supervision of two degrees of freedom (DOF)-PID (2DOF-PID) controller optimized with seagull optimization algorithm (SOA). Simulation results demonstrate the most suitable GRC model for the thermal system to obtain optimal LFC.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134083260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Web Content Extraction by Weighing the Fundamental Contextual Rules
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729342
Mahdi Mohammadi, M. Shayegan, Nima Latifi
Nowadays, data access, sharing, extraction, and usage have become vital issues for technology experts. With the rapid growth of content on the Web, humans need new and up-to-date approaches for extracting data from it. However, web pages contain much useless and unrelated information such as navigation panels, tables of contents, advertisements, service catalogues, and menus. Thus, web content can be divided into useful (original) and useless (secondary) content, and most readers and end users search for the useful part. This research presents a new approach to extracting useful content from the Web. For this purpose, child nodes are selected as the original content by applying a weighting method based on fundamental contextual rules to the nodes of the DOM tree. Overall, after standardizing the web page and building the DOM tree, the best child node of each parent node is selected according to a weighting algorithm; then, the best path and the best sample node are selected. Applied to several datasets, the presented solution shows high accuracy, with precision, recall, and F-measure of 0.992, 0.983, and 0.988, respectively.
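A minimal node-weighting sketch of this style of extraction, scoring candidate DOM subtrees by text length penalized by link density (the weights and rules here are assumptions, not the paper's fundamental contextual rules), could look like this:

```python
from bs4 import BeautifulSoup

# Each candidate block is scored by the amount of plain text it carries,
# penalised by its link density; the best-scoring subtree is returned as the
# "useful" (original) content, while link-heavy navigation blocks score low.
def text_len(tag):
    return len(tag.get_text(" ", strip=True))

def score(tag):
    total = text_len(tag)
    link_text = sum(text_len(a) for a in tag.find_all("a"))
    link_density = link_text / total if total else 1.0
    return total * (1.0 - link_density)

def extract_main_content(html):
    soup = BeautifulSoup(html, "html.parser")
    candidates = soup.find_all(["div", "article", "section", "td"])
    best = max(candidates, key=score, default=soup)
    return best.get_text(" ", strip=True)

html = """<html><body>
  <div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
  <div id="main"><p>This is the original article text that a reader
  actually wants, with very few links.</p></div>
</body></html>"""
print(extract_main_content(html))
```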
{"title":"Web Content Extraction by Weighing the Fundamental Contextual Rules","authors":"Mahdi Mohammadi, M. Shayegan, Nima Latifi","doi":"10.1109/ICSPIS54653.2021.9729342","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729342","url":null,"abstract":"Nowadays, data access, data sharing, data extraction and data usage have become a vital issue for technology experts. With the rapid growth of content on the Web, humans need new and up-to-date approaches for data extraction from the Web. However, there is much useless and unrelated information such as navigation panel, content table, propaganda, service catalogue, and menus in these pages. Thus, the web content is considered useful (original) and useless (secondary) content. Most receivers and final users search for useful content. This research presents a new approach to extract useful content from the Web. For this purpose, child nodes are selected as the original content by weighing the fundamental contextual rules method to DOM Tree's nodes. Overall, after standardizing web page and developing DOM Tree, the best child node of the parent node are selected according to a weighing algorithm; then, the best path and the best sample node are selected. The presented solution applied on several datasets shows high accuracy rate such as Precision, Recall and F factor are 0.992, 0.983 and 0.988, respectively.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130433588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Designing the Communication Infrastructures for Democratizing the Coverage Time of Connected Vehicles
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729383
Somayeh Mokhtari, C. M. Silva, J. Nogueira
Vehicular communication can be greatly improved by providing network access points distributed along the road network. Such access points, commonly referred to as roadside units, provide a backbone that integrates the whole vehicular network. However, to avoid wasting resources and to maximize network efficiency, the locations where these units are installed require special attention in the design of such networks. In this work, we propose two novel strategies, Partial Time Information (PTI) and Maximum Coverage Time (MCT), for deploying a predefined number of roadside units so as to maximize the number of distinct vehicles crossing covered areas during a given time threshold. Instead of relying on the full trajectories of vehicles, which may raise privacy issues, the PTI strategy uses duplicate coverage time ratios between urban regions to infer the best locations for deploying the roadside units, while MCT operates in the absence of mobility information. As a baseline, we consider the FPF strategy, which adopts a Markovian approach to the flow of vehicles. Simulation results based on a traffic dataset of Cologne, Germany demonstrate that the proposed approaches increase the vehicle-to-infrastructure connection time in comparison to FPF.
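As a generic point of comparison, the sketch below shows a greedy maximum-coverage placement: given the set of distinct vehicles each candidate region would cover within the time threshold, it repeatedly picks the region that adds the most uncovered vehicles. It is a baseline illustration only, not the PTI or MCT strategies themselves.

```python
# Greedy maximum-coverage placement of roadside units over candidate regions.
def place_rsus(coverage, budget):
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(coverage, key=lambda r: len(coverage[r] - covered),
                   default=None)
        if best is None or not coverage[best] - covered:
            break                               # nothing new left to cover
        chosen.append(best)
        covered |= coverage.pop(best)
    return chosen, covered

coverage = {                                    # region -> vehicle ids seen there
    "R1": {1, 2, 3, 4}, "R2": {3, 4, 5}, "R3": {6, 7}, "R4": {1, 6},
}
chosen, covered = place_rsus(coverage, budget=2)
print(chosen, len(covered))                     # e.g. ['R1', 'R3'] 6
```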
{"title":"Designing the Communication Infrastructures for Democratizing the Coverage Time of Connected Vehicles","authors":"Somayeh Mokhtari, C. M. Silva, J. Nogueira","doi":"10.1109/ICSPIS54653.2021.9729383","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729383","url":null,"abstract":"Vehicular communication can be highly improved by providing network access points distributed along with the road network. Such access points for vehicles are commonly referred to as roadside units and provide a backbone integrating the whole vehicular network. However, to avoid wasting resources and maximizing network efficiency, the locations where these units are installed need special attention in such networks' design. In this work, we propose two novel strategies (Partial Time Information (PTI) and Maximum Coverage Time (MCT)) to deploy a predefined number of roadside units seeking to maximize the number of distinct vehicles crossing covered areas during a given time threshold. Instead of relying on the full trajectory of vehicles, which may incur privacy issues, the PTI strategy utilizes duplicate coverage time ratios between urban regions to infer the best locations for deploying the roadside units, while the MCT operates in the lack of mobility information. As a baseline, we consider the FPF strategy, which regards a Markovian approach for the flow of vehicles. Simulation results based on the data traffic set of Cologne, Germany demonstrate that the proposed approaches increase the vehicle to infrastructure connection time in comparison to FPF.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134517684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Successive Wavenumber Filtering Approach for Defect Detection in CFRP using Wavefield Scanning
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729336
Erfan Basiri, Reza P. R. Hasanzadeh, M. Kersemans
Owing to the high sensitivity of carbon fiber reinforced polymer (CFRP) to internal damage, defect detection through non-destructive testing (NDT) is an essential task. A common NDT approach for this purpose is measuring and analyzing full-field guided wave propagation in CFRP plates. Scattered waves corresponding to deep defects are usually obscured by other waves because of their weak amplitude. A successful method to highlight these waves is wavenumber filtering (WF). However, WF assumes that the optimal frequency range of the excitation signal is known beforehand, which is not always the case. Another drawback is that when more than one guided wave mode exists, WF cannot sufficiently highlight the desired waves or vibrations. In this paper, full wavefield images are first constructed by exciting the guided waves with a broadband chirp signal and recording them with scanning laser Doppler vibrometry (SLDV). Then, a successive wavenumber filtering (SWF) approach is introduced, which efficiently removes undesirable higher-order guided wave modes and eliminates the need to know the optimal excitation frequency a priori. Moreover, it is shown quantitatively and qualitatively that the proposed approach leads to better discrimination between damaged and healthy areas than conventional WF.
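The core operation can be illustrated with a NumPy sketch that transforms a wavefield snapshot to the wavenumber domain, applies a band-pass ring mask, and sweeps the band; the band edges and the random stand-in wavefield are assumptions for demonstration only.

```python
import numpy as np

# A 2-D spatial snapshot of the wavefield is transformed to the wavenumber
# domain, a band-pass ring mask is applied, and the result is transformed back.
# Sweeping the band ("successive" filtering) suppresses dominant low-wavenumber
# modes so weak, defect-related scatter stands out.
def wavenumber_bandpass(field, dx, k_lo, k_hi):
    F = np.fft.fft2(field)
    kx = np.fft.fftfreq(field.shape[0], d=dx) * 2 * np.pi
    ky = np.fft.fftfreq(field.shape[1], d=dx) * 2 * np.pi
    K = np.sqrt(kx[:, None] ** 2 + ky[None, :] ** 2)   # wavenumber magnitude
    mask = (K >= k_lo) & (K <= k_hi)
    return np.real(np.fft.ifft2(F * mask))

def successive_filter(field, dx, bands):
    """Apply a sequence of band-pass filters and accumulate their energy."""
    return sum(wavenumber_bandpass(field, dx, lo, hi) ** 2 for lo, hi in bands)

rng = np.random.default_rng(0)
field = rng.standard_normal((128, 128))                # stand-in wavefield frame
energy = successive_filter(field, dx=1e-3, bands=[(500, 800), (800, 1200)])
print(energy.shape)
```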
{"title":"A Successive Wavenumber Filtering Approach for Defect Detection in CFRP using Wavefield Scanning","authors":"Erfan Basiri, Reza P. R. Hasanzadeh, M. Kersemans","doi":"10.1109/ICSPIS54653.2021.9729336","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729336","url":null,"abstract":"Owing to the high sensitivity of carbon fiber reinforced polymer (CFRP) to internal damages, defect detection through Non-destructive testing (NDT) is deemed an essential task. One of the common methods in NDT to achieve this aim is measuring and analyzing the full-field guided waves propagation in CFRP plates. Scattered waves corresponding to deep defects are usually obscured by other waves due to their weak amplitude. A successful method to highlight these waves is to use wavenumber filtering (WF). However, WF suffers from the assumption that the optimal frequency range of excitation signal is known beforehand, which is not always available. Another drawback is that when more than one type of guided waves mode exist, this method is not capable of highlighting desirable waves or vibrations sufficiently. In this paper, full wavefield images are first constituted by exciting the guided waves via broadband chirp signal and registering them with scanning laser Doppler vibrometery (SLDV). Then, a successive wavenumber filtering (SWF) approach is introduced, which efficiently removes undesirable higher order guided wave modes, and removes the need to know a priori the optimal excitation frequency. Moreover, it is quantitatively and qualitatively shown that the proposed approach could lead to better discrimination between damaged and healthy area than conventional WF.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128049697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid Deep Learning Method Based on LSTM-Autoencoder Network for Household Short-term Load Forecasting
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729378
Arghavan Irankhah, Sahar Rezazadeh, M. Moghaddam, Sara Ershadi-Nasab
Energy prediction is an essential task in smart homes for demand-side management and energy consumption reduction. Therefore, an intelligent forecasting model is necessary for predicting demand-side energy in residential buildings. Recent studies have shown that deep learning networks outperform traditional machine learning methods in short-term load forecasting. In this paper, a new hybrid network is proposed that consists of an LSTM autoencoder layer, a Bi-LSTM layer, a stack of LSTM layers, and finally a fully connected layer. Experiments conducted on an individual household electric power consumption dataset demonstrate that the proposed network achieves the lowest root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) in comparison with other state-of-the-art approaches.
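A rough PyTorch rendering of such a hybrid stack (layer sizes and the exact wiring are assumptions; the paper's configuration may differ) is given below.

```python
import torch
import torch.nn as nn

# Hybrid sketch: LSTM autoencoder stage, then a Bi-LSTM, a stacked LSTM, and a
# fully connected head that forecasts the next load value(s).
class HybridLoadForecaster(nn.Module):
    def __init__(self, n_features=1, hidden=64, horizon=1):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)   # AE path
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.stack = nn.LSTM(2 * hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                       # x: (B, T, n_features)
        z, _ = self.encoder(x)
        z, _ = self.decoder(z)
        z, _ = self.bilstm(z)
        z, _ = self.stack(z)
        return self.head(z[:, -1])              # forecast from the last time step

model = HybridLoadForecaster()
past_load = torch.rand(8, 24, 1)                # 8 sequences of 24 hourly readings
print(model(past_load).shape)                   # -> torch.Size([8, 1])
```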
{"title":"Hybrid Deep Learning Method Based on LSTM-Autoencoder Network for Household Short-term Load Forecasting","authors":"Arghavan Irankhah, Sahar Rezazadeh, M. Moghaddam, Sara Ershadi-Nasab","doi":"10.1109/ICSPIS54653.2021.9729378","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729378","url":null,"abstract":"Energy prediction is an essential task in smart homes for demand-side management and energy consumption reduction. Therefore, an intelligent forecasting model is necessary for predicting demand-side energy in residential buildings. Recent studies have shown that deep learning networks have higher performance than traditional machine learning methods in short-term load forecasting. In this paper, a new hybrid network is proposed that consists of Auto-Encoder LSTM layer, Bi-LSTM layer, stack of LSTM layer, and finally Fully connected layer. The experiments are conducted on an individual household electric power consumption dataset and the results demonstrate that the proposed network has the smallest value in terms of root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) in comparison with other state-of-the-art approaches.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127273991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Countermeasure Based on CQT Spectrogram for Deepfake Speech Detection
Pub Date: 2021-12-29 | DOI: 10.1109/ICSPIS54653.2021.9729387
Pedram Abdzadeh Ziabary, H. Veisi
Nowadays, biometrics such as face, voice, fingerprint, and iris are widely used for identity authentication. Automatic Speaker Verification (ASV) systems aim to verify a speaker's authenticity, but recent research has shown that they are vulnerable to various types of attacks. A large number of Text-To-Speech (TTS) and Voice Conversion (VC) methods are being used to create so-called synthetic or deepfake speech. In recent years, numerous works have been proposed to improve spoofing detection and protect ASV systems against these attacks. This work proposes a synthetic speech detection system that uses the spectrogram of the Constant Q Transform (CQT) as its input features. The CQT spectrogram provides a constant Q factor across frequency regions, similar to the human perception system. Compared with the Short-Time Fourier Transform (STFT), the CQT also provides higher time resolution at higher frequencies and higher frequency resolution at lower frequencies. Additionally, the CQT spectrogram yields low-dimensional input features, which helps reduce the required computation time. Constant Q Cepstral Coefficient (CQCC) features, derived from cepstral analysis of the CQT, have been employed in some recent works for voice spoofing detection; however, to the best of our knowledge, ours is the first work to use the CQT magnitude and power spectrogram directly for voice spoofing detection. We also use a combination of a self-attended ResNet and one-class learning to make our model robust against unseen attacks. Finally, despite the relatively low-dimensional input features and reduced computation time, we obtain an EER of 3.53% and a min t-DCF of 0.10 on the ASVspoof 2019 Logical Access (LA) dataset, which places our model among the top performers in this field.
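A small feature-extraction sketch using librosa's CQT, producing the magnitude and log-power spectrograms that would feed such a detector (the bin counts and hop length here are illustrative, not necessarily the paper's settings):

```python
import numpy as np
import librosa

# Compute the CQT magnitude spectrogram and its log-power version for a short
# waveform; these arrays would serve as the detector's 2-D input features.
def cqt_features(y, sr, n_bins=84, bins_per_octave=12, hop_length=256):
    C = librosa.cqt(y, sr=sr, hop_length=hop_length,
                    n_bins=n_bins, bins_per_octave=bins_per_octave)
    magnitude = np.abs(C)                               # CQT magnitude spectrogram
    log_power = librosa.amplitude_to_db(magnitude, ref=np.max)
    return magnitude, log_power

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)   # stand-in utterance
mag, logp = cqt_features(y, sr)
print(mag.shape, logp.shape)                                # (n_bins, n_frames)
```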
{"title":"A Countermeasure Based on CQT Spectrogram for Deepfake Speech Detection","authors":"Pedram Abdzadeh Ziabary, H. Veisi","doi":"10.1109/ICSPIS54653.2021.9729387","DOIUrl":"https://doi.org/10.1109/ICSPIS54653.2021.9729387","url":null,"abstract":"Nowadays, biometrics like face, voice, fingerprint, and iris are widely used for the identity authentication of individuals. Automatic Speaker Verification (ASV) systems aim to verify the speaker's authenticity, but recent research has shown that they are vulnerable to various types of attacks. A large number of Text-To-Speech (TTS) and Voice Conversion (VC) methods are being used to create the so-called synthetic or deepfake speech. In recent years, numerous works have been proposed to improve the spoofing detection ability to protect ASV systems against these attacks. This work proposes a synthetic speech detection system, which uses the spectrogram of Constant Q Transform (CQT) as its input features. The CQT spectrogram provides a constant Q factor in different frequency regions similar to the human perception system. Also, compared with Short-Term Fourier Transform (STFT), CQT provides higher time resolution at higher frequencies and higher frequency resolution at lower frequencies. Additionally, the CQT spectrogram has brought us low input feature dimensions, which aids with reducing needed computation time. The Constant Q Cepstral Coefficients (CQCC) features, driven from cepstral analysis of the CQT, have been employed in some recent works for voice spoofing detection. However, to the best of our knowledge, ours is the first work using CQT magnitude and power spectrogram directly for voice spoofing detection. We also use a combination of self-attended ResNet and one class learning to provide our model the robustness against unseen attacks. Finally, it is observed that even though using input features with relatively lower dimensions and reducing computation time, we can still obtain EER 3.53% and min t-DCF 0.10 on ASVspoof 2019 Logical Access (LA) dataset, which places our model among the top performers in this field.","PeriodicalId":286966,"journal":{"name":"2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126366022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}