Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840798
Ugrasen Singh, M. Bhatnagar
This paper introduces a reconfigurable intelligent surface (RIS)-aided wireless communication network in which the RIS simultaneously transmits its own information and reflects an impinging radio frequency signal. The RIS explicitly embeds its information bits in the discrete phase shifts of its reflecting elements, which are selected from the reflection phase modulation (RPM) constellation, while the access point (AP) uses a pulse amplitude modulation (PAM) constellation to convey its information bits. The RIS and the AP thus independently transmit their data to the receiver using RPM and PAM symbols. Joint decoding of the RPM and PAM constellation symbols is performed using a maximum likelihood (ML) detector, and a tight upper bound on the average bit error rate (ABER) is presented. A unified analytical framework for the average pairwise error probability over double Rayleigh fading channels is derived, from which the ABER expression follows. Numerical results indicate that the proposed scheme attains high data rates with remarkably lower error rates in the very low SNR regime.
{"title":"An Information Transmission Scheme for RIS-Aided Wireless Communication Network","authors":"Ugrasen Singh, M. Bhatnagar","doi":"10.1109/SPCOM55316.2022.9840798","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840798","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
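The joint ML detection mentioned in the abstract above can be illustrated with a toy model. The sketch below is not the paper's system model: it assumes a single effective channel coefficient `h`, one common RIS phase per symbol, a 1-bit RPM set {0, π/2}, and a 4-PAM set, all chosen here for illustration so that every (phase, symbol) pair maps to a distinct received point.

```python
import cmath
import itertools
import math

# Hypothetical constellations chosen for illustration only.
RPM_PHASES = [0.0, math.pi / 2]        # assumed 1-bit reflection phase set
PAM_SYMBOLS = [-3.0, -1.0, 1.0, 3.0]   # assumed 4-PAM amplitude set


def ml_detect(y, h):
    """Return the (phase, symbol) pair minimizing |y - h*exp(j*phase)*symbol|^2.

    Exhaustive search over all pairs, which is what joint ML detection
    amounts to when the noise is Gaussian and the channel h is known.
    """
    return min(
        itertools.product(RPM_PHASES, PAM_SYMBOLS),
        key=lambda pair: abs(y - h * cmath.exp(1j * pair[0]) * pair[1]) ** 2,
    )


h = 0.8 - 0.3j                          # assumed known effective channel
sent = (math.pi / 2, -1.0)              # RIS phase and AP symbol actually sent
noise = 0.05 + 0.02j                    # small fixed noise sample
y = h * cmath.exp(1j * sent[0]) * sent[1] + noise
detected = ml_detect(y, h)
```

With noise this small relative to the spacing of the candidate points, `detected` equals `sent`. The search cost grows with the product of the two constellation sizes, which is the usual price of joint detection.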
Deep neural network (DNN) models have gained popularity for most image classification problems. However, DNNs also have numerous vulnerabilities. These can be exploited by an adversary to execute a successful adversarial attack, that is, an algorithm that generates perturbed inputs capable of fooling a well-trained DNN. Among the existing adversarial attacks, DeepFool, a white-box untargeted attack, is considered one of the most reliable algorithms for computing adversarial perturbations. However, in some scenarios such as person recognition, an adversary might want to carry out a targeted attack such that the input is misclassified into a specific target class. Moreover, studies show that defending against a targeted attack is harder than defending against an untargeted one. Hence, generating a targeted adversarial example is desirable from an attacker’s perspective. In this paper, we propose ‘Targeted DeepFool’, which is based on computing the minimal perturbation required to reach the target hyperplane. The proposed algorithm produces a minimal amount of distortion on conventional image datasets: MNIST and CIFAR10. Further, Targeted DeepFool shows excellent performance in terms of adversarial success rate.
{"title":"Generating Targeted Adversarial Attacks and Assessing their Effectiveness in Fooling Deep Neural Networks","authors":"Shivangi Gajjar, Avik Hati, Shruti Bhilare, Srimanta Mandal","doi":"10.1109/SPCOM55316.2022.9840784","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840784","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
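The idea of a minimal perturbation that reaches a target hyperplane has a closed form when the classifier is affine, which is the case DeepFool-style methods linearize around. The sketch below is only that special case, not the authors' algorithm: the weights `W`, biases `b`, the `overshoot` factor, and all function names are assumptions made for illustration. For a deep network, the same step would be applied repeatedly to a local linearization.

```python
import math


def scores(W, b, x):
    """Affine class scores f_k(x) = w_k . x + b_k."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + bk for w, bk in zip(W, b)]


def predict(W, b, x):
    """Index of the highest-scoring class."""
    f = scores(W, b, x)
    return max(range(len(f)), key=f.__getitem__)


def targeted_perturbation(W, b, x, target, overshoot=0.02):
    """Smallest L2 step from x onto the boundary f_target = f_current.

    The step is the orthogonal projection onto that hyperplane, scaled by
    (1 + overshoot) so that x lands just past the boundary.
    Assumes W[target] differs from the current class's weight vector.
    """
    f = scores(W, b, x)
    k = predict(W, b, x)
    if k == target:
        return [0.0] * len(x)
    w_diff = [wt - wk for wt, wk in zip(W[target], W[k])]
    norm_sq = sum(d * d for d in w_diff)
    scale = (1.0 + overshoot) * (f[k] - f[target]) / norm_sq
    return [scale * d for d in w_diff]


# Toy 3-class, 2-feature classifier (made-up numbers).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 0.5]                              # classified as class 0
r = targeted_perturbation(W, b, x, target=1)
x_adv = [xi + ri for xi, ri in zip(x, r)]   # now classified as class 1
```

Crossing the boundary between the current class and the target does not by itself guarantee that the target becomes the top class when more than two classes are present; in this toy example it does, but in general the step would have to be iterated and checked.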
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840846
Manjeer Majumder, Amrita Mishra, A. Jagannatham
This paper proposes an optimal training sequence framework for general block transmission systems over spatially correlated multiple-input multiple-output (MIMO) frequency selective channels. The pilot design is based on a formulation that minimizes the Bayesian Cramér-Rao bound (BCRB) for the mean squared error (MSE) of channel estimation, and is thus MSE optimal in nature. The novelty of the proposed work lies in the development of a generic pilot design scheme applicable to all four MIMO block transmission systems, namely single carrier cyclic prefix (SC-CP), single carrier zero padded (SC-ZP), multi-carrier zero padded (MC-ZP), and multi-carrier cyclic prefix (MC-CP) systems. Simulation results are presented to illustrate the superior performance of the proposed technique over conventional pilot sequences, in terms of both MSE and bit error rate (BER).
{"title":"Optimal Training Design for Channel Estimation in MIMO Single/Multi Carrier Block Transmission Systems","authors":"Manjeer Majumder, Amrita Mishra, A. Jagannatham","doi":"10.1109/SPCOM55316.2022.9840846","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840846","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840773
Vinod S. Khandkar, M. Hanawal
The Internet is a common platform for sharing information. It is necessary to preserve every user’s privacy and the security of information on the Internet. While data security is primarily taken care of by the TLS protocol and the broader adoption of the HTTPS, FTPS, and SMTPS protocols, some fields of TLS expose the type of activity a user is performing, thus violating user privacy. One such piece of protocol information is the Server Name Indication (SNI) in the TLS ClientHello message, which is sent in plaintext. Anyone intercepting the message can thus identify the service host type. We present a method named Extended TLS (ETLS) to mask the server host identity by encrypting the SNI without requiring any change in the existing protocols. In ETLS, a connection is established over two handshakes: the first handshake establishes a secure channel without sharing SNI information, and the second handshake shares the encrypted SNI. ETLS requires no modification to the already proven TLS encryption mechanism and retains all security benefits of the existing secure channel establishment. We demonstrate the feasibility of ETLS over the live Internet with scripts that implement our methodology. Using a customized client-server and a commercial traffic shaper, we also demonstrate that the host identity is not exposed under ETLS, thus demonstrating its privacy-preserving property.
{"title":"Extended TLS: Masking Server Host Identity on the Internet Using Encrypted TLS Handshake","authors":"Vinod S. Khandkar, M. Hanawal","doi":"10.1109/SPCOM55316.2022.9840773","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840773","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840816
Y. Vasavada, Bibin Baby John
This paper shows that the information transmitted over the spatial dimension in Multiple Input Multiple Output systems with Spatial Modulation (MIMO-SM) is sensitive to the transmission of conventional amplitude and phase modulated (APM) symbols with small magnitude (e.g., from the inner ring of the constellation). This sensitivity is a limiting factor in the performance of quadrature amplitude modulated (QAM) SM. We propose three novel MIMO-SM constellation designs to mitigate this performance limitation: (i) a hybrid PSK-QAM MIMO-SM that leverages constant modulus phase shift keying (PSK) in conjunction with QAM to minimize the sum of the antenna errors and the APM symbol errors, which is achieved by increasing the number of transmit antennas, which need not be an integer power of two; (ii) a MIMO-SM with a novel APM QAM constellation that has an optimized radius of the inner constellation ring; and (iii) a MIMO-SM that transmits the QAM symbols from the inner ring on different orthogonal resources (e.g., subcarriers) to reduce their impact on the antenna errors. The simulation results demonstrate the performance benefit of the proposed approaches compared to conventional SM.
{"title":"Constellation Designs for the Spatial Modulation MIMO Systems","authors":"Y. Vasavada, Bibin Baby John","doi":"10.1109/SPCOM55316.2022.9840816","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840816","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
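The sensitivity described in the abstract above can be seen from a short calculation on conventional SM, sketched below. This illustrates only the baseline scheme, not any of the three proposed designs. The bit-to-antenna mapping (natural binary, power-of-two sizes), the channel columns, and the function names are assumptions made for illustration. When the same symbol s is sent from antenna i or antenna j, the noiseless received vectors differ by s(h_i - h_j), so the distance the detector relies on to tell the antennas apart scales with |s|.

```python
import math


def sm_map(bits, num_antennas, constellation):
    """Conventional SM mapping: leading bits pick the antenna, the rest the symbol.

    Assumes num_antennas and len(constellation) are powers of two and
    natural binary indexing. Returns (antenna index, transmit vector).
    """
    n_ant_bits = num_antennas.bit_length() - 1
    n_sym_bits = len(constellation).bit_length() - 1
    if len(bits) != n_ant_bits + n_sym_bits:
        raise ValueError("wrong number of bits for this configuration")
    antenna = int("".join(str(bit) for bit in bits[:n_ant_bits]), 2)
    symbol = constellation[int("".join(str(bit) for bit in bits[n_ant_bits:]), 2)]
    x = [0j] * num_antennas
    x[antenna] = symbol          # only one antenna is active per channel use
    return antenna, x


def antenna_pair_distance(h_i, h_j, symbol):
    """Euclidean distance between h_i*symbol and h_j*symbol at the receiver."""
    return abs(symbol) * math.sqrt(sum(abs(a - c) ** 2 for a, c in zip(h_i, h_j)))


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
antenna, x = sm_map([1, 0, 1, 1], num_antennas=4, constellation=qpsk)

# Made-up channel columns for two transmit antennas and two receive antennas.
h_i = [1 + 0j, 0.5j]
h_j = [0.2 - 0.1j, -0.3 + 0j]
inner = antenna_pair_distance(h_i, h_j, 1 + 1j)   # 16-QAM inner-ring point
outer = antenna_pair_distance(h_i, h_j, 3 + 3j)   # 16-QAM corner point
ratio = outer / inner
```

For these two 16-QAM points the ratio is 3: an inner-ring symbol leaves the antenna decision with one third of the distance available to a corner symbol, regardless of the channel realization.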
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840850
Mrinmoy Bhattacharjee, S. Prasanna, P. Guha
The separation of foreground and background sounds can serve as a useful preprocessing step when dealing with real-world audio signals. This work proposes a foreground-background audio separation (FBAS) algorithm that uses spectral peak information for generating time-frequency masks. The proposed algorithm can work without training, is relatively fast, and provides decent audio separation. As a specific use case, the proposed algorithm is used to extract clean foreground signals from noisy speech signals. The quality of foreground speech separated with FBAS is compared with the output of a state-of-the-art deep-learning-based speech enhancement system. Various subjective and objective evaluation measures are computed, which indicate that the proposed FBAS algorithm is effective.
{"title":"Foreground-Background Audio Separation using Spectral Peaks based Time-Frequency Masks","authors":"Mrinmoy Bhattacharjee, S. Prasanna, P. Guha","doi":"10.1109/SPCOM55316.2022.9840850","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840850","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840804
Shrikant Sharma, A. Girish, Darin Jeff, Garweet Sresth, Sanket Bhalerao, V. Gadre, C. Rao, P. Radhakrishna
The complete characterization of a target by radar involves estimation of its range and its Doppler and micro-Doppler frequencies. Finite Rate of Innovation (FRI) approaches allow for sampling at sub-Nyquist rates. Empirical Mode Decomposition, which recursively decomposes a signal into different modes of unknown spectral bands, has performance limitations such as sensitivity to noise and to the sampling rate. These limitations are partially addressed by several variant algorithms; one of them is Variational Mode Decomposition (VMD), an entirely non-recursive model that extracts the modes concurrently. In this paper, we propose an approach that uses an FRI-based technique to estimate the delay of the target, and a VMD-based approach for Doppler and micro-Doppler parameter estimation. A novel mathematical analysis is proposed to identify the initialization parameters for faster convergence of the VMD algorithm. Further, we provide simulation results to show that the proposed approach is capable of estimating the parameters of multiple targets even in the presence of noise.
{"title":"Micro-Doppler Parameter Estimation Using Variational Mode Decomposition With Finite Rate of Innovation","authors":"Shrikant Sharma, A. Girish, Darin Jeff, Garweet Sresth, Sanket Bhalerao, V. Gadre, C. Rao, P. Radhakrishna","doi":"10.1109/SPCOM55316.2022.9840804","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840804","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
Pub Date : 2022-07-11 DOI: 10.1109/spcom55316.2022.9840800
{"title":"SPCOM 2022 Cover Page","authors":"","doi":"10.1109/spcom55316.2022.9840800","DOIUrl":"https://doi.org/10.1109/spcom55316.2022.9840800","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840811
Vartika Sengar, S. VivekB., Gaurab Bhattacharya, J. Gubbi, Arpan Pal, P. Balamuralidhar
Identification of bias and its mitigation in a classifier is a fundamental sanity check required in trustworthy AI systems. Many methods for bias mitigation in the literature use the bias as a priori information. In this work, we propose a system that can detect low-level bias (e.g., color, texture) and mitigate it. A novel auto-encoder architecture that explains the predictions made by a deep neural network is built to help identify the bias. The auto-encoder is trained to produce a generalized representation of the input image by decomposing it into a set of latent embeddings. These embeddings are learned by specializing groups of higher dimensional feature maps to learn disentangled color and shape concepts. The shape embeddings are trained to reconstruct the discrete wavelet transform components of an image, and the color embeddings are trained to capture the color information. The feature specialization is done by reconstructing the RGB image using the shape embeddings modulated by the color embeddings. We show that these representations can be used to detect low-level bias in a classification task. After bias detection, we also propose a method to de-bias the classifier by training it with counterfactual images generated by manipulating the representations learned by the auto-encoder. We show that our proposed method of bias discovery and mitigation achieves state-of-the-art results on ColorMNIST and the newly proposed BiasedShape dataset.
{"title":"Low-level Bias discovery and Mitigation for Image Classification","authors":"Vartika Sengar, S. VivekB., Gaurab Bhattacharya, J. Gubbi, Arpan Pal, P. Balamuralidhar","doi":"10.1109/SPCOM55316.2022.9840811","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840811","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840830
V. R. Lakkavalli
In this paper, the classical paradigm of analysis by synthesis (AbS) for automatic speech recognition (ASR) is revisited to enhance the performance of ASR. Although the AbS paradigm holds promise for explaining the process of perception as proposed in Motor Theory, many challenges remain to be addressed before a practical ASR system based on it can be realized. In this paper, (i) a general architecture for ASR using AbS is presented; and (ii) a new AbS-trellis is proposed, which realizes the AbS loop by combining a transition (coarticulation) cost and a classification cost to search for the best decoding path. Initial results on the TIMIT database show that substitution errors may be reduced by employing AbS. This shows promise for using AbS in ASR, and the results further highlight the need to identify an invariant phonetic representation space, a better distance metric (or coarticulation modelling), and a better synthesizer.
{"title":"AbS for ASR: A New Computational Perspective","authors":"V. R. Lakkavalli","doi":"10.1109/SPCOM55316.2022.9840830","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840830","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11"}
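Searching a trellis for the path that minimizes a sum of transition and classification costs is commonly done with dynamic programming. The sketch below is a generic Viterbi-style minimum-cost search, offered only to make that combination of costs concrete; it is not the paper's AbS-trellis, and the cost tables, state count, and function name are made up for illustration.

```python
def best_path(class_cost, trans_cost):
    """Minimum-cost state sequence through a trellis.

    class_cost[t][s]: cost of assigning state s to frame t (classification).
    trans_cost[p][s]: cost of moving from state p to state s (transition).
    Returns (path, total_cost).
    """
    n_states = len(class_cost[0])
    cost = list(class_cost[0])   # best cost of any path ending in each state
    back = []                    # back-pointers, one list per later frame
    for frame in class_cost[1:]:
        new_cost, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            p = min(range(n_states), key=lambda q: cost[q] + trans_cost[q][s])
            pointers.append(p)
            new_cost.append(cost[p] + trans_cost[p][s] + frame[s])
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state.
    state = min(range(n_states), key=cost.__getitem__)
    total = cost[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, total


# Two states, three frames; frame 1 slightly prefers state 1 on its own.
class_cost = [[0.1, 1.0], [0.6, 0.5], [0.2, 1.0]]
trans_cost = [[0.0, 1.0], [1.0, 0.0]]   # switching states costs 1.0
path, total = best_path(class_cost, trans_cost)
```

A frame-by-frame decision would label the middle frame as state 1, but with the switching penalty included the cheapest path stays in state 0 throughout, which is the kind of correction a transition cost is meant to provide.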