Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840783
Tamoghno Nath, K. Benerjee, Adrish Banerjee
In a Molecular-Communication-via-Diffusion (MCvD) channel, the molecules follow simple Brownian motion, which leads to irregular arrival of the molecules at the receiver and introduces Inter-Symbol-Interference (ISI) in the channel. In this work, we use different sequence distributions to analyze the effect of ISI in an MCvD channel. We show that the ISI depends strictly on the location of the bit-1s in the sequence and, accordingly, compute the expected ISI for all the proposed sequences based on their bit-1 positions. We also derive an upper bound on the expected ISI for the proposed sequences. The One-at-Starting-Position (OSP) sequence shows the best performance among all the proposed sequence distributions, with the expected ISI converging to a constant value. Simulation results corroborate that the OSP sequence provides the lowest ISI in an MCvD channel compared to other codes studied in the literature.
Title: On Effect of Different Sequence Distributions on ISI in an MCvD System
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
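The position dependence of ISI described in the MCvD abstract above can be illustrated with a small sketch. It is not the authors' model: it assumes the widely used closed-form hitting probability for a point transmitter and a fully absorbing spherical receiver in 3-D, and the distance `d`, receiver radius `r`, diffusion coefficient `D`, and slot length `T` are illustrative values chosen here, not taken from the paper.

```python
import math

def hit_probability(t, d=10e-6, r=5e-6, D=79.4e-12):
    """Fraction of released molecules absorbed by time t (assumed channel model)."""
    if t <= 0:
        return 0.0
    return (r / d) * math.erfc((d - r) / math.sqrt(4 * D * t))

def isi_at_last_slot(bits, T=0.2):
    """Expected fraction of stray molecules arriving in the final slot from earlier bit-1s."""
    n = len(bits) - 1  # index of the slot being observed
    total = 0.0
    for i, b in enumerate(bits[:n]):
        if b:
            k = n - i  # number of slots elapsed since this bit-1 was released
            total += hit_probability((k + 1) * T) - hit_probability(k * T)
    return total

# A bit-1 at the start of the sequence versus one just before the observed slot.
early = isi_at_last_slot([1, 0, 0, 0, 0, 0])
late = isi_at_last_slot([0, 0, 0, 0, 1, 0])
print(f"ISI from early bit-1: {early:.4f}, from late bit-1: {late:.4f}")
```

Under these assumptions, a bit-1 placed early contributes less interference to the last slot than one placed immediately before it, which is consistent with the abstract's statement that ISI depends on where the bit-1s sit.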
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840798
Ugrasen Singh, M. Bhatnagar
This paper introduces a reconfigurable intelligent surface (RIS)-aided wireless communication network in which an RIS simultaneously transmits its information and reflects an impinging radio frequency signal. The RIS explicitly embeds its information bits in the discrete phase shifts of its reflecting elements, which are selected from the reflection phase modulation (RPM) constellation. The access point (AP) uses the pulse amplitude modulation (PAM) constellation to convey its information bits. Both the RIS and the AP independently transmit their data to the receiver using RPM and PAM symbols. Joint decoding of the RPM and PAM constellation symbols is performed using a maximum likelihood (ML) detector, and a tight upper bound on the average bit error rate (ABER) is presented. A unified analytical framework for the average pairwise error probability over double Rayleigh fading channels is derived, followed by the ABER expression. Numerical results show that the proposed scheme attains high data rates with remarkably lower error rates in the very low SNR regime.
Title: An Information Transmission Scheme for RIS-Aided Wireless Communication Network
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
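The joint ML detection of RPM and PAM symbols mentioned in the RIS abstract above can be sketched as an exhaustive search over all (phase, amplitude) pairs. This is a deliberately simplified illustration, not the paper's system model: it assumes a single reflecting element, a known scalar cascaded channel `h`, and hypothetical constellations `RPM_PHASES` and `PAM_LEVELS` chosen so that no two pairs produce the same signal point.

```python
import cmath
import math

RPM_PHASES = [0.0, math.pi / 2]       # hypothetical 1-bit RIS phase set
PAM_LEVELS = [-3.0, -1.0, 1.0, 3.0]   # hypothetical 4-PAM amplitudes at the AP

def ml_detect(y, h):
    """Return the (phase, amplitude) pair minimising |y - h*exp(j*phase)*amplitude|^2."""
    best, best_metric = None, float("inf")
    for phase in RPM_PHASES:
        for amp in PAM_LEVELS:
            metric = abs(y - h * cmath.exp(1j * phase) * amp) ** 2
            if metric < best_metric:
                best, best_metric = (phase, amp), metric
    return best

h = 0.8 * cmath.exp(1j * 0.3)  # illustrative cascaded channel gain
# Transmit phase pi/2 with amplitude -1, then add a small fixed noise sample.
y = h * cmath.exp(1j * math.pi / 2) * (-1.0) + (0.05 - 0.02j)
detected = ml_detect(y, h)
print(detected)
```

The search cost grows with the product of the two constellation sizes, which is why a closed-form error bound such as the one in the paper is useful for analysis.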
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840846
Manjeer Majumder, Amrita Mishra, A. Jagannatham
This paper proposes an optimal training sequence framework for general block transmission systems over spatially correlated multiple-input multiple-output (MIMO) frequency selective channels. The pilot design is based on a formulation that minimizes the Bayesian Cramér-Rao bound (BCRB) for the mean squared error (MSE) of channel estimation, and is thus MSE optimal in nature. The novelty of the proposed work lies in the development of a generic pilot design scheme applicable to all four MIMO block transmission systems, namely single carrier cyclic prefix (SC-CP), single carrier zero padded (SC-ZP), multi-carrier zero padded (MC-ZP), and multi-carrier cyclic prefix (MC-CP) systems. Simulation results are presented to illustrate the superior performance of the proposed technique over conventional pilot sequences in terms of both MSE and bit error rate (BER).
Title: Optimal Training Design for Channel Estimation in MIMO Single/Multi Carrier Block Transmission Systems
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840773
Vinod S. Khandkar, M. Hanawal
The Internet is a common platform for sharing information, and every user’s privacy and the security of their information on it must be preserved. While data security is primarily taken care of by the TLS protocol and the broader adoption of the HTTPS, FTPS, and SMTPS protocols, some fields of TLS expose the type of activity a user is performing, thus violating user privacy. One such field is the Server Name Indication (SNI) in the TLS ClientHello message, which is sent in plaintext. Anyone intercepting the message can thus identify the service host type. We present a method named Extended TLS (ETLS) to mask the server host identity by encrypting the SNI without requiring any change in the existing protocols. In ETLS, a connection is established over two handshakes: the first handshake establishes a secure channel without sharing SNI information, and the second handshake shares the encrypted SNI. ETLS requires no modification to the already proven TLS encryption mechanism and retains all security benefits of the existing secure channel establishment. We demonstrate the feasibility of ETLS over the live Internet with scripts that implement our methodology. Using a customized client-server and a commercial traffic shaper, we also demonstrate that the host identity is not exposed under ETLS, confirming its privacy-preserving property.
Title: Extended TLS: Masking Server Host Identity on the Internet Using Encrypted TLS Handshake
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840850
Mrinmoy Bhattacharjee, S. Prasanna, P. Guha
The separation of foreground and background sounds can serve as a useful preprocessing step when dealing with real-world audio signals. This work proposes a foreground-background audio separation (FBAS) algorithm that uses spectral peak information for generating time-frequency masks. The proposed algorithm can work without training, is relatively fast, and provides decent audio separation. As a specific use case, the proposed algorithm is used to extract clean foreground signals from noisy speech signals. The quality of foreground speech separated with FBAS is compared with the output of a state-of-the-art deep-learning-based speech enhancement system. Various subjective and objective evaluation measures are computed, which indicate that the proposed FBAS algorithm is effective.
Title: Foreground-Background Audio Separation using Spectral Peaks based Time-Frequency Masks
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
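The idea of building time-frequency masks from spectral peaks, as named in the FBAS abstract above, can be sketched on a toy magnitude spectrogram. This is only an illustration of peak-based binary masking in general, not the FBAS algorithm itself: the peak-picking rule, the `num_peaks` and `width` parameters, and the helper names are all assumptions made here.

```python
def peak_mask(frame, num_peaks=2, width=1):
    """Binary mask keeping bins within `width` of the strongest local maxima of one frame."""
    peaks = [k for k in range(1, len(frame) - 1)
             if frame[k] > frame[k - 1] and frame[k] >= frame[k + 1]]
    strongest = sorted(peaks, key=lambda k: frame[k], reverse=True)[:num_peaks]
    mask = [0] * len(frame)
    for k in strongest:
        for j in range(max(0, k - width), min(len(frame), k + width + 1)):
            mask[j] = 1
    return mask

def separate(spectrogram, num_peaks=2, width=1):
    """Split each magnitude frame into masked (foreground) and residual (background) parts."""
    foreground, background = [], []
    for frame in spectrogram:
        mask = peak_mask(frame, num_peaks, width)
        foreground.append([x * m for x, m in zip(frame, mask)])
        background.append([x * (1 - m) for x, m in zip(frame, mask)])
    return foreground, background

# One toy frame with two clear peaks, at bins 2 and 6.
toy = [[0.1, 0.2, 5.0, 0.3, 0.1, 0.2, 3.0, 0.2, 0.1]]
fg, bg = separate(toy)
print(peak_mask(toy[0]))  # → [0, 1, 1, 1, 0, 1, 1, 1, 0]
```

Because the mask is computed directly from each frame, a scheme of this kind needs no training data, which matches the abstract's remark that the algorithm can work without training.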
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840804
Shrikant Sharma, A. Girish, Darin Jeff, Garweet Sresth, Sanket Bhalerao, V. Gadre, C. Rao, P. Radhakrishna
The complete characterization of a target by radar involves estimation of its range, Doppler, and micro-Doppler frequencies. Finite Rate of Innovation (FRI) approaches allow for sampling at sub-Nyquist rates. Empirical Mode Decomposition, which recursively decomposes a signal into different modes of unknown spectral bands, has performance limitations such as sensitivity to noise and sampling rates. These limitations are partially addressed by several variant algorithms; one of them is Variational Mode Decomposition (VMD), an entirely non-recursive model that extracts the modes concurrently. In this paper, we propose an approach that uses an FRI-based technique to estimate the delay of the target and a VMD-based approach for Doppler and micro-Doppler parameter estimation. A novel mathematical analysis is proposed to identify the initialization parameters for faster convergence of the VMD algorithm. Further, we provide simulation results to show that the proposed approach is capable of estimating the parameters of multiple targets even in the presence of noise.
Title: Micro-Doppler Parameter Estimation Using Variational Mode Decomposition With Finite Rate of Innovation
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840816
Y. Vasavada, Bibin Baby John
This paper shows that the information transmitted over the spatial dimension in Multiple Input Multiple Output systems with Spatial Modulation (MIMO-SM) is sensitive to the transmission of conventional amplitude and phase modulated (APM) symbols with small magnitude (e.g., from the inner ring of the constellation). This sensitivity is a limiting factor in the performance of quadrature amplitude modulated (QAM) SM. We propose three novel MIMO-SM constellation designs to mitigate the performance limitation: (i) a hybrid PSK-QAM MIMO-SM that leverages constant-modulus phase shift keying (PSK) in conjunction with QAM to minimize the sum of the antenna errors and the APM symbol errors, which is achieved by increasing the number of transmit antennas, which need not be an integer power of two; (ii) a MIMO-SM with a novel APM QAM constellation with an optimized radius of the inner constellation ring; and (iii) a MIMO-SM that transmits the QAM symbols from the inner ring on different orthogonal resources (e.g., subcarriers) to reduce their impact on the antenna errors. The simulation results demonstrate the performance benefit of the proposed approaches compared to conventional SM.
Title: Constellation Designs for the Spatial Modulation MIMO Systems
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Despite high prediction accuracy, deep networks are vulnerable to adversarial attacks, which are designed by inducing human-indiscernible perturbations in clean images. Hence, adversarial samples can mislead already trained deep networks. The process of generating adversarial examples can assist us in investigating the robustness of different models. Many existing adversarial attacks fail under challenging black-box settings. Hence, the transferability of adversarial attacks to an unknown model needs to be improved. In this respect, we propose to increase the rate of transferability by inducing linearity in a few intermediate layers of the architecture. The proposed design does not disturb the original architecture much. The design focuses on the significance of intermediate layers in generating feature maps suitable for a task. By analyzing the intermediate feature maps of the architecture, a particular layer can be perturbed more strongly to improve the transferability. The performance is further enhanced by considering diverse input patterns. Experimental results demonstrate that our proposition succeeds in increasing transferability.
Title: Increasing Transferability by Imposing Linearity and Perturbation in Intermediate Layer with Diverse Input Patterns
Authors: Meet Shah, Srimanta Mandal, Shruti Bhilare, Avik Hati Dhirubhai
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840512
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840821
T. Pratap, Priyanka Kokil
Cataract is the most common cause of blindness in the world. Early detection and treatment can lower the risk of cataract progression. The diagnostic performance of existing computer-aided cataract grading (CACG) methods often deteriorates due to the sophisticated image capture technology. Common retinal fundus image aberrations such as noise and blur are unavoidable in practice. In this paper, a CACG method is proposed to achieve robust cataract grading under adversarial conditions such as noise and blur. The presented CACG method is designed using three deep neural network variants. Each variant is fine-tuned individually using good, noisy, and blurred retinal fundus images to achieve optimum performance. Further, an input image quality detection module is incorporated in the proposed CACG method to detect input image distortion and then route the input image to the appropriate deep neural network variant. Gaussian noise and blur models are used to evaluate the effectiveness of the suggested CACG method. The proposed CACG approach exhibits superior performance to existing methods under adversarial conditions.
Title: Computer-aided Cataract Grading Under Adversarial Environment
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840808
Supritha M. Shetty, Suraj Durgesht, K. Deepak
An electroglottograph (EGG) is a device used to measure the conductance between the vocal folds. Analysis of the EGG signal has many applications in the literature, such as speech-to-text synthesis, voice disorder analysis, emotion recognition, and speaker verification. Therefore, the EGG device is essential for recording vocal fold activity. As an alternative, this work proposes a new method to synthesize the EGG waveform from the speech signal using a context aggregation convolutional neural network. The synthesis network is trained by accounting for the deep feature losses obtained by comparing it with another network, called the EGG classification network. The synthesized EGG signal needs to be characterized. During voiced speech production, the instants at which the vocal folds attain complete closure are called glottal closure instants (GCIs). Likewise, the opening instants are called glottal opening instants (GOIs). Such instants are reliably measured using the EGG signal. The performance of the proposed method is compared with other state-of-the-art techniques. The CMU-Arctic database has a parallel corpus of speech and EGG signals recorded simultaneously. This database is used for training the synthesis network and for comparison purposes. It is found that the performance of extracting glottal instants from synthesized EGG signals is comparable to that of other methods.
Title: Glottal instants extraction from speech signal using Deep Feature Loss
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)