Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179618
Shaik Basheeruddin Shah, Vijay Kumar Chakka
In signal processing applications, information about the signal, such as its frequency or period, is known a priori for most practical signals, such as ECG, EEG, and speech. Inspired by this, we propose a new signal representation that estimates the period and frequency information of a given signal with low computational complexity. We achieve this by representing a finite-length discrete-time signal as a linear combination of signals belonging to Ramanujan subspaces. Further, we evaluate the performance of the proposed representation with a simulated example and by addressing the problem of reducing power line interference (PLI) in an ECG signal. Finally, we show that the computational complexity of the proposed transform is quite low in comparison with existing transforms for a given integer-valued signal, and comparable for a given real- or complex-valued signal.
Title: Signal Representation Using Ramanujan Subspaces Utilizing A Prior Signal Information
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
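As background for the Ramanujan subspaces mentioned in the abstract above, the sketch below computes the Ramanujan sum c_q(n), the standard building block from which such subspaces are usually constructed (the q-th subspace is commonly defined as the span of the circular shifts of one period of c_q). This is a generic illustration, not the authors' transform; the function names `ramanujan_sum` and `ramanujan_sequence` are hypothetical. The sequences are integer-valued, which is one plausible reason integer-valued signals can be handled cheaply.

```python
from math import cos, gcd, pi


def ramanujan_sum(q, n):
    """Ramanujan sum c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1.

    The imaginary parts of the underlying complex exponentials cancel,
    and the result is always an integer, so it is rounded to remove
    floating-point error.
    """
    total = sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)


def ramanujan_sequence(q):
    """One period (length q) of the Ramanujan sum c_q(n), for n = 0 .. q-1."""
    return [ramanujan_sum(q, n) for n in range(q)]


if __name__ == "__main__":
    for q in (1, 2, 3, 4, 6):
        print(q, ramanujan_sequence(q))
```

Each sequence is periodic with period q, so projecting a signal onto these sequences and their shifts highlights components whose period is q.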
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179630
Ankit, M. Bhatnagar
The dimensionality of the fluid environment plays a crucial role in characterizing the diffusive channel. It is generally believed that increasing the dimensionality of the fluid medium should negatively affect the hitting probabilities, because the propagating molecules gain degrees of freedom. This paper has two objectives: it provides the diffusion channel characterization of a molecular communication (MC) system in an enclosed cuboid geometry, and it then studies the effect of dimensionality and receiver size on the obtained channel statistics. The motility probability distribution function (PDF) of the molecules in a constrained cuboid environment with five reflecting walls and one absorbing wall is derived. The first hitting time (FHT) PDF and the hitting probabilities of the molecules at the absorbing wall are deduced from it. A comparative analytical study of the derived FHT PDF against the diffusion channel statistics of various bounded and unbounded environments is presented. The comparison quantitatively establishes that an MC system with suitably configured fluid boundaries and transmitter and receiver arrangement can completely eliminate the effect of dimensionality and receiver size on the hitting probabilities. The study may be of use in designing practically efficient and economical MC systems.
Title: Diffusion Channel Characterization for A Cuboid Container: Some Insights into The Role of Dimensionality and Fluid Boundaries
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179566
K. S. S. Anudeep, Kuldeep Khoria, R. Das
For identifying sparse systems, a recently proposed algorithm called the upper threshold based zero attracting proportionate normalized least mean square (UT-ZA-PNLMS) algorithm has shown improved performance in terms of both convergence rate and steady-state error in comparison to the ZA-PNLMS algorithm. The UT-ZA-PNLMS algorithm employs an adaptive threshold based gain function to improve the convergence rate of the active taps, especially the taps with low magnitude, and appends a zero attracting term to the update equation to bring the inactive taps to their optimum zero level. However, because the UT-ZA-PNLMS algorithm uses uniform shrinkage for that zero attraction, the active taps experience significant bias, which limits the overall steady-state performance. In this paper, we introduce selective shrinkage for the zero attracting term so that the inactive taps experience a strong attractive force whereas the active taps experience a negligibly small one, and thus the bias in the active taps is reduced. In particular, we propose three different algorithms that incorporate log-sum, $\ell_p$-norm, and $\ell_0$-norm penalties into the cost function of the upper threshold based PNLMS algorithm. The resulting algorithms are studied extensively, and the simulation results show their improved steady-state performance.
Title: Improving Steady-State Performance of the UT-ZA-PNLMS Algorithm for Sparse Systems
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
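The difference between uniform and selective shrinkage described in the abstract above can be illustrated with a minimal sketch. It compares the zero attracting term obtained from an $\ell_1$ penalty (the same pull on every nonzero tap) with the one obtained from a log-sum penalty $\sum_i \log(1 + |w_i|/\epsilon)$, whose gradient is $\mathrm{sign}(w_i)/(\epsilon + |w_i|)$. This is a generic textbook-style form, not the paper's exact update; the names `l1_attractor`, `log_sum_attractor`, `rho`, and `eps` are assumptions made for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def l1_attractor(w, rho):
    """Uniform shrinkage: every nonzero tap is pulled toward zero with strength rho."""
    return [-rho * sign(wi) for wi in w]


def log_sum_attractor(w, rho, eps):
    """Selective shrinkage from a log-sum penalty.

    The pull is rho / (eps + |w_i|): strong for taps near zero,
    weak for taps with large magnitude.
    """
    return [-rho * sign(wi) / (eps + abs(wi)) for wi in w]


if __name__ == "__main__":
    taps = [0.0, 0.01, 1.0]  # zero tap, small tap, large (active) tap
    print("l1     :", l1_attractor(taps, rho=1e-3))
    print("log-sum:", log_sum_attractor(taps, rho=1e-3, eps=0.05))
```

In this sketch the log-sum attractor pulls the small tap much harder than the large one, whereas the $\ell_1$ attractor treats both identically, which is the source of the bias on active taps.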
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179523
Vinith Kishore, Subhadip Mukherjee, C. Seelamantula
We consider the problem of reconstructing a complex-valued signal from its phase-only measurements. This framework can be considered as a generalization of the well-known one-bit compressed sensing paradigm where the underlying signal is known to be sparse. In contrast, the proposed formalism does not rely on the assumption of sparsity and hence applies to a broader class of signals. The optimization problem for signal reconstruction is formulated by first splitting the linear measurement vector into its phase and magnitude components and subsequently using the non-negativity property of the magnitude component as a constraint. The resulting optimization problem turns out to be a quadratic program (QP) and is solved using two algorithms: (i) alternating directions method of multipliers; and (ii) projected gradient-descent with Nesterov’s momentum. Due to the inherent scale ambiguity of the phase-only measurement, the underlying signal can be reconstructed only up to a global scale-factor. We obtain high accuracy for reconstructing 1–D synthetic signals in the absence of noise. We also show an application of the proposed approach in reconstructing images from the phase of their measurement coefficients. The underlying image is recovered up to a peak signal-to-noise ratio exceeding 30 dB in several examples, indicating an accurate reconstruction.
Title: PhaseSense — Signal Reconstruction from Phase-Only Measurements via Quadratic Programming
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
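The scale ambiguity noted in the abstract above can be checked numerically with a small sketch: multiplying the signal by any positive constant leaves the phase of every linear measurement unchanged, so the phases alone cannot determine the scale. The measurement matrix `A`, the signal `x`, and the helper `measurement_phases` are hypothetical values and names chosen only for illustration; this is not the authors' reconstruction algorithm.

```python
import cmath


def measurement_phases(A, x):
    """Return the phase angle (in radians) of each entry of y = A x.

    A is a list of rows of complex numbers and x is a list of complex numbers.
    """
    phases = []
    for row in A:
        y = sum(a * xi for a, xi in zip(row, x))
        phases.append(cmath.phase(y))
    return phases


if __name__ == "__main__":
    # Hypothetical measurement matrix and signal, for illustration only.
    A = [[1 + 2j, 0.5 - 1j], [2 - 1j, 1j], [-1 + 0.5j, 3 + 0j]]
    x = [1 - 1j, 0.5 + 2j]
    scaled = [3.7 * xi for xi in x]  # same signal, different positive scale
    print(measurement_phases(A, x))
    print(measurement_phases(A, scaled))
```

Both calls print the same phases up to floating-point error, which is why a reconstruction from phase-only data is compared with the ground truth only after normalizing the scale.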
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179633
Parvez Shaik, P. Singya, Nagendra Kumar, Kamal K. Garg, V. Bhatia
Simultaneous wireless information and power transfer (SWIPT) is an efficient solution for power-scarce wireless communications. In this paper, we consider relay-assisted multi-input multi-output (MIMO) device-to-device (D2D) communications under the practical scenario of imperfect channel state information (CSI) over generalized Nakagami-m fading channels. At the relay node, energy is harvested from the received radio-frequency (RF) signals by adopting the time-switch (TS) protocol and is used for broadcasting a signal to the destination. In a resource-limited environment, using all the MIMO antennas is discouraged by the increased system complexity of a dedicated RF chain for each active antenna. Thus, a transmit antenna selection (TAS) strategy is considered in this work. A framework for the outage probability and asymptotic outage probability of the TAS-based MIMO D2D relay system is provided. It is observed that the diversity order of the system is affected by small variations in the imperfect-CSI correlation coefficient, and that the throughput of the system is severely affected at increasing rates under imperfect CSI. Further, Monte-Carlo simulations are performed to validate the derived analytical expressions.
Title: On Impact of Imperfect CSI over SWIPT Device-to-Device (D2D) MIMO Relay Systems
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179583
Divyesh G. Rajpura, Jui Shah, Maitreya Patel, Harshit Malaviya, K. Phatnani, H. Patil
Singing voice conversion (SVC) is the task of converting the perception of the source speaker’s identity to the target speaker without changing lyrics and rhythm. Recent approaches in traditional voice conversion involve the use of generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). However, in the case of SVC, GANs have not been explored much. The only system that has been proposed in the literature uses a traditional GAN on parallel data. Collecting parallel data for real scenarios (with the same background music) is not feasible. Moreover, in the presence of background music, SVC is one of the most challenging tasks, as it involves the source separation of vocals from the inputs, which will have some noise. Therefore, in this paper, we propose a transfer learning and fine-tuning-based cycle-consistent GAN (CycleGAN) model for non-parallel SVC, where music source separation is done using the Deep Attractor Network (DANet). We designed seven different possible systems to identify the best possible combination of transfer learning and fine-tuning. Here, we use a more challenging database, MUSDB18, as our primary dataset, and we also use the NUS-48E database to pre-train CycleGAN. We perform extensive analysis via objective and subjective measures and report that, with a mean opinion score (MOS) of 4.14 out of 5 for naturalness, the CycleGAN model pre-trained on the NUS-48E corpus performs best among the systems described in the paper.
Title: Effectiveness of Transfer Learning on Singing Voice Conversion in the Presence of Background Music
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179556
Balasubramanyam Appina
We propose a ‘complete blind’ no-reference (NR) image quality assessment algorithm for assessing the perceptual quality of natural stereoscopic (S3D) images. Towards this end, we generate an intermediate (cyclopean) image from the left and right views and hypothesize that the perceived quality of the S3D view is close to that of this cyclopean image. We perform multi-steerable decomposition on the cyclopean images and compute the naturalness image quality evaluator (NIQE) score [1] and an entropy score from each subband. Finally, the primitive quality scores of the steerable subbands are pooled to obtain the overall perceptual quality score of an S3D image. The proposed algorithm is evaluated on the LIVE Phase I [2] and LIVE Phase II [3] stereoscopic image datasets and demonstrates robust performance on both datasets and across distortions. The proposed algorithm, which is a ‘complete blind’ model (it requires neither pristine S3D images nor training on human opinion scores), is called the Multi-Orient NIQE based 3D image quality evaluator (MO-NIQE).
Title: A ‘Complete Blind’ No-Reference Stereoscopic Image Quality Assessment Algorithm
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179526
M. Sivasankar, R. Hegde
Developing angular superresolution methods for resolving targets using multifunction phased array radar is challenging. Angular superresolution of closely spaced coherent targets with strong interferences in the context of phased array radar has hitherto not been addressed. In this paper, a novel beamforming method with angular superresolution is proposed for resolving closely spaced coherent targets in the presence of interferences. A dynamic subarray beamforming framework is first developed based on knowledge of the number of interferences. The output obtained from the dynamic subarray beamformer is then smoothed using an augmented covariance method to account for the coherence of the targets. A superresolution method is then used to obtain robust direction-of-arrival (DOA) estimates even at low SNR. Experiments on DOA estimation are conducted in typical target detection scenarios, and the results are evaluated using several performance metrics to illustrate the significance of the proposed method.
Title: Dynamic Subarray Beamforming for Angular Superresolution of Coherent Targets
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179550
P. Jain, K. Gurugubelli, A. Vuppala
Language Identification (LID) is an integral part of multilingual speech systems. There are various conditions under which the performance of LID systems is sub-optimal, such as short duration, noise, and channel variation. There have been efforts to improve performance under these conditions, but the impact of speaker emotion variation on the performance of LID systems has not been studied. It is observed that the performance of LID systems degrades in the presence of emotional mismatch between training and test conditions. To address this, we investigated adaptation approaches for improving the performance of LID systems by incorporating emotional utterances in the form of an adaptation dataset. We also studied a prosody modification technique called the Flexible Analysis Synthesis Tool (FAST) to vary the emotional characteristics of an utterance in order to improve the performance, but the results were inconsistent and not satisfactory. In this work, we propose a combination of a Recurrent Convolutional Neural Network (RCNN) based architecture with a multi-stage training methodology, which outperformed state-of-the-art LID systems such as i-vectors, time delay neural networks, long short-term memory networks, and deep neural network x-vectors.
Title: Towards Emotion Independent Language Identification System
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date: 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179596
Ashok Bandi, R. B. S. Mysore, S. Chatzinotas, B. Ottersten
This paper studies the joint design of user scheduling and precoding for the maximization of spectral efficiency (SE) in a multigroup multicast scenario over multiuser MISO downlink channels. Noticing that the existing definition of SE fails to account for group sizes, a new metric called multicast spectral efficiency (MC-SE) is proposed. In this context, the joint design is considered for the maximization of MC-SE. First, with the help of binary scheduling variables, the joint design problem is formulated as a mixed-integer non-linear programming problem in a way that facilitates the joint update of scheduling and precoding variables. Further, useful reformulations are proposed to reveal the hidden difference-of-convex/concave structure of the problem. Thereafter, we propose a convex-concave procedure based iterative algorithm with guaranteed convergence to a stationary point. Finally, we compare different aspects, namely MC-SE, SE, and the number of scheduled users, through Monte-Carlo simulations.
Title: Joint User Scheduling, and Precoding for Multicast Spectral Efficiency in Multigroup Multicast Systems
Venue: 2020 International Conference on Signal Processing and Communications (SPCOM)