Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530143
P. Trinadh, Anoop Thomas
This work considers a client-server network in which multiple clients are connected to a single server, which stores a library of files, through a shared error-free link. Each client is associated with a cache memory and requests a file from the server. During off-peak hours, the server fills the cache memories with portions of the files to reduce the delivery rate during peak hours. The caches are filled using a decentralized placement approach, which is more practical for large networks. In this paper, we consider the shared caching problem, in which each cache can be accessed by multiple clients. A deep reinforcement learning (DRL) based framework is proposed to optimize the rate at which the requested contents are delivered to the users. The system is modelled as a Markov decision process so that the DRL agent can be deployed and learn how to make decisions. The DRL agent learns to multicast coded bits from the server's file library so that the user requests are met with the minimum number of transmissions of these coded bits. The proposed DRL-based agent is shown to outperform existing decentralized algorithms for the shared caching problem in terms of normalized delivery rate. For the conventional caching problem, which is a special case of the shared caching problem, simulation results show that the proposed DRL agent also outperforms existing algorithms.
Title: A Deep Reinforcement Learning Approach for Shared Caching
Venue: 2021 National Conference on Communications (NCC)
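Why multicasting coded bits lowers the delivery rate can be seen in a two-user toy case. The sketch below is the classic XOR-based coded delivery for conventional caching (one user per cache), not the authors' DRL agent; the file contents, the half-file split, and the names `library`, `caches`, and `xor_bytes` are illustrative assumptions. Placement here is deterministic for clarity, whereas the decentralized placement in the paper would cache random subsets of bits.

```python
from typing import Dict, Tuple

Subfile = Tuple[str, int]  # (file name, part index)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("subfiles must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical toy library: files A and B, each split into two equal halves.
library: Dict[Subfile, bytes] = {
    ("A", 0): b"aaaa", ("A", 1): b"AAAA",
    ("B", 0): b"bbbb", ("B", 1): b"BBBB",
}

# Placement phase (off-peak): user 1 caches part 0 of every file,
# user 2 caches part 1 of every file (deterministic split, an assumption).
caches: Dict[int, Dict[Subfile, bytes]] = {
    1: {key: data for key, data in library.items() if key[1] == 0},
    2: {key: data for key, data in library.items() if key[1] == 1},
}

# Delivery phase (peak): user 1 requests A and lacks ("A", 1);
# user 2 requests B and lacks ("B", 0). One coded multicast serves both.
coded = xor_bytes(library[("A", 1)], library[("B", 0)])

# Each user cancels the subfile it already holds in its own cache.
user1_recovered = xor_bytes(coded, caches[1][("B", 0)])
user2_recovered = xor_bytes(coded, caches[2][("A", 1)])

# Delivery rate normalized by the file size: one half-file transmission.
file_size = len(library[("A", 0)]) + len(library[("A", 1)])
rate = len(coded) / file_size

print(user1_recovered, user2_recovered, rate)
```

Uncoded delivery would need two half-file transmissions (rate 1.0); the single XOR multicast halves this. The DRL agent described above learns which coded combinations to send when the placement is random and caches are shared.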
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530162
A. S., A. Ramakrishnan
Sanskrit is one of the Indian languages that fare poorly with regard to the development of language-based tools. In this work, we build a connectionist temporal classification (CTC) based end-to-end large-vocabulary continuous speech recognition system for Sanskrit. To our knowledge, this is the first time an end-to-end framework has been used for automatic speech recognition in Sanskrit. A Sanskrit speech corpus with around 5.5 hours of speech data is used to train a neural network with a CTC objective. The input features are 80-dimensional mel-spectrograms together with their delta and delta-delta coefficients. Spectrogram augmentation is used to effectively increase the amount of training data. The trained CTC acoustic model is evaluated in terms of character error rate (CER) with greedy decoding. Weighted finite-state transducer (WFST) decoding is used to obtain word-level transcriptions from the character-level probability distributions at the output of the CTC network. The decoder WFST, which maps the CTC output characters to the words in the lexicon, is constructed by composing three finite-state transducers (FSTs): token, lexicon, and grammar. Trigram models trained on a text corpus of 262,338 sentences are used for language modeling in the grammar FST. With spectrogram augmentation and WFST decoding, the system achieves a word error rate (WER) of 7.64% and a sentence error rate (SER) of 32.44% on a Sanskrit test set of 558 utterances. Spectrogram augmentation provides an absolute improvement of 13.86% in WER.
Title: CTC-Based End-To-End ASR for the Low Resource Sanskrit Language with Spectrogram Augmentation
Venue: 2021 National Conference on Communications (NCC)
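The CER figure above is computed after greedy (best-path) CTC decoding: take the most likely symbol in each frame, merge consecutive repeats, then drop blanks. The sketch below shows that decoding step and a Levenshtein-based CER; the blank index (0), the toy alphabet `"-ab"`, and the frame probabilities are hypothetical, and this is not the authors' code. Applying the same `edit_distance` to lists of words instead of characters gives WER.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 of the output layer is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank; a blank between repeats separates them.
        if idx != prev and idx != BLANK:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            row[j] = min(prev_row[j] + 1,              # deletion
                         row[j - 1] + 1,               # insertion
                         prev_row[j - 1] + (r != h))   # substitution / match
        prev_row = row
    return prev_row[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate = edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical 7-frame output over the alphabet "-ab" ('-' is the blank).
frame_probs = [
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7], [0.8, 0.1, 0.1],
]
hyp = ctc_greedy_decode(frame_probs, "-ab")
print(hyp, cer("abb", hyp))
```

The blank in the third frame is what lets the decoder emit two consecutive "a" characters; without it, the repeats would be merged into one.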
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530178
Naveenta Gautam, A. Choudhary, Brejesh Lall
Due to the increasing demand for optical fiber communication systems, there is a special emphasis on diagnosing ultrashort pulses. The linear and nonlinear distortions introduced during transmission give rise to a wide variety of wave dynamics. The conventional signal processing techniques used for characterising these pulses are computationally inefficient. Since machine learning has shown improvements over other analytical methods, we present a comparative study of different neural network (NN) architectures for predicting the output pulse profile after transmission through highly nonlinear and dispersive fibers. The trained network learns the mapping from a set of input pulses to the corresponding output pulses for both known and unknown fibers. Since each NN has its own advantages and disadvantages, we present, to the best of our knowledge for the first time, a comprehensive analysis of six NN architectures: (i) fully connected NN (FCNN), (ii) cascade-forward NN (CaNN), (iii) convolutional NN (CNN), (iv) long short-term memory network (LSTM), (v) bidirectional LSTM (BiLSTM), and (vi) gated recurrent unit (GRU).
Title: Neural Networks for predicting optical pulse propagation through highly nonlinear fibers
Venue: 2021 National Conference on Communications (NCC)
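The abstract does not say how the input/output pulse pairs are produced. One common way to generate such pairs (an assumption here, not necessarily the authors' setup) is to solve the nonlinear Schrödinger equation (NLSE) for a lossless fiber, dA/dz = -i(β2/2)∂²A/∂t² + iγ|A|²A, with the symmetric split-step Fourier method. The normalized parameters below (β2 = -1, γ = 1, a sech input) are hypothetical and chosen so that the input is a fundamental soliton whose shape should be preserved, which makes the sketch easy to sanity-check.

```python
import numpy as np


def split_step_nlse(a0, t, length, beta2, gamma, dz):
    """Propagate the complex envelope a0(t) over `length` of lossless fiber.

    Symmetric split-step Fourier method for
        dA/dz = -i (beta2/2) d^2A/dt^2 + i gamma |A|^2 A
    (half dispersion step, full nonlinear step, half dispersion step).
    """
    dt = t[1] - t[0]
    omega = 2 * np.pi * np.fft.fftfreq(t.size, d=dt)        # angular frequency grid
    half_linear = np.exp(0.5j * beta2 * omega**2 * (dz / 2))  # dispersion over dz/2
    a = a0.astype(complex)
    for _ in range(int(round(length / dz))):
        a = np.fft.ifft(half_linear * np.fft.fft(a))   # half dispersion step
        a = a * np.exp(1j * gamma * np.abs(a)**2 * dz)  # full Kerr nonlinearity step
        a = np.fft.ifft(half_linear * np.fft.fft(a))   # half dispersion step
    return a


# Hypothetical normalized setup: fundamental soliton in anomalous dispersion.
t = np.linspace(-20, 20, 1024, endpoint=False)
a_in = 1 / np.cosh(t)
a_out = split_step_nlse(a_in, t, length=1.0, beta2=-1.0, gamma=1.0, dz=0.01)

# An NN would be trained on many (|a_in|, |a_out|) pairs like this one.
print(np.max(np.abs(np.abs(a_out) - np.abs(a_in))))
```

Both sub-steps are unitary, so pulse energy is conserved to floating-point precision. Varying the input pulse shape, β2, and γ would yield the "known and unknown fiber" datasets on which networks such as those listed above are trained.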
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530120
Yegnanarayana Bayya, B. N. Murthy, J. Satyanarayana, V. Pannala, Nivedita Chennupati
Estimating the time delay from received broadband signals such as speech, collected at two or more spatially distributed microphones, has many applications. Methods such as direct cross-correlation of the signals and generalized cross-correlation based methods (GCC and GCC-PHAT) have been used for many years to estimate the time delay. The performance of these methods degrades due to noise, multipath reflections, and reverberation in a practical environment such as a live room. The estimated time delay is usually robust because of the averaging of the delays obtained over several frames of an utterance lasting a few seconds. This robustness is affected when the time-varying delay of a moving speaker is to be estimated. A shorter averaging duration results in errors in the time-delay estimate, while a longer averaging duration results in a loss of accuracy. Since single frequency filtering (SFF) based analysis provides an estimate of the instantaneous time delay, it makes it possible to study the trade-off between accuracy and robustness. This paper examines this trade-off in determining the number of stationary speakers from mixed signals and in tracking a speaker moving along a straight-line path and along a circular path. The results are illustrated on actual data collected in a live room.
Title: Robustness and Accuracy of Time Delay Estimation in a Live Room
Venue: 2021 National Conference on Communications (NCC)
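GCC-PHAT, one of the baselines named above, whitens the cross-spectrum so that only phase information remains, which sharpens the correlation peak under reverberation. The sketch below is a textbook GCC-PHAT estimator, not the SFF method of the paper; the sampling rate, the 5-sample synthetic delay, and the `max_tau` search limit are hypothetical.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of x relative to y in seconds using GCC-PHAT.

    A positive result means x lags y (x[n] is approximately y[n - d]).
    """
    n = x.size + y.size                            # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    cross /= np.abs(cross) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so index 0 corresponds to lag -max_shift and the last to +max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


# Hypothetical two-microphone test: microphone 1 hears the source 5 samples late.
rng = np.random.default_rng(0)
fs = 16000
source = rng.standard_normal(4096)
true_delay = 5
mic1 = np.concatenate((np.zeros(true_delay), source[:-true_delay]))
mic2 = source
tau = gcc_phat(mic1, mic2, fs, max_tau=0.001)
print(tau * fs)
```

Running `gcc_phat` on short frames and averaging (or taking the median of) the per-frame delays reproduces the robustness/accuracy trade-off discussed above: more frames smooth out spurious peaks but blur a delay that changes as the speaker moves.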
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530166
H. K, V. Sukumaran, C. Singh
We study medium access in frequency-modulated continuous-wave (FMCW) radar networks. In particular, we propose a slotted ALOHA protocol and analyze the probability of interference between radars as a function of system parameters such as the total number of radars, chirp duration, number of chirps in a repetition interval, and medium access probability. We find that the interference probability in FMCW radar networks behaves very differently from that in wireless communication networks. We also observe that the interference probability depends on the number of chirps in a radar packet. We further propose a notion of throughput and study how it varies with these parameters. Extensive simulations verify our analytical results.
Title: Slotted Aloha for FMCW Radar Networks
Venue: 2021 National Conference on Communications (NCC)
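For reference, the communication-network behaviour that the abstract contrasts with is the classic slotted ALOHA collision model: each node transmits in a slot independently with probability p, and a transmission is interfered if at least one of the other N - 1 nodes transmits in the same slot, giving 1 - (1 - p)^(N-1). The sketch below checks that formula by Monte Carlo simulation. It deliberately ignores chirp duration and chirps per packet, so it is only the baseline model, not the FMCW analysis of the paper; N = 10 and p = 0.1 are hypothetical.

```python
import random


def analytic_interference_prob(n_radars: int, p: float) -> float:
    """P(a transmitting node shares its slot with at least one other node)."""
    return 1.0 - (1.0 - p) ** (n_radars - 1)


def simulate_interference_prob(n_radars: int, p: float, n_slots: int, seed: int = 0) -> float:
    """Fraction of transmissions that collide in a classic slotted ALOHA model."""
    rng = random.Random(seed)
    transmissions = 0
    interfered = 0
    for _ in range(n_slots):
        active = sum(1 for _ in range(n_radars) if rng.random() < p)
        transmissions += active
        if active >= 2:          # every node in a collided slot is interfered
            interfered += active
    return interfered / transmissions if transmissions else 0.0


n_radars, p = 10, 0.1            # hypothetical system parameters
analytic = analytic_interference_prob(n_radars, p)
simulated = simulate_interference_prob(n_radars, p, n_slots=100_000)
print(round(analytic, 4), round(simulated, 4))
```

The paper's observation is that, for FMCW radars, this picture changes because interference also depends on the chirp structure within a packet.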
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530083
Karthika Mohan, Suvra Shekhar Das
Spectrally efficient wireless communication systems dynamically adapt transmission rate and power by comparing instantaneous signal-to-interference-plus-noise ratio (SINR) samples against SINR switching thresholds, which can be designed a priori given perfect knowledge of the SINR distribution. However, perfect a priori knowledge of the SINR distribution is hardly feasible in any practical system, for the following reasons. The operating condition is not stationary owing to mobility, and it is impossible to know all possible operating conditions in advance. Even if the set of operating conditions is defined, identifying the current operating scenario is not a trivial task either. Given these challenges, dynamic estimation of the SINR distribution is one possible way out. The challenge in such estimation is that only quantized SINR values are available. In this work, leveraging the well-accepted log-normal approximation of the SINR distribution, we develop a mechanism to obtain parametric estimates of the SINR distribution from quantized data. The proposed method can be used at both the transmitter and the receiver, with appropriate modifications to the signalling protocols and algorithm parameter values. We demonstrate through numerical analysis that the proposed method can help achieve near-ideal average spectral efficiency (ASE).
Title: Parametric Estimation of SINR Distribution using Quantized SINR Samples for Maximizing Average Spectral Efficiency
Venue: 2021 National Conference on Communications (NCC)
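Under the log-normal approximation, the SINR in dB is Gaussian, so estimating the distribution from quantized samples amounts to fitting a mean and standard deviation from bin counts. A minimal sketch, assuming a maximum-likelihood fit over the binned Gaussian with a plain grid search (a simple stand-in, not the authors' mechanism), follows; the thresholds, the grids, and the "true" parameters used to generate test data are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian CDF; math.erf handles x = +/-inf (returns +/-1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quantize(samples_db: Sequence[float], thresholds_db: Sequence[float]) -> List[int]:
    """Count samples per bin; bin k spans [t_(k-1), t_k) with open outer bins."""
    counts = [0] * (len(thresholds_db) + 1)
    for s in samples_db:
        counts[bisect.bisect_right(thresholds_db, s)] += 1
    return counts


def fit_from_counts(counts: Sequence[int], thresholds_db: Sequence[float],
                    mu_grid: Sequence[float], sigma_grid: Sequence[float]) -> Tuple[float, float]:
    """Grid-search ML estimate of (mu, sigma) in dB from quantized SINR counts."""
    edges = [-math.inf, *thresholds_db, math.inf]
    best_ll, best = -math.inf, (math.nan, math.nan)
    for mu in mu_grid:
        for sigma in sigma_grid:
            ll = 0.0
            for k, n_k in enumerate(counts):
                if n_k:
                    p_k = normal_cdf(edges[k + 1], mu, sigma) - normal_cdf(edges[k], mu, sigma)
                    ll += n_k * math.log(max(p_k, 1e-300))  # guard against log(0)
            if ll > best_ll:
                best_ll, best = ll, (mu, sigma)
    return best


# Hypothetical data: SINR in dB ~ N(10, 4^2), observed only through 7 thresholds.
rng = random.Random(7)
samples = [rng.gauss(10.0, 4.0) for _ in range(20_000)]
thresholds = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0]
counts = quantize(samples, thresholds)
mu_grid = [i / 10 for i in range(0, 201)]      # 0.0 .. 20.0 dB
sigma_grid = [i / 10 for i in range(10, 81)]   # 1.0 .. 8.0 dB
mu_hat, sigma_hat = fit_from_counts(counts, thresholds, mu_grid, sigma_grid)
print(mu_hat, sigma_hat)
```

With the fitted (mu, sigma), the rate/power switching thresholds could then be redesigned as if the distribution were known; a gradient-based or EM optimizer would replace the grid search in practice.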
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530108
Moyukh Laha, R. Datta
Many vehicular applications require data dissemination in which all vehicles in a specific region of interest are the intended receivers of particular messages. Such dissemination is challenging due to the distinct properties of vehicular networks, such as high mobility, short communication range, intermittent connectivity, and frequent topology changes. In this work, we propose a Local Centrality-based Dissemination scheme for vehicular networks based on vehicle-to-vehicle (V2V) communication. Each vehicle gathers its two-hop neighborhood information to identify the super-spreader nodes, which continue the dissemination by rebroadcasting the received messages, while the remaining nodes stay quiet. We validate the performance of the proposed scheme with real vehicular data. Extensive simulation results show that the proposed scheme achieves higher and quicker coverage with fewer redundant transmissions than state-of-the-art data dissemination protocols.
Title: Efficient Message Dissemination in V2V Network: A Local Centrality-based Approach
Venue: 2021 National Conference on Communications (NCC)
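The abstract does not define the local centrality measure, so the sketch below swaps in a well-known technique that uses exactly the same two-hop information: greedy multipoint-relay (MPR) selection as in OLSR. Each node picks the fewest one-hop neighbors that together reach its whole two-hop neighborhood, ranking candidates by how many still-uncovered two-hop neighbors they reach (a simple local centrality score), and only selected relays rebroadcast. The static graph, ideal links, and node ids are hypothetical.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def select_relays(g: Graph, v: int) -> Set[int]:
    """Greedy MPR-style choice of one-hop relays covering v's two-hop neighborhood."""
    one_hop = g[v]
    two_hop = set().union(*(g[u] for u in one_hop)) - one_hop - {v}
    relays: Set[int] = set()
    uncovered = set(two_hop)
    while uncovered:
        # Score = number of uncovered two-hop nodes reached; ties go to the smaller id.
        best = max(one_hop - relays, key=lambda u: (len(g[u] & uncovered), -u))
        relays.add(best)
        uncovered -= g[best]
    return relays


def disseminate(g: Graph, source: int) -> Tuple[Set[int], int]:
    """Return (covered vehicles, number of broadcasts) for relay-based flooding."""
    covered = {source}
    queue = deque([source])
    transmissions = 0
    while queue:
        v = queue.popleft()
        transmissions += 1                 # v broadcasts once
        relays = select_relays(g, v)
        for u in sorted(g[v]):
            if u not in covered:
                covered.add(u)
                if u in relays:            # only selected relays rebroadcast
                    queue.append(u)
    return covered, transmissions


# Hypothetical topology: two star-shaped clusters joined through vehicle 5.
g: Graph = {
    1: {2}, 2: {1, 3, 4, 5}, 3: {2}, 4: {2}, 5: {2, 6},
    6: {5, 7, 8, 9}, 7: {6}, 8: {6}, 9: {6},
}
covered, transmissions = disseminate(g, source=1)
print(len(covered), transmissions)
```

Blind flooding on this graph would use 9 broadcasts (one per vehicle); the relay rule reaches all 9 vehicles with 4, which is the kind of redundancy reduction the scheme above targets.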
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530180
Ravindra Mohan Nigam, P. M. Pradhan
In aeronautical telemetry, an Alamouti-encoded Shaped Offset Quadrature Phase Shift Keying, Telemetry Group (SOQPSK-TG) modulated signal is used to resolve the “two-antenna problem” caused by simultaneous transmission from the two onboard antennas. Detecting this signal at the receiver requires estimating the channel impairments (channel gains, time delays, and frequency offset). The Maximum Likelihood Sequence Estimation (MLSE) based decoder for Space Time Coding (STC) encoded SOQPSK-TG requires 512 states, which is too complex to implement. In this paper, pulse shaping is performed on the SOQPSK-TG frequency pulse to reduce the pulse duration. A pulse with a length of two bit intervals is found to approximately match the SOQPSK-TG characteristics while reducing the decoder complexity to 8 states. Parameter estimation is then carried out for STC-encoded SOQPSK-2T using the Maximum Likelihood (ML) estimation method. The performance of the proposed pulse shaping functions is compared with that of SOQPSK-TG and Feher's Quadrature Phase Shift Keying (FQPSK-JR) and is found to be superior for aeronautical telemetry display and level flight operations.
Title: Development of Improved SOQPSK based Data Transmission over Aeronautical Telemetry Link
Venue: 2021 National Conference on Communications (NCC)
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530135
P. Kachare, P. C. Pandey, Vishal Mane, H. Dasgupta, K. Nataraj
Hearing-impaired children lack auditory feedback and have difficulty acquiring speech production. They can benefit from speech-training aids that provide visual feedback of key articulatory efforts. The requirements for such an aid were developed through extended interaction with speech therapists and special education teachers. The aid is developed as a PC-based app for ease of distribution and use. It has two panels to enable comparison between the articulatory efforts of the learner and those of a teacher or a pre-recorded reference speaker. The visual feedback for an utterance is based on information obtained from its audiovisual recording. The speech signal is processed to obtain the time-varying vocal tract shape, level, and pitch. The vocal tract shape is estimated using linear prediction (LP) based inverse filtering, and the pitch is estimated by glottal epoch detection using the Hilbert envelope for excitation enhancement. The visual feedback comprises a variable-rate animation of the lateral vocal tract shape, level, and pitch, together with a time-aligned display of the frontal view of the speaker's face and playback of the time-scaled speech signal. The graphical user interface and the modules for signal acquisition, speech analysis, and time-scaled animation are developed and integrated using Python. The app has been tested for its functionality and user interface, and it still needs to be evaluated for speech training of hearing-impaired children. It may also help second-language learners improve their pronunciation of unfamiliar sounds.
Title: Speech-Training Aid with Time-Scaled Audiovisual Feedback of Articulatory Efforts
Venue: 2021 National Conference on Communications (NCC)
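LP-based inverse filtering, used above for vocal tract shape estimation, can be sketched with the autocorrelation method and the Levinson-Durbin recursion. The recursion yields the prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, whose output is the excitation residual, and the reflection coefficients that lossless-tube models map to vocal-tract area ratios (that mapping, and its sign convention, are not shown). This is a generic sketch, not the app's code; the synthetic AR(1) signal stands in for a speech frame.

```python
import random
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] (autocorrelation method of LP)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Return (a, reflection coefficients, prediction error) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection: List[float] = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, reflection, err


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Prediction residual e[n] = sum_j a[j] x[n - j], i.e. x filtered by A(z)."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


# Hypothetical test signal: AR(1) process x[n] = 0.9 x[n-1] + w[n], w ~ N(0, 1).
rng = random.Random(3)
x: List[float] = []
prev = 0.0
for _ in range(20_000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

a, ks, err = levinson_durbin(autocorrelation(x, 2), 2)
residual = inverse_filter(x, a)
print([round(c, 3) for c in a], round(sum(e * e for e in residual) / len(residual), 3))
```

For this signal the fitted filter is close to 1 - 0.9 z^-1 and the residual is close to the white driving noise; on real speech, a higher order (for example 10 to 14 at typical sampling rates) and short windowed frames would be used.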
Pub Date: 2021-07-27 | DOI: 10.1109/NCC52529.2021.9530188
Shraddha Tripathi, O. Pandey, R. Hegde
Reliable, low-latency data transfer to the cell-edge users (CEUs) of a 5G edge network is a challenging problem. Solving this problem can enable real-time applications such as remote health monitoring of patients and target tracking on the battlefield. In this work, a novel method for optimal data transfer over UAV-assisted edge networks is proposed. The proposed method uses an unmanned aerial vehicle (UAV) as a relay node for data transfer between the ground base station (GBS) and the CEUs. In addition, the UAV node is designed to perform 3D beamforming, which improves the signal-to-interference-plus-noise ratio (SINR) and throughput. To obtain optimal data transfer, the CEUs are first clustered geographically using a distance criterion. A joint optimization problem is then formulated to find the UAV trajectory and the beamforming downtilt angles subject to minimum-latency and maximum-throughput constraints. This joint optimization problem is solved using an iterative approach. Extensive simulations are then performed to validate the method in terms of network latency and throughput under varying network conditions. The results are encouraging for applying the method to medium- and large-scale edge networks.
Title: Optimal Data Transfer in UAV-Assisted Edge-Networks Using 3D Beamforming
Venue: 2021 National Conference on Communications (NCC)
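The clustering and downtilt steps above can be illustrated in isolation. The abstract only says that CEUs are clustered "using a distance criterion", so k-means with a deterministic farthest-point initialization is an assumed stand-in here. The downtilt is taken as the elevation angle from the UAV down to a cluster centroid on flat ground, atan(h / d). The joint trajectory/downtilt optimization itself is not reproduced, and all coordinates and the altitude are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[Point]:
    """Plain k-means on 2D ground positions; returns the cluster centroids."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        updated = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:   # assignments have converged
            break
        centroids = updated
    return centroids


def downtilt_deg(uav_xy: Point, altitude: float, target_xy: Point) -> float:
    """Angle below the horizontal from the UAV to a ground target, in degrees."""
    return math.degrees(math.atan2(altitude, math.dist(uav_xy, target_xy)))


# Hypothetical cell-edge users (metres) forming two well-separated groups.
ceus: List[Point] = [(0, 0), (1, 0), (0, 1), (1, 1),
                     (100, 100), (101, 100), (100, 101), (101, 101)]
centroids = kmeans(ceus, k=2)
uav_xy, altitude = (50.0, 50.0), 100.0
tilts = [downtilt_deg(uav_xy, altitude, c) for c in centroids]
print(centroids, [round(t, 2) for t in tilts])
```

In the scheme described above, the UAV position (and hence these angles) would change along the optimized trajectory; the sketch only shows how per-cluster downtilts follow from the geometry once clusters are known.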