Pub Date: 2018-06-06. DOI: 10.1109/NCC.2018.8600134
Vinayak Ramkumar, Myna Vajha, P. V. Kumar
Projective Reed-Muller codes correspond to subcodes of the Reed-Muller code in which the polynomials being evaluated to yield codewords are restricted to be homogeneous. The Generalized Hamming Weights (GHW) of a code $C$ identify, for each dimension $\nu$, the smallest size of the support of a subcode of $C$ of dimension $\nu$. The GHW of a code are of interest in assessing the vulnerability of a code in a wiretap channel setting. They are also of use in bounding the state complexity of the trellis representation of the code. In prior work [1] by the same authors, a code-shortening algorithm was employed to derive upper bounds on the GHW of binary projective Reed-Muller (PRM) codes. In the present paper, we derive a matching lower bound by adapting the proof techniques used originally for Reed-Muller (RM) codes by Wei in [2]. This results in a characterization of the GHW hierarchy of binary PRM codes.
Title: Determining the Generalized Hamming Weight Hierarchy of the Binary Projective Reed-Muller Code. Venue: 2018 Twenty Fourth National Conference on Communications (NCC).
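To make the definition of the generalized Hamming weights concrete, the following is a minimal brute-force sketch, not the method of the paper. It assumes a small binary code given by a full-rank generator matrix and uses the [7,4] Hamming code as an illustrative example; the names `span`, `ghw_hierarchy` and `HAMMING_7_4` are chosen here for illustration. It relies on the fact that the support of a subcode equals the union of the supports of any basis of that subcode.

```python
from itertools import combinations, product


def span(generators, n):
    """Return every word (as a tuple) in the binary span of the generators."""
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = tuple(
            sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
            for i in range(n)
        )
        words.add(word)
    return words


def ghw_hierarchy(generator_matrix):
    """Brute-force d_1, ..., d_k for a small binary linear code.

    Assumes the rows of generator_matrix are linearly independent.
    """
    n = len(generator_matrix[0])
    k = len(generator_matrix)
    nonzero = [w for w in span(generator_matrix, n) if any(w)]
    hierarchy = []
    for r in range(1, k + 1):
        best = n
        for basis in combinations(nonzero, r):
            # r words are linearly independent iff their span has 2**r words.
            if len(span(basis, n)) != 2 ** r:
                continue
            # Support of the subcode = union of the supports of a basis.
            support = sum(1 for i in range(n) if any(w[i] for w in basis))
            best = min(best, support)
        hierarchy.append(best)
    return hierarchy


# Systematic generator matrix of the [7,4] Hamming code (illustrative input).
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

print(ghw_hierarchy(HAMMING_7_4))
```

The exhaustive search is only practical for very small codes, which is why closed-form characterizations such as the one described in the abstract are of interest.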
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600204
V. Viswanath, S. Alam, R. S. Kshetrimayum
A Cognitive Radio Network is a form of communication in which the licensed frequency bands of the primary users (PUs) are made available to the secondary user (SU), subject to a constraint on the interference caused to the PUs. In this work, we investigate a novel model that uses the interweave approach for spectrum access, with multiple primary users and a single secondary user. Multiple antennas are considered at the primary users as well as at the secondary user. The activity of the primary users is modeled as a Poisson process. In addition, we propose a new approach to sensing for a MIMO cognitive radio network using energy detection. The proposed method provides closed-form expressions for the probability of detection (Pd) and the probability of false alarm (Pf) in a Multiple Input Multiple Output (MIMO) channel. Further, the throughput of the secondary user, as well as the interference to the primary users caused by secondary user transmissions, is computed for the proposed model.
Title: Spectrum Sensing and Collision with Primary Users in MIMO Cognitive Radio
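As a point of reference for the quantities Pd and Pf, here is a minimal sketch of the textbook single-antenna energy detector under a Gaussian approximation. It is not the MIMO closed form derived in the paper. The assumptions are: real-valued samples, white Gaussian noise of known variance, a zero-mean Gaussian signal, and a test statistic equal to the sum of squared samples. All function and parameter names are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def false_alarm_probability(threshold, num_samples, noise_var):
    """Approximate Pf for the statistic T = sum of squared samples.

    Under noise only, T / noise_var is chi-square with num_samples degrees
    of freedom (mean N, variance 2N), approximated here as Gaussian.
    """
    z = (threshold / noise_var - num_samples) / math.sqrt(2.0 * num_samples)
    return q_function(z)


def detection_probability(threshold, num_samples, noise_var, snr):
    """Approximate Pd when a Gaussian signal with the given SNR is present.

    Each sample then has variance noise_var * (1 + snr), so the same
    chi-square argument applies with the scaled variance.
    """
    total_var = noise_var * (1.0 + snr)
    z = (threshold / total_var - num_samples) / math.sqrt(2.0 * num_samples)
    return q_function(z)


# Illustrative operating point: 200 samples, unit noise variance, SNR of 0.2.
pf = false_alarm_probability(threshold=230.0, num_samples=200, noise_var=1.0)
pd = detection_probability(threshold=230.0, num_samples=200, noise_var=1.0, snr=0.2)
print(pf, pd)
```

Raising the threshold lowers both probabilities, which is the trade-off that closed-form Pd and Pf expressions make explicit.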
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600240
Pardhu Madipalli, S. Kotta, Harish Dadi, N. Y., A. C S, A. V. Narasimhadhan
Cardiovascular diseases have been one of the leading causes of death and have been increasing in much of the developing world. Atherosclerosis, the accumulation of plaque on artery walls, is the major cause of cardiovascular diseases. It is diagnosed by measuring the thickness of the intima media complex (IMC) of the common carotid artery (CCA) in ultrasound images. In this paper, we present a completely automatic technique for segmentation of the IMC in ultrasound images of the CCA. The image is segmented using the adaptive wind driven optimization (AWDO) technique. A denoising filter based on the Bayesian least square approach and a robust enhancement technique are used in the pre-processing stage. The proposed method is evaluated on 60 ultrasound images and is compared with state-of-the-art methods. The experimental results show that the proposed method yields better results than the other methods.
Title: Automatic Segmentation of Intima Media Complex in Common Carotid Artery using Adaptive Wind Driven Optimization
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600131
Vinay Verma, Preet Khaturia, N. Khanna
Many audio forensic applications would benefit from the ability to classify audio recordings based on characteristics of the originating device, particularly on social media platforms where an enormous amount of data is posted every day. This paper utilizes passive signatures associated with the recording devices, as extracted from the recorded audio itself, in the absence of any extrinsic security mechanism such as digital watermarking, to identify the source cell-phone of recorded audio. It uses device-specific information present in low- as well as high-frequency regions of the recorded audio. On the only publicly available dataset in this field, MOBIPHONE, the proposed system gives a closed-set accuracy of 97.2%, which matches the state-of-the-art accuracy reported for this dataset. On audio recordings that have undergone double compression, as typically happens for a recording posted on social media, the proposed system outperforms the existing methods (4% improvement in average accuracy).
Title: Cell-Phone Identification from Recompressed Audio Recordings
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600091
D. Govind, D. Pravena, S. Ajay
The objective of the present work is to improve the epoch extraction performance from emotive speech by proposing a post-processing approach to the conventional zero frequency filtering (ZFF) method using variational mode decomposition (VMD) based spectral smoothing. Due to the fast, uncontrolled variations of the pitch in emotive speech signals, reliable estimation of epochs is always challenging. In the proposed method, the spectra of short frames of the zero frequency filtered signal (ZFFS) are subjected to variational mode decomposition to obtain component spectra in five modes. Smoothed short-time spectra are then obtained by excluding the spectra of the two higher VMD modes, which contain the high spectral variations. The modified ZFFS is then reconstructed by parameter-interpolation-based sinusoidal synthesis, using the sinusoidal parameters corresponding to the single dominant frequency present in the VMD-smoothed spectra. The resulting re-synthesized ZFFS has fewer spurious zero crossings than that obtained from the conventional ZFF method for emotive speech signals. The effectiveness of the proposed VMD-based spectral post-processing is confirmed by the improved epoch identification rate and epoch identification accuracy across all the emotive utterances (with 7 emotions) present in the German emotion speech database, which has simultaneous speech and electroglottographic (EGG) signal recordings. The performance of the proposed method is found to be better than or comparable with the other existing ZFF-based post-processing methods proposed for emotive speech signals in terms of the epoch identification accuracy with respect to the corresponding reference epochs estimated from the EGG signals.
Title: Improved Epoch Extraction Using Variational Mode Decomposition Based Spectral Smoothing of Zero Frequency Filtered Emotive Speech Signals
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8599942
Sai Gunaranian Pelluri, T. Sreenivas
It is well known that when there is relative motion between the transmitter (source) and the receiver, a Doppler shift is observed in the spectral content of the received signal. In this paper, we investigate a scenario where the source signal itself has an innate spectral non-stationarity in addition to the non-stationarity introduced by the source motion relative to the receiver. Using only a single microphone recording, we show that these two kinds of non-stationarities are distinguishable and propose a method of separating them. Towards this, we propose a novel scheme for simulating the signal from a source traversing an arbitrary trajectory. The proposed simulation mechanism employs band-limited interpolation and nonuniform sampling to model an acoustic source generating an arbitrary band-limited signal and moving along an arbitrary trajectory.
Title: Disambiguation of Source and Trajectory Non-Stationarities of a Moving Acoustic Source
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8599882
Megha M. Kolhekar, H. Pillai
Permutation polynomials are a topic of research due to their applications in various areas such as coding theory, cryptography and combinatorial designs. The seminal paper [1] lists many open problems in this area. There are $q^{q}$ polynomials of degree $< q$ over $\mathbb{F}_{q}$, and $q!$ of them are permutation polynomials. Therefore, as $q$ increases, it becomes more difficult to find a permutation polynomial. In this paper, we define the notion of a “Permutation Polynomial Representative (PPR)”, which can be used to reduce the search space for permutation polynomials. We give some properties of a PPR. Then we give a matrix representation of a PPR, which can be used to construct the ‘compositional inverse’ of the PPR. In every application, compositional inverses are required to invert the permutation established by the permutation polynomial, but finding the compositional inverse of a given permutation polynomial is not a straightforward problem. Further, we introduce a product of two vectors over $\mathbb{F}_{q}$, which we call the ‘Butterfly Product’, use it to define an ‘$\mathcal{H}$ matrix’, and provide a necessary and sufficient condition for any $(q-2) \times (q-2)$ matrix over $\mathbb{F}_{q}$ to be the matrix representation of a permutation of the non-zero elements of $\mathbb{F}_{q}$. In the end, we give a theorem about finding more permutation polynomials from the matrix of a PPR.
Title: Permutation Polynomial Representatives and their Matrices
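The counting claim in the abstract ($q^{q}$ polynomials of degree less than $q$, of which $q!$ are permutation polynomials) can be checked by exhaustive search for small fields. The sketch below is only an illustration of that claim and of why the search space grows quickly; it does not implement the PPR construction. It assumes $q = p$ is prime, so that arithmetic in $\mathbb{F}_{p}$ is integer arithmetic modulo $p$, and all function names are illustrative.

```python
from itertools import product


def evaluate(coeffs, x, p):
    """Evaluate a polynomial at x modulo p using Horner's rule.

    coeffs lists the coefficients from lowest to highest degree.
    """
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result


def is_permutation_polynomial(coeffs, p):
    """True if x -> f(x) is a bijection on {0, ..., p-1} (p prime)."""
    return len({evaluate(coeffs, x, p) for x in range(p)}) == p


def count_permutation_polynomials(p):
    """Count permutation polynomials among all p**p polynomials of degree < p."""
    return sum(
        is_permutation_polynomial(coeffs, p)
        for coeffs in product(range(p), repeat=p)
    )


print(count_permutation_polynomials(3))  # 3! of the 3**3 polynomials
print(count_permutation_polynomials(5))  # 5! of the 5**5 polynomials
```

Already for $p = 5$ only 120 of the 3125 candidates qualify, and the fraction $p!/p^{p}$ shrinks rapidly as $p$ grows.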
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600266
Akhil Singh, P. B. Gohain, S. Chaudhari
In this paper, we investigate a distributed and heterogeneous cognitive radio network (CRN) comprising secondary users (SUs) that employ either an energy detector (ED) or an autocorrelation detector (AD) to detect the presence or absence of an orthogonal frequency-division multiplexing (OFDM) based primary user (PU). For the considered heterogeneous cooperative spectrum sensing (CSS), the optimal soft combining rule is derived. The performance of this optimal fusion rule and of different hard combining schemes such as OR, AND, and MAJORITY is presented for the case when the noise variance is exactly known. Later, the effect of noise uncertainty is also presented. The proposed heterogeneous CSS is shown to combine the excellent performance of the EDs (when the noise variance is exactly known) with the robustness of the ADs to noise uncertainty.
Title: Cooperative Sensing of OFDM Signals using Heterogeneous Sensors
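The hard combining schemes named in the abstract (OR, AND, MAJORITY) are all instances of a k-out-of-n rule at the fusion center. The following minimal sketch computes the fused probability for such rules; it assumes independent local decisions and made-up local detection probabilities, and it does not implement the optimal soft combining rule derived in the paper. The function names are illustrative.

```python
from itertools import product


def fused_probability(local_probs, k):
    """Probability that at least k independent sensors report 'PU present'.

    local_probs[i] is sensor i's probability of reporting 'present'
    (its local Pd under H1, or its local Pf under H0). Sensors may differ,
    as in a heterogeneous network mixing detector types.
    """
    total = 0.0
    for decisions in product((0, 1), repeat=len(local_probs)):
        if sum(decisions) < k:
            continue
        prob = 1.0
        for decided, p in zip(decisions, local_probs):
            prob *= p if decided else (1.0 - p)
        total += prob
    return total


def or_rule(local_probs):
    return fused_probability(local_probs, 1)


def and_rule(local_probs):
    return fused_probability(local_probs, len(local_probs))


def majority_rule(local_probs):
    return fused_probability(local_probs, len(local_probs) // 2 + 1)


# Hypothetical local detection probabilities for three sensors.
local_pd = [0.9, 0.8, 0.7]
print(or_rule(local_pd), and_rule(local_pd), majority_rule(local_pd))
```

Applying the same functions to the local false alarm probabilities gives the fused Pf, so the OR rule maximizes detection at the cost of more false alarms and the AND rule does the opposite.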
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600133
M. R. Chowdhury, S. De, N. Shukla, Ranendra N. Biswas
Air pollution monitoring systems with energy-intensive sensors cannot afford to sample frequently if the time between successive recharges is to be maximized. In this paper, we propose an energy-efficient, machine learning based sensor duty-cycling method for a sensor hub receiving data from the air-pollution sensors. In particular, we demonstrate that the temporal correlation of pollutant concentration can be exploited to select the optimum sampling period of an energy-intensive sensor, reducing sensing energy consumption without losing much information. Support Vector Regression is used to predict the samples missed during the period when the sensor is turned off.
Title: Energy-Efficient Air Pollution Monitoring with Optimum Duty-Cycling on a Sensor Hub
Pub Date: 2018-02-01. DOI: 10.1109/NCC.2018.8600190
P. R, K. S. Rao
Phoneme lattices have been shown to be a good choice for compactly encoding alternative decoding hypotheses from a speech recognition system. However, the optimal phoneme sequence is produced by tracing all the phoneme identities in the lattice. This not only makes the search space of the decoder huge, but also leaves the final phoneme sequence prone to false substitution or insertion errors. In this paper, we introduce split lattice structures, which are generated by splitting the speech frames based on the manner of articulation. The spectral flatness measure (SFM) is exploited to detect the two broad manners of articulation: sonorants and non-sonorants. Sonorants broadly include the vowels, the semivowels and the nasals, whereas the fricatives, stop consonants and closures belong to the non-sonorants. A conventional speech decoder produces one lattice per test utterance. In our work, we split the speech frames into sonorants and non-sonorants based on SFM knowledge and generate split lattices. The split lattices are then modified according to the manner of articulation in each split so as to remove the irrelevant phoneme identities. For instance, the sonorant lattice is forced to exclude the non-sonorant phoneme identities, thereby minimizing false substitution or insertion errors. The proposed split lattice structure based on sonority detection decreases the phone error rate by nearly 0.9% on the core TIMIT test corpus compared to the conventional decoding used with state-of-the-art Deep Neural Networks (DNN).
Title: Manner of Articulation based Split Lattices for Phoneme Recognition
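The spectral flatness measure used for the sonorant/non-sonorant split is commonly defined as the ratio of the geometric mean to the arithmetic mean of a frame's power spectrum. The sketch below computes it for a list of power-spectrum bins; the small `eps` guard, the threshold value and the `is_sonorant` helper are assumptions made for illustration and are not taken from the paper. The helper reflects the usual expectation that harmonic, peaky spectra (typical of sonorants) give a low SFM and noise-like spectra give a high one.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power-spectrum bins.

    Returns a value close to 1 for a flat (noise-like) spectrum and close
    to 0 for a spectrum dominated by a few strong peaks.
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / (arithmetic_mean + eps)


def is_sonorant(power_spectrum, threshold=0.3):
    """Hypothetical frame-level decision: low flatness is treated as sonorant.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return spectral_flatness(power_spectrum) < threshold


flat_frame = [1.0] * 8             # noise-like spectrum
peaky_frame = [100.0] + [0.01] * 7  # one dominant spectral peak
print(spectral_flatness(flat_frame), spectral_flatness(peaky_frame))
```

In a frame-splitting front end of the kind the abstract describes, such a per-frame decision would determine which of the two lattices a frame contributes to.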