Yifan Guo, Jianxun Zhang, Yuting Lin, Jie Zhang, Bowen Li
With the growth of large shopping malls, there are many large parking lots in complex environments, which makes finding a vehicle in them more difficult. To improve the consumer experience, some car manufacturers have proposed detecting parking space numbers in parking lots. Parking space number detection in complex environments faces several problems: diverse backgrounds around the numbers, tilted number orientations, and the small scale of the numbers. Since no high-performance method has yet been proposed for these problems, a parking space number detection model based on multi-branch convolutional attention is presented. First, with ResNet50 as the backbone network, a multi-branch convolutional structure is introduced into the backbone. It processes and fuses the feature map through three parallel branches and uses convolutional attention to enhance the network's representational ability, learning global features that selectively strengthen informative features and improving the model's ability to detect parking space number regions. Second, a high-level feature enhancement unit is designed to adjust the features channel by channel, capture more spatial correlation, and reduce information loss during feature map generation. On the parking space number dataset CCAG, the model achieves a precision, recall, and F-measure of 84.8%, 84.6%, and 84.7%, respectively, showing advantages for parking space number detection.
Parking space number detection with multi-branch convolution attention. IET Signal Processing 17(6), published 2023-06-06. https://doi.org/10.1049/sil2.12226
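The three CCAG figures reported above are mutually consistent if the F-measure is the usual balanced F1 score, that is, the harmonic mean of precision and recall. The abstract does not state the weighting, so β = 1 is an assumption here. A quick check:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumes beta = 1; the abstract does not state the weighting used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported CCAG figures: precision 84.8 %, recall 84.6 %
print(round(100 * f_measure(0.848, 0.846), 1))  # 84.7
```

The harmonic mean is pulled toward the smaller of the two values. Because precision and recall are almost equal here, the F-measure falls between them.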
Schizophrenia is a disease that affects approximately 1% of the population. Early, accurate diagnosis is vital so that adequate therapy can be applied as soon as possible. We present a Statistical Discriminant Diagnosing (SDD) system that discriminates between healthy controls and schizophrenia patients and supports diagnosis by a medical professional. The system works with {feature, electrode} EEG pairs, which are selected based on the statistical significance of the p-values computed over the brain P3b wave. A bank of pre-processed and filtered evoked-potential EEG signals, recorded during an auditory odd-ball (AOD) task, serves as input to the SDD system. These EEG signals comprise 20 features and 17 electrodes, in both the time (t) and frequency (f) domains. The relevance of the Parieto-Temporal region is shown, allowing us to identify highly discriminant {feature, electrode} pairs for detecting schizophrenia, with lower p-values in both the right and left hemispheres as well as in Parieto-Temporal EEG signals. For instance, the {PSE, P4} pair yields p = 0.00003 for the (parametric) Student's t-test and p = 0.00019 for the (nonparametric) Mann-Whitney U test, both with a 15 Hz cutoff frequency for the low-pass EEG preprocessing filter. The relevance of this pair agrees with previously published related results. The proposed SDD system may provide the human expert (psychiatrist) with objective, complementary information to help in the early diagnosis of schizophrenia.
Juan I. Arribas, Luis M. San-José-Revuelta. A discriminant analysis of the P3b wave with electroencephalogram by feature-electrode pairs in schizophrenia diagnosis. IET Signal Processing 17(6), published 2023-06-06. https://doi.org/10.1049/sil2.12230
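The {feature, electrode} selection described above ranks pairs by p-value. The code below is a minimal standard-library sketch of the nonparametric side only. It implements a two-sided Mann-Whitney U test using the large-sample normal approximation, with no tie correction and no exact small-sample p-values, and sorts pairs by the resulting p-value. The helper names and toy values are hypothetical and are not data from the study. A real analysis would more likely use an established statistics package and would also run the parametric t-test.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Tied values get average ranks, but the variance tie correction is
    omitted, so p-values are approximate for heavily tied data.
    Returns (U statistic of sample x, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def rank_pairs(pairs):
    """Sort (feature, electrode) keys by p-value, most discriminant first.

    pairs maps (feature, electrode) -> (control_values, patient_values).
    """
    scored = {key: mann_whitney_u(ctrl, pat)[1] for key, (ctrl, pat) in pairs.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical toy values, not data from the study.
toy = {
    ("feature_a", "electrode_1"): ([0.9, 1.1, 1.0, 1.2, 0.95], [1.6, 1.8, 1.7, 1.9, 1.75]),
    ("feature_b", "electrode_2"): ([1.0, 1.3, 0.9, 1.1, 1.2], [1.1, 0.95, 1.25, 1.05, 1.15]),
}
for pair, p_value in rank_pairs(toy):
    print(pair, round(p_value, 4))
```

The first toy pair separates the two groups completely and ranks first. The second pair overlaps fully and gets a p-value of 1.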
Wenjing Zhou, Mingwei Shen, Min Xu, Guodong Han, Yudong Zhang
In this paper, a new sparsity-optimised Farrow structure variable fractional delay (SFS-VFD) filter is proposed to address the aperture effect in wideband arrays. Our method exploits coefficient (anti-)symmetry and optimises the number and orders of the sub-filters, greatly reducing the number of non-zero coefficients. The resulting cost function is formulated as a parametric minimisation problem with multiple regularisation constraints. It is solved by a modified three-block alternating direction method of multipliers (MTB-ADMM), which introduces correction terms for the core variables to ensure stable and fast convergence. Experimental results show that the SFS-VFD filter reduces system complexity by using fewer multipliers and adders while maintaining high delay accuracy. In a wideband array, the SFS-VFD filter effectively corrects the aperture effect and achieves precise beam pointing.
Sparsity-optimised Farrow structure variable fractional delay filter for wideband array. IET Signal Processing 17(6), published 2023-06-04. https://doi.org/10.1049/sil2.12228
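A Farrow structure realises a variable fractional delay d with fixed FIR sub-filters c_m. The sub-filter outputs are weighted by powers of d, y[n] = Σ_m d^m (c_m * x)[n], so only the scalar d changes at run time. The sketch below designs the sub-filters by cubic Lagrange interpolation. This is a conventional, non-sparse choice assumed here for illustration. It does not reproduce the sparsity-optimised sub-filters or the MTB-ADMM solver described above.

```python
def lagrange_farrow_coeffs(order):
    """Farrow sub-filters from Lagrange interpolation (illustrative design).

    Returns C with C[m][k] = coefficient of d**m in the k-th Lagrange weight
    h_k(d) = prod_{j != k} (d - j) / (k - j); row m is the m-th sub-filter.
    """
    taps = order + 1
    C = [[0.0] * taps for _ in range(taps)]
    for k in range(taps):
        poly = [1.0]  # poly[m] is the coefficient of d**m
        for j in range(taps):
            if j == k:
                continue
            scale = 1.0 / (k - j)
            nxt = [0.0] * (len(poly) + 1)
            for m, c in enumerate(poly):  # multiply by (d - j) / (k - j)
                nxt[m] -= j * c * scale
                nxt[m + 1] += c * scale
            poly = nxt
        for m, c in enumerate(poly):
            C[m][k] = c
    return C


def farrow_delay(x, d, C):
    """Delay x by d samples: y[n] = sum_m d**m * sum_k C[m][k] * x[n - k]."""
    order = len(C) - 1
    y = []
    for n in range(len(x)):
        acc = 0.0
        for m in reversed(range(order + 1)):  # Horner's rule in d
            branch = sum(C[m][k] * x[n - k] for k in range(order + 1) if n - k >= 0)
            acc = acc * d + branch
        y.append(acc)
    return y


C = lagrange_farrow_coeffs(3)          # four-tap cubic design
x = [float(n * n) for n in range(12)]  # quadratic test signal
y = farrow_delay(x, 1.5, C)            # the same sub-filters serve any d
```

Cubic Lagrange interpolation is exact for polynomials up to degree three. Once the four-tap window is full (n ≥ 3), y[n] therefore equals (n − 1.5)² exactly.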
In integrated sensing, communication, and computation (ISCC) systems, multiple sensor devices collect information about different objects and upload it to data processing servers for fusion. Appearance gaps in composite images, caused by differing capture conditions, can degrade visual quality and reduce the accuracy of subsequent image processing and analysis. The authors propose a fused-image harmonisation method that aims to eliminate appearance gaps among different objects. First, the authors modify a lightweight image harmonisation backbone and combine it with a pretrained segmentation model, whose extracted semantic features are fed to both the encoder and the decoder. Then the authors implement a semantics-related background-to-foreground style transfer by leveraging spatial separation adaptive instance normalisation (SAIN). To better preserve the input semantic information, the authors design a simple and effective semantic-aware adaptive denormalisation (SADE) module. Experimental results demonstrate that the proposed method achieves competitive performance on the iHarmony4 dataset and is beneficial for harmonising fused images with incompatible appearance gaps.
Huayan Yu, Hai Huang, Yueyan Zhu, Aoran Chen. Semantic-aware visual consistency network for fused image harmonisation. IET Signal Processing 17(6), published 2023-05-26. https://doi.org/10.1049/sil2.12219
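Adaptive instance normalisation transfers style by replacing the per-channel mean and standard deviation of one feature map with those of another. Below is a minimal sketch of a region-wise, background-to-foreground variant, applied to one flattened channel with a foreground mask. Each foreground value becomes (f − μ_fg)/σ_fg · σ_bg + μ_bg. The abstract does not define SAIN in detail. This masked formulation is therefore an assumption about the general idea, not the authors' module.

```python
import statistics


def region_adain(channel, fg_mask, eps=1e-5):
    """Restyle foreground values (mask True) with background statistics.

    Sketch only: works on one flattened feature channel (apply per channel
    in practice); eps guards against a constant, zero-variance foreground.
    """
    fg = [v for v, m in zip(channel, fg_mask) if m]
    bg = [v for v, m in zip(channel, fg_mask) if not m]
    mu_f, sd_f = statistics.fmean(fg), statistics.pstdev(fg)
    mu_b, sd_b = statistics.fmean(bg), statistics.pstdev(bg)
    return [(v - mu_f) / (sd_f + eps) * sd_b + mu_b if m else v
            for v, m in zip(channel, fg_mask)]


channel = [0.0, 1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 40.0]  # toy activations
mask = [False] * 4 + [True] * 4                          # last four = foreground
print(region_adain(channel, mask))
```

Background values pass through unchanged. The foreground keeps its relative spatial pattern but takes on the background's mean and spread.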
The range and azimuth information of a target can be obtained through coherent pulse accumulation and spectrum analysis of the traditional multiframe stepped-frequency (SF) synthetic wideband echo, enabling high-resolution two-dimensional imaging of the target. However, accumulating enough pulses requires a long beam dwell time, which cannot meet the real-time imaging requirements of high-speed moving radar platforms. To solve this problem, a scanning imaging mode combining forward-looking imaging and scanning imaging is proposed, and a target echo signal model with a scanning stepped-frequency structure is constructed. The SF pulses are grouped and transmitted according to the scanning order, and the echo pulses are sorted and reorganised. After timing compensation and range-Doppler coupling compensation, the target is located and projected. The proposed imaging mode achieves high-resolution scanning forward-looking imaging, with an azimuth resolution of approximately 0.1° within the forward-looking scanning range. It offers better real-time performance and a larger target imaging range than traditional methods. Simulation results confirm the good performance of the scanning imaging method.
Sijia Liu, Minghai Pan. Research on a forward-looking scanning imaging algorithm for a high-speed radar platform. IET Signal Processing 17(6), published 2023-05-25. https://doi.org/10.1049/sil2.12221
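For reference, the conventional stepped-frequency synthesis that the abstract starts from can be sketched for a single stationary point target. Each frequency step f0 + iΔf returns a phase proportional to the target range. An inverse DFT across the N steps then yields a synthetic range profile with bin size c/(2NΔf). Target motion, range-Doppler coupling compensation, and the proposed scanning mode are not modelled, and all parameter values are hypothetical.

```python
import cmath
import math

C = 3e8  # propagation speed in m/s (rounded)


def sf_echo(target_range, f0, df, n_steps):
    """Baseband samples of one stepped-frequency burst for a stationary point target."""
    return [cmath.exp(-1j * 4 * math.pi * (f0 + i * df) * target_range / C)
            for i in range(n_steps)]


def range_profile(echo):
    """Synthetic range profile: normalised magnitude of the inverse DFT across steps."""
    n = len(echo)
    return [abs(sum(s * cmath.exp(2j * math.pi * i * k / n) for i, s in enumerate(echo))) / n
            for k in range(n)]


n_steps, df, f0 = 64, 1e6, 10e9          # hypothetical burst parameters
delta_r = C / (2 * n_steps * df)         # range bin size, about 2.34 m here
profile = range_profile(sf_echo(30 * delta_r, f0, df, n_steps))
peak_bin = max(range(n_steps), key=profile.__getitem__)
print(peak_bin)  # 30
```

All N pulses must be collected before the profile can be formed. That dwell time is what the abstract identifies as the obstacle on high-speed platforms. The unambiguous range here is c/(2Δf) = 150 m.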
Liangliang Li, Huaguo Zhang, Songmao Du, Tao Liang, Lin Gao
In non-cooperative scenarios, the spreading sequences or waveforms of direct sequence spread spectrum (DSSS) signals are unknown to the receiver. This paper addresses blind estimation of the spreading waveform under multipath channels. When only the direct signal path is present, the spreading sequences can be obtained directly from the estimated spreading waveforms. In multipath channels, however, the spreading waveform becomes the convolution of the spreading sequence and the channel response, so deconvolution must also be performed after the spreading waveforms are estimated. To perform blind despreading and deconvolution of asynchronous multiuser DSSS signals under multipath channels, the authors exploit the finite-alphabet property of the information symbols and spreading sequences and adopt the iterative least squares with projection method. In addition, the Cramér-Rao bound of the spreading waveforms is derived for this setting as a performance benchmark. Simulation experiments verify the effectiveness of the proposed method.
Blind despreading and deconvolution of asynchronous multiuser direct sequence spread spectrum signals under multipath channels. IET Signal Processing 17(5), published 2023-05-16. https://doi.org/10.1049/sil2.12220
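The finite-alphabet idea can be illustrated on a simplified, hypothetical case: one synchronous user, one symbol per row, and no overlap between consecutive symbols. Each row of observations is then Y_k ≈ b_k w, where b_k ∈ {−1, +1} and w is the convolution of the spreading sequence with the channel. Iterative least squares with projection alternates two steps. It forms a least-squares symbol estimate and projects it onto {−1, +1}, then updates w by least squares. This sketch is not the authors' asynchronous multiuser algorithm. It recovers the effective waveform only up to sign, and the subsequent deconvolution is not shown.

```python
import random


def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ilsp_rank1(Y, w0, iterations=10):
    """Estimate waveform w and +/-1 symbols b from rows Y[k] ~ b[k] * w.

    Each pass projects the least-squares symbol estimates onto {-1, +1},
    then re-solves for w by least squares given those symbols.
    """
    w = list(w0)
    for _ in range(iterations):
        b = [1.0 if sum(y * wi for y, wi in zip(row, w)) >= 0 else -1.0 for row in Y]
        w = [sum(bk * row[i] for bk, row in zip(b, Y)) / len(Y) for i in range(len(w))]
    return w, b


rng = random.Random(0)
code = [1, -1, 1, 1, -1, -1, 1, -1]  # hypothetical spreading sequence
channel = [1.0, 0.5]                 # hypothetical two-path channel
w_true = convolve(code, channel)     # effective spreading waveform
symbols = [rng.choice([-1.0, 1.0]) for _ in range(200)]
Y = [[s * wi + rng.gauss(0, 0.1) for wi in w_true] for s in symbols]
w_est, b_est = ilsp_rank1(Y, Y[0])   # initialise with the first observed row
```

The estimate w_est matches w_true up to a common sign. Recovering the spreading sequence itself would require a further deconvolution against the unknown channel.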
Jincheng Yang, Shiwen Chen, Jinpeng Dong, Xiao Han
It is difficult for a receiver to intercept signals from a radar system that emits low probability of intercept (LPI) polyphase coded signals. The traditional Wigner Hough transform (WHT) algorithm requires a large amount of computation and takes a long time to estimate the parameters of LPI radar polyphase coded signals. To address this problem, an iterative angle search (IAS) algorithm is proposed which, when combined with the WHT algorithm, significantly reduces the computational cost. In simulation experiments with signal-to-noise ratios from −4 to 20 dB, the carrier frequency, number of subcodes, and number of carrier cycles per subcode are accurately estimated for five polyphase coded signals: Frank, P1, P2, P3, and P4. With the selected IAS parameters, the proposed method achieves the same estimation accuracy as the traditional WHT algorithm, while its operation time is only 5.14% of that of the traditional method. The IAS algorithm therefore has promising application prospects. Experiments indicate that the proposed algorithm provides excellent performance and can rapidly and accurately estimate the parameters of LPI polyphase codes.
A fast Wigner Hough transform algorithm for parameter estimation of low probability of intercept radar polyphase coded signals. IET Signal Processing 17(5), published 2023-05-13. https://doi.org/10.1049/sil2.12224
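The abstract does not detail the IAS procedure. Assume it refines the Hough angle coarse-to-fine instead of evaluating every angle on a dense grid. The cost saving can then be illustrated as below. Here `score` stands in for the WHT line integral at a given angle; it is a hypothetical placeholder (a smooth single-peaked function), and the grid sizes are arbitrary.

```python
def iterative_angle_search(score, lo=0.0, hi=180.0, points=11, iterations=6):
    """Coarse-to-fine maximisation of score(theta) over [lo, hi] degrees.

    Each round evaluates a uniform grid, keeps the best angle, and shrinks
    the interval to one grid step on either side of it.
    Returns (best angle, number of score evaluations).
    """
    evals = 0
    best = lo
    for _ in range(iterations):
        step = (hi - lo) / (points - 1)
        grid = [lo + i * step for i in range(points)]
        values = [score(t) for t in grid]
        evals += points
        best = grid[values.index(max(values))]
        lo, hi = max(best - step, lo), min(best + step, hi)
    return best, evals


# Hypothetical stand-in for a WHT line integral peaking at 37.3 degrees.
best, evals = iterative_angle_search(lambda t: -(t - 37.3) ** 2)
print(best, evals)
```

Six rounds of 11 evaluations (66 in total) reach a final grid step of about 0.006°. A single uniform grid at that step over 180° would need roughly 31,000 evaluations. The coarse-to-fine search assumes the score is unimodal around the true angle; on a noisy Wigner-Ville distribution it could lock onto a side peak.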
Chia-Hung Lin, Hsiang-Yueh Lai, Ping-Tzan Huang, Pi-Yun Chen, Chien-Ming Li
Human speech signals may contain specific information about a speaker's characteristics, and these signals can be very useful in applications involving interactive voice response (IVR) and automatic speech recognition (ASR). In IVR and ASR applications, classifying speakers into age and gender groups can support human–machine or computer-based interaction systems for customised advertising, translation (text generation), machine dialogue systems, or self-service applications. An IVR-based system therefore requires ASR to work from users' voices (specific voice-frequency bands) to identify customers' age and gender and to interact with a host system. In the present study, we combine a pitch detection (PD)-based extractor and a voice classifier for gender identification. A PD method based on Yet Another Algorithm for Pitch Tracking (YAAPT) extracts the voice fundamental frequency (F0) from non-stationary voice signals. This allows us to identify gender by distinguishing differences in F0 between adult females and males, and to classify voices into adult and child groups. Then, for vowel voice signal classification, a one-dimensional (1D) convolutional neural network (CNN) is used. It consists of a multi-round 1D kernel convolutional layer, a 1D pooling process, and a vowel classifier that preliminarily divides feature patterns into three F0 level ranges covering the adult and child groups. Finally, a classifier in the classification layer identifies the speakers' gender. The proposed PD-based extractor and voice classifier can reduce complexity and improve classification efficiency. Acoustic datasets were selected from the Hillenbrand database for experimental tests on the classification of 12 vowels, and K-fold cross-validation was performed. The experimental results, quantified in terms of recall (%), precision (%), accuracy (%), and F1 score, indicate that the proposed approach is very promising.
Vowel classification with combining pitch detection and one-dimensional convolutional neural network based classifier for gender identification. IET Signal Processing 17(5), published 2023-05-13. https://doi.org/10.1049/sil2.12216
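YAAPT itself combines time- and frequency-domain cues with dynamic programming and is considerably more involved. As a much cruder stand-in, the sketch below estimates F0 from the autocorrelation peak within an assumed 80 to 400 Hz search range and tests it on a synthetic 120 Hz tone. The gender and age decisions that follow depend on F0 ranges the abstract does not give, so they are omitted.

```python
import math


def estimate_f0(signal, fs, fmin=80.0, fmax=400.0):
    """Crude F0 estimate: the lag maximising the raw autocorrelation.

    Only lags corresponding to fmin..fmax (an assumed search range) are
    considered; this is not YAAPT.
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)

    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs / best_lag


fs = 8000
tone = [math.sin(2 * math.pi * 120 * n / fs) for n in range(1600)]  # synthetic 120 Hz voicing
print(estimate_f0(tone, fs))
```

Integer lags limit the resolution to fs/lag steps, so the estimate lands within about 1 Hz of 120 Hz rather than exactly on it. Restricting the lag range also keeps the search from picking the second period (an octave error).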
Hina Ayaz, Ghulam Abbas, Muhammad Waqas, Ziaul Haq Abbas, Muhammad Bilal, Ali Nauman, Muhammad Ali Jamshed
It is anticipated that sixth-generation (6G) systems will present new security challenges while offering improved features and new directions for security in vehicular communication, which may lead to a new breed of adaptive and context-aware security protocols. By leveraging the physical layer and introducing security controls there, physical layer security solutions are strong candidates for low-complexity, low-delay, low-footprint, adaptable, extensible, and context-aware security schemes. A novel physical layer security scheme is proposed that uses radio frequency fingerprinting (RF-FP) for location estimation, with RF-FP values collected at different points within the cell. Based on the estimated location, the nearest road-side unit for sending the information signal is then identified. Next, the effects on secrecy capacity (SC) and secrecy outage probability (SOP) in the presence of multiple eavesdroppers per unit time are analysed. Simulations show that the proposed RF-FP scheme increases SC by up to 25% relative to the benchmarks at the same signal-to-noise ratio (SNR). It also reduces SOP by up to 30% relative to the benchmark scheme at the same SNR. Thus, the proposed RF-FP-based location estimation provides better results than existing physical layer security schemes.
{"title":"Physical layer security analysis using radio frequency-fingerprinting in cellular-V2X for 6G communication","authors":"Hina Ayaz, Ghulam Abbas, Muhammad Waqas, Ziaul Haq Abbas, Muhammad Bilal, Ali Nauman, Muhammad Ali Jamshed","doi":"10.1049/sil2.12225","DOIUrl":"10.1049/sil2.12225","url":null,"abstract":"<p>It is anticipated that sixth-generation (6G) systems would present new security challenges while offering improved features and new directions for security in vehicular communication, which may result in the emergence of a new breed of adaptive and context-aware security protocol. Physical layer security solutions can compete for low-complexity, low-delay, low-footprint, adaptable, extensible, and context-aware security schemes by leveraging the physical layer and introducing security controls. A novel physical layer security scheme that employs the concept of radio frequency fingerprinting (RF-FP) for location estimation is proposed, wherein the RF-FP values are collected at different points with in the cell. Then, based on the estimated location, the nearest possible road-side unit for sending the information signal is located. After this, the effects on secrecy capacity (SC) and secrecy outage probability (SOP) in the presence of multiple eavesdropper per unit time are analysed. It has been shown via simulations that the proposed RF-FP scheme increases SC by up to 25% for the same signal-to-noise ratio (SNR) values as those of the benchmarks, while the SOP tends to decrease by up to 30% as compared to the benchmark scheme for the same SNR value. Thus, the proposed RF-FP-based location estimation provides much better results as compared to the existing physical layer security schemes.</p>","PeriodicalId":56301,"journal":{"name":"IET Signal Processing","volume":"17 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/sil2.12225","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48430111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Engineering and Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
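The processing chain in the abstract above (fingerprint-based location estimation, nearest-RSU selection, then secrecy metrics) can be illustrated with a minimal Python sketch. It rests on several assumptions, none of which come from the paper:

- a weighted k-nearest-neighbour matcher over received-signal-strength fingerprints;
- Rayleigh fading, so that instantaneous SNRs are exponentially distributed;
- a fixed number of eavesdroppers;
- the standard secrecy capacity C_s = [log2(1 + SNR_B) - log2(1 + max_e SNR_E)]^+.

The helper names (`estimate_location`, `nearest_rsu`, `secrecy_capacity`, `secrecy_outage_probability`) and the toy fingerprint and RSU tables are likewise hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical fingerprint database: (reference position in metres,
# RSS vector in dBm measured from three base stations at that position).
FINGERPRINTS: List[Tuple[Point, List[float]]] = [
    ((0.0, 0.0), [-40.0, -70.0, -80.0]),
    ((10.0, 0.0), [-70.0, -40.0, -80.0]),
    ((0.0, 10.0), [-70.0, -80.0, -40.0]),
    ((10.0, 10.0), [-60.0, -60.0, -60.0]),
]

# Hypothetical road-side unit positions.
RSUS: Dict[str, Point] = {"RSU-A": (1.0, 1.0), "RSU-B": (9.0, 9.0)}


def estimate_location(rss: Sequence[float],
                      db: List[Tuple[Point, List[float]]],
                      k: int = 3) -> Point:
    """Weighted k-NN fingerprint matching in RSS space (an assumed matcher)."""
    ranked = sorted((math.dist(rss, ref), pos) for pos, ref in db)
    nearest = ranked[:k]
    # Inverse-distance weights; the small epsilon avoids division by zero
    # when the measurement matches a stored fingerprint exactly.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return (x, y)


def nearest_rsu(location: Point, rsus: Dict[str, Point]) -> str:
    """Name of the RSU closest (Euclidean distance) to the estimated location."""
    return min(rsus, key=lambda name: math.dist(location, rsus[name]))


def secrecy_capacity(snr_bob: float, snr_eves: Sequence[float]) -> float:
    """[log2(1 + SNR_B) - log2(1 + max SNR_E)]^+ in bit/s/Hz; SNRs are linear."""
    worst_eve = max(snr_eves, default=0.0)
    return max(0.0, math.log2(1.0 + snr_bob) - math.log2(1.0 + worst_eve))


def secrecy_outage_probability(mean_snr_bob: float, mean_snr_eve: float,
                               n_eves: int, target_rate: float,
                               trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(C_s < target_rate) under Rayleigh fading."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # Rayleigh fading gives exponentially distributed instantaneous SNRs.
        g_bob = rng.expovariate(1.0 / mean_snr_bob)
        g_eves = [rng.expovariate(1.0 / mean_snr_eve) for _ in range(n_eves)]
        if secrecy_capacity(g_bob, g_eves) < target_rate:
            outages += 1
    return outages / trials


# Usage: locate a vehicle, pick its RSU, and evaluate the secrecy metrics.
location = estimate_location([-41.0, -69.0, -79.0], FINGERPRINTS, k=1)
serving_rsu = nearest_rsu(location, RSUS)
cs = secrecy_capacity(15.0, [3.0, 1.0])
sop = secrecy_outage_probability(10.0, 1.0, n_eves=3, target_rate=1.0)
print(location, serving_rsu, cs, round(sop, 3))
```

Taking the strongest eavesdropper's SNR treats the eavesdroppers as non-colluding, with the best-placed one setting the leakage. This is a common simplification and may differ from the paper's model of multiple eavesdroppers per unit time.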
This article deals with peak-to-average power ratio (PAPR) reduction in single- and multi-user orthogonal chirp division multiplexing (OCDM) systems. Two PAPR reduction methods based on selecting the frequency variation (up or down) of the chirps are first presented for a single-user system. The first technique generates two OCDM signals, one with up chirps and one with down chirps, and selects the one with the lower PAPR. The second method is based on conventional clipping; in that case, chirp selection aims to reduce the clipping noise. An adapted receiver is presented, based on maximum likelihood estimation of the frequency variation (up or down) of the chirps. Then, a general procedure for multi-user OCDM transmission is introduced. In it, a sub-band of the available bandwidth is dedicated to each user, and the frequency of that user's chirps varies within this sub-band. Next, the PAPR reduction techniques are generalised to this multi-user OCDM system. Moreover, a performance analysis of the first PAPR reduction method is developed, and simulations show that theoretical and numerical results match for both Nyquist-rate and oversampled signals. It is also shown that chirp selection reduces the clipping noise and improves the bit error rate performance compared with clipping alone.
{"title":"Peak to average power ratio reduction techniques based on chirp selection for single and multi-user orthogonal chirp division multiplexing system","authors":"Vincent Savaux","doi":"10.1049/sil2.12215","DOIUrl":"10.1049/sil2.12215","url":null,"abstract":"<p>This article deals with peak to average power (PAPR) reduction in a single and multi-user orthogonal chirp division multiplexing (OCDM) context. Two methods for PAPR reduction based on the selection of the frequency variation (up or down) of the chirps are first presented in a single user system. The first technique consists in considering two OCDM signals generated with up and down chirps, respectively, and selecting the one offering lowest PAPR. The second PAPR reduction method is based on usual clipping, and in that case the chirp selection aims to reduce the clipping noise. An adapted receiver is presented, based on the maximum likelihood estimation of the frequency variation (up or down) of the chirp. Then, a general procedure for multi-user OCDM transmission is introduced, where a sub-band of the available bandwidth is dedicated to each user, whose frequency of the chirps varies within this sub-band. Next, the PAPR reduction techniques are generalised to this multi-user OCDM system. Moreover, a performance analysis of the first PAPR reduction method is developed, and it is shown through simulations that theoretical and numerical results match for both Nyquist rate and oversampled signals. It is also shown that the chirp selection reduces the clipping noise, and improves the bit error rate performance compared with clipping only.</p>","PeriodicalId":56301,"journal":{"name":"IET Signal Processing","volume":"17 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/sil2.12215","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41276464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Engineering and Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
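The first chirp-selection technique in the abstract above can be sketched in Python: generate the OCDM symbol with up chirps and with down chirps, then keep the one with the lower PAPR. The sketch makes several assumptions:

- it uses one common discrete Fresnel transform convention for even N;
- it takes the "down" kernel to be the complex conjugate of the "up" kernel;
- it works at the Nyquist rate without oversampling;
- it returns the chosen direction as an explicit flag, whereas the paper's receiver estimates the direction by maximum likelihood.

The helper names and the QPSK test symbols are illustrative only.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def chirp_kernel(n: int, k: int, N: int, up: bool) -> complex:
    """Element (n, k) of an inverse discrete Fresnel transform matrix, N even.

    The 'down' kernel is the complex conjugate of the 'up' kernel; which one
    is labelled up or down depends on the sign convention (an assumption here).
    """
    sign = 1.0 if up else -1.0
    phase = sign * (math.pi / 4 - math.pi * (n - k) ** 2 / N)
    return cmath.exp(1j * phase) / math.sqrt(N)


def ocdm_modulate(x: Sequence[complex], up: bool) -> List[complex]:
    """Superimpose N chirps, each carrying one data symbol x[k]."""
    N = len(x)
    return [sum(x[k] * chirp_kernel(n, k, N, up) for k in range(N))
            for n in range(N)]


def ocdm_demodulate(s: Sequence[complex], up: bool) -> List[complex]:
    """Invert the modulation: the kernel matrix is unitary for even N,
    so its conjugate transpose is its inverse."""
    N = len(s)
    return [sum(s[n] * chirp_kernel(n, k, N, up).conjugate() for n in range(N))
            for k in range(N)]


def papr_db(s: Sequence[complex]) -> float:
    """Peak-to-average power ratio in dB."""
    powers = [abs(v) ** 2 for v in s]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def select_chirp_direction(
        x: Sequence[complex]) -> Tuple[bool, List[complex], float]:
    """Return (up?, transmitted signal, PAPR in dB) for the lower-PAPR option."""
    s_up, s_down = ocdm_modulate(x, up=True), ocdm_modulate(x, up=False)
    p_up, p_down = papr_db(s_up), papr_db(s_down)
    if p_up <= p_down:
        return True, s_up, p_up
    return False, s_down, p_down


# Usage: one OCDM symbol of 16 random QPSK symbols.
rng = random.Random(1)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
        for _ in range(16)]
use_up, tx, chosen_papr = select_chirp_direction(qpsk)
rx = ocdm_demodulate(tx, use_up)
print("up" if use_up else "down", round(chosen_papr, 2), "dB")
```

The two kernels are conjugates of each other. As a result, the down-chirp signal for symbols x equals the complex conjugate of the up-chirp signal for conj(x). The two candidates therefore generally have different peaks, which is what makes the selection worthwhile. The cost is one extra transform per OCDM symbol at the transmitter.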