Deep Reinforcement Learning-Based Optimization for RIS-Based UAV-NOMA Downlink Networks (Invited Paper)
Pub Date: 2022-07-07 | DOI: 10.3389/frsip.2022.915567
Shiyu Jiao, X. Xie, Zhiguo Ding
This study investigates the application of the deep deterministic policy gradient (DDPG) algorithm to reconfigurable intelligent surface (RIS)-based unmanned aerial vehicle (UAV)-assisted non-orthogonal multiple access (NOMA) downlink networks. The deployment of a UAV equipped with an RIS is important, as the UAV significantly increases the flexibility of the RIS, especially for users who have no line-of-sight (LoS) path to the base station (BS). The aim of this study is therefore to maximize the sum-rate by jointly optimizing the power allocation of the BS, the phase shifting of the RIS, and the horizontal position of the UAV. Since the formulated problem is non-convex, the DDPG algorithm is utilized to solve it. Computer simulation results demonstrate the superior performance of the proposed DDPG-based algorithm.
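To make the reinforcement learning machinery concrete, below is a minimal DDPG actor-critic update sketch in PyTorch. The state/action dimensions, network sizes, and the interpretation of the reward as the achieved sum-rate are illustrative assumptions, not the authors' exact design; in the paper's setting, the action vector would gather the BS power allocation, the RIS phase shifts, and the UAV's horizontal coordinates.

```python
# Minimal DDPG sketch (hypothetical environment, not the authors' exact model).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 8  # assumed sizes: channel state -> [powers, phases, UAV x-y]

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())  # actions squashed to [-1, 1]

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_tgt, critic_tgt = Actor(), Critic()
actor_tgt.load_state_dict(actor.state_dict())
critic_tgt.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
GAMMA, TAU = 0.99, 0.005

def ddpg_update(s, a, r, s2):
    """One DDPG step on a replay batch; r stands in for the achieved sum-rate."""
    with torch.no_grad():
        q_target = r + GAMMA * critic_tgt(s2, actor_tgt(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(s, actor(s)).mean()  # deterministic policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)  # Polyak target update

# One update on a random batch, just to show the call shape.
batch = (torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM),
         torch.randn(32, 1), torch.randn(32, STATE_DIM))
ddpg_update(*batch)
```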
{"title":"Deep Reinforcement Learning-Based Optimization for RIS-Based UAV-NOMA Downlink Networks (Invited Paper)","authors":"Shiyu Jiao, X. Xie, Zhiguo Ding","doi":"10.3389/frsip.2022.915567","DOIUrl":"https://doi.org/10.3389/frsip.2022.915567","url":null,"abstract":"This study investigates the application of deep deterministic policy gradient (DDPG) to reconfigurable intelligent surface (RIS)-based unmanned aerial vehicles (UAV)-assisted non-orthogonal multiple access (NOMA) downlink networks. The deployment of UAV equipped with a RIS is important, as the UAV increases the flexibility of the RIS significantly, especially for the case of users who have no line-of-sight (LoS) path to the base station (BS). Therefore, the aim of this study is to maximize the sum-rate by jointly optimizing the power allocation of the BS, the phase shifting of the RIS, and the horizontal position of the UAV. The formulated problem is non-convex, the DDPG algorithm is utilized to solve it. The computer simulation results are provided to show the superior performance of the proposed DDPG-based algorithm.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88427394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Nonparametric Learning and Knowledge Transfer for Object Tracking Under Unknown Time-Varying Conditions
Pub Date: 2022-07-06 | DOI: 10.3389/frsip.2022.868638
Omar Alotaibi, A. Papandreou-Suppappola
We consider the problem of a primary source tracking a moving object under time-varying and unknown noise conditions. We propose two methods that integrate sequential Bayesian filtering with transfer learning to improve tracking performance. Within the transfer learning framework, multiple sources are assumed to perform the same tracking task as the primary source but under different noise conditions. The first method uses Gaussian mixtures to model the measurement distribution, assuming that the measurement noise intensity at the learning sources is fixed and known a priori and that the learning and primary sources simultaneously track the same object. The second tracking method uses Dirichlet process mixtures to model the noise parameters, assuming that the learning-source measurement noise intensity is unknown. As we demonstrate, the use of Bayesian nonparametric learning does not require all sources to track the same object; the learned information can be stored and transferred to the primary source when needed. Using simulations under both high- and low-signal-to-noise-ratio conditions, we demonstrate that the primary source's tracking performance improves as the number of learning sources increases.
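As a toy illustration of the first method's idea, the sketch below runs one step of a particle filter whose measurement-noise density is a Gaussian mixture, with the mixture components standing in for noise intensities transferred from learning sources. The models, dimensions, and parameters are all hypothetical, not the authors' formulation.

```python
# One sequential-Bayesian (particle filter) step with a Gaussian-mixture
# measurement noise model; mixture components play the role of noise
# intensities learned by other sources. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def gm_likelihood(z, z_pred, mix_w, mix_s):
    """Mixture measurement density p(z | x) = sum_k w_k N(z; z_pred, s_k^2)."""
    comps = [w * np.exp(-0.5 * ((z - z_pred) / s) ** 2) / (s * np.sqrt(2 * np.pi))
             for w, s in zip(mix_w, mix_s)]
    return np.sum(comps, axis=0)

def pf_step(particles, weights, z, f, h, q, mix_w, mix_s):
    """Propagate, reweight with the mixture likelihood, then resample."""
    particles = f(particles) + rng.normal(0.0, q, size=particles.shape)
    weights = weights * gm_likelihood(z, h(particles), mix_w, mix_s)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)  # resampling
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy 1D random-walk target; two transferred noise intensities (0.2 and 1.0).
particles = rng.normal(0.0, 1.0, 500)
weights = np.full(500, 1.0 / 500)
particles, weights = pf_step(particles, weights, z=0.3,
                             f=lambda x: x, h=lambda x: x, q=0.05,
                             mix_w=[0.6, 0.4], mix_s=[0.2, 1.0])
print("posterior mean estimate:", particles.mean())
```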
{"title":"Bayesian Nonparametric Learning and Knowledge Transfer for Object Tracking Under Unknown Time-Varying Conditions","authors":"Omar Alotaibi, A. Papandreou-Suppappola","doi":"10.3389/frsip.2022.868638","DOIUrl":"https://doi.org/10.3389/frsip.2022.868638","url":null,"abstract":"We consider the problem of a primary source tracking a moving object under time-varying and unknown noise conditions. We propose two methods that integrate sequential Bayesian filtering with transfer learning to improve tracking performance. Within the transfer learning framework, multiple sources are assumed to perform the same tracking task as the primary source but under different noise conditions. The first method uses Gaussian mixtures to model the measurement distribution, assuming that the measurement noise intensity at the learning sources is fixed and known a priori and the learning and primary sources are simultaneously tracking the same source. The second tracking method uses Dirichlet process mixtures to model noise parameters, assuming that the learning source measurement noise intensity is unknown. As we demonstrate, the use of Bayesian nonparametric learning does not require all sources to track the same object. The learned information can be stored and transferred to the primary source when needed. Using simulations for both high- and low-signal-to-noise ratio conditions, we demonstrate the improved primary tracking performance as the number of learning sources increases.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86305609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Discrete Motion Control for Mobile Relay Networks
Pub Date: 2022-07-06 | DOI: 10.3389/frsip.2022.867388
Spilios Evmorfos, Dionysios S. Kalogerias, A. Petropulu
We consider the problem of joint beamforming and discrete motion control for mobile relay networks in dynamic channel environments. We assume a single source-destination communication pair. We adopt a general time-slotted approach in which, during each slot, every relay implements optimal beamforming and estimates its optimal position for the subsequent slot. We assume that the relays move in a 2D compact square region that has been discretized into a fine grid. The goal is to derive discrete motion policies for the relays, in an adaptive fashion, so that they accommodate the dynamic changes of the channel and, therefore, maximize the signal-to-interference-plus-noise ratio (SINR) at the destination. We present two different approaches for constructing the motion policies. The first approach assumes that the channel evolves as a Gaussian process and exhibits correlation in both time and space. A stochastic programming method is proposed for estimating the relay positions (and the beamforming weights) based on causal information. The stochastic program is equivalent to a set of simple subproblems, and exact evaluation of each subproblem's objective is impossible. To tackle this, we propose a surrogate of the original subproblem based on the Sample Average Approximation method. We call this approach model-based because it assumes that the underlying correlation structure of the channels is completely known. The second approach is called model-free because it makes no assumptions about the channel statistics. For this approach, we cast the problem of discrete relay motion control in a dynamic programming framework and employ deep Q-learning to derive the motion policies. We provide implementation details that are crucial for achieving good performance in terms of the collective SINR at the destination.
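The sketch below illustrates the dynamic programming view behind the model-free approach using tabular Q-learning on a small grid. The paper uses deep Q-learning rather than a table, and the reward here is a made-up placeholder for the destination SINR, so everything in this snippet is an illustrative assumption.

```python
# Tabular Q-learning stand-in for deep-Q relay motion control on a grid.
# The reward is a hypothetical proxy for SINR that peaks at the grid centre.
import numpy as np

rng = np.random.default_rng(1)
GRID, ACTIONS = 8, 5                     # 8x8 grid; stay/up/down/left/right
MOVES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((GRID, GRID, ACTIONS))
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def sinr_reward(pos):
    # Placeholder: in the paper this would be the measured destination SINR.
    return -np.hypot(pos[0] - GRID / 2, pos[1] - GRID / 2)

def step(pos, a):
    r, c = pos[0] + MOVES[a][0], pos[1] + MOVES[a][1]
    return (min(max(r, 0), GRID - 1), min(max(c, 0), GRID - 1))  # stay on grid

pos = (0, 0)
for t in range(20000):
    # Epsilon-greedy action selection, then one temporal-difference update.
    a = int(rng.integers(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[pos]))
    nxt = step(pos, a)
    r = sinr_reward(nxt)
    Q[pos][a] += ALPHA * (r + GAMMA * Q[nxt].max() - Q[pos][a])
    pos = nxt

# The greedy policy argmax_a Q[cell, a] is the learned motion policy; the
# highest-valued cell should be near the reward peak at the centre.
print(np.unravel_index(np.argmax(Q.max(axis=2)), (GRID, GRID)))
```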
{"title":"Adaptive Discrete Motion Control for Mobile Relay Networks","authors":"Spilios Evmorfos, Dionysios S. Kalogerias, A. Petropulu","doi":"10.3389/frsip.2022.867388","DOIUrl":"https://doi.org/10.3389/frsip.2022.867388","url":null,"abstract":"We consider the problem of joint beamforming and discrete motion control for mobile relaying networks in dynamic channel environments. We assume a single source-destination communication pair. We adopt a general time slotted approach where, during each slot, every relay implements optimal beamforming and estimates its optimal position for the subsequent slot. We assume that the relays move in a 2D compact square region that has been discretized into a fine grid. The goal is to derive discrete motion policies for the relays, in an adaptive fashion, so that they accommodate the dynamic changes of the channel and, therefore, maximize the Signal-to-Interference + Noise Ratio (SINR) at the destination. We present two different approaches for constructing the motion policies. The first approach assumes that the channel evolves as a Gaussian process and exhibits correlation with respect to both time and space. A stochastic programming method is proposed for estimating the relay positions (and the beamforming weights) based on causal information. The stochastic program is equivalent to a set of simple subproblems and the exact evaluation of the objective of each subproblem is impossible. To tackle this we propose a surrogate of the original subproblem that pertains to the Sample Average Approximation method. We denote this approach as model-based because it adopts the assumption that the underlying correlation structure of the channels is completely known. The second method is denoted as model-free, because it adopts no assumption for the channel statistics. For the scope of this approach, we set the problem of discrete relay motion control in a dynamic programming framework. Finally we employ deep Q learning to derive the motion policies. We provide implementation details that are crucial for achieving good performance in terms of the collective SINR at the destination. GRAPHICAL ABSTRACT","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"96 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91356544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Scalable Are Clade-Specific Marker K-Mer Based Hash Methods for Metagenomic Taxonomic Classification?
Pub Date: 2022-07-05 | DOI: 10.3389/frsip.2022.842513
Melissa M. Gray, Zhengqiao Zhao, G. Rosen
Efficiently and accurately identifying which microbes are present in a biological sample is important to medicine and biology; in medicine, for example, microbe identification helps doctors better diagnose diseases. Two questions are essential to metagenomic analysis (the analysis of a random sampling of DNA in a patient/environment sample): how to accurately identify the microbes in a sample, and how to efficiently update the taxonomic classifier as new microbial genomes are sequenced and added to the reference database. To investigate how classifiers change as they are trained on more knowledge, we built sub-databases composed of the genomes that existed in past years, serving as “snapshots in time” (1999–2020) of the NCBI reference genome database. We evaluated two classification methods, Kraken 2 and CLARK, with these snapshots using a real, experimental metagenomic sample from a human gut. This allowed us to measure how much of a real sample could be confidently classified by these methods as the database grows. Despite not knowing the ground truth, we could measure the concordance between methods, and between database years within each method, using the Bray-Curtis distance. We also recorded the training time of each classifier for each snapshot. For Kraken 2, we observed that as more genomes were added, more microbes from the sample were classified. CLARK showed a similar trend, but in the final year the trend reversed, with greater microbial variation and fewer unique k-mers. Finally, both classifiers, while trained differently, scale roughly linearly in time, but Kraken 2 has a significantly lower slope as the data grow.
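For intuition about how such classifiers use k-mers, here is a toy k-mer hash classifier in the spirit of Kraken 2 and CLARK: it maps reference k-mers to taxa, drops k-mers shared by several taxa (a crude stand-in for Kraken 2's lowest-common-ancestor assignment), and classifies a read by majority vote. The reference sequences and taxon names are made up for illustration.

```python
# Toy k-mer hash taxonomic classifier (illustrative, not Kraken 2/CLARK itself).
from collections import Counter

K = 5
refs = {  # hypothetical reference genomes
    "taxonA": "ATGCGTACGTTAGCATGCGTACG",
    "taxonB": "TTAGGCCATTAGGCAATTCGGCA",
}

def kmers(seq, k=K):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

# Database build: k-mers seen in more than one taxon are ambiguous and dropped.
db, seen = {}, {}
for taxon, seq in refs.items():
    for km in kmers(seq):
        seen.setdefault(km, set()).add(taxon)
for km, taxa in seen.items():
    if len(taxa) == 1:
        db[km] = next(iter(taxa))

def classify(read):
    """Majority vote over the read's k-mer hits; unmatched reads stay unlabeled."""
    hits = Counter(db[km] for km in kmers(read) if km in db)
    return hits.most_common(1)[0][0] if hits else "unclassified"

print(classify("GCGTACGTTAGC"))  # -> taxonA
print(classify("GGGGGGGGGG"))    # -> unclassified
```

As the reference database grows (more entries in `refs`), both the build time and the fraction of reads that classify change, which is the scaling behavior the study measures on real snapshots.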
{"title":"How Scalable Are Clade-Specific Marker K-Mer Based Hash Methods for Metagenomic Taxonomic Classification?","authors":"Melissa M. Gray, Zhengqiao Zhao, G. Rosen","doi":"10.3389/frsip.2022.842513","DOIUrl":"https://doi.org/10.3389/frsip.2022.842513","url":null,"abstract":"Efficiently and accurately identifying which microbes are present in a biological sample is important to medicine and biology. For example, in medicine, microbe identification allows doctors to better diagnose diseases. Two questions are essential to metagenomic analysis (the analysis of a random sampling of DNA in a patient/environment sample): How to accurately identify the microbes in samples and how to efficiently update the taxonomic classifier as new microbe genomes are sequenced and added to the reference database. To investigate how classifiers change as they train on more knowledge, we made sub-databases composed of genomes that existed in past years that served as “snapshots in time” (1999–2020) of the NCBI reference genome database. We evaluated two classification methods, Kraken 2 and CLARK with these snapshots using a real, experimental metagenomic sample from a human gut. This allowed us to measure how much of a real sample could confidently classify using these methods and as the database grows. Despite not knowing the ground truth, we could measure the concordance between methods and between years of the database within each method using a Bray-Curtis distance. In addition, we also recorded the training times of the classifiers for each snapshot. For all data for Kraken 2, we observed that as more genomes were added, more microbes from the sample were classified. CLARK had a similar trend, but in the final year, this trend reversed with the microbial variation and less unique k-mers. Also, both classifiers, while having different ways of training, generally are linear in time - but Kraken 2 has a significantly lower slope in scaling to more data.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88276451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Identification of the QRS-Complexes in Electrocardiogram Signals Using Ramanujan Filter Bank-Based Periodicity Estimation Technique
Pub Date: 2022-06-29 | DOI: 10.3389/frsip.2022.921973
S. Mukhopadhyay, S. Krishnan
Plausibly the first computerized, automated electrocardiogram (ECG) signal processing algorithm was published in 1961, and the number of algorithms developed since then for the detection of QRS-complexes in ECG signals is countless. Both digital signal processing and artificial intelligence-based techniques have been tested rigorously in many applications to achieve high accuracy in detecting the QRS-complexes in ECG signals. However, since ECG signals are quasi-periodic in nature, a periodicity-analysis-based technique is an apt approach for detecting their QRS-complexes. In this research, a Ramanujan filter bank (RFB)-based periodicity estimation technique is used to identify the QRS-complexes in ECG signals. An added advantage of the proposed algorithm is that, at the instant a QRS-complex is detected, the algorithm can efficiently indicate whether it is a normal QRS-complex, a premature ventricular contraction, or an atrial premature contraction. First, the ECG signal is preprocessed using Butterworth lowpass and highpass filters, followed by amplitude normalization. The normalized signal is then passed through a set of Ramanujan filters, and the filtered signals from all the filters in the bank are summed to obtain a holistic time-domain representation of the ECG signal. Next, a Gaussian-weighted moving-average filter is used to smooth the time-period-estimation data. Finally, the QRS-complexes are detected from the smoothed data using a peak-detection-based technique, and the abnormal ones are identified using a period-thresholding-based technique. The performance of the proposed algorithm is tested on nine ECG databases (totaling 48.91 days of recordings) and found to be highly competitive with state-of-the-art algorithms. To the best of our knowledge, this is the first reported RFB-based QRS-complex detection algorithm. The proposed algorithm can be adapted to detect other ECG waves and to process other biomedical signals that exhibit a periodic or quasi-periodic nature.
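To illustrate the core periodicity-estimation idea, the sketch below builds Ramanujan filters from the standard Von Sterneck formula for the Ramanujan sum c_q(n) and picks the period with the largest normalized filter-output energy on a synthetic pulse train. The ECG-specific stages (Butterworth preprocessing, normalization, Gaussian smoothing, thresholding) are omitted, and the filter length and period range are illustrative choices, not the paper's.

```python
# Ramanujan filter bank sketch on a synthetic quasi-periodic pulse train.
import numpy as np
from math import gcd

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # squared prime factor => mu(n) = 0
            result = -result
        p += 1
    return -result if n > 1 else result

def totient(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def ramanujan_sum(q, n):
    """c_q(n) via Von Sterneck: mu(q/d) * phi(q) / phi(q/d), d = gcd(n, q)."""
    d = gcd(n, q)
    return mobius(q // d) * totient(q) // totient(q // d)

def rfb_period_energies(x, periods):
    """Normalized output energy of x through each two-period Ramanujan filter."""
    energies = []
    for q in periods:
        h = np.array([ramanujan_sum(q, n) for n in range(2 * q)], dtype=float)
        y = np.convolve(x, h, mode="same")
        energies.append(np.sum(y ** 2) / np.sum(h ** 2))
    return np.array(energies)

x = np.zeros(1000)
x[::80] = 1.0                     # impulse train: one "QRS" every 80 samples
periods = list(range(60, 101))
q_hat = periods[int(np.argmax(rfb_period_energies(x, periods)))]
print("dominant period:", q_hat)  # expected to peak at 80
```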
{"title":"Robust Identification of the QRS-Complexes in Electrocardiogram Signals Using Ramanujan Filter Bank-Based Periodicity Estimation Technique","authors":"S. Mukhopadhyay, S. Krishnan","doi":"10.3389/frsip.2022.921973","DOIUrl":"https://doi.org/10.3389/frsip.2022.921973","url":null,"abstract":"Plausibly, the first computerized and automated electrocardiogram (ECG) signal processing algorithm was published in the literature in 1961, and since then, the number of algorithms that have been developed to-date for the detection of the QRS-complexes in ECG signals is countless. Both the digital signal processing and artificial intelligence-based techniques have been tested rigorously in many applications to achieve a high accuracy of the detection of the QRS-complexes in ECG signals. However, since the ECG signals are quasi-periodic in nature, a periodicity analysis-based technique would be an apt approach for the detection its QRS-complexes. Ramanujan filter bank (RFB)-based periodicity estimation technique is used in this research for the identification of the QRS-complexes in ECG signals. An added advantage of the proposed algorithm is that, at the instant of detection of a QRS-complex the algorithm can efficiently indicate whether it is a normal or a premature ventricular contraction or an atrial premature contraction QRS-complex. First, the ECG signal is preprocessed using Butterworth low and highpass filters followed by amplitude normalization. The normalized signal is then passed through a set of Ramanujan filters. Filtered signals from all the filters in the bank are then summed up to obtain a holistic time-domain representation of the ECG signal. Next, a Gaussian-weighted moving average filter is used to smooth the time-period-estimation data. Finally, the QRS-complexes are detected from the smoothed data using a peak-detection-based technique, and the abnormal ones are identified using a period thresholding-based technique. Performance of the proposed algorithm is tested on nine ECG databases (totaling a duration of 48.91 days) and is found to be highly competent compared to that of the state-of-the-art algorithms. To the best of our knowledge, such an RFB-based QRS-complex detection algorithm is reported here for the first time. The proposed algorithm can be adapted for the detection of other ECG waves, and also for the processing of other biomedical signals which exhibit periodic or quasi-periodic nature.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74315974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Very Fast Copy-Move Forgery Detection Method for 4K Ultra HD Images
Pub Date: 2022-06-24 | DOI: 10.3389/frsip.2022.906304
Laura Bertojo, C. Néraud, W. Puech
Copy-move forgery detection is a challenging task in digital image forensics. Keypoint-based methods have proven very effective at detecting copy-moved areas in images. Although these methods are effective, the keypoint matching phase has high complexity, making forgery detection slow, especially for very large images such as 4K Ultra HD images. In this paper, we propose a new keypoint-based method with a fast feature matching algorithm, based on the generalized two nearest-neighbor (g2NN) algorithm, that greatly reduces the complexity and thus the computation time. First, we extract keypoints from the input image. After ordering them, we perform a match search restricted to a window around the current keypoint. For keypoint detection, we propose not to use a threshold, which allows the matching of low-intensity keypoints and thus very efficient detection of copy-move forgery even in very uniform or weakly textured areas. We then apply our new matching algorithm and finally cluster the matches using the DBSCAN algorithm. Our experimental results show that the proposed method detects copy-moved areas in forged images very accurately and with a very short computation time, enabling fast forgery detection on 4K images.
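As a rough illustration of such a pipeline, the sketch below detects SIFT keypoints, applies a g2NN-style ratio test over each keypoint's sorted neighbor distances (so one keypoint can match several copies of a region), and clusters the matches with DBSCAN. It uses a brute-force distance matrix rather than the paper's windowed fast matching, and the synthetic image, thresholds, and DBSCAN parameters are all illustrative.

```python
# Keypoint copy-move sketch: SIFT + g2NN-style ratio test + DBSCAN clustering.
import cv2
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN

# Synthetic test image: blurred random texture with one patch copied elsewhere.
rng = np.random.default_rng(0)
img = cv2.GaussianBlur((rng.random((400, 400)) * 255).astype(np.uint8), (7, 7), 0)
img[250:310, 250:310] = img[50:110, 50:110]        # the copy-move forgery

sift = cv2.SIFT_create()
kps, desc = sift.detectAndCompute(img, None)

# g2NN: walk each keypoint's sorted neighbour distances d1 <= d2 <= ... and keep
# neighbours while d_i / d_{i+1} < ratio; this generalizes Lowe's 2NN test.
dists = cdist(desc, desc)                          # brute force, O(N^2) memory
matches = []
for i in range(len(kps)):
    order = np.argsort(dists[i])[1:]               # skip the self-match
    d = dists[i][order]
    for j in range(len(d) - 1):
        if d[j] / (d[j + 1] + 1e-12) >= 0.5:
            break
        src, dst = np.array(kps[i].pt), np.array(kps[order[j]].pt)
        if np.hypot(*(src - dst)) > 10:            # ignore near-identical positions
            matches.append(np.concatenate([src, dst]))

# Dense clusters of (x1, y1, x2, y2) match vectors indicate copied-moved regions.
labels = DBSCAN(eps=40, min_samples=4).fit_predict(np.array(matches))
print(f"{len(matches)} matches, {labels.max() + 1} candidate forged region(s)")
```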
{"title":"A Very Fast Copy-Move Forgery Detection Method for 4K Ultra HD Images","authors":"Laura Bertojo, C. Néraud, W. Puech","doi":"10.3389/frsip.2022.906304","DOIUrl":"https://doi.org/10.3389/frsip.2022.906304","url":null,"abstract":"Copy-move forgery detection is a challenging task in digital image forensics. Keypoint-based detection methods have proven to be very efficient to detect copied-moved forged areas in images. Although these methods are effective, the keypoint matching phase has a high complexity, which takes a long time to detect forgeries, especially for very large images such as 4K Ultra HD images. In this paper, we propose a new keypoint-based method with a new fast feature matching algorithm, based on the generalized two nearest-neighbor (g2NN) algorithm allowing us to greatly reduce the complexity and thus the computation time. First, we extract keypoints from the input image. After ordering them, we perform a match search restricted to a window around the current keypoint. To detect the keypoints, we propose not to use a threshold, which allows low intensity keypoint matching and a very efficient detection of copy-move forgery, even in very uniform or weakly textured areas. Then, we apply a new matching algorithm, and finally we compute the cluster thanks to the DBSCAN algorithm. Our experimental results show that the method we propose can detect copied-moved areas in forged images very accurately and with a very short computation time which allows for the fast detection of forgeries on 4K images.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75467445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prediction of Treatment Response in Triple Negative Breast Cancer From Whole Slide Images
Pub Date: 2022-06-22 | DOI: 10.3389/frsip.2022.851809
Peter Naylor, Tristan Lazard, G. Bataillon, M. Laé, A. Vincent-Salomon, A. Hamy, F. Reyal, Thomas Walter
The automatic analysis of stained histological sections is becoming increasingly popular. Deep Learning is today the method of choice for the computational analysis of such data and has shown spectacular results on large datasets for a wide variety of cancer types and prediction tasks. On the other hand, many scientific questions relate to small, highly specific cohorts. Such cohorts pose serious challenges for Deep Learning, which is typically trained on large datasets. In this article, we propose a modification of the standard nested cross-validation procedure for hyperparameter tuning and model selection, dedicated to the analysis of small cohorts. We also propose a new architecture for the particularly challenging question of treatment prediction, and we apply this workflow to predicting the response to neoadjuvant chemotherapy in Triple Negative Breast Cancer.
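For reference, the snippet below shows the standard nested cross-validation procedure in scikit-learn that the paper takes as its starting point; the small-cohort modification proposed in the paper is not reproduced here, and the data, model, and parameter grid are placeholders.

```python
# Standard nested cross-validation: the inner loop selects hyperparameters,
# the outer loop estimates generalization error on data never used for tuning.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=60, n_features=20, random_state=0)  # tiny cohort

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # model selection
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # error estimation

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, None], "n_estimators": [50, 200]},
    cv=inner, scoring="roc_auc")

# Each outer fold tunes on its own inner folds only, so the outer score is an
# (almost) unbiased estimate of performance on unseen patients.
scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
print(f"nested CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With cohorts this small, the variance of the outer scores is itself informative, which is one reason the standard procedure needs adaptation for small-cohort studies.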
{"title":"Prediction of Treatment Response in Triple Negative Breast Cancer From Whole Slide Images","authors":"Peter Naylor, Tristan Lazard, G. Bataillon, M. Laé, A. Vincent-Salomon, A. Hamy, F. Reyal, Thomas Walter","doi":"10.3389/frsip.2022.851809","DOIUrl":"https://doi.org/10.3389/frsip.2022.851809","url":null,"abstract":"The automatic analysis of stained histological sections is becoming increasingly popular. Deep Learning is today the method of choice for the computational analysis of such data, and has shown spectacular results for large datasets for a large variety of cancer types and prediction tasks. On the other hand, many scientific questions relate to small, highly specific cohorts. Such cohorts pose serious challenges for Deep Learning, typically trained on large datasets. In this article, we propose a modification of the standard nested cross-validation procedure for hyperparameter tuning and model selection, dedicated to the analysis of small cohorts. We also propose a new architecture for the particularly challenging question of treatment prediction, and apply this workflow to the prediction of response to neoadjuvant chemotherapy for Triple Negative Breast Cancer.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89652183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Discrimination of Cough in Audio Recordings: A Scoping Review
Pub Date: 2022-06-03 | DOI: 10.3389/frsip.2022.759684
P. Sharan
The COVID-19 virus has irrevocably changed the world since 2020; its incredible infectivity and severity have sent a majority of countries into lockdown. The virus’s incubation period can reach up to 14 days, enabling asymptomatic hosts to transmit it to many others during that period without realizing it, which makes containment difficult. Without being tested each day, which is logistically improbable, it would be very difficult for a person to know whether they had the virus during the incubation period. The objective of this scoping review is to compile the different tools used to identify coughs and to ascertain how artificial intelligence may be used to discriminate one type of cough from another. A systematic search was performed on the Google Scholar, PubMed, and MIT Libraries search engines to identify papers relevant to cough detection, discrimination, and epidemiology. A total of 204 papers were compiled and reviewed, and two datasets were discussed. Cough recording datasets such as ESC-50 and the FSDKaggle 2018 and 2019 datasets can be used to train neural networks to identify coughs. For cough discrimination, classifiers such as k-nearest neighbors (k-NN), feed-forward neural networks, and random forests are used, as well as support vector machines and naive Bayes classifiers; some methods propose hybrids. While there are many proposed ideas for cough discrimination, the method best suited for detecting COVID-19 coughs within this urgent time frame is not yet known. The main contribution of this review is to compile what has been researched on machine learning algorithms and their effectiveness in diagnosing COVID-19, and to highlight areas of debate and directions for future research. This review will aid future researchers in taking the best course of action for building a machine learning algorithm that discriminates COVID-19-related coughs with high accuracy and accessibility.
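As a minimal example of the kind of classifier the review surveys, the sketch below trains an SVM on MFCC summary features. The audio clips are synthetic stand-ins (a noisy burst for "cough", a steady tone for "other"); a real study would use recordings from datasets such as ESC-50, and every parameter here is an illustrative assumption.

```python
# Toy cough-vs-other audio classifier: MFCC clip summaries + an RBF SVM.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

SR = 16000
rng = np.random.default_rng(0)

def synth_clip(burst: bool):
    """Synthetic stand-in: a noisy burst ~ 'cough', a steady tone ~ 'other'."""
    t = np.arange(SR) / SR
    if burst:
        return np.exp(-30 * np.abs(t - 0.3)) * rng.normal(size=SR)
    return 0.5 * np.sin(2 * np.pi * 220 * t) + 0.05 * rng.normal(size=SR)

def mfcc_features(y):
    m = librosa.feature.mfcc(y=y.astype(np.float32), sr=SR, n_mfcc=13)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])  # 26-dim clip summary

X = np.array([mfcc_features(synth_clip(i < 40)) for i in range(80)])
y = np.array([1] * 40 + [0] * 40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf", C=10).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))  # near 1.0 on this toy task
```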
{"title":"Automated Discrimination of Cough in Audio Recordings: A Scoping Review","authors":"P. Sharan","doi":"10.3389/frsip.2022.759684","DOIUrl":"https://doi.org/10.3389/frsip.2022.759684","url":null,"abstract":"The COVID-19 virus has irrevocably changed the world since 2020, and its incredible infectivity and severity have sent a majority of countries into lockdown. The virus’s incubation period can reach up to 14 days, enabling asymptomatic hosts to transmit the virus to many others in that period without realizing it, thus making containment difficult. Without actively getting tested each day, which is logistically improbable, it would be very difficult for one to know if they had the virus during the incubation period. The objective of this paper’s systematic review is to compile the different tools used to identify coughs and ascertain how artificial intelligence may be used to discriminate a cough from another type of cough. A systematic search was performed on Google Scholar, PubMed, and MIT library search engines to identify papers relevant to cough detection, discrimination, and epidemiology. A total of 204 papers have been compiled and reviewed and two datasets have been discussed. Cough recording datasets such as the ESC-50 and the FSDKaggle 2018 and 2019 datasets can be used for neural networking and identifying coughs. For cough discrimination techniques, neural networks such as k-NN, Feed Forward Neural Network, and Random Forests are used, as well as Support Vector Machine and naive Bayesian classifiers. Some methods propose hybrids. While there are many proposed ideas for cough discrimination, the method best suited for detecting COVID-19 coughs within this urgent time frame is not known. The main contribution of this review is to compile information on what has been researched on machine learning algorithms and its effectiveness in diagnosing COVID-19, as well as highlight the areas of debate and future areas for research. This review will aid future researchers in taking the best course of action for building a machine learning algorithm to discriminate COVID-19 related coughs with great accuracy and accessibility.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82193573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment
Pub Date: 2022-05-12 | DOI: 10.3389/frsip.2022.917684
P. Pérez, E. González-Sosa, Jes'us Guti'errez, Narciso García
Several technological and scientific advances have recently been achieved in the field of immersive systems (e.g., 360-degree/multiview video systems, augmented/mixed/virtual reality systems, immersive audio-haptic systems, etc.), offering new possibilities for applications and services in different communication domains, such as entertainment, virtual conferencing, working meetings, social relations, healthcare, and industry. Users of these immersive technologies can explore and experience stimuli in a more interactive and personalized way than with previous technologies (e.g., 2D video). Thus, considering the new technological challenges related to these systems and the new perceptual dimensions and interaction behaviors involved, a deep understanding of the users’ Quality of Experience (QoE) is required to satisfy their demands and expectations. In this sense, it is essential to foster research on evaluating the QoE of immersive communication systems, since this will provide useful outcomes for optimizing them and for identifying the factors that can degrade the user experience. With this aim, subjective tests are usually performed following standard methodologies (e.g., ITU recommendations) designed for specific technologies and services. Although numerous user studies have already been published, there are no recommendations or standards, like those developed for images and video, that define common testing methodologies for evaluating immersive communication systems. Taking this into account, a revision of the QoE evaluation methods designed for previous technologies is required to develop robust and reliable methodologies for immersive communication systems. The objective of this paper is thus to provide an overview of existing immersive communication systems and related user studies, which can help in defining basic guidelines and testing methodologies for user tests of immersive communication systems, such as 360-degree video-based telepresence, avatar-based social VR, and cooperative AR.
{"title":"Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment","authors":"P. Pérez, E. González-Sosa, Jes'us Guti'errez, Narciso García","doi":"10.3389/frsip.2022.917684","DOIUrl":"https://doi.org/10.3389/frsip.2022.917684","url":null,"abstract":"Several technological and scientific advances have been achieved recently in the fields of immersive systems (e.g., 360-degree/multiview video systems, augmented/mixed/virtual reality systems, immersive audio-haptic systems, etc.), which are offering new possibilities to applications and services in different communication domains, such as entertainment, virtual conferencing, working meetings, social relations, healthcare, and industry. Users of these immersive technologies can explore and experience the stimuli in a more interactive and personalized way than previous technologies (e.g., 2D video). Thus, considering the new technological challenges related to these systems and the new perceptual dimensions and interaction behaviors involved, a deep understanding of the users’ Quality of Experience (QoE) is required to satisfy their demands and expectations. In this sense, it is essential to foster the research on evaluating the QoE of immersive communication systems, since this will provide useful outcomes to optimize them and to identify the factors that can deteriorate the user experience. With this aim, subjective tests are usually performed following standard methodologies (e.g., ITU recommendations), which are designed for specific technologies and services. Although numerous user studies have been already published, there are no recommendations or standards that define common testing methodologies to be applied to evaluate immersive communication systems, such as those developed for images and video. Taking this into account, a revision of the QoE evaluation methods designed for previous technologies is required to develop robust and reliable methodologies for immersive communication systems. Thus, the objective of this paper is to provide an overview of existing immersive communication systems and related user studies, which can help on the definition of basic guidelines and testing methodologies to be used when performing user tests of immersive communication systems, such as 360-degree video-based telepresence, avatar-based social VR, cooperative AR, etc.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79727027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Att-TasNet: Attending to Encodings in Time-Domain Audio Speech Separation of Noisy, Reverberant Speech Mixtures
Pub Date: 2022-05-11 | DOI: 10.3389/frsip.2022.856968
W. Ravenscroft, Stefan Goetze, Thomas Hain
Separation of speech mixtures in noisy and reverberant environments remains a challenging task for state-of-the-art speech separation systems. Time-domain audio speech separation networks (TasNets) are among the most commonly used network architectures for this task. TasNet models have demonstrated strong performance on typical speech separation baselines where the speech is not contaminated with noise; when additive or convolutive noise is present, separation performance degrades significantly. TasNets are typically constructed from an encoder network, a mask estimation network, and a decoder network. When used without any pre-processing of the input data or post-processing of the separation network's output, this design places most of the onus for enhancing the signal on the mask estimation network. In this work, multihead attention (MHA) is proposed as an additional layer in the encoder and decoder to help the separation network attend to encoded features that are relevant to the target speakers and, conversely, to suppress noisy disturbances in the encoded features. As shown in this work, incorporating MHA mechanisms into the encoder network in particular leads to a consistent performance improvement across numerous quality and intelligibility metrics under a variety of acoustic conditions on the WHAMR corpus, a dataset of noisy reverberant speech mixtures. The use of MHA is also investigated in the decoder network, where smaller performance improvements are consistently gained within specific model configurations. The best-performing MHA models yield a mean 0.6 dB scale-invariant signal-to-distortion ratio (SISDR) improvement on noisy reverberant mixtures over a baseline 1D convolution encoder, and a mean 1 dB SISDR improvement on clean speech mixtures.
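A minimal sketch of the proposed idea, assuming illustrative dimensions: a TasNet-style 1D convolutional encoder whose frame-level encodings are refined by multihead self-attention with a residual connection. This is not the authors' exact architecture; filter counts, kernel size, and head count are placeholders.

```python
# Conv1D TasNet-style encoder followed by multihead self-attention (PyTorch).
import torch
import torch.nn as nn

class AttentiveEncoder(nn.Module):
    def __init__(self, n_filters=256, kernel=16, stride=8, heads=4):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel, stride=stride)
        self.mha = nn.MultiheadAttention(n_filters, heads, batch_first=True)
        self.norm = nn.LayerNorm(n_filters)

    def forward(self, wav):                           # wav: (batch, samples)
        w = torch.relu(self.conv(wav.unsqueeze(1)))   # (batch, filters, frames)
        w = w.transpose(1, 2)                         # (batch, frames, filters)
        att, _ = self.mha(w, w, w)                    # attend across time frames
        return self.norm(w + att).transpose(1, 2)     # residual; back to (B, F, T)

enc = AttentiveEncoder()
mix = torch.randn(2, 16000)                # two 1-second mixtures at 16 kHz
print(enc(mix).shape)                      # torch.Size([2, 256, 1999])
```

The mask estimation and decoder networks would consume these attention-refined encodings exactly as they would consume plain convolutional encodings, which is why MHA slots in as an extra layer rather than a redesign.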
{"title":"Att-TasNet: Attending to Encodings in Time-Domain Audio Speech Separation of Noisy, Reverberant Speech Mixtures","authors":"W. Ravenscroft, Stefan Goetze, Thomas Hain","doi":"10.3389/frsip.2022.856968","DOIUrl":"https://doi.org/10.3389/frsip.2022.856968","url":null,"abstract":"Separation of speech mixtures in noisy and reverberant environments remains a challenging task for state-of-the-art speech separation systems. Time-domain audio speech separation networks (TasNets) are among the most commonly used network architectures for this task. TasNet models have demonstrated strong performance on typical speech separation baselines where speech is not contaminated with noise. When additive or convolutive noise is present, performance of speech separation degrades significantly. TasNets are typically constructed of an encoder network, a mask estimation network and a decoder network. The design of these networks puts the majority of the onus for enhancing the signal on the mask estimation network when used without any pre-processing of the input data or post processing of the separation network output data. Use of multihead attention (MHA) is proposed in this work as an additional layer in the encoder and decoder to help the separation network attend to encoded features that are relevant to the target speakers and conversely suppress noisy disturbances in the encoded features. As shown in this work, incorporating MHA mechanisms into the encoder network in particular leads to a consistent performance improvement across numerous quality and intelligibility metrics on a variety of acoustic conditions using the WHAMR corpus, a data-set of noisy reverberant speech mixtures. The use of MHA is also investigated in the decoder network where it is demonstrated that smaller performance improvements are consistently gained within specific model configurations. The best performing MHA models yield a mean 0.6 dB scale invariant signal-to-distortion (SISDR) improvement on noisy reverberant mixtures over a baseline 1D convolution encoder. A mean 1 dB SISDR improvement is observed on clean speech mixtures.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83744223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}