Pub Date: 2021-07-10 | DOI: 10.19139/SOIC-2310-5070-1092
Saeid Pourmand, Ashkan Shabbak, M. Ganjali
Due to the extensive use of high-dimensional data across a wide range of scientific fields, dimensionality reduction has become a major part of the preprocessing step in machine learning. Feature selection is one procedure for reducing dimensionality: instead of using the whole set of features, a subset is selected for use in the learning model. Feature selection (FS) methods fall into three main categories: filters, wrappers, and embedded approaches. Filter methods depend only on the characteristics of the data and do not rely on the learning model at hand. Divergence functions, as measures of the difference between probability distribution functions, can serve as filter methods for feature selection. In this paper, the performance of several divergence functions, such as the Jensen-Shannon (JS) divergence and the exponential divergence (EXP), is compared with that of some of the best-known filter feature selection methods, such as Information Gain (IG) and Chi-Squared (CHI). The comparison is based on the accuracy and F1-score of classification models built after applying these feature selection methods.
{"title":"Feature Selection Based on Divergence Functions: A Comparative Classiffication Study","authors":"Saeid Pourmand, Ashkan Shabbak, M. Ganjali","doi":"10.19139/SOIC-2310-5070-1092","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-1092","url":null,"abstract":"Due to the extensive use of high-dimensional data and its application in a wide range of scientifc felds of research, dimensionality reduction has become a major part of the preprocessing step in machine learning. Feature selection is one procedure for reducing dimensionality. In this process, instead of using the whole set of features, a subset is selected to be used in the learning model. Feature selection (FS) methods are divided into three main categories: flters, wrappers, and embedded approaches. Filter methods only depend on the characteristics of the data, and do not rely on the learning model at hand. Divergence functions as measures of evaluating the differences between probability distribution functions can be used as flter methods of feature selection. In this paper, the performances of a few divergence functions such as Jensen-Shannon (JS) divergence and Exponential divergence (EXP) are compared with those of some of the most-known flter feature selection methods such as Information Gain (IG) and Chi-Squared (CHI). 
This comparison was made through accuracy rate and F1-score of classifcation models after implementing these feature selection methods.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"198 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80039633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
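As an illustration of a divergence-based filter, the following sketch scores each feature by the Jensen-Shannon divergence between histogram estimates of its class-conditional distributions. The binning scheme and binary-class setup are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (in nats)."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_filter_scores(X, y, bins=10):
    """Score each feature by the JS divergence between its class-conditional
    histograms (binary classification); higher = more discriminative."""
    scores = []
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        h0, _ = np.histogram(X[y == 0, j], bins=edges)
        h1, _ = np.histogram(X[y == 1, j], bins=edges)
        scores.append(js_divergence(h0.astype(float), h1.astype(float)))
    return np.array(scores)
```

Ranking features by these scores and keeping the top-k is the filter step; the learning model never enters the computation, which is what distinguishes filters from wrappers.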
Pub Date: 2021-07-10 | DOI: 10.19139/soic-2310-5070-1220
Abd-Elmonem A. M. Teamah, Ahmed A. Elbanna, Ahmed M. Gemeay
Heavy-tailed distributions play a major role in the study of risk data sets, and statisticians frequently search for new or relatively new statistical models to fit data from different fields. This article introduces a relatively new heavy-tailed statistical model, obtained by applying the alpha power transformation to the exponentiated log-logistic distribution, called the alpha power exponentiated log-logistic distribution. Its statistical properties, such as moments, the moment generating function, the quantile function, entropy, inequality curves, and order statistics, are derived mathematically. Five estimation methods are presented, and the behaviour of the model parameters under these methods is checked on randomly generated data sets. Some actuarial measures, such as value at risk, tail value at risk, tail variance, and tail variance premium, are also derived; numerical values of these measures show that the proposed distribution has a heavier tail than the compared models. Finally, three real data sets from different fields are used to show that the proposed model fits them better than many well-known related models.
{"title":"Heavy-Tailed Log-Logistic Distribution: Properties, Risk Measures and Applications","authors":"Abd-Elmonem A. M. Teamah, Ahmed A. Elbanna, Ahmed M. Gemeay","doi":"10.19139/soic-2310-5070-1220","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1220","url":null,"abstract":"Heavy tailed distributions have a big role in studying risk data sets. Statisticians in many cases search and try to find new or relatively new statistical models to fit data sets in different fields. This article introduced a relatively new heavy-tailed statistical model by using alpha power transformation and exponentiated log-logistic distribution which called alpha power exponentiated log-logistic distribution. Its statistical properties were derived mathematically such as moments, moment generating function, quantile function, entropy, inequality curves and order statistics. Five estimation methods were introduced mathematically and the behaviour of the proposed model parameters was checked by randomly generated data sets and these estimation methods. Also, some actuarial measures were deduced mathematically such as value at risk, tail value at risk, tail variance and tail variance premium. Numerical values for these measures were performed and proved that the proposed distribution has a heavier tail than others compared models. 
Finally, three real data sets from different fields were used to show how these proposed models fitting these data sets than other many wells known and related models.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"103 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88980312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
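The kind of estimation check described above can be illustrated on the simpler one-parameter Lindley building block (the proposed four-parameter density is not reproduced here). The mixture sampler and the closed-form maximum likelihood estimator below are standard results for the base Lindley distribution; treat this as a hedged sketch of a simulate-then-estimate check, not the paper's procedure.

```python
import numpy as np

def lindley_sample(theta, n, rng):
    """Draw from the one-parameter Lindley distribution via its mixture
    representation: Exp(theta) w.p. theta/(1+theta), else Gamma(2, theta)."""
    exp_part = rng.exponential(1.0 / theta, n)
    gamma_part = rng.gamma(2.0, 1.0 / theta, n)
    pick = rng.random(n) < theta / (1.0 + theta)
    return np.where(pick, exp_part, gamma_part)

def lindley_mle(x):
    """Closed-form ML estimator of theta: the positive root of
    xbar*theta^2 + (xbar - 1)*theta - 2 = 0 (from the score equation)."""
    xbar = x.mean()
    return (-(xbar - 1.0) + np.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)
```

Repeating this over many replications and sample sizes gives the bias/MSE tables that such simulation studies report.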
Pub Date: 2021-06-28 | DOI: 10.19139/soic-2310-5070-702
Wirote Apinantanakon, K. Sunat, S. Chiewchanwattana
The fruit fly optimization algorithm (FOA), a swarm-based nature-inspired optimization algorithm, has a simple structure and is easy to implement. However, FOA has a low success rate and slow convergence because it generates new positions around the best location using a fixed search radius. Several improved FOAs have been proposed, but their exploration ability is questionable. To make the search process transition smoothly from the exploration phase to the exploitation phase, this paper proposes a new FOA constructed from a cooperation of multileader and probabilistic random walk strategies (CPFOA), which involves two population types working together. CPFOA's performance is evaluated on 18 well-known standard benchmarks. The results show that CPFOA outperforms the original FOA and its variants in both convergence speed and accuracy, and achieves very promising accuracy compared with well-known competitive algorithms. CPFOA is applied to optimize two applications: classifying real datasets with a multilayer perceptron and extracting the parameters of a very compact T-S fuzzy system to model the Box-Jenkins gas furnace data set. CPFOA successfully finds parameters of very high quality compared with the best-known competitive algorithms.
{"title":"A Cooperation of the Multileader Fruit Fly and Probabilistic Random Walk Strategies with Adaptive Normalization for Solving the Unconstrained Optimization Problems","authors":"Wirote Apinantanakon, K. Sunat, S. Chiewchanwattana","doi":"10.19139/soic-2310-5070-702","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-702","url":null,"abstract":"A swarm-based nature-inspired optimization algorithm, namely, the fruit fly optimization algorithm (FOA), hasa simple structure and is easy to implement. However, FOA has a low success rate and a slow convergence, because FOA generates new positions around the best location, using a fixed search radius. Several improved FOAs have been proposed. However, their exploration ability is questionable. To make the search process smooth, transitioning from the exploration phase to the exploitation phase, this paper proposes a new FOA, constructed from a cooperation of the multileader and the probabilistic random walk strategies (CPFOA). This involves two population types working together. CPFOAs performance is evaluated by 18 well-known standard benchmarks. The results showed that CPFOA outperforms both the original FOA and its variants, in terms of convergence speed and performance accuracy. The results show that CPFOA can achieve a very promising accuracy, when compared with the well-known competitive algorithms. CPFOA is applied to optimize twoapplications: classifying the real datasets with multilayer perceptron and extracting the parameters of a very compact T-S fuzzy system to model the Box and Jenkins gas furnace data set. 
CPFOA successfully find parameters with a very high quality, compared with the best known competitive algorithms.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78420323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
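A minimal sketch of the baseline FOA makes the fixed-radius limitation visible: the swarm is always sampled uniformly in the same box around the best-known point, so late-stage accuracy is radius-limited. This is the plain algorithm, not CPFOA, and the sphere objective, population size, and radius are arbitrary illustrative choices.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def basic_foa(f, dim=2, pop=20, iters=300, radius=0.5, seed=0):
    """Minimal fruit fly optimization loop: each generation samples the
    swarm uniformly in a fixed-radius box around the best-known point.
    (The fixed radius is exactly the limitation CPFOA addresses.)"""
    rng = np.random.default_rng(seed)
    best = rng.uniform(-4.0, 4.0, dim)
    best_val = f(best)
    for _ in range(iters):
        cand = best + rng.uniform(-radius, radius, (pop, dim))
        vals = np.array([f(c) for c in cand])
        i = int(np.argmin(vals))
        if vals[i] < best_val:          # greedy smell-based update
            best, best_val = cand[i], vals[i]
    return best, best_val
```

Multileader and random-walk variants replace the single `best` with several leaders and occasionally take larger probabilistic steps, which is the cooperation CPFOA formalizes.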
Pub Date: 2021-05-08 | DOI: 10.19139/SOIC-2310-5070-1202
Marah-Lisanne Thormann, Jan Farchmin, Christoph Weisser, René-Marcel Kruse, Benjamin Säfken, A. Silbersdorff
Predicting the trend of stock prices is a central topic in financial engineering. Given the complexity and nonlinearity of the underlying processes, we consider the use of neural networks in general, and sentiment analysis in particular, for the analysis of financial time series. As one of the biggest social media platforms, with a user base across the world, Twitter offers huge potential for such sentiment analysis; in fact, stocks themselves are a popular topic in Twitter discussions. Due to the real-time nature of this collective information, quasi-contemporaneous signals can be harvested for the prediction of financial trends. In this study, we give an introduction to financial feature engineering and to the architecture of a Long Short-Term Memory (LSTM) network for tackling the highly nonlinear problem of forecasting stock prices. The paper presents a guide for collecting past tweets, processing them for sentiment analysis, and combining them with technical financial indicators to forecast Apple stock prices 30 and 60 minutes ahead. An LSTM with the lagged close price is used as a baseline model. We show that a combination of financial and Twitter features outperforms the baseline in all settings. The code to fully replicate our forecasting approach is available in the Appendix.
{"title":"Stock Price Predictions with LSTM Neural Networks and Twitter Sentiment","authors":"Marah-Lisanne Thormann, Jan Farchmin, Christoph Weisser, René-Marcel Kruse, Benjamin Säfken, A. Silbersdorff","doi":"10.19139/SOIC-2310-5070-1202","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-1202","url":null,"abstract":"Predicting the trend of stock prices is a central topic in financial engineering. Given the complexity and nonlinearity of the underlying processes we consider the use of neural networks in general and sentiment analysis in particular for the analysis of financial time series. As one of the biggest social media platforms with a user base across the world, Twitter offers a huge potential for such sentiment analysis. In fact, stocks themselves are a popular topic in Twitter discussions. Due to the real-time nature of the collective information quasi contemporaneous information can be harvested for the prediction of financial trends. In this study, we give an introduction in financial feature engineering as well as in the architecture of a Long Short-Term Memory (LSTM) to tackle the highly nonlinear problem of forecasting stock prices. This paper presents a guide for collecting past tweets, processing for sentiment analysis and combining them with technical financial indicators to forecast the stock prices of Apple 30m and 60m ahead. A LSTM with lagged close price is used as a baseline model. We are able to show that a combination of financial and Twitter features can outperform the baseline in all settings. 
The code to fully replicate our forecasting approach is available in the Appendix.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"9 1","pages":"268-287"},"PeriodicalIF":0.0,"publicationDate":"2021-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48843864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
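The feature-combination step can be sketched as a windowing routine that stacks lagged close prices and per-interval sentiment scores into LSTM-ready arrays. The lookback length and the two-feature layout are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def make_windows(close, sentiment, lookback=12):
    """Stack lagged close prices and tweet-sentiment scores into
    (samples, lookback, 2) windows, with the next close as the target --
    the usual input layout for a recurrent forecaster."""
    feats = np.column_stack([close, sentiment])
    X, y = [], []
    for t in range(lookback, len(close)):
        X.append(feats[t - lookback:t])   # the lookback most recent rows
        y.append(close[t])                # one-step-ahead target
    return np.array(X), np.array(y)
```

Feeding `X` to an LSTM with only the first channel reproduces the lagged-close baseline; adding the sentiment channel is the comparison the study runs.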
Pub Date: 2021-04-13 | DOI: 10.19139/SOIC-2310-5070-1179
M. Alizadeh, V. Ranjbar, Abbas Eftekharian, O. Kharazmi
A four-parameter extension of the Lindley distribution, with application to lifetime data, is introduced; it is called the extended Marshall-Olkin generalized Lindley distribution. Some mathematical properties, such as moments, skewness, kurtosis, and extreme values, are derived. These properties, together with plots of the density and hazard functions, demonstrate the high flexibility of the distribution. Maximum likelihood estimation of the model parameters and the asymptotic properties of the estimators are examined, and a simulation study investigates the performance of the maximum likelihood estimates. Moreover, the performance and flexibility of the new distribution are assessed by comparison with several generalizations of the Lindley distribution on two real data sets. Finally, Bayesian analysis and the efficiency of Gibbs sampling are demonstrated on the same two data sets.
{"title":"An Extended Lindley Distribution with Application to Lifetime Data with Bayesian Estimation","authors":"M. Alizadeh, V. Ranjbar, Abbas Eftekharian, O. Kharazmi","doi":"10.19139/SOIC-2310-5070-1179","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-1179","url":null,"abstract":"A four-parameter extended of Lindley distribution with application to lifetime data is introduced.It is called extended Marshal-Olkin generalized Lindley distribution. Some mathematical propertiessuch as moments, skewness, kurtosis and extreme value are derived. These properties with plotsof density and hazard functions are shown the high flexibility of the mentioned distribution. Themaximum likelihood estimations of proposed distribution parameters with asymptotic properties ofthese estimations are examined. A simulation study to investigate the performance of maximumlikelihood estimations is presented. Moreover, the performance and flexibility of the new distributionare investigated by comparing with several generalizations of Lindley distribution through two realdata sets. Finally, Bayesian analysis and efficiency of Gibbs sampling are provided based on the tworeal data sets.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79420130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-29 | DOI: 10.19139/SOIC-2310-5070-1093
Hanaa Elgohari
In this paper, we introduce a new generalization of the exponentiated exponential distribution. Various structural mathematical properties are derived, and numerical analysis of the mean, variance, skewness, kurtosis, and dispersion index is performed. The new density can be right-skewed or symmetric, with unimodal and bimodal shapes. The new hazard function can be constant, decreasing, increasing, increasing-constant, upside-down-constant, or decreasing-constant. Many bivariate and multivariate-type models are also derived. We assess the performance of the maximum likelihood method graphically via biases and mean squared errors. The usefulness and flexibility of the new distribution are illustrated on two real data sets.
{"title":"A New Version of the Exponentiated Exponential Distribution: Copula, Properties and Application to Relief and Survival Times","authors":"Hanaa Elgohari","doi":"10.19139/SOIC-2310-5070-1093","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-1093","url":null,"abstract":"In this paper, we introduce a new generalization of the Exponentiated Exponential distribution. Various structural mathematical properties are derived. Numerical analysis for mean, variance, skewness and kurtosis and the dispersion index is performed. The new density can be right skewed and symmetric with “unimodal” and “bimodal” shapes. The new hazard function can be “constant”, “decreasing”, “increasing”, “increasing-constant”, “upsidedown-constant”, “decreasingconstant”. Many bivariate and multivariate type model have been also derived. We assess the performance of the maximum likelihood method graphically via the biases and mean squared errors. The usefulness and flexibility of the new distribution is illustrated by means of two real data sets.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"9 1","pages":"311-333"},"PeriodicalIF":0.0,"publicationDate":"2021-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43344568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-28 | DOI: 10.19139/SOIC-2310-5070-1193
Shasha Wang, Wen-Qing Xu, Jitao Liu
We construct optimal extrapolation estimates of π based on random polygons generated by n independent points uniformly distributed on a unit circle in R^2. While the semiperimeters and areas of these random n-gons converge to π almost surely and are asymptotically normal as n → ∞, in this paper we develop various extrapolation processes to further accelerate such convergence. By simultaneously considering the random n-gons and suitably constructed random 2n-gons and then optimizing over functionals of the semiperimeters and areas of these random polygons, we derive several new estimates of π with faster convergence rates. These extrapolation improvements are also shown to be asymptotically normal as n → ∞.
{"title":"Random Polygons and Optimal Extrapolation Estimates of pi","authors":"Shasha Wang, Wen-Qing Xu, Jitao Liu","doi":"10.19139/SOIC-2310-5070-1193","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-1193","url":null,"abstract":"We construct optimal extrapolation estimates of π based on random polygons generated by n independent points uniformly distributed on a unit circle in R2. While the semiperimeters and areas of these random n-gons converge to π almost surely and are asymptotically normal as n → ∞, in this paper we develop various extrapolation processes to further accelerate such convergence. By simultaneously considering the random n-gons and suitably constructed random 2n-gons and then optimizing over functionals of the semiperimeters and areas of these random polygons, we derive several new estimates of π with faster convergence rates. These extrapolation improvements are also shown to be asymptotically normal as n → ∞.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"9 1","pages":"241-249"},"PeriodicalIF":0.0,"publicationDate":"2021-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44003294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-10 | DOI: 10.19139/SOIC-2310-5070-1061
Md. Raisul Kibria, M. Yousuf
Text generation is a rapidly evolving field of Natural Language Processing (NLP), with ever-larger language models frequently setting a new state of the art. These models are extremely effective at learning the representation of words and their internal coherence in a particular language. However, established context-driven, end-to-end text generation models are rare, even more so for the Bengali language. In this paper, we propose a bidirectional gated recurrent unit (GRU) based architecture that simulates the conditional language model, i.e. the decoder portion of the sequence-to-sequence (seq2seq) model, and is further conditioned on the target context vectors. We explore several ways of combining multiple context words into a fixed-dimensional vector representation extracted from the same GloVe language model that is used to generate the embedding matrix. We use beam search optimization to generate the sentence with the maximum cumulative log-probability score. In addition, we propose a human-scoring-based evaluation metric and use it to compare the performance of the model with unidirectional LSTM and GRU networks. Empirical results show that the proposed model performs very well in producing meaningful outcomes depicting the target context. The experiments lead to an architecture that can be applied to a broad domain of context-driven text generation applications and constitute a key contribution to the NLP literature on the Bengali language.
{"title":"Context-driven Bengali Text Generation using Conditional Language Model","authors":"Md. Raisul Kibria, M. Yousuf","doi":"10.19139/SOIC-2310-5070-1061","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-1061","url":null,"abstract":"Text generation is a rapidly evolving field of Natural Language Processing (NLP) with larger Language models proposed very often setting new state-of-the-art. These models are extremely effective in learning the representation of words and their internal coherence in a particular language. However, an established context-driven, end to end text generation model is very rare, even more so for the Bengali language. In this paper, we have proposed a Bidirectional gated recurrent unit (GRU) based architecture that simulates the conditional language model or the decoder portion of the sequence to sequence (seq2seq) model and is further conditioned upon the target context vectors. We have explored several ways of combining multiple context words into a fixed dimensional vector representation that is extracted from the same GloVe language model which is used to generate the embedding matrix. We have used beam search optimization to generate the sentence with the maximum cumulative log probability score. In addition, we have proposed a human scoring based evaluation metric and used it to compare the performance of the model with unidirectional LSTM and GRU networks. Empirical results prove that the proposed model performs exceedingly well in producing meaningful outcomes depicting the target context. 
The experiment leads to an architecture that can be applied to an extensive domain of context-driven text generation based applications and which is also a key contribution to the NLP based literature of the Bengali language.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"9 1","pages":"334-350"},"PeriodicalIF":0.0,"publicationDate":"2021-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44278160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
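The beam-search decoding step can be sketched on a toy bigram table (the table and vocabulary are invented for illustration; the paper decodes from a GRU conditional language model, but the ranking logic is the same):

```python
import math

# toy conditional model: log-probabilities of the next token given the previous one
MODEL = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.3), "dog": math.log(0.7)},
    "a":   {"cat": math.log(0.9), "dog": math.log(0.1)},
    "cat": {"sat": math.log(1.0)},
    "dog": {"ran": math.log(1.0)},
    "sat": {"</s>": math.log(1.0)},
    "ran": {"</s>": math.log(1.0)},
}

def beam_search(model, start="<s>", end="</s>", width=2, max_len=5):
    """Keep the `width` highest-scoring partial sequences, extending each by
    every possible next token and re-ranking by cumulative log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        cand = []
        for seq, score in beams:
            if seq[-1] == end:                 # finished hypotheses carry over
                cand.append((seq, score))
                continue
            for tok, lp in model.get(seq[-1], {}).items():
                cand.append((seq + [tok], score + lp))
        cand.sort(key=lambda c: c[1], reverse=True)
        beams = cand[:width]
        if all(s[-1] == end for s, _ in beams):
            break
    return beams[0]
```

Note that greedy decoding would also start with "the" here; beam search matters precisely when a locally weaker prefix leads to a globally stronger completion, which a width greater than one can recover.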
Pub Date: 2021-02-06 | DOI: 10.19139/SOIC-2310-5070-671
Héctor Zárate, Edilberto Cepeda
This article extends the fusion of various statistical methods for estimating the mean and variance functions in heteroscedastic semiparametric models when the response variable comes from a two-parameter exponential family distribution. We rely on the natural connection among smoothing methods that use penalized basis functions, mixed models, and Bayesian Markov chain sampling. The significance and implications of our strategy lie in its potential to contribute to a simple and unified computational methodology that accounts for the factors affecting the variability of the responses, which in turn is important for efficient estimation and correct inference on the mean parameters without specifying fully parametric models. An extensive simulation study investigates the performance of the estimates. Finally, an application to Light Detection and Ranging (LIDAR) data highlights the merits of our approach.
{"title":"Semiparametric Smoothing Spline in Joint Mean and Dispersion Models with Responses from the Biparametric Exponential Family: A Bayesian Perspective","authors":"Héctor Zárate, Edilberto Cepeda","doi":"10.19139/SOIC-2310-5070-671","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-671","url":null,"abstract":"This article extends the fusion among various statistical methods to estimate the mean and variance functions in heteroscedastic semiparametric models when the response variable comes from a two-parameter exponential family distribution. We rely on the natural connection among smoothing methods that use basis functions with penalization, mixed models and a Bayesian Markov Chain sampling simulation methodology. The significance and implications of our strategy lies in its potential to contribute to a simple and unified computational methodology that takes into account the factors that affect the variability in the responses, which in turn is important for an efficient estimation and correct inference of mean parameters without the specification of fully parametric models. An extensive simulation study investigates the performance of the estimates. Finally, an application using the Light Detection and Ranging technique, LIDAR, data highlights the merits of our approach.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"9 1","pages":"351-367"},"PeriodicalIF":0.0,"publicationDate":"2021-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45261141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-30 | DOI: 10.19139/SOIC-2310-5070-994
A. Belcaid, M. Douimi
In this paper, we focus on signal smoothing and step detection for piecewise constant signals, a problem central to several applications such as human activity analysis, speech and image analysis, and anomaly detection in genetics. We present a two-stage approach that approximates the well-known line process energy arising from the probabilistic representation of the signal and its segmentation. In the first stage, we minimize a total variation (TV) least squares problem to detect the majority of the continuous edges. In the second stage, we apply a combinatorial algorithm to filter out the false jumps introduced by the TV solution. The performance of the proposed method was tested on several synthetic examples. Compared with recent step-preserving denoising algorithms, our method offers superior speed and competitive step-detection quality.
{"title":"Nonconvex Energy Minimization with Unsupervised Line Process Classifier for Efficient Piecewise Constant Signals Reconstruction","authors":"A. Belcaid, M. Douimi","doi":"10.19139/SOIC-2310-5070-994","DOIUrl":"https://doi.org/10.19139/SOIC-2310-5070-994","url":null,"abstract":"In this paper, we focus on the problem of signal smoothing and step-detection for piecewise constant signals. This problem is central to several applications such as human activity analysis, speech or image analysis and anomaly detection in genetics. We present a two-stage approach to approximate the well-known line process energy which arises from the probabilistic representation of the signal and its segmentation. In the first stage, we minimize a total variation (TV) least square problem to detect the majority of the continuous edges. In the second stage, we apply a combinatorial algorithm to filter all false jumps introduced by the TV solution. The performances of the proposed method were tested on several synthetic examples. In comparison to recent step-preserving denoising algorithms, the acceleration presents a superior speed and competitive step-detection quality.","PeriodicalId":93376,"journal":{"name":"Statistics, optimization & information computing","volume":"9 1","pages":"435-452"},"PeriodicalIF":0.0,"publicationDate":"2021-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46590671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}