
Statistics, optimization & information computing: Latest Articles

Feature Selection Based on Divergence Functions: A Comparative Classification Study
Pub Date : 2021-07-10 DOI: 10.19139/SOIC-2310-5070-1092
Saeid Pourmand, Ashkan Shabbak, M. Ganjali
Owing to the widespread use of high-dimensional data across a wide range of scientific fields, dimensionality reduction has become a major part of the preprocessing step in machine learning. Feature selection is one way to reduce dimensionality: instead of using the whole set of features, a subset is selected for use in the learning model. Feature selection (FS) methods fall into three main categories: filters, wrappers, and embedded approaches. Filter methods depend only on the characteristics of the data and do not rely on the learning model at hand. Divergence functions, which measure the difference between probability distributions, can serve as filter methods for feature selection. In this paper, the performance of several divergence functions, such as the Jensen-Shannon (JS) divergence and the Exponential divergence (EXP), is compared with that of some of the best-known filter feature selection methods, such as Information Gain (IG) and Chi-Squared (CHI). The comparison is based on the accuracy and F1-score of classification models trained after applying each feature selection method.
Citations: 1
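As a rough illustration of using a divergence function as a filter score, the sketch below ranks features by the Jensen-Shannon divergence between their two class-conditional histograms. It assumes a binary label, equal-width binning and base-2 logarithms (so scores lie in [0, 1]); the function names are hypothetical, this is not the authors' implementation, and it does not cover the EXP divergence.

```python
import numpy as np


def _kl(a, b):
    """Kullback-Leibler divergence in bits between two strictly positive distributions."""
    return float(np.sum(a * np.log2(a / b)))


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two histograms or probability vectors."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) for empty bins
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def rank_features_by_js(X, y, bins=10):
    """Return feature indices sorted by decreasing JS score, plus the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch only handles binary labels")
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Shared bin edges so both class histograms live on the same support.
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        h0, _ = np.histogram(X[y == classes[0], j], bins=edges)
        h1, _ = np.histogram(X[y == classes[1], j], bins=edges)
        scores[j] = js_divergence(h0, h1)
    return np.argsort(scores)[::-1], scores
```

Keeping the first k indices of the returned order gives the selected subset, and the classifier is then trained on those columns only.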
Heavy-Tailed Log-Logistic Distribution: Properties, Risk Measures and Applications
Pub Date : 2021-07-10 DOI: 10.19139/soic-2310-5070-1220
Abd-Elmonem A. M. Teamah, Ahmed A. Elbanna, Ahmed M. Gemeay
Heavy-tailed distributions play a major role in the study of risk data sets, and statisticians often search for new or relatively new statistical models to fit data sets from different fields. This article introduces a relatively new heavy-tailed model, the alpha power exponentiated log-logistic distribution, obtained by applying the alpha power transformation to the exponentiated log-logistic distribution. Its statistical properties, including moments, the moment generating function, the quantile function, entropy, inequality curves and order statistics, are derived. Five estimation methods are introduced, and the behaviour of the parameter estimates is checked on randomly generated data sets. Several actuarial measures are also derived: value at risk, tail value at risk, tail variance and tail variance premium. Numerical values of these measures show that the proposed distribution has a heavier tail than the compared models. Finally, three real data sets from different fields are used to show that the proposed model fits these data better than many other well-known related models.
Citations: 18
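To make the construction concrete, here is a sketch of the CDF, quantile function and value at risk (VaR, the q-quantile of the loss distribution) under one common parametrization of the alpha power transformation applied to an exponentiated log-logistic baseline: H(x) = 1/(1 + (x/s)^(-c)), G = H^θ, F = (α^G − 1)/(α − 1) with α > 0, α ≠ 1. The parametrization and function names are assumptions; the paper's exact form and its other actuarial measures (TVaR, tail variance, tail variance premium) are not reproduced.

```python
import math


def _check(alpha, theta, scale, shape):
    if alpha <= 0 or alpha == 1 or min(theta, scale, shape) <= 0:
        raise ValueError("need alpha > 0, alpha != 1 and positive theta, scale, shape")


def apexll_cdf(x, alpha, theta, scale, shape):
    """CDF of an alpha power exponentiated log-logistic model (assumed parametrization)."""
    _check(alpha, theta, scale, shape)
    if x <= 0:
        return 0.0
    h = 1.0 / (1.0 + (x / scale) ** (-shape))  # log-logistic baseline CDF
    g = h ** theta                             # exponentiated baseline
    return (alpha ** g - 1.0) / (alpha - 1.0)  # alpha power transformation


def apexll_quantile(u, alpha, theta, scale, shape):
    """Invert the three transformations above in reverse order; u must lie in (0, 1)."""
    _check(alpha, theta, scale, shape)
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie in (0, 1)")
    g = math.log(1.0 + u * (alpha - 1.0)) / math.log(alpha)  # undo alpha power step
    h = g ** (1.0 / theta)                                   # undo exponentiation
    return scale * (h / (1.0 - h)) ** (1.0 / shape)          # log-logistic quantile


def value_at_risk(q, alpha, theta, scale, shape):
    """VaR at level q is the q-quantile of the fitted loss distribution."""
    return apexll_quantile(q, alpha, theta, scale, shape)
```

Because the quantile function is available in closed form here, VaR needs no numerical root finding; tail measures such as TVaR would additionally require integrating the quantile function above q.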
A Cooperation of the Multileader Fruit Fly and Probabilistic Random Walk Strategies with Adaptive Normalization for Solving the Unconstrained Optimization Problems
Pub Date : 2021-06-28 DOI: 10.19139/soic-2310-5070-702
Wirote Apinantanakon, K. Sunat, S. Chiewchanwattana
A swarm-based, nature-inspired optimization algorithm, the fruit fly optimization algorithm (FOA), has a simple structure and is easy to implement. However, FOA has a low success rate and converges slowly, because it generates new positions around the best location using a fixed search radius. Several improved FOAs have been proposed, but their exploration ability is questionable. To make the search transition smoothly from the exploration phase to the exploitation phase, this paper proposes a new FOA built from a cooperation of the multileader and probabilistic random walk strategies (CPFOA), in which two population types work together. CPFOA's performance is evaluated on 18 well-known standard benchmarks. The results show that CPFOA outperforms both the original FOA and its variants in convergence speed and accuracy, and that it achieves very promising accuracy compared with well-known competitive algorithms. CPFOA is then applied to two optimization tasks: classifying real datasets with a multilayer perceptron, and extracting the parameters of a very compact T-S fuzzy system that models the Box and Jenkins gas furnace data set. CPFOA finds parameters of very high quality compared with the best-known competitive algorithms.
Citations: 1
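The fixed-radius search that the abstract identifies as FOA's weakness can be seen in a minimal continuous FOA-style loop: every fly samples a point within a fixed radius of the current swarm location (the smell-based search), and the swarm moves to the best fly only if it improves (the vision-based step). This is a simplified sketch under those assumptions, using a hypothetical sphere objective; it is neither the original FOA's smell-concentration formulation nor CPFOA.

```python
import random


def sphere(x):
    """Hypothetical test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def basic_foa(objective, dim, n_flies=20, radius=1.0, iterations=200,
              bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    best = [rng.uniform(lo, hi) for _ in range(dim)]  # initial swarm location
    best_val = objective(best)
    for _ in range(iterations):
        # Smell-based search: each fly samples within a FIXED radius of the swarm location.
        flies = [[min(hi, max(lo, b + rng.uniform(-radius, radius))) for b in best]
                 for _ in range(n_flies)]
        values = [objective(f) for f in flies]
        i = min(range(n_flies), key=values.__getitem__)
        # Vision-based step: the swarm flies to the best fly only if it improves.
        if values[i] < best_val:
            best, best_val = flies[i], values[i]
    return best, best_val
```

Because `radius` never changes, the search can neither widen to escape a poor region nor narrow to refine near the optimum, which is the limitation that CPFOA's multileader and random-walk strategies are meant to address.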
Stock Price Predictions with LSTM Neural Networks and Twitter Sentiment
Pub Date : 2021-05-08 DOI: 10.19139/SOIC-2310-5070-1202
Marah-Lisanne Thormann, Jan Farchmin, Christoph Weisser, René-Marcel Kruse, Benjamin Säfken, A. Silbersdorff
Predicting the trend of stock prices is a central topic in financial engineering. Given the complexity and nonlinearity of the underlying processes, we consider the use of neural networks in general, and sentiment analysis in particular, for the analysis of financial time series. As one of the biggest social media platforms, with a user base across the world, Twitter offers huge potential for such sentiment analysis; in fact, stocks themselves are a popular topic in Twitter discussions. Because this collective information is produced in real time, quasi-contemporaneous information can be harvested to predict financial trends. In this study, we give an introduction to financial feature engineering and to the architecture of a Long Short-Term Memory (LSTM) network to tackle the highly nonlinear problem of forecasting stock prices. The paper presents a guide to collecting past tweets, processing them for sentiment analysis, and combining them with technical financial indicators to forecast the stock price of Apple 30 and 60 minutes ahead. An LSTM with the lagged close price is used as the baseline model. We show that a combination of financial and Twitter features outperforms the baseline in all settings. The code to fully replicate our forecasting approach is available in the Appendix.
Vol. 9, pp. 268-287
Citations: 11
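Casting the forecasting task as supervised learning requires sliding windows over the aligned feature rows. The sketch below assumes one row per 1-minute bar (for example close price plus a sentiment score), so `horizon=30` and `horizon=60` correspond to 30 and 60 minutes ahead; the bar size, feature layout and function name are hypothetical. Each window would then be fed to an LSTM, while windows containing only the close price give the lagged-close baseline.

```python
def make_windows(features, target, window, horizon):
    """Pair each block of `window` consecutive feature rows with the target
    `horizon` steps after the block's last row."""
    if len(features) != len(target):
        raise ValueError("features and target must be aligned")
    X, y = [], []
    for end in range(window, len(target) - horizon + 1):
        X.append(features[end - window:end])  # rows end-window .. end-1
        y.append(target[end - 1 + horizon])   # value `horizon` steps after row end-1
    return X, y
```

Building the windows before any train/test split, and then splitting chronologically, keeps future bars out of the training windows.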
An Extended Lindley Distribution with Application to Lifetime Data with Bayesian Estimation
Pub Date : 2021-04-13 DOI: 10.19139/SOIC-2310-5070-1179
M. Alizadeh, V. Ranjbar, Abbas Eftekharian, O. Kharazmi
A four-parameter extension of the Lindley distribution, with application to lifetime data, is introduced. It is called the extended Marshall-Olkin generalized Lindley distribution. Several mathematical properties, such as moments, skewness, kurtosis and extreme values, are derived. These properties, together with plots of the density and hazard functions, show the high flexibility of the distribution. Maximum likelihood estimation of the parameters is examined together with the asymptotic properties of these estimators, and a simulation study investigates the performance of the maximum likelihood estimators. Moreover, the performance and flexibility of the new distribution are investigated by comparing it with several generalizations of the Lindley distribution on two real data sets. Finally, a Bayesian analysis and the efficiency of Gibbs sampling are presented for the two real data sets.
Citations: 0
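A simulation study of maximum likelihood estimators, like the one described above, can be illustrated on the one-parameter Lindley(θ) baseline. Its density θ²/(θ+1)·(1+x)e^(−θx) is a mixture of Exp(θ) and Gamma(2, θ) with weights θ/(θ+1) and 1/(θ+1), and its MLE is the positive root of x̄θ² + (x̄ − 1)θ − 2 = 0. This sketch does not implement the four-parameter extended Marshall-Olkin generalized Lindley distribution, whose MLE would need numerical optimization; the function names are hypothetical.

```python
import math
import random


def sample_lindley(theta, n, rng):
    """Draw n Lindley(theta) variates via the exponential/gamma mixture representation."""
    p_exp = theta / (theta + 1.0)
    return [rng.expovariate(theta) if rng.random() < p_exp
            else rng.gammavariate(2.0, 1.0 / theta)  # gammavariate takes (shape, scale)
            for _ in range(n)]


def lindley_mle(xs):
    """Closed-form MLE: positive root of xbar*t^2 + (xbar - 1)*t - 2 = 0."""
    xbar = sum(xs) / len(xs)
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)


def mle_bias_mse(theta, n, reps, seed=0):
    """Monte Carlo bias and mean squared error of the MLE at sample size n."""
    rng = random.Random(seed)
    estimates = [lindley_mle(sample_lindley(theta, n, rng)) for _ in range(reps)]
    bias = sum(estimates) / reps - theta
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return bias, mse
```

Repeating `mle_bias_mse` over increasing n and plotting the results gives the usual bias/MSE curves, which should shrink toward zero as n grows for a consistent estimator.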
A New Version of the Exponentiated Exponential Distribution: Copula, Properties and Application to Relief and Survival Times
Pub Date : 2021-03-29 DOI: 10.19139/SOIC-2310-5070-1093
Hanaa Elgohari
In this paper, we introduce a new generalization of the Exponentiated Exponential distribution and derive various structural mathematical properties. A numerical analysis of the mean, variance, skewness, kurtosis and dispersion index is performed. The new density can be right-skewed or symmetric, with “unimodal” and “bimodal” shapes. The new hazard function can be “constant”, “decreasing”, “increasing”, “increasing-constant”, “upside-down-constant” or “decreasing-constant”. Several bivariate and multivariate type models are also derived. We assess the performance of the maximum likelihood method graphically via biases and mean squared errors. The usefulness and flexibility of the new distribution are illustrated by means of two real data sets.
Vol. 9, pp. 311-333
Citations: 2
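The hazard shapes listed above build on the base Exponentiated Exponential distribution, F(x) = (1 − e^(−λx))^α, whose hazard is constant for α = 1, increasing for α > 1 and decreasing for α < 1. The sketch evaluates that base hazard and classifies its shape on a grid; it does not implement the paper's new generalization (which adds the remaining shapes), and the function names are hypothetical.

```python
import math


def ee_hazard(x, alpha, lam):
    """Hazard f(x) / (1 - F(x)) of the Exponentiated Exponential distribution."""
    base = 1.0 - math.exp(-lam * x)
    pdf = alpha * lam * math.exp(-lam * x) * base ** (alpha - 1.0)
    survival = 1.0 - base ** alpha
    return pdf / survival


def hazard_shape(alpha, lam, grid):
    """Classify the hazard on an increasing grid as constant, increasing or decreasing."""
    values = [ee_hazard(x, alpha, lam) for x in grid]
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) < 1e-9 for d in diffs):
        return "constant"
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "non-monotone"
```

A grid check like this only inspects finitely many points; it is a quick sanity check on a fitted model, not a proof of monotonicity.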
Random Polygons and Optimal Extrapolation Estimates of pi
Pub Date : 2021-03-28 DOI: 10.19139/SOIC-2310-5070-1193
Shasha Wang, Wen-Qing Xu, Jitao Liu
We construct optimal extrapolation estimates of π based on random polygons generated by n independent points uniformly distributed on the unit circle in ℝ². The semiperimeters and areas of these random n-gons converge to π almost surely and are asymptotically normal as n → ∞; in this paper we develop various extrapolation processes to further accelerate this convergence. By simultaneously considering the random n-gons and suitably constructed random 2n-gons, and then optimizing over functionals of the semiperimeters and areas of these random polygons, we derive several new estimates of π with faster convergence rates. These extrapolated estimates are also shown to be asymptotically normal as n → ∞.
Vol. 9, pp. 241-249
Citations: 0
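The baseline estimators can be computed directly: sorting n uniform angles gives arc gaps g_i, the inscribed polygon's semiperimeter is Σ sin(g_i/2) and its area is ½ Σ sin(g_i), and both stay at or below π. For comparison, the classical Richardson-style combination (4S(2n) − S(n))/3 for regular polygons, with S(n) = n sin(π/n), cancels the 1/n² error term. The paper's optimal extrapolations for random polygons use a different, optimized construction that is not reproduced here, and the function names are hypothetical.

```python
import math
import random


def random_polygon_estimates(n, rng):
    """Semiperimeter and area of the polygon on n uniform random points on the unit circle."""
    angles = sorted(rng.uniform(0.0, 2.0 * math.pi) for _ in range(n))
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2.0 * math.pi - angles[-1] + angles[0])   # wrap-around gap
    semiperimeter = sum(math.sin(g / 2.0) for g in gaps)  # each chord has length 2 sin(g/2)
    area = 0.5 * sum(math.sin(g) for g in gaps)           # triangle with the centre: sin(g)/2
    return semiperimeter, area


def regular_semiperimeter(m):
    """Semiperimeter of the regular m-gon inscribed in the unit circle: pi - pi^3/(6 m^2) + ..."""
    return m * math.sin(math.pi / m)


def regular_extrapolation(n):
    """Combine the n-gon and 2n-gon so that the 1/n^2 error terms cancel."""
    return (4.0 * regular_semiperimeter(2 * n) - regular_semiperimeter(n)) / 3.0
```

For the hexagon, S(6) = 3 misses π by about 0.14, while the combined estimate is within about 5·10⁻⁴, which shows why pairing n-gons with 2n-gons is attractive.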
Context-driven Bengali Text Generation using Conditional Language Model
Pub Date : 2021-03-10 DOI: 10.19139/SOIC-2310-5070-1061
Md. Raisul Kibria, M. Yousuf
Text generation is a rapidly evolving field of Natural Language Processing (NLP), with ever larger language models frequently setting a new state of the art. These models are extremely effective at learning the representation of words and their internal coherence in a particular language. However, established context-driven, end-to-end text generation models are very rare, even more so for the Bengali language. In this paper, we propose a bidirectional gated recurrent unit (GRU) based architecture that simulates the conditional language model, i.e. the decoder portion of the sequence-to-sequence (seq2seq) model, and is further conditioned on target context vectors. We explore several ways of combining multiple context words into a fixed-dimensional vector representation, extracted from the same GloVe model that is used to generate the embedding matrix. We use beam search optimization to generate the sentence with the maximum cumulative log-probability score. In addition, we propose an evaluation metric based on human scoring and use it to compare the performance of the model with unidirectional LSTM and GRU networks. Empirical results show that the proposed model performs very well at producing meaningful output that reflects the target context. The experiment yields an architecture that can be applied to a wide range of context-driven text generation applications, and it is also a key contribution to the NLP literature for the Bengali language.
Vol. 9, pp. 334-350
Citations: 0
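Beam search, as used above to pick the sentence with the highest cumulative log probability, keeps only the best `beam_width` partial sequences at each step. The sketch below uses a hypothetical toy bigram table in place of the paper's GRU decoder, and shows a case where greedy decoding (`beam_width=1`) misses the most probable sequence.

```python
import math

# Hypothetical toy next-token probabilities standing in for a trained decoder.
TOY_BIGRAMS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.9, "b": 0.1},
}


def toy_model(seq):
    """Return log P(next token | last token) for every candidate token."""
    return {tok: math.log(p) for tok, p in TOY_BIGRAMS[seq[-1]].items()}


def beam_search(next_log_probs, start, steps, beam_width):
    """Keep the `beam_width` partial sequences with the highest cumulative log probability."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)  # stable sort: ties keep order
        beams = candidates[:beam_width]
    return beams[0]
```

With width 1 the search commits to "a" (probability 0.6) and ends at probability 0.3; with width 2 it keeps "b" alive and finds "b a" with probability 0.36.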
Semiparametric Smoothing Spline in Joint Mean and Dispersion Models with Responses from the Biparametric Exponential Family: A Bayesian Perspective
Pub Date : 2021-02-06 DOI: 10.19139/SOIC-2310-5070-671
Héctor Zárate, Edilberto Cepeda
This article extends the fusion of various statistical methods to estimate the mean and variance functions in heteroscedastic semiparametric models when the response variable follows a two-parameter exponential family distribution. We rely on the natural connection among smoothing methods that use penalized basis functions, mixed models, and Bayesian Markov chain Monte Carlo sampling. The significance of our strategy lies in its potential to contribute a simple, unified computational methodology that accounts for the factors affecting the variability of the responses, which in turn is important for efficient estimation and correct inference on the mean parameters without specifying fully parametric models. An extensive simulation study investigates the performance of the estimates. Finally, an application to Light Detection and Ranging (LIDAR) data highlights the merits of our approach.
Vol. 9, pp. 351-367
Citations: 2
Nonconvex Energy Minimization with Unsupervised Line Process Classifier for Efficient Piecewise Constant Signals Reconstruction
Pub Date : 2021-01-30 DOI: 10.19139/SOIC-2310-5070-994
A. Belcaid, M. Douimi
In this paper, we focus on the problem of signal smoothing and step detection for piecewise constant signals. This problem is central to several applications, such as human activity analysis, speech and image analysis, and anomaly detection in genetics. We present a two-stage approach to approximate the well-known line process energy, which arises from the probabilistic representation of the signal and its segmentation. In the first stage, we minimize a total variation (TV) least-squares problem to detect the majority of the continuous edges. In the second stage, we apply a combinatorial algorithm to filter out all false jumps introduced by the TV solution. The performance of the proposed method was tested on several synthetic examples. Compared with recent step-preserving denoising algorithms, the accelerated approach offers superior speed and competitive step-detection quality.
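The two-stage idea can be illustrated with a small sketch.

- **Stage 1** solves the 1D TV-regularized least-squares problem min_x ½‖x − y‖² + λ Σ|x[i+1] − x[i]|. It uses projected gradient ascent on the dual, which is simple but slow; direct solvers are preferable for long signals.
- **Stage 2** is a hypothetical stand-in for the authors' combinatorial line-process classifier. It greedily merges adjacent segments whose means differ by less than `min_jump`, removing small false jumps left by the TV solution.

Segment means are taken from the raw data because TV shrinks jump heights. The names and defaults (`tv_denoise_1d`, `prune_false_jumps`, `reconstruct`, `edge_tol`, `min_jump`) are assumptions for illustration.

```python
# Hypothetical sketch: stage 2 is a simple greedy merge, not the paper's line-process classifier.

def tv_denoise_1d(y, lam, n_iter=3000, tau=0.25):
    """Approximately solve min_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|.

    Projected gradient ascent on the dual: x = y - D^T q with |q[i]| <= lam,
    where (D x)[i] = x[i+1] - x[i]. tau <= 1/4 is safe because ||D D^T|| < 4.
    """
    n = len(y)
    q = [0.0] * (n - 1)

    def primal(q):
        x = list(y)
        for i in range(n - 1):  # x = y - D^T q
            x[i] += q[i]
            x[i + 1] -= q[i]
        return x

    for _ in range(n_iter):
        x = primal(q)
        for i in range(n - 1):  # gradient step on the dual, then clip to [-lam, lam]
            q[i] = min(lam, max(-lam, q[i] + tau * (x[i + 1] - x[i])))
    return primal(q)


def prune_false_jumps(y, breaks, min_jump):
    """Greedily merge adjacent segments whose raw-data means differ by less than min_jump."""
    bounds = [0] + sorted(breaks) + [len(y)]
    segs = [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]

    def mean(seg):
        s, e = seg
        return sum(y[s:e]) / (e - s)

    while len(segs) > 1:
        m = [mean(s) for s in segs]
        diffs = [abs(m[k + 1] - m[k]) for k in range(len(segs) - 1)]
        k = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[k] >= min_jump:
            break
        segs[k:k + 2] = [(segs[k][0], segs[k + 1][1])]  # merge the closest pair
    recon = []
    for s in segs:
        recon.extend([mean(s)] * (s[1] - s[0]))
    return recon, [s[0] for s in segs[1:]]


def reconstruct(y, lam, min_jump, edge_tol=1e-3):
    """Stage 1: TV denoising proposes edges. Stage 2: prune false jumps."""
    x = tv_denoise_1d(y, lam)
    breaks = [i + 1 for i in range(len(y) - 1) if abs(x[i + 1] - x[i]) > edge_tol]
    return prune_false_jumps(y, breaks, min_jump)
```

**Usage:** `reconstruct(y, lam=2.0, min_jump=1.0)` returns two things: the piecewise-constant estimate, and the indices where each new segment starts. Both `lam` and `min_jump` need tuning to the noise level and to the smallest jump of interest.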
Statistics, optimization & information computing, Vol. 9(1), pp. 435-452. DOI: https://doi.org/10.19139/SOIC-2310-5070-994
Citations: 0