Soft Computing: Latest Articles

Strategy for complementor under platform owner’s entry with vertically differentiated content

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-20 DOI: 10.1007/s00500-024-09666-3

Zhiguo Li, Rui Dong, Qianqian Cao, Hongwu Zhang

Complementors who provide content on platforms are increasingly threatened by the entry of platform owners. Platform owners may enter the content market through offering vertically differentiated content either by self producing or hiring the complementor to produce. We build a game-theoretic model to analyze the platform owner’s entry decisions and the complementor’s response strategy considering the effects of demand complementarity, vertical content differentiation and consumer heterogeneity to both players’ strategies. We find that vertical content differentiation relaxes boundary conditions of entry, and it is more obvious when the platform owner has advantage in content value. However, we show that though the complementor may hold advantages on content value, price, or sales volume, it faces dependent dilemma once entry happens. Further, we demonstrate that second-party cooperation may mitigate the dependent dilemma and create a “win–win” situation through leveraging the platform owner’s efficiency in marketing and the complementor’s efficiency in content producing.

Citations: 0

A dynamic model of social media ad information diffusion in uncertain environment

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-20 DOI: 10.1007/s00500-024-09665-4

Meiling Jin, Yufu Ning, Fengming Liu, Zhen Li, Haoran Zheng, Yichang Gao, Jian Zhou

In the social media environment, ad (advertise) information diffusion can effectively increase the ad promotion effect and promote product marketing. Therefore, it is essential to study the mechanism of ad information diffusion in social media. This study aims to help cope with the complexity and uncertainty in social media ad information diffusion systems by identifying the causal relationships behind ad information diffusion behavior and clarifying the feedback mechanisms of system factors, and then testing the validity of the system model through simulation. Specifically, this study innovatively combines system dynamics and uncertainty theory to construct a dynamic model of ad information diffusion system. Particularly, the uncertainty effect of environmental noise on the ad information diffusion system is considered and portrayed as a Liu process. This study can explore the diffusion mechanism of ad information more precisely, so as to better serve the ad promotion industry.

Citations: 0

Ridge estimation for uncertain regression model with imprecise observations

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-20 DOI: 10.1007/s00500-024-09656-5

Shuang Zhang, Xin Gao

In traditional regression analysis, the observed data are all accurate, but the observed data that we can obtain in real life is often not accurate. For that reason, there is the uncertain regression analysis based on the uncertain variable, in the framework of the uncertainty theory. Under the premise of imprecise observations, the data obtained often contains outliers due to human input errors or incorrect measurements. Outliers can affect parameter estimation, resulting in misleading results and making model fitting inaccurate. In parameter estimation, the most commonly used method is least squares estimation, but this method is extremely sensitive to outliers and makes parameter estimation inaccurate. To solve this problem, this paper proposes an uncertain regression model based on ridge estimation, which adds a square penalty term when performing least squares estimation of unknown parameters. The advantage of ridge estimation is that the tolerance of pathological data is much better than other parameter estimation methods, which can reduce the influence of outliers. In this paper, the optimal shrinkage parameter is determined by K-fold cross-validation to estimate the parameters of the regression model, and then we conduct the residual analysis and hypothesis test on the fitted model to obtain the predicted value and the predicted confidence interval. Finally, the validity of the model is demonstrated by two numerical examples.

Citations: 0
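
The abstract of "Ridge estimation for uncertain regression model with imprecise observations" describes adding a squared penalty to least squares and choosing the shrinkage parameter by K-fold cross-validation. A minimal sketch of that general idea follows, under a strong simplifying assumption: it uses classical ridge regression on crisp (precise) data, not the uncertain-variable formulation of the paper. The function names, the synthetic data, and the candidate grid are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam * I)^(-1) X^T y, without intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def kfold_cv_error(X, y, lam, k=5, seed=0):
    """Mean squared validation error of ridge, averaged over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ beta - y[val]) ** 2))
    return float(np.mean(errors))

def select_lambda(X, y, candidates, k=5):
    """Return the candidate shrinkage parameter with the lowest CV error."""
    return min(candidates, key=lambda lam: kfold_cv_error(X, y, lam, k))

# Synthetic data, made up for illustration only (not from the paper).
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=60)

candidates = [0.01, 0.1, 1.0, 10.0]
best_lam = select_lambda(X, y, candidates)
beta_hat = ridge_fit(X, y, best_lam)
print("selected lambda:", best_lam)
print("ridge estimate:", beta_hat)
```

With `lam = 0` the estimate reduces to ordinary least squares, and larger values shrink the coefficient vector toward zero, which is the trade-off the cross-validation step is meant to tune.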

Research efficiency evaluation and regional differences analysis of humanities and social sciences of colleges and universities in China

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-20 DOI: 10.1007/s00500-024-09662-7

Lifeng Wang, Tan Wang

Humanities and Social Sciences (HSS) play a key role for deepening human understanding and transforming the world, providing essential value orientation for the advancement of natural sciences. Evaluating the efficiency of HSS in higher education institutions is crucial for optimizing the distribution of research resources and improving scientific research efficiency. By utilizing statistical data from HSS in colleges and universities across 31 provinces (municipalities, regions) of China (excluding Hong Kong, Macao, and Taiwan) from 2017 to 2020, this study employs the Data Envelopment Analysis (DEA) and Malmquist models to assess both the static and dynamic scientific research efficiencies. Additionally, the Theil index is used to examine regional differences. The findings reveal that: (1) HSS in colleges and universities exhibit relatively high static efficiency, with the Malmquist index indicating a general trend of improvement primarily driven by efficiency change (EC); (2) colleges and universities in China’s central region display higher static efficiency than those in the western region, yet they lack the internal driving forces for fostering research efficiency; (3) an analysis of static and dynamic efficiencies via the Theil index shows widening regional differences, with within-group differences being the main source of overall differences. The above results imply that each region should adopt different countermeasures to improve scientific research efficiency.

Citations: 0
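
The abstract above uses the Theil index to split overall regional differences into within-group and between-group parts. The sketch below shows the standard Theil T index and its additive decomposition; it is an illustration only, and the region names and efficiency scores are hypothetical values, not data from the paper.

```python
import numpy as np

def theil_index(values):
    """Theil T index: mean of (x / mu) * ln(x / mu) over strictly positive values."""
    x = np.asarray(values, dtype=float)
    ratio = x / x.mean()
    return float(np.mean(ratio * np.log(ratio)))

def theil_decomposition(groups):
    """Split the overall Theil T index into (within, between) components.

    `groups` maps a group name to its list of positive values. Each group is
    weighted by its share of the grand total.
    """
    all_values = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
    grand_total = all_values.sum()
    grand_mean = all_values.mean()
    within, between = 0.0, 0.0
    for values in groups.values():
        v = np.asarray(values, dtype=float)
        share = v.sum() / grand_total
        within += share * theil_index(v)
        between += share * np.log(v.mean() / grand_mean)
    return float(within), float(between)

# Hypothetical efficiency scores by region (made-up numbers, not from the paper).
scores = {
    "eastern": [0.95, 0.80, 1.00],
    "central": [0.85, 0.90],
    "western": [0.50, 0.70, 0.40],
}
within, between = theil_decomposition(scores)
total = theil_index(np.concatenate(list(scores.values())))
print(f"within={within:.4f} between={between:.4f} total={total:.4f}")
```

The two components sum to the overall index, so the ratio `within / total` gives the share of overall inequality attributable to differences inside the groups.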

Optimal pricing in social networks under fuzzy environment

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-20 DOI: 10.1007/s00500-024-09657-4

Zhuqing Liu, Yaodong Ni

In this paper, we study optimal pricing strategy decisions of the monopolist in a social network under a fuzzy environment, in which consumers experience a nonnegative network effect that is influenced by their neighbors’ consumption level, the extent to which they are affected is considered as a fuzzy variable. To derive the equilibrium solution, we establish a two-stage game model for decision processes in consumer social networks. Utilizing the backward induction, we first get the expected consumption equilibrium, then figure out the matrix expression of unique pricing equilibrium by maximizing the monopolist’s profit. In addition, we introduce Fuzzy Bonacich Centrality, and find out components of the price each consumer charged by the monopolist in a fuzzy network, this points out the importance of the monopolist knowing consumer network structure. By conducting numerical studies, we find that the network effect plays an essential role in deciding pricing strategies in fuzzy social networks, but fuzziness would weaken this impact. For social networks with fuzziness existing, the monopolist should choose discriminatory pricing strategy to benefit most. The results of our model can provide valuable managerial insights when helping the monopolist make pricing decisions.

Citations: 0
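
The abstract above introduces a Fuzzy Bonacich Centrality to express the price charged to each consumer. The paper's fuzzy definition is not given in this listing, so the sketch below shows only the classical (crisp) Bonacich centrality, b = (I - alpha * G)^(-1) * 1, as an assumed starting point. The adjacency matrix, the value of `alpha`, and the function name are hypothetical.

```python
import numpy as np

def bonacich_centrality(G, alpha):
    """Classical Bonacich centrality b = (I - alpha * G)^(-1) 1.

    G is an adjacency (influence) matrix and alpha a scalar network-effect
    strength. The series behind the formula converges only when alpha times
    the spectral radius of G is below 1, so that condition is checked first.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    spectral_radius = max(abs(np.linalg.eigvals(G)))
    if alpha * spectral_radius >= 1:
        raise ValueError("alpha is too large for this network")
    return np.linalg.solve(np.eye(n) - alpha * G, np.ones(n))

# Hypothetical star network: consumer 0 is linked to consumers 1, 2 and 3.
G = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
b = bonacich_centrality(G, alpha=0.2)
print(b)
```

In this star network the central consumer receives the highest centrality and the three peripheral consumers receive identical values, which matches the intuition that network position drives consumer-specific prices.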

The influence of gender and age in driving ability: an analysis of average and extreme behaviours

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-20 DOI: 10.1007/s00500-024-09782-0

Fabio Baione, Davide Biancalana, Massimiliano Menzietti

In 2012, the European Court of Justice introduced the ban on differentiating car insurance premiums for gender to avoid gender inequality. This paper deals with a gender analysis of driving ability by investigating the relationship between gender and the relative total claim amount in Motor Third Party Liability insurance, also considering the effect of age. Leveraging a two-part model based on parametric quantile regression, we want to investigate the average behaviour of drivers and their tail behaviour in order to highlight the importance of dispersion and the impact of largest claims. As a consequence, the purpose of our contribution is to study how gender and age can influence the entire probability distribution of the insurance claim with a particular focus on the quantiles with high probability levels, which are very important indicators to determine the effective riskiness of a driver. We apply our model to an Australian insurance dataset; our results suggest that men are in general riskier in terms of both average and tail behaviour.

Citations: 0

Sheffer stroke operation on L-algebras via an algorithmic approach

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-19 DOI: 10.1007/s00500-024-09906-6

Necla Kırcalı Gürsoy, Tahsin Öner, Arif Gürsoy, Alper Ülker

In this study, we introduce the Sheffer stroke L-algebra and prove some fundamental theorems, propositions and lemmas of Sheffer Stroke L-algebras. The notions of filter and ultrafilter for Sheffer stroke L-algebra are studied. We give subalgebra and normal subset definitions of a Sheffer stroke L-algebras. Moreover, a homomorphism between Sheffer stroke L-algebras is introduced and isomorphism theorems are presented. Finally, we give three new algorithms for Sheffer stroke L-algebras. Thus, it is contributed to researchers on different application areas by presenting an algorithmic approach on this subject, for the first time in the literature.

Citations: 0

Multi-rider ridesharing stable matching optimization

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-19 DOI: 10.1007/s00500-024-09947-x

Hua Ke, Haoyang Li

The rapid growth of private car ownership has led to significant issues such as traffic congestion and environmental pollution. Ridesharing has emerged as a promising solution to alleviate the negative impacts associated with private car usage. This paper focuses on the stability of ridesharing systems and establishes a single-driver multiple-rider ridesharing matching model. To solve this model, a filtering algorithm for the pre-matching set and a fast-solving algorithm for stable matching scheme are proposed. Furthermore, we introduce the concept of subsidy distance upper limit into the ridesharing system. Remarkably, our findings indicate that with a limit of 0.1km, the distance saved generated by the subsidy amounts to 560.5% of the total subsidy. To validate our approach, we simulate ridesharing demand data using real taxi data, and design computational experiments to prove the computational efficiency of the filtering algorithm and fast-solving algorithm. The impact of various parameters on ridesharing systems is also explored.

Citations: 0

Algebraic rough sets via algebraic relations

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-19 DOI: 10.1007/s00500-024-09820-x

Xiu-Yun Wu, Chun-Yan Liao, Hui-Min Zhang

The aim of this paper is to discuss algebraic rough set and its relationships with convex space, rough set and generalized neighborhood space. Specifically, the notion of algebraic relations is introduced and a pair of lower approximation operator and upper approximation operator are presented. Then, several conditions of algebraic relations such as seriality, reflexivity, (resp., weak, primitive) symmetry and (resp., strong) transitivity are characterized by algebraic approximation operators. Based on this, relationships among algebraic rough sets, convex structures and generalized neighborhood systems are investigated. It is proved that the category of reflexive and transitive algebraic rough spaces is isomorphic to the category of convex spaces. In particular, the category of reflexive, weakly symmetric and transitive algebraic rough spaces is isomorphic to the category of convex matroids and the category of reflexive, weakly symmetric and transitive algebraic generalized neighborhood spaces.

Citations: 0

Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language

IF 4.1 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Soft Computing

Pub Date : 2024-08-19 DOI: 10.1007/s00500-024-09934-2

Dezhi Cao, Yue Zhao, Licheng Wu

Grapheme-to-phoneme (G2P) conversion technology is currently dominated by two methodologies: knowledge-based and data-based approaches. Knowledge-driven methods struggle to adapt to extensive datasets, while data-driven methods rely heavily on high-quality data and require precise feature selection for model construction. To address these challenges, this research aims to propose an integrated approach that combines prior knowledge with data-driven techniques for automatic G2P conversion in the Korean language. In this work, we extract attributes based on pronunciation rules and phonetic transformations between Korean words to construct a decision tree. Subsequently, the model is trained using a data-driven approach for automated phonetic transcription. The proposed integrated model achieves more accurate alignment between input and output variables, effectively capturing phonological variations in continuous Korean speech, and determining corresponding phonemes for graphemes. Rigorous cross-validation confirms its superiority, with an average accuracy of 94.63% in grapheme-to-phoneme conversion, outperforming existing methodologies. In conclusion, this research demonstrates the effectiveness of an integrated approach combining prior knowledge and data-driven techniques for G2P conversion in Korean. The high accuracy and performance of this method are significant for Korean G2P. Our approach can also be applied to low-resource or endangered languages that already have some linguistic research foundation to improve the accuracy of the pronunciation lexicon of the language.

Citations: 0
