Item response theory (IRT) models are frequently used to analyze multivariate categorical data from questionnaires or cognitive tests. To reduce model complexity in item response models, regularized estimation is now widely applied, adding a nondifferentiable penalty function such as the LASSO or the SCAD penalty to the log-likelihood function in the optimization criterion. In most applications, regularized estimation repeatedly fits the IRT model on a grid of regularization parameters λ, and the final model is selected as the one whose regularization parameter minimizes the Akaike or Bayesian information criterion (AIC or BIC). In recent work, it has been proposed to directly minimize a smooth approximation of the AIC or the BIC for regularized estimation. This approach circumvents the repeated estimation of the IRT model, so the computation time is substantially reduced. The adequacy of the new approach is demonstrated in three simulation studies covering regularized estimation for IRT models with differential item functioning, multidimensional IRT models with cross-loadings, and the mixed Rasch/two-parameter logistic IRT model. The simulation studies showed that the computationally less demanding direct optimization based on the smooth variants of the AIC and BIC performed comparably to, or better than, the ordinarily employed repeated regularized estimation based on the AIC or BIC.
{"title":"Smooth Information Criterion for Regularized Estimation of Item Response Models","authors":"Alexander Robitzsch","doi":"10.3390/a17040153","DOIUrl":"https://doi.org/10.3390/a17040153","url":null,"abstract":"Item response theory (IRT) models are frequently used to analyze multivariate categorical data from questionnaires or cognitive test data. In order to reduce the model complexity in item response models, regularized estimation is now widely applied, adding a nondifferentiable penalty function like the LASSO or the SCAD penalty to the log-likelihood function in the optimization function. In most applications, regularized estimation repeatedly estimates the IRT model on a grid of regularization parameters λ. The final model is selected for the parameter that minimizes the Akaike or Bayesian information criterion (AIC or BIC). In recent work, it has been proposed to directly minimize a smooth approximation of the AIC or the BIC for regularized estimation. This approach circumvents the repeated estimation of the IRT model. To this end, the computation time is substantially reduced. The adequacy of the new approach is demonstrated by three simulation studies focusing on regularized estimation for IRT models with differential item functioning, multidimensional IRT models with cross-loadings, and the mixed Rasch/two-parameter logistic IRT model. It was found from the simulation studies that the computationally less demanding direct optimization based on the smooth variants of AIC and BIC had comparable or improved performance compared to the ordinarily employed repeated regularized estimation based on AIC or BIC.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"36 130","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140735094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cooperative alternatives call for careful multi-criteria decision-making (MCDM), especially in resource allocation, where the alternatives exhibit interdependent relationships. Traditional MCDM methods like the Analytic Hierarchy Process (AHP) and Analytic Network Process (ANP) often overlook the synergistic potential of cooperative alternatives. This study introduces a novel method integrating AHP/ANP with Shapley values, specifically designed to address this gap by evaluating alternatives on their individual merits and their contributions within coalitions. Our methodology begins with defining the problem structure and applying AHP/ANP to determine the criteria weights and the alternatives’ scores. Subsequently, we compute Shapley values based on coalition values, synthesizing these findings to inform resource allocation decisions more equitably. A numerical example of budget allocation illustrates the method’s efficacy, revealing significant insights into resource distribution when cooperative dynamics are considered. Our results demonstrate the proposed method’s superiority in capturing the nuanced interplay between criteria and alternatives, leading to more informed urban planning decisions. This approach marks a significant advancement in MCDM, offering a comprehensive framework that incorporates both the analytical rigor of AHP/ANP and the equitable considerations of cooperative game theory through Shapley values.
{"title":"Resource Allocation of Cooperative Alternatives Using the Analytic Hierarchy Process and Analytic Network Process with Shapley Values","authors":"Jih-Jeng Huang, Chin-Yi Chen","doi":"10.3390/a17040152","DOIUrl":"https://doi.org/10.3390/a17040152","url":null,"abstract":"Cooperative alternatives need complex multi-criteria decision-making (MCDM) consideration, especially in resource allocation, where the alternatives exhibit interdependent relationships. Traditional MCDM methods like the Analytic Hierarchy Process (AHP) and Analytic Network Process (ANP) often overlook the synergistic potential of cooperative alternatives. This study introduces a novel method integrating AHP/ANP with Shapley values, specifically designed to address this gap by evaluating alternatives on individual merits and their contributions within coalitions. Our methodology begins with defining problem structures and applying AHP/ANP to determine the criteria weights and alternatives’ scores. Subsequently, we compute Shapley values based on coalition values, synthesizing these findings to inform resource allocation decisions more equitably. A numerical example of budget allocation illustrates the method’s efficacy, revealing significant insights into resource distribution when cooperative dynamics are considered. Our results demonstrate the proposed method’s superiority in capturing the nuanced interplay between criteria and alternatives, leading to more informed urban planning decisions. This approach marks a significant advancement in MCDM, offering a comprehensive framework that incorporates both the analytical rigor of AHP/ANP and the equitable considerations of cooperative game theory through Shapley values.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"24 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140736597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic segmentation algorithms leveraging deep convolutional neural networks often encounter challenges due to their large parameter counts, high computational complexity, and slow execution. To address these issues, we introduce a semantic segmentation network model emphasizing the rapid generation of redundant features and multi-level spatial aggregation. This model applies cost-efficient linear transformations instead of standard convolution operations during feature map generation, effectively managing memory usage and reducing computational complexity. To enhance the feature maps’ representation ability post-linear transformation, a specifically designed dual-attention mechanism is implemented, strengthening the model’s capacity for semantic understanding of both local and global image information. Moreover, the model integrates sparse self-attention with multi-scale contextual strategies, effectively combining features across different scales and spatial extents. This approach optimizes computational efficiency and retains crucial information, enabling precise and quick image segmentation. To assess the model’s segmentation performance, we conducted experiments in Changge City, Henan Province, using datasets such as LoveDA, PASCAL VOC, LandCoverNet, and DroneDeploy. These experiments demonstrated the model’s outstanding performance on public remote sensing datasets, significantly reducing the parameter count and computational complexity while maintaining high accuracy in segmentation tasks. This advancement offers substantial technical benefits for applications in agriculture and forestry, including land cover classification and crop health monitoring, thereby underscoring the model’s potential to support these critical sectors effectively.
{"title":"Research on Efficient Feature Generation and Spatial Aggregation for Remote Sensing Semantic Segmentation","authors":"Ruoyang Li, Shuping Xiong, Yinchao Che, Lei Shi, Xinming Ma, Lei Xi","doi":"10.3390/a17040151","DOIUrl":"https://doi.org/10.3390/a17040151","url":null,"abstract":"Semantic segmentation algorithms leveraging deep convolutional neural networks often encounter challenges due to their extensive parameters, high computational complexity, and slow execution. To address these issues, we introduce a semantic segmentation network model emphasizing the rapid generation of redundant features and multi-level spatial aggregation. This model applies cost-efficient linear transformations instead of standard convolution operations during feature map generation, effectively managing memory usage and reducing computational complexity. To enhance the feature maps’ representation ability post-linear transformation, a specifically designed dual-attention mechanism is implemented, enhancing the model’s capacity for semantic understanding of both local and global image information. Moreover, the model integrates sparse self-attention with multi-scale contextual strategies, effectively combining features across different scales and spatial extents. This approach optimizes computational efficiency and retains crucial information, enabling precise and quick image segmentation. To assess the model’s segmentation performance, we conducted experiments in Changge City, Henan Province, using datasets such as LoveDA, PASCAL VOC, LandCoverNet, and DroneDeploy. These experiments demonstrated the model’s outstanding performance on public remote sensing datasets, significantly reducing the parameter count and computational complexity while maintaining high accuracy in segmentation tasks. This advancement offers substantial technical benefits for applications in agriculture and forestry, including land cover classification and crop health monitoring, thereby underscoring the model’s potential to support these critical sectors effectively.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140742267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Forecasting the generation of solar power plants (SPPs) requires taking into account meteorological parameters that influence the difference between the solar irradiance at the top of the atmosphere calculated with high accuracy and the solar irradiance at the tilted plane of the solar panel on the Earth’s surface. One of the key factors is cloudiness, which can be described not only as the percentage of the sky area covered by clouds but also by many additional parameters, such as the type of clouds, the distribution of clouds across atmospheric layers, and their height. The use of machine learning algorithms to forecast the generation of solar power plants requires retrospective data over a long period and the formalisation of features; however, retrospective data with detailed information about cloudiness are normally recorded in natural-language form. This paper proposes an algorithm for processing such records to convert them into a binary feature vector. Experiments conducted on data from a real solar power plant showed that this algorithm increases the accuracy of short-term solar irradiance forecasts by 5–15%, depending on the quality metric used. At the same time, adding features makes the model less transparent to the user, which is a significant drawback from the point of view of explainable artificial intelligence. Therefore, the paper uses an additive explanation algorithm based on the Shapley vector to interpret the model’s output. It is shown that this approach allows the machine learning model to explain why it generates a particular forecast, which will provide a greater level of trust in intelligent information systems in the power industry.
{"title":"Solar Irradiance Forecasting with Natural Language Processing of Cloud Observations and Interpretation of Results with Modified Shapley Additive Explanations","authors":"P. Matrenin, Valeriy V. Gamaley, A. Khalyasmaa, Alina I. Stepanova","doi":"10.3390/a17040150","DOIUrl":"https://doi.org/10.3390/a17040150","url":null,"abstract":"Forecasting the generation of solar power plants (SPPs) requires taking into account meteorological parameters that influence the difference between the solar irradiance at the top of the atmosphere calculated with high accuracy and the solar irradiance at the tilted plane of the solar panel on the Earth’s surface. One of the key factors is cloudiness, which can be presented not only as a percentage of the sky area covered by clouds but also many additional parameters, such as the type of clouds, the distribution of clouds across atmospheric layers, and their height. The use of machine learning algorithms to forecast the generation of solar power plants requires retrospective data over a long period and formalising the features; however, retrospective data with detailed information about cloudiness are normally recorded in the natural language format. This paper proposes an algorithm for processing such records to convert them into a binary feature vector. Experiments conducted on data from a real solar power plant showed that this algorithm increases the accuracy of short-term solar irradiance forecasts by 5–15%, depending on the quality metric used. At the same time, adding features makes the model less transparent to the user, which is a significant drawback from the point of view of explainable artificial intelligence. Therefore, the paper uses an additive explanation algorithm based on the Shapley vector to interpret the model’s output. It is shown that this approach allows the machine learning model to explain why it generates a particular forecast, which will provide a greater level of trust in intelligent information systems in the power industry.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"35 24","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140753135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The small strain shear modulus is an important characteristic of geomaterials that can be measured experimentally using piezoelectric sensors (bender elements). However, most conventional signal interpretation techniques are based on the visual observation of the output signal and are therefore inherently subjective. Objective techniques also exist, like the cross-correlation of the input and output signals, but they lack physical insight, as they rely on the (incorrect) assumption that input and output signals are similar. This paper presents GeoHyTE, the first objective and physically consistent toolbox for the automatic processing of the output signal of bender element sensors. GeoHyTE updates a finite element model of the experiment, iteratively searching for the small strain shear modulus that maximises the correlation between the experimental and numerical output signals. The method is objective, as the results do not depend on the experience of the user, and physically consistent, as the wave propagation process is modelled in full and signals of the same nature (output) are correlated. Moreover, GeoHyTE is nearly insensitive to grossly erroneous input by the user, both in terms of the starting point of the iterative maximisation process and the refinement of the finite element model. The results obtained with GeoHyTE are validated against benchmark measurements reported in the literature and experimental data obtained by the authors. A detailed statistical analysis of the results obtained with GeoHyTE and conventional interpretation techniques is also presented.
{"title":"A Computational Platform for Automatic Signal Processing for Bender Element Sensors","authors":"I. Moldovan, Abdalla Almukashfi, A. Gomes Correia","doi":"10.3390/a17040131","DOIUrl":"https://doi.org/10.3390/a17040131","url":null,"abstract":"The small strain shear modulus is an important characteristic of geomaterials that can be measured experimentally using piezoelectric sensors (bender elements). However, most conventional signal interpretation techniques are based on the visual observation of the output signal and therefore inherently subjective. Objective techniques also exist, like the cross-correlation of the input and output signals, but they lack physical insight, as they rely on the (incorrect) assumption that input and output signals are similar. This paper presents GeoHyTE, the first objective and physically consistent toolbox for the automatic processing of the output signal of bender element sensors. GeoHyTE updates a finite element model of the experiment, iteratively searching for the small strain shear modulus that maximises the correlation between the experimental and numerical output signals. The method is objective, as the results do not depend on the experience of the user, and physically consistent, as the wave propagation process is modelled in full and signals of the same nature (output) are correlated. Moreover, GeoHyTE is nearly insensitive to grossly erroneous input by the user, both in terms of the starting point of the iterative maximisation process and refinement of the finite element model. The results obtained with GeoHyTE are validated against benchmark measurements reported in the literature and experimental data obtained by the authors. A detailed statistical analysis of the results obtained with GeoHyTE and conventional interpretation techniques is also presented.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":" 30","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140216419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was studied. The simplest model (Naïve Bayes) achieved 85.1% accuracy on a holdout dataset, while the most complex model (Neural Network using BERT) demonstrated 89.3% accuracy. The most useful data preparation steps were identified, and a second goal was to examine the underlying complexity and computational costs of automating the categorization process. It was discovered that the BERT model required 170x more time to train, was slower to predict by a factor of 18,600, and required 27x more disk space to save, indicating it may be the best choice for low-volume applications when the highest accuracy is needed. However, for larger-scale operations where a slight performance degradation is tolerated, the Naïve Bayes algorithm could be the best choice. Nearly one in four records in the Toutiao dataset are duplicates, and this is the first published analysis with duplicates removed.
{"title":"The Impact of Data Preparation and Model Complexity on the Natural Language Classification of Chinese News Headlines","authors":"Torrey Wagner, Dennis Guhl, Brent Langhals","doi":"10.3390/a17040132","DOIUrl":"https://doi.org/10.3390/a17040132","url":null,"abstract":"Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was studied. The simplest model (Naïve Bayes) achieved 85.1% accuracy on a holdout dataset, while the most complex model (Neural Network using BERT) demonstrated 89.3% accuracy. The most useful data preparation steps were identified, and another goal examined the underlying complexity and computational costs of automating the categorization process. It was discovered the BERT model required 170x more time to train, was slower to predict by a factor of 18,600, and required 27x more disk space to save, indicating it may be the best choice for low-volume applications when the highest accuracy is needed. However, for larger-scale operations where a slight performance degradation is tolerated, the Naïve Bayes algorithm could be the best choice. Nearly one in four records in the Toutiao dataset are duplicates, and this is the first published analysis with duplicates removed.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":" 34","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140216163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The judicious configuration of predicates is a crucial but often overlooked aspect in the field of knowledge graphs. While previous research has primarily focused on the precision of triples in assessing knowledge graph quality, the rationality of predicates has been largely ignored. This paper introduces an innovative approach aimed at enhancing knowledge graph reasoning by addressing the issue of predicate polysemy. Predicate polysemy refers to instances where a predicate possesses multiple meanings, introducing ambiguity into the knowledge graph. We present an adaptable optimization framework that effectively addresses predicate polysemy, thereby enhancing reasoning capabilities within knowledge graphs. Our approach serves as a versatile and generalized framework applicable to any reasoning model, offering a scalable and flexible solution to enhance performance across various domains and applications. Through rigorous experimental evaluations, we demonstrate the effectiveness and adaptability of our methodology, showing significant improvements in knowledge graph reasoning accuracy. Our findings underscore that discerning predicate polysemy is a crucial step towards achieving a more dependable and efficient knowledge graph reasoning process. Even in the age of large language models, the optimization and induction of predicates remain relevant in ensuring interpretable reasoning.
{"title":"PDEC: A Framework for Improving Knowledge Graph Reasoning Performance through Predicate Decomposition","authors":"Xin Tian, Yuan Meng","doi":"10.3390/a17030129","DOIUrl":"https://doi.org/10.3390/a17030129","url":null,"abstract":"The judicious configuration of predicates is a crucial but often overlooked aspect in the field of knowledge graphs. While previous research has primarily focused on the precision of triples in assessing knowledge graph quality, the rationality of predicates has been largely ignored. This paper introduces an innovative approach aimed at enhancing knowledge graph reasoning by addressing the issue of predicate polysemy. Predicate polysemy refers to instances where a predicate possesses multiple meanings, introducing ambiguity into the knowledge graph. We present an adaptable optimization framework that effectively addresses predicate polysemy, thereby enhancing reasoning capabilities within knowledge graphs. Our approach serves as a versatile and generalized framework applicable to any reasoning model, offering a scalable and flexible solution to enhance performance across various domains and applications. Through rigorous experimental evaluations, we demonstrate the effectiveness and adaptability of our methodology, showing significant improvements in knowledge graph reasoning accuracy. Our findings underscore that discerning predicate polysemy is a crucial step towards achieving a more dependable and efficient knowledge graph reasoning process. Even in the age of large language models, the optimization and induction of predicates remain relevant in ensuring interpretable reasoning.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"62 1‐2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140223141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brain tumors are one of the deadliest types of cancer. Rapid and accurate identification of brain tumors, followed by appropriate surgical intervention or chemotherapy, increases the probability of survival. Accurate delineation of brain tumors in MRI scans determines the exact location for surgical intervention or chemotherapy. However, accurate segmentation of brain tumors, due to their diverse morphologies in MRI scans, poses challenges that require significant expertise and accuracy in image interpretation. Despite significant advances in this field, there are several barriers to proper data collection, particularly in the medical sciences, due to concerns about the confidentiality of patient information. Consequently, research on learning systems and proposed networks often relies on standardized datasets when purpose-collected clinical data are unavailable. The proposed system combines unsupervised learning in the generative adversarial network component with supervised learning in the segmentation network. The system is fully automated and can be applied to tumor segmentation on various datasets, including those with sparse data. To improve the learning process, the brain MRI segmentation network is trained using a generative adversarial network to increase the number of images. The U-Net model was employed during the segmentation step to combine the residual blocks efficiently. In the processing and mask preparation phase, the contourlet transform produces the ground truth for each MRI image, both for the images obtained from the adversarial generator network and for the original images. The adversarial generator network produces high-quality images whose histograms are similar to those of the original images. Finally, the system improves image segmentation performance by combining the residual blocks with the U-Net network. Segmentation is evaluated using brain magnetic resonance images obtained from Istanbul Medipol Hospital. The results show that the proposed method and image segmentation network, which achieves a Dice criterion of 0.9434 among other metrics, can be effectively used on any dataset as a fully automatic system for segmenting different brain MRI images.
{"title":"A Comprehensive Brain MRI Image Segmentation System Based on Contourlet Transform and Deep Neural Networks","authors":"Navid Khalili Dizaji, Mustafa Doğan","doi":"10.3390/a17030130","DOIUrl":"https://doi.org/10.3390/a17030130","url":null,"abstract":"Brain tumors are one of the deadliest types of cancer. Rapid and accurate identification of brain tumors, followed by appropriate surgical intervention or chemotherapy, increases the probability of survival. Accurate determination of brain tumors in MRI scans determines the exact location of surgical intervention or chemotherapy. However, this accurate segmentation of brain tumors, due to their diverse morphologies in MRI scans, poses challenges that require significant expertise and accuracy in image interpretation. Despite significant advances in this field, there are several barriers to proper data collection, particularly in the medical sciences, due to concerns about the confidentiality of patient information. However, research papers for learning systems and proposed networks often rely on standardized datasets because a specific approach is unavailable. This system combines unsupervised learning in the adversarial generative network component with supervised learning in segmentation networks. The system is fully automated and can be applied to tumor segmentation on various datasets, including those with sparse data. In order to improve the learning process, the brain MRI segmentation network is trained using a generative adversarial network to increase the number of images. The U-Net model was employed during the segmentation step to combine the remaining blocks efficiently. Contourlet transform produces the ground truth for each MRI image obtained from the adversarial generator network and the original images in the processing and mask preparation phase. On the part of the adversarial generator network, high-quality images are produced, the results of which are similar to the histogram of the original images. Finally, this system improves the image segmentation performance by combining the remaining blocks with the U-net network. Segmentation is evaluated using brain magnetic resonance images obtained from Istanbul Medipol Hospital. The results show that the proposed method and image segmentation network, which incorporates several criteria, such as the DICE criterion of 0.9434, can be effectively used in any dataset as a fully automatic system for segmenting different brain MRI images.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"91 s1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140223416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brushstroke segmentation algorithms are critical in computer-based analysis of fine motor control via handwriting, drawing, or tracing tasks. Current segmentation approaches typically rely on only one type of feature: spatial, temporal, kinematic, or pressure. We introduce a segmentation algorithm that leverages both spatiotemporal and pressure features to accurately identify brushstrokes during a tracing task. The algorithm was tested on both a clinical and a validation dataset. Using validation trials with incorrectly identified brushstrokes, we evaluated the impact of segmentation errors on commonly derived biomechanical features used in the literature to detect graphomotor pathologies. The algorithm exhibited robust performance on the validation and clinical datasets, effectively identifying brushstrokes while simultaneously eliminating spurious, noisy data. Spatial and temporal features were most affected by incorrect segmentation, particularly those related to the distance between brushstrokes and in-air time, which experienced propagated errors of 99% and 95%, respectively. In contrast, kinematic features, such as velocity and acceleration, were minimally affected, with propagated errors between 0% and 12%. The proposed algorithm may help improve brushstroke segmentation in future studies of handwriting, drawing, or tracing tasks. Spatial and temporal features derived from tablet-acquired data should be considered with caution, given their sensitivity to segmentation errors and instrumentation characteristics.
{"title":"On the Need for Accurate Brushstroke Segmentation of Tablet-Acquired Kinematic and Pressure Data: The Case of Unconstrained Tracing","authors":"Karly S. Franz, Grace Reszetnik, Tom Chau","doi":"10.3390/a17030128","DOIUrl":"https://doi.org/10.3390/a17030128","url":null,"abstract":"Brushstroke segmentation algorithms are critical in computer-based analysis of fine motor control via handwriting, drawing, or tracing tasks. Current segmentation approaches typically rely only on one type of feature, either spatial, temporal, kinematic, or pressure. We introduce a segmentation algorithm that leverages both spatiotemporal and pressure features to accurately identify brushstrokes during a tracing task. The algorithm was tested on both a clinical and validation dataset. Using validation trials with incorrectly identified brushstrokes, we evaluated the impact of segmentation errors on commonly derived biomechanical features used in the literature to detect graphomotor pathologies. The algorithm exhibited robust performance on validation and clinical datasets, effectively identifying brushstrokes while simultaneously eliminating spurious, noisy data. Spatial and temporal features were most affected by incorrect segmentation, particularly those related to the distance between brushstrokes and in-air time, which experienced propagated errors of 99% and 95%, respectively. In contrast, kinematic features, such as velocity and acceleration, were minimally affected, with propagated errors between 0 to 12%. The proposed algorithm may help improve brushstroke segmentation in future studies of handwriting, drawing, or tracing tasks. Spatial and temporal features derived from tablet-acquired data should be considered with caution, given their sensitivity to segmentation errors and instrumentation characteristics.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"360 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140228083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-throughput screening systems are robotic cells that automatically scan and analyze thousands of biochemical samples and reagents in real time. The problem under consideration is to find an optimal cyclic schedule of robot moves that ensures maximum cell performance. To address this issue, we proposed a new efficient version of the parametric PERT/CPM project management method that works in conjunction with a combinatorial subalgorithm capable of rejecting infeasible schedules. The main result is that the new fast PERT/CPM method finds optimal robust schedules for large problem instances in strongly polynomial time, which cannot be achieved using existing algorithms.
{"title":"Fast Algorithm for High-Throughput Screening Scheduling Based on the PERT/CPM Project Management Technique","authors":"Eugene Levner, V. Kats, Pengyu Yan, Ada Che","doi":"10.3390/a17030127","DOIUrl":"https://doi.org/10.3390/a17030127","url":null,"abstract":"High-throughput screening systems are robotic cells that automatically scan and analyze thousands of biochemical samples and reagents in real time. The problem under consideration is to find an optimal cyclic schedule of robot moves that ensures maximum cell performance. To address this issue, we proposed a new efficient version of the parametric PERT/CPM project management method that works in conjunction with a combinatorial subalgorithm capable of rejecting unfeasible schedules. The main result obtained is that the new fast PERT/CPM method finds optimal robust schedules for solving large size problems in strongly polynomial time, which cannot be achieved using existing algorithms.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140228605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}