
Journal of Computational Mathematics and Data Science: Latest Articles

Efficiency of the multisection method
Pub Date : 2024-11-02 DOI: 10.1016/j.jcmds.2024.100106
J.S.C. Prentice
We study the efficiency of the multisection method for univariate nonlinear equations, relative to that for the well-known bisection method. We show that there is a minimal effort algorithm that uses more sections than the bisection method, although this optimal algorithm is problem dependent. The number of sections required for optimality is determined by means of a Lambert W function.
Journal of Computational Mathematics and Data Science, Volume 13, Article 100106
Citations: 0
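The multisection idea in the Prentice abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm or its effort model: the function name `multisection`, the parameter `k` (number of sections), and the early exit from the scan are assumptions made for illustration. With `k = 2` the routine reduces to ordinary bisection, and the returned evaluation count gives a rough way to compare effort for different `k`.

```python
import math


def multisection(f, a, b, k=3, tol=1e-10, max_iter=200):
    """Bracket a root of f in [a, b] by splitting the interval into k sections.

    Assumes f is continuous and f(a), f(b) have opposite signs.
    Returns (root_estimate, number_of_function_evaluations).
    """
    fa, fb = f(a), f(b)
    evals = 2
    if fa == 0.0:
        return a, evals
    if fb == 0.0:
        return b, evals
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        if b - a <= tol:
            break
        h = (b - a) / k
        left, f_left = a, fa      # left end of the section being tested
        right, f_right = b, fb    # default: the root lies in the last section
        # Scan the k - 1 interior nodes from left to right.
        for i in range(1, k):
            x = a + i * h
            fx = f(x)
            evals += 1
            if fx == 0.0:
                return x, evals
            if f_left * fx < 0:
                # Sign change found: the root lies in [left, x].
                right, f_right = x, fx
                break
            left, f_left = x, fx
        a, fa = left, f_left
        b, fb = right, f_right
    return 0.5 * (a + b), evals


if __name__ == "__main__":
    for sections in (2, 3, 4, 8):
        root, n = multisection(lambda x: x * x - 2.0, 1.0, 2.0, k=sections)
        print(sections, root, n, abs(root - math.sqrt(2.0)))
```

Each pass shrinks the bracket by a factor of `k` at a cost of at most `k - 1` new evaluations, which is the trade-off the abstract refers to; stopping the scan at the first sign change is a design choice of this sketch and lowers the average cost per pass.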
Bayesian optimization of one-dimensional convolutional neural networks (1D CNN) for early diagnosis of Autistic Spectrum Disorder
Pub Date : 2024-10-19 DOI: 10.1016/j.jcmds.2024.100105
Temidayo Oluwatosin Omotehinwa, Morolake Oladayo Lawrence, David Opeoluwa Oyewola, Emmanuel Gbenga Dada
Autistic Spectrum Disorder (ASD) is a challenging neurological development disorder that involves poor social interaction and communication, and repetitive behaviours. If autism is identified early enough, it can be treated with better outcomes, but present diagnostic tests depend on subjective opinion, are time-consuming, and are vague. This study aims to optimize one-dimensional convolutional neural networks (1D CNN) to improve the precision and speed of early ASD diagnosis. Four ASD datasets representing different age groups (toddlers, children, adolescents, and adults) were modelled using 1D CNNs. These datasets are publicly accessible on the UCI Machine Learning Repository and Kaggle, and they consist of behavioural features relevant to ASD diagnosis. Each dataset underwent feature selection, categorical encoding, and missing value handling. Then, a baseline 1D CNN with predefined hyperparameters was modelled on each of the datasets. Subsequently, the baseline models were optimized using the Tree-structured Parzen Estimator (TPE). An interactive web-based ASD diagnostic tool was developed, in which user inputs are processed through age-specific, pre-trained, optimized models to determine ASD probability. The optimized 1D CNN models significantly outperformed the baseline models across all age groups and achieved scores of 100% in accuracy, precision, recall, F1-score, MCC, and AUC ROC. This implies that the optimized models can reliably identify people in various age groups who have and do not have ASD. The development of an interactive web-based diagnostic tool extends the practical utility of the models, making them accessible for clinical and at-home use.
Journal of Computational Mathematics and Data Science, Volume 13, Article 100105
Citations: 0
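The ASD abstract above reports accuracy, precision, recall, F1-score, and MCC. As a reminder of how these scores relate to a binary confusion matrix, here is a minimal sketch using the standard definitions. The helper name `binary_metrics` and the toy labels are hypothetical; none of this comes from the paper's code or data, and AUC ROC is omitted because it needs predicted scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common binary classification scores from 0/1 label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Convention used in this sketch: return 0.0 when a score is undefined.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Toy labels, purely illustrative.
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

A score of 1.0 on all five metrics requires zero false positives and zero false negatives on the evaluated set.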
Novel color space representation extracted by NMF to segment a color image
Pub Date : 2024-10-18 DOI: 10.1016/j.jcmds.2024.100104
Ciro Castiello, Nicoletta Del Buono, Flavia Esposito
This paper considers the task of separating the pixels in a color image into background and foreground classes. Using the machine learning technique known as Nonnegative Matrix Factorization, data pertaining to different color channels, selected from several color spaces, are combined, and a novel space representation is extracted.
The novel representation of the image includes additional information, namely “metacolor”, which could be related to foreground and background and adopted to improve binary segmentation of the investigated image. In both qualitative and quantitative experiments, the novel color space representation produces some improvements in the binary segmentation results when compared to those obtained by applying common, simpler thresholding algorithms directly to the original image.
Journal of Computational Mathematics and Data Science, Volume 13, Article 100104
Citations: 0
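The NMF step mentioned in the color segmentation abstract above can be sketched in plain Python with the classical Lee-Seung multiplicative updates. This is an assumption-laden illustration, not the authors' pipeline: the matrix layout (rows are pixels, columns are color channels), the rank, the toy data, and the helper names `matmul`, `transpose`, `frobenius_error`, and `nmf` are all hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H


if __name__ == "__main__":
    # Toy "image": six pixels described by four channel values each.
    V = [[0.9, 0.8, 0.1, 0.2], [0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.7],
         [0.1, 0.2, 0.9, 0.7], [0.5, 0.5, 0.5, 0.45], [0.7, 0.65, 0.3, 0.325]]
    W, H = nmf(V, rank=2)
    print(frobenius_error(V, W, H))
```

In this layout each row of `W` gives a pixel's coefficients on the two learned components; whether such coefficients correspond to the paper's "metacolor" is not stated in the abstract and is left open here.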
Enhanced MRI brain tumor detection and classification via topological data analysis and low-rank tensor decomposition
Pub Date : 2024-10-03 DOI: 10.1016/j.jcmds.2024.100103
Serena Grazia De Benedictis, Grazia Gargano, Gaetano Settembre
The advent of artificial intelligence in medical imaging has paved the way for significant advancements in the diagnosis of brain tumors. This study presents a novel ensemble approach that uses magnetic resonance imaging (MRI) to identify and categorize common brain tumors, such as pituitary tumors, meningiomas, and gliomas. The proposed workflow has two parts. First, it employs non-trivial image enhancement techniques in data preprocessing, low-rank Tucker decomposition for dimensionality reduction, and machine learning (ML) classifiers to detect and predict the type of brain tumor. Second, persistent homology (PH), a topological data analysis (TDA) technique, is exploited to extract potential critical areas in MRI scans. When paired with the ML classifier output, this additional information can help domain experts to identify areas of interest that might contain tumor signatures, improving the interpretability of ML predictions. Compared with purely automated diagnoses, this transparency adds another level of confidence and is essential for clinical acceptance. The performance of the system was quantitatively evaluated on a well-known MRI dataset, with an overall classification accuracy of 97.28% using an extremely randomized trees model. The promising results show that the integration of TDA, ML, and low-rank approximation methods is a successful approach for brain tumor identification and categorization, providing a solid foundation for further study and clinical application.
Journal of Computational Mathematics and Data Science, Volume 13, Article 100103
Citations: 0
Artifact removal from ECG signals using online recursive independent component analysis
Pub Date : 2024-09-27 DOI: 10.1016/j.jcmds.2024.100102
K. Gunasekaran, V.D. Ambeth Kumar, Mary Judith A.
The diagnosis of cardiac abnormalities and monitoring of heart health heavily rely on Electrocardiogram (ECG) signals. Unfortunately, these signals frequently encounter interference from diverse artifacts, impeding precise interpretation and analysis. To overcome this challenge, we suggest a novel method for real-time artifact removal from ECG signals through the utilization of Online Recursive Independent Component Analysis (ORICA). Our study outlines a systematic preprocessing pipeline, adaptively estimating the mixing matrix and demixing matrix of the ICA model while streaming data is processed. Additionally, we explore the selection of appropriate ICA components and the use of relevant feature extraction techniques to enhance the quality of extracted cardiac signals. This research presents a promising solution for removing artifacts from ECG signals in real-time, paving the way for improved cardiac diagnostics and monitoring systems. Comparative analyses demonstrate significant improvements in the accuracy of subsequent ECG analysis and interpretation following the application of our ORICA-based preprocessing.
Journal of Computational Mathematics and Data Science, Volume 13, Article 100102
Citations: 0
Leveraging feed-forward neural networks to enhance the hybrid block derivative methods for system of second-order ordinary differential equations
Pub Date : 2024-09-20 DOI: 10.1016/j.jcmds.2024.100101
Sabastine Emmanuel, Saratha Sathasivam, Muideen O. Ogunniran
This study introduces an innovative method combining discrete hybrid block techniques and artificial intelligence to enhance the solution of second-order Ordinary Differential Equations (ODEs). By integrating feed-forward neural networks (FFNN) into the hybrid block derivative method (HBDM), the modified approach shows improved accuracy and efficiency compared to traditional methods. Through comprehensive comparisons with exact and existing solutions, the study demonstrates the effectiveness of the proposed approach. The evaluation, utilizing root mean square error (RMSE), confirms its superior performance, robustness, and applicability in diverse scenarios. This research sets a new standard for solving complex ODE systems, offering promising avenues for future research and practical implementations.
Journal of Computational Mathematics and Data Science, Volume 13, Article 100101
Citations: 0
On resolution coresets for constrained clustering
Pub Date : 2024-09-01 DOI: 10.1016/j.jcmds.2024.100100
Maximilian Fiedler, Peter Gritzmann, Fabian Klemm

Specific data compression techniques, formalized by the concept of coresets, have proved to be powerful for many optimization problems. In fact, while tightly controlling the approximation error, coresets may lead to a significant speed-up of the computations and hence allow algorithms to be extended to much larger problem sizes. The present paper deals with a weight-balanced clustering problem, and is specifically motivated by an application in materials science where a voxel-based image is to be processed into a diagram representation. Here, the class of desired coresets is naturally confined to those which can be viewed as lowering the resolution of the input data. While one might expect such resolution coresets to be inferior to unrestricted coresets, we prove bounds for resolution coresets that improve known bounds in the relevant dimensions and also lead to significantly faster algorithms in practice.

Journal of Computational Mathematics and Data Science, Volume 12, Article 100100
PDF: https://www.sciencedirect.com/science/article/pii/S2772415824000117/pdfft?md5=119df73da5369d09083c391d94764956&pid=1-s2.0-S2772415824000117-main.pdf
Citations: 0
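The notion of a coreset that "lowers the resolution of the input data", as described in the abstract above, can be pictured with a small sketch: blocks of pixels in a weighted grid are merged into single weighted points. The abstract does not give the paper's construction or its error bounds, so the function name `lower_resolution`, the 2D setting, and the choice of the weighted centroid as representative are assumptions for illustration only.

```python
def lower_resolution(weights, factor):
    """Merge factor x factor blocks of a 2D weight grid into weighted points.

    Returns a list of (x, y, w): the weighted centroid of each block, in
    pixel-center coordinates, and the block's total weight. Blocks whose
    total weight is zero are skipped.
    """
    rows, cols = len(weights), len(weights[0])
    points = []
    for r0 in range(0, rows, factor):
        for c0 in range(0, cols, factor):
            w_sum = x_sum = y_sum = 0.0
            for r in range(r0, min(r0 + factor, rows)):
                for c in range(c0, min(c0 + factor, cols)):
                    w = weights[r][c]
                    w_sum += w
                    x_sum += w * (c + 0.5)
                    y_sum += w * (r + 0.5)
            if w_sum > 0:
                points.append((x_sum / w_sum, y_sum / w_sum, w_sum))
    return points


if __name__ == "__main__":
    grid = [[1.0] * 4 for _ in range(4)]
    for point in lower_resolution(grid, 2):
        print(point)
```

The reduced point set keeps the total weight and the overall center of mass of the input, which is the kind of invariant a weight-balanced clustering routine run on the smaller set would rely on.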
Fast empirical scenarios
Pub Date : 2024-09-01 DOI: 10.1016/j.jcmds.2024.100099
Michael Multerer, Paul Schneider, Rohan Sen

We seek to extract a small number of representative scenarios from large panel data that are consistent with sample moments. Of two novel algorithms, the first identifies scenarios that have not been observed before, and comes with a scenario-based representation of covariance matrices. The second selects important data points from states of the world that have already been realized; these points are consistent with higher-order sample moment information. Both algorithms are efficient to compute and lend themselves to consistent scenario-based modeling and multi-dimensional numerical integration that can be used for interpretable decision-making under uncertainty. Extensive numerical benchmarking studies and an application in portfolio optimization favor the proposed algorithms.

Journal of Computational Mathematics and Data Science, Volume 12, Article 100099
PDF: https://www.sciencedirect.com/science/article/pii/S2772415824000105/pdfft?md5=701519346db6f93b6f348d8512c143fa&pid=1-s2.0-S2772415824000105-main.pdf
Citations: 0
Estimating data complexity and drift through a multiscale generalized impurity approach
Pub Date : 2024-08-26 DOI: 10.1016/j.jcmds.2024.100098
Diogo Costa, Eugénio M. Rocha, Nelson Ferreira

The quality of machine learning solutions, and of classifier models in general, depends largely on the performance of the chosen algorithm and on the intrinsic characteristics of the input data. Although the former has been studied extensively, the latter has received comparatively less attention. In this paper, we introduce the Multiscale Impurity Complexity Analysis (MICA) algorithm for the quantification of class separability and decision-boundary complexity of datasets. MICA is both model- and dimensionality-independent and can provide a measure of separability based on regional impurity values. This makes MICA sensitive to both global and local data conditions. We show that MICA can properly describe class separability on a comprehensive set of synthetic and real datasets, and we compare it against other state-of-the-art methods. After establishing the robustness of the proposed method, alternative applications are discussed, including a streaming-data variant of MICA (MICA-S) that can be repurposed into a model-independent method for concept drift detection.

Journal of Computational Mathematics and Data Science, Volume 12, Article 100098
PDF: https://www.sciencedirect.com/science/article/pii/S2772415824000099/pdfft?md5=54b719dae828872e98af24740cf27e23&pid=1-s2.0-S2772415824000099-main.pdf
Citations: 0
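The phrase "regional impurity values" in the MICA abstract above can be made concrete with a small sketch that measures Gini impurity inside grid cells at several resolutions. This is not the MICA algorithm, whose details are not given in the abstract; the function names `gini` and `multiscale_impurity`, the uniform grid, the scale set, and the assumption that features are scaled to [0, 1) are all illustrative choices.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def multiscale_impurity(points, labels, scales=(1, 2, 4, 8)):
    """Size-weighted mean Gini impurity of occupied grid cells, per scale.

    points: tuples of coordinates, each assumed to lie in [0, 1).
    A scale s splits every axis into s equal cells.
    """
    n = len(points)
    result = {}
    for s in scales:
        cells = defaultdict(list)
        for p, y in zip(points, labels):
            key = tuple(min(int(x * s), s - 1) for x in p)
            cells[key].append(y)
        result[s] = sum(len(ys) / n * gini(ys) for ys in cells.values())
    return result


if __name__ == "__main__":
    xs = [(0.1,), (0.2,), (0.3,), (0.4,), (0.6,), (0.7,), (0.8,), (0.9,)]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    print(multiscale_impurity(xs, ys))
```

For cleanly separated classes the impurity drops to zero at a coarse scale, while heavily interleaved classes stay impure until the cells become very small; how quickly the values decay across scales is one simple indicator of decision-boundary complexity.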
Structured stochastic curve fitting without gradient calculation
Pub Date : 2024-07-26 DOI: 10.1016/j.jcmds.2024.100097
Jixin Chen

Optimization of parameters and hyperparameters is a general step in any data analysis. Because not all models are mathematically well-behaved, stochastic optimization, which randomly chooses parameters in each optimization iteration, can be useful in many analyses. Many such algorithms have been reported and applied in chemistry data analysis. The one reported here is a naïve algorithm that searches each parameter sequentially and randomly within its bounds, then picks the best value for the next iteration. Thus, one can ignore irrational solutions of the model itself or of its gradient in parameter space and continue the optimization.

Citations: 0