Journal of Computational Mathematics and Data Science: Latest Articles

An improved K-medoids clustering approach based on the crow search algorithm
Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100034
Nitesh Sureja , Bharat Chawda , Avani Vasant

The K-medoids clustering algorithm is simple yet effective and has been applied to many clustering problems. Instead of using the mean point as the centre of a cluster, K-medoids uses an actual data point to represent it. The medoid is the most centrally located object of the cluster, the one with the minimum sum of distances to the other points. Because it is robust to outliers, K-medoids can represent the cluster centre correctly. However, the K-medoids algorithm is unsuitable for clustering arbitrarily shaped groups of objects and large-scale datasets, because it uses compactness rather than connectivity as its clustering criterion. An improved K-medoids algorithm based on the crow search algorithm is proposed to overcome these problems. This research uses the crow search algorithm to improve the balance between exploration and exploitation in the K-medoids algorithm. Experimental comparisons show that the proposed algorithm performs better than competing methods.
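
The medoid definition in the abstract (an actual data point with the minimum sum of distances to the others) can be illustrated with a minimal Python sketch. It is not the authors' crow-search variant. The helper names `medoid` and `mean_point` and the toy cluster are assumptions made for illustration only.

```python
import math

def medoid(points):
    """Return the data point whose summed distance to all points is smallest."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def mean_point(points):
    """Return the coordinate-wise mean, which need not be a data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Toy cluster (hypothetical data): four nearby points and one far outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (50.0, 50.0)]

m = medoid(cluster)       # stays among the four nearby points
mu = mean_point(cluster)  # dragged toward the outlier
print("medoid:", m)
print("mean:  ", mu)
```

In this toy example the mean is pulled far from the dense group by the single outlier, while the medoid remains one of the original points inside it. This is the robustness property the abstract refers to.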

Citations: 6
Revealing influence of meteorological conditions and flight factors on delays Using XGBoost
Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100030
Yinghan Wu, Gang Mei, Kaixuan Shao

With the increasing demand for air transportation, the negative impact of flight delays has received more and more attention, especially at the hubs of large cities. By examining flight delay data and analyzing the main factors affecting flight delays, the causes of delays can be identified and effectively avoided. In this paper, we collect meteorological data and flight data for New York’s John F. Kennedy International Airport (JFK), LaGuardia Airport (LGA), and Newark Liberty International Airport (EWR). By consulting relevant data, we select the factors that may have a strong correlation with flight delays, and we simplify and classify the data. Based on a preliminary analysis of the relationship between each single factor and flight delays, we use XGBoost to predict and analyze flight delays. We find that: (1) the effect of a single feature on flight delays is limited; (2) departure time, carrier, and precipitation have a great influence on flight delays; and (3) the change in delay duration during flight is predicted more accurately than the departure delay and the arrival delay. Our research results can help airports combine meteorological conditions and forecasts to arrange flights properly and to reduce both the rate of flight delays and the losses to airlines and passengers.

Citations: 0
MicroRNA signature for interpretable breast cancer classification with subtype clue
Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100042
Paolo Andreini , Simone Bonechi , Monica Bianchini , Filippo Geraci

MicroRNAs (miRNAs) are short non-coding RNAs engaged in cellular regulation by suppressing genes at their post-transcriptional stage. Evidence of their involvement in breast cancer and the possibility of quantifying their concentration in the blood have sparked the hope of using them as reliable, inexpensive and non-invasive biomarkers.

While differential expression analysis has succeeded in identifying groups of dysregulated miRNAs among tumor and healthy samples, its intrinsic dual nature makes it inadequate for cancer subtype detection. The use of artificial intelligence or machine learning to uncover complex profiles of miRNA expression associated with different breast cancer subtypes has been poorly investigated, and only a few recent works have explored this possibility. However, the use of the same dataset for both training and testing leaves the robustness of these results an open issue.

In this paper, we propose a two-stage method that leverages two ad-hoc classifiers for tumor/healthy classification and subtype identification. We assess our results using two completely independent datasets: TCGA for training and GSE68085 for testing. Experiments show that our strategy is extraordinarily effective, especially for tumor/healthy classification, where we achieved an accuracy of 0.99. Moreover, by means of a feature importance mechanism, our method is able to display which miRNAs lead to the classification of every single sample, enabling a personalized medicine approach to therapy as well as the algorithm explainability required by the EU GDPR and similar legislation.

Citations: 3
PROVAL: A framework for comparison of protein sequence embeddings
Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100044
Philipp Väth , Maximilian Münch , Christoph Raab , F.-M. Schleif

High-throughput sequencing technology has led to a significant increase in the number of generated protein sequences, and the anchor database UniProt doubles in size approximately every two years. This large set of annotated data is used by many bioinformatics algorithms. Searching within these databases, typically without using any annotations, is challenging due to the variable lengths of the entries and the non-standard comparison measures used. A promising strategy to address these issues is to find fixed-length, information-preserving representations of the variable-length protein sequences. A systematic algorithmic evaluation of the proposed representations is, however, surprisingly missing. In this work, we analyze how different algorithms perform in generating general protein sequence representations and provide a thorough evaluation framework, PROVAL. The strategies range from a proximity representation using the classical Smith–Waterman algorithm to state-of-the-art embedding techniques based on transformer networks. The methods are evaluated by, e.g., molecular function classification, embedding space visualization, computational complexity and carbon footprint.
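
A proximity representation of the kind mentioned above can be sketched in Python: each variable-length sequence is mapped to a fixed-length vector of Smith–Waterman local alignment scores against a fixed set of reference sequences. This is only an illustration and not the PROVAL implementation. The scoring values (match 2, mismatch -1, linear gap -1), the function names, and the reference sequences are assumptions. Practical protein alignment normally uses a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def proximity_embedding(seq, references):
    """Fixed-length vector: one alignment score per reference sequence."""
    return [smith_waterman(seq, ref) for ref in references]

# Hypothetical reference sequences; a real set would be drawn from the database.
references = ["MKTAYIAK", "GAVLIP", "WWYF"]

short_vec = proximity_embedding("MKT", references)
long_vec = proximity_embedding("MKTAYIAKQR", references)
print(short_vec, long_vec)  # both vectors have len(references) entries
```

Whatever the input length, the output has one entry per reference sequence, so standard vector-based classifiers and visualizations can be applied to it.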

Citations: 4
The localized method of approximate particular solutions for solving an optimal control problem
Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100038
Kwesi Acheampong , Hongbo Guan , Huiqing Zhu

In this paper, we consider the localized method of approximate particular solutions (LMAPS) for solving a two-dimensional distributive optimal control problem governed by elliptic partial differential equations. Both radial basis functions (RBFs) and polynomial basis functions are used in the LMAPS discretization, while leave-one-out cross-validation is adopted for selecting the shape parameter that appears in the RBFs. Numerical experiments are presented to demonstrate the accuracy and efficiency of the proposed method.
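
The abstract does not state which RBF or which leave-one-out formulation is used. As a hedged illustration only, the following assumes a multiquadric RBF with shape parameter $c$ and plain RBF interpolation of data $f_i$ at nodes $\mathbf{x}_i$. In that setting the leave-one-out errors can be evaluated without refitting, by the identity usually attributed to Rippa. The symbols $\phi_c$, $\alpha_j$, $A$ and $e_k$ are notation introduced here, not taken from the paper.

```latex
% Assumed for illustration: multiquadric RBF and the interpolation system
\phi_c(r) = \sqrt{r^2 + c^2}, \qquad
s(\mathbf{x}) = \sum_{j=1}^{N} \alpha_j \, \phi_c\!\left(\lVert \mathbf{x} - \mathbf{x}_j \rVert\right), \qquad
A \boldsymbol{\alpha} = \mathbf{f}, \quad
A_{ij} = \phi_c\!\left(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert\right)

% Leave-one-out error at node k and the selected shape parameter
e_k(c) = \frac{\alpha_k}{\left(A^{-1}\right)_{kk}}, \qquad
c^{*} = \arg\min_{c > 0} \, \bigl\lVert \left(e_1(c), \dots, e_N(c)\right) \bigr\rVert
```

Here $e_k(c)$ is the error at $\mathbf{x}_k$ of the interpolant built from the remaining $N-1$ nodes, so the cost of one candidate $c$ is a single factorization of $A$ rather than $N$ separate fits.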

Citations: 0
Investigating changes in global distribution of Ozone in 2018 using k-means clustering algorithm
Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100028
Kaixuan Shao, Gang Mei, Yinghan Wu

Ozone is an active gas in the atmosphere. Its concentration is quite low, but it plays an important role in protecting the health of human beings and other living things on Earth. Ozone circulates in the atmosphere, and its total distribution and variation trend are related to geographical position. In this paper, we collected global ozone tendency data and investigated the changes in the global distribution of ozone in 2018 using the k-means clustering algorithm. We observed that (1) the global ozone tendency can be broadly divided into four regions; (2) the data with a large variation range of total ozone tendency are mainly concentrated near the sea–land boundary, and their distribution is similar to the coastline contour to some extent; (3) after clustering, the area where data with large changes in total ozone tendency are concentrated is roughly X-shaped, and the acute angle between it and the latitude lines is between 25° and 45°. Our findings can contribute to a clearer understanding and analysis of the tendency of global ozone change and help mitigate the ozone hole problem in different regions.
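
For readers unfamiliar with k-means, the following is a minimal Python sketch of the standard Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its members. It is not the authors' pipeline. The toy points, the fixed initial centroids, and the choice of two clusters are assumptions for illustration. The paper reports four regions, which would correspond to four clusters on the real data.

```python
import math

def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

The result of k-means depends on the initial centroids, so in practice the procedure is usually repeated from several starting points and the run with the lowest within-cluster distance is kept.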

Citations: 2
PROVAL: A framework for comparison of protein sequence embeddings
Pub Date : 2022-05-01 DOI: 10.1016/j.jcmds.2022.100044
Philipp Väth, Maximilian Münch, Christoph Raab, Frank-Michael Schleif
Citations: 4
MicroRNA signature for interpretable breast cancer classification with subtype clue
Pub Date : 2022-05-01 DOI: 10.1016/j.jcmds.2022.100042
P. Andreini, S. Bonechi, M. Bianchini, Filippo Geraci
Citations: 3
An improved K-medoids clustering approach based on the crow search algorithm
Pub Date : 2022-04-01 DOI: 10.1016/j.jcmds.2022.100034
Nitesh M. Sureja, Bharat V. Chawda, A. Vasant
Citations: 5
A parallel algorithm for understanding design spaces and performing convex hull computations
Pub Date : 2022-01-01 DOI: 10.1016/j.jcmds.2021.100021
Adam Siegel

A novel algorithm to compute the convex hull of any given hyperdimensional data set is presented. This algorithm has lower memory requirements than state-of-the-art software, and its runtimes are typically much faster than those of conventional programs and algorithms that perform the same task. A discussion examines the importance of convex hull computations in creating general surrogate models from data sets, and their importance to machine learning algorithms. In addition to its far-reaching applications in many fields, this algorithm can be used to help solve design problems, specifically those in preliminary design, when surrogate models are used to perform rapid design trades. The algorithm is presented together with algorithms that compute volumes and facilitate the understanding of hyperdimensional spaces that cannot be easily visualized. The paper concludes with a representative design problem whose dimensionality and number of points are similar to those of a standard engineering preliminary design problem. The minimum number of points needed for the interpolation of a general surrogate model during design and analysis is then discussed, including the proposal of a new metric.
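
To make the convex hull idea concrete, here is a minimal two-dimensional Python sketch. It uses the classical serial monotone-chain method, not the parallel hyperdimensional algorithm of the paper. It also computes the enclosed area (the planar analogue of a volume) and checks whether a query point lies inside the hull, which indicates whether a surrogate model would be interpolating or extrapolating there. All function names and the sample points are assumptions for illustration.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    twice = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0

def inside_hull(hull, q):
    """True if q is inside or on a counter-clockwise hull with >= 3 vertices."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

# Hypothetical design points: four corners plus interior and edge points.
samples = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (2, 0)]
hull = convex_hull(samples)
print(hull, polygon_area(hull))
print(inside_hull(hull, (2, 1)), inside_hull(hull, (5, 1)))
```

The planar method does not carry over directly to high dimensions, where the number of hull facets can grow very quickly with dimension. That growth is part of why memory use and runtime matter for this problem.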

Citations: 4