Journal of Computational Mathematics and Data Science: Latest Articles


Computational treatment of MHD Maxwell nanofluid flow across a stretching sheet considering higher-order chemical reaction and thermal radiation

Journal of Computational Mathematics and Data Science

Pub Date : 2022-08-01 DOI: 10.1016/j.jcmds.2022.100048

Rajib Biswas , Md. Shahadat Hossain , Rafiqul Islam , Sarder Firoz Ahmmed , S.R. Mishra , Mohammad Afikuzzaman

This analysis reports a computational study of the magnetohydrodynamic (MHD) flow of a two-dimensional Maxwell nanofluid across a stretching sheet in the presence of Brownian motion. The thermal radiation and chemical reaction terms feature prominently in the current research. Nanofluids are commonly chosen by researchers because of their rheological properties, which are important in determining their suitability for convective heat transfer. The study reports that the fluid velocity increases with larger values of all the parameters. The heat source and radiation parameters ensure that there is enough heat in the fluid, so the thermal boundary layer thickens as the radiation parameter increases. Moreover, streamlines and isotherms are examined for different parameter values. The suggested model is valuable because it has a wide range of applications in domains including medical sciences (cancer therapeutics), microelectronics, biomedicine, biology, and industrial production processes.


Citations: 27

Thermodynamic analysis of a tangent hyperbolic hydromagnetic heat generating fluid in quadratic Boussinesq approximation

Journal of Computational Mathematics and Data Science

Pub Date : 2022-08-01 DOI: 10.1016/j.jcmds.2022.100058

A.R. Hassan , S.O. Salawu , A.B. Disu , O.R. Aderele

This investigation examines the combined impact of the electromagnetically induced force and an internal heat source on a tangent hyperbolic fluid under the quadratic Boussinesq approximation. The hyperbolic tangent liquid flow and heat transport model adequately predicts and characterizes the shear-stricken event. The nonlinear dimensionless flow and heat transfer equations are solved using a weighted residual procedure coupled with a Galerkin approximation. The results in the table and graphs reveal that the magnetic field strength, as well as the internal heat source, has a substantial impact on the fluid flow and heat propagation. Entropy generation is therefore optimized through an enhanced thermodynamic equilibrium and adequate control of the heat-generating terms and energy loss.


Citations: 8

Detection of anomalies in the proximity of a railway line: A case study

Journal of Computational Mathematics and Data Science

Pub Date : 2022-08-01 DOI: 10.1016/j.jcmds.2022.100052

Pierluigi Amodio , Marcello De Giosa , Felice Iavernaro , Roberto La Scala , Arcangelo Labianca , Monica Lazzo , Francesca Mazzia , Lorenzo Pisani

A point cloud describing a railway environment is considered in a case study aimed at presenting a workflow for the automatic detection of external objects that, coming too close to the railway infrastructure, may cause potential risks for its correct functioning. The approach combines classical semantic segmentation methodologies with a novel geometric and numerical procedure to define a region of interest, consisting of a lower tube enveloping the 3D space occupied by the train during its transit and an upper tube enclosing the overhead contact lines. One useful application could be automatic vegetation monitoring in the proximity of the railway structure, which would help with planning maintenance pruning activities.
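The notion of a tube-shaped region of interest can be illustrated with a small geometric sketch. The code below is not the authors' procedure: it assumes a tube with a circular cross-section of fixed radius around a polyline centerline, and the names `point_segment_distance` and `points_inside_tube` are hypothetical. It flags the points of a cloud that fall inside such a tube.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from the 3D point p to the segment a-b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    # Parameter of the orthogonal projection of p, clamped to the segment.
    if denom == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / denom))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def points_inside_tube(points, centerline, radius):
    """Return the points lying within `radius` of the polyline `centerline`.

    Assumption: the tube is the set of points at distance <= radius
    from the centerline (circular cross-section).
    """
    segments = list(zip(centerline, centerline[1:]))
    return [
        p for p in points
        if min(point_segment_distance(p, a, b) for a, b in segments) <= radius
    ]


# Illustrative data only: a two-segment centerline and three cloud points.
centerline = [(0, 0, 0), (10, 0, 0), (20, 5, 0)]
cloud = [(5, 1, 0), (5, 4, 0), (15, 2.5, 1)]
intruders = points_inside_tube(cloud, centerline, radius=2.0)
```

A real clearance profile is unlikely to be circular, so in practice the distance test would be replaced by a check against the actual cross-section of the train or of the overhead line envelope.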


Citations: 0

A general framework for hypercomplex-valued extreme learning machines

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100032

Guilherme Vieira, Marcos Eduardo Valle

This paper aims to establish a framework for extreme learning machines (ELMs) on general hypercomplex algebras. Hypercomplex neural networks are machine learning models that feature higher-dimension numbers as parameters, inputs, and outputs. Firstly, we review broad hypercomplex algebras and show a framework to operate in these algebras through real-valued linear algebra operations in a robust manner. We proceed to explore a handful of well-known four-dimensional examples. Then, we propose the hypercomplex-valued ELMs and derive their learning using a hypercomplex-valued least-squares problem. Finally, we compare real and hypercomplex-valued ELM models’ performance in an experiment on time-series prediction and another on color image auto-encoding. The computational experiments highlight the excellent performance of hypercomplex-valued ELMs to treat multi-dimensional data, including models based on unusual hypercomplex algebras.
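Operating in a hypercomplex algebra through real-valued linear algebra can be illustrated with quaternions, one of the well-known four-dimensional cases. The sketch below is only an illustration and not the paper's framework: the names `left_mult_matrix` and `quat_mul` are hypothetical. It represents left multiplication by a quaternion q = a + bi + cj + dk as a 4x4 real matrix acting on the coefficient vector of the other factor.

```python
def left_mult_matrix(q):
    """4x4 real matrix M(q) such that M(q) applied to p gives the product q*p.

    Quaternions are stored as coefficient lists [a, b, c, d] for a + bi + cj + dk.
    """
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]


def matvec(m, v):
    """Plain real matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def quat_mul(q, p):
    """Quaternion product q*p computed purely with real linear algebra."""
    return matvec(left_mult_matrix(q), p)


i = [0, 1, 0, 0]
j = [0, 0, 1, 0]
ij = quat_mul(i, j)  # coefficients of k
ji = quat_mul(j, i)  # coefficients of -k, showing non-commutativity
```

Other four-dimensional algebras differ only in the entries of this matrix, which is why a single real-valued least-squares solver can, in principle, serve several algebras.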


Citations: 10

EUPHORIA: A neural multi-view approach to combine content and behavioral features in review spam detection

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100036

Giuseppina Andresini , Andrea Iovine , Roberto Gasbarro , Marco Lomolino , Marco de Gemmis , Annalisa Appice

Nowadays, online reviews are the main source of customer opinions. They are especially important in the realm of e-commerce, where reviews of products and services influence the purchase decisions of customers, as well as the reputation of the commerce websites. Unfortunately, not all online reviews are truthful and trustworthy. Therefore, it is crucial to develop machine learning techniques to detect review spam. This study describes EUPHORIA, a novel classification approach to distinguish spam from truthful reviews. The approach couples multi-view learning with deep learning in order to gain accuracy by accounting for the variety of information associated with both the reviews' content and the reviewers' behavior. Experiments carried out on two real review datasets from Yelp.com (Hotel and Restaurant) show that multi-view learning can improve the performance of a deep learning classifier trained for review spam detection. In particular, the proposed approach achieves an AUC-ROC of 0.813 on Hotel and 0.708 on Restaurant.
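For readers unfamiliar with the reported metric, AUC-ROC can be read as the probability that a randomly chosen spam review receives a higher score than a randomly chosen truthful one. The sketch below computes it directly from that pairwise definition. It illustrates the metric only, not the EUPHORIA model, and the function name `auc_roc` and the toy labels and scores are made up for the example.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positives and negatives.

    labels: 1 for the positive class (e.g. spam), 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: one of the two positives is ranked below a negative.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of examples. It is convenient for explanation, while rank-based implementations are preferable for large datasets.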


Citations: 4

An improved K-medoids clustering approach based on the crow search algorithm

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100034

Nitesh Sureja , Bharat Chawda , Avani Vasant

The K-medoids clustering algorithm is a simple yet effective algorithm that has been applied to many clustering problems. Instead of using the mean point as the centre of a cluster, K-medoids uses an actual data point to represent it. The medoid is the most centrally located object of the cluster, with the minimum sum of distances to the other points. K-medoids can represent the cluster centre correctly because it is robust to outliers. However, the K-medoids algorithm is unsuitable for clustering arbitrarily shaped groups of objects and large-scale datasets, because it uses compactness rather than connectivity as its clustering criterion. An improved K-medoids algorithm based on the crow search algorithm is proposed to overcome these problems. The crow search algorithm is used to improve the balance between exploration and exploitation in the K-medoids algorithm. A comparison of experimental results shows that the proposed algorithm performs better than its competitors.
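The difference between a mean centre and a medoid can be made concrete with a few lines of code. The sketch below covers only the medoid definition given above, not the crow search improvement. The helper names `medoid` and `mean_point` and the sample cluster are illustrative assumptions.

```python
import math


def medoid(points):
    """Return the data point with the minimum sum of Euclidean distances to all points."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


def mean_point(points):
    """Coordinate-wise mean, which need not coincide with any data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Four points on a unit square plus one distant outlier.
cluster = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
centre_medoid = medoid(cluster)    # stays on the unit square
centre_mean = mean_point(cluster)  # pulled toward the outlier
```

Here the outlier drags the mean away from the square, while the medoid remains one of the original points, which is the robustness property the abstract refers to.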


Citations: 6

Revealing influence of meteorological conditions and flight factors on delays Using XGBoost

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100030

Yinghan Wu, Gang Mei, Kaixuan Shao

With the increasing demand for air transportation, the negative impact of flight delays has received growing attention, especially at the hubs of large cities. By examining flight delay data and analyzing the main factors affecting flight delays, the causes of delays can be identified and effectively avoided. In this paper, we collect meteorological data and flight data for New York's John F. Kennedy International Airport (JFK), LaGuardia Airport (LGA), and Newark Liberty International Airport (EWR). By consulting relevant data, we select the factors that may correlate strongly with flight delays, and we simplify and classify the data. Based on a preliminary analysis of the relationship between single factors and flight delays, we use XGBoost to predict and analyze flight delays. We find that: (1) the effect of a single feature on flight delays is limited; (2) departure time, carrier, and precipitation have a great influence on flight delays; and (3) the change in delay duration during flight is predicted more accurately than the departure delay and the arrival delay. Our results can help airports combine meteorological conditions and forecasts to arrange flights properly and reduce the rate of flight delays and the losses to airlines and passengers.


Citations: 0

MicroRNA signature for interpretable breast cancer classification with subtype clue

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100042

Paolo Andreini , Simone Bonechi , Monica Bianchini , Filippo Geraci

MicroRNAs (miRNAs) are short non-coding RNAs engaged in cellular regulation by suppressing genes at the post-transcriptional stage. Evidence of their involvement in breast cancer and the possibility of quantifying their concentration in the blood have sparked the hope of using them as reliable, inexpensive and non-invasive biomarkers.

While differential expression analysis has succeeded in identifying groups of dysregulated miRNAs among tumor and healthy samples, its intrinsic dual nature makes it inadequate for cancer subtype detection. The use of artificial intelligence or machine learning to uncover complex profiles of miRNA expression associated with different breast cancer subtypes has been poorly investigated, and only a few recent works have explored this possibility. However, the use of the same dataset for both training and testing leaves the robustness of these results an open issue.

In this paper, we propose a two-stage method that leverages two ad hoc classifiers for tumor/healthy classification and subtype identification. We assess our results using two completely independent datasets: TCGA for training and GSE68085 for testing. Experiments show that our strategy is highly effective, especially for tumor/healthy classification, where we achieved an accuracy of 0.99. Moreover, by means of a feature importance mechanism, our method is able to display which miRNAs lead to each individual sample classification, so as to enable a personalized medicine approach to therapy as well as the algorithm explainability required by the EU GDPR and other similar legislation.


Citations: 3

PROVAL: A framework for comparison of protein sequence embeddings

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100044

Philipp Väth , Maximilian Münch , Christoph Raab , F.-M. Schleif

High-throughput sequencing technology has led to a significant increase in the number of generated protein sequences, and the anchor database UniProt doubles approximately every two years. This large set of annotated data is used by many bioinformatics algorithms. Searching within these databases, typically without using any annotations, is challenging because of the variable lengths of the entries and the non-standard comparison measures used. A promising strategy to address these issues is to find fixed-length, information-preserving representations of the variable-length protein sequences. A systematic algorithmic evaluation of the proposals is, however, surprisingly missing. In this work, we analyze how different algorithms perform in generating general protein sequence representations and provide a thorough evaluation framework, PROVAL. The strategies range from a proximity representation using the classical Smith–Waterman algorithm to state-of-the-art embedding techniques based on transformer networks. The methods are evaluated by, e.g., molecular function classification, embedding space visualization, computational complexity and carbon footprint.


Citations: 4

The localized method of approximate particular solutions for solving an optimal control problem

Journal of Computational Mathematics and Data Science

Pub Date : 2022-06-01 DOI: 10.1016/j.jcmds.2022.100038

Kwesi Acheampong , Hongbo Guan , Huiqing Zhu

In this paper, we consider the localized method of approximate particular solutions (LMAPS) for solving a two-dimensional distributive optimal control problem governed by elliptic partial differential equations. Both radial basis functions (RBFs) and polynomial basis functions are used in the LMAPS discretization, while leave-one-out cross-validation is adopted to select the shape parameter appearing in the RBFs. Numerical experiments are presented to demonstrate the accuracy and efficiency of the proposed method.


Citations: 0
