Pub Date: 2022-08-01; DOI: 10.1016/j.jcmds.2022.100048
Rajib Biswas , Md. Shahadat Hossain , Rafiqul Islam , Sarder Firoz Ahmmed , S.R. Mishra , Mohammad Afikuzzaman
This analysis reports a computational study of the magnetohydrodynamic (MHD) flow of a 2D Maxwell nanofluid across a stretched sheet in the presence of Brownian motion. Thermal radiation and chemical reaction terms are included in the model. Nanofluids are usually chosen by researchers because of their rheological properties, which are important in determining their suitability for convective heat transfer. The results show that the fluid velocity increases with increasing values of all the parameters. The heat source and radiation parameters add heat to the fluid, so the thermal boundary layer thickens as the radiation parameter increases. Moreover, streamlines and isotherms are examined for different parameter values. The suggested model is valuable because it has a wide range of applications in domains including medical sciences (cancer therapeutics), microelectronics, biomedicine, biology, and industrial production processes.
Title: Computational treatment of MHD Maxwell nanofluid flow across a stretching sheet considering higher-order chemical reaction and thermal radiation. Journal of Computational Mathematics and Data Science, vol. 4, Article 100048.
This investigation examines the combined impact of an electromagnetically induced force and an internal heat source on a tangent hyperbolic fluid under the quadratic Boussinesq approximation. The hyperbolic tangent liquid flow and heat transport formulation adequately predicts and characterizes the shear-stricken event. The nonlinear dimensionless flow and heat transfer equations are solved using a weighted residual procedure coupled with a Galerkin approximation integration approach. The results in the table and graphs reveal that the magnetic field strength has a substantial impact on the fluid flow and heat propagation, as well as the internal heat source. Therefore, the entropy generation is optimized through an enhanced thermodynamic equilibrium and adequate control of heat generating terms and energy loss.
Title: Thermodynamic analysis of a tangent hyperbolic hydromagnetic heat generating fluid in quadratic Boussinesq approximation. Journal of Computational Mathematics and Data Science, vol. 4, Article 100058.
Authors: A.R. Hassan , S.O. Salawu , A.B. Disu , O.R. Aderele
Pub Date: 2022-08-01; DOI: 10.1016/j.jcmds.2022.100058
Pub Date: 2022-08-01; DOI: 10.1016/j.jcmds.2022.100052
Pierluigi Amodio , Marcello De Giosa , Felice Iavernaro , Roberto La Scala , Arcangelo Labianca , Monica Lazzo , Francesca Mazzia , Lorenzo Pisani
A point cloud describing a railway environment is considered in a case study aimed at presenting a workflow for the automatic detection of external objects that, coming too close to the railway infrastructure, may cause potential risks for its correct functioning. The approach combines classical semantic segmentation methodologies with a novel geometric and numerical procedure to define a region of interest, consisting of a lower tube enveloping the 3D space occupied by the train during its transit and an upper tube enclosing the overhead contact lines. One useful application could be automatic vegetation monitoring in the proximity of the railway structure, which would help with planning maintenance pruning activities.
Title: Detection of anomalies in the proximity of a railway line: A case study. Journal of Computational Mathematics and Data Science, vol. 4, Article 100052.
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100032
Guilherme Vieira, Marcos Eduardo Valle
This paper aims to establish a framework for extreme learning machines (ELMs) on general hypercomplex algebras. Hypercomplex neural networks are machine learning models that feature higher-dimensional numbers as parameters, inputs, and outputs. First, we review broad hypercomplex algebras and show a framework to operate in these algebras robustly through real-valued linear algebra operations. We proceed to explore a handful of well-known four-dimensional examples. Then, we propose the hypercomplex-valued ELMs and derive their learning using a hypercomplex-valued least-squares problem. Finally, we compare the performance of real and hypercomplex-valued ELM models in an experiment on time-series prediction and another on color image auto-encoding. The computational experiments highlight the excellent performance of hypercomplex-valued ELMs in treating multi-dimensional data, including models based on unusual hypercomplex algebras.
Title: A general framework for hypercomplex-valued extreme learning machines. Journal of Computational Mathematics and Data Science, vol. 3, Article 100032.
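The idea of operating in a hypercomplex algebra through real-valued linear algebra can be illustrated with quaternions, one of the well-known four-dimensional algebras. The sketch below is not taken from the paper: the function names `left_mult_matrix` and `quat_mul` are hypothetical, and a quaternion a + bi + cj + dk is assumed to be stored as the list `[a, b, c, d]`. It shows that the quaternion product p*q equals an ordinary real matrix-vector product M(p) q.

```python
def left_mult_matrix(p):
    """Return the 4x4 real matrix M(p) with M(p) @ q == p * q (quaternion product).

    Hypothetical helper for illustration; p = [a, b, c, d] means a + bi + cj + dk.
    """
    a, b, c, d = p
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]


def quat_mul(p, q):
    """Multiply two quaternions using only a real matrix-vector product."""
    m = left_mult_matrix(p)
    return [sum(m[row][col] * q[col] for col in range(4)) for row in range(4)]


i = [0, 1, 0, 0]
j = [0, 0, 1, 0]
print(quat_mul(i, j))  # i * j = k
print(quat_mul(j, i))  # j * i = -k, so the product is not commutative
```

Because the product is expressed as a real matrix acting on a real vector, standard real-valued least-squares routines can in principle be reused once inputs and weights are stacked this way; how the paper does this for general algebras is not shown here.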
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100036
Giuseppina Andresini , Andrea Iovine , Roberto Gasbarro , Marco Lomolino , Marco de Gemmis , Annalisa Appice
Nowadays, online reviews are the main source of customer opinions. They are especially important in the realm of e-commerce, where reviews regarding products and services influence the purchase decisions of customers, as well as the reputation of the commerce websites. Unfortunately, not all online reviews are truthful and trustworthy. Therefore, it is crucial to develop machine learning techniques to detect review spam. This study describes EUPHORIA, a novel classification approach to distinguish spam from truthful reviews. This approach couples multi-view learning with deep learning, in order to gain accuracy by accounting for the variety of information possibly associated with both the reviews’ content and the reviewers’ behavior. Experiments carried out on two real review datasets from Yelp.com (Hotel and Restaurant) show that the use of multi-view learning can improve the performance of a deep learning classifier trained for review spam detection. In particular, the proposed approach achieves AUC-ROC equal to 0.813 and 0.708 in Hotel and Restaurant, respectively.
Title: EUPHORIA: A neural multi-view approach to combine content and behavioral features in review spam detection. Journal of Computational Mathematics and Data Science, vol. 3, Article 100036.
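For readers unfamiliar with the AUC-ROC metric reported above, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen positive (spam) example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auc_roc(labels, scores):
    """Pairwise AUC-ROC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 (1 = spam, by assumption); scores: classifier outputs.
    Quadratic in the number of examples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): three of the four pairs are ordered correctly.
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the reported 0.813 and 0.708.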
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100034
Nitesh Sureja , Bharat Chawda , Avani Vasant
The K-medoids clustering algorithm is a simple yet effective algorithm that has been applied to many clustering problems. Instead of using the mean point as the centre of a cluster, K-medoids uses an actual data point to represent it. The medoid is the most centrally located object of the cluster, with the minimum sum of distances to the other points. K-medoids can correctly represent the cluster centre because it is robust to outliers. However, the K-medoids algorithm is unsuitable for clustering arbitrarily shaped groups of objects and large-scale datasets. This is because it uses compactness as a clustering criterion instead of connectivity. An improved K-medoids algorithm based on the crow search algorithm is proposed to overcome the above problems. This research uses the crow search algorithm to improve the balance between exploration and exploitation in the K-medoids algorithm. A comparison of experimental results shows that the proposed algorithm performs better than its competitors.
Title: An improved K-medoids clustering approach based on the crow search algorithm. Journal of Computational Mathematics and Data Science, vol. 3, Article 100034.
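The definition of a medoid given above (the cluster member with the minimum sum of distances to the other points) and its robustness to outliers can be sketched in a few lines. This is only an illustration of the plain medoid computation, not the crow-search-based algorithm proposed in the paper; the function names and the toy points are assumptions.

```python
import math


def medoid(points):
    """Return the point of the cluster whose summed distance to all others is smallest."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def mean_point(points):
    """Return the coordinate-wise mean, the centre used by K-means instead."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Toy cluster (made-up values) with one outlier at x = 100.
cluster = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (100.0, 0.0)]
print(medoid(cluster))      # stays inside the dense group of points
print(mean_point(cluster))  # dragged towards the outlier
```

The medoid remains at (3.0, 0.0), while the mean moves to (22.0, 0.0), a location where no data point lies.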
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100030
Yinghan Wu, Gang Mei, Kaixuan Shao
With the increasing demand for air transportation, the negative impact of flight delays has received growing attention, especially at the hubs of large cities. By examining flight delay data and analyzing the main factors affecting flight delays, the causes of flight delays can be found and effectively avoided. In this paper, we collect meteorological data and flight data for New York’s John F. Kennedy International Airport (JFK), LaGuardia Airport (LGA), and Newark Liberty International Airport (EWR). By consulting relevant data, we select the factors that may have a strong correlation with flight delays, and we simplify and classify the data. Based on a preliminary analysis of the relationship between single factors and flight delays, we use XGBoost to predict and analyze flight delays. We find that: (1) the effect of a single feature on flight delays is limited; (2) departure time, carrier, and precipitation have a great influence on flight delays; and (3) the change in delay duration during flight is predicted more accurately than the departure delay and the arrival delay. Our research results can help airports combine meteorological conditions and forecasts to arrange flights properly and reduce the rate of flight delays and the losses to airlines and passengers.
Title: Revealing influence of meteorological conditions and flight factors on delays Using XGBoost. Journal of Computational Mathematics and Data Science, vol. 3, Article 100030.
MicroRNAs (miRNAs) are short non-coding RNAs engaged in cellular regulation by suppressing genes at their post-transcriptional stage. Evidence of their involvement in breast cancer and the possibility of quantifying their concentration in the blood have sparked the hope of using them as reliable, inexpensive and non-invasive biomarkers.
While differential expression analysis has succeeded in identifying groups of dysregulated miRNAs among tumor and healthy samples, its intrinsic dual nature makes it inadequate for cancer subtype detection. The use of artificial intelligence or machine learning to uncover complex profiles of miRNA expression associated with different breast cancer subtypes has been poorly investigated, and only a few recent works have explored this possibility. However, the use of the same dataset for both training and testing leaves the robustness of these results an open issue.
In this paper, we propose a two-stage method that leverages two ad hoc classifiers for tumor/healthy classification and subtype identification. We assess our results using two completely independent datasets: TCGA for training and GSE68085 for testing. Experiments show that our strategy is extraordinarily effective, especially for tumor/healthy classification, where we achieved an accuracy of 0.99. In addition, by means of a feature importance mechanism, our method can show which miRNAs drive the classification of each individual sample, enabling a personalized-medicine approach to therapy and providing the algorithmic explainability required by the EU GDPR and similar legislation.
Title: MicroRNA signature for interpretable breast cancer classification with subtype clue. Journal of Computational Mathematics and Data Science, vol. 3, Article 100042.
Authors: Paolo Andreini , Simone Bonechi , Monica Bianchini , Filippo Geraci
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100042
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100044
Philipp Väth , Maximilian Münch , Christoph Raab , F.-M. Schleif
High-throughput sequencing technology has led to a significant increase in the number of generated protein sequences, and the anchor database UniProt doubles in size approximately every two years. This large set of annotated data is used by many bioinformatics algorithms. Searching within these databases, typically without using any annotations, is challenging due to the variable lengths of the entries and the non-standard comparison measures used. A promising strategy to address these issues is to find fixed-length, information-preserving representations of the variable-length protein sequences. A systematic algorithmic evaluation of the proposals is, however, surprisingly missing. In this work, we analyze how different algorithms perform in generating general protein sequence representations and provide a thorough evaluation framework, PROVAL. The strategies range from a proximity representation using the classical Smith–Waterman algorithm to state-of-the-art embedding techniques by means of transformer networks. The methods are evaluated by, e.g., molecular function classification, embedding space visualization, computational complexity and carbon footprint.
Title: PROVAL: A framework for comparison of protein sequence embeddings. Journal of Computational Mathematics and Data Science, vol. 3, Article 100044.
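To make the Smith–Waterman end of this range concrete, the sketch below computes a local alignment score and then builds a fixed-length vector for a variable-length sequence by scoring it against a fixed list of reference sequences. Several things here are assumptions for illustration and not taken from PROVAL: the simple match/mismatch/linear-gap scores (practical protein comparison usually uses a substitution matrix such as BLOSUM62 and affine gaps), the function names, the toy sequences, and the reading of "proximity representation" as a vector of similarity scores to reference sequences.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Classic dynamic programme; only the previous row is kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def proximity_vector(seq, references):
    """Fixed-length representation: one similarity score per reference sequence."""
    return [smith_waterman_score(seq, ref) for ref in references]


references = ["ACDE", "WWWW", "KLMN"]           # made-up reference sequences
print(proximity_vector("XXACDEYY", references))  # same length for any input sequence
```

The output length depends only on the number of references, never on the length of the input sequence, which is the property the abstract asks of a representation.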
Pub Date: 2022-06-01; DOI: 10.1016/j.jcmds.2022.100038
Kwesi Acheampong , Hongbo Guan , Huiqing Zhu
In this paper, we consider the localized method of approximate particular solutions (LMAPS) for solving a two-dimensional distributive optimal control problem governed by elliptic partial differential equations. Both radial basis functions (RBFs) and polynomial basis functions are used in the LMAPS discretization, while leave-one-out cross-validation is adopted for the selection of the shape parameter appearing in the RBFs. Numerical experiments are presented to demonstrate the accuracy and efficiency of the proposed method.
Title: The localized method of approximate particular solutions for solving an optimal control problem. Journal of Computational Mathematics and Data Science, vol. 3, Article 100038.