Title: Koopman-inspired approach for identification of exogenous anomalies in nonstationary time-series data
Authors: Alex Mallen, C. Keller, J. Kutz
Pub Date: 2023-06-01 | DOI: 10.1088/2632-2153/acdd50
Abstract: In many scenarios, it is necessary to monitor a complex system via a time-series of observations and determine when anomalous exogenous events have occurred so that relevant actions can be taken. Determining whether current observations are abnormal is challenging: it requires learning an extrapolative probabilistic model of the dynamics from historical data and using a limited number of current observations to make a classification. We leverage recent advances in long-term probabilistic forecasting, namely Deep Probabilistic Koopman, to build a general method for classifying anomalies in multi-dimensional time-series data. We also show how to utilize models with domain knowledge of the dynamics to reduce type I and type II errors. We demonstrate our proposed method on the important real-world task of global atmospheric pollution monitoring, integrating it with NASA’s Global Earth Observing System Model. The system successfully detects localized anomalies in air quality due to events such as COVID-19 lockdowns and wildfires.
Title: MeGen - generation of gallium metal clusters using reinforcement learning
Authors: Rohit Modee, Ashwini Verma, Kavita Joshi, Deva Priyakumar
Pub Date: 2023-06-01 | DOI: 10.1088/2632-2153/acdc03
Abstract: The generation of low-energy 3D structures of metal clusters depends on the efficiency of the search algorithm and the accuracy of the inter-atomic interaction description. In this work, we formulate the search algorithm as a reinforcement learning (RL) problem. Concisely, we propose a novel actor-critic architecture that generates low-lying isomers of metal clusters at a fraction of the computational cost of conventional methods. Our RL-based search algorithm uses a previously developed DART model as a reward function to describe the inter-atomic interactions and validate predicted structures. Using the DART model as a reward function incentivizes the RL model to generate low-energy structures and helps it generate valid structures. We demonstrate the advantages of our approach over conventional methods for scanning local minima on the potential energy surface. Our approach not only generates isomers of gallium clusters at minimal computational cost but also predicts isomer families that were not discovered through previous density-functional theory (DFT)-based approaches.
Title: Deep learning model with L1 penalty for predicting breast cancer metastasis using gene expression data
Authors: Jaeyoon Kim, Minhyeok Lee, Junhee Seok
Pub Date: 2023-06-01 | DOI: 10.1088/2632-2153/acd987
Abstract: Breast cancer has the highest incidence and death rate among women; moreover, its metastasis to other organs increases the mortality rate. Since several studies have reported gene expression and cancer prognosis to be related, the study of breast cancer metastasis using gene expression is crucial. To this end, a novel deep neural network architecture, the deep learning-based cancer metastasis estimator (DeepCME), is proposed in this paper for predicting breast cancer metastasis. However, overfitting occurs frequently when training deep learning models on gene expression data, because the data contain a large number of genes while the sample size is rather small. To address overfitting, several regularization methods are implemented, such as an L1 penalty, batch normalization, and dropout. To demonstrate the superior performance of our model, area under the curve (AUC) scores are evaluated and compared with five baseline models: logistic regression, support vector classifier (SVC), random forest, decision tree, and k-nearest neighbor. DeepCME achieves the highest average AUC scores in most cross-validation cases, and its average AUC score of 0.754 is approximately 12.9% higher than that of SVC, the second-best model. In addition, the 30 most significant genes related to breast cancer metastasis are identified based on the DeepCME results, and some are discussed in further detail in light of reports from previous medical studies. Considering the high expense involved in measuring the expression of a single gene, the ability to develop cost-effective and time-efficient tests using only a few key genes is valuable. Based on this study, we expect DeepCME to be utilized clinically for predicting breast cancer metastasis and, after further research, to be applied to other types of cancer as well.
Title: Shape sensing of optical fiber Bragg gratings based on deep learning
Authors: Samaneh Manavi Roodsari, Antal Huck-Horváth, Sara Freund, A. Zam, G. Rauter, W. Schade, P. Cattin
Pub Date: 2023-05-30 | DOI: 10.1088/2632-2153/acda10
Abstract: Continuum robots in robot-assisted minimally invasive surgeries provide adequate access to target anatomies that are not directly reachable through small incisions. Achieving precise and reliable shape estimation of such snake-like manipulators necessitates an accurate navigation system that requires no line-of-sight and is immune to electromagnetic noise. Fiber Bragg grating (FBG) shape sensing, particularly eccentric FBG (eFBG) sensing, is a promising and cost-effective solution for this task. However, in eFBG sensors, the spectral intensity of the Bragg wavelengths that carries the strain information can be affected by undesired bending-induced phenomena, making standard characterization techniques less suitable for these sensors. We showed in our previous work that a deep learning model has the potential to extract the strain information from the eFBG sensor’s spectrum and accurately predict its shape. In this paper, we conducted a more thorough investigation to find a suitable architectural design of the deep learning model to further increase shape prediction accuracy. We used the Hyperband algorithm to search for optimal hyperparameters in two steps. First, we limited the search space to layer settings of the network, from which the best-performing configuration was selected. Then, we modified the search space for tuning the training and loss calculation hyperparameters. We also analyzed various data transformations on the network’s input and output variables, as data rescaling can directly influence the model’s performance. Additionally, we performed discriminative training using a Siamese network architecture that employs two convolutional neural networks (CNNs) with identical parameters to learn similarity metrics between the spectra of similar target values. The best-performing network architecture among all evaluated configurations can predict the shape of a 30 cm long sensor with a median tip error of 3.11 mm over a curvature range of 1.4 m⁻¹ to 35.3 m⁻¹.
Title: Estimating Gibbs free energies via isobaric-isothermal flows
Authors: Peter Wirnsberger, Borja Ibarz, G. Papamakarios
Pub Date: 2023-05-22 | DOI: 10.1088/2632-2153/acefa8
Abstract: We present a machine-learning model based on normalizing flows that is trained to sample from the isobaric-isothermal ensemble. In our approach, we approximate the joint distribution of a fully-flexible triclinic simulation box and particle coordinates to achieve a desired internal pressure. This novel extension of flow-based sampling to the isobaric-isothermal ensemble yields direct estimates of Gibbs free energies. We test our NPT-flow on monatomic water in the cubic and hexagonal ice phases and find excellent agreement of Gibbs free energies and other observables compared with established baselines.
Title: scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data
Authors: Eric W Lin, Boyuan Liu, L. Lac, Daryl L. X. Fung, C. Leung, P. Hu
Pub Date: 2023-05-22 | DOI: 10.1088/2632-2153/acd7c3
Abstract: Cell type identification using single-cell RNA sequencing (scRNA-seq) data is critical for understanding disease mechanisms and drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational graph autoencoder for scRNA-seq data (scGMM-VGAE) that integrates a statistical clustering model into a deep learning algorithm to significantly improve cell clustering performance. The model feeds a cell-cell graph adjacency matrix and a gene feature matrix into a variational graph autoencoder (VGAE) to generate latent data, which are then clustered by the Gaussian mixture model (GMM) module. To optimize the algorithm, a designed loss function is derived by combining parameter estimates from the GMM and the VGAE. We test the proposed method on four publicly available and three simulated datasets, which contain many biological and technical zeros. scGMM-VGAE outperforms four selected baseline methods on three evaluation metrics for cell clustering. By successfully incorporating a GMM into a deep learning VGAE on scRNA-seq data, the proposed method achieves higher accuracy in cell clustering; this improvement has a significant impact on detecting rare cell types in health research. All source code used in this study can be found at https://github.com/ericlin1230/scGMM-VGAE.
Title: Interpretable machine learning model to predict survival days of malignant brain tumor patients
Authors: Snehal Rajput, Rupal A. Kapdi, M. Raval, Mohendra Roy
Pub Date: 2023-05-15 | DOI: 10.1088/2632-2153/acd5a9
Abstract: An artificial intelligence (AI) model’s performance is strongly influenced by its input features; therefore, it is vital to find the optimal feature set. This is all the more crucial for survival prediction in glioblastoma multiforme (GBM), a type of brain tumor. In this study, we identify the best feature set for predicting the survival days (SD) of GBM patients, outranking current state-of-the-art methodologies. The proposed approach is an end-to-end AI model that first segments tumors from healthy brain tissue in patients’ MRI images, extracts features from the segmented results, performs feature selection, and predicts patients’ SD based on the selected features. The extracted features are primarily shape-based, location-based, and radiomics-based; patient metadata is also included as a feature. The selection methods include recursive feature elimination, permutation importance (PI), and correlation analysis between the features. We also examined feature behavior at the local (single sample) and global (all samples) levels. We find that, out of 1265 extracted features, only 29 dominant features play a crucial role in predicting patients’ SD. Among these 29 features, one is metadata (patient age), three are location-based, and the rest are radiomics features. Furthermore, we explain these features using post-hoc interpretability methods to validate the model’s predictions and understand its decisions. Finally, we analyzed the behavioral impact of the top six features on survival prediction, and the findings drawn from the explanations were coherent with the medical domain. We find that after the age of 50 years, the likelihood of survival deteriorates, and survival beyond 80 years is scarce. Likewise, for the location-based features, SD is shorter if the tumor is located in the central or back part of the brain. All of these trends derived from the developed AI model are consistent with medically established facts. The results show an overall 33% improvement in the accuracy of SD prediction compared with the top-performing methods of the BraTS-2020 challenge.
Title: Graph Neural Networks and 3-dimensional topology
Authors: Song Jin Ri, P. Putrov
Pub Date: 2023-05-10 | DOI: 10.1088/2632-2153/acf097
Abstract: We test the efficiency of applying geometric deep learning to problems in low-dimensional topology in a certain simple setting. Specifically, we consider the class of 3-manifolds described by plumbing graphs and use graph neural networks (GNNs) for the problem of deciding whether a pair of graphs gives homeomorphic 3-manifolds. We use supervised learning to train a GNN that answers this question with high accuracy. Moreover, we consider reinforcement learning by a GNN to find a sequence of Neumann moves that relates the pair of graphs if the answer is positive. The setting can be understood as a toy model of the problem of deciding whether a pair of Kirby diagrams gives diffeomorphic 3- or 4-manifolds.
Title: Towards a phenomenological understanding of neural networks: data
Authors: S. Tovey, S. Krippendorf, K. Nikolaou, Daniel Fink
Pub Date: 2023-05-01 | DOI: 10.1088/2632-2153/acf099
Abstract: A theory of neural networks (NNs) built upon collective variables would provide scientists with tools to better understand the learning process at every stage. In this work, we introduce two such variables: the entropy and the trace of the empirical neural tangent kernel (NTK) built on the training data passed to the model. We empirically analyze NN performance in terms of these variables and find that there exists a correlation between the starting entropy, the trace of the NTK, and the generalization of the model computed after training is complete. This framework is then applied to the problem of optimal data selection for the training of NNs. To this end, random network distillation (RND) is used as a means of selecting training data, which is then compared with random selection of data. It is shown that not only does RND select data-sets capable of outperforming random selection, but also that the collective variables associated with the RND data-sets are larger than those of the randomly selected sets. The results of this investigation provide a stable ground from which the selection of data for NN training can be driven by this phenomenological framework.
Title: Closed-loop control of a noisy qubit with reinforcement learning
Authors: Yongcheng Ding, Xi Chen, R. Magdalena-Benedito, J. Martín-Guerrero
Pub Date: 2023-04-25 | DOI: 10.1088/2632-2153/acd048
Abstract: The exotic nature of quantum mechanics differentiates machine learning applications in the quantum realm from classical ones. Stream learning is a powerful approach that can be applied to extract knowledge continuously from quantum systems in a wide range of tasks. In this paper, we propose a deep reinforcement learning method that uses streaming data from a continuously measured qubit in the presence of detuning, dephasing, and relaxation. The model receives streaming quantum information for learning and decision-making, providing instant feedback on the quantum system. We also explore the agent’s adaptability to other quantum noise patterns through transfer learning. Our protocol offers insights into closed-loop quantum control, potentially advancing the development of quantum technologies.