Proceedings of the ... International Conference on Machine Learning and Applications. International Conference on Machine Learning and Applications: Latest Publications
Pub Date: 2024-12-01 | Epub Date: 2025-03-04 | DOI: 10.1109/icmla61862.2024.00154
Aaron J Masino, Ranga Baminiwatte
Rare disease diagnosis is challenging in large part due to incomplete knowledge of gene-to-phenotype associations. One way to address this is to adopt a gene-to-patient paradigm, in which one selects an in silico predicted pathogenic variant, identifies individuals with the variant, and then determines whether those individuals have a shared phenotype. Most studies following this paradigm determine the presence of a shared phenotype through manual review of ontology terms in the patient record. We propose a novel automated method that identifies the shared phenotype via a genetic search, using a fitness function that compares the similarity of phenotype term embeddings generated by advanced NLP models applied to the terms' text descriptions. Leveraging Human Phenotype Ontology resources, we generated a library of simulated patients across 5,076 Mendelian diseases. Applying our approach to these simulated disease cohorts, we found that the solution phenotypes included a closely matching term for the majority of terms in the disease phenotype under varying conditions of annotation imprecision and noise. We anticipate these methods can aid gene-to-phenotype association discovery for rare diseases by enabling a scalable gene-to-patient research paradigm.
Title: Automated Shared Phenotype Discovery in Undiagnosed Cohorts for Rare Disease Research | Pages: 1025-1030 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11967416/pdf/
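The abstract describes a genetic search whose fitness function compares phenotype term embeddings. The sketch below illustrates only that fitness component, under stated assumptions: term embeddings are precomputed vectors (for example, from a text-embedding model run over ontology term descriptions), each patient is a set of term IDs, and a candidate shared phenotype is scored by how well each of its terms is matched, on average, across the cohort. The function names, the best-match averaging, and the `size_penalty` parameter are hypothetical illustrations, not the authors' published formulation.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def phenotype_fitness(
    candidate: Set[str],
    patients: List[Set[str]],
    embeddings: Dict[str, Sequence[float]],
    size_penalty: float = 0.0,
) -> float:
    """Hypothetical fitness for a candidate shared phenotype.

    For each candidate term, take its best cosine match within each patient's
    record, average over patients, then average over candidate terms. An
    optional per-term penalty discourages needlessly large candidate sets.
    """
    if not candidate or not patients:
        return 0.0
    total = 0.0
    for term in candidate:
        per_patient = [
            # best-matching term in this patient's record (0.0 for an empty record)
            max((cosine(embeddings[term], embeddings[p]) for p in record), default=0.0)
            for record in patients
        ]
        total += sum(per_patient) / len(per_patient)
    return total / len(candidate) - size_penalty * len(candidate)
```

In a genetic search, a function like this would rank candidate term sets produced by mutation and crossover; the toy two-dimensional vectors used when testing it stand in for real embeddings.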
Pub Date: 2023-09-28 | DOI: 10.5121/mlaij.2023.10301
Saachin Bhatt, Mustansar Ghazanfar, Mohammad Hossein Amirhosseini
This research explores the impact of social media sentiment on predicting Bitcoin prices using machine learning models, integrating on-chain data and applying a Multi Modal Fusion model. Historical crypto market, on-chain, and Twitter data from 2014 to 2022 were used to train models including K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, Extreme Gradient Boosting, and Multi Modal Fusion. Performance was compared with and without Twitter sentiment data, which was analysed using the Twitter-roBERTa and VADER models. Including sentiment data improved model performance, with Twitter-roBERTa-based models achieving an average accuracy of 0.81. The best-performing model was an optimised Multi Modal Fusion model using Twitter-roBERTa, with an accuracy of 0.90. This research underscores the value of integrating social media sentiment analysis and on-chain data in financial forecasting, providing a robust tool for informed decision-making in cryptocurrency trading.
Title: Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis
Pub Date: 2023-09-28 | DOI: 10.5121/mlaij.2023.10303
Mamdouh M. Gomaa, Alaa Elnashar, Mahmoud M. Eelsherif, Alaa M. Zaki
Following the rapid global spread of COVID-19, people experienced severe disruption to their daily lives. One way to manage the outbreak is to require people to wear face masks in public places, and enforcing such a requirement calls for automated, efficient face mask detection methods. This paper presents an image-based face mask detection model that classifies images as “with mask” or “without mask”. The model was trained and evaluated on three datasets, the Real-World Masked Face Dataset (RMFD), the Simulated Masked Face Dataset (SMFD), and Labeled Faces in the Wild (LFW), achieving an accuracy of 99.72% on the first dataset and 100% on the second and third. The model can serve as a digital scanning tool in schools, hospitals, banks, airports, and many other public or commercial locations.
Title: Face Mask Detection Model Using Convolutional Neural Network
Pub Date: 2023-09-28 | DOI: 10.5121/mlaij.2023.10302
Ankita Patra, Santi Kumari Behera, Prabira Kumar Sethy, Nalini Kanta Barpanda, Ipsa Mahapatra
Breast cancer develops when cells in the breast grow and divide uncontrollably, producing a lump of tissue called a tumor. Breast cancer is the second most prevalent cancer among women, after skin cancer. Although it is most commonly diagnosed in women aged 50 and above, it can affect individuals of any age. Men can also develop breast cancer, though rarely: they account for less than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available dataset of breast tumor features was used to identify breast tumors with machine learning and deep learning techniques. Several prediction models were constructed: logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), LightGBM, and a recurrent neural network (RNN). These models were trained to classify breast tumor cases from the provided features.
Title: Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Techniques
Pub Date: 2022-01-01 | DOI: 10.1109/ICMLA55696.2022.00196
Tiago Rodrigues de Almeida, Eduardo Gutiérrez-Maestro, Óscar Martínez Mozos
Title: Context-free Self-Conditioned GAN for Trajectory Forecasting | Pages: 1218-1223
Pub Date: 2021-12-01 | DOI: 10.1109/icmla52953.2021.00131
Zongyu Dai, Zhiqi Bu, Qi Long
Missing data are present in most real-world problems and need careful handling to preserve prediction accuracy and statistical consistency in downstream analysis. Multiple imputation (MI), the gold standard for handling missing data, accounts for imputation uncertainty and provides proper statistical inference. In this work, we propose Multiple Imputation via Generative Adversarial Network (MI-GAN), a deep learning-based (specifically, GAN-based) multiple imputation method with theoretical support under the missing at random (MAR) mechanism. MI-GAN leverages recent progress in conditional generative adversarial networks and matches existing state-of-the-art imputation methods on high-dimensional datasets in terms of imputation error. In particular, MI-GAN significantly outperforms other imputation methods in terms of statistical inference and computational speed.
Title: Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems | Pages: 791-798 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8841955/pdf/nihms-1776623.pdf
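The abstract stresses that multiple imputation accounts for imputation uncertainty so that downstream inference is valid. Whatever method generates the m completed datasets (MI-GAN in this paper), the per-dataset analysis results are conventionally combined with Rubin's rules; the sketch below shows that standard pooling step. It is general MI practice, assumed here for illustration, and not necessarily the exact inference procedure used by the authors. The function name `pool_rubin` is hypothetical.

```python
import statistics
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m per-imputation results with Rubin's rules.

    estimates[i] and variances[i] are the point estimate and its squared
    standard error from analysing the i-th imputed dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance of the pooled estimate
    return q_bar, t
```

The between-imputation term b is what a single imputation cannot capture: if the imputed datasets disagree, the pooled variance grows, widening confidence intervals accordingly.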
Pub Date: 2021-01-01 | DOI: 10.1109/ICMLA52953.2021.00086
Lixing Song, Junheng Wang, Junhong Xu
Title: A Data-Efficient Reinforcement Learning Method Based on Local Koopman Operators | Pages: 515-520
Pub Date: 2021-01-01 | DOI: 10.1109/ICMLA52953.2021.00084
Juan Manuel Parrilla Gutierrez
Title: Predicting Real-time Scientific Experiments Using Transformer models and Reinforcement Learning | Pages: 502-506
Pub Date: 2020-12-01 | Epub Date: 2021-02-23 | DOI: 10.1109/icmla51294.2020.00037
Arun V Sathanur, Nathan A Baker
In this work, we developed an efficient approach to computing ensemble averages in systems with pairwise-additive energetic interactions between entities. Methods that fully enumerate the configuration space have exponential complexity. Sampling methods such as Markov chain Monte Carlo (MCMC) algorithms have been proposed to tackle this exponential complexity; however, when significant energetic coupling exists between the entities, the efficiency of such algorithms can diminish. We improved the efficiency of MCMC by exploiting the cluster structure of the interaction energy matrix to bias the sampling. We pursued two different schemes for the biased MCMC runs and show that both are valid MCMC schemes. Using both synthetic and real-world systems, we show that our biased MCMC methods outperform the regular MCMC method. In particular, we applied these algorithms to estimating protonation ensemble averages and titration curves of residues in a protein.
Title: A clustering-based biased Monte Carlo approach to protein titration curve prediction
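The abstract contrasts full enumeration (exponential in the number of sites) with MCMC, and describes biasing the sampling with clusters found in the interaction energy matrix. The abstract does not specify the authors' two biased schemes, so the sketch below uses one plausible cluster move as an assumption: with a fixed probability, flip every site in a precomputed cluster of strongly coupled residues at once, otherwise flip a single site. The binary protonation states, the energy model E(s) = Σ_i h_i s_i + Σ_{i<j} W_ij s_i s_j in units of kT, and all function and parameter names are hypothetical illustrations. On a toy system the MCMC estimate can be checked against exact enumeration.

```python
import itertools
import math
import random
from typing import List, Sequence


def energy(s: Sequence[int], h: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Pairwise-additive energy sum_i h_i s_i + sum_{i<j} W_ij s_i s_j (units of kT)."""
    n = len(s)
    e = sum(h[i] * s[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += w[i][j] * s[i] * s[j]
    return e


def exact_averages(h: Sequence[float], w: Sequence[Sequence[float]]) -> List[float]:
    """<s_i> by enumerating all 2^n states (feasible only for small n)."""
    n = len(h)
    z, acc = 0.0, [0.0] * n
    for s in itertools.product((0, 1), repeat=n):
        p = math.exp(-energy(s, h, w))  # Boltzmann weight
        z += p
        for i in range(n):
            acc[i] += p * s[i]
    return [a / z for a in acc]


def mcmc_averages(
    h: Sequence[float],
    w: Sequence[Sequence[float]],
    clusters: Sequence[Sequence[int]],
    steps: int = 50_000,
    p_cluster: float = 0.3,
    seed: int = 0,
) -> List[float]:
    """Metropolis estimate of <s_i> mixing single-site and cluster flips.

    Both move types are symmetric proposals chosen with state-independent
    probabilities, so the plain Metropolis acceptance rule leaves the
    Boltzmann distribution invariant.
    """
    rng = random.Random(seed)
    n = len(h)
    s = [0] * n
    e = energy(s, h, w)
    acc = [0] * n
    for _ in range(steps):
        if clusters and rng.random() < p_cluster:
            sites = rng.choice(clusters)   # flip a whole coupled group together
        else:
            sites = [rng.randrange(n)]     # ordinary single-site flip
        for i in sites:
            s[i] ^= 1
        e_new = energy(s, h, w)            # full recompute: O(n^2), fine for a sketch
        if e_new <= e or rng.random() < math.exp(-(e_new - e)):
            e = e_new                      # accept the move
        else:
            for i in sites:                # reject: undo the flip
                s[i] ^= 1
        for i in range(n):
            acc[i] += s[i]
    return [a / steps for a in acc]
```

A cluster move lets strongly coupled sites change state together, which a single-site sampler can only do by passing through high-energy intermediate states; sweeping the h_i with pH would turn these averages into titration curves.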
Pub Date: 2020-01-01 | DOI: 10.1109/ICMLA51294.2020.00008
Jiebo Luo
Title: Learning with Unpaired Data | Pages: 38