2022 7th International Conference on Data Science and Machine Learning Applications (CDMA): Latest Papers

A Text-mining approach for crime tweets in Saudi Arabia: From analysis to prediction
Amal Algefes, Nouf Aldossari, Fatma Masmoudi, Elham Kariri
Social networks have proven to be a massive hub for investigating the contextual and individual behavior of people. Most recently, micro-blogging sites like Twitter have shown researchers that their content can be aggregated and used to effectively predict, forecast, and infer outcomes of real-world events. This research on crime-related tweets in Saudi Arabia set out with the ultimate goal of gaining a deeper understanding of which kinds of criminal weapons people frequently talk about. In this paper, we work with tweets mentioning different weapons, analyzing them to gather facts such as the annual variation in the percentage of tweets mentioning each weapon and to recognize the impact of events such as the Covid-19 pandemic on social discussion of crime. In the following step, we develop a number of classifiers to predict which weapon is mentioned in a tweet. Python is used for most of these tasks.
DOI: https://doi.org/10.1109/CDMA54072.2022.00023 | Published: 2022-03-01
Citations: 2
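The crime-tweet abstract above mentions tracking the annual variation in the percentage of tweets that mention each weapon. A minimal sketch of that kind of tally follows, assuming simple keyword matching. The keyword table, the toy tweets (in English rather than Arabic), and the function names are all hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical keyword table and toy tweets; neither comes from the paper.
WEAPON_KEYWORDS = {"knife": ["knife", "stabbed"], "gun": ["gun", "shot"]}


def weapon_in(text):
    """Return the first weapon whose keyword appears in the text, else None."""
    lowered = text.lower()
    for weapon, keywords in WEAPON_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return weapon
    return None


def yearly_percentages(tweets):
    """Map year -> {weapon: percent of that year's weapon-mentioning tweets}."""
    counts = defaultdict(Counter)
    for year, text in tweets:
        weapon = weapon_in(text)
        if weapon is not None:  # tweets without a weapon keyword are skipped
            counts[year][weapon] += 1
    result = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        result[year] = {w: 100.0 * n / total for w, n in counter.items()}
    return result


toy_tweets = [
    (2019, "Man stabbed near the market"),
    (2019, "Police found a gun in the car"),
    (2020, "Knife attack reported downtown"),
    (2020, "Another knife incident last night"),
    (2020, "Shots heard, gun recovered"),
    (2020, "Traffic is terrible today"),
]
percentages = yearly_percentages(toy_tweets)
```

Comparing the per-year dictionaries is one simple way to look for shifts around an event such as the pandemic. A real pipeline would need Arabic tokenization and a far richer keyword or classifier stage.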
Deep Learning for Classifying of White Blood Cancer
Asad Ullah, Tufail Muhammad
Automated classification of cells is an essential but challenging task for computer vision with significant biomedical advantages. In recent years, numerous studies have attempted to construct a cell classifier based on artificial intelligence using label-free cellular images obtained from an optical microscope. While these studies showed promising results, such classifiers could not represent the biological complexity of different cell types. However, it is well known that intracellular actin filaments are significantly modified in malignant cells. This is believed to be closely linked to tumor cells' distinctive growth characteristics and their tendency to invade surrounding tissues and metastasize. It is also more beneficial to identify various cell types based on their biological activities using an automated technique. This paper shows the differentiation between normal white blood cells (WBCs) and cancerous ones, which can provide new knowledge on malignant changes and be used as an additional diagnostic marker. Since human eyes cannot observe these features, we propose applying a convolutional neural network (CNN) to classify malignant and normal WBCs. The Inception-V3 CNN model was validated on images of normal and malignant WBCs from normal and blood cancer cell lines with differing aggression levels. The study showed that the CNN performed better in accuracy and efficiency than a human expert in the cell classification system.
DOI: https://doi.org/10.1109/CDMA54072.2022.00043 | Published: 2022-03-01
Citations: 4
Students Personality Assessment using Deep Learning from University Admission Statement of Purpose
Salma Kulsoom, Seemab Latif, T. Saba, R. Latif
The Statement of Purpose (SOP) plays a vital role in the university admissions process, as reviewers assess the personality of students by reading their SOPs. In the past, the Big Five personality traits of students have been assessed to predict their future academic performance. An exciting application of machine learning is personality assessment using personality traits and behavior. In this paper, our focus is on developing a deep learning-based personality assessment model that detects the Big Five personality traits from an SOP and maps them to speculate on a student's academic performance at the university. Our proposed model uses Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Bi-Directional LSTM (Bi-LSTM) architectures to extract features and predict the ratios of the Big Five traits in the SOP. The proposed model has been trained and tested on an essays dataset and on 400 SOPs collected from computer science undergraduate students. With FastText embeddings, the maximum accuracy achieved is 88.2% on the essays dataset and 67.0% on the students' personal statements.
DOI: https://doi.org/10.1109/CDMA54072.2022.00042 | Published: 2022-03-01
Citations: 0
Liver Texture Classification on CT Images of Microwave Ablation Therapy
N. Mahmoodian, Harshita Thadesar, Marilena Georgiades, M. Pech, C. Hoeschen
Microwave ablation (MWA) therapy with image guidance by computed tomography (CT) is used for liver tumor destruction. However, because of noise and the resulting low contrast, CT images are not good enough for therapy control, and additional magnetic resonance imaging is needed after the therapy. The ablation process itself faces two significant challenges. First, insufficient tumor ablation leads to tumor recurrence. Second, a total ablated area significantly larger than the tumor damages healthy tissue. To minimize the impact, it is crucial for the radiologist to perform the therapy well to prevent tumor recurrence. Therefore, it is essential to differentiate among healthy, tumor, and ablated tissue textures in the CT scan images. This research contributes to the understanding of tissue characterization for the reduction of the recurrence rate. In this regard, four machine-learning (ML) algorithms (Naive Bayes, logistic regression, decision tree, and random forest) were employed for liver tissue classification. In this paper, we propose higher-order spectral analysis, particularly bispectrum analysis, for extracting features from the CT images. The images were divided into small patches labeled as healthy, tumor, or ablated tissue, and the classifiers were trained on ten new features extracted from the bispectrum analysis. A maximum accuracy of 90.5% was obtained. The approach shows that bispectral analysis provides valuable information that can be used during MWA therapy for tissue characterization of CT scans even in the presence of noise.
DOI: https://doi.org/10.1109/CDMA54072.2022.00028 | Published: 2022-03-01
Citations: 1
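The liver-texture abstract above relies on bispectrum analysis for feature extraction. As a rough illustration of what a bispectrum is, the sketch below computes the direct single-segment estimate B(f1, f2) = X(f1) · X(f2) · conj(X(f1 + f2)) for a short 1-D signal, where X is the discrete Fourier transform. How the authors turn 2-D CT patches into signals, and which ten features they derive, is not stated in the abstract. The pixel row and the summary feature here are assumptions made purely for illustration.

```python
import cmath


def dft(signal):
    """Plain O(n^2) discrete Fourier transform, adequate for short signals."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def bispectrum(signal):
    """Direct estimate B[f1][f2] = X(f1) * X(f2) * conj(X(f1 + f2))."""
    n = len(signal)
    mean = sum(signal) / n
    spectrum = dft([s - mean for s in signal])  # remove the DC offset first
    return [
        [
            spectrum[f1] * spectrum[f2] * spectrum[(f1 + f2) % n].conjugate()
            for f2 in range(n)
        ]
        for f1 in range(n)
    ]


def mean_bispectral_magnitude(signal):
    """One hypothetical texture feature: average |B| over all frequency pairs."""
    values = bispectrum(signal)
    n = len(signal)
    return sum(abs(v) for row in values for v in row) / (n * n)


# Hypothetical row of pixel intensities taken from an image patch.
pixel_row = [10, 12, 15, 11, 9, 14, 13, 10]
feature = mean_bispectral_magnitude(pixel_row)
flat_feature = mean_bispectral_magnitude([5, 5, 5, 5])
```

In practice the estimate would be averaged over many segments or patches to reduce variance, and an FFT would replace the quadratic-time transform.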
A Supervised Multi-tree XGBoost Model for an Earlier COVID-19 Diagnosis Based on Clinical Symptoms
A. H. Syed, Tabrej Khan
Efficient screening for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) enables quick and efficient diagnosis and can mitigate the burden on healthcare systems. The aim was to assist medical teams globally in triaging incoming patients, especially in countries with limited healthcare infrastructure. In this context, the features with imminent infection risk (Test Indication, Fever, and Headache) were obtained using a multi-tree XGBoost algorithm. Based on their feature importance, the top three clinically relevant early clinical symptoms (attributes) were employed to create a multi-tree XGBoost-based model for earlier prediction of SARS-CoV-2. Overall, our multi-tree XGBoost model predicted SARS-CoV-2 infection status with a high F1-score (0.9920 ± 0.008) and AUC value (0.9974 ± 0.0026) by assessing only the three primary clinical symptoms related to COVID-19 infection. Thus, our multi-tree XGBoost-based model suggests a simple and accurate method for earlier detection of SARS-CoV-2 cases and for initiating a proper treatment protocol for SARS-CoV-2-positive patients. Therefore, we can conclude that our model will allow health organizations to potentially reduce the infection rate and mortality among people with COVID-19 infection and fatalities due to SARS-CoV-2.
DOI: https://doi.org/10.1109/CDMA54072.2022.00041 | Published: 2022-03-01
Citations: 1
Machine Learning Based Preemptive Diagnosis of Lung Cancer Using Clinical Data
S. Olatunji, Aisha Alansari, Heba Alkhorasani, Meelaf Alsubaii, Rasha Sakloua, Reem Alzahrani, Yasmeen Alsaleem, Reem A. Alassaf, Mehwash Farooqui, M. I. B. Ahmed
Lung cancer is a malignant disease that imposes serious complications, restricting patients from performing daily tasks in the early stages and eventually causing their death. The prevalence of this disease has been highlighted by numerous statistics worldwide. The preemptive diagnosis of individuals with lung cancer can enhance the chances of prevention and treatment. Therefore, the purpose of this study is to predict lung cancer preemptively using simple clinical and demographic features obtained from the “data world” website. The experiment was conducted using Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Logistic Regression (LR) classifiers. To improve the models' accuracy, SMOTETomek was employed along with GridSearchCV to tune hyperparameters. The Recursive Feature Elimination method was also used to find the best feature subset. Results indicated that SVM achieved the best performance, with 98.33% recall, 96.72% precision, and an accuracy of 97.27% using 15 attributes.
DOI: https://doi.org/10.1109/CDMA54072.2022.00024 | Published: 2022-03-01
Citations: 1
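The lung-cancer abstract above reports recall, precision, and accuracy for its best classifier. The sketch below shows how those three metrics follow from the four cells of a binary confusion matrix. The counts are invented for illustration. They were chosen so that the rounded results match the percentages quoted in the abstract, and they should not be read as the study's actual confusion matrix, which the abstract does not give.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (recall, precision, accuracy) from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of actual positives that were found
    precision = tp / (tp + fp)     # share of predicted positives that were right
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, accuracy


# Hypothetical counts (not from the paper): 110 test cases in total.
recall, precision, accuracy = binary_metrics(tp=59, fp=2, fn=1, tn=48)
report = {
    "recall_pct": round(100 * recall, 2),
    "precision_pct": round(100 * precision, 2),
    "accuracy_pct": round(100 * accuracy, 2),
}
```

Recall above precision, as here, means the model misses fewer true cases than it raises false alarms, which is usually the preferred trade-off in screening.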
Improve the Accuracy of Students Admission at Universities Using Machine Learning Techniques
Basem Assiri, M. Bashraheel, Ala Alsuri
The advancement of technology contributes to the development of many fields of life. One of the major fields to focus on is higher education. Saudi universities provide free education to students, so a large number of students apply to them. In response, universities usually maintain admission policies. Universities' admission policies and procedures focus on students' Grade Point Average in high school (GPAH), General Aptitude Test (GAT), and Achievement Test (AT). Guiding students to a suitable major improves their achievement and success. This paper studies the admission criteria for universities in Saudi Arabia. It investigates the hidden details that lie behind students' GPAH, GAT, and AT scores, which influence the process of major selection at universities. This research uses machine learning models that include more features, such as the grades of high school courses, to predict suitable majors for students. We use K-Nearest Neighbor (KNN), Decision Tree (DT), and Support Vector Machine (SVM) to classify students into suitable majors. This process enhances the enrollment of applicants in appropriate majors. Furthermore, the experiments show that KNN gives the highest accuracy rate, reaching 100%, while DT's accuracy rate is 81% and SVM's accuracy rate is 75%.
DOI: https://doi.org/10.1109/CDMA54072.2022.00026 | Published: 2022-03-01
Citations: 5
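The admissions abstract above classifies students into majors with K-Nearest Neighbor, among other models. A minimal sketch of the KNN voting rule follows, assuming three numeric features per student (GPAH, GAT, AT). The scores, the major labels, and the choice of k = 3 are hypothetical. The paper also uses high school course grades, which are omitted here.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    # Sort by Euclidean distance to the query and keep the k closest points.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical (GPAH, GAT, AT) scores with hypothetical major labels.
train = [
    ((95, 88, 90), "Engineering"),
    ((92, 85, 91), "Engineering"),
    ((90, 90, 86), "Engineering"),
    ((78, 70, 72), "Business"),
    ((80, 68, 75), "Business"),
    ((75, 72, 70), "Business"),
]
first = knn_predict(train, (91, 87, 89))
second = knn_predict(train, (77, 71, 73))
```

Because KNN depends on raw distances, features on different scales would normally be standardized first. Here all three scores are assumed to share a 0 to 100 range.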
iReader: An Intelligent Reader System for the Visually Impaired
J. G, A. Azar, B. Qureshi, Nashwa Ahmad Kamal
For visually impaired persons, it is quite difficult to read printed text. Non-visual forms of reading materials, such as Braille, are available as blind-aiding technology, among many others. In recent times, many devices and assistive equipment have been developed and technologies made available to assist visually impaired persons with reading. Most of these research works and products support reading from printed text-based manuscripts only. Due to this limitation, it may not be possible for a visually impaired person to describe and comprehend a printed image. In this paper, we develop iReader, an Intelligent Reader system that not only helps a visually impaired reader to read but also vocally describes an image available in the printed text. A Convolutional Neural Network (CNN) is employed to collect features from the printed image and its caption. A Long Short-Term Memory (LSTM) network is used to train the model for describing the image data. The resulting data is sent as a voice message using Text-to-Speech to be read out loud to the user. The efficiency of the LSTM model is examined using ResNet50 and VGG16. The experimental results show that the LSTM-based training model delivers the best prediction of a picture's description with an accuracy of 83
DOI: https://doi.org/10.1109/CDMA54072.2022.00036 | Published: 2022-03-01
Citations: 1
Machine Learning Algorithms for Detection of Noisy/Artifact-Corrupted Epochs of Visual Oddball Paradigm ERP Data
Rafia Akhter, F. Beyette
Electroencephalography (EEG) is a non-invasive monitoring method that tracks and records the neural activity of the brain. An EEG recording time-locked to an external stimulus is known as an Event-Related Potential (ERP), and it can help elucidate how the brain responds to that stimulus. In general, EEG is an uneven mixture of neural and non-neural sources of activity, and the non-neural (non-EEG) signals produce artifacts in the EEG that can decrease the signal-to-noise ratio (SNR) in experiments and may lead to erroneous conclusions about the effects of experimental manipulation. Thus, it is very important to remove artifacts from the recorded EEG prior to analysis. The most common artifacts affecting ERPs are eye blinks, eye movements, and body movements. Artifact-corrupted data can be removed by visual inspection or by computer-automated signal processing methods. While these methods are suitable for post-processing of collected ERP data, they are not well suited to real-time processing of continuous ERP data. This project seeks to address the challenges of real-time artifact identification by introducing a machine learning model that can screen ERP data and detect and reject artifact-corrupted epochs prior to signal analysis. In addition to enabling real-time pre-processing of streaming ERP data, the DBSCAN machine-learning methods explored here can provide up to 90% accuracy in identifying artifact-contaminated ERP epochs. As a result, the findings of this study will help to improve the signal quality of ERP trials and will enable ERP to be used as a biomarker in real-world applications where streaming EEG data collection and analysis are required.
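The epoch-rejection idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract only names DBSCAN, so the two per-epoch features (peak-to-peak amplitude and standard deviation), the `eps` and `min_samples` values, and the synthetic epochs with injected blink-like bumps are all assumptions made for the example. DBSCAN is written out in plain Python so the block is self-contained; scikit-learn's `sklearn.cluster.DBSCAN` would be the usual choice in practice.

```python
import math
import statistics


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def epoch_features(epoch):
    """Hypothetical feature pair: (peak-to-peak amplitude, standard deviation)."""
    return (max(epoch) - min(epoch), statistics.pstdev(epoch))


def make_epoch(i, blink_amplitude=0.0):
    """Synthetic 100-sample epoch; a nonzero blink adds a large slow bump."""
    gain = 5.0 * (1 + 0.02 * (i % 5))
    return [
        gain * math.sin(2 * math.pi * t / 50 + 0.1 * i)
        + blink_amplitude * math.exp(-(((t - 50) / 10) ** 2))
        for t in range(100)
    ]


# Assumed toy data: 20 epochs, two of them contaminated by a blink-like bump.
blinks = {3: 80.0, 11: 120.0}
epochs = [make_epoch(i, blinks.get(i, 0.0)) for i in range(20)]
features = [epoch_features(e) for e in epochs]

# eps and min_samples are illustrative values chosen for this toy data.
labels = dbscan(features, eps=2.0, min_samples=4)
rejected = [i for i, label in enumerate(labels) if label == -1]
kept = [e for e, label in zip(epochs, labels) if label != -1]
print("rejected epochs:", rejected)
```

Epochs labeled as noise (`-1`) are treated as artifact-corrupted and dropped before any averaging. Because DBSCAN needs no labeled training data, a scheme of this kind can be re-run on a sliding window of recent epochs, which is one plausible way to fit the streaming setting the abstract describes.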
DOI: https://doi.org/10.1109/CDMA54072.2022.00033 (published 2022-03-01)
Citations: 2
Denoising Electromagnetic Surveys Using LSTMs
Asma Z. Yamani, Klemens Katterbauer, A. Alshehri, A. Marsala, Rabah A. Al-Zaidy
Resistivity readings obtained from electromagnetic crosswell surveys provide insight for reservoir water saturation prediction. Although high resistivity values should map to low water saturation and vice versa, in many cases the readings may not be consistent with this correlation. This is due to factors that add noise to the resistivity readings, such as the borehole effect and the salinity of the injected water. Here, we attempt to treat the resistivity readings so that they correlate negatively with water saturation, enhancing the accuracy and interpretability of water saturation prediction models. We use resistivity readings from locations farther from sources of noise to correct the inconsistencies in the resistivity readings with a Long Short-Term Memory (LSTM) neural network. Our results demonstrate that addressing noisy inconsistencies in the data increases the performance of the water saturation model, in terms of R², from 0.62 to 0.70. Moreover, using a model interpretation method, namely SHAP TreeExplainer, we show that the resistivity-based features in the water saturation prediction model possess higher importance values than before the enhancement, in comparison with porosity features.
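For readers unfamiliar with the metric quoted in this abstract, the coefficient of determination R² can be computed as sketched below. The function follows the standard definition (one minus the ratio of residual to total sum of squares); the saturation values and the two prediction lists are made-up toy numbers for illustration only and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (assumed, not from the paper): measured water saturation and
# two hypothetical model outputs, before and after denoising the inputs.
saturation = [0.20, 0.35, 0.50, 0.65, 0.80]
pred_noisy_inputs = [0.45, 0.30, 0.60, 0.45, 0.70]
pred_denoised_inputs = [0.25, 0.33, 0.52, 0.60, 0.78]

r2_before = r_squared(saturation, pred_noisy_inputs)
r2_after = r_squared(saturation, pred_denoised_inputs)
print(f"R2 before: {r2_before:.3f}, after: {r2_after:.3f}")
```

A value of 1 means the predictions match the measurements exactly, while 0 means the model does no better than always predicting the mean, so a rise in R² indicates that a larger share of the variance in water saturation is explained.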
DOI: https://doi.org/10.1109/CDMA54072.2022.00018 (published 2022-03-01)
Citations: 0