2022 7th International Conference on Data Science and Machine Learning Applications (CDMA): Latest Publications

A Text-mining approach for crime tweets in Saudi Arabia: From analysis to prediction

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00023

Amal Algefes, Nouf Aldossari, Fatma Masmoudi, Elham Kariri

Social networks have proven to be a massive hub for investigating the contextual and individual behavior of people. Most recently, micro-blogging sites such as Twitter have shown researchers that their content can be aggregated and used to predict, forecast, and infer the outcomes of real-world events. This research on crime-related tweets in Saudi Arabia set out with the ultimate goal of gaining a deeper understanding of which kinds of criminal weapons people frequently talk about. In this paper, we work with tweets mentioning different weapons and analyze them to gather facts such as the annual variation in the percentage of tweets mentioning each weapon and the impact of events such as the Covid-19 pandemic on social discussion of crime. In the next step, we develop a number of classifiers to predict which weapon is mentioned in a tweet. Most of these tasks are carried out in the Python programming language.
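
The "annual variation of percentage tweets mentioning different weapons" described above can be illustrated with a minimal sketch. The keyword list, the English toy tweets, and the function name `weapon_share_by_year` are all assumptions made for illustration; the paper's actual data, language, and preprocessing are not given in this listing.

```python
from collections import Counter, defaultdict

# Hypothetical weapon keywords; the paper's real keyword list is not stated here.
WEAPONS = ("knife", "gun", "rifle")


def weapon_share_by_year(tweets):
    """Return {year: {weapon: percentage of that year's weapon mentions}}."""
    counts = defaultdict(Counter)
    for year, text in tweets:
        words = text.lower().split()
        for weapon in WEAPONS:
            if weapon in words:
                counts[year][weapon] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {w: 100.0 * c / total for w, c in counter.items()}
    return shares


# Made-up (year, text) pairs, not data from the paper.
tweets = [
    (2019, "police found a knife near the market"),
    (2019, "a gun was reported stolen"),
    (2020, "knife attack reported downtown"),
    (2020, "another knife incident during lockdown"),
    (2020, "gun seized at checkpoint"),
    (2020, "rifle found in abandoned car"),
]

shares = weapon_share_by_year(tweets)
for year in sorted(shares):
    print(year, shares[year])
```

Comparing the per-year dictionaries is one simple way to see how the share of each weapon shifts around an event such as the pandemic; a real pipeline would also need tokenization suited to the tweets' language.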

Citations: 2

Deep Learning for Classifying of White Blood Cancer

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00043

Asad Ullah, Tufail Muhammad

Automated classification of cells is an essential but challenging computer vision task with significant biomedical advantages. In recent years, numerous studies have attempted to construct an artificial-intelligence-based cell classifier using label-free cellular images obtained from an optical microscope. While these studies showed promising results, such classifiers could not represent the biological complexity of different cell types. However, it is well known that intracellular actin filaments are significantly modified in malignant cells. This is believed to be closely linked to tumor cells' distinctive growth characteristics and their tendency to invade surrounding tissue and metastasize. It is also more beneficial to identify various cell types based on their biological activities using an automated technique. This paper shows the differentiation between normal white blood cells (WBCs) and cancerous ones, which can provide new knowledge on malignant changes and be used as an additional diagnostic marker. Since human eyes cannot observe the features, we propose applying a convolutional neural network (CNN) to classify malignant and normal WBCs. The Inception-V3 CNN model was validated on images of normal and malignant WBCs from normal and blood cancer cell lines with differing aggression levels. The study showed that the CNN performed better in accuracy and efficiency than a human expert in the cell classification system.

Citations: 4

Students Personality Assessment using Deep Learning from University Admission Statement of Purpose

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00042

Salma Kulsoom, Seemab Latif, T. Saba, R. Latif

The Statement of Purpose (SOP) plays a vital role in the university admissions process, as reviewers assess the personality of students by reading their SOPs. In the past, the Big Five personality traits of students have been assessed to predict their future academic performance. An exciting application of machine learning is personality assessment using personality traits and behavior. In this paper, our focus is on developing a deep learning-based personality assessment model that detects the Big Five personality traits from an SOP and maps them to a student's likely academic performance at the university. Our proposed model uses Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Bi-Directional LSTM (Bi-LSTM) architectures to extract features and predict the ratios of the Big Five traits in the SOP. The proposed model has been trained and tested on an essays dataset and on 400 SOPs collected from computer science undergraduate students. With FastText embeddings, the maximum accuracy achieved is 88.2% on the essays dataset and 67.0% on the students' personal statements.

Citations: 0

Liver Texture Classification on CT Images of Microwave Ablation Therapy

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00028

N. Mahmoodian, Harshita Thadesar, Marilena Georgiades, M. Pech, C. Hoeschen

Microwave ablation (MWA) therapy guided by computed tomography (CT) is used to destroy liver tumors. However, because of noise and the resulting low contrast, CT images are not good enough for therapy control, and additional magnetic resonance imaging is needed after the therapy. The ablation process itself faces two significant challenges. First, insufficient tumor ablation leads to tumor recurrence. Second, a total ablated area significantly larger than the tumor damages healthy tissue. To minimize the impact, it is crucial for the radiologist to perform the therapy well to prevent tumor recurrence. Therefore, it is essential to differentiate among healthy, tumor, and ablated tissue textures in the CT scan images. This research contributes to the understanding of tissue characterization for reducing the recurrence rate. To this end, four machine learning (ML) algorithms, Naive Bayes, Logistic Regression, Decision Tree, and Random Forest, were employed for liver tissue classification. In this paper, we propose higher-order spectral analysis, in particular bispectrum analysis, for extracting features from the CT images. The images were divided into small patches labeled as healthy, tumor, or ablated tissue, and the classifiers were trained on ten new features extracted from the bispectrum analysis. A maximum accuracy of 90.5% was obtained. The approach shows that bispectral analysis provides valuable information that can be used during MWA therapy for tissue characterization of CT scans, even in the presence of noise.
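
To make the bispectrum idea concrete, the sketch below computes the direct single-segment estimate B(f1, f2) = X(f1) · X(f2) · conj(X(f1 + f2)) for a short 1-D signal and reduces it to one scalar feature. Treating one row of an image patch as the signal, and using the mean magnitude as the feature, are assumptions made for illustration; the abstract does not say how the bispectrum was applied to the 2-D patches or what the ten features are.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def bispectrum(signal):
    """Direct estimate B[f1][f2] = X[f1] * X[f2] * conj(X[f1 + f2])."""
    spectrum = dft(signal)
    n = len(spectrum)
    return [
        [
            spectrum[f1] * spectrum[f2] * spectrum[(f1 + f2) % n].conjugate()
            for f2 in range(n)
        ]
        for f1 in range(n)
    ]


def mean_bispectral_magnitude(signal):
    """One illustrative scalar feature: the average |B(f1, f2)|."""
    b = bispectrum(signal)
    n = len(b)
    return sum(abs(value) for row in b for value in row) / (n * n)


# Hypothetical flat row of pixel intensities from an image patch.
row = [1.0, 1.0, 1.0, 1.0]
b = bispectrum(row)
feature = mean_bispectral_magnitude(row)
print(abs(b[0][0]), feature)
```

For a constant signal all energy sits at frequency zero, so only B(0, 0) is non-negligible. Textured patches spread energy across frequency pairs, which is the kind of difference a classifier could pick up.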

Citations: 1

A Supervised Multi-tree XGBoost Model for an Earlier COVID-19 Diagnosis Based on Clinical Symptoms

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00041

A. H. Syed, Tabrej Khan

Efficient screening for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) enables quick and efficient diagnosis and can mitigate the burden on healthcare systems. The aim was to assist medical teams globally in triaging incoming patients, especially in countries with limited healthcare infrastructure. In this context, the features with imminent infection risk (Test Indication, Fever, and Headache) were obtained using a multi-tree XGBoost algorithm. Based on their feature importance, the top three clinically relevant early symptoms (attributes) were used to build a multi-tree XGBoost-based model for earlier prediction of SARS-CoV-2. Overall, our multi-tree XGBoost model predicted SARS-CoV-2 infection status with a high F1-score (0.9920 ± 0.008) and AUC value (0.9974 ± 0.0026) by assessing only these three primary clinical symptoms related to COVID-19 infection. Thus, our multi-tree XGBoost-based model suggests a simple and accurate method for earlier detection of SARS-CoV-2 cases and for initiating a proper treatment protocol for SARS-CoV-2-positive patients. Therefore, we conclude that our model could allow health organizations to reduce the infection rate and the mortality due to SARS-CoV-2 among populations affected by COVID-19.
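
For readers less familiar with the two reported metrics, here is a small sketch of how an F1-score and an AUC value can be computed from scratch. The labels, scores, and confusion counts are made-up toy values, and the helper names are hypothetical; nothing here reproduces the paper's data or its XGBoost model.

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall, written in count form."""
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy ground truth (1 = infected) and predicted probabilities, purely illustrative.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)          # 8 of the 9 positive/negative pairs are ordered correctly
f1 = f1_score(tp=2, fp=1, fn=1)        # counts at a 0.5 threshold on the toy scores
print(round(auc, 4), round(f1, 4))
```

A "value ± spread" figure such as those in the abstract is typically obtained by repeating this computation over several folds or runs, although the abstract does not state which procedure was used.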

Citations: 1

Machine Learning Based Preemptive Diagnosis of Lung Cancer Using Clinical Data

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00024

S. Olatunji, Aisha Alansari, Heba Alkhorasani, Meelaf Alsubaii, Rasha Sakloua, Reem Alzahrani, Yasmeen Alsaleem, Reem A. Alassaf, Mehwash Farooqui, M. I. B. Ahmed

Lung cancer is a malignant disease that imposes serious complications, restricting patients from performing daily tasks in the early stages and eventually causing their death. The prevalence of this disease has been highlighted by numerous statistics worldwide. The preemptive diagnosis of individuals with lung cancer can enhance the chances of prevention and treatment. Therefore, the purpose of this study is to predict lung cancer preemptively using simple clinical and demographic features obtained from the “data world” website. The experiment was conducted using Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Logistic Regression (LR) classifiers. To improve the models' accuracy, SMOTETomek was employed along with GridSearchCV to tune hyperparameters. The Recursive Feature Elimination method was also used to find the best feature subset. Results indicated that SVM achieved the best performance, with 98.33% recall, 96.72% precision, and 97.27% accuracy using 15 attributes.
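
As a worked check of how the three reported figures relate, the sketch below derives recall, precision, and accuracy from a confusion matrix. The counts are hypothetical: they are simply one set of integers that happens to reproduce the reported percentages, and the paper's actual confusion matrix and test-set size are not given in this listing.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (recall, precision, accuracy) as percentages rounded to two decimals."""
    recall = 100.0 * tp / (tp + fn)
    precision = 100.0 * tp / (tp + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    return round(recall, 2), round(precision, 2), round(accuracy, 2)


# Hypothetical counts for a 110-sample test set (an assumption, not the paper's data).
recall, precision, accuracy = classification_metrics(tp=59, fp=2, fn=1, tn=48)
print(recall, precision, accuracy)  # 98.33 96.72 97.27
```

Recall is the figure that matters most for a screening task, since a false negative here would be a missed cancer case; the example shows it can stay high even when a couple of healthy cases are flagged.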

Citations: 1

Improve the Accuracy of Students Admission at Universities Using Machine Learning Techniques

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00026

Basem Assiri, M. Bashraheel, Ala Alsuri

The advancement of technology contributes to the development of many fields of life. One of the major fields to focus on is higher education. Saudi universities provide free education, so a large number of students apply to them. In response, universities usually maintain admission policies. These admission policies and procedures focus on students' high school Grade Point Average (GPAH), General Aptitude Test (GAT), and Achievement Test (AT). Guiding students to a suitable major improves their achievement and success. This paper studies the admission criteria for universities in Saudi Arabia and investigates the hidden details that lie behind students' GPAH, GAT, and AT. Those details influence the process of students' major selection at universities. This research uses machine learning models that include more features, such as the grades of high school courses, to predict suitable majors for students. We use K-Nearest Neighbor (KNN), Decision Tree (DT), and Support Vector Machine (SVM) to classify students into suitable majors. This process enhances the enrollment of applicants in appropriate majors. Furthermore, the experiments show that KNN gives the highest accuracy, reaching 100%, while DT's accuracy is 81% and SVM's is 75%.
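
The KNN step can be sketched in a few lines of plain Python. The feature triples (GPAH, GAT, AT), their 0 to 100 scale, the major names, and the function `knn_predict` are all assumptions for illustration; the paper's real feature set also includes high school course grades and is not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (GPAH, GAT, AT) scores and majors, not data from the paper.
train = [
    ((95, 90, 92), "Engineering"),
    ((93, 88, 90), "Engineering"),
    ((80, 75, 70), "Business"),
    ((78, 72, 74), "Business"),
    ((85, 95, 60), "Computer Science"),
]

major = knn_predict(train, (94, 89, 91), k=3)
print(major)  # Engineering
```

Because KNN memorizes the training set, its score depends heavily on how the evaluation data is split from the training data, which is worth keeping in mind when reading very high accuracy figures.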

Citations: 5

iReader: An Intelligent Reader System for the Visually Impaired

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00036

J. G, A. Azar, B. Qureshi, Nashwa Ahmad Kamal

For visually impaired persons, it is quite difficult to read printed text. Non-visual forms of reading materials, such as Braille, are available as blind-aiding technology, among many others. In recent times, many devices, assistive equipment, and technologies have been developed to assist visually impaired persons with reading. Most of these research works and products support reading from printed text-based manuscripts only. Due to this limitation, a visually impaired person may not be able to describe and comprehend a printed image. In this paper, we develop iReader, an Intelligent Reader system that not only helps a visually impaired reader to read but also vocally describes an image available in the printed text. A Convolutional Neural Network (CNN) is employed to collect features from the printed image and its caption. A Long Short-Term Memory (LSTM) network is used to train the model for describing the image data. The resulting description is delivered as a voice message using Text-to-Speech and read out loud to the user. The efficiency of the LSTM model is examined using ResNet50 and VGG16. The experimental results show that the LSTM-based training model delivers the best prediction of a picture's description with an accuracy of 83

Citations: 1

Automatic Classification of Accessibility User Reviews in Android Apps

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00027

Wajdi Aljedaani, Mohamed Wiem Mkaouer, S. Ludi, Yasir Javed

In recent years, mobile applications have gained popularity for providing information, digital services, and content to users, including users with disabilities. However, recent studies have shown that even popular mobile apps face accessibility issues, which hinder usability for people with disabilities. To discover these issues in new app releases, developers consult user reviews published on the official app stores. However, manually identifying the type of accessibility-related reviews is a challenging and time-consuming task. Therefore, in this study, we used supervised learning techniques, namely Extra Tree Classifier (ETC), Random Forest, Support Vector Classification, Decision Tree, K-Nearest Neighbors (KNN), and Logistic Regression, to automatically classify 2,663 Android app reviews according to four types of accessibility guidelines: Principles, Audio/Images, Design, and Focus. The results show that the ETC classifier performs best in the automated classification of accessibility app reviews, with 93% accuracy.

Citations: 8

Phishing Attacks Detection using Machine Learning and Deep Learning Models

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00034

M. Aljabri, Samiha Mirza

Because of the rapid growth in internet users, phishing attacks have become a significant menace: the attacker poses as a trusted entity in order to steal sensitive data, causing reputational damage, financial loss, ransomware, or other malware infections. Intelligent techniques, mainly Machine Learning (ML) and Deep Learning (DL), are increasingly applied in cybersecurity because of their ability to learn from available data in order to extract useful insight and predict future events. This paper investigates the effectiveness of such intelligent approaches in detecting phishing websites. We used two separate datasets and selected the most highly correlated features, which comprised a combination of content-based, URL lexical-based, and domain-based features. A set of ML models was then applied, and a comparative performance evaluation was conducted. The results demonstrated the importance of feature selection in improving the models' performance. Furthermore, the analysis also aimed to identify the features that most influence the model in identifying phishing websites. In terms of classification performance, the Random Forest (RF) algorithm achieved the highest accuracy on both datasets.
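The correlation-based feature selection mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: the four lexical features, the placeholder URLs, and the helper names (`lexical_features`, `pearson`, `top_correlated`) are all hypothetical, and ranking by absolute Pearson correlation with the label is only one plausible reading of "highest correlated features".

```python
import math


def lexical_features(url):
    """Hypothetical URL lexical features; not the paper's feature set."""
    return {
        "length": len(url),
        "num_dots": url.count("."),
        "has_at": int("@" in url),
        "uses_https": int(url.startswith("https://")),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def top_correlated(rows, labels, k):
    """Rank features by |correlation with the label| and keep the top k."""
    names = list(rows[0])
    scores = {name: abs(pearson([r[name] for r in rows], labels)) for name in names}
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k], scores


# Placeholder URLs on reserved example domains (assumption); 1 = phishing, 0 = legitimate.
samples = [
    ("https://example.com/", 0),
    ("https://docs.example.org/guide", 0),
    ("http://login.example.com.verify.example.net/update@session", 1),
    ("http://secure.account.example.test/user@confirm", 1),
]
rows = [lexical_features(url) for url, _ in samples]
labels = [label for _, label in samples]

selected, scores = top_correlated(rows, labels, k=2)
print(selected)
```

The columns kept in `selected` would then be passed to a classifier such as Random Forest; that step is omitted to keep the sketch free of third-party dependencies.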

Citations: 14