Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464619
Qanita Bani Baker, Dalya Faraj, Alanoud Alguzo
With the increase in life-threatening viral diseases, extensive research on their causes, recovery, and methods of prevention has become crucial. Some of these diseases are dangerous and can be fatal. Dengue fever remains an important public health issue that has spread to many areas around the world. Its spread can be affected by several factors, such as climate conditions. In this paper, we analyze a weather-related dataset to predict the number of illness cases per week in the cities of San Juan and Iquitos. To achieve this, we applied and compared several machine learning regression techniques and evaluated their performance using the Mean Absolute Error (MAE). The Poisson Regression Model achieved the best results, with the lowest mean absolute error ratio of 25.6%.
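As an illustration of the evaluation described above, the following minimal sketch fits a one-feature Poisson regression by gradient ascent and scores it with MAE against a constant-mean baseline. The weekly values, the single weather feature, the learning rate, and all function names are hypothetical and are not taken from the paper; the abstract does not give the authors' actual features or training procedure.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_poisson(xs, ys, lr=0.01, steps=5000):
    """Fit log(E[y]) = b0 + b1 * x by gradient ascent on the Poisson log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)  # predicted mean count
            g0 += y - mu                # gradient w.r.t. the intercept
            g1 += (y - mu) * x          # gradient w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_poisson(b0, b1, xs):
    """Predicted mean counts for each feature value."""
    return [math.exp(b0 + b1 * x) for x in xs]


# Hypothetical toy data: one scaled weather feature and weekly case counts.
feature = [0, 1, 2, 3, 4, 5]
cases = [1, 2, 3, 4, 7, 12]

b0, b1 = fit_poisson(feature, cases)
model_mae = mean_absolute_error(cases, predict_poisson(b0, b1, feature))

# Baseline for comparison: always predict the mean number of cases.
mean_cases = sum(cases) / len(cases)
baseline_mae = mean_absolute_error(cases, [mean_cases] * len(cases))

print(f"Poisson MAE: {model_mae:.3f}, baseline MAE: {baseline_mae:.3f}")
```

Comparing against a constant baseline is one simple way to check that a low MAE reflects what the model learned rather than the scale of the data.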
{"title":"Forecasting Dengue Fever Using Machine Learning Regression Techniques","authors":"Qanita Bani Baker, Dalya Faraj, Alanoud Alguzo","doi":"10.1109/ICICS52457.2021.9464619","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464619","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125704969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464579
Shutong Song, Fadi Wedyan, Y. Jararweh
The study of software energy consumption is gaining importance due to the widely increasing use of resource-limited portable devices that run on batteries, in addition to economic and environmental concerns. Mobile hardware has mostly been well optimized for energy consumption, but the same cannot be said for mobile applications. Studying the energy consumption of applications requires investigating the amount of energy consumed at a granular level (e.g., method calls) and thereby identifying the leaks responsible for peaks in the energy consumed by an application. In this paper, we performed an empirical measurement of energy consumption for 10 Android applications using a software-based tool called PETRA. We report and compare the energy consumed by the method calls exercised by the test cases. The study reveals clear variations in the average energy consumption of the studied applications, ranging from 0.25 Joule/second to 1.25 Joule/second. Moreover, the study reveals that relatively high average energy consumption is associated with some methods that are frequently called by the test cases. These methods are identified and reported as energy hotspots. These findings could help practitioners minimize energy consumption by applying refactoring techniques during software maintenance.
{"title":"Empirical Evaluation of Energy Consumption for Mobile Applications","authors":"Shutong Song, Fadi Wedyan, Y. Jararweh","doi":"10.1109/ICICS52457.2021.9464579","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464579","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124803654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464583
Pankaj Mendki
Cloud computing technologies have dominated for more than a decade. Wider adoption of cloud-based processing has given rise to different architectural patterns. Cloud native applications have emerged, ensuring application agility and scalability, but they have their own security challenges. In the last few years, blockchain technology has found applicability in non-cryptocurrency areas as well. This paper illustrates how blockchain can be used to address security challenges for cloud native applications. This work focuses on challenges and possible blockchain-based solutions in the areas of network security, identity management, authentication, container security, and audit logs for forensics.
{"title":"Securing Cloud Native Applications Using Blockchain","authors":"Pankaj Mendki","doi":"10.1109/ICICS52457.2021.9464583","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464583","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116863838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464585
Olga Jodelka, C. Anagnostopoulos, Kostas Kolomvatsos
Online novelty detection is an emerging task in Edge Computing that aims to identify novel concepts in contextual data streams, which should be incorporated into the predictive analytics and inferential models executed locally on edge computing nodes. We introduce an unsupervised adaptive mechanism for online novelty detection over multivariate data streams at the network edge, based on the One-class Support Vector Machine, an instance of the One-class Classification paradigm. Thanks to the proposed adjustable periodic model retraining, our mechanism recognises novelties in a timely and effective manner and adapts to data streams resource-efficiently. Our experimental evaluation and comparative assessment showcase the effectiveness and efficiency of the proposed mechanism over real data streams in identifying novelty, conditioned on the necessary model retraining epochs.
{"title":"Adaptive Novelty Detection over Contextual Data Streams at the Edge using One-class Classification","authors":"Olga Jodelka, C. Anagnostopoulos, Kostas Kolomvatsos","doi":"10.1109/ICICS52457.2021.9464585","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464585","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115511989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464594
Qanita Bani Baker, Ala’a Abu Qutaish
Breast cancer classification and detection using histopathological images is considered a difficult process due to the complexity of the characteristics of histopathology images. This paper presents an automated system for the classification and detection of breast cancer from microscopic histological images, in which the images are classified as benign, in situ, invasive, or normal. The proposed approach involves several steps: image preprocessing (enhancement), image segmentation, feature extraction, feature selection, and finally image classification. The approach utilizes and compares two segmentation methods: clustering with K-means and Global thresholding using Otsu’s method. Initially, images are segmented using both methods. Then, morphological and texture features are extracted from the images produced by each method. Feature selection is performed using Principal Component Analysis (PCA). Finally, the K-means and Global thresholding methods are compared in the classification process using different classifiers. The results show better performance for Global thresholding.
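The Global thresholding step mentioned above can be sketched as follows. This is a minimal, plain-Python version of Otsu's method on a flat list of 8-bit grey levels; the function name, the toy pixel values, and the convention that pixels above the threshold are foreground are illustrative assumptions, not details from the paper.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0        # number of pixels in the lower class
    sum0 = 0      # sum of grey levels in the lower class
    best_t, best_score = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical toy "image": a dark region and a bright region.
image = [10] * 50 + [200] * 50
t = otsu_threshold(image)
mask = [1 if p > t else 0 for p in image]  # 1 marks the brighter class
print("threshold:", t, "foreground pixels:", sum(mask))
```

When several thresholds give the same between-class variance, this sketch keeps the lowest one; other implementations may break such ties differently.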
{"title":"Evaluation of Histopathological Images Segmentation Techniques for Breast Cancer Detection","authors":"Qanita Bani Baker, Ala’a Abu Qutaish","doi":"10.1109/ICICS52457.2021.9464594","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464594","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121804583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464578
E. Caron, Arthur Chevalier, Noelle Baillon-Bachoc, Anne-Lucie Vion
In the Cloud Era, we want to be able to quickly deploy any software anywhere in the world to provide high availability and fast services while maintaining acceptable levels of performance, keeping energy consumption low, and ensuring compliance with every contracted software level agreement. To answer some of these needs, different tools exist alongside a wide variety of Cloud architectures. Several interesting problems arise, such as deployment, networking, storage, security, and many others. In this paper, we focus on the deployment issue from a Software Asset Management point of view. Most Cloud providers use proprietary software to provide different kinds of services, and with it comes the licensing problem. We propose a heuristic to solve the problem of deploying software in a Cloud architecture while considering license compliance, license price, and other important criteria. We prove the NP-completeness of this problem and compare our heuristic with others to evaluate the enhancement we propose.
{"title":"Heuristic for license-aware, performant and energy efficient deployment of multiple software in Cloud architecture","authors":"E. Caron, Arthur Chevalier, Noelle Baillon-Bachoc, Anne-Lucie Vion","doi":"10.1109/ICICS52457.2021.9464578","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464578","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128005326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464589
Qanita Bani Baker, Maram Gharaibeh, Yara Al-Harahsheh
Lung cancer is one of the most commonly diagnosed cancers. Most studies have found that lung cancer patients have a survival time of up to 5 years after the cancer is found. An accurate prognosis is the most critical aspect of the clinical decision-making process for patients. Predicting patients’ survival time helps healthcare professionals make treatment recommendations based on the prediction. In this paper, we used various deep learning methods to predict the survival time, in days, of Non-Small Cell Lung Cancer (NSCLC) patients, evaluated on a clinical and radiomics dataset. The dataset was extracted from computerized tomography (CT) images and contains data for 300 patients. The concordance index (C-index) was used to evaluate the models. We applied several deep learning approaches, and the best accuracy obtained was 70.05% on the OWKIN task using a Multilayer Perceptron (MLP), which outperforms the baseline model provided by the OWKIN task organizers.
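For readers unfamiliar with the metric named above, here is a minimal sketch of a concordance index for predicted survival times with right-censored data, following the common pairwise (Harrell-style) definition. The function name and the four toy patients are hypothetical, and the exact C-index variant used by the task organizers is not stated in the abstract.

```python
def concordance_index(times, predicted_times, events):
    """Fraction of comparable patient pairs whose predicted order matches the observed order.

    A pair (i, j) is comparable when patient i has the shorter observed time
    and that time is an actual event (events[i] is truthy), not a censoring.
    Tied predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored time cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: observed survival in days, event flag (1 = death observed).
observed = [100, 200, 300, 400]
event = [1, 1, 0, 1]

print(concordance_index(observed, [150, 180, 500, 420], event))  # ordering preserved
print(concordance_index(observed, [400, 300, 200, 100], event))  # ordering reversed
print(concordance_index(observed, [250, 250, 250, 250], event))  # uninformative model
```

A value of 0.5 corresponds to an uninformative model and 1.0 to a perfect ranking, which is why the metric suits survival data where only the ordering of patients can be checked for censored cases.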
{"title":"Predicting Lung Cancer Survival Time Using Deep Learning Techniques","authors":"Qanita Bani Baker, Maram Gharaibeh, Yara Al-Harahsheh","doi":"10.1109/ICICS52457.2021.9464589","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464589","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127592325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464605
Dalia Alzu'bi, R. Duwairi
In recent times, Arabic text analysis has attracted great interest due to the widespread use of the Arabic language on social media platforms, in applications, in communities, and elsewhere. Each Arab country has a distinct dialect that distinguishes it from others. Accordingly, classifying these dialects is an interesting area of research, as it has implications for other areas such as sentiment analysis and machine translation. In this paper, we build a multi-task classification model for dialects based on Recurrent Neural Networks, in which the dialects are classified into four categories, namely Maghreb, Levantine, Gulf (in addition to Iraqi), and the Nile. The dataset is taken from the MADAR corpus and contains 110,000 sentences that belong to dialects of different countries in the four regions. The experimental results reveal that the classifiers are able to distinguish between the four dialects with an accuracy of up to 84.76%, which is considered a promising result in this field.
{"title":"Detecting Regional Arabic Dialect based on Recurrent Neural Network","authors":"Dalia Alzu'bi, R. Duwairi","doi":"10.1109/ICICS52457.2021.9464605","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464605","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131349004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464591
Lojin Bani Younis, Safa Sweda, A. Alzu’bi
Modern smartphones can access and display nearly as much of the internet as personal computers, including web browsing and video streaming. Nowadays, users rely on web browsers to access, display, and manipulate information from the Internet on their mobile devices. However, almost all user browsing activity flows through a web browser, which exposes users' privacy to third-party trackers or browser providers. Therefore, there is a demand for web browsers that are private, with a strong focus on keeping user data safe. In this paper, we perform a thorough digital forensics analysis of the smartphone’s volatile memory with the aim of investigating data privacy in various web browsers. Memory acquisition is methodically applied in private and non-private modes on Android platforms to examine which user artifacts from web history or email communications are protected or exposed. Comprehensive experiments are conducted on four popular browsers, Google Chrome, Mozilla Firefox, Dolphin, and Opera, under various scenarios. The experimental results show that Chrome was the least secure browser: all of the inspected data were retrieved even with private mode enabled. The other browsers showed partial privacy preservation. These findings emphasize the importance of conducting such forensics analysis and of warning users to keep their browsing practices safe from prying eyes.
{"title":"Forensics Analysis of Private Web Browsing Using Android Memory Acquisition","authors":"Lojin Bani Younis, Safa Sweda, A. Alzu’bi","doi":"10.1109/ICICS52457.2021.9464591","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464591","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131469512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-24 DOI: 10.1109/ICICS52457.2021.9464540
Abdallah Ghourabi
Text classification (or categorization) is one of the most common natural language processing (NLP) tasks. It simplifies the management of large volumes of textual data by assigning each text to one or more categories. The task is more challenging in the multi-label setting, and for Arabic text it becomes harder still due to the complex morphology and structure of the Arabic language. In this paper, we address this issue by proposing a classification system for the Mowjaz Multi-Topic Labelling Task. The objective of this task is to classify Arabic articles according to the 10 topics predefined in Mowjaz. The proposed system is based on AraBERT, a pre-trained BERT model for the Arabic language. The first step of the system consists of tokenizing and representing the input articles using the AraBERT model. Then, a fully connected neural network is applied to the output of the AraBERT model to classify the articles according to their topics. Experimental tests conducted on the Mowjaz dataset showed an accuracy of 0.865 on the development set and 0.851 on the test set.
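The final classification step described above can be illustrated with a minimal sketch of a fully connected layer followed by independent per-topic decisions. The sigmoid activation, the 0.5 threshold, the tiny two-dimensional "embedding", and all weights and names below are assumptions made for illustration; the abstract does not specify the layer sizes, activation, or decision rule, and a real system would take the vector from AraBERT.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: one logit per topic (weights has one row per topic)."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Multi-label decision: keep every topic whose probability reaches the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


# Hypothetical stand-in for a document representation produced by the encoder.
embedding = [1.0, 2.0]
# Hypothetical weights for two topics (the task itself defines 10).
weights = [[1.0, 0.0], [0.0, -1.0]]
biases = [0.0, 0.5]

logits = dense(embedding, weights, biases)
print("logits:", logits, "predicted topic indices:", predict_labels(logits))
```

Scoring each topic independently, rather than with a softmax over all topics, is what allows one article to receive several labels or none.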
{"title":"A BERT-based system for multi-topic labeling of Arabic content","authors":"Abdallah Ghourabi","doi":"10.1109/ICICS52457.2021.9464540","DOIUrl":"https://doi.org/10.1109/ICICS52457.2021.9464540","url":null,"PeriodicalId":421803,"journal":{"name":"2021 12th International Conference on Information and Communication Systems (ICICS)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134038933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}