Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140222
Beesetti Kiran Kumar, Saurabh Bilgaiyan, B. Mishra
Researchers have long used machine learning algorithms to predict software development effort. These algorithms capture the underlying structure of the data better than conventional approaches such as lines of code and function points, and they improve prediction accuracy. According to the no free lunch theorem, no single algorithm gives the best predictions on all datasets. To remove this bias, our work aims to provide a better model for software effort estimation and thereby reduce the distance between the actual and predicted effort for future projects. The authors propose an ensemble of regressor models combined through a voting estimator, which reduces the error rate by overcoming the bias introduced by any single machine learning algorithm. The results show that the ensemble models outperformed the single models on the different datasets used.
Title: Software Effort Estimation through Ensembling of Base Models in Machine Learning using a Voting Estimator. Journal: International Journal of Advanced Computer Science and Applications.
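The voting idea in this abstract can be illustrated with a minimal sketch. The abstract does not name the base regressors, the datasets, or the library used, so everything below is an assumption: two hypothetical base models (a mean baseline and a one-feature least-squares fit), toy data, and an unweighted average as the voting rule.

```python
from statistics import mean


class MeanRegressor:
    """Baseline model: always predicts the mean training effort."""

    def fit(self, sizes, efforts):
        self.value = mean(efforts)
        return self

    def predict(self, sizes):
        return [self.value for _ in sizes]


class LinearRegressor:
    """Ordinary least squares on one feature (here: project size)."""

    def fit(self, sizes, efforts):
        x_bar, y_bar = mean(sizes), mean(efforts)
        sxx = sum((x - x_bar) ** 2 for x in sizes)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, efforts))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, sizes):
        return [self.intercept + self.slope * x for x in sizes]


class VotingAverage:
    """Averages the predictions of several base regressors (unweighted vote)."""

    def __init__(self, models):
        self.models = models

    def fit(self, sizes, efforts):
        for model in self.models:
            model.fit(sizes, efforts)
        return self

    def predict(self, sizes):
        per_model = [model.predict(sizes) for model in self.models]
        # zip(*...) regroups the predictions per sample, one value per model.
        return [mean(column) for column in zip(*per_model)]


# Hypothetical toy data (project size vs. effort); not taken from the paper.
sizes = [1, 2, 3, 4]
efforts = [2, 4, 6, 8]

ensemble = VotingAverage([MeanRegressor(), LinearRegressor()]).fit(sizes, efforts)
prediction = ensemble.predict([5])
```

In practice a library implementation such as scikit-learn's `VotingRegressor` would likely be used instead of hand-written classes; the sketch only shows that the ensemble output is the average of the base models' predictions.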
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140654
Nazirah Abd. Hamid, S. R. Selamat, R. Ahmad, M. Mohamad
Role-based Access Control has become standard practice in many organizations for restricting access to limited resources in complex infrastructures or systems. The main objective of role mining is to define appropriate roles that can be applied to the specified security access policies. However, the mining scale in this kind of setting is extensive and can place a heavy load on system management. To resolve these problems, this paper proposes a model that applies a Hamming Distance approach, rearranging the existing matrix used as input data to overcome the scalability problem. The findings show that the generated file sizes of all datasets are substantially reduced compared with the original datasets. They also show that the Hamming Distance technique can reduce the mining scale of the datasets by 30% to 47% and produce better candidate roles. Keywords: Role-based Access Control; role mining; hamming distance; data mining
Title: Hamming Distance Approach to Reduce Role Mining Scalability. Journal: International Journal of Advanced Computer Science and Applications.
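The abstract does not spell out how the matrix is rearranged, so the following is only a sketch of one plausible reading: users whose permission rows are at Hamming distance 0 are merged, which shrinks the user-permission matrix before role mining. The function names and the toy matrix are hypothetical.

```python
def hamming(row_a, row_b):
    """Number of positions at which two equal-length binary rows differ."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return sum(a != b for a, b in zip(row_a, row_b))


def merge_identical_users(user_permissions):
    """Group users whose permission rows are at Hamming distance 0.

    Returns a list of (row, [users]) pairs; each pair is one row of the
    reduced matrix together with the users it represents.
    """
    groups = []
    for user, row in user_permissions.items():
        for representative, members in groups:
            if hamming(representative, row) == 0:
                members.append(user)
                break
        else:
            # No existing group matches, so this row starts a new group.
            groups.append((row, [user]))
    return groups


# Hypothetical user-permission assignment matrix (1 = permission granted).
upa = {
    "alice": (1, 1, 0),
    "bob": (1, 1, 0),
    "carol": (0, 1, 1),
}
reduced = merge_identical_users(upa)
```

Here three user rows collapse to two, because `alice` and `bob` hold identical permissions. A distance threshold above 0 would merge near-identical users as well, at the cost of approximating their permissions.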
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140802
R. Dornala
Edge computing has gained significant attention in recent years because it processes data closer to the source, which reduces latency and improves performance. However, ensuring data security and efficient data management in edge-based computing applications poses significant challenges. This paper proposes an ensemble security approach and a multi-cloud load-balancing strategy to address these challenges. The ensemble security approach combines multiple security mechanisms, such as encryption, authentication, and intrusion detection systems, to provide a layered defense against potential threats. By combining these mechanisms, the system can detect and mitigate security breaches at various levels, ensuring the integrity and confidentiality of data in edge-based environments. The multi-cloud load-balancing strategy aims to optimize resource utilization and performance by distributing data processing tasks across multiple cloud service providers. This approach takes advantage of the flexibility and scalability of the cloud, allowing dynamic workload allocation based on factors such as network conditions and computational capabilities. To evaluate the proposed approach, we conducted experiments in a realistic edge-based computing environment. The results demonstrate that the ensemble security approach effectively detects and prevents security threats, while the multi-cloud load-balancing strategy, combined with edge computing, improves overall system performance and resource utilization.
Title: Ensemble Security and Multi-Cloud Load Balancing for Data in Edge-based Computing Applications. Journal: International Journal of Advanced Computer Science and Applications.
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140749
Zineb Elkaimbillah, B. E. Asri, M. Mikram, Maryem Rhanoui
Information Technology (IT) job offers are published on the web in heterogeneous forms. A candidate looking for an IT job has difficulty retrieving the exact information needed to find the ideal match for their profile without wasting time on fruitless searches. Traditional IT job search systems rely on simple keywords and generally cannot provide detailed answers because they do not take semantic links into account. In this article, an ontology is developed to meet the expectations of IT profiles, built from IT job descriptions that were collected and pre-annotated using the UBIAI tool. The classes and subclasses of the ontology are designed using the Protégé 5.5.0 editor. Object properties and data properties are then defined to enrich the ontology. The ontology is validated with DL queries: a number of questions are asked to retrieve the requested information for each IT profile, and the ontology answers all of them adequately. Finally, various plugins are used to display the ontology graphically.
Title: Construction of an Ontology-based Document Collection for the IT Job Offer in Morocco. Journal: International Journal of Advanced Computer Science and Applications.
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140844
Jin-Il Han
As the Industry 4.0 reform continues to progress, market demand for mobile robots in the major world economies is gradually increasing. To improve the quality of mobile robot path planning and the robot's obstacle avoidance ability, this research adjusts the node selection method, pheromone update mechanism, transition probability, and volatility coefficient calculation of the ant colony algorithm, and improves the search direction setting and cost estimation of the A* algorithm. A robot path planning model is then designed on the basis of the improved ant colony algorithm and A* algorithm. Simulation experiments on grid maps show that the planning model built on the improved algorithm designed in this study, the traditional ant colony algorithm, the beetle antennae search algorithm, and the particle swarm algorithm converged after 8, 37, 23, and 26 iterations, respectively. The minimum path lengths after convergence were 13.24 m, 17.82 m, 16.24 m, and 17.05 m, respectively. When the edge length of the grid map is 100 m, the minimum planned path lengths of the four methods, in the same order, are 49 m, 104 m, 75 m, and 93 m, and their total moving times are 49 s, 142 s, 93 s, and 127 s, respectively. This indicates that the model designed in this study can effectively shorten the movement path and the training time while completing movement tasks. The results of this study provide a reference for optimizing a robot's movement mode and obstacle avoidance ability.
Title: Application of Improved Ant Colony Algorithm Integrating Adaptive Parameter Configuration in Robot Mobile Path Design. Journal: International Journal of Advanced Computer Science and Applications.
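The abstract says the transition probability of the ant colony algorithm was adjusted but does not give the modified formula. For orientation, the sketch below shows only the textbook rule such work starts from, where the probability of moving to a neighbouring node is proportional to pheromone^alpha times heuristic^beta. The parameter defaults, node names, and values are assumptions, not the paper's settings.

```python
def transition_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Textbook ant colony transition rule over the candidate next nodes.

    pheromone and heuristic map each candidate node to a positive value;
    the heuristic is commonly the inverse of the distance to that node.
    """
    scores = {
        node: (pheromone[node] ** alpha) * (heuristic[node] ** beta)
        for node in pheromone
    }
    total = sum(scores.values())
    # Normalise so the probabilities over all candidates sum to 1.
    return {node: score / total for node, score in scores.items()}


# Hypothetical example: two candidate grid cells with equal pheromone,
# where cell "b" has twice the heuristic desirability of cell "a".
probabilities = transition_probabilities(
    pheromone={"a": 1.0, "b": 1.0},
    heuristic={"a": 1.0, "b": 2.0},
)
```

With `beta=2.0`, doubling the heuristic value makes a cell four times as likely to be chosen, which is why tuning `alpha` and `beta` strongly affects how quickly the search converges.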
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140329
Kawtar Younsi Dahbi, D. Chiadmi, Hind Lamharhar
Title: Knowledge Graph based Representation to Extract Value from Open Government Data. Journal: International Journal of Advanced Computer Science and Applications.
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140467
J. Alsayaydeh, Mohd Faizal bin Yusof, Muhammad Zulhakim Bin Abdul Halim, M. N. S. Zainudin, S. Herawan
The Internet of Things (IoT) has emerged as a transformative technology in the field of healthcare. One of its most promising healthcare applications is patient health monitoring, which allows healthcare providers to monitor patients' health remotely and provide prompt medical attention when needed. This research work develops an IoT-based patient health monitoring system aimed at patients, particularly the elderly, who face the risk of unexpected death due to a lack of medical attention. The proposed system uses a heartbeat sensor and an infrared (IR) temperature sensor, connected to an Arduino UNO and a NodeMCU respectively, to monitor the patient's vital signs. The sensors collect the data, which is then sent to an IoT web platform over a Wi-Fi connection. The platform displays real-time data on the patient's health status, including temperature and heart rate, which doctors and nurses can monitor. The system is designed to send alerts to healthcare providers in a medical emergency so that prompt medical attention can be given to the patient. The significance of this work lies in its potential to provide a more efficient and effective means of patient health monitoring. The system can monitor a large number of patients simultaneously, which is particularly beneficial in hospitals with a large patient load. Moreover, it can reduce the workload of healthcare providers, allowing them to focus on other critical tasks. The system has the potential to improve the overall quality of healthcare services and lead to better health outcomes for society.
Keywords: Patient health monitoring; Internet of Things (IoT); Arduino UNO; NodeMCU ESP8266; ThingSpeak; wearable device; temperature value; heartbeat value; remotely
Title: Patient Health Monitoring System Development using ESP8266 and Arduino with IoT Platform. Journal: International Journal of Advanced Computer Science and Applications.
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140894
Zhiyuan Liang, Pengtao He, Wenbin Liang, Xiaolei Zhao, Bin Wei
This study addresses the laser radar (LiDAR) SLAM mapping method used by a tobacco production line inspection robot, based on an improved Rao-Blackwellized particle filter (RBPF). The aim is to build a well-structured two-dimensional map of the inspection environment so that the robot can carry out its inspection tasks along the production line without interruption. Wheel odometry and IMU data are fused with the extended Kalman filter algorithm, and the resulting fused odometry motion model and the LiDAR observation model jointly serve as the hybrid proposal distribution. Within this proposal distribution, the iterative closest point method is used to find sampling particles in the high-probability area, the matching score obtained during particle scan matching is used as the fitness value, and a fruit fly optimization strategy adjusts the particle distribution. The weight of each optimized particle is then computed, the particles are adaptively resampled according to these weights, and the inspection map is updated from the particles' updated pose and observation information. The experimental results show that the method achieves laser radar SLAM mapping for the tobacco production line inspection robot and builds a better two-dimensional map of the inspection environment with fewer particles.
Applied in practice, the method can be expected to yield better working results.
Title: SLAM Mapping Method of Laser Radar for Tobacco Production Line Inspection Robot Based on Improved RBPF. Journal: International Journal of Advanced Computer Science and Applications.
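The abstract states that particles are resampled adaptively according to their weights but gives no criterion. A common choice in particle filter SLAM, assumed here rather than taken from the paper, is to resample only when the effective sample size falls below a fraction of the particle count. The function names and the 0.5 threshold are illustrative.

```python
def effective_sample_size(weights):
    """Effective sample size: 1 / sum of squared normalised weights."""
    total = sum(weights)
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)


def needs_resampling(weights, threshold_ratio=0.5):
    """True when the effective sample size drops below ratio * particle count."""
    return effective_sample_size(weights) < threshold_ratio * len(weights)


# Hypothetical weights for four particles.
uniform = [0.25, 0.25, 0.25, 0.25]      # all particles equally plausible
degenerate = [0.97, 0.01, 0.01, 0.01]   # one particle dominates

resample_uniform = needs_resampling(uniform)
resample_degenerate = needs_resampling(degenerate)
```

Skipping the resampling step while the weights are well spread helps preserve particle diversity, which matters when the goal is a good map with few particles.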
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.0140333
Hachemi Bennaceur, Meznah Almutairy, Nora Alqhtani
Title: Experimental Evaluation of Genetic Algorithms to Solve the DNA Assembly Optimization Problem. Journal: International Journal of Advanced Computer Science and Applications.
Pub Date : 2023-01-01 DOI: 10.14569/ijacsa.2023.01406125
Endang Wahyu Pamungkas, Divi Galih Prasetyo Putri, A. Fatmawati
This study provides an overview of current research on detecting abusive language in Indonesian social media. It examines existing datasets and methods, along with the challenges and opportunities in this field. Most existing datasets for detecting abusive language were collected from social media platforms such as Twitter, Facebook, and Instagram, with Twitter being the most commonly used source. Hate speech is the most researched type of abusive language. Various models, including traditional machine learning and deep learning approaches, have been implemented for this task, with deep learning models showing more competitive results. However, transformer-based models are less widely used in Indonesian hate speech studies. The study emphasizes the importance of exploring more diverse phenomena, such as Islamophobia and political hate speech. It also suggests crowdsourcing as a potential approach to annotating datasets. Furthermore, it encourages researchers to consider code-mixing in Indonesian abusive language datasets, as doing so could improve overall model performance on Indonesian data. The study suggests that the lack of effective regulations, the anonymity afforded to users on most social networking sites, and the increasing number of Twitter users in Indonesia have contributed to the rising prevalence of hate speech in Indonesian social media. Finally, it notes the importance of considering code-mixed language, out-of-vocabulary words, grammatical errors, and limited context when working with social media data. Keywords: Abusive language; hate speech detection; machine learning; social media
Title: Hate Speech Detection in Bahasa Indonesia: Challenges and Opportunities. Journal: International Journal of Advanced Computer Science and Applications.