Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201994
Patthamanan Isaranontakul, W. Kreesuradej
This study evaluated the effectiveness of using synthetic text datasets generated by GPT-3 for sentiment analysis with deep learning models, namely Bi-GRU and Bi-LSTM. The study compares the performance of these models on both synthetic text and Label Tweet datasets using GPT-3 and reveals that deep learning model performance is dependent on the dataset's nature. The results indicate that using synthetic text generated by GPT-3 significantly enhances the accuracy of both models, with Bi-LSTM achieving an accuracy of 0.84 and Bi-GRU achieving an accuracy of 0.85. The study underscores the importance of meticulous dataset selection and preparation for developing precise and effective deep learning models for various sequential data types. The findings demonstrate that synthetic text datasets generated by GPT-3 can serve as a valuable resource for developing deep learning models as they are labeled and save researchers time and effort in manual labeling of large datasets.
{"title":"A Study of Using GPT-3 to Generate a Thai Sentiment Analysis of COVID-19 Tweets Dataset","authors":"Patthamanan Isaranontakul, W. Kreesuradej","doi":"10.1109/JCSSE58229.2023.10201994","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10201994","url":null,"abstract":"This study evaluated the effectiveness of using synthetic text datasets generated by GPT-3 for sentiment analysis with deep learning models, namely Bi-GRU and Bi-LSTM. The study compares the performance of these models on both synthetic text and Label Tweet datasets using GPT-3 and reveals that deep learning model performance is dependent on the dataset's nature. The results indicate that using synthetic text generated by GPT-3 significantly enhances the accuracy of both models, with Bi-LSTM achieving an accuracy of 0.84 and Bi-GRU achieving an accuracy of 0.85. The study underscores the importance of meticulous dataset selection and preparation for developing precise and effective deep learning models for various sequential data types. The findings demonstrate that synthetic text datasets generated by GPT-3 can serve as a valuable resource for developing deep learning models as they are labeled and save researchers time and effort in manual labeling of large datasets.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114453372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202046
Michelle C. Tanega, Arnel C. Fajardo, J. S. Limbago
The rapid population growth in the Philippines has increased the demand for food and aquatic commodities, making fishing a crucial source of income for coastal households [1]. Ensuring the water quality of Philippine lakes is essential to maintaining the environment's integrity and protecting communities that rely on aquatic resources. In this work, we applied machine learning classification algorithms, namely Random Forest, Decision Tree, and Support Vector, to calculate the Water Quality Index (WQI) and Water Quality Classification (WQC) of Taal Lake, Philippines. The Weighted Arithmetic Water Quality Index (WAWQI) approach was employed to classify the water quality in Taal Lake. Our results showed that the lake's water quality was unsuitable between 2018 and 2022 at five selected stations. Moreover, we evaluated the classification model against two other algorithms and demonstrated that it outperformed them in precision, recall, and F1-score. Random Forest achieved the highest overall accuracy, 95.0%, among the models tested. This study emphasizes the importance of utilizing machine learning algorithms to monitor and classify water quality in the Philippines.
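The Weighted Arithmetic Water Quality Index named in this abstract is usually computed as a weighted average of per-parameter quality ratings. The sketch below follows that commonly cited textbook form; the abstract does not give the parameters, limits, or class bands the authors used, so the function names, the example parameters, and the band thresholds here are all assumptions for illustration only.

```python
def wawqi(measured, standard, ideal):
    """Weighted Arithmetic WQI (commonly cited form, not taken from the paper).

    measured, standard, ideal: dicts keyed by parameter name, holding the
    observed value, the permissible limit, and the ideal value respectively.
    """
    # Unit weights are inversely proportional to the permissible limit,
    # scaled by k so that they sum to 1.
    k = 1.0 / sum(1.0 / standard[p] for p in measured)
    weighted_sum = 0.0
    weight_total = 0.0
    for p, value in measured.items():
        weight = k / standard[p]
        # Quality rating: 0 at the ideal value, 100 at the permissible limit.
        rating = 100.0 * (value - ideal[p]) / (standard[p] - ideal[p])
        weighted_sum += weight * rating
        weight_total += weight
    return weighted_sum / weight_total


def classify_wqi(wqi):
    """Map a WQI value to a class label (assumed bands, one common scheme)."""
    if wqi <= 25:
        return "excellent"
    if wqi <= 50:
        return "good"
    if wqi <= 75:
        return "poor"
    if wqi <= 100:
        return "very poor"
    return "unsuitable"
```

With these definitions, a sample whose every parameter sits exactly at its permissible limit scores 100, and a sample at the ideal values scores 0; labels produced this way could then serve as targets for a classifier such as Random Forest.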
{"title":"Analysis of Water Quality for Taal Lake Using Machine Learning Classification Algorithm","authors":"Michelle C. Tanega, Arnel C. Fajardo, J. S. Limbago","doi":"10.1109/JCSSE58229.2023.10202046","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202046","url":null,"abstract":"The rapid population growth in the Philippines has increased the demand for food and aquatic commodities, making fishing a crucial source of income for coastal households [1]. Ensuring the water quality of Philippine lakes is essential to maintaining the environment's integrity and protecting communities that rely on aquatic resources. In this work, we applied machine learning classification algorithms such as Random Forest, Decision Tree, and Support Vector to calculate Taal Lake, Philippines's Water Quality Index (WQI) and Water Quality Classification (WQC). The Weighted Arithmetic Water Quality Index (WAWQI) approach was employed to classify the water quality in Taal Lake. Our results showed the lake's water quality was unsuitable between 2018 and 2022 at five selected stations. Moreover, we evaluated the classification model against two other algorithms and demonstrated that it outperformed Precision, Recall, and the F-1 score. Random Forest achieved the highest overall accuracy rate of 95.0% compared to the other models tested. 
This study emphasizes the importance of utilizing machine learning algorithms to monitor and classify water quality in the Philippines.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123801228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202100
Thiti Chuenbubpha, Thapana Boonchoo, J. Haga, Prapaporn Rattanatamrong
Federated Learning (FL) has emerged as a powerful methodology for training centralized models while preserving data privacy by using trained parameters from local models that are distributed among decentralized sites. Despite its growing popularity in the development of cloud-based Internet of Things (IoT) applications, FL performance is significantly impacted when data is non-independently and identically distributed (non-IID). This paper proposes a novel framework, called GANs Augmented IID - Federated Learning (GAIID-FL), to tackle the diversity of data distribution among clients in FL. The GAIID-FL framework collaboratively trains Generative Adversarial Networks (GANs) and then employs the trained GAN models to generate synthetic data and distribute it proportionally to each device. Using GAIID-FL effectively restores the IID data distribution for the setting. The experimental results demonstrate that our framework can achieve up to 45% improvement in accuracy. In addition, the batch collaborative training approach for GAN models can reduce communication overhead by up to 90 times compared to the unoptimized method.
{"title":"Solving Non-IID in Federated Learning for Image Classification using GANs","authors":"Thiti Chuenbubpha, Thapana Boonchoo, J. Haga, Prapaporn Rattanatamrong","doi":"10.1109/JCSSE58229.2023.10202100","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202100","url":null,"abstract":"Federated Learning (FL) has emerged as a powerful methodology for training centralized models while preserving data privacy by using trained parameters from local models that are distributed among decentralized sites. Despite its growing popularity in the development of cloud-based Internet of Things (IoT) applications, FL performance is significantly impacted when data is non-independently and identically distributed (non-IID). This paper proposes a novel framework, called GANs Augmented IID - Federated Learning (GAIID-FL) to tackle the diversity of data distribution among clients in FL. The GAIID-FL framework collaboratively trains Generative Adversarial Networks (GANs) and then employs the trained GANs models to generate synthetic data and distribute proportionally to each device. Using GAIID-FL effectively restores the IID data distribution for the setting. The experimental results demonstrate that our framework can achieve up to 45% improvement in accuracy. 
In addition, the batch collaborative training approach for GANs models can reduce communication overhead by up to 90 times when compared to the unoptimized method.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130491435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201963
Lutfee Phalawan, Nalina Phisanbut, P. Piamsa-nga
Defect inspection is an essential process in glass bottle production. Machine vision has shown great potential as an alternative to human inspection. In this paper, we present two open challenges in developing an effective machine vision system: the limited availability of training datasets with all relevant ground truth labels, due to high annotation cost; and the difficulty of detecting defects in the texture region, which may only be observable at certain angles. A novel weakly-supervised glass bottle defect detection system is proposed. The model features a two-stage learning process to recover missing labels and train classifiers. An IoT-based image acquisition system was designed to provide multi-view images of the bottle using a single camera. The proposed defect detection system is evaluated on glass bottle images acquired by our designed apparatus. The experimental results validate the effectiveness of the proposed system on a dataset with only single-positive labels, with up to 96% accuracy.
{"title":"A Weakly-Supervised Glass Bottle Defect Detection System Based on Multi-View Analysis","authors":"Lutfee Phalawan, Nalina Phisanbut, P. Piamsa-nga","doi":"10.1109/JCSSE58229.2023.10201963","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10201963","url":null,"abstract":"Defect inspection is an essential process in glass bottle production. Machine vision has shown great potential as an alternative to human inspection. In this paper, we present two open challenges in developing effective machine vision system, i.e., the constraint on an availability of training datasets with all relevant ground truth labels due to high annotation cost; and the difficulty in detecting defects in the texture region which may only be observable at certain angles. A novel weakly-supervised glass bottle defect detection system is proposed. The model features a two-stage learning process to recover missing labels, and train classifiers. An IoT-based image acquisition system was designed to provide multi-view images of the bottle using a single camera. The proposed defect detection system is evaluated on glass bottle images acquired by our designed apparatus. The experimental results validate the effectiveness of the proposed system on a dataset with only single-positive labels with up to 96% accuracy.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115080906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202137
Leow Meng-Chew, Ong Lee-Yeng, Jian-Yu Tan
An Online Roadshow is a new model that serves the same goal as a physical roadshow. After the initial trial of the Online Roadshow website version 1.0, it was discovered that several aspects of the website, such as its interface and functionality, could be improved. As a result, version 2.0 of the Online Roadshow website was built based on these enhancements. This study assessed the usability of the newly developed website using the proposed practical usability framework, which combines the System Usability Scale (SUS) with one open-ended question whose responses are analysed using affinity diagrams. A total of 250 respondents completed the survey after experiencing the website. The results showed that the average SUS score of the new Online Roadshow website increased from 58.85 to 62.6. The overall analysis, using the same proposed pragmatic usability framework, suggests that the enhancements to version 1.0 improve the user experience.
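For readers unfamiliar with how an average SUS score such as 62.6 is obtained, the sketch below applies the standard SUS scoring rule: ten items answered on a 1 to 5 scale, odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The function names are illustrative, and nothing here reproduces the authors' survey data.

```python
def sus_score(responses):
    """Score one respondent's ten SUS answers (each an integer from 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += answer - 1   # positively worded (odd) items
        else:
            total += 5 - answer   # negatively worded (even) items
    return total * 2.5            # rescale the 0-40 sum to 0-100


def mean_sus(all_responses):
    """Average SUS score over a list of respondents."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

A respondent who answers 3 to every item scores 50, which is why averages in the high 50s to low 60s, as reported here, sit only slightly above the neutral midpoint.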
{"title":"A Practical Usability Study Framework Using the SUS and the Affinity Diagram: A Case Study on the Revised Online Roadshow Website","authors":"Leow Meng-Chew, Ong Lee-Yeng, Jian-Yu Tan","doi":"10.1109/JCSSE58229.2023.10202137","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202137","url":null,"abstract":"An Online Roadshow is a new model that serves the same goal as a physical roadshow. After the initial trial of the Online Roadshow website version 1.0, it was discovered that there were several aspects of the Online Roadshow website such as the website interface and functionality that might be improved. As a result, version 2.0 of the Online Roadshow website was built based on these enhancements. This study assessed the usability of the newly developed website using the proposed practical usability framework that combined the System Usability Scale (SUS) and one open-ended question, which was then analysed using the affinity diagrams. A total of 250 respondents completed the survey after experiencing the website. The result showed that the average SUS score of the new Online Roadshow website increased from 58.85 to 62.6. The overall analysis suggests that enhancements to version 1.0 using the same proposed pragmatic usability framework improves user experience.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"438 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132686794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202081
Chanipa Sonklin, Kachane Sonklin
As the demand for cloud computing services has increased continuously, the number of cloud service providers and cloud applications is increasing dramatically. This has resulted in a significant increase in data center energy consumption. One of the most well-known ways to improve the efficiency of data centers is server consolidation. In our recent study on server consolidation, we presented a grouping genetic algorithm with a single objective: minimizing the energy consumption of a data center. To enhance the performance of the grouping genetic algorithm, this paper presents a multi-objective grouping genetic algorithm that minimizes both energy consumption and resource wastage. The optimal solution for the multi-objective grouping genetic algorithm is determined by a fitness function that considers the two objectives with a trade-off between them. The experimental results show that the multi-objective grouping genetic algorithm performs well in terms of resource wastage and the number of active physical machines in the data center, and that it scales with the number of virtual machines and physical machines.
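The abstract does not state the form of the fitness function. One simple way to express a trade-off between two minimization objectives is a normalized weighted sum, sketched below; the weight `alpha`, the reference values used for normalization, and the function name are all assumptions made for illustration and should not be read as the authors' formulation.

```python
def weighted_fitness(energy, wastage, energy_ref, wastage_ref, alpha=0.5):
    """Combine two objectives to minimize into one score (lower is better).

    energy, wastage: objective values of a candidate VM placement.
    energy_ref, wastage_ref: positive reference values used to put both
        objectives on a comparable scale (hypothetical normalization).
    alpha: trade-off weight in [0, 1]; 1.0 considers energy only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * (energy / energy_ref) + (1.0 - alpha) * (wastage / wastage_ref)


# Two hypothetical candidate placements: (energy, resource wastage).
candidate_a = (150.0, 1.0)
candidate_b = (120.0, 3.0)
```

Under an equal weighting the lower-wastage candidate wins, while an energy-only weighting (`alpha=1.0`) prefers the other one, which is the kind of trade-off the abstract refers to.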
{"title":"A Multi-Objective Grouping Genetic Algorithm for Server Consolidation in Cloud Data Centers","authors":"Chanipa Sonklin, Kachane Sonklin","doi":"10.1109/JCSSE58229.2023.10202081","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202081","url":null,"abstract":"As the demand for cloud computing services has increased continuously, the number of cloud service providers and cloud applications are increasing dramatically. This has resulted in a significant increase in data center energy consumption. One of the most well-known ways to improve the efficiency of data centers is server consolidation. In our recent study on server consolidation, we presented a grouping genetic algorithm for a single objective that is minimization of the energy consumption in a data center. In order to enhance the performance of the grouping genetic algorithm, this paper presents a multi-objective grouping genetic algorithm for a multi-objective, including minimizing both the energy consumption and resource wastage. The optimal solution for the multi-objective grouping genetic algorithm is determined by a fitness function that considers the two objectives with a trade-off between each objective. 
The results from the experiments show that the multi-objective grouping genetic algorithm performs well in terms of the resource wastage and number of active physical machines in the data center and is scalable to the number of virtual machines and physical machines.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134052722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202093
S. Mekruksavanich, A. Jitpattanakul
The field of human activity recognition (HAR) is a priority for cutting-edge study because of its potential to revolutionize the way we understand and improve our everyday lives. A large variety of ordinary, everyday tasks has been classified using HAR. Nevertheless, in contrast to basic human actions, the increasing demands of numerous real-world applications have attracted the interest of the HAR area of study. Electrical line workers (ELWs) face a variety of challenges, including long hours, working in isolated locations, and performing particularly hazardous tasks. Wearable sensor-based HAR allows for unobtrusive tracking of ELW efficiency and security. This study explores deep learning strategies for automatically categorizing ELWs' complicated actions from sensor data collected through a wrist-worn device. We propose ResNeXt, a deep residual neural network, and evaluate it against other deep learning networks for their ability to categorize ELW activities effectively. We employ a publicly available benchmark dataset that includes 10 ELW tasks. The results of the experiment demonstrate that the proposed ResNeXt achieved the highest accuracy (98.74%) and F1-score (98.81%) compared to the other deep learning networks studied.
{"title":"Classifying Activities of Electrical Line Workers Based on Deep Learning Approaches Using Wrist-Worn Sensor","authors":"S. Mekruksavanich, A. Jitpattanakul","doi":"10.1109/JCSSE58229.2023.10202093","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202093","url":null,"abstract":"The field of human activity recognition (HAR) is a priority for cutting-edge study because of its potential to revolutionize the way we understand and improve our everyday lives. A large variety of ordinary, everyday tasks has been classified using HAR. Nevertheless, in contrast to basic human actions, the increasing demands of numerous real-world applications have attracted the interest of the HAR area of study. Electrical line workers (ELWs) face a variety of challenges, including long hours, working in isolated locations, and performing particularly hazardous tasks. Wearable sensor-based HAR allows for unobtrusive tracking of ELW efficiency and security. This study explores deep learning strategies for automatically categorizing ELWs' complicated actions through sensor data collected through a wrist-worn device. We propose ResNeXt, a deep residual neural network, and evaluate it with other deep learning networks for their ability to categorize ELW activities effectively. We employ a publicly available benchmark dataset that includes 10 ELW tasks. 
The results of the experiment demonstrate that the proposed ResNeXt achieved the highest accuracy (98.74%) and F1-score (98.81%) compared to other deep learning networks studied.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"283 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133975619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202141
Jenjira Lomchan, R. Wiangsripanawan, S. F. Shahandashti
The study by Mayer, Kirchner, and Volkamer published at SOUPS 2017 showed that the password composition policy (PCP) strength of both US and German websites was influenced not by the security features but by the usability features of the websites. Surprisingly, the PCP strength of the banking website category was the lowest, whereas that of the government website category was the highest. The aim of our first study was to find out whether 78 frequently used Thai websites in 2018 would yield the same surprising results. Our findings showed the opposite: the highest PCP strength was found on banking websites, followed by university and government websites, respectively. Two more security features were added to our study: 2FA and HTTPS. Although some German websites employing 2FA allowed weaker PCPs for better usability, Thai websites with 2FA did not loosen their password requirements. Also, employing HTTPS did not affect the PCP strength. The Thai websites were re-examined in 2021, two years after the Personal Data Protection Act (PDPA) was announced. The results showed that the median PCP strength of all Thai samples had grown from 26.6 in 2018 to 31.0 in 2021. The banking websites still retained the highest PCP strength. A significant change appeared on the government websites, whose PCP strength increased from 29.9 to 40.4. In summary, security features such as the size of services and the value of assets, which play no part in either the US or German PCPs, weighed heavily on Thai websites. Government and university websites in Germany and the USA had much higher PCP strength than those in Thailand. The Thai government's PCP strength increased sharply in 2021 due to the privacy law. Nevertheless, it was still lower than the results in Germany and the USA in 2016. Therefore, the criteria influencing PCP vary depending on the country.
{"title":"The Comparison of Password Composition Policies Among US, German, and Thailand Samples","authors":"Jenjira Lomchan, R. Wiangsripanawan, S. F. Shahandashti","doi":"10.1109/JCSSE58229.2023.10202141","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202141","url":null,"abstract":"The study by Mayer, Kirchner, and Volkamer published at SOUPS 2017 showed that the password composition policy (PCP) strength of both the US and German websites was not influenced by the security but by the usability features of the websites. Surprisingly, the PCP strength of the banking website category was the lowest, whereas the government website was the highest. Our aim in conducting the first study is to find whether 78 Thai frequently used websites in 2018 would yield the same surprising results. Our finding showed an opposite perspective, the highest PCP strength was from the banking websites, followed by university and government websites, respectively. Two more security features were added to our study: 2FA and HTTPS. Although some German websites employing 2FA allowed lower PCPs for better usability, Thai websites with 2FA did not loosen the password requirements. Also, employing HTTPS did not impact the PCP strength. The study with Thai websites was reinvestigated in 2021, two years after the Personal Data Protection Act (PDPA) was announced. The result showed that the median PCP strength of all Thailand samples had grown from 26.6 in 2018 to 31.0 in 2021. The banking websites still retained the highest PCP strength. A significant change appeared on the government websites, increasing from 29.9 to 40.4. In summary, the security features such as the size of services, and values of assets which play no part in both the US and German PCPs were heavily concerned by Thai websites. Government and university websites in Germany and USA gave much higher PCP strength than those in Thailand. The Thai government's PCP strength sharply increased in 2021 due to the privacy law. 
Nevertheless, it was still lower than the results in Germany and USA in 2016. Therefore, the criteria influencing PCP vary depending on the country.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"21 S6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132388517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201973
Sirawit Tantiphuwanart, Nuengwong Tuaycharoen, Dittaya Wanvarie, Naruemon Pratanwanich, A. Suchato
When the COVID-19 pandemic began, the world had to enter a new normal, forcing real-life activities to move online. Formal teaching, learning, and examination were also affected. An online learning assessment system is highly complex because it must handle a large number of students taking exams simultaneously, especially during midterm and final exam weeks. Therefore, the system must have an acceptable response time and high reliability. Our assessment system was developed with MySQL as its database, but the response time was unsatisfactory for actual usage. This paper proposes methods to improve the system's efficiency during high traffic by using Amazon Web Services (AWS) DynamoDB as a cache for MySQL, since it has higher read and write efficiency than the MySQL database. The read operation is performed in the Viewing exam session operation. The write policy used for Managing exam sessions is write-through, while the policy for Saving examinees' answers is write-back. Choosing between write-through and write-back policies should take into account the acceptable response time and whether instant consistency validation is required. The evaluation results with JMeter reveal that the method can double the efficiency of the read operation. The write operation, on the other hand, takes less than 3 seconds, which is still within an acceptable limit. Further suggestions for improving DynamoDB efficiency are also given in this paper.
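The difference between the write-through and write-back policies mentioned in this abstract can be illustrated with a small in-memory sketch. Two plain dictionaries stand in for the cache (DynamoDB in the paper) and the backing database (MySQL in the paper); no AWS or MySQL calls are made, and the class, method, and key names are hypothetical rather than taken from the authors' system.

```python
class CachedStore:
    """Toy cache in front of a database, showing both write policies."""

    def __init__(self):
        self.cache = {}     # stand-in for the fast cache layer
        self.db = {}        # stand-in for the slower backing database
        self.dirty = set()  # keys written to the cache but not yet to the db

    def read(self, key):
        # Serve from the cache when possible; otherwise load from the db
        # and keep a copy in the cache for later reads.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_through(self, key, value):
        # Update the database and the cache in the same operation, so both
        # are consistent as soon as the write returns (slower writes).
        self.db[key] = value
        self.cache[key] = value

    def write_back(self, key, value):
        # Update only the cache and mark the key dirty; the database is
        # brought up to date later by flush() (faster writes, delayed
        # consistency).
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist all dirty entries to the database; returns how many.
        count = len(self.dirty)
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
        return count
```

In this sketch a session update made with `write_through` is visible in the database immediately, whereas an answer saved with `write_back` reaches the database only after `flush()`, which mirrors the response-time versus instant-consistency trade-off the abstract describes.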
{"title":"Performance Improvement on a Learning Assessment Web Application Using AWS DynamoDB as a Cache Database","authors":"Sirawit Tantiphuwanart, Nuengwong Tuaycharoen, Dittaya Wanvarie, Naruemon Pratanwanich, A. Suchato","doi":"10.1109/JCSSE58229.2023.10201973","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10201973","url":null,"abstract":"When the COVID-19 pandemic began, the world had to step into a new normal world, forcing real-life activities to transform into online activities. Formal teaching, learning, and examination were also affected. The online learning assessment system is a highly complex system because it must be able to handle a mass of students taking exams simultaneously, especially during the midterm and the final exam week. Therefore, the system must have an acceptable response time and high reliability. Our assessment system had been developed utilizing MySQL as a database, but the response time was unsatisfactory for actual usage. This paper proposes methods to improve the system's efficiency during high traffic by utilizing Amazon Web Services (AWS) DynamoDB as a cache of MySQL since it has a higher read-and-write efficiency than MySQL database. The read operation will be conducted in the Viewing exam session operation. At the same time, the write policy used for Managing exam sessions is write-through, while the policy for Saving examinees' answers is write-back. Choosing between write-through and write-back policies should take into account the acceptable response time and required instant consistency validation. The evaluation results with JMeter reveal that the method can double the efficiency of the read operation. On the other hand, the write operation is less than 3 seconds, which is still within an acceptable limit. 
For improving the DynamoDB efficiency, further suggestions have also been given in this paper.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116411998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202012
Arisa Umparat, S. Phoomvuthisarn
Tag classification is essential in Stack Overflow. Instead of combing through pages of replies containing irrelevant information, users can quickly pinpoint relevant posts and answers using tags. Because user-submitted posts can have multiple tags, classifying tags in Stack Overflow is challenging, and the resulting label distribution is imbalanced across the whole labelset. Pre-trained deep learning models can improve tag classification accuracy with small datasets. Common multi-label resampling techniques combined with machine learning classifiers can also mitigate this imbalance. Still, few studies have explored which resampling technique best improves the performance of pre-trained deep models for predicting tags. To address this gap, we conducted an experiment to evaluate the effectiveness of ELECTRA, a powerful pre-trained deep learning model, combined with various multi-label resampling techniques in reducing the imbalance that induces mislabeling of Stack Overflow posts. We compared seven resampling techniques (LP-ROS, ML-ROS, MLSMOTE, MLeNN, MLTL, ML-SOL, and REMEDIAL) to find the best method to mitigate the imbalance and improve tag prediction accuracy. Our results show that MLTL is the most effective choice for tackling the imbalance in multi-label classification on our Stack Overflow data in deep learning scenarios. MLTL achieved 0.517, 0.804, 0.467, and 0.98 on Precision@1, Recall@5, F1-score@1, and AUC, respectively. In contrast, MLeNN achieved only 0.323, 0.648, 0.277, and 0.95 on the same metrics.
{"title":"Improving Pre-Trained Models for Multi-Label Classification in Stack Overflow: A Comparison of Imbalanced Data Handling Methods","authors":"Arisa Umparat, S. Phoomvuthisarn","doi":"10.1109/JCSSE58229.2023.10202012","DOIUrl":"https://doi.org/10.1109/JCSSE58229.2023.10202012","url":null,"PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130191634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
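The ranking metrics reported for the tag-prediction study (precision, recall, and F1-score at a cutoff k) can be sketched as follows. This is a sketch under assumptions, not the authors' evaluation code: it uses the common definitions (hits among the top k predictions divided by k for precision, and by the number of true tags for recall), which may differ from the paper's exact formulation. The function name and the example tags are hypothetical.

```python
def precision_recall_f1_at_k(ranked_tags, true_tags, k):
    """Return (precision@k, recall@k, F1@k) for one post.

    ranked_tags: predicted tags, best first.
    true_tags:   the tags actually attached to the post.
    Assumption: standard top-k definitions; the paper may define them differently.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_tags[:k]
    hits = len(set(top_k) & set(true_tags))           # correct tags in the top k
    precision = hits / k
    recall = hits / len(true_tags) if true_tags else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical example: five ranked predictions, two true tags.
ranked = ["python", "pandas", "numpy", "java", "c++"]
truth = {"python", "numpy"}
p1, r1, f1_at_1 = precision_recall_f1_at_k(ranked, truth, k=1)
p5, r5, f1_at_5 = precision_recall_f1_at_k(ranked, truth, k=5)
```

Scores for a whole dataset are typically obtained by averaging these per-post values, which is one reason precision at a small k and recall at a larger k can differ widely for the same model.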