2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE): Latest Publications

A Study of Using GPT-3 to Generate a Thai Sentiment Analysis of COVID-19 Tweets Dataset

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201994

Patthamanan Isaranontakul, W. Kreesuradej

This study evaluated the effectiveness of synthetic text datasets generated by GPT-3 for sentiment analysis with two deep learning models, Bi-GRU and Bi-LSTM. It compares the performance of these models on both the synthetic text dataset and the Label Tweet dataset produced with GPT-3, and shows that deep learning model performance depends on the nature of the dataset. The results indicate that using GPT-3-generated synthetic text significantly improves the accuracy of both models, with Bi-LSTM reaching an accuracy of 0.84 and Bi-GRU an accuracy of 0.85. The study underscores the importance of careful dataset selection and preparation when developing accurate and effective deep learning models for various types of sequential data. The findings demonstrate that GPT-3-generated synthetic text datasets can be a valuable resource for developing deep learning models: because they come already labeled, they save researchers the time and effort of manually labeling large datasets.

Citations: 0

Analysis of Water Quality for Taal Lake Using Machine Learning Classification Algorithm

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202046

Michelle C. Tanega, Arnel C. Fajardo, J. S. Limbago

Rapid population growth in the Philippines has increased the demand for food and aquatic commodities, making fishing a crucial source of income for coastal households [1]. Ensuring the water quality of Philippine lakes is essential to maintaining the integrity of the environment and protecting the communities that rely on aquatic resources. In this work, we applied machine learning classification algorithms, namely Random Forest, Decision Tree, and Support Vector Machine, to calculate the Water Quality Index (WQI) and Water Quality Classification (WQC) of Taal Lake, Philippines. The Weighted Arithmetic Water Quality Index (WAWQI) approach was employed to classify the water quality in Taal Lake. Our results showed that the lake's water quality was unsuitable at five selected stations between 2018 and 2022. Moreover, we evaluated the classification model against the two other algorithms and showed that it outperformed them in precision, recall, and F1 score. Random Forest achieved the highest overall accuracy, 95.0%, among the models tested. This study emphasizes the importance of using machine learning algorithms to monitor and classify water quality in the Philippines.
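The abstract names the WAWQI approach without stating its formula. The sketch below uses the form commonly cited in the water-quality literature, assumed here rather than taken from the paper: each parameter gets a quality rating q_i = 100 (V_i - V_0) / (S_i - V_0) from its measured value V_i, permissible standard S_i, and ideal value V_0; a unit weight w_i = K / S_i with K = 1 / sum(1 / S_i); and WQI = sum(q_i w_i) / sum(w_i). The rating bands in `classify` are one widely used convention and may not match the thresholds the authors applied; `Parameter`, `wawqi`, and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    measured: float  # observed value V_i
    standard: float  # permissible standard S_i
    ideal: float     # ideal value V_0 (e.g. 0 for most pollutants, 7.0 for pH)


def wawqi(params: list[Parameter]) -> float:
    """Weighted Arithmetic WQI: sum(q_i * w_i) / sum(w_i)."""
    # Proportionality constant K = 1 / sum(1 / S_i).
    k = 1.0 / sum(1.0 / p.standard for p in params)
    numerator = 0.0
    denominator = 0.0
    for p in params:
        q = 100.0 * (p.measured - p.ideal) / (p.standard - p.ideal)  # quality rating
        w = k / p.standard                                           # unit weight
        numerator += q * w
        denominator += w
    return numerator / denominator


def classify(wqi: float) -> str:
    """Map a WQI value to a rating band (one common convention, assumed)."""
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very poor"
    return "Unsuitable"
```

Because K normalizes the unit weights, sum(w_i) is 1 by construction; the explicit division is kept so the code mirrors the usual formula.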

Citations: 0

Solving Non-IID in Federated Learning for Image Classification using GANs

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202100

Thiti Chuenbubpha, Thapana Boonchoo, J. Haga, Prapaporn Rattanatamrong

Federated Learning (FL) has emerged as a powerful methodology for training centralized models while preserving data privacy: it aggregates trained parameters from local models distributed among decentralized sites. Despite its growing popularity in the development of cloud-based Internet of Things (IoT) applications, FL performance degrades significantly when data is not independent and identically distributed (non-IID). This paper proposes a novel framework, GANs Augmented IID - Federated Learning (GAIID-FL), to tackle the diversity of data distributions among FL clients. GAIID-FL collaboratively trains Generative Adversarial Networks (GANs) and then uses the trained GAN models to generate synthetic data, which is distributed proportionally to each device. In this way, GAIID-FL effectively restores an IID data distribution across the clients. The experimental results demonstrate that the framework can achieve up to a 45% improvement in accuracy. In addition, the batch collaborative training approach for the GAN models can reduce communication overhead by up to 90 times compared with the unoptimized method.
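The abstract says synthetic samples are distributed "proportionally to each device" to restore an IID distribution, but does not give the allocation rule. One plausible rule, assumed here and not claimed to be the authors', is to top up each client so its label histogram matches the global class proportions without discarding any real samples. The function name `synthetic_quota` is hypothetical, and the GAN sampling step itself is out of scope.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def synthetic_quota(client_counts: dict[int, int],
                    global_counts: dict[int, int]) -> dict[int, int]:
    """Per-class number of synthetic samples a client needs so that its label
    histogram (approximately) matches the global class proportions.

    Real samples are never removed, so the client's target size T is the
    smallest value with T * p_c >= n_c for every class c, where
    p_c = global_counts[c] / total.
    """
    total = sum(global_counts.values())
    target_size = max(
        _ceil_div(client_counts.get(c, 0) * total, g)
        for c, g in global_counts.items() if g > 0
    )
    quota = {}
    for c, g in global_counts.items():
        target_c = _ceil_div(target_size * g, total)  # rounded-up class target
        quota[c] = target_c - client_counts.get(c, 0)
    return quota
```

For example, with balanced global classes a client holding 40 samples of class 0 and 10 of class 1 would be asked to generate 30 synthetic samples of class 1 and none of class 0. Integer arithmetic avoids floating-point rounding in the quotas.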

Citations: 0

A Weakly-Supervised Glass Bottle Defect Detection System Based on Multi-View Analysis

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201963

Lutfee Phalawan, Nalina Phisanbut, P. Piamsa-nga

Defect inspection is an essential process in glass bottle production, and machine vision has shown great potential as an alternative to human inspection. In this paper, we address two open challenges in developing an effective machine vision system: the limited availability of training datasets with all relevant ground-truth labels, owing to high annotation cost; and the difficulty of detecting defects in textured regions, which may be observable only from certain angles. We propose a novel weakly-supervised glass bottle defect detection system. The model uses a two-stage learning process that recovers missing labels and trains the classifiers. An IoT-based image acquisition system was designed to capture multi-view images of each bottle using a single camera. The proposed defect detection system is evaluated on glass bottle images acquired with this apparatus. The experimental results validate the effectiveness of the proposed system, which reaches up to 96% accuracy on a dataset with only single-positive labels.

Citations: 0

A Practical Usability Study Framework Using the SUS and the Affinity Diagram: A Case Study on the Revised Online Roadshow Website

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202137

Leow Meng-Chew, Ong Lee-Yeng, Jian-Yu Tan

An Online Roadshow is a new model that serves the same goal as a physical roadshow. The initial trial of version 1.0 of the Online Roadshow website revealed several aspects, such as the website interface and functionality, that could be improved. Version 2.0 of the website was therefore built to incorporate these enhancements. This study assessed the usability of the new website using the proposed practical usability framework, which combines the System Usability Scale (SUS) with one open-ended question whose answers are analysed using affinity diagrams. A total of 250 respondents completed the survey after using the website. The average SUS score of the Online Roadshow website increased from 58.85 to 62.6. The overall analysis suggests that enhancing version 1.0 with the help of the proposed practical usability framework improves the user experience.
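The abstract reports average SUS scores without restating how an individual score is computed. The sketch below applies the standard SUS scoring rule (Brooke, 1996): ten items answered on a 1 to 5 scale, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0 to 100 score. The function names are illustrative, and the affinity-diagram analysis of the open-ended question is qualitative, so it is not shown.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS score (0-100) for one respondent's ten 1-5 answers."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS responses must be on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses: list[list[int]]) -> float:
    """Average SUS score over all respondents."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

A respondent who answers 3 ("neutral") to every item scores exactly 50, which is a useful sanity check when comparing the reported averages of 58.85 and 62.6.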

Citations: 0

A Multi-Objective Grouping Genetic Algorithm for Server Consolidation in Cloud Data Centers

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202081

Chanipa Sonklin, Kachane Sonklin

As demand for cloud computing services has grown continuously, the number of cloud service providers and cloud applications has increased dramatically, resulting in a significant increase in data center energy consumption. Server consolidation is one of the best-known ways to improve data center efficiency. In our recent study on server consolidation, we presented a grouping genetic algorithm for a single objective: minimizing the energy consumption of a data center. To improve on that algorithm, this paper presents a multi-objective grouping genetic algorithm that minimizes both energy consumption and resource wastage. The best solution is selected by a fitness function that considers both objectives and balances the trade-off between them. The experimental results show that the multi-objective grouping genetic algorithm performs well in terms of resource wastage and the number of active physical machines in the data center, and that it scales with the number of virtual machines and physical machines.
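The abstract does not give the fitness function. The sketch below shows one way to score a grouping (each group is the set of VMs packed onto one physical machine) against the two objectives, under several assumptions that are not from the paper: homogeneous PMs with normalized CPU and memory capacity 1.0, a linear power model P = P_idle + (P_max - P_idle) * u for each active PM (idle PMs switched off), the resource-wastage formulation W = (|R_cpu - R_mem| + eps) / (U_cpu + U_mem) used in parts of the VM-placement literature, and a weighted-sum scalarization with weight `alpha`. The wattages and all names are hypothetical.

```python
from dataclasses import dataclass

EPSILON = 1e-4  # small constant in the wastage formulation (assumed)


@dataclass
class VM:
    cpu: float  # demand as a fraction of one PM's CPU capacity
    mem: float  # demand as a fraction of one PM's memory capacity


def evaluate(groups: list[list[VM]], p_idle: float = 100.0,
             p_max: float = 250.0) -> tuple[float, float]:
    """Return (total power in watts, total resource wastage) for a grouping."""
    power = 0.0
    wastage = 0.0
    for vms in groups:
        if not vms:
            continue  # empty PMs are assumed to be switched off
        u_cpu = sum(vm.cpu for vm in vms)
        u_mem = sum(vm.mem for vm in vms)
        if u_cpu > 1.0 or u_mem > 1.0:
            raise ValueError("capacity constraint violated")
        power += p_idle + (p_max - p_idle) * u_cpu
        r_cpu, r_mem = 1.0 - u_cpu, 1.0 - u_mem  # normalized remaining capacity
        wastage += (abs(r_cpu - r_mem) + EPSILON) / (u_cpu + u_mem)
    return power, wastage


def fitness(groups: list[list[VM]], alpha: float = 0.5,
            p_idle: float = 100.0, p_max: float = 250.0) -> float:
    """Weighted-sum fitness (lower is better). Both objectives are divided by
    the number of available PMs so that switching PMs off is rewarded."""
    n = len(groups)
    power, wastage = evaluate(groups, p_idle, p_max)
    return alpha * power / (n * p_max) + (1 - alpha) * wastage / n
```

In a grouping genetic algorithm this function would rank chromosomes; packing two half-size VMs onto one PM scores better than spreading them over two PMs, because the idle power of the second machine is saved.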

Citations: 0

Classifying Activities of Electrical Line Workers Based on Deep Learning Approaches Using Wrist-Worn Sensor

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202093

S. Mekruksavanich, A. Jitpattanakul

The field of human activity recognition (HAR) is a priority for cutting-edge research because of its potential to revolutionize the way we understand and improve our everyday lives. A wide variety of ordinary, everyday tasks have already been classified using HAR. Beyond such basic human actions, however, the growing demands of numerous real-world applications have attracted the interest of HAR researchers. Electrical line workers (ELWs) face a variety of challenges, including long hours, work in isolated locations, and particularly hazardous tasks. Wearable sensor-based HAR allows ELW efficiency and safety to be tracked unobtrusively. This study explores deep learning strategies for automatically classifying ELWs' complex activities from sensor data collected by a wrist-worn device. We propose ResNeXt, a deep residual neural network, and compare it with other deep learning networks in terms of how effectively they classify ELW activities. We use a publicly available benchmark dataset that includes 10 ELW tasks. The experimental results demonstrate that the proposed ResNeXt achieved the highest accuracy (98.74%) and F1-score (98.81%) of the deep learning networks studied.
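For readers unfamiliar with ResNeXt, its building block (Xie et al., 2017) sums C parallel transformations that share the same topology and adds an identity shortcut, where C is the block's "cardinality":

```latex
\mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})
```

Each $\mathcal{T}_i$ is a small bottleneck transformation. For wrist-worn sensor signals these would typically be one-dimensional convolutions over time, but that adaptation is an assumption here; the abstract does not describe how the authors configured the blocks.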

Citations: 0

The Comparison of Password Composition Policies Among US, German, and Thailand Samples

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202141

Jenjira Lomchan, R. Wiangsripanawan, S. F. Shahandashti

The study by Mayer, Kirchner, and Volkamer published at SOUPS 2017 showed that the password composition policy (PCP) strength of US and German websites was influenced not by the security features but by the usability features of the websites. Surprisingly, the banking website category had the lowest PCP strength, whereas government websites had the highest. The aim of our first study was to determine whether 78 frequently used Thai websites in 2018 would yield the same surprising results. Our findings showed the opposite: the highest PCP strength was found on banking websites, followed by university and government websites, respectively. We also added two more security features to our study: 2FA and HTTPS. Although some German websites employing 2FA allowed weaker PCPs for better usability, Thai websites with 2FA did not loosen their password requirements. Employing HTTPS also did not affect PCP strength. We reinvestigated the Thai websites in 2021, two years after the Personal Data Protection Act (PDPA) was announced. The median PCP strength across all Thai samples had grown from 26.6 in 2018 to 31.0 in 2021. Banking websites still had the highest PCP strength, while government websites showed a significant change, increasing from 29.9 to 40.4. In summary, Thai websites gave heavy weight to security features such as the size of services and the value of assets, which play no part in either the US or German PCPs. Government and university websites in Germany and the USA had much higher PCP strength than those in Thailand. The PCP strength of Thai government websites increased sharply in 2021 due to the privacy law, but it was still lower than the 2016 results for Germany and the USA. Therefore, the criteria influencing PCPs vary by country.

Citations: 0

Performance Improvement on a Learning Assessment Web Application Using AWS DynamoDB as a Cache Database

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201973

Sirawit Tantiphuwanart, Nuengwong Tuaycharoen, Dittaya Wanvarie, Naruemon Pratanwanich, A. Suchato

When the COVID-19 pandemic began, the world had to adjust to a new normal in which real-life activities moved online, and formal teaching, learning, and examination were affected as well. An online learning assessment system is highly complex because it must handle large numbers of students taking exams simultaneously, especially during midterm and final exam weeks. The system must therefore have an acceptable response time and high reliability. Our assessment system had been developed using MySQL as its database, but its response time was unsatisfactory in actual use. This paper proposes methods to improve the system's efficiency under high traffic by using Amazon Web Services (AWS) DynamoDB as a cache for MySQL, since DynamoDB offers higher read and write efficiency than MySQL. The cache serves read operations in the Viewing exam session operation. For writes, Managing exam sessions uses a write-through policy, while Saving examinees' answers uses a write-back policy. The choice between write-through and write-back should take into account the acceptable response time and whether instant consistency validation is required. Evaluation with JMeter shows that the method can double the efficiency of read operations, while write operations take less than 3 seconds, which is still within an acceptable limit. The paper also offers further suggestions for improving DynamoDB efficiency.
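The difference between the two write policies named in the abstract can be sketched with plain dictionaries standing in for the cache (DynamoDB) and the primary database (MySQL). This is a toy model under stated assumptions, not the authors' implementation: `CachedStore`, its methods, and the key names are hypothetical, and a real deployment would use the AWS SDK and a MySQL driver and would need durability safeguards for data that has not yet been flushed.

```python
class CachedStore:
    """Toy cache-in-front-of-database store illustrating two write policies.

    `cache` stands in for DynamoDB and `db` for MySQL; both are plain dicts.
    """

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.db: dict[str, str] = {}
        self.dirty: set[str] = set()  # keys written to the cache but not yet to the db

    def read(self, key: str):
        # Serve from the cache when possible; otherwise fall back to the db
        # and populate the cache for later reads.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_through(self, key: str, value: str) -> None:
        # Update cache and db together: always consistent, but every write
        # pays the db latency (e.g. managing exam sessions).
        self.cache[key] = value
        self.db[key] = value
        self.dirty.discard(key)

    def write_back(self, key: str, value: str) -> None:
        # Update only the cache now and defer the db write
        # (e.g. saving examinees' answers under heavy load).
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self) -> None:
        # Persist deferred writes, e.g. periodically or when an exam closes.
        for key in list(self.dirty):
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

Write-back keeps answer saving fast during peak traffic at the cost of a window in which the database lags behind the cache, which is the response-time versus consistency trade-off the abstract describes.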

Citations: 1

Improving Pre-Trained Models for Multi-Label Classification in Stack Overflow: A Comparison of Imbalanced Data Handling Methods

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202012

Arisa Umparat, S. Phoomvuthisarn

Tag classification is essential in Stack Overflow. Instead of combing through pages or replies of irrelevant information, users can easily and quickly pinpoint relevant posts and answers using tags. Because user-submitted posts can have multiple tags, classifying tags in Stack Overflow is challenging, and the result is an imbalance among labels across the whole label set. Pre-trained deep learning models can improve tag classification accuracy even with small datasets, and common multi-label resampling techniques used with machine learning classifiers can also mitigate the imbalance. Still, few studies have explored which resampling technique best improves the performance of pre-trained deep models for tag prediction. To address this gap, we evaluated the effectiveness of ELECTRA, a powerful pre-trained deep learning model, combined with various multi-label resampling techniques for reducing the imbalance that causes mislabeling in Stack Overflow's tagged posts. We compared seven resampling techniques, namely LP-ROS, ML-ROS, MLSMOTE, MLeNN, MLTL, ML-SOL, and REMEDIAL, to find the best method for mitigating the imbalance and improving tag prediction accuracy. Our results show that MLTL is the most effective choice for tackling label imbalance in multi-label classification of our Stack Overflow data in deep learning scenarios. MLTL achieved a Precision@1 of 0.517, a Recall@5 of 0.804, an F1-score@1 of 0.467, and an AUC of 0.98. By contrast, MLeNN scored only 0.323, 0.648, 0.277, and 0.95 on the same metrics.
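The ranking metrics reported above can be computed per post from the model's tag scores. The sketch below uses the common definitions, assumed here because the abstract does not spell them out: Precision@k is the fraction of the top-k predicted tags that are correct, Recall@k is the fraction of a post's true tags that appear in the top k, and F1@k is their harmonic mean, each averaged over posts. The paper's exact averaging scheme may differ, and the tag names in any usage are hypothetical.

```python
def precision_recall_at_k(scores: dict[str, float], true_tags: set[str],
                          k: int) -> tuple[float, float, float]:
    """Precision@k, Recall@k and F1@k for a single post."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]  # highest-scoring tags
    hits = sum(1 for tag in top_k if tag in true_tags)
    precision = hits / k
    recall = hits / len(true_tags) if true_tags else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def mean_at_k(predictions: list[tuple[dict[str, float], set[str]]],
              k: int) -> tuple[float, float, float]:
    """Average each metric over a list of (scores, true_tags) pairs."""
    results = [precision_recall_at_k(s, t, k) for s, t in predictions]
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))
```

With k = 1 only the single best tag counts, which is why Precision@1 and F1-score@1 are much lower than Recall@5 for posts that carry several tags.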

Citations: 0