2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE): Latest Papers

A Backup Mechanism of Virtual Machine Checkpoint Image using ZFS Snapshots

2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202119

Tinnaphob Angkaprasert, K. Chanchio

This paper investigates the use of the ZFS filesystem to improve the checkpointing mechanism and the disk-image backup of QEMU-KVM. In the traditional method, QEMU-KVM has to create a new QCOW2 overlay disk image file at every checkpoint operation. After many checkpoints, the number of overlay disk images can become overwhelming, and these overlay images depend on one another. The traditional method therefore incurs high maintenance costs and takes a long time to restore a VM from a checkpoint. In this paper, we introduce a Snapshot Manager that creates a checkpoint of a VM, manages ZFS snapshots, and backs up a snapshot from one host to another. By using the ZFS filesystem, the Snapshot Manager reduces the number of disk images to one. This approach makes VM checkpoints and disk images much easier to manage and incurs very low overhead. In terms of performance, our experimental results show that ZFS restores a VM in significantly less time than the traditional method, at the cost of moderately higher backup time.
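
The snapshot, restore, and host-to-host backup steps described above can be sketched with the standard `zfs` command-line tool. The sketch below only builds the command lines and does not run them. The dataset name `tank/vm1`, the snapshot label, the host `backup-host`, and the target dataset are hypothetical, and the paper's Snapshot Manager is not shown in this listing, so it may work differently.

```python
import shlex
from typing import List

def snapshot_cmd(dataset: str, label: str) -> List[str]:
    """Command to take a ZFS snapshot of the dataset that holds the VM disk image."""
    return ["zfs", "snapshot", f"{dataset}@{label}"]

def rollback_cmd(dataset: str, label: str) -> List[str]:
    """Command to roll the dataset back to a snapshot (-r destroys later snapshots)."""
    return ["zfs", "rollback", "-r", f"{dataset}@{label}"]

def backup_pipeline(dataset: str, label: str, host: str, target: str) -> str:
    """Shell pipeline that streams a snapshot to another host via zfs send/receive."""
    send = shlex.join(["zfs", "send", f"{dataset}@{label}"])
    recv = shlex.join(["ssh", host, "zfs", "receive", "-F", target])
    return f"{send} | {recv}"

# Hypothetical names, for illustration only.
print(shlex.join(snapshot_cmd("tank/vm1", "ckpt-001")))
print(backup_pipeline("tank/vm1", "ckpt-001", "backup-host", "backup/vm1"))
```

Because every checkpoint is a snapshot of the same dataset, there is a single disk image to manage rather than a chain of overlay files.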

Citations: 0

Combining AI and Non-AI for a Smooth User Experience

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201988

Thanakij Wanavit, Mario Quintana, Samuel Sallee, L. Klieb

Cranioplasty implants are commonly used in the treatment of traumatic brain injuries. 3D-printed titanium has emerged as a suitable material for creating these products. However, their design and manufacturing process involves numerous skilled professionals, including designers, printers, finishers, inspectors, and communication liaisons with surgeons. We have developed a system that automates the design process, streamlines communication, and assists all relevant parties in completing their tasks more efficiently. Our system's backend utilizes deep learning algorithms to automatically read and segment CT scans, subsequently generating implant designs. The initial draft of the design is produced within 5 minutes, a significant improvement from the 5–7 days required by a human technician. The fully serverless backend demands minimal IT maintenance and offers robust resilience and security. The frontend, developed using Swift 5, is compatible with iOS, iPadOS, and macOS platforms. The application ensures a secure and convenient data pipeline with end-to-end encryption, visually appealing rendering, and high speed.

Citations: 0

A Comparative Study of LSTM, GRU, BiLSTM and BiGRU to Predict Dissolved Oxygen

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202128

Narongsak Putpuek, Apiradee Putpuek, Apichart Sungthong

In aquaculture, dissolved oxygen (DO) levels affect fish growth and survival. Automated monitoring and prediction of DO is challenging and becomes expensive if unnecessary sensors are used. This study aims to identify the optimal water and environmental parameters for DO prediction. Data from the fishpond station of Rajabhat Rajanagarindra University were pre-processed and used to train LSTM, GRU, BiLSTM, and BiGRU models. The models were evaluated and compared using three error measures. The results showed that GRU performed best among the four models. In conclusion, the best parameters for DO prediction are water pH and water temperature.
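
The abstract does not name the three error measures. As a minimal sketch, assuming they are the common MAE, MSE, and RMSE, the comparison step could look like the following. The DO readings and model predictions are invented for illustration and are not results from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))

def rank_models(y_true, predictions):
    """Return model names sorted by RMSE, best (lowest) first."""
    return sorted(predictions, key=lambda name: rmse(y_true, predictions[name]))

# Invented DO values (mg/L) and predictions, for illustration only.
observed = [6.0, 5.5, 5.0, 4.5]
predicted = {
    "LSTM": [6.4, 5.1, 5.4, 4.1],
    "GRU": [6.1, 5.4, 5.1, 4.4],
}
for name in rank_models(observed, predicted):
    print(name, mae(observed, predicted[name]), rmse(observed, predicted[name]))
```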

Citations: 0

Sleep Behavior Classification Based on Clusters of Sleep Quality

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202108

Pawonrat Khumngoen, S. Sinthupinyo

Sleep is a significant activity that can influence livelihoods. Its critical role is recovery: physically repairing cells and restoring energy for the following day. Good sleep is associated with good physical and mental health and can be measured by sleep quality. Many prior works train models on the whole dataset. We believe, however, that each person has a different sleeping pattern. In this paper, we therefore present a classification of sleep behavior based on clusters of sleep quality. We first clustered people with similar sleep patterns using Principal Component Analysis and the K-means algorithm. Then, we used the Logistic Regression and Random Forest algorithms to classify sleep behavior. We evaluated the resulting models with leave-one-out cross-validation. The results showed that, in every group, the Random Forest models were more accurate than the Logistic Regression models, by between 2.1% and 7.6%.
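
The leave-one-out evaluation applied within each cluster can be sketched as follows. To stay self-contained, this sketch uses a nearest-neighbor rule as a stand-in classifier, not the Logistic Regression or Random Forest models used in the paper. The feature values (hours slept, night awakenings) and labels are invented.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Stand-in classifier: label of the closest training point (squared Euclidean)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

def leave_one_out_accuracy(xs, ys, predict=nearest_neighbor_predict):
    """Hold out each sample once, fit on the rest, and average the hits."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)

# One hypothetical cluster of people with similar sleep patterns.
features = [(7.5, 1), (8.0, 0), (7.8, 1), (5.0, 4), (4.5, 5), (5.2, 3)]
labels = ["good", "good", "good", "poor", "poor", "poor"]
print(leave_one_out_accuracy(features, labels))
```

Leave-one-out suits small per-cluster samples because every record is used for testing exactly once while the rest are used for fitting.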

Citations: 0

Analysis of the 5Rs in Thailand Medication Error Classification through Natural Language Processing

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202096

Peachyasitt Udomnuchaisup, Aurawan Imsombut, Picha Suwannahitatorn, Thammakorn Saethang

Medication errors threaten patient safety considerably, underscoring the necessity for enhanced detection and prevention techniques. A prevalent classification system in hospitals relies on the standard practice of medication administration known as the Five Rights (5R). This study seeks to develop an NLP-based tool designed to expand 5R error categorization coverage and alleviate the workload of medical professionals. The proposed method focuses on Thai medical text, incorporating Thai and English vocabulary. In this investigation, we developed a supervised learning classification framework using the Universal Sentence Encoder (USE) for sentence embedding, followed by an Artificial Neural Network (ANN) for model training. Additionally, we explored a zero-shot classification model employing pre-trained Large Language Models (PLMs). Our findings reveal that the supervised learning classification model provides the most favorable performance, albeit with the limitation of reliance on labeled datasets, which can be resource intensive. Conversely, the zero-shot classification framework's performance is less optimal. However, future advancements in Thai medical PLMs may improve efficacy and present a viable alternative for medical data analysis without dependence on labeled datasets. This initiative lays the groundwork for potential future applications and advantages within Thailand's medical domain.

Citations: 0

Design and Implementation of a Data Governance Framework and Platform: A Case Study of a National Research Organization of Thailand

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201972

Sapa Chanyachatchawan, Krich Nasingkun, Patipat Tumsangthong, Porntiwa Chata, M. Buranarach, Monsak Socharoentum

In the current era of extensive data usage across industries, data collection, preservation, utilization, and organization have become more challenging and nuanced because, beyond efficiency, it is necessary to consider critical concerns such as data security, privacy, and legal issues. As a result, the Thai government initiated an effort to implement data governance throughout government agencies. This paper showcases the implementation of data governance in a governmental research organization with highly diverse structured and unstructured data. The implementation follows international standards and the guidelines of the Digital Government Development Agency (DGA). The executives set up a working body, including the Data Governance Council and Data Stewards, responsible for setting up and deploying policies and regulations. Creating awareness and the necessary infrastructure were the main focuses of the first-year phase. The metadata was designed to extend DGA's version and match the organization's unique requirements. A data catalog platform was developed accordingly. We organized activities to boost employee awareness and participation, including advertising and data catalog platform training. By the end of the first year of implementation, every organizational unit had registered at least one data record in the data catalog.

Citations: 0

Cell-key Perturbation Data Privacy Procedure for Security Operations Center Team

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202111

Supornpol Nukrongsin, Chetneti Srisa-An

Data privacy laws such as the GDPR in Europe and the PDPA in Thailand protect personal data. A data center is also a data service organization that needs to publish data to its stakeholders. The challenging task for the Security Operations Center (SOC) team is to analyze all security risks, such as data breaches. Most data breaches are overlooked cases that occur indirectly, when an attacker infers personal data from other prior knowledge. For example, attackers combine our dataset with other datasets to re-identify personal data. This is called a re-identification attack, and it causes a data breach. To mitigate this risk, statistical noise control techniques for data anonymization are explored and implemented in this study. Cell-key perturbation counters the attack without modifying the original dataset: instead, it returns an answer dataset with noise added per query.
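
A simplified sketch of the per-query noise idea follows. It assumes each record carries a fixed pseudo-random record key, that the cell key is the sum of those keys modulo a fixed range, and that the noise comes from a small lookup table. The key range, the toy table, and the field names are all hypothetical. Production perturbation tables are more elaborate, and the paper's exact procedure is not given in this listing.

```python
import random

KEY_SPACE = 256  # hypothetical size of the record-key range

def assign_record_keys(records, seed=0):
    """Return copies of the records, each with a fixed pseudo-random record key."""
    rng = random.Random(seed)
    return [dict(r, record_key=rng.randrange(KEY_SPACE)) for r in records]

def noise_for(cell_key):
    """Toy perturbation table: map a cell key to a small integer noise value."""
    return (-2, -1, 0, 1, 2)[cell_key % 5]

def perturbed_count(records, predicate):
    """Answer a count query with noise derived from the cell key; data is not modified."""
    cell = [r for r in records if predicate(r)]
    cell_key = sum(r["record_key"] for r in cell) % KEY_SPACE
    return max(0, len(cell) + noise_for(cell_key))

# Invented records, for illustration only.
raw = [{"age": 34, "ward": "A"}, {"age": 61, "ward": "B"}, {"age": 47, "ward": "A"}]
keyed = assign_record_keys(raw)
print(perturbed_count(keyed, lambda r: r["ward"] == "A"))
```

Because the noise depends only on which records fall in the cell, repeating the same query returns the same perturbed answer, so the noise cannot be averaged away by asking again.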

Citations: 0

Effects of Speech Duration on Preserving the Identity of Synthesized Voice

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202157

Papapin Supmee, Klittiya Suwanmalai, Natkrita Hanchoenkul, Napa Sae-Bae, Banphatee Khomkham

This paper studies the identity-preserving performance of a speech synthesis model when the durations of Thai speech samples are varied. Two experiments were designed to investigate this property. The first experiment was set up to reflect the identity-preserving performance of the identity vector derived from the speech synthesis model. The results suggest that longer Thai speech signals yield better identity vectors, because shorter speech signals result in identity vectors that are more dispersed. The second experiment was set up to directly reflect the identity-preserving performance of the synthesized voice signal generated by the speech synthesis model in independent speaker recognition systems. The results similarly suggest that longer Thai speech signals yield better identity-preserving voice signals, because shorter speech signals result in synthesized voice signals that lie farther from the real voice signals. Therefore, the trade-off between usability and quality of synthesized voices must be carefully considered when developing applications from such models. In addition, the investigation framework used in this study could be used to evaluate newly developed identity-preserving speech synthesis models.
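
One way to quantify how dispersed a set of identity vectors is, assuming dispersion is measured as the mean cosine distance to the centroid (the abstract does not say which measure the authors used), is sketched below. The two-dimensional vectors are invented; real identity vectors have many more dimensions.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def dispersion(vectors):
    """Mean cosine distance from each vector to the centroid of the set."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return sum(cosine_distance(v, centroid) for v in vectors) / len(vectors)

# Invented identity vectors for one speaker, from long and short clips.
from_long_clips = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]
from_short_clips = [[1.0, 0.8], [1.0, 0.0], [1.0, -0.8]]
print(dispersion(from_long_clips), dispersion(from_short_clips))
```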

Citations: 0

Customer Churn Prediction Using Weight Average Ensemble Machine Learning Model

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10202105

I. Nyoman, Mahayasa Adiputra, Paweena Wanchai

Obtaining a new customer is more expensive than predicting the churn probability of an existing customer. A high-performance churn prediction model can help a company reduce the cost of obtaining new customers. Ensemble machine learning is one of the machine learning techniques that can be used in prediction problems, and many studies have shown that it achieves superior results. The main purpose of this study is to build a framework that combines several preprocessing techniques with an ensemble of two machine learning models, XGBoost and random forest. The datasets for this study come from a public dataset platform and cover two sectors: telecom and insurance. The model achieved an F1-score of 0.850 on the telecom dataset, and an F1-score of 0.947 with a processing time of 28 seconds on the insurance dataset. Compared with the latest work on the same datasets, our model achieved a better F1-score and better efficiency on dataset 1, but a slower algorithm time on dataset 2.
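
A weighted-average ensemble of two models can be sketched as below. The probabilities, the weight of 0.6, and the 0.5 decision threshold are invented for illustration. How the authors chose their weights is not stated in this listing, and the two input lists merely stand in for the outputs of trained XGBoost and random forest models.

```python
def weighted_average(probs_a, probs_b, weight_a=0.5):
    """Blend two models' churn probabilities; the second model gets 1 - weight_a."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]

def f1_score(y_true, y_pred):
    """F1 for the positive (churn = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Invented labels and per-model churn probabilities.
churned = [1, 0, 1, 1, 0]
xgb_probs = [0.9, 0.4, 0.4, 0.8, 0.2]
rf_probs = [0.7, 0.6, 0.7, 0.6, 0.1]

blended = weighted_average(xgb_probs, rf_probs, weight_a=0.6)
decisions = [1 if p >= 0.5 else 0 for p in blended]
print(decisions, f1_score(churned, decisions))
```

In this toy data each model alone misclassifies one customer at the 0.5 threshold, while the blend classifies all five correctly, which is the kind of complementarity an ensemble relies on.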

Citations: 0

Development of a novel quality control indicator for coffee roasting based on the digitalization of smell by a portable electronic nose

Pub Date : 2023-06-28 DOI: 10.1109/JCSSE58229.2023.10201962

Varanya Somaudon, Pathapee Sakuldee, T. Kerdcharoen

Northern Thailand is home to several Arabica coffee-growing regions, including Mae-kampong, Teentok, Mae-lord, and Monngo Valleys, whose coffees are featured throughout this study. These coffees have distinct aromas and flavors due to their varied cultivation locations, which are influenced by unique climatic conditions. The purpose of this paper is to comprehend the aroma of coffees brought from various locations and roasted under the same conditions. Our lab-made electronic nose (e-nose) was used to digitalize and analyze the smell of coffee. In order to monitor the coffee scent throughout roasting and assess how similar the aromas of coffee samples taken from various locations are to one another, principal component analysis and hierarchical cluster analysis were used. It was found that our e-nose system is an effective tool for determining the geo-location of the coffee origin as well as for quality control of coffee production.
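
The hierarchical cluster analysis step can be illustrated with a small agglomerative sketch. Single linkage is assumed here because the abstract does not state the linkage criterion, and the two-dimensional points are invented stand-ins for reduced e-nose sensor responses.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters (nearest members) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Invented 2-D sample coordinates, for illustration only.
samples = [(0.10, 0.20), (0.12, 0.22), (0.80, 0.90), (0.85, 0.88), (0.11, 0.19)]
print(single_linkage(samples, 2))
```

Samples that end up in the same cluster have similar aroma profiles under this distance, which is how the similarity between growing locations could be read off the grouping.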

Citations: 0
