
2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI): Latest Papers

Performance Evaluation of Text Augmentation Methods with BERT on Small-sized, Imbalanced Datasets
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00027
Lingshu Hu, Can Li, Wenbo Wang, Bin Pang, Yi Shang
Recently, deep learning methods have achieved great success in understanding and analyzing text messages. In real-world applications, however, labeled text data are often small and class-imbalanced because data collection and human annotation are costly, which limits the performance of deep learning classifiers. This study therefore explores an understudied question, namely how sample size and imbalance ratio influence the performance of deep learning models and augmentation methods, and proposes a solution. Specifically, it examines the performance of BERT, Word2Vec, and WordNet augmentation methods with BERT fine-tuning on datasets of 500, 1,000, and 2,000 documents and imbalance ratios of 4:1 and 9:1. Experimental results show that BERT augmentation improves the performance of BERT in detecting the minority class, and that the improvement is most significant (a 15.6%–40.4% F1 increase over the base model and a 2.8%–10.4% F1 increase over the model with oversampling) when the dataset is small (e.g., 500 training documents) and highly imbalanced (e.g., 9:1). As the data size increases or the imbalance ratio decreases, the improvement from BERT augmentation becomes smaller or insignificant. Moreover, BERT augmentation combined with BERT fine-tuning achieves the best performance among the models and methods compared, making it a promising solution for small, highly imbalanced text classification tasks.
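The abstract compares augmentation against an oversampling baseline and reports F1 on the minority class. The following is a minimal sketch of those two ingredients, not the authors' code: it assumes the baseline is plain random oversampling (the abstract does not say which variant was used), and the function names `random_oversample` and `f1_for_class` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class has as many examples as the largest class.
    Assumption: this is a generic baseline, not the paper's exact setup."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Oversampling should be applied to the training split only; the minority-class F1 is then computed on an untouched test split so that duplicated examples cannot inflate the score.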
Citations: 0
Evaluating the Privacy Exposure of Interpretable Global Explainers
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00012
Francesca Naretto, A. Monreale, F. Giannotti
In recent years, we have witnessed the spread of AI systems based on powerful Machine Learning models, which find application in many critical contexts such as medicine, financial markets, and credit scoring. In such contexts it is particularly important to design Trustworthy AI systems that guarantee both transparency of their decision reasoning and privacy protection. Although many works in the literature have addressed the lack of transparency and the risk of privacy exposure of Machine Learning models, the privacy risks of explainers have not been appropriately studied. This paper presents a methodology for evaluating the privacy exposure raised by interpretable global explainers that imitate the original black-box classifier. Our methodology exploits the well-known Membership Inference Attack. The experimental results highlight that global explainers based on interpretable trees lead to an increase in privacy exposure.
Citations: 1
Thoughts on Non-IID Data Impact in Healthcare with Federated Learning Medical Blockchain
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00013
Zonyin Shae, Kun-Yi Chen, Chi-Yu Chang, Yuan-Yu Tsai, C. Chou, William I. Baskett, Chi-Ren Shyu, J. J. Tsai
We share the common hypothesis that the more good-quality training data is aggregated, the better the performance of the resulting Artificial Intelligence (AI) model. However, this belief does not generally hold in the medical area, since healthcare data sets sourced from different hospitals are often not independent and identically distributed (Non-IID). This imposes severe technical challenges for effectively aggregating individual hospital data sets. In this vision paper, instead of offering complete solutions, we discuss some questions and food for thought with the goal of aiding effective data aggregation and improving federated learning (FL) AI model performance: (1) how to benchmark and measure the Non-IID degree of medical data sets; (2) how to include Non-IID degree metrics in the FL data aggregation mechanism; (3) how to search for the optimal global model creation strategy among a group of many medical data sets; and (4) whether FL can perform better than centralized learning. We discuss these questions by outlining a visionary approach for exploring a medical blockchain FL mechanism that effectively aggregates medical data across multiple healthcare systems to serve large populations with broad demographics.
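Question (1) asks how to measure the Non-IID degree of data sets. The abstract does not name a metric, so the sketch below is only one possible illustration: it assumes the Jensen-Shannon divergence between the label distributions of two sites as a proxy (0 means identical label mix, 1 means disjoint). The helper names are hypothetical, and label skew is only one of several kinds of Non-IID behaviour.

```python
import math
from collections import Counter


def label_distribution(labels, classes):
    """Relative frequency of each class in `classes` within `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as equal-length lists of probabilities."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For example, a hospital whose records are 90% negative and one whose records are 50% negative would be compared with `js_divergence(label_distribution(a, [0, 1]), label_distribution(b, [0, 1]))`; a score of this kind could then feed into the aggregation weights mentioned in question (2).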
Citations: 1
A Multilingual Virtual Guide for Self-Attachment Technique
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00025
Alicia Jiayun Law, Ruoyu Hu, Lisa Alazraki, A. Gopalan, Neophytos Polydorou, A. Edalat
In this work, we propose a computational framework that leverages existing out-of-language data to create a conversational agent for the delivery of Self-Attachment Technique (SAT) in Mandarin. Our framework does not require large-scale human translation, yet it achieves comparable performance while maintaining safety and reliability. We propose two different methods of augmenting the available response data through empathetic rewriting. We evaluate our chatbot against a previous, English-only SAT chatbot through non-clinical human trials (N = 42), each lasting five days, and show quantitatively that it attains a level of performance comparable to the English SAT chatbot. We provide a qualitative analysis of the limitations of our study, along with suggestions to guide future improvements.
Citations: 1
PSLotto: A Privacy-Enhanced COVID Lottery System
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00019
Stacey Truex, Giorgi Alavidze
In March 2020, the World Health Organization (WHO) declared the novel coronavirus (COVID-19) a global pandemic. Globally, the rapid spread of COVID-19 ground economies to a halt with stay-at-home orders and took the lives of millions of people. Therefore, when vaccines became available as a tool to slow the spread of the virus, governments worldwide looked to incentivize their populations to get vaccinated. As part of this effort, the government of Georgia created a lottery initiative to monetarily reward citizens who were vaccinated and to encourage participation from those who were hesitant to get vaccinated. The Georgian lottery system that developed out of this initiative included a website displaying lottery winner data, leading to serious privacy leakage. In this paper, we develop an attack framework that allows adversaries with minimal background knowledge to re-identify STOPCOV Lottery winners, and we deploy it against a subpopulation vulnerable to attack. We then propose our privacy-enhanced alternative, PSLotto, which simultaneously preserves the functionalities of the existing STOPCOV Lottery system and protects the privacy of lottery winners.
Citations: 0
A Novel Approach for Unsupervised Learning of Highly-Imbalanced Data
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00018
Robert K. L. Kennedy, Zahra Salekshahrezaee, T. Khoshgoftaar
Typical fraud datasets lack consistent and accurate labels and, as such, are typically highly imbalanced with non-fraud examples greatly outnumbering the fraudulent ones. This presents significant challenges to machine learning researchers and practitioners. Due to these challenges, an effective approach in identifying fraudulent data points needs to handle highly-imbalanced datasets and be robust to class labeling. This paper introduces a novel unsupervised procedure for learning from imbalanced datasets without class labels by iteratively cleaning the training dataset. Our methodology uses an autoencoder as an underlying learner. We describe its fraud detection performance and compare it to a baseline unsupervised fraud detection learner. Our results show that our procedure significantly outperforms the baseline, in both AUC and TPR, when testing on a publicly available highly-imbalanced credit card fraud detection dataset.
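The abstract describes learning without labels by iteratively cleaning the training set, but gives no details of the loop. The sketch below is one plausible reading under stated assumptions, not the authors' procedure: each round scores the kept rows, drops a fixed fraction with the highest anomaly scores, and refits. The drop fraction, round count, and function names are hypothetical, and `centroid_scores` is a deliberately simple stand-in for the autoencoder's reconstruction error.

```python
from statistics import mean


def centroid_scores(train, points):
    """Stand-in anomaly scorer: squared distance of each point to the
    mean of `train`. This is NOT an autoencoder; it only plays the role
    that reconstruction error would play in the real method."""
    dims = len(train[0])
    center = [mean(row[d] for row in train) for d in range(dims)]
    return [sum((row[d] - center[d]) ** 2 for d in range(dims))
            for row in points]


def iterative_clean(data, score_fn=centroid_scores,
                    drop_fraction=0.1, rounds=3):
    """Repeatedly refit the scorer on the kept rows and discard the
    highest-scoring `drop_fraction` of them (assumed stopping rule:
    a fixed number of rounds). Returned rows are ordered by score."""
    kept = list(data)
    for _ in range(rounds):
        scores = score_fn(kept, kept)
        n_drop = int(len(kept) * drop_fraction)
        if n_drop == 0:
            break
        order = sorted(range(len(kept)), key=lambda i: scores[i])
        kept = [kept[i] for i in order[: len(kept) - n_drop]]
    return kept
```

The intent of such a loop is that the learner is eventually fitted mostly on non-fraud examples, so fraudulent points stand out more clearly at test time; swapping `score_fn` for an autoencoder-based scorer would bring the sketch closer to what the abstract describes.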
Citations: 2
An approach to dealing with incremental concept drift in personalized learning systems
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00029
Bander Allogmany, D. Josyula
In recent years, personalized learning systems have garnered significant academic research attention in the field of education. In a personalized learning system, learners receive a learning experience tailored to their unique needs, goals, and abilities, so students can achieve their objectives faster than with traditional methods of learning. Rapid advancements in artificial intelligence technologies enable tracking and influencing each student's learning process, and machine learning algorithms help determine students' learning styles, abilities, and progress throughout that process. One of the major challenges to effective personalization is the difficulty machine learning models have in adapting to non-stationary data streams. Machine learning models for personalized learning systems are susceptible to concept drift, in which a model's performance deteriorates over time due to changes in the data distribution. For successful personalization, it is critical that the underlying predictive and classification models adapt to such changes. In this paper, we propose an approach to address concept drift in personalized learning systems and evaluate it on the OULAD dataset infused with concept drift. The proposed method comprises training that uses automatically extracted sequential features.
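Concept drift, as described above, shows up as a model's accuracy degrading over time. The sketch below illustrates only that symptom with a generic sliding-window check; it is not the paper's method (which the abstract says relies on automatically extracted sequential features). The class name `WindowDriftMonitor`, the window size, and the tolerance are all assumptions chosen for illustration.

```python
from collections import deque


class WindowDriftMonitor:
    """Flags drift when accuracy over the most recent `window`
    predictions falls more than `tolerance` below the accuracy over
    the first `window` predictions (the reference period)."""

    def __init__(self, window=50, tolerance=0.15):
        self.window = window
        self.tolerance = tolerance
        self.reference = []                 # outcomes of the first `window` predictions
        self.recent = deque(maxlen=window)  # sliding window of later outcomes

    def update(self, correct):
        """Record whether one prediction was correct; return True if
        drift is flagged after this observation."""
        if len(self.reference) < self.window:
            self.reference.append(bool(correct))
            return False
        self.recent.append(bool(correct))
        if len(self.recent) < self.window:
            return False
        ref_acc = sum(self.reference) / self.window
        recent_acc = sum(self.recent) / self.window
        return ref_acc - recent_acc > self.tolerance
```

In a streaming setting, `update` would be called once per graded prediction, and a `True` result would trigger whatever adaptation step the system uses, such as retraining on recent data. Incremental drift changes accuracy gradually, so a smaller tolerance or a longer window may be needed to catch it.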
Citations: 0
Development of New Aerial Image Datasets and Deep Learning Methods for Waterfowl Detection and Classification
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00026
Yang Zhang, Shiqi Wang, Zhenduo Zhai, Y. Shang, Reid Viegut, Elisabeth Webb, A. Raedeke, J. Sartwell
Monitoring waterfowl populations and distribution is important for conservation. This paper presents our recent work on creating new aerial image datasets collected by drones and applying and evaluating state-of-the-art deep learning models for waterfowl detection and classification. We collected thousands of aerial images from 10 conservation areas in Missouri, labeled around 600 images with close to 300,000 bird labels, and created 9 datasets with different properties for training and evaluating deep neural network models. Among the models, YOLOv5 performed the best, outperforming Faster R-CNN and RetinaNet. To reduce the amount of labeled data needed for model training, we applied Soft Teacher, a semi-supervised learning method, and obtained slightly better detection performance than supervised learning methods, with just half of the labeled training examples. We trained generic detection models using all datasets containing diverse images and obtained accurate detection results in most cases. For waterfowl classification, we created a dataset of images containing individual waterfowl by cropping them from raw aerial images. We applied several deep learning models to the dataset and obtained promising results.
Citations: 0
Deep Learning Methods for Tree Detection and Classification
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00030
Yang Zhang, Yizhen Wang, Zhicheng Tang, Zhenduo Zhai, Y. Shang, Reid Viegut
This paper presents the results of our deep learning methods for tree detection and classification on aerial images in the Plant Recognition University Challenge sponsored by Ameren in 2021–2022. The task was to locate the trees in an aerial image and predict their family, genus, and species. For tree detection, we applied various supervised learning methods with labeled training data, as well as semi-supervised learning methods with the addition of unlabeled data. Our experimental results show that the semi-supervised learning method outperformed the supervised learning methods, improving the F1 score by an average of three percent on the set of images used in the final Plant Challenge competition. For tree classification, we applied various machine learning methods and deep learning image classification models to predict family, genus, and species on the image regions that the detection models identified as trees. By considering the relationships between family, genus, and species, we developed a multi-head ResNet18-based neural network that increased mean accuracy by two percent over the baseline ResNet18. Finally, our team ranked first among all teams in the Plant Challenge competition.
Citations: 0
Adversarial Promotion for Video based Recommender Systems
Pub Date : 2022-12-01 DOI: 10.1109/CogMI56440.2022.00028
DeMarcus Edwards, D. Rawat, Brian M. Sadler
Short-form video content is fast to consume, easy to digest, and, for most creators, inexpensive to make. Content creators on video content platforms have a vested interest in having their videos appear as high as possible in the recommendations shown to users. This paper demonstrates how content creators can manipulate video content to adversarially promote their ranking in a recommendation model that uses action classification labels as an input feature. We focus on action classification as the source of context about these videos, extracting that context in order to rank the videos and adversarially promote them. Our attack successfully boosted the predicted like probability in 78 percent of generated lists for our model trained with non-perturbed inputs. However, after adversarial training, the attack was 20 percent less effective in boosting the rank of targeted videos against our model trained with perturbed inputs.
Citations: 1