
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA): Latest Publications

Evaluation of GAN Architectures For Visualisation of HPV Viruses From Microscopic Images
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00137
Xiaohong W. Gao, X. Wen, Dong Li, Weiping Liu, Jichun Xiong, Bin Xu, Juan Liu, Heng Zhang, Xuefeng Liu
Human papillomavirus (HPV) remains a leading cause of virus-induced cancers and typically measures 52 to 55 nm in diameter. Conventional light microscopy, which usually offers a resolution of ~100 nm per pixel, therefore falls short of detecting it. This study explores four state-of-the-art generative adversarial networks (GANs) for visualising HPV. Evaluation counts the HPV clusters that are correctly identified, and also uses drug-treated cultured cells, which contain no HPV. The average of sensitivity and specificity is 78.81%, 76.37%, 76.62% and 84.71% for CycleGAN, Pix2pix, ESRGAN and Pix2pixHD, respectively. ESRGAN is trained on matched pairs of low- and high-resolution (×4) images. The other three networks translate the original raw images into coloured maps that have undergone Gaussian filtering, so that HPV clusters can be discerned visually. Pix2pixHD appears to perform best.
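The GAN evaluation reports one figure per network, combining sensitivity and specificity. The following is a minimal sketch that assumes this figure is the arithmetic mean of the two. It also assumes counts of correctly and incorrectly classified clusters and virus-free samples are available. The function name and the counts are hypothetical and are not taken from the paper.

```python
def mean_sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return the mean of sensitivity and specificity as a fraction in [0, 1].

    tp / fn: HPV clusters correctly detected / missed.
    tn / fp: virus-free (drug-treated) samples correctly rejected / falsely flagged.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2

# Hypothetical counts, for illustration only.
print(f"{mean_sensitivity_specificity(tp=80, fn=20, tn=70, fp=30):.2%}")
```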
Pages: 829-833
Citations: 3
Anomaly Detection of actual IoT traffic flows through Deep Learning
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00275
Lerina Aversano, M. Bernardi, Marta Cimitile, R. Pecori
The detection and classification of Internet traffic have been studied in depth over the last twenty years, but they remain an open research issue for the Internet of Things (IoT), mainly because real IoT traffic datasets are scarce. In this paper, we publicly release an integrated dataset of actual IoT network flows, built from six different network sources, which can serve as a reference for further research. We also use it to optimize the hyperparameters of a deep neural network and to evaluate the network's performance both in distinguishing normal from abnormal traffic and in discriminating between different types of attacks, achieving very good results.
Pages: 1736-1741
Citations: 0
MetaPrep: Data preparation pipelines recommendation via meta-learning
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00194
F. Zagatti, L. C. Silva, Lucas Nildaimon Dos Santos Silva, B. S. Sette, Helena de Medeiros Caseli, D. Lucrédio, D. F. Silva
Data preparation is a mandatory phase of the machine learning pipeline. Its goal is to convert noisy and disordered data into refined data that algorithms can use. However, data preparation is time-consuming and requires specialized knowledge of both the data and the algorithms. Automating it is therefore essential to reduce the effort data scientists spend developing satisfactory models. Despite its relevance, current AutoML platforms either disregard data preparation or rely on simple hard-coded pipelines. To fill this gap, we present a meta-learning-based recommendation system for data preparation. Our system recommends five pipelines ranked by relevance, which makes it useful for users with varying degrees of experience. Using the top-1 pipeline, we demonstrate that our proposal improves the performance of an AutoML system. Furthermore, the accuracy of our method is comparable to that of a reinforcement-learning-based algorithm with the same goal, while being up to two orders of magnitude faster. Finally, we tested our method in a real-world application and evaluated its benefits and limitations in this scenario.
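A common way to implement meta-learning-based recommendation is nearest-neighbour search over dataset meta-features. Each past dataset is described by a few statistics. The system then finds the past datasets most similar to a new one and ranks the pipelines that scored best on them. The sketch below assumes that approach. The meta-features, pipeline names, scores and the `recommend` function are all hypothetical; this is not MetaPrep's actual implementation.

```python
import math
from collections import defaultdict

# Hypothetical knowledge base: meta-features of past datasets
# (rows, columns, fraction of missing values) and the score each
# preparation pipeline achieved on them.
KNOWLEDGE_BASE = [
    ((1000, 10, 0.00), {"scale": 0.81, "impute+scale": 0.80, "onehot": 0.75}),
    ((5000, 50, 0.20), {"scale": 0.60, "impute+scale": 0.78, "onehot": 0.70}),
    ((800, 12, 0.05), {"scale": 0.79, "impute+scale": 0.82, "onehot": 0.74}),
]

def distance(a, b):
    # Euclidean distance on log-scaled sizes plus the raw missing-value ratio.
    return math.sqrt(
        (math.log(a[0]) - math.log(b[0])) ** 2
        + (math.log(a[1]) - math.log(b[1])) ** 2
        + (a[2] - b[2]) ** 2
    )

def recommend(meta, k=2, top_n=5):
    """Rank pipelines by their mean score on the k most similar past datasets."""
    neighbours = sorted(KNOWLEDGE_BASE, key=lambda item: distance(meta, item[0]))[:k]
    totals = defaultdict(float)
    for _, scores in neighbours:
        for pipeline, score in scores.items():
            totals[pipeline] += score / k
    return sorted(totals, key=totals.get, reverse=True)[:top_n]

print(recommend((900, 11, 0.03)))
```

Returning a ranked list, rather than a single choice, mirrors the abstract's idea of offering several candidates to users with different levels of experience.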
Pages: 1197-1202
Citations: 0
Elastic distributed training with fast convergence and efficient resource utilization
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00160
Guojing Cong
Distributed learning is now routinely conducted in the cloud as well as on dedicated clusters. Training with elastic resources brings new challenges and design choices. Prior studies focus on runtime performance and assume static algorithmic behavior. In this work, by analyzing the impact of resource scaling on convergence, we introduce schedules for synchronous stochastic gradient descent that proactively adapt the number of learners to reduce training time and improve convergence. Unlike prior work, our approach does not assume a constant number of processors throughout training. In our experiments, distributed stochastic gradient descent with dynamic schedules and reduction momentum achieves better convergence and significant speedups over prior static schedules. Many distributed training jobs running in the cloud may benefit from our approach.
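A simple step schedule can illustrate the idea of changing the number of synchronous SGD learners during training. The sketch below is built on illustrative assumptions, not on the schedules proposed in the paper:

- Training starts with a few learners.
- The learner count doubles at fixed epoch milestones.
- The learning rate scales linearly with the global batch size, a widely used heuristic.

The milestones, the scaling rule and all names are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    epoch: int          # epoch index
    learners: int       # number of synchronous SGD workers
    global_batch: int   # per-learner batch size times number of learners
    lr: float           # learning rate after linear scaling

def elastic_schedule(epochs, base_learners=2, max_learners=16,
                     milestones=(10, 20, 30), per_learner_batch=32, base_lr=0.1):
    """Double the learner count at each milestone, capped at max_learners."""
    phases = []
    learners = base_learners
    for epoch in range(epochs):
        if epoch in milestones:
            learners = min(learners * 2, max_learners)
        global_batch = learners * per_learner_batch
        # Linear scaling rule: the learning rate grows with the global batch.
        lr = base_lr * learners / base_learners
        phases.append(Phase(epoch, learners, global_batch, lr))
    return phases

for p in elastic_schedule(40)[::10]:
    print(p)
```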
Pages: 972-979
Citations: 0
Text Mining Approach To Predict Non-Adherence
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00236
Yufan Wang, Mahsa Mohaghegh
Companies operating patient support programs for chronic diseases have sought to enhance treatment adherence by using data from the programs' various interventions. This paper examines whether the textual patient notes recorded by program coordinators can help predict non-adherence and provide useful insights. We process and analyze over 20,000 patient notes from 1,313 psoriasis patients using statistical analysis and several NLP methods, such as term representation, sentiment analysis and topic modelling. To build predictive models, Support Vector Machine (SVM), Random Forest (RF) and Logistic Regression (LR) classifiers are tested with different feature subsets. The best-performing model is the SVM, with 93% accuracy and 91% recall on the non-adherent class. We also present patterns that differentiate non-adherent from adherent patients in terms of the completion efficiency of call objectives and the problem of patients being uncontactable. Accordingly, interventions can be targeted at high-risk patients.
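The two reported metrics measure different things:

- Accuracy counts all correct predictions.
- Recall on the non-adherent class is the share of truly non-adherent patients that the model flags.

A minimal sketch with made-up labels, not the study's data:

```python
def accuracy(y_true, y_pred):
    # Fraction of all predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    # Among samples whose true label is `positive`, the share predicted as such.
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)

# Hypothetical labels: "N" = non-adherent, "A" = adherent.
y_true = ["N", "N", "N", "N", "A", "A", "A", "A", "A", "A"]
y_pred = ["N", "N", "N", "A", "A", "A", "A", "A", "A", "N"]
print(accuracy(y_true, y_pred), recall(y_true, y_pred, "N"))
```

Recall on the minority class matters here because a model that misses non-adherent patients cannot target them for intervention, even when its overall accuracy is high.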
Pages: 1468-1471
Citations: 0
A proposal to identify stakeholders from news for the institutional relationship management activities of an institution based on Named Entity Recognition using BERT
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00251
Eric Hans Messias Da Silva, J. Laterza, Marcos Paulo Pereira Da Silva, M. Ladeira
For an organization's institutional relationship activities, it is strategically important to have an efficient process for identifying and characterizing stakeholders from the available information. Given the growing volume of available data, this process is commonly supported by information technology solutions, with high potential for data mining techniques such as textual analysis and natural language processing (NLP). In this work, we analyze the feasibility of a Named Entity Recognition (NER) mechanism based on Bidirectional Encoder Representations from Transformers (BERT) with a Conditional Random Field (CRF), which could in future replace the current rule-based stakeholder identification. We applied the proposed solution to a news dataset to evaluate its performance. The experimental results show that pre-trained Portuguese models outperform multilingual ones by a margin of at least 3.43 percentage points on the test dataset. We also added a post-processing prediction-masking step that corrects invalid tagging-scheme transitions, improving the micro F1 score on both datasets by 0.38 to 1.29 percentage points. Thus, we achieve the objective of improving stakeholder detection with an NER model that far surpasses the naive rule-based approach of the current application, which relies on exact text matching against a manually built dictionary of stakeholders.
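The prediction-masking step in the NER entry targets tag sequences that violate the tagging scheme. The following is a minimal sketch of one way to do this, under two assumptions: the scheme is BIO (the abstract does not name it), and decoding is greedy, with any tag that would form an invalid transition from the previous tag masked to negative infinity. The paper's masking may be applied differently, for example inside CRF decoding. The tag set, scores and function names are illustrative.

```python
import math

TAGS = ["O", "B-ORG", "I-ORG", "B-PER", "I-PER"]

def allowed(prev, tag):
    # In BIO, I-X may only follow B-X or I-X; every other transition is valid.
    if tag.startswith("I-"):
        return prev in (f"B-{tag[2:]}", tag)
    return True

def masked_decode(scores):
    """Greedy decoding in which invalid BIO transitions get a score of -inf.

    scores: one list of per-tag scores per token, aligned with TAGS.
    """
    prev, out = "O", []
    for row in scores:
        masked = [s if allowed(prev, t) else -math.inf for t, s in zip(TAGS, row)]
        prev = TAGS[max(range(len(TAGS)), key=masked.__getitem__)]
        out.append(prev)
    return out

# Hypothetical scores: for token 1 the raw argmax is I-ORG, invalid after O.
scores = [
    [2.0, 0.1, 0.0, 0.0, 0.0],
    [0.5, 1.0, 3.0, 0.0, 0.0],
    [0.2, 0.0, 2.5, 0.0, 0.0],
]
print(masked_decode(scores))
```

Without the mask, a plain argmax would emit `I-ORG` directly after `O`. With it, the entity correctly starts with `B-ORG`.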
Pages: 1569-1575
Citations: 1
Hashtags: an essential aspect of topic modeling of city events through social media.
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00255
Mikhail V. Kovalchuk, D. Nasonov
Today, cities are full of digital information that can be extremely useful in various applications. Instagram, Facebook, VKontakte, and other popular social networks contain vast amounts of valuable data. This information reflects people's individual stories as well as the background of the city: its events and current activities in different areas and places of attraction. City events have essential attributes such as the time of occurrence, geographical coverage, audience, and, often, expressed interests or topics. Knowing the subject of events makes it possible to solve a whole range of tasks, from individual recommendation systems for leisure activities for citizens and tourists to services in the fields of food (food trucks) and transport (taxis). Determining the topic (subject) of events requires solving two crucial tasks: identifying the events themselves among a variety of city posts, and developing an approach based on modern natural language processing methods for identifying event topics. To detect the events, we propose an improved version of an algorithm we developed previously, which integrates a time-window and area-coverage strategy. However, the main focus of the work is on analyzing different approaches to identifying topics, given the heterogeneity of posts in semantic meaning, size, and structure. In particular, this paper emphasizes the importance of using post hashtags, in various forms, to build more accurate models. In addition, we analyze features for different language groups.
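Before topic modelling, hashtags can be separated from the post body, so they can serve either as extra tokens or as a separate feature set. The following is a minimal sketch under two assumptions: posts are plain strings, and a simple regular expression is sufficient. In Python 3, `\w` in a `str` pattern matches Unicode letters, so Cyrillic hashtags from VKontakte-style posts are captured too. The function name and sample posts are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # \w is Unicode-aware for str patterns in Python 3

def split_post(text):
    """Return (body_without_hashtags, lowercase_hashtags)."""
    tags = [t.lower() for t in HASHTAG.findall(text)]
    body = HASHTAG.sub("", text)
    body = " ".join(body.split())  # collapse leftover whitespace
    return body, tags

posts = [
    "Great concert tonight! #Jazz #SaintPetersburg",
    "Open-air jazz by the river #jazz #summer",
    "Новая выставка в центре #выставка",
]
counts = Counter(tag for post in posts for tag in split_post(post)[1])
print(split_post(posts[0]))
print(counts.most_common(1))
```

Lower-casing merges `#Jazz` and `#jazz` into one feature, which is usually what a topic model needs.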
Pages: 1594-1599
Citations: 1
Influence of Training Data on the Invertability of Neural Networks for Handwritten Digit Recognition
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00122
Antonia Adler, Michaela Geierhos, Eleanor Hobley
Model inversion attacks aim to extract details of the training data from a trained model, potentially revealing sensitive information about a person's identity. To comply with personal privacy protection requirements, it is important to understand the mechanisms that increase the privacy of training data. In this work, we systematically investigate how the training data affect a model's susceptibility to model inversion attacks, for models trained on handwritten digit recognition with the openly available MNIST dataset. Using an optimization-based inversion approach, we study the impact of the quantity and diversity of the training data, and of the number and selection of classes, on the models' susceptibility to inversion. Our model inversion attack strategy was less successful against models trained on more data and on more diverse data. Moreover, atypical training records provided additional protection against model inversion. We found that not every class was equally susceptible to model inversion attacks, and that the inversion results for one class changed when models were trained on a different selection of classes. However, we did not detect a clear relationship between the number of classes and a model's susceptibility to inversion. Our study shows that the inversion susceptibility of a model depends on the training data: not only the data used to train the class that is inverted, but also the data used to train the other classes.
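Optimization-based inversion treats the input as the variable to optimize. It starts from a blank image and follows the gradient of the target class's log-probability, usually with a penalty that keeps the input small. The following is a minimal pure-Python sketch on a toy linear softmax classifier. The weights, the 4-pixel "images", the step size and the penalty are all illustrative assumptions; the paper attacks neural networks trained on MNIST.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def invert(weights, target, steps=200, lr=0.5, l2=0.01):
    """Gradient ascent on log p(target | x) - l2 * ||x||^2, with x clipped to [0, 1]."""
    n = len(weights[0])
    x = [0.0] * n  # start from a blank image
    for _ in range(steps):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        p = softmax(logits)
        for j in range(n):
            # d/dx_j log p_target = W[target][j] - sum_k p_k * W[k][j]
            grad = weights[target][j] - sum(p[k] * weights[k][j] for k in range(len(weights)))
            grad -= 2 * l2 * x[j]  # gradient of the L2 penalty
            x[j] = min(1.0, max(0.0, x[j] + lr * grad))
    return x

# Toy 2-class model over 4 "pixels": class 0 relies on pixels 0-1, class 1 on pixels 2-3.
W = [[2.0, 1.5, -1.0, -1.0],
     [-1.0, -1.0, 2.0, 1.5]]
print([round(v, 2) for v in invert(W, target=1)])
```

The reconstruction lights up exactly the pixels the model associates with the target class. This is the kind of leakage the study measures on real digits.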
Pages: 730-737
Citations: 0
An Augmented Image Captioning Model: Incorporating Hierarchical Image Information
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00257
Nathan Funckes, Erin Carrier, Greg Wolffe
Despite published accessibility standards, many websites remain non-compliant and contain images that lack accompanying textual descriptions. This leaves visually impaired individuals unable to fully enjoy the rich wonders of the web. To help address this inequity, our research seeks to improve the ability of autonomous systems to generate accurate, relevant image descriptions. Our model improves training efficacy by incorporating category labels, i.e. high-level object superclasses that can be derived using modern object-detection models. We show that this simple augmentation of an existing architecture yields a statistically significant improvement in caption quality.
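One simple way to derive the category labels described in the captioning entry works in two steps:

1. Map each object detected in an image to its superclass.
2. Encode the set of superclasses as a multi-hot vector that the captioning model could receive alongside the image features.

The mapping below is a small hand-written excerpt in the style of COCO supercategories. The function is hypothetical, and the paper's exact encoding is not stated in the abstract.

```python
# Excerpt of a detector-label -> superclass mapping (COCO-style supercategories).
SUPERCLASS = {
    "person": "person",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bicycle": "vehicle",
    "pizza": "food",
}
SUPERCLASSES = sorted(set(SUPERCLASS.values()))  # fixed order for the vector

def superclass_vector(detected_labels):
    """Multi-hot vector marking which superclasses appear among detected objects.

    Labels missing from the mapping are ignored.
    """
    present = {SUPERCLASS[label] for label in detected_labels if label in SUPERCLASS}
    return [1 if s in present else 0 for s in SUPERCLASSES]

print(SUPERCLASSES)
print(superclass_vector(["dog", "person", "bicycle", "frisbee"]))
```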
Pages: 1608-1614
Citations: 1
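The captioning abstract above says category labels (object superclasses) derived from an object detector augment an existing architecture. It does not say how they are encoded or where they enter the model.

The following Python sketch assumes one plausible encoding:

* Detector labels are mapped to superclasses through a lookup table.
* The result is turned into a multi-hot vector.
* That vector is appended to the image-feature vector before it reaches the caption decoder.

The table `SUPERCLASS` and the functions `superclass_vector` and `augment_features` are hypothetical illustrations.

```python
from typing import Dict, List, Sequence

# Hypothetical detector-label -> superclass table (COCO-style names, illustrative only).
SUPERCLASS: Dict[str, str] = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "vehicle", "bus": "vehicle", "bicycle": "vehicle",
    "pizza": "food", "banana": "food",
}
# Fixed superclass ordering so every vector has the same layout.
SUPERCLASS_ORDER: List[str] = sorted(set(SUPERCLASS.values()))  # ['animal', 'food', 'vehicle']


def superclass_vector(detected_labels: Sequence[str]) -> List[float]:
    """Multi-hot vector: 1.0 for each superclass with at least one detected member."""
    present = {SUPERCLASS[label] for label in detected_labels if label in SUPERCLASS}
    return [1.0 if name in present else 0.0 for name in SUPERCLASS_ORDER]


def augment_features(image_features: Sequence[float],
                     detected_labels: Sequence[str]) -> List[float]:
    """Append the superclass vector to an image-feature vector."""
    return list(image_features) + superclass_vector(detected_labels)


if __name__ == "__main__":
    # Labels missing from the table (e.g. "umbrella") are ignored.
    print(superclass_vector(["dog", "bus", "dog", "umbrella"]))  # [1.0, 0.0, 1.0]
    print(augment_features([0.2, 0.7], ["pizza"]))  # [0.2, 0.7, 0.0, 1.0, 0.0]
```

A multi-hot vector keeps the augmentation cheap and independent of how many objects of a superclass appear. Count-based or learned-embedding encodings would be equally reasonable alternatives.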
Auto-encoder LSTM for Li-ion SOH prediction: a comparative study on various benchmark datasets
Pub Date : 2021-12-01 DOI: 10.1109/ICMLA52953.2021.00246
Paul Audin, I. Jorge, T. Mesbahi, Ahmed Samet, F. D. Beuvron, R. Boné
Lithium-ion batteries are used in most battery-powered devices. Today’s research on lithium-ion batteries mainly focuses on better energy management strategies and predictive maintenance. In this paper, a new approach that applies auto-encoders and long short-term memory (LSTM) neural networks to usage data (voltage, current, temperature) is used to predict the State of Health (SOH). Encouraging results are obtained in tests on various battery ageing datasets published by Sandia National Laboratories, the Massachusetts Institute of Technology and NASA’s Prognostics Center of Excellence.
Pages: 1529-1536
Citations: 4
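The SOH abstract above does not define SOH or describe how usage data is fed to the auto-encoder LSTM. The following Python sketch covers only the data-preparation side. It does not implement the auto-encoder or the LSTM.

It rests on these assumptions:

* SOH uses one common definition: measured capacity divided by rated capacity, in percent.
* Inputs are fixed-length, possibly overlapping windows of (voltage, current, temperature) samples.

The window length, stride, the 2.0 Ah rated capacity, the negative-current-for-discharge convention, and the names `soh_percent` and `sliding_windows` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One time step of usage data: (voltage [V], current [A], temperature [degC]).
Sample = Tuple[float, float, float]


def soh_percent(capacity_ah: float, nominal_capacity_ah: float) -> float:
    """State of Health as measured capacity / rated capacity, in percent."""
    if nominal_capacity_ah <= 0:
        raise ValueError("nominal capacity must be positive")
    return 100.0 * capacity_ah / nominal_capacity_ah


def sliding_windows(samples: Sequence[Sample], window: int,
                    stride: int = 1) -> List[List[Sample]]:
    """Cut a (V, I, T) series into fixed-length, possibly overlapping windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [list(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, stride)]


if __name__ == "__main__":
    # Hypothetical discharge trace (negative current = discharge, by assumption).
    series = [(4.2, -2.0, 24.0), (4.1, -2.0, 24.5), (4.0, -2.0, 25.1),
              (3.9, -2.0, 25.6), (3.8, -2.0, 26.0)]
    print(len(sliding_windows(series, window=3)))            # 3
    print(len(sliding_windows(series, window=3, stride=2)))  # 2
    print(soh_percent(1.5, 2.0))                             # 75.0
```

In an auto-encoder LSTM pipeline, each window would typically be normalized and compressed by the encoder before a sequence model regresses the SOH label. How the paper combines the two components is not stated in the abstract.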