
Machine Learning and Knowledge Extraction: Latest Articles

Can Principal Component Analysis Be Used to Explore the Relationship of Rowing Kinematics and Force Production in Elite Rowers during a Step Test? A Pilot Study
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-02-17 DOI: 10.3390/make5010015
M. Jensen, T. Stellingwerff, C. Pollock, J. Wakeling, M. Klimstra
Investigating how the movement patterns of multiple limb segments during the rowing stroke relate to the resulting force production in elite rowers can provide foundational insight into optimal technique. It can also highlight potential mechanisms of injury and of performance improvement. The purpose of this study was to conduct a kinematic analysis of the rowing stroke, together with force production, during a step test in elite national-team heavyweight men, in order to evaluate the fundamental patterns that contribute to expert performance. Twelve elite heavyweight male rowers performed a step test on a row-perfect sliding ergometer [5 × 1 min with 1 min rest at set stroke rates (20, 24, 28, 32, 36)]. Joint angle displacement and velocity of the hip, knee and elbow were measured with electrogoniometers, and force was measured with a tension/compression force transducer in line with the handle. To explore interactions between kinematic patterns and stroke performance variables, joint angular velocities of the hip, knee and elbow were entered into a principal component analysis (PCA), and a separate ANCOVA was run for each performance variable (peak force, impulse, split time) as the dependent variable, with the kinematic loading scores (Kpc,ls) as covariates and athlete/stroke rate as fixed factors. The results suggested that rowers' kinematic patterns respond differently across stroke rates. The first seven PCs accounted for 79.5% (PC1 [26.4%], PC2 [14.6%], PC3 [11.3%], PC4 [8.4%], PC5 [7.5%], PC6 [6.5%], PC7 [4.8%]) of the variance in the signal. Based on the PC loading scores entered into the ANCOVAs, the PCs contributing significantly (p ≤ 0.05) to each performance metric were PC1, PC2 and PC6 for split time; PC3, PC4, PC5 and PC6 for impulse; and PC1, PC6 and PC7 for peak force. The significant PCs for each performance measure were used to reconstruct the kinematic patterns for split time, impulse and peak force separately.
Overall, PCA was able to differentiate between rowers and stroke rates, and it revealed features of rowing-stroke technique that correlate with measures of performance and may point to meaningful technique-optimization strategies. PCA could be used to provide insight into differences in kinematic strategies that could result in suboptimal performance or potential asymmetries, or to determine how well a desired technique change has been achieved by groups and/or individual athletes.
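The PCA step described above can be sketched in Python with NumPy. The input is joint angular-velocity waveforms. The output is per-stroke PC scores and the explained variance, followed by reconstruction from a subset of PCs. This is a minimal illustration under stated assumptions, not the authors' pipeline:

- The data layout is hypothetical: one row per stroke, with time-normalized hip, knee and elbow velocity curves concatenated column-wise.
- The SVD-based implementation and the synthetic data are also assumptions.
- The ANCOVA step is not shown.

The last lines only check that the seven reported percentages add up to the stated 79.5%.

```python
import numpy as np

def pca_svd(X):
    """PCA of a (n_strokes, n_features) matrix via SVD of the mean-centred data.

    Returns the feature means, the loading vectors (one PC per row),
    the per-stroke PC scores and the explained-variance ratio of each PC.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                   # stroke-by-PC scores (projection onto each PC)
    var = S**2 / (X.shape[0] - 1)    # variance captured by each PC
    ratio = var / var.sum()          # fraction of total variance, in descending order
    return mean, Vt, scores, ratio

def reconstruct(mean, Vt, scores, pcs):
    """Rebuild every stroke's waveform from a chosen subset of PCs (0-based indices)."""
    pcs = list(pcs)
    return mean + scores[:, pcs] @ Vt[pcs, :]

# Hypothetical data: 60 strokes x (3 joints x 101 time-normalized samples)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3 * 101))

mean, Vt, scores, ratio = pca_svd(X)
print("first seven PCs explain", round(100 * ratio[:7].sum(), 1), "% of this synthetic data")

# Reconstruction restricted to PC1, PC6 and PC7 (the set listed above for peak force)
X_peak_force = reconstruct(mean, Vt, scores, [0, 5, 6])

# Arithmetic check on the reported per-PC percentages
reported = [26.4, 14.6, 11.3, 8.4, 7.5, 6.5, 4.8]
print(round(sum(reported), 1))  # 79.5
```

The `scores` matrix holds per-stroke values of the kind that would then be entered as covariates in a statistics package. The abstract does not say whether the waveforms were scaled or otherwise preprocessed before PCA.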
Pages: 237-251
Citations: 1
InvMap and Witness Simplicial Variational Auto-Encoders
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-02-05 DOI: 10.3390/make5010014
Aniss Aiman Medbouhi, Vladislav Polianskii, Anastasia Varava, Danica Kragic
Variational auto-encoders (VAEs) are deep generative models used for unsupervised learning; however, their standard version is not topology-aware in practice, since the topology of the data may not be taken into consideration. In this paper, we propose two different approaches that aim to preserve the topological structure between the input space and the latent representation of a VAE. First, we introduce InvMap-VAE, a way to turn any dimensionality reduction technique, given an embedding it produces, into a generative model within a VAE framework that provides an inverse mapping into the original space. Second, we propose the Witness Simplicial VAE, an extension of the simplicial auto-encoder to the variational setting that uses a witness complex to compute the simplicial regularization, and we motivate this method theoretically using tools from algebraic topology. The Witness Simplicial VAE is independent of any dimensionality reduction technique. Together with its extension, the Isolandmarks Witness Simplicial VAE, it preserves the persistent Betti numbers of a dataset better than a standard VAE.
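As a rough illustration of the InvMap-VAE idea, namely reusing an embedding produced by a dimensionality reduction technique inside a VAE, the NumPy sketch below computes a standard VAE objective. That objective is reconstruction error plus the Gaussian KL term, with the reparameterization trick. The sketch then adds a hypothetical penalty that ties the encoder mean to a fixed, precomputed embedding. The penalty form, its `weight` and all function names are assumptions for illustration, not the paper's published loss. No network or training loop is shown, and the Witness Simplicial regularization is not covered.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) per sample, summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def embedding_regularized_vae_loss(x, x_recon, mu, logvar, target_embedding, weight=1.0):
    """Batch mean of reconstruction + KL + weight * ||mu - target_embedding||^2.

    target_embedding: hypothetical precomputed low-dimensional coordinates from a
    dimensionality reduction method (e.g. Isomap), one row per sample.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)            # squared-error reconstruction term
    kl = gaussian_kl(mu, logvar)                           # keeps the posterior close to N(0, I)
    embed = np.sum((mu - target_embedding) ** 2, axis=-1)  # pulls latent means onto the embedding
    return float(np.mean(recon + kl + weight * embed))

# Toy batch: 8 samples, 5 input dims, 2 latent dims (all values synthetic)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
mu, logvar = rng.normal(size=(8, 2)), np.zeros((8, 2))
z = reparameterize(mu, logvar, rng)
loss = embedding_regularized_vae_loss(x, x + 0.1, mu, logvar, target_embedding=mu + 0.5)
print(loss)
```

In a real model, `mu`, `logvar` and `x_recon` would come from encoder and decoder networks trained by gradient descent on this kind of objective. The sketch only evaluates the loss for fixed arrays.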
Pages: 199-236
Citations: 0
Machine Learning and Prediction of Infectious Diseases: A Systematic Review
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-02-01 DOI: 10.3390/make5010013
O. E. Santangelo, V. Gentile, Stefano Pizzo, D. Giordano, F. Cedrone
The aim of this study is to determine whether machine learning can be used to predict infectious disease outbreaks early. The study followed the guidelines of the Cochrane Collaboration, the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Relevant literature in PubMed/Medline and Scopus was searched by combining free-text terms, keywords, and titles on medical topics. At the end of the search, this systematic review included 75 records. The studies analyzed in this review demonstrate that it is possible to predict the incidence and trends of some infectious diseases. They also show that accurate and plausible results can be obtained by combining several machine learning techniques and model types.
Pages: 175-198
Citations: 10
Special Issue "Selected Papers from CD-MAKE 2020 and ARES 2020"
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-20 DOI: 10.3390/make5010012
E. Weippl, A. Holzinger, Peter Kieseberg
In the current era of rapid technological advancement, machine learning (ML) is quickly becoming a dominant force in the development of smart environments [...]
Pages: 173-174
Citations: 1
Acknowledgment to the Reviewers of Machine Learning and Knowledge Extraction in 2022
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-18 DOI: 10.3390/make5010011
High-quality academic publishing is built on rigorous peer review [...]
Citations: 0
Explainable Machine Learning
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-17 DOI: 10.3390/make5010010
J. Garcke, R. Roscher
Machine learning methods are widely used in commercial applications and in many scientific areas [...]
Pages: 169-170
Citations: 8
On Deceiving Malware Classification with Section Injection
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-16 DOI: 10.3390/make5010009
Adeilson Antonio da Silva, Maurício Pamplona Segundo
We investigate how to modify executable files to deceive malware classification systems. This work's main contribution is a methodology for randomly injecting bytes across a malware file. We use it both as an attack that decreases classification accuracy and as a defensive method that augments the data available for training. The methodology respects the operating system's file format, which ensures that the malware still executes after injection and that its behavior does not change. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on Global Image Descriptor (GIST) + K-Nearest-Neighbors (KNN), three Convolutional Neural Network (CNN) variations, and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop of between 25% and 40% for malware family classification. These results indicate that automatic malware classification systems may not be as trustworthy as initially reported in the literature. We also evaluate training with modified malware alongside the original samples to increase the networks' robustness against these attacks. The results show that combining the reordering of malware sections with the injection of random data can improve overall classification performance. All the code is publicly available.
Citations: 1
Detection of Temporal Shifts in Semantics Using Local Graph Clustering
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-13 DOI: 10.3390/make5010008
N. Hwang, S. Chatterjee, Yanming Di, Sharmodeep Bhattacharyya
Many changes in our digital corpus have been brought about by the interplay between rapid advances in digital communication and the current environment, which is characterized by pandemics, political polarization, and social unrest. One such change is the pace at which new words enter the mass vocabulary and the frequency with which the meanings, perceptions, and interpretations of existing expressions change. Current state-of-the-art algorithms do not allow intuitive and rigorous detection of such changes in word meaning over time. We propose a dynamic graph-theoretic approach to inferring the semantics of words and phrases ("terms") and detecting temporal shifts. Our approach represents each term as a stochastic, time-evolving set of contextual words and is, in essence, a count-based distributional semantic model. We use local clustering techniques to assess the structural changes in a given word's contextual words. We demonstrate the efficacy of our method by investigating the changes in the semantics of the phrase "Chinavirus". We conclude that the term took on a much more pejorative meaning when the White House used it in the second half of March 2020, although the effect appears to have been temporary. We make available both the dataset and the code used to generate this paper's results.
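The Python sketch below illustrates the count-based, time-sliced view of a term's context that the abstract describes. It counts the words around a target term in each time period. It then compares consecutive periods by the Jaccard overlap of their most frequent context words. This is a deliberately simplified stand-in, not the authors' method: it builds no graph and performs no local graph clustering. The window size, `n`, the period labels and the toy sentences are all assumptions.

```python
from collections import Counter

def context_counts(docs, target, window=2):
    """Count words within +/- `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Skip the target position itself; keep its neighbours
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def top_context(counts, n=10):
    """The n most frequent context words as a set."""
    return {w for w, _ in counts.most_common(n)}

def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0

def context_shift(docs_by_period, target, n=10, window=2):
    """Jaccard similarity of top-n context sets between consecutive periods.

    Low values flag a candidate change in how the term is used.
    """
    periods = sorted(docs_by_period)
    sets = [top_context(context_counts(docs_by_period[p], target, window), n) for p in periods]
    return {(periods[k], periods[k + 1]): jaccard(sets[k], sets[k + 1])
            for k in range(len(periods) - 1)}

# Toy, tokenized corpus split into three hypothetical periods
docs = {
    "p1": [["new", "virus", "outbreak"]],
    "p2": [["new", "virus", "outbreak"]],
    "p3": [["blame", "virus", "label"]],
}
print(context_shift(docs, "virus", n=5, window=1))
```

The per-period context counts here are the same raw material a graph-based approach would start from. The difference is that the paper assesses structural changes among the context words with local clustering, whereas this sketch only compares flat word sets.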
Pages: 128-143
Citations: 1
E2H Distance-Weighted Minimum Reference Set for Numerical and Categorical Mixture Data and a Bayesian Swap Feature Selection Algorithm
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-11 DOI: 10.3390/make5010007
Yuto Omae, Masaya Mori
Generally, when developing classification models using supervised learning methods (e.g., support vector machines, neural networks, and decision trees), feature selection is an essential pre-processing step for reducing computational costs and improving generalization performance. The minimum reference set (MRS), a feature selection algorithm, can be used for this purpose. The original MRS considers a feature subset effective if it leads to the correct classification of all samples by the 1-nearest-neighbor algorithm using only a small set of reference samples. However, the original MRS is applicable only to numerical features and cannot take the distances between different classes into account. Therefore, we propose a novel feature subset evaluation algorithm, referred to as the “E2H distance-weighted MRS,” which can be used for a mixture of numerical and categorical features and takes the distances between different classes into account in the evaluation. Moreover, we propose a Bayesian swap feature selection algorithm for identifying an effective feature subset. The effectiveness of the proposed methods is verified through experiments on artificially generated data comprising a mixture of numerical and categorical features.
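The sketch below makes the original MRS criterion concrete. That criterion says a feature subset is good when a small reference set of samples lets a 1-nearest-neighbor classifier label every sample correctly. Here, a numeric feature subset is scored by the size of a consistent reference set. As an assumption, the sketch builds that set with Hart's condensed nearest-neighbor rule, which yields a small but not necessarily minimum reference set. It uses plain Euclidean distance, so it covers only the numerical case that the original MRS is limited to. It does not implement the proposed E2H distance or the Bayesian swap search.

```python
import numpy as np

def nn1_predict(ref_X, ref_y, X):
    """Label each row of X with the class of its nearest reference sample (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - ref_X[None, :, :], axis=-1)
    return ref_y[np.argmin(d, axis=1)]

def condensed_reference_set(X, y):
    """Hart's condensed nearest-neighbor rule.

    Repeatedly scans the samples and adds any sample that 1-NN on the current
    reference set misclassifies; stops after a full pass with no additions.
    """
    ref = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in ref:
                continue  # already a reference sample
            if nn1_predict(X[ref], y[ref], X[i:i + 1])[0] != y[i]:
                ref.append(i)
                changed = True
    return ref

def reference_set_size(X, y, features):
    """Score a feature subset: a smaller consistent reference set suggests better class separation."""
    return len(condensed_reference_set(X[:, features], y))

# Hypothetical data: feature 0 separates the two classes, feature 1 is pure noise
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.1 * rng.standard_normal(100), rng.standard_normal(100)])
size_informative = reference_set_size(X, y, [0])
size_noise = reference_set_size(X, y, [1])
print(size_informative, size_noise)
```

The informative feature needs only a handful of reference samples, while the noise feature forces the reference set to grow large. This difference is the signal an MRS-style criterion uses to prefer one feature subset over another.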
Pages: 109-127
Citations: 1
XAIR: A Systematic Metareview of Explainable AI (XAI) Aligned to the Software Development Process
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-01-11 DOI: 10.3390/make5010006
Tobias Clement, Nils Kemmerzell, Mohamed Abdelaal, M. Amberg
Currently, a lack of explainability is a major barrier to the practical implementation of Artificial Intelligence (AI) in various application domains. To address the lack of understanding of AI-based systems, Explainable AI (XAI) aims to make black-box AI models more transparent and comprehensible to humans. Fortunately, many XAI methods have been introduced to tackle the explainability problem from different perspectives. However, given the vast search space, it is challenging for ML practitioners and data scientists to get started with developing XAI software and to select the most suitable XAI methods. To tackle this challenge, we introduce XAIR, a novel systematic metareview of the most promising XAI methods and tools. XAIR differentiates itself from existing reviews by aligning its results to the five steps of the software development process: requirement analysis, design, implementation, evaluation, and deployment. Through this mapping, we aim to create a better understanding of the individual steps of developing XAI software and to foster the creation of real-world AI applications that incorporate explainability. Finally, we highlight new directions for future research.
Pages: 78-108
引用次数: 11