Journal of Intelligent Information Systems最新文献

英文中文

A mutually enhanced multi-scale relation-aware graph convolutional network for argument pair extraction 一种用于参数对提取的相互增强的多尺度关系感知图卷积网络

IF 3.4 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-30 DOI: 10.1007/s10844-023-00826-9

Xiaofei Zhu, Yidan Liu, Zhuo Chen, Xu Chen, Jiafeng Guo, Stefan Dietze

Argument pair extraction (APE) is a fine-grained task of argument mining which aims to identify arguments offered by different participants in some discourse and detect interaction relationships between arguments from different participants. In recent years, many research efforts have been devoted to dealing with APE in a multi-task learning framework. Although these approaches have achieved encouraging results, they still face several challenging issues. First, different types of sentence relationships as well as different levels of information exchange among sentences are largely ignored. Second, they solely model interactions between argument pairs either in an explicit or implicit strategy, while neglecting the complementary effect of the two strategies. In this paper, we propose a novel Mutually Enhanced Multi-Scale Relation-Aware Graph Convolutional Network (MMR-GCN) for APE. Specifically, we first design a multi-scale relation-aware graph aggregation module to explicitly model the complex relationships between review and rebuttal passage sentences. In addition, we propose a mutually enhancement transformer module to implicitly and interactively enhance representations of review and rebuttal passage sentences. We experimentally validate MMR-GCN by comparing with the state-of-the-art APE methods. Experimental results show that it considerably outperforms all baseline methods, and the relative performance improvement of MMR-GCN over the best performing baseline MRC-APE in terms of F1 score reaches to 3.48% and 4.43% on the two benchmark datasets, respectively.

论点对抽取(APE)是一种细粒度的论点挖掘任务，旨在识别某一话语中不同参与者提供的论点，并检测不同参与者的论点之间的交互关系。近年来，许多研究都致力于在多任务学习框架下处理APE问题。尽管这些方法取得了令人鼓舞的成果，但它们仍然面临一些具有挑战性的问题。首先，不同类型的句子关系以及句子之间不同程度的信息交换在很大程度上被忽略了。其次，它们仅对显式或隐式策略中参数对之间的相互作用进行建模，而忽略了两种策略的互补效应。在本文中，我们提出了一种新的互增强多尺度关系感知图卷积网络(MMR-GCN)。具体来说，我们首先设计了一个多尺度关系感知的图聚合模块来明确地建模评论和反驳段落之间的复杂关系。此外，我们提出了一个相互增强的转换模块，以隐式和交互式地增强评论和反驳段落句子的表示。我们通过比较最先进的APE方法，实验验证了MMR-GCN。实验结果表明，它明显优于所有基线方法，在两个基准数据集上，MMR-GCN相对于表现最好的基线MRC-APE的F1分数的相对性能提升分别达到3.48%和4.43%。

{"title":"A mutually enhanced multi-scale relation-aware graph convolutional network for argument pair extraction","authors":"Xiaofei Zhu, Yidan Liu, Zhuo Chen, Xu Chen, Jiafeng Guo, Stefan Dietze","doi":"10.1007/s10844-023-00826-9","DOIUrl":"https://doi.org/10.1007/s10844-023-00826-9","url":null,"abstract":"Argument pair extraction (APE) is a fine-grained task of argument mining which aims to identify arguments offered by different participants in some discourse and detect interaction relationships between arguments from different participants. In recent years, many research efforts have been devoted to dealing with APE in a multi-task learning framework. Although these approaches have achieved encouraging results, they still face several challenging issues. First, different types of sentence relationships as well as different levels of information exchange among sentences are largely ignored. Second, they solely model interactions between argument pairs either in an explicit or implicit strategy, while neglecting the complementary effect of the two strategies. In this paper, we propose a novel Mutually Enhanced Multi-Scale Relation-Aware Graph Convolutional Network (MMR-GCN) for APE. Specifically, we first design a multi-scale relation-aware graph aggregation module to explicitly model the complex relationships between review and rebuttal passage sentences. In addition, we propose a mutually enhancement transformer module to implicitly and interactively enhance representations of review and rebuttal passage sentences. We experimentally validate MMR-GCN by comparing with the state-of-the-art APE methods. Experimental results show that it considerably outperforms all baseline methods, and the relative performance improvement of MMR-GCN over the best performing baseline MRC-APE in terms of F1 score reaches to 3.48% and 4.43% on the two benchmark datasets, respectively.","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"56 8","pages":""},"PeriodicalIF":3.4,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

T-shaped expert mining: a novel approach based on skill translation and focal loss t型专家挖掘:一种基于技能转换和焦点丢失的新方法

IF 3.4 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-28 DOI: 10.1007/s10844-023-00831-y

Zohreh Fallahnejad, Mahmood Karimian, Fatemeh Lashkari, Hamid Beigy

Hiring knowledgeable and cost-effective individuals, who use their knowledge and expertise to boost the organization, is extremely important for organizations as they are the most valuable assets. T-shaped experts are the best option based on agile methodology. The T-shaped professional has a deep understanding of one topic and broad knowledge of several others. Compared to other types of professionals, T-shaped professionals are better communicators and cheaper to hire. Finding T-shaped experts in a given skill area requires determining each candidate’s depth of knowledge and shape of expertise. To estimate each candidate’s depth of knowledge in a given skill area, we propose a translation-based method that utilizes two attention-based skill translation models to overcome the vocabulary mismatch between skills and user documents. We also propose two new approaches based on binary cross-entropy and focal loss to determine whether each user is T-shaped. Our experiments on three collections of the StackOverflow dataset demonstrate the efficiency of our proposed method compared to the state-of-the-art approaches.

雇佣知识渊博、成本效益高的人，利用他们的知识和专长来推动组织发展，对组织来说是极其重要的，因为他们是最有价值的资产。t型专家是基于敏捷方法的最佳选择。t型专业人士对一个主题有深刻的理解，对其他几个主题有广泛的了解。与其他类型的专业人士相比，t型专业人士更善于沟通，雇佣成本也更低。在特定的技能领域找到t型专家需要确定每个候选人的知识深度和专业知识的形状。为了估计每个候选人在给定技能领域的知识深度，我们提出了一种基于翻译的方法，该方法利用两个基于注意力的技能翻译模型来克服技能和用户文档之间的词汇不匹配。我们还提出了基于二元交叉熵和焦点损失的两种新方法来确定每个用户是否为t形。我们在StackOverflow数据集的三个集合上的实验表明，与最先进的方法相比，我们提出的方法是有效的。

{"title":"T-shaped expert mining: a novel approach based on skill translation and focal loss","authors":"Zohreh Fallahnejad, Mahmood Karimian, Fatemeh Lashkari, Hamid Beigy","doi":"10.1007/s10844-023-00831-y","DOIUrl":"https://doi.org/10.1007/s10844-023-00831-y","url":null,"abstract":"Hiring knowledgeable and cost-effective individuals, who use their knowledge and expertise to boost the organization, is extremely important for organizations as they are the most valuable assets. T-shaped experts are the best option based on agile methodology. The T-shaped professional has a deep understanding of one topic and broad knowledge of several others. Compared to other types of professionals, T-shaped professionals are better communicators and cheaper to hire. Finding T-shaped experts in a given skill area requires determining each candidate’s depth of knowledge and shape of expertise. To estimate each candidate’s depth of knowledge in a given skill area, we propose a translation-based method that utilizes two attention-based skill translation models to overcome the vocabulary mismatch between skills and user documents. We also propose two new approaches based on binary cross-entropy and focal loss to determine whether each user is T-shaped. Our experiments on three collections of the StackOverflow dataset demonstrate the efficiency of our proposed method compared to the state-of-the-art approaches.","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"56 9","pages":""},"PeriodicalIF":3.4,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Enhancing anomaly detectors with LatentOut 利用LatentOut增强异常检测器

IF 3.4 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-24 DOI: 10.1007/s10844-023-00829-6

Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina

({{textbf{Latent}}varvec{Out}}) is a recently introduced algorithm for unsupervised anomaly detection which enhances latent space-based neural methods, namely (Variational) Autoencoders, GANomaly and ANOGan architectures. The main idea behind it is to exploit both the latent space and the baseline score of these architectures in order to provide a refined anomaly score performing density estimation in the augmented latent-space/baseline-score feature space. In this paper we investigate the performance of ({{textbf{Latent}}varvec{Out}}) acting as a one-class classifier and we experiment the combination of ({{textbf{Latent}}varvec{Out}}) with GAAL architectures, a novel type of Generative Adversarial Networks for unsupervised anomaly detection. Moreover, we show that the feature space induced by ({{textbf{Latent}}varvec{Out}}) has the characteristic to enhance the separation between normal and anomalous data. Indeed, we prove that standard data mining outlier detection methods perform better when applied on this novel augmented latent space rather than on the original data space.

({{textbf{Latent}}varvec{Out}}) 是最近引入的一种用于无监督异常检测的算法，它增强了基于潜在空间的神经方法，即(变分)自编码器、GANomaly和ANOGan架构。其背后的主要思想是利用这些架构的潜在空间和基线分数，以便在增强的潜在空间/基线分数特征空间中提供执行密度估计的精细异常分数。在本文中，我们研究了({{textbf{Latent}}varvec{Out}})作为单类分类器的性能，并实验了({{textbf{Latent}}varvec{Out}})与GAAL架构的组合，GAAL架构是一种用于无监督异常检测的新型生成对抗网络。此外，我们还证明了({{textbf{Latent}}varvec{Out}})诱导的特征空间具有增强正常和异常数据分离的特性。事实上，我们证明了标准的数据挖掘离群点检测方法在应用于这种新的增强潜在空间时比应用于原始数据空间时表现得更好。

{"title":"Enhancing anomaly detectors with LatentOut","authors":"Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina","doi":"10.1007/s10844-023-00829-6","DOIUrl":"https://doi.org/10.1007/s10844-023-00829-6","url":null,"abstract":"({{textbf{Latent}}varvec{Out}}) is a recently introduced algorithm for unsupervised anomaly detection which enhances latent space-based neural methods, namely (Variational) Autoencoders, GANomaly and ANOGan architectures. The main idea behind it is to exploit both the latent space and the baseline score of these architectures in order to provide a refined anomaly score performing density estimation in the augmented latent-space/baseline-score feature space. In this paper we investigate the performance of ({{textbf{Latent}}varvec{Out}}) acting as a one-class classifier and we experiment the combination of ({{textbf{Latent}}varvec{Out}}) with GAAL architectures, a novel type of Generative Adversarial Networks for unsupervised anomaly detection. Moreover, we show that the feature space induced by ({{textbf{Latent}}varvec{Out}}) has the characteristic to enhance the separation between normal and anomalous data. Indeed, we prove that standard data mining outlier detection methods perform better when applied on this novel augmented latent space rather than on the original data space.","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"77 10","pages":""},"PeriodicalIF":3.4,"publicationDate":"2023-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A transformer-based framework for predicting geomagnetic indices with uncertainty quantification 基于变压器的不确定量化地磁指数预测框架

IF 3.4 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-18 DOI: 10.1007/s10844-023-00828-7

Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Ju Jing

Geomagnetic activities have a crucial impact on Earth, which can affect spacecraft and electrical power grids. Geospace scientists use a geomagnetic index, called the Kp index, to describe the overall level of geomagnetic activity. This index is an important indicator of disturbances in the Earth’s magnetic field and is used by the U.S. Space Weather Prediction Center as an alert and warning service for users who may be affected by the disturbances. Another commonly used index, called the ap index, is converted from the Kp index. Early and accurate prediction of the Kp and ap indices is essential for preparedness and disaster risk management. In this paper, we present a deep learning framework, named GNet, to perform short-term forecasting of the Kp and ap indices. Specifically, GNet takes as input time series of solar wind parameters’ values, provided by NASA’s Space Science Data Coordinated Archive, and predicts as output the Kp and ap indices respectively at time point (varvec{t + w}) hours for a given time point (varvec{t}) where (varvec{w}) ranges from 1 to 9. GNet combines transformer encoder blocks with Bayesian inference, which is capable of quantifying both aleatoric uncertainty (data uncertainty) and epistemic uncertainty (model uncertainty) in making predictions. Experimental results show that GNet outperforms closely related machine learning methods in terms of the root mean square error and R-squared score. Furthermore, GNet can provide both data and model uncertainty quantification results, which the existing methods cannot offer. To our knowledge, this is the first time that Bayesian transformers have been used for geomagnetic activity prediction.

地磁活动对地球有至关重要的影响，它可以影响航天器和电网。地球空间科学家使用一种地磁指数，称为Kp指数，来描述地磁活动的总体水平。该指标是地球磁场扰动的重要指标，被美国空间天气预报中心用作可能受到干扰影响的用户的警报和预警服务。另一个常用的指数，称为ap指数，是由Kp指数转换而来的。Kp和ap指数的早期和准确预测对于备灾和灾害风险管理至关重要。在本文中，我们提出了一个名为GNet的深度学习框架，用于对Kp和ap指数进行短期预测。具体而言，GNet以NASA空间科学数据协调档案提供的太阳风参数值时间序列作为输入，分别预测给定时间点(varvec{t}) ((varvec{w})的取值范围为1 ～ 9)(varvec{t + w})小时的Kp和ap指数作为输出。GNet将变压器编码器块与贝叶斯推理相结合，能够在预测中量化任意不确定性(数据不确定性)和认知不确定性(模型不确定性)。实验结果表明，GNet在均方根误差和r平方分数方面优于密切相关的机器学习方法。此外，GNet可以提供现有方法无法提供的数据和模型不确定性量化结果。据我们所知，这是贝叶斯变压器第一次被用于地磁活动预测。

{"title":"A transformer-based framework for predicting geomagnetic indices with uncertainty quantification","authors":"Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Ju Jing","doi":"10.1007/s10844-023-00828-7","DOIUrl":"https://doi.org/10.1007/s10844-023-00828-7","url":null,"abstract":"Geomagnetic activities have a crucial impact on Earth, which can affect spacecraft and electrical power grids. Geospace scientists use a geomagnetic index, called the Kp index, to describe the overall level of geomagnetic activity. This index is an important indicator of disturbances in the Earth’s magnetic field and is used by the U.S. Space Weather Prediction Center as an alert and warning service for users who may be affected by the disturbances. Another commonly used index, called the ap index, is converted from the Kp index. Early and accurate prediction of the Kp and ap indices is essential for preparedness and disaster risk management. In this paper, we present a deep learning framework, named GNet, to perform short-term forecasting of the Kp and ap indices. Specifically, GNet takes as input time series of solar wind parameters’ values, provided by NASA’s Space Science Data Coordinated Archive, and predicts as output the Kp and ap indices respectively at time point (varvec{t + w}) hours for a given time point (varvec{t}) where (varvec{w}) ranges from 1 to 9. GNet combines transformer encoder blocks with Bayesian inference, which is capable of quantifying both aleatoric uncertainty (data uncertainty) and epistemic uncertainty (model uncertainty) in making predictions. Experimental results show that GNet outperforms closely related machine learning methods in terms of the root mean square error and R-squared score. Furthermore, GNet can provide both data and model uncertainty quantification results, which the existing methods cannot offer. To our knowledge, this is the first time that Bayesian transformers have been used for geomagnetic activity prediction.","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"2 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2023-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138516069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

EqBal-RS: Mitigating popularity bias in recommender systems EqBal-RS:减轻推荐系统中的流行偏见

3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-07 DOI: 10.1007/s10844-023-00817-w

Shivam Gupta, Kirandeep Kaur, Shweta Jain

引用次数: 0

BMDF-SR: bidirectional multi-sequence decoupling fusion method for sequential recommendation 序列推荐的双向多序列解耦融合方法

3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-06 DOI: 10.1007/s10844-023-00825-w

Aohua Gao, Jiwei Qin, Chao Ma, Tao Wang

引用次数: 0

Learning autoencoder ensembles for detecting malware hidden communications in IoT ecosystems 学习自动编码器集成，用于检测物联网生态系统中的恶意软件隐藏通信

3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-03 DOI: 10.1007/s10844-023-00819-8

Nunziato Cassavia, Luca Caviglione, Massimo Guarascio, Angelica Liguori, Marco Zuppelli

Abstract Modern IoT ecosystems are the preferred target of threat actors wanting to incorporate resource-constrained devices within a botnet or leak sensitive information. A major research effort is then devoted to create countermeasures for mitigating attacks, for instance, hardware-level verification mechanisms or effective network intrusion detection frameworks. Unfortunately, advanced malware is often endowed with the ability of cloaking communications within network traffic, e.g., to orchestrate compromised IoT nodes or exfiltrate data without being noticed. Therefore, this paper showcases how different autoencoder-based architectures can spot the presence of malicious communications hidden in conversations, especially in the TTL of IPv4 traffic. To conduct tests, this work considers IoT traffic traces gathered in a real setting and the presence of an attacker deploying two hiding schemes (i.e., naive and “elusive” approaches). Collected results showcase the effectiveness of our method as well as the feasibility of deploying autoencoders in production-quality IoT settings.

现代物联网生态系统是威胁行为者的首选目标，他们希望将资源受限的设备纳入僵尸网络或泄露敏感信息。然后，主要的研究工作致力于创建缓解攻击的对策，例如，硬件级验证机制或有效的网络入侵检测框架。不幸的是，高级恶意软件通常具有隐藏网络流量中的通信的能力，例如，编排受损的物联网节点或在不被发现的情况下泄露数据。因此，本文展示了不同的基于自动编码器的架构如何发现隐藏在会话中的恶意通信的存在，特别是在IPv4流量的TTL中。为了进行测试，这项工作考虑了在真实环境中收集的物联网流量痕迹以及部署两种隐藏方案(即幼稚和“难以捉摸”的方法)的攻击者的存在。收集的结果展示了我们方法的有效性，以及在生产质量的物联网设置中部署自动编码器的可行性。

{"title":"Learning autoencoder ensembles for detecting malware hidden communications in IoT ecosystems","authors":"Nunziato Cassavia, Luca Caviglione, Massimo Guarascio, Angelica Liguori, Marco Zuppelli","doi":"10.1007/s10844-023-00819-8","DOIUrl":"https://doi.org/10.1007/s10844-023-00819-8","url":null,"abstract":"Abstract Modern IoT ecosystems are the preferred target of threat actors wanting to incorporate resource-constrained devices within a botnet or leak sensitive information. A major research effort is then devoted to create countermeasures for mitigating attacks, for instance, hardware-level verification mechanisms or effective network intrusion detection frameworks. Unfortunately, advanced malware is often endowed with the ability of cloaking communications within network traffic, e.g., to orchestrate compromised IoT nodes or exfiltrate data without being noticed. Therefore, this paper showcases how different autoencoder-based architectures can spot the presence of malicious communications hidden in conversations, especially in the TTL of IPv4 traffic. To conduct tests, this work considers IoT traffic traces gathered in a real setting and the presence of an attacker deploying two hiding schemes (i.e., naive and “elusive” approaches). Collected results showcase the effectiveness of our method as well as the feasibility of deploying autoencoders in production-quality IoT settings.","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"7 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135820617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Finding a needle in a haystack: insights on feature selection for classification tasks 大海捞针:对分类任务特征选择的见解

3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-03 DOI: 10.1007/s10844-023-00823-y

Laura Morán-Fernández, Verónica Bolón-Canedo

Abstract The growth of Big Data has resulted in an overwhelming increase in the volume of data available, including the number of features. Feature selection, the process of selecting relevant features and discarding irrelevant ones, has been successfully used to reduce the dimensionality of datasets. However, with numerous feature selection approaches in the literature, determining the best strategy for a specific problem is not straightforward. In this study, we compare the performance of various feature selection approaches to a random selection to identify the most effective strategy for a given type of problem. We use a large number of datasets to cover a broad range of real-world challenges. We evaluate the performance of seven popular feature selection approaches and five classifiers. Our findings show that feature selection is a valuable tool in machine learning and that correlation-based feature selection is the most effective strategy regardless of the scenario. Additionally, we found that using improper thresholds with ranker approaches produces results as poor as randomly selecting a subset of features.

大数据的增长导致了可用数据量的压倒性增长，包括特征的数量。特征选择，即选择相关特征并丢弃不相关特征的过程，已被成功地用于降低数据集的维数。然而，由于文献中有许多特征选择方法，确定针对特定问题的最佳策略并不简单。在本研究中，我们将各种特征选择方法的性能与随机选择方法进行比较，以确定针对给定类型问题的最有效策略。我们使用大量的数据集来涵盖广泛的现实世界挑战。我们评估了七种流行的特征选择方法和五种分类器的性能。我们的研究结果表明，特征选择是机器学习中有价值的工具，基于相关性的特征选择是最有效的策略，无论场景如何。此外，我们发现使用不适当的阈值与排名方法产生的结果与随机选择特征子集一样差。

{"title":"Finding a needle in a haystack: insights on feature selection for classification tasks","authors":"Laura Morán-Fernández, Verónica Bolón-Canedo","doi":"10.1007/s10844-023-00823-y","DOIUrl":"https://doi.org/10.1007/s10844-023-00823-y","url":null,"abstract":"Abstract The growth of Big Data has resulted in an overwhelming increase in the volume of data available, including the number of features. Feature selection, the process of selecting relevant features and discarding irrelevant ones, has been successfully used to reduce the dimensionality of datasets. However, with numerous feature selection approaches in the literature, determining the best strategy for a specific problem is not straightforward. In this study, we compare the performance of various feature selection approaches to a random selection to identify the most effective strategy for a given type of problem. We use a large number of datasets to cover a broad range of real-world challenges. We evaluate the performance of seven popular feature selection approaches and five classifiers. Our findings show that feature selection is a valuable tool in machine learning and that correlation-based feature selection is the most effective strategy regardless of the scenario. Additionally, we found that using improper thresholds with ranker approaches produces results as poor as randomly selecting a subset of features.","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"29 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135868757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Leveraging neighborhood and path information for influential spreaders recognition in complex networks 利用邻域和路径信息识别复杂网络中有影响力的传播者

3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-01 DOI: 10.1007/s10844-023-00822-z

Aman Ullah, JinFang Sheng, Bin Wang, Salah Ud Din, Nasrullah Khan

引用次数: 0

Session-aware recommender system using double deep reinforcement learning 会话感知推荐系统使用双重深度强化学习

3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2023-11-01 DOI: 10.1007/s10844-023-00824-x

Purnima Khurana, Bhavna Gupta, Ravish Sharma, Punam Bedi

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

Journal of Intelligent Information Systems

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀