
Big Data and Cognitive Computing: Latest Articles

Understanding the Influence of Genre-Specific Music Using Network Analysis and Machine Learning Algorithms
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-12-04 DOI: 10.3390/bdcc7040180
Bishal Lamichhane, Aniket Kumar Singh, Sumana Devkota, Uttam Dhakal, Subham Singh, Chandra Dhakal
This study analyzes a network of musical influence using machine learning and network analysis techniques. A directed network model is used to represent the influence relations between artists as nodes and edges. Network properties and centrality measures are analyzed to identify influential patterns. In addition, influence within and outside the genre is quantified using in-genre and out-genre weights. Regression analysis is performed to determine the impact of musical attributes on influence. We find that speechiness, acousticness, and valence are the top features of the most influential artists. We also introduce the IRDI, an algorithm that provides an innovative approach to quantify an artist’s influence by capturing the degree of dominance among their followers. This approach underscores influential artists who drive the evolution of music, setting trends and significantly inspiring a new generation of artists. The independent cascade model is further employed to open up the temporal dynamics of influence propagation across the entire musical network, highlighting how initial seeds of influence can contagiously spread through the network. This multidisciplinary approach provides a nuanced understanding of musical influence that refines existing methods and sheds light on influential trends and dynamics.
Citations: 0
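The independent cascade model named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the graph layout, the artist identifiers, and the activation probabilities are hypothetical, chosen only to show the mechanics. Each newly activated node gets a single chance to activate each of its still-inactive followers.

```python
import random


def independent_cascade(graph, seeds, rng):
    """Run one independent-cascade simulation on a directed graph.

    graph: dict mapping node -> list of (follower, activation_probability)
    seeds: iterable of initially active nodes
    rng:   a random.Random instance (passed in so runs are reproducible)

    Returns a dict mapping each activated node to the step at which it
    became active (seeds are active at step 0).
    """
    activated_at = {node: 0 for node in seeds}
    frontier = list(activated_at)
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for node in frontier:
            # A node tries each outgoing edge exactly once, in the step
            # right after it was activated.
            for follower, prob in graph.get(node, []):
                if follower not in activated_at and rng.random() < prob:
                    activated_at[follower] = step
                    next_frontier.append(follower)
        frontier = next_frontier
    return activated_at


if __name__ == "__main__":
    # Hypothetical influence edges: influencer -> (follower, probability).
    influence = {
        "artist_a": [("artist_b", 0.6), ("artist_c", 0.3)],
        "artist_b": [("artist_d", 0.5)],
        "artist_c": [("artist_d", 0.2)],
    }
    print(independent_cascade(influence, ["artist_a"], random.Random(42)))
```

Averaging the number of activated nodes over many runs with different random seeds gives an estimate of the expected spread from a given seed set.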
Toward Morphologic Atlasing of the Human Whole Brain at the Nanoscale
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-12-01 DOI: 10.3390/bdcc7040179
W. Nowinski
Although no nanoscale dataset of the entire human brain has yet been acquired, and no nanoscale human whole-brain atlas has been constructed, tremendous progress in neuroimaging and high-performance computing makes both feasible in the not-too-distant future. Constructing a nanoscale atlas of the whole human brain poses several challenges; here, we address two: modeling brain morphology at the nanoscale and designing a nanoscale brain atlas. A new nanoscale neuronal format is introduced to describe the data necessary and sufficient to model the entire human brain at the nanoscale, enabling calculation of the synaptome and connectome. The design of the nanoscale brain atlas covers design principles, content, architecture, navigation, functionality, and user interface. Three novel design principles are introduced to support navigation, exploration, and calculation: gross neuroanatomy-guided navigation of micro/nanoscale neuroanatomy; a movable and zoomable sampling volume of interest for navigation and exploration; and nanoscale data processing in a parallel-pipeline mode. This mode exploits the parallelism that results from decomposing gross neuroanatomy into structures and regions, and nano neuroanatomy into neurons and synapses, enabling distributed construction and continual enhancement of the nanoscale atlas. Numerous applications of this atlas can be contemplated, ranging from proofreading and continual multi-site extension to exploration, morphometric and network-related analyses, and knowledge discovery. To the best of my knowledge, this is the first proposed nanoscale model of neuronal morphology and the first attempt to design a human whole-brain atlas at the nanoscale.
Citations: 0
Managing Cybersecurity Threats and Increasing Organizational Resilience
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-11-22 DOI: 10.3390/bdcc7040177
Peter R. J. Trim, Yang-Im Lee
Cyber security is high up on the agenda of senior managers in private and public sector organizations and is likely to remain so for the foreseeable future. [...]
Citations: 0
A New Approach to Data Analysis Using Machine Learning for Cybersecurity
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-11-21 DOI: 10.3390/bdcc7040176
Shivashankar Hiremath, Eeshan Shetty, A. J. Prakash, S. Sahoo, Kiran Kumar Patro, Kandala N. V. P. S. Rajesh, Paweł Pławiak
The internet has become an indispensable tool for organizations, permeating every facet of their operations. Virtually all companies leverage Internet services for diverse purposes, including the digital storage of data in databases and cloud platforms. Furthermore, the rising demand for software and applications has led to a widespread shift toward computer-based activities within the corporate landscape. However, this digital transformation has exposed the information technology (IT) infrastructures of these organizations to a heightened risk of cyber-attacks, endangering sensitive data. Consequently, organizations must identify and address vulnerabilities within their systems, with a primary focus on scrutinizing customer-facing websites and applications. This work aims to tackle this pressing issue by employing data analysis tools, such as Power BI, to assess vulnerabilities within a client’s application or website. Through a rigorous analysis of data, valuable insights and information will be provided, which are necessary to formulate effective remedial measures against potential attacks. Ultimately, the central goal of this research is to demonstrate that clients can establish a secure environment, shielding their digital assets from potential attackers.
Citations: 0
Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-11-15 DOI: 10.3390/bdcc7040175
Deptii D. Chaudhari, Ambika Vishal Pawar
Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. It becomes challenging to uncover propaganda as it works with the systematic goal of influencing other individuals for the determined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages like Hindi. The spread of propaganda in the Hindi news media has induced our attempt to devise an approach for the propaganda categorization of Hindi news articles. The unavailability of the necessary language tools makes propaganda classification in Hindi more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created using the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., CNN (convolutional neural network), LSTM (long short-term memory), Bi-LSTM (bidirectional long short-term memory); and four transformer-based models, i.e., multi-lingual BERT, Distil-BERT, Hindi-BERT, and Hindi-TPU-Electra, were experimented with. The experimental outcomes indicate that the multi-lingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification.
Citations: 0
Optimization of Cryptocurrency Algorithmic Trading Strategies Using the Decomposition Approach
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-11-14 DOI: 10.3390/bdcc7040174
Sherin M. Omran, Wessam H. El-Behaidy, A. Youssif
A cryptocurrency is a decentralized form of money that facilitates financial transactions using cryptographic processes. It can be thought of as a virtual currency or a payment mechanism for sending and receiving money online. Cryptocurrencies have gained wide market acceptance and developed rapidly during the past few years. Due to the volatile nature of the crypto-market, cryptocurrency trading involves a high level of risk. In this paper, a new normalized decomposition-based, multi-objective particle swarm optimization (N-MOPSO/D) algorithm is presented for cryptocurrency algorithmic trading. The aim of this algorithm is to help traders find the best Litecoin trading strategies that improve their outcomes. The proposed algorithm is used to manage the trade-offs among three objectives: the return on investment, the Sortino ratio, and the number of trades. A hybrid weight assignment mechanism is also proposed. The algorithm was compared against the trading rules with their standard parameters, against MOPSO/D using normalized weighted Tchebycheff scalarization, and against MOEA/D. It outperformed these counterparts on both benchmark and real-world problems. The results showed that the proposed algorithm is promising and stable under different market conditions: it maintained the best returns and risk during both training and testing with a moderate number of trades.
Citations: 0
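The normalized weighted Tchebycheff scalarization mentioned in the abstract above is the standard way decomposition-based methods turn several objectives into one score per weight vector. The sketch below shows the textbook form only; it is not taken from the paper. It assumes every objective is expressed so that smaller is better (for example, negated return on investment and negated Sortino ratio), and the ideal and nadir points and all numbers are hypothetical.

```python
def normalized_tchebycheff(objectives, weights, ideal, nadir):
    """Weighted Tchebycheff score on objectives normalized to [0, 1].

    objectives: objective values of one candidate (all to be minimized)
    weights:    non-negative weights, one per objective
    ideal:      best value seen so far for each objective
    nadir:      worst value seen so far for each objective

    Returns max_i w_i * |(f_i - ideal_i) / (nadir_i - ideal_i)|;
    a lower score is better for the given weight vector.
    """
    if not (len(objectives) == len(weights) == len(ideal) == len(nadir)):
        raise ValueError("all arguments must have the same length")
    worst = 0.0
    for f, w, z_ideal, z_nadir in zip(objectives, weights, ideal, nadir):
        span = z_nadir - z_ideal
        # Guard against a degenerate objective whose range is zero.
        normalized = (f - z_ideal) / span if span > 0 else 0.0
        worst = max(worst, w * abs(normalized))
    return worst


if __name__ == "__main__":
    # Hypothetical candidate: (-ROI, -Sortino ratio, number of trades).
    candidate = [-0.12, -1.5, 40]
    ideal = [-0.30, -2.0, 10]
    nadir = [0.05, 0.0, 100]
    print(normalized_tchebycheff(candidate, [0.4, 0.4, 0.2], ideal, nadir))
```

Normalizing before weighting keeps an objective with a large numeric range, such as the trade count, from dominating the maximum.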
The Semantic Adjacency Criterion in Time Intervals Mining
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-11-09 DOI: 10.3390/bdcc7040173
Alexander Shknevsky, Yuval Shahar, Robert Moskovitch
We propose a new pruning constraint when mining frequent temporal patterns to be used as classification and prediction features, the Semantic Adjacency Criterion [SAC], which filters out temporal patterns that contain potentially semantically contradictory components, exploiting each medical domain’s knowledge. We have defined three SAC versions and tested them within three medical domains (oncology, hepatitis, diabetes) and a frequent-temporal-pattern discovery framework. Previously, we had shown that using SAC enhances the repeatability of discovering the same temporal patterns in similar proportions in different patient groups within the same clinical domain. Here, we focused on SAC’s computational implications for pattern discovery, and for classification and prediction, using the discovered patterns as features, by four different machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Using SAC resulted in a significant reduction, across all medical domains and classification methods, of up to 97% in the number of discovered temporal patterns, and in the runtime of the discovery process, of up to 98%. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, resulted in classification and prediction models whose performance was at least as good as the models resulting from using the complete temporal-pattern set.
Citations: 3
Evaluation of Short-Term Rockburst Risk Severity Using Machine Learning Methods
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-11-07 DOI: 10.3390/bdcc7040172
Aibing Jin, Prabhat Basnet, Shakil Mahtab
In deep engineering, rockburst hazards frequently result in injuries, fatalities, and the destruction of contiguous structures. Due to the complex nature of rockbursts, predicting the severity of rockburst damage (intensity) without the aid of computer models is challenging. Although there are various predictive models in existence, effectively identifying the risk severity in imbalanced data remains crucial. The ensemble boosting method is often better suited to dealing with unequally distributed classes than are classical models. Therefore, this paper employs the ensemble categorical gradient boosting (CGB) method to predict short-term rockburst risk severity. After data collection, principal component analysis (PCA) was employed to avoid the redundancies caused by multi-collinearity. Afterwards, the CGB was trained on PCA data, optimal hyper-parameters were retrieved using the grid-search technique to predict the test samples, and performance was evaluated using precision, recall, and F1 score metrics. The results showed that the PCA-CGB model achieved better results in prediction than did the single CGB model or conventional boosting methods. The model achieved an F1 score of 0.8952, indicating that the proposed model is robust in predicting damage severity given an imbalanced dataset. This work provides practical guidance in risk management.
Citations: 0
Social Trend Mining: Lead or Lag
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-11-07 DOI: 10.3390/bdcc7040171
Hossein Hassani, Nadejda Komendantova, Elena Rovenskaya, Mohammad Reza Yeganegi
This research underscores the profound implications of Social Intelligence Mining, notably employing open access data and Google Search engine data for trend discernment. Utilizing advanced analytical methodologies, including wavelet coherence analysis and phase difference, hidden relationships and patterns within social data were revealed. These techniques furnish an enriched comprehension of social phenomena dynamics, bolstering decision-making processes. The study’s versatility extends across myriad domains, offering insights into public sentiment and the foresight for strategic approaches. The findings suggest immense potential in Social Intelligence Mining to influence strategies, foster innovation, and add value across diverse sectors.
Citations: 0
Arabic Toxic Tweet Classification: Leveraging the AraBERT Model
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-10-26 DOI: 10.3390/bdcc7040170
Amr Mohamed El Koshiry, Entesar Hamed I. Eliwa, Tarek Abd El-Hafeez, Ahmed Omar
Social media platforms have become the primary means of communication and information sharing, facilitating interactive exchanges among users. Unfortunately, these platforms also witness the dissemination of inappropriate and toxic content, including hate speech and insults. While significant efforts have been made to classify toxic content in the English language, the same level of attention has not been given to Arabic texts. This study addresses this gap by constructing a standardized Arabic dataset specifically designed for toxic tweet classification. The dataset is annotated automatically using Google’s Perspective API and the expertise of three native Arabic speakers and linguists. To evaluate the performance of different models, we conduct a series of experiments using seven models: long short-term memory (LSTM), bidirectional LSTM, a convolutional neural network, a gated recurrent unit (GRU), bidirectional GRU, multilingual bidirectional encoder representations from transformers, and AraBERT. Additionally, we employ word embedding techniques. Our experimental findings demonstrate that the fine-tuned AraBERT model surpasses the performance of other models, achieving an impressive accuracy of 0.9960. Notably, this accuracy value outperforms similar approaches reported in recent literature. This study represents a significant advancement in Arabic toxic tweet classification, shedding light on the importance of addressing toxicity in social media platforms while considering diverse languages and cultures.
Citations: 0