
International Journal of Data Science and Analytics: Latest Articles

A new discrete XLindley distribution: theory, actuarial measures, inference, and applications
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-05-05 DOI: 10.1007/s41060-023-00395-8
Ahmed Sedky Eldeeb, Muhammad Ahsan-ul-Haq, Ayesha Babar
Citations: 1
Machine learning with big data to solve real-world problems
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-04-23 DOI: 10.59615/jda.2.1.9
M. Rahmaty
Machine learning algorithms use big data to learn trends and predict them for businesses. Machine learning can be very efficient for deciphering data in industries where understanding consumer patterns can lead to big improvements. Adopting machine learning can be a giant leap for a business, and it cannot simply be added as a top layer. It requires redefining workflow, architecture, data collection and storage, analytics, and other modules. The magnitude of the system overhaul should be assessed and clearly communicated to the appropriate stakeholders. The main focus of machine learning is to develop computer programs that can access data and use it to learn. The learning process starts with observations or data, in order to find patterns in the data and make better decisions. The main goal of data analysis using machine learning is to let the computer learn automatically, without human intervention or help, and adjust its actions accordingly. Given the many real-world applications of data analysis, this article reviews the basic applications of machine learning as one of the tools of artificial intelligence, with an emphasis on big data analysis. The purpose of this article is to understand the dimensions, components, applications, and challenges of using machine learning in the real world.
Citations: 2
Concepts and applications of data mining and analysis of social networks
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-04-22 DOI: 10.59615/jda.2.1.1
Azam Hajiaghajani
Social media has become an important source of information over the last few decades. It has proved effective in fields such as business, entertainment, science, crisis management, and politics. For this reason, social media analysis has become very important for researchers and large companies. The widespread use of social media leads to a complex problem called "accumulation of data". Many data science specialists seek to analyze this data in order to identify the behavioral characteristics of users, analyze interests and needs, and improve marketing processes. Social media platforms can carry all kinds of media, including text, video, audio, and location information. Therefore, data analysis in social networks is very important. This research investigates the concepts and applications of data analysis in social networks.
Citations: 0
Statistical power, accuracy, reproducibility and robustness of a graph clusterability test
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-04-16 DOI: 10.1007/s41060-023-00389-6
P. Miasnikof, Alexander Y. Shestopaloff, A. Raigorodskii
Citations: 1
A data analytics framework for reliable bus arrival time prediction using artificial neural networks
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-04-12 DOI: 10.1007/s41060-023-00391-y
E. Hassannayebi, Ali Farjad, A. Azadnia, Mehrdad Javidi, R. Chunduri
Citations: 0
Evaluating narrative visualization: a survey of practitioners
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-03-31 DOI: 10.1007/s41060-023-00394-9
Nina Errey, Jie Liang, Tuck Wah Leong, Didar Zowghi

Narrative visualization is characterized by the integration of data visualization and storytelling techniques. These characteristics make it challenging to evaluate. Little is known about how narrative visualization practitioners address these evaluation challenges. We surveyed experienced narrative visualization practitioners to investigate their methods of evaluation. To gain deeper insight, we conducted a series of semi-structured interviews with practitioners. We found that narrative visualization evaluation is usually informal, with practitioners relying on prior experience and their peers. Our study also revealed novel approaches to evaluation. We introduce a practice-led heuristic framework to help practitioners evaluate narrative visualization systematically. The framework couples first-hand practitioner experience with recent research literature. This work sheds light on how to address narrative visualization evaluation to better inform both academic research and practice.

Citations: 2
Identity2Vec: learning mesoscopic structural identity representations via Poisson probability metric
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-03-23 DOI: 10.1007/s41060-023-00390-z
I. V. Oluigbo, H. Seba, Mohammed Haddad
Citations: 0
A data science approach to risk assessment for automobile insurance policies
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-03-22 DOI: 10.1007/s41060-023-00392-x
Patrick Hosein
To determine a suitable automobile insurance policy premium, one needs to take into account three factors: the risk associated with the drivers and cars on the policy, the operational costs associated with management of the policy, and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a data science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that will be made by a new customer using historical data of current and past policies. Given multiple features of the policy (age and gender of drivers, value of car, previous accidents, etc.), one can potentially provide personalized insurance policies based specifically on these features as follows. We can compute the average claims made per year for each past and current policy with identical features and then take an average over these claim rates. Unfortunately, there may not be sufficient samples to obtain a robust average. We can instead include policies that are “similar” to obtain sufficient samples for a robust average. We therefore face a trade-off between personalization (only using closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is known as the bias–variance trade-off. We model this problem and determine the optimal trade-off between the two (i.e., the balance that provides the highest prediction accuracy) and apply it to the claim rate prediction problem. We demonstrate our approach using real data.
Citations: 0
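The personalization-versus-robustness trade-off described in the automobile insurance abstract above can be illustrated with a small sketch. It is not the author's model: the abstract says the paper determines an optimal trade-off, whereas this example simply widens the set of "similar" policies until a fixed minimum sample count is reached. The feature encoding, the distance (number of differing features), the `min_samples` threshold, and all data values are hypothetical.

```python
from statistics import mean

def feature_distance(a, b):
    """Hypothetical similarity measure: count of features that differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def estimate_claim_rate(policies, target, min_samples=3, max_radius=3):
    """Average the per-year claim rates of policies similar to `target`.

    Radius 0 uses only policies with identical features (most personalized,
    fewest samples). The radius grows until `min_samples` policies are found
    (more robust, less personalized). Returns (estimate, radius, n_samples).
    """
    rates = []
    radius = 0
    for radius in range(max_radius + 1):
        rates = [p["claims"] / p["years"] for p in policies
                 if feature_distance(p["features"], target) <= radius]
        if len(rates) >= min_samples:
            break
    return (mean(rates) if rates else None), radius, len(rates)

# Made-up policies: (age band, gender, car value band), total claims, years.
policies = [
    {"features": ("30-39", "F", "mid"),  "claims": 200.0, "years": 2},
    {"features": ("30-39", "F", "high"), "claims": 300.0, "years": 1},
    {"features": ("30-39", "M", "mid"),  "claims": 0.0,   "years": 4},
    {"features": ("60-69", "M", "low"),  "claims": 900.0, "years": 3},
]
target = ("30-39", "F", "mid")

personalized = estimate_claim_rate(policies, target, min_samples=1)
robust = estimate_claim_rate(policies, target, min_samples=3)
```

With `min_samples=1` the estimate rests on the single identical policy; with `min_samples=3` the radius grows by one step and two near-matches are averaged in, which is the trade-off the abstract describes.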
Privacy preserving cold-start recommendation for out-of-matrix users via content baskets
IF 2.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-03-12 DOI: 10.1007/s41060-023-00388-7
Michael Sun, Andrew Wang
Citations: 1
Fake news detection: deep semantic representation with enhanced feature engineering
IF 3.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-03-09 DOI: 10.1007/s41060-023-00387-8
Mohammadreza Samadi, Saeedeh Momtazi

Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Unlike recent studies that focused only on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and help our neural classifier detect fake and real news more accurately than semantic features alone. To substantiate the effectiveness of feature engineering alongside semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and F1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and F1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and F1-score by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and F1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).

Citations: 0
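The architecture outlined in the fake news abstract above (three parallel CNN layers over contextual vectors, whose outputs are combined with content-based features and passed to a fully connected layer) can be sketched as a plain-Python forward pass. This is only an illustration of the data flow, not the authors' implementation: the kernel widths (2, 3, 4), the number of filters, the ReLU with max-over-time pooling, the sigmoid output, the three content features, and the random untrained weights are all assumptions, and the token vectors stand in for real contextual embeddings.

```python
import math
import random

def conv_branch(tokens, filters):
    """One CNN branch: slide each filter over the token vectors, apply ReLU,
    and keep the largest response per filter (max-over-time pooling, assumed)."""
    pooled = []
    for kernel in filters:                 # kernel holds width x dim weights
        width = len(kernel)
        best = 0.0                         # ReLU floor
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            response = sum(w * x
                           for k_row, t_row in zip(kernel, window)
                           for w, x in zip(k_row, t_row))
            best = max(best, response)
        pooled.append(best)
    return pooled

def make_filters(rng, n_filters, width, dim):
    """Random (untrained) filters, for illustration only."""
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
            for _ in range(n_filters)]

def classify(tokens, content_features, branches, fc_weights, fc_bias):
    """Concatenate semantic and content-based features, then apply a single
    fully connected unit with a sigmoid to get a score in (0, 1)."""
    semantic = [v for filters in branches for v in conv_branch(tokens, filters)]
    combined = semantic + list(content_features)
    logit = sum(w * x for w, x in zip(fc_weights, combined)) + fc_bias
    return 1.0 / (1.0 + math.exp(-logit)), combined

rng = random.Random(0)
DIM, N_FILTERS = 4, 2
# Stand-in for contextual representation vectors of a 6-token text.
tokens = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
# Three parallel branches with assumed kernel widths 2, 3 and 4.
branches = [make_filters(rng, N_FILTERS, width, DIM) for width in (2, 3, 4)]
# Hypothetical content-based features (e.g. scaled length, punctuation flags).
content_features = [0.12, 1.0, 0.0]
fc_weights = [rng.uniform(-1, 1)
              for _ in range(3 * N_FILTERS + len(content_features))]
score, combined = classify(tokens, content_features, branches, fc_weights, 0.0)
```

The point of the sketch is the concatenation step: the fully connected layer sees the pooled CNN outputs and the hand-engineered features side by side, which is what lets the content-based features complement the semantic ones.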