Big Data and Cognitive Computing: Latest Articles

Transfer Learning Approach to Seed Taxonomy: A Wild Plant Case Study
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-04 DOI: 10.3390/bdcc7030128
Nehad M. Ibrahim, D. Gabr, Atta Rahman, Dhiaa Musleh, Dania Alkhulaifi, Mariam Alkharraa
Plant taxonomy is the scientific study of the classification and naming of plant species. It is a branch of biology that aims to categorize and organize the diversity of plant life on Earth. Traditionally, plant taxonomy has relied on morphological and anatomical characteristics, such as leaf shape, flower structure, and seed and fruit characters. Artificial intelligence (AI), machine learning, and especially deep learning can also play an instrumental role in plant taxonomy by automating the categorization of plant species based on the available features. This study investigated transfer learning techniques to analyze images of plants and extract features that can be used to cluster the species hierarchically using the k-means clustering algorithm. Several pretrained deep learning models were employed and evaluated. Two separate datasets were used, comprising seed images of wild plants collected from Egypt. Extensive experiments using the transfer learning method (DenseNet201) demonstrated that the proposed methods achieved superior accuracy compared to traditional methods, with a highest accuracy of 93% and an F1-score and area under the curve (AUC) of 95% each. This compares favorably with the state-of-the-art approaches in the literature.
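The clustering step described in this abstract can be illustrated with a minimal sketch of flat k-means (Lloyd's algorithm). It is not the authors' pipeline: the two-dimensional vectors below are hypothetical stand-ins for the much higher-dimensional features a pretrained network such as DenseNet201 would produce, the evenly spaced centroid initialization is a simplification chosen for reproducibility, and the hierarchical grouping mentioned in the abstract is not modeled.

```python
import math


def kmeans(points, k, iters=20):
    """Flat k-means (Lloyd's algorithm) on a list of equal-length vectors.

    Initial centroids are taken at evenly spaced indices of `points`
    (an assumption made here so the run is deterministic).
    """
    centroids = [list(points[(i * len(points)) // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D "features" for six seed images from two species.
features = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.85],
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In practice the feature vectors would come from the penultimate layer of the pretrained model, and a library implementation with several random restarts would usually be preferred over this hand-rolled loop.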
Citations: 4
Arabic Sentiment Analysis of YouTube Comments: NLP-Based Machine Learning Approaches for Content Evaluation
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-03 DOI: 10.3390/bdcc7030127
Dhiaa Musleh, Ibrahim Alkhwaja, Ali Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Faisal Alfawaz, N. Min-Allah, M. M. Abdulqader
YouTube is a popular video-sharing platform that offers a diverse range of content. Assessing the quality of a video without watching it poses a significant challenge, especially considering the recent removal of the dislike count feature on YouTube. Although comments have the potential to provide insights into video content quality, navigating through the comments section can be time-consuming and overwhelming for both content creators and viewers. This paper proposes an NLP-based model to classify Arabic comments as positive or negative. It was trained on a novel dataset of 4212 labeled comments, with a Kappa score of 0.818. The model uses six classifiers: SVM, Naïve Bayes (NB), Logistic Regression, KNN, Decision Tree, and Random Forest. It achieved 94.62% accuracy and an MCC score of 91.46% with NB. Precision, Recall, and F1-measure for NB were 94.64%, 94.64%, and 94.62%, respectively. The Decision Tree had suboptimal performance, with 84.10% accuracy and an MCC score of 69.64% without TF-IDF. This study provides valuable insights for content creators to improve their content and audience engagement by analyzing viewers’ sentiments toward the videos. Furthermore, it bridges a literature gap by offering a comprehensive approach to Arabic sentiment analysis, an area in which existing work is currently limited.
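The evaluation metrics named in this abstract (accuracy, precision, recall, F1, and MCC) can all be derived from a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and are not taken from the paper, and the abstract does not say whether its precision and recall figures are per-class or averaged, so this covers only the single positive-class case.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, F1, and MCC for a binary classifier.

    tp/fp/fn/tn are the confusion-matrix counts for the positive class.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews correlation coefficient: ranges from -1 to +1 and uses
    # all four cells, which makes it robust to class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for 100 labeled comments (not the paper's data).
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(metrics)
```

Note that MCC is a coefficient in [-1, 1]; the abstract reports it as a percentage, which corresponds to multiplying the coefficient by 100.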
Citations: 0
Industrial Insights on Digital Twins in Manufacturing: Application Landscape, Current Practices, and Future Needs
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-29 DOI: 10.3390/bdcc7030126
R. D. D’Amico, S. Addepalli, J. Erkoyuncu
The digital twin (DT) research field is experiencing rapid expansion; yet industrial practices in this area remain poorly understood. This paper aims to address this knowledge gap by sharing feedback and future requirements from the manufacturing industry. The methodology employed in this study involves an examination of a survey that received 99 responses and interviews with 14 experts from 10 prominent UK organisations, most of which are involved in the defence industry. The survey and interviews explored topics such as DT design, return on investment, drivers, inhibitors, and future directions for DT development in manufacturing. This study’s findings indicate that DTs should possess characteristics such as adaptability, scalability, interoperability, and the ability to support assets throughout their entire life cycle. On average, completed DT projects reach the breakeven point in less than two years. The primary motivators behind DT development were identified as autonomy, customer satisfaction, safety, awareness, optimisation, and sustainability. Meanwhile, the main obstacles include a lack of expertise, funding, and interoperability. This study concludes that the federation of twins and a paradigm shift in industrial thinking are essential components for the future of DT development.
Citations: 0
Determining the Factors Influencing Business Analytics Adoption at Organizational Level: A Systematic Literature Review
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-28 DOI: 10.3390/bdcc7030125
O. Horani, A. Khatibi, A. Al-Soud, J. Tham, A. Al-Adwan
The adoption of business analytics (BA) has become increasingly important for organizations seeking to gain a competitive edge in today’s data-driven business landscape. Hence, understanding the key factors influencing the adoption of BA at the organizational level is decisive for the successful implementation of these technologies. This paper presents a systematic literature review that uses the PRISMA technique to investigate the organizational, technological, and environmental factors that affect the adoption of BA. By conducting a thorough examination of pertinent research, this review consolidates the current understanding and pinpoints essential elements that shape the process of adoption. From a total of 614 articles published between 2012 and 2022, 29 were selected for the final review. The findings highlight the significance of organizational, technological, and environmental factors in shaping the BA adoption process. By consolidating and analyzing the current body of research, this paper offers valuable insights for organizations aiming to adopt BA successfully and maximize its benefits at the organizational level. The synthesized findings also contribute to the existing literature and provide a foundation for future research in this field.
Citations: 4
A New Big Data Processing Framework for the Online Roadshow
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-27 DOI: 10.3390/bdcc7030123
Kang-Ren Leow, M. Leow, Lee-Yeng Ong
The Online Roadshow, a new type of web application, is a digital marketing approach that aims to maximize contactless business engagement. It leverages web computing to conduct interactive game sessions via the internet. As a result, massive amounts of personal data are generated during the engagement process between the audience and the Online Roadshow (e.g., gameplay data and clickstream information). The high volume of data collected is valuable for more effective market segmentation in strategic business planning through data-driven processes such as web personalization and trend evaluation. However, the data storage and processing techniques used in conventional data analytic approaches are typically overloaded in such a computing environment. Hence, this paper proposes a new big data processing framework to improve the processing, handling, and storing of these large amounts of data. The proposed framework aims to provide a better dual-mode solution for processing the data generated by the Online Roadshow engagement process in both historical and real-time scenarios. Multiple functional modules, such as the Application Controller, the Message Broker, the Data Processing Module, and the Data Storage Module, were reformulated to provide a more efficient solution that matches the new needs of the Online Roadshow data analytics procedures. Tests were conducted to compare the performance of the proposed framework against existing similar frameworks and to verify that it fulfils the data processing requirements of the Online Roadshow. The experimental results demonstrated multiple advantages of the proposed framework for the Online Roadshow compared to similar existing big data processing frameworks.
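The dual-mode idea in this abstract (one event stream feeding both a real-time path and a historical path through a message broker) can be sketched in a few lines. This is an in-memory toy under stated assumptions, not the framework from the paper: the class names `RealTimeCounter` and `HistoricalStore` and the event fields `user` and `type` are hypothetical, and a production system would use a real broker and persistent storage.

```python
from collections import Counter


class MessageBroker:
    """Fans each published event out to every subscribed handler."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class RealTimeCounter:
    """Real-time path: updates a running count per event type on arrival."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, event):
        self.counts[event["type"]] += 1


class HistoricalStore:
    """Historical path: keeps raw events so they can be queried in batch later."""

    def __init__(self):
        self.events = []

    def __call__(self, event):
        self.events.append(event)

    def count_by_user(self):
        return Counter(e["user"] for e in self.events)


broker = MessageBroker()
live = RealTimeCounter()
store = HistoricalStore()
broker.subscribe(live)
broker.subscribe(store)

# Hypothetical clickstream and gameplay events.
for event in [
    {"user": "u1", "type": "click"},
    {"user": "u2", "type": "game_score"},
    {"user": "u1", "type": "click"},
]:
    broker.publish(event)

print(live.counts["click"], store.count_by_user()["u1"])
```

Routing every event through one broker keeps the two paths decoupled: the real-time consumer can stay lightweight while the historical store retains full detail for later segmentation or trend analysis.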
Citations: 0
Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-27 DOI: 10.3390/bdcc7030124
Katherine Abramski, Salvatore Citraro, L. Lombardi, Giulio Rossetti, Massimo Stella
Large Language Models (LLMs) are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency with harmful effects is the global phenomenon of anxiety toward math and STEM subjects. In this study, we introduce a novel application of network science and cognitive psychology to understand biases towards math and STEM fields in OpenAI’s LLMs GPT-3, GPT-3.5, and GPT-4. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have negative perceptions of math and STEM fields, associating math with negative concepts in 6 cases out of 10. We observe significant differences across OpenAI’s models: newer versions (i.e., GPT-4) produce 5× semantically richer, more emotionally polarized perceptions with fewer negative associations compared to older versions and N=159 high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could perhaps someday even aid in reducing harmful stereotypes in society rather than perpetuating them.
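A figure such as "negative concepts in 6 cases out of 10" can be read as the share of a concept's network neighbors that carry a negative valence label. The sketch below illustrates that calculation on a toy association network; the word list, the valence labels, and the function name `negative_share` are all hypothetical and do not reproduce the paper's data or its exact BFMN procedure.

```python
def negative_share(network, valence, concept):
    """Fraction of a concept's associated words that are labeled negative.

    network: dict mapping a concept to the set of words associated with it
    valence: dict mapping a word to "positive", "negative", or "neutral"
    """
    neighbors = network.get(concept, set())
    if not neighbors:
        return 0.0
    negative = sum(1 for word in neighbors if valence.get(word) == "negative")
    return negative / len(neighbors)


# Hypothetical free associations for the cue word "math".
network = {"math": {"anxiety", "boring", "hard", "logic", "numbers"}}
valence = {
    "anxiety": "negative",
    "boring": "negative",
    "hard": "negative",
    "logic": "positive",
    "numbers": "neutral",
}
share = negative_share(network, valence, "math")
print(share)
```

Comparing this share across models, or against the same measure computed from human responses, is one simple way to express the kind of contrast the abstract reports.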
Citations: 2
The Value of Web Data Scraping: An Application to TripAdvisor
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-21 DOI: 10.3390/bdcc7030121
Gianluca Barbera, Luiz Araujo, Silvia Fernandes
Social Media Analytics (SMA) is increasingly relevant in today’s market dynamics. However, it must be used wisely, whether in promoting a product or brand or in interacting with customers. This requires effective understanding and monitoring. One way is through web data scraping (WDS) tools, which allow users to select sites and platforms and compare their performance. They can optimize the extraction of big data published on social media. Due to current challenges, a sector that can particularly take advantage of this source is tourism (and its related sectors). This year holds hope for tourism’s revival after a pandemic whose impacts are still affecting several activities. Many traders and entrepreneurs have already used these versatile tools. However, do they really know their potential? The present study highlights the use of WDS to collect data from TripAdvisor’s social pages. Besides comparing competitors’ performance, companies also gain new knowledge of unnoticed preferences and habits. This contributes to more interesting innovations and results for them and for their customers. The approach used here is based on a smart tourism consultancy project, which started from the identification of a gap in our region and aims to help tourism organizations enhance their digital presence and business model. Many things can be detected in this big source of unstructured data very quickly and easily without programming. Moreover, exploring code, either to refine the web scraper or to connect it with other platforms/apps, can be an object of future research to leverage consumer behavior prediction for more advanced interactions.
Citations: 1
Empowering Short Answer Grading: Integrating Transformer-Based Embeddings and BI-LSTM Network
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-21 DOI: 10.3390/bdcc7030122
Wael H. Gomaa, Abdelrahman E. Nagib, Mostafa M. Saeed, Abdulmohsen Algarni, Emad Nabil
Automated scoring systems have been revolutionized by natural language processing, enabling the evaluation of students’ diverse answers across various academic disciplines. However, this presents a challenge as students’ responses may vary significantly in terms of length, structure, and content. To tackle this challenge, this research introduces a novel automated model for short answer grading. The proposed model uses pretrained “transformer” models, specifically T5, in conjunction with a BI-LSTM architecture which is effective in processing sequential data by considering the past and future context. This research evaluated several preprocessing techniques and different hyperparameters to identify the most efficient architecture. Experiments were conducted using a standard benchmark dataset named the North Texas Dataset. This research achieved a state-of-the-art correlation value of 92.5 percent. The proposed model’s accuracy has significant implications for education as it has the potential to save educators considerable time and effort, while providing a reliable and fair evaluation for students, ultimately leading to improved learning outcomes.
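The abstract reports its result as a correlation between model scores and human grades. As a minimal sketch of how such a figure can be computed, the function below implements the Pearson correlation coefficient. The abstract does not say which correlation measure was used, so Pearson is an assumption here, and the function name `pearson` and its inputs are illustrative only.

```python
from math import sqrt


def pearson(predicted, reference):
    """Pearson correlation between two equal-length lists of scores.

    Assumption: the abstract does not name its correlation measure;
    Pearson is used here purely for illustration.
    """
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_p = sum(predicted) / len(predicted)
    mean_r = sum(reference) / len(reference)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    # Sums of squared deviations (unnormalised variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_p * var_r)
```

A value near 1 means the automated scores rise and fall with the human grades; multiplying by 100 gives the percentage form used in the abstract.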
Citations: 1
Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-14 DOI: 10.3390/bdcc7020119
Karima Khettabi, Zineddine Kouahla, Brahim Farou, Hamid Seridi, M. Ferrag
Internet of Things (IoT) systems include many smart devices that continuously generate massive spatio-temporal data, which can be difficult to process. These continuous data streams need to be stored smartly so that query searches are efficient. In this work, we propose an efficient method, in the fog-cloud computing architecture, to index continuous and heterogeneous data streams in metric space. This method divides the fog layer into three levels: clustering, clusters processing and indexing. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group the data from each stream into homogeneous clusters at the clustering fog level. Each cluster in the first data stream is stored in the clusters processing fog level and indexed directly in the indexing fog level in a Binary tree with Hyperplane (BH tree). The indexing of clusters in the subsequent data stream is determined by the coefficient of variation (CV) value of the union of the new cluster with the existing clusters in the cluster processing fog layer. An analysis and comparison of our experimental results with other results in the literature demonstrated the effectiveness of the CV method in reducing energy consumption during BH tree construction, as well as reducing the search time and energy consumption during a k Nearest Neighbor (kNN) parallel query search.
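The coefficient of variation (CV) step described above can be sketched roughly as follows. The CV is the standard deviation divided by the mean. Everything else in this sketch is an assumption made for illustration and is not taken from the paper: the one-dimensional cluster values, the `threshold` parameter, the function names, and the rule "attach the new cluster to the existing cluster whose union has the lowest CV, otherwise treat it as a new cluster".

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)


def choose_cluster(new_cluster, existing_clusters, threshold=0.2):
    """Pick the existing cluster that the new cluster fits best.

    Hypothetical rule (not from the paper): compute the CV of the union of
    the new cluster with each existing cluster, take the lowest, and accept
    it only if it does not exceed `threshold`. Returns the index of the
    chosen cluster, or None when the new cluster should stand alone.
    """
    best_index, best_cv = None, None
    for index, cluster in enumerate(existing_clusters):
        cv = coefficient_of_variation(list(cluster) + list(new_cluster))
        if best_cv is None or cv < best_cv:
            best_index, best_cv = index, cv
    if best_cv is not None and best_cv <= threshold:
        return best_index
    return None
```

A low CV for the union suggests the new cluster is homogeneous with an existing one, so it could reuse that cluster's place in the index instead of triggering new tree construction.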
Citations: 0
YOLO-v5 Variant Selection Algorithm Coupled with Representative Augmentations for Modelling Production-Based Variance in Automated Lightweight Pallet Racking Inspection
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-06-14 DOI: 10.3390/bdcc7020120
Muhammad Hussain
The aim of this research is to develop an automated pallet inspection architecture with two key objectives: high performance with respect to defect classification and computational efficacy, i.e., lightweight footprint. As automated pallet racking via machine vision is a developing field, the procurement of racking datasets can be a difficult task. Therefore, the first contribution of this study was the proposal of several tailored augmentations that were generated based on modelling production floor conditions/variances within warehouses. Secondly, the variant selection algorithm was proposed, starting with extreme-end analysis and providing a protocol for selecting the optimal architecture with respect to accuracy and computational efficiency. The proposed YOLO-v5n architecture generated the highest MAP@0.5 of 96.8% compared to previous works in the racking domain, with a computational footprint in terms of the number of parameters at its lowest, i.e., 1.9 M compared to YOLO-v5x at 86.7 M.
Citations: 0