
Frontiers in Big Data: Latest Articles

Benchmarking open source and paid services for speech to text: an analysis of quality and input variety.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-09-20 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1210559
Antonino Ferraro, Antonio Galli, Valerio La Gatta, Marco Postiglione

Introduction: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services.

Methods: In this paper, we conduct a benchmarking study of open source and paid STT services, with a specific focus on assessing their performance with respect to the variety of input text. We utilize six datasets obtained from diverse sources, including interviews, lectures, and speeches, as input for the STT tools. The tools are evaluated using the Word Error Rate (WER), a standard metric for STT evaluation.
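
As an illustration of the metric named above, the following is a minimal sketch of a WER computation: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an STT hypothesis, divided by the number of reference words. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for this example; the abstract does not describe the normalization the authors applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization (lowercasing, whitespace split) is an assumption of this
    sketch, not something stated in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; we keep only one previous row to save memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is worth keeping in mind when comparing tools on short utterances.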

Results: Our analysis of the results demonstrates significant variations in the performance of the STT tools based on the input text. Certain tools exhibit superior performance on specific types of audio samples compared to others. Our study provides insights into STT tool performance when handling substantial data volumes, as well as the challenges and opportunities posed by the multimedia nature of the data.

Discussion: Although paid services generally demonstrate better accuracy and speed compared to open source alternatives, their performance remains dependent on the input text. The study highlights the need for considering specific requirements and characteristics of the audio samples when selecting an appropriate STT tool.

Citations: 0
An intelligent telemonitoring application for coronavirus patients: reCOVeryaID.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-09-18 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1205766
Daniela D'Auria, Raffaele Russo, Alfonso Fedele, Federica Addabbo, Diego Calvanese

The COVID-19 emergency underscored the importance of resolving crucial issues of territorial health monitoring, such as overloaded phone lines, doctors exposed to infection, and chronically ill patients unable to access hospitals. People often called doctors or hospitals purely out of anxiety, not realizing that they were clogging up communications and thus causing problems for those who needed them most; such people, often elderly, frequently felt lonely and abandoned by the health care system because of poor telemedicine. In addition, doctors were unable to follow up on the most serious cases or make sure that others did not worsen. Thus, during the first pandemic wave, we had the idea to design a system that could help people alleviate their fears and be constantly monitored by doctors both in hospitals and at home; consequently, we developed reCOVeryaID, a telemonitoring application for coronavirus patients. It is an autonomous application supported by a knowledge base that can react promptly and inform medical doctors if dangerous trends in the patient's short- and long-term vital signs are detected. In this paper, we also validate the knowledge-base rules in real-world settings by testing them on data from real patients infected with COVID-19.

Citations: 0
Authentication, access, and monitoring system for critical areas with the use of artificial intelligence integrated into perimeter security in a data center.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-08-31 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1200390
William Villegas-Ch, Joselin García-Ortiz

Perimeter security in data centers helps protect systems and the data they store by preventing unauthorized access and protecting critical resources from potential threats. According to a report by the information security company SonicWall, the number of ransomware attacks increased by 66% in 2021. In addition, the same company reports that the total number of cyber threats detected in 2021 increased by 24% compared to 2019. Among these attacks, the infrastructure of data centers was compromised; for this reason, organizations include physical elements such as security cameras, movement detection systems, and authentication systems as an additional measure that contributes to perimeter security. This work proposes using artificial intelligence in the perimeter security of data centers. It allows the automation and optimization of security processes, which translates into greater efficiency and reliability in the operations that prevent intrusions through authentication, permit verification, and monitoring of critical areas. It is crucial to ensure that AI-based perimeter security systems are designed to protect and respect user privacy. In addition, it is essential to regularly monitor the effectiveness and integrity of these systems to ensure that they function correctly and meet security standards.

Citations: 0
Evaluation of methods for assigning causes of death from verbal autopsies in India.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-08-24 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1197471
Sudhir K Benara, Saurabh Sharma, Atul Juneja, Saritha Nair, B K Gulati, Kh Jitenkumar Singh, Lucky Singh, Ved Prakash Yadav, Chalapati Rao, M Vishnu Vardhana Rao

Background: Physician-coded verbal autopsy (PCVA) is the most widely used method to determine causes of death (COD) in countries where medical certification of death is low. Computer-coded verbal autopsy (CCVA), an alternative method to PCVA for assigning the COD, is considered to be efficient and cost-effective. However, the performance of CCVA as compared to PCVA is yet to be established in the Indian context.

Methods: We evaluated the performance of PCVA and three CCVA methods (InterVA 5, InSilico, and Tariff 2.0) on verbal autopsies done using the WHO 2016 VA tool on 2,120 reference standard cases developed from five tertiary care hospitals of Delhi. PCVA methodology involved dual independent review with adjudication, where required. Metrics to assess performance were Cause Specific Mortality Fraction (CSMF), sensitivity, positive predictive value (PPV), CSMF Accuracy, and Kappa statistic.
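
To make two of the metrics listed above concrete, here is a minimal sketch of CSMF Accuracy and Cohen's Kappa computed from paired lists of reference and assigned causes. It assumes the commonly used definition CSMF Accuracy = 1 - sum|CSMF_true - CSMF_pred| / (2 * (1 - min CSMF_true)); the abstract does not spell out the formula the authors used, and the function names and toy labels below are illustrative only.

```python
from collections import Counter


def csmf(labels, causes):
    """Cause-specific mortality fractions: share of deaths assigned to each cause."""
    counts = Counter(labels)
    return {cause: counts.get(cause, 0) / len(labels) for cause in causes}


def csmf_accuracy(true_labels, pred_labels):
    """1 - total absolute CSMF error, scaled by the worst possible error."""
    causes = sorted(set(true_labels) | set(pred_labels))
    true_frac = csmf(true_labels, causes)
    pred_frac = csmf(pred_labels, causes)
    total_error = sum(abs(true_frac[c] - pred_frac[c]) for c in causes)
    worst_case = 2 * (1 - min(true_frac.values()))
    if worst_case == 0:
        raise ValueError("CSMF accuracy is undefined for a single-cause reference set")
    return 1 - total_error / worst_case


def cohen_kappa(true_labels, pred_labels):
    """Chance-corrected agreement between two sets of cause assignments."""
    n = len(true_labels)
    observed = sum(t == p for t, p in zip(true_labels, pred_labels)) / n
    true_counts, pred_counts = Counter(true_labels), Counter(pred_labels)
    expected = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels, not data from the study.
    reference = ["a", "a", "b", "b"]
    assigned = ["a", "a", "a", "b"]
    print(csmf_accuracy(reference, assigned), cohen_kappa(reference, assigned))
```

CSMF Accuracy is a population-level measure, so a method can score well on it while still misclassifying individual deaths; Kappa and per-cause sensitivity capture the individual-level agreement.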

Results: In terms of the measures of the overall performance of COD assignment methods, for CSMF Accuracy, the PCVA method achieved the highest score of 0.79, followed by 0.67 for Tariff_2.0, 0.66 for Inter-VA and 0.62 for InSilicoVA. The PCVA method also achieved the highest agreement (57%) and Kappa scores (0.54). The PCVA method showed the highest sensitivity for 15 out of 20 causes of death.

Conclusion: Our study found that the PCVA method had the best performance out of all the four COD assignment methods that were tested in our study sample. In order to improve the performance of CCVA methods, multicentric studies with larger sample sizes need to be conducted using the WHO VA tool.

Citations: 1
Do you hear the people sing? Comparison of synchronized URL and narrative themes in 2020 and 2023 French protests.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-08-24 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1221744
Lynnette Hui Xian Ng, Kathleen M Carley

Introduction: France has seen two key protests within the term of President Emmanuel Macron: one in 2020 against Islamophobia, and another in 2023 against the pension reform. During these protests, there is much chatter on online social media platforms like Twitter.

Methods: In this study, we aim to analyze the differences between the online chatter of the two years through a network-centric view, and in particular the synchrony of users. This study begins by identifying groups of accounts that work together through two methods: temporal synchronicity and narrative similarity. We also apply a bot detection algorithm to identify bots within these networks and analyze the extent of inorganic synchronization within the discourse of these events.
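
One way to picture temporal synchronicity is as a weighted account-to-account network in which two accounts are linked each time they share the same URL within a short time window. The sketch below follows that reading. The function name `synchronized_pairs`, the `(account, url, timestamp)` input shape, and the 300-second window are all assumptions made for illustration; the abstract does not state the authors' exact procedure or thresholds.

```python
from collections import defaultdict
from itertools import combinations


def synchronized_pairs(posts, window_seconds=300):
    """Count, for each pair of accounts, how often both shared the same URL
    within `window_seconds` of each other.

    posts: iterable of (account, url, timestamp_in_seconds) tuples.
    Returns a dict mapping a sorted (account_a, account_b) pair to a weight.
    """
    events_by_url = defaultdict(list)
    for account, url, timestamp in posts:
        events_by_url[url].append((timestamp, account))

    weights = defaultdict(int)
    for events in events_by_url.values():
        # Compare every pair of posts of the same URL.
        for (t1, a1), (t2, a2) in combinations(events, 2):
            if a1 != a2 and abs(t2 - t1) <= window_seconds:
                weights[tuple(sorted((a1, a2)))] += 1
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical posts, not data from the study.
    posts = [
        ("u1", "x", 0), ("u2", "x", 100), ("u3", "x", 1000),
        ("u1", "y", 50), ("u2", "y", 60),
    ]
    print(synchronized_pairs(posts))
```

Pairs with high weights are candidates for coordinated behavior; the resulting edge list could then be compared across the two years or intersected with bot-detection labels.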

Results: Overall, our findings suggest that the synchrony of users on Twitter was much higher in 2020 than in 2023, and that there was more bot activity in 2020 than in 2023.

Citations: 0
Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-08-23 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1224976
Casey Watters, Michal K Lemanski

ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as development and usage of generative AI.

Citations: 0
Nonstationary time series forecasting using optimized-EVDHM-ARIMA for COVID-19.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-06-14 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1081639
Suraj Singh Nagvanshi, Inderjeet Kaur, Charu Agarwal, Ashish Sharma

The Coronavirus (COVID-19) outbreak swept the world, infected millions of people, and caused many deaths. Multiple COVID-19 variants have been discovered since the initial case in December 2019, indicating that COVID-19 is highly mutable. COVID-19 variant "XE" is the most current of all COVID-19 variants found in January 2022. It is vital to detect the virus transmission rate and forecast instances of infection to be prepared for all scenarios, prepare healthcare services, and avoid deaths. Time-series forecasting helps predict future infected cases and determine the virus transmission rate to make timely decisions. A forecasting model for nonstationary time series has been created in this paper. The model comprises an optimized EigenValue Decomposition of Hankel Matrix (EVDHM) and an optimized AutoRegressive Integrated Moving Average (ARIMA). The Phillips Perron Test (PPT) has been used to determine whether a time series is nonstationary. A time series has been decomposed into components using EVDHM, and each component has been forecasted using ARIMA. The final forecasts have been formed by combining the predicted values of each component. A Genetic Algorithm (GA) has been used to select the ARIMA parameters that yield the lowest Akaike Information Criterion (AIC) value. Another genetic algorithm has been used to optimize the decomposition results of EVDHM so as to ensure minimum nonstationarity and maximal utilization of eigenvalues for each decomposed component.
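
The decomposition described above starts by embedding the series in a Hankel matrix and ends by mapping each component matrix back to a series. The sketch below shows only those two bookkeeping steps in plain Python, assuming a square embedding of a series of odd length 2N - 1 and anti-diagonal averaging for the reverse mapping. The function names are illustrative, and the eigenvalue decomposition, ARIMA fitting, and genetic-algorithm tuning from the abstract are deliberately left out.

```python
def hankel_matrix(series):
    """Square Hankel embedding: H[i][j] = series[i + j].

    Assumes an odd-length series (2N - 1 samples gives an N x N matrix).
    """
    if len(series) % 2 == 0:
        raise ValueError("series length must be odd (2N - 1)")
    n = (len(series) + 1) // 2
    return [[series[i + j] for j in range(n)] for i in range(n)]


def antidiagonal_average(matrix):
    """Map an N x N matrix back to a series of length 2N - 1 by averaging
    the entries on each anti-diagonal (all cells with the same i + j)."""
    n = len(matrix)
    sums = [0.0] * (2 * n - 1)
    counts = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [total / count for total, count in zip(sums, counts)]


if __name__ == "__main__":
    # Toy series, not COVID-19 data.
    series = [1.0, 2.0, 4.0, 8.0, 16.0]
    H = hankel_matrix(series)
    print(H)
    print(antidiagonal_average(H))  # recovers the original series
```

In a fuller pipeline, an eigenvalue decomposition of H would sit between these two steps (for a real series H is symmetric, so a routine such as `numpy.linalg.eigh` would apply), each retained eigen-component would be averaged back into a series as above, and each resulting series would be forecast separately before the forecasts are summed.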

Citations: 0
Crime, inequality and public health: a survey of emerging trends in urban data science.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-05-25 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1124526
Massimiliano Luca, Gian Maria Campedelli, Simone Centellegher, Michele Tizzoni, Bruno Lepri

Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges in sustainable urban development well summarized in the United Nations' Sustainable Development Goals (SDGs). The advent of the digital age generated by modern alternative data sources provides new tools to tackle these challenges with spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale.

Citations: 0
Fused multi-modal similarity network as prior in guiding brain imaging genetic association.
IF 3.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-05-05 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1151893
Bing He, Linhui Xie, Pradeep Varathan, Kwangsik Nho, Shannon L Risacher, Andrew J Saykin, Jingwen Yan

Introduction: Brain imaging genetics aims to explore the genetic architecture underlying brain structure and functions. Recent studies showed that the incorporation of prior knowledge, such as subject diagnosis information and brain regional correlation, can help identify significantly stronger imaging genetic associations. However, sometimes such information may be incomplete or even unavailable.

Methods: In this study, we explore a new data-driven form of prior knowledge that captures subject-level similarity by fusing multi-modal similarity networks. It was incorporated into the sparse canonical correlation analysis (SCCA) model, which aims to identify a small set of brain imaging and genetic markers that explain the similarity matrix supported by both modalities. The model was applied separately to amyloid and tau imaging data from the ADNI cohort.

Results: The fused similarity matrix across imaging and genetic data was found to improve association performance as well as or better than diagnosis information, and could therefore serve as a substitute prior when diagnosis information is not available (i.e., in studies focused on healthy controls).

Discussion: Our results confirmed the value of all types of prior knowledge in improving association identification. In addition, the fused network representing the subject relationship supported by multi-modal data consistently showed the best, or tied for the best, performance compared to the diagnosis network and the co-expression network.

Citations: 0
Human behavior in the time of COVID-19: Learning from big data.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-04-06 eCollection Date: 2023-01-01 DOI: 10.3389/fdata.2023.1099182
Hanjia Lyu, Arsal Imtiaz, Yufei Zhao, Jiebo Luo

Since the World Health Organization (WHO) characterized COVID-19 as a pandemic in March 2020, there have been over 600 million confirmed cases of COVID-19 and more than six million deaths as of October 2022. The relationship between the COVID-19 pandemic and human behavior is complicated. On the one hand, human behavior is found to shape the spread of the disease. On the other hand, the pandemic has impacted and even changed human behavior in almost every aspect. To provide a holistic understanding of the complex interplay between human behavior and the COVID-19 pandemic, researchers have been employing big data techniques such as natural language processing, computer vision, audio signal processing, frequent pattern mining, and machine learning. In this study, we present an overview of the existing studies on using big data techniques to study human behavior in the time of the COVID-19 pandemic. In particular, we categorize these studies into three groups: those using big data to measure, model, and leverage human behavior, respectively. The related tasks, data, and methods are summarized accordingly. To provide more insights into how to fight the COVID-19 pandemic and future global catastrophes, we further discuss challenges and potential opportunities.

Citations: 0