Pub Date: 2023-09-20 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1210559
Antonino Ferraro, Antonio Galli, Valerio La Gatta, Marco Postiglione
Introduction: Speech to text (STT) technology has seen increased usage in recent years for automating transcription of spoken language. To choose the most suitable tool for a given task, it is essential to evaluate the performance and quality of both open source and paid STT services.
Methods: In this paper, we conduct a benchmarking study of open source and paid STT services, with a specific focus on assessing their performance across varied input. We utilize six datasets obtained from diverse sources, including interviews, lectures, and speeches, as input for the STT tools. The tools are evaluated using the Word Error Rate (WER), a standard metric for STT evaluation.
Results: Our analysis of the results demonstrates significant variations in the performance of the STT tools based on the input text. Certain tools exhibit superior performance on specific types of audio samples compared to others. Our study provides insights into STT tool performance when handling substantial data volumes, as well as the challenges and opportunities posed by the multimedia nature of the data.
Discussion: Although paid services generally demonstrate better accuracy and speed compared to open source alternatives, their performance remains dependent on the input text. The study highlights the need for considering specific requirements and characteristics of the audio samples when selecting an appropriate STT tool.
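As a concrete illustration of the Word Error Rate used in this benchmark, here is a minimal, self-contained sketch of the standard metric (an illustration only, not the authors' evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a six-word reference transcribed with two words dropped yields a WER of 2/6.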
Title: Benchmarking open source and paid services for speech to text: an analysis of quality and input variety. (Frontiers in Big Data, vol. 6, article 1210559; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10548127/pdf/)
Pub Date: 2023-09-18 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1205766
Daniela D'Auria, Raffaele Russo, Alfonso Fedele, Federica Addabbo, Diego Calvanese
The COVID-19 emergency underscored the importance of resolving crucial issues in territorial health monitoring, such as overloaded phone lines, doctors exposed to infection, and chronically ill patients unable to access hospitals. People would often call doctors or hospitals out of sheer anxiety, not realizing that they were clogging up communications and causing problems for those who needed them most; such callers, often elderly, felt lonely and abandoned by the health care system because of poor telemedicine. In addition, doctors were unable to follow up on the most serious cases or ensure that others did not worsen. Thus, during the first pandemic wave, we had the idea to design a system that could help people alleviate their fears while being constantly monitored by doctors both in hospitals and at home; consequently, we developed reCOVeryaID, a telemonitoring application for coronavirus patients. It is an autonomous application supported by a knowledge base that can react promptly and inform medical doctors if dangerous trends in the patient's short- and long-term vital signs are detected. In this paper, we also validate the knowledge-base rules in real-world settings by testing them on data from real patients infected with COVID-19.
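The abstract does not publish the reCOVeryaID rule set; as a purely hypothetical sketch of how a knowledge-base rule over short-term vital-sign trends might look (the thresholds, function name, and alert strings are invented for illustration):

```python
def check_vitals(spo2_readings, hr_readings):
    """Return alerts for dangerous vital-sign values or trends.

    Illustrative thresholds only -- NOT the reCOVeryaID knowledge base.
    """
    alerts = []
    # Rule 1: any oxygen saturation below 92% warrants attention.
    if spo2_readings and min(spo2_readings) < 92:
        alerts.append("low SpO2: notify physician")
    # Rule 2: a short-term trend of strictly falling SpO2 across 3+ readings.
    falling = all(b < a for a, b in zip(spo2_readings, spo2_readings[1:]))
    if len(spo2_readings) >= 3 and falling:
        alerts.append("falling SpO2 trend")
    # Rule 3: sustained heart rate above 120 bpm.
    if hr_readings and max(hr_readings) > 120:
        alerts.append("tachycardia")
    return alerts
```

A rule engine like this can run autonomously on each new reading and only escalate to a doctor when a rule fires.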
Title: An intelligent telemonitoring application for coronavirus patients: reCOVeryaID. (Frontiers in Big Data, vol. 6, article 1205766; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10543687/pdf/)
Pub Date: 2023-08-31 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1200390
William Villegas-Ch, Joselin García-Ortiz
Perimeter security in data centers helps protect systems and the data they store by preventing unauthorized access and shielding critical resources from potential threats. According to a report by the information security company SonicWall, ransomware attacks increased by 66% in 2021, and the total number of cyber threats detected in 2021 rose by 24% compared to 2019. Among these attacks, data center infrastructure was compromised; for this reason, organizations add physical elements such as security cameras, motion detection systems, and authentication systems as additional measures that contribute to perimeter security. This work proposes using artificial intelligence in the perimeter security of data centers. It allows the automation and optimization of security processes, which translates into greater efficiency and reliability in operations that prevent intrusions through authentication, permit verification, and monitoring of critical areas. It is crucial to ensure that AI-based perimeter security systems are designed to protect and respect user privacy. In addition, it is essential to regularly monitor the effectiveness and integrity of these systems to ensure that they function correctly and meet security standards.
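The authenticate-then-verify-permission pattern described here can be made concrete with a toy sketch (the HMAC badge scheme, key, and area names are invented for the example, not taken from the paper):

```python
import hashlib
import hmac

def verify_badge(badge_id, presented_code, secret_key, permissions, area):
    """Two-step access check: authenticate the badge, then verify area permission.

    Hypothetical scheme for illustration: each badge carries an HMAC of its ID
    under a site secret; `permissions` maps badge IDs to allowed critical areas.
    """
    expected = hmac.new(secret_key, badge_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking information via timing.
    if not hmac.compare_digest(expected, presented_code):
        return "denied: authentication failed"
    if area not in permissions.get(badge_id, set()):
        return "denied: no permission for area"
    return "granted"
```

An AI layer would sit on top of checks like this, e.g. flagging anomalous sequences of denied attempts for monitoring.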
Title: Authentication, access, and monitoring system for critical areas with the use of artificial intelligence integrated into perimeter security in a data center. (Frontiers in Big Data, vol. 6, article 1200390; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500307/pdf/)
Pub Date: 2023-08-24 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1197471
Sudhir K Benara, Saurabh Sharma, Atul Juneja, Saritha Nair, B K Gulati, Kh Jitenkumar Singh, Lucky Singh, Ved Prakash Yadav, Chalapati Rao, M Vishnu Vardhana Rao
Background: Physician-coded verbal autopsy (PCVA) is the most widely used method to determine causes of death (COD) in countries where medical certification of death is low. Computer-coded verbal autopsy (CCVA), an alternative to PCVA for assigning the COD, is considered efficient and cost-effective. However, the performance of CCVA relative to PCVA is yet to be established in the Indian context.
Methods: We evaluated the performance of PCVA and three CCVA methods (InterVA 5, InSilicoVA, and Tariff 2.0) on verbal autopsies conducted using the WHO 2016 VA tool on 2,120 reference standard cases developed from five tertiary care hospitals in Delhi. The PCVA methodology involved dual independent review with adjudication where required. Performance metrics were Cause Specific Mortality Fraction (CSMF), sensitivity, positive predictive value (PPV), CSMF Accuracy, and the Kappa statistic.
Results: In terms of overall performance of the COD assignment methods, PCVA achieved the highest CSMF Accuracy score of 0.79, followed by 0.67 for Tariff 2.0, 0.66 for InterVA 5, and 0.62 for InSilicoVA. PCVA also achieved the highest agreement (57%) and Kappa score (0.54), and showed the highest sensitivity for 15 out of 20 causes of death.
Conclusion: Our study found that the PCVA method had the best performance out of all the four COD assignment methods that were tested in our study sample. In order to improve the performance of CCVA methods, multicentric studies with larger sample sizes need to be conducted using the WHO VA tool.
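CSMF Accuracy, the headline metric above, compares predicted and true cause-specific mortality fractions: one minus the total absolute CSMF error, normalized by its maximum possible value. A minimal sketch of the standard formula (an illustration, not the study's analysis code):

```python
def csmf_accuracy(true_csmf: dict, pred_csmf: dict) -> float:
    """CSMF Accuracy = 1 - sum|pred - true| / (2 * (1 - min(true))).

    `true_csmf` and `pred_csmf` map cause names to mortality fractions
    (each should sum to 1). A score of 1.0 means a perfect match.
    """
    causes = set(true_csmf) | set(pred_csmf)
    total_abs_err = sum(abs(true_csmf.get(c, 0.0) - pred_csmf.get(c, 0.0))
                        for c in causes)
    return 1.0 - total_abs_err / (2.0 * (1.0 - min(true_csmf.values())))
```

The denominator rescales the error so that the worst possible prediction scores 0 regardless of how many causes there are.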
Title: Evaluation of methods for assigning causes of death from verbal autopsies in India. (Frontiers in Big Data, vol. 6, article 1197471; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10483407/pdf/)
Pub Date: 2023-08-24 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1221744
Lynnette Hui Xian Ng, Kathleen M Carley
Introduction: France has seen two key protests within the term of President Emmanuel Macron: one in 2020 against Islamophobia, and another in 2023 against the pension reform. During these protests, there is much chatter on online social media platforms like Twitter.
Methods: In this study, we aim to analyze the differences between the online chatter of the two years through a network-centric view, and in particular the synchrony of users. We begin by identifying groups of accounts that work together through two methods: temporal synchronicity and narrative similarity. We also apply a bot detection algorithm to identify bots within these networks and analyze the extent of inorganic synchronization within the discourse of these events.
Results: Overall, our findings suggest that user synchrony on Twitter was much higher in 2020 than in 2023, and that there was more bot activity in 2020 than in 2023.
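Temporal synchronicity can be made concrete with a toy sketch: treat each user's posting times as a set of discrete time bins and flag pairs of users whose active bins overlap strongly. The bin size, Jaccard measure, and threshold below are assumptions for illustration, not the authors' method:

```python
from itertools import combinations

def time_bins(timestamps, bin_size=300):
    """Map posting timestamps (seconds) to coarse 5-minute activity bins."""
    return {int(t // bin_size) for t in timestamps}

def synchrony(user_posts, threshold=0.5, bin_size=300):
    """Return pairs of users whose activity bins overlap above a Jaccard threshold.

    `user_posts` maps a user ID to a list of posting timestamps in seconds.
    """
    bins = {u: time_bins(ts, bin_size) for u, ts in user_posts.items()}
    pairs = []
    for u, v in combinations(sorted(user_posts), 2):
        inter = bins[u] & bins[v]
        union = bins[u] | bins[v]
        if union and len(inter) / len(union) >= threshold:
            pairs.append((u, v))
    return pairs
```

Accounts that repeatedly land in the same pairs across many windows are candidates for coordinated (possibly inorganic) behavior.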
Title: Do you hear the people sing? Comparison of synchronized URL and narrative themes in 2020 and 2023 French protests. (Frontiers in Big Data, vol. 6, article 1221744; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10483998/pdf/)
Pub Date: 2023-08-23 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1224976
Casey Watters, Michal K Lemanski
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as development and usage of generative AI.
Title: Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. (Frontiers in Big Data, vol. 6, article 1224976; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10482048/pdf/)
Pub Date: 2023-06-14 | DOI: 10.3389/fdata.2023.1081639
Suraj Singh Nagvanshi, Inderjeet Kaur, Charu Agarwal, Ashish Sharma
The Coronavirus (COVID-19) outbreak swept the world, infected millions of people, and caused many deaths. Multiple COVID-19 variants have been discovered since the initial case in December 2019, indicating that COVID-19 is highly mutable; the "XE" variant, identified in January 2022, is the most recent of these. It is vital to detect the virus transmission rate and forecast instances of infection in order to be prepared for all scenarios, ready healthcare services, and avoid deaths. Time-series forecasting helps predict future infected cases and determine the virus transmission rate so that timely decisions can be made. This paper presents a forecasting model for nonstationary time series. The model comprises an optimized EigenValue Decomposition of Hankel Matrix (EVDHM) and an optimized AutoRegressive Integrated Moving Average (ARIMA) model. The Phillips-Perron Test (PPT) has been used to determine whether a time series is nonstationary. A time series is decomposed into components using EVDHM, each component is forecasted using ARIMA, and the final forecast is formed by combining the predicted values of each component. A Genetic Algorithm (GA) has been used to discover the best ARIMA parameters, selecting those that yield the lowest Akaike Information Criterion (AIC). Another genetic algorithm optimizes the EVDHM decomposition, ensuring minimal nonstationarity and maximal utilization of eigenvalues for each decomposed component.
Title: Nonstationary time series forecasting using optimized-EVDHM-ARIMA for COVID-19. (Frontiers in Big Data, vol. 6, article 1081639; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10303915/pdf/)
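Two building blocks named in the abstract above can be sketched compactly: the Hankel (trajectory) matrix on whose eigenstructure EVDHM operates, and the AIC that serves as the genetic algorithm's fitness function. This is a simplified illustration under those definitions, not the authors' implementation:

```python
def hankel_matrix(series, window):
    """Trajectory (Hankel) matrix: row i is the window of `series` starting at i.

    Anti-diagonals are constant; EVDHM decomposes a series by analyzing the
    eigenvalues of a matrix built this way.
    """
    return [series[i:i + window] for i in range(len(series) - window + 1)]

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better.

    A GA searching over ARIMA orders (p, d, q) can use this as its fitness,
    penalizing extra parameters k against goodness of fit.
    """
    return 2 * n_params - 2 * log_likelihood
```

In the paper's pipeline, each candidate (p, d, q) would be fitted to a decomposed component and scored by `aic`; the GA keeps the lowest-scoring candidates.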
Pub Date: 2023-05-25 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1124526
Massimiliano Luca, Gian Maria Campedelli, Simone Centellegher, Michele Tizzoni, Bruno Lepri
Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges to sustainable urban development, well summarized in the United Nations' Sustainable Development Goals (SDGs). The digital age, through modern alternative data sources, provides new tools to tackle these challenges at spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale.
Title: Crime, inequality and public health: a survey of emerging trends in urban data science. (Frontiers in Big Data, vol. 6, article 1124526; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10248183/pdf/)
Pub Date: 2023-05-05 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1151893
Bing He, Linhui Xie, Pradeep Varathan, Kwangsik Nho, Shannon L Risacher, Andrew J Saykin, Jingwen Yan
Introduction: Brain imaging genetics aims to explore the genetic architecture underlying brain structure and functions. Recent studies showed that the incorporation of prior knowledge, such as subject diagnosis information and brain regional correlation, can help identify significantly stronger imaging genetic associations. However, sometimes such information may be incomplete or even unavailable.
Methods: In this study, we explore a new data-driven prior that captures subject-level similarity by fusing multi-modal similarity networks. It was incorporated into the sparse canonical correlation analysis (SCCA) model, which aims to identify a small set of brain imaging and genetic markers that explain the similarity matrix supported by both modalities. The model was applied separately to amyloid and tau imaging data from the ADNI cohort.
Results: The similarity matrix fused across imaging and genetic data was found to improve association performance as well as or better than diagnosis information, and would therefore be a potential substitute prior when diagnosis information is not available (e.g., studies focused on healthy controls).
Discussion: Our results confirmed the value of all types of prior knowledge in improving association identification. In addition, the fused network representing the subject relationship supported by multi-modal data showed consistently the best or equally best performance compared to the diagnosis network and the co-expression network.
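As a deliberately simplified stand-in for the fusion step: real similarity network fusion iteratively diffuses information across the networks, but its essence can be illustrated by combining two subject-by-subject similarity matrices into one consensus matrix (here reduced to an element-wise average; this is not the paper's method):

```python
def fuse(sim_a, sim_b):
    """Combine two n-by-n subject similarity matrices (e.g., one per modality)
    into a single consensus matrix by element-wise averaging.

    A minimal illustration: entries near 1 mean both modalities agree the
    subjects are similar; disagreement is averaged out.
    """
    n = len(sim_a)
    return [[(sim_a[i][j] + sim_b[i][j]) / 2.0 for j in range(n)]
            for i in range(n)]
```

The fused matrix can then serve as the prior constraining which subject pairs the SCCA model should treat as similar.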
{"title":"Fused multi-modal similarity network as prior in guiding brain imaging genetic association.","authors":"Bing He, Linhui Xie, Pradeep Varathan, Kwangsik Nho, Shannon L Risacher, Andrew J Saykin, Jingwen Yan","doi":"10.3389/fdata.2023.1151893","journal":"Frontiers in Big Data","volume":"6","pages":"1151893","publicationDate":"2023-05-05","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10196480/pdf/"}
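The record above names the sparse canonical correlation analysis (SCCA) model, which seeks a small set of imaging and genetic markers whose projections are maximally correlated. As a rough illustrative sketch only (this is a generic penalized-matrix-decomposition-style SCCA with alternating soft-thresholding, not the authors' implementation, and the fused-similarity prior is omitted; the function and parameter names `scca`, `lam_u`, `lam_v` are made up for this example):

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrinks entries toward zero to induce sparsity."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def scca(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100):
    """Sparse CCA sketch: alternate sparse power iterations on the
    cross-covariance matrix C = X'Y, returning unit-norm sparse
    weight vectors u (for X's features) and v (for Y's features)."""
    C = X.T @ Y                                   # cross-covariance between the two modalities
    v = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])  # uniform starting direction
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_u)          # update imaging weights, then renormalize
        nu = np.linalg.norm(u)
        if nu > 0:
            u /= nu
        v = soft_threshold(C.T @ u, lam_v)        # update genetic weights, then renormalize
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
    return u, v

# Toy usage on random standardized data (40 subjects, 10 and 8 features).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
Y = rng.standard_normal((40, 8))
u, v = scca(X, Y)
```

Larger `lam_u`/`lam_v` values zero out more weights, trading correlation strength for a smaller marker set; with both at zero the iteration reduces to a plain power method on the leading singular pair of C.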
Pub Date: 2023-04-06 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1099182
Hanjia Lyu, Arsal Imtiaz, Yufei Zhao, Jiebo Luo
Since the World Health Organization (WHO) characterized COVID-19 as a pandemic in March 2020, there have been over 600 million confirmed cases of COVID-19 and more than six million deaths as of October 2022. The relationship between the COVID-19 pandemic and human behavior is complicated. On one hand, human behavior is found to shape the spread of the disease. On the other hand, the pandemic has impacted and even changed human behavior in almost every aspect. To provide a holistic understanding of the complex interplay between human behavior and the COVID-19 pandemic, researchers have been employing big data techniques such as natural language processing, computer vision, audio signal processing, frequent pattern mining, and machine learning. In this study, we present an overview of the existing studies on using big data techniques to study human behavior in the time of the COVID-19 pandemic. In particular, we categorize these studies into three groups: using big data to measure, model, and leverage human behavior, respectively. The related tasks, data, and methods are summarized accordingly. To provide more insights into how to fight the COVID-19 pandemic and future global catastrophes, we further discuss challenges and potential opportunities.
{"title":"Human behavior in the time of COVID-19: Learning from big data.","authors":"Hanjia Lyu, Arsal Imtiaz, Yufei Zhao, Jiebo Luo","doi":"10.3389/fdata.2023.1099182","journal":"Frontiers in Big Data","volume":"6","pages":"1099182","publicationDate":"2023-04-06","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10118015/pdf/"}