{"title":"面向COVID-19数据分析和可视化的并行网络分析和社区检测(PANC)管道","authors":"Giuseppe Agapito, Marianna Milano, M. Cannataro","doi":"10.1142/s0129626421420020","DOIUrl":null,"url":null,"abstract":"A new coronavirus, causing a severe acute respiratory syndrome (COVID-19), was started at Wuhan, China, in December 2019. The epidemic has rapidly spread across the world becoming a pandemic that, as of today, has affected more than 70 million people causing over 2 million deaths. To better understand the evolution of spread of the COVID-19 pandemic, we developed PANC (Parallel Network Analysis and Communities Detection), a new parallel preprocessing methodology for network-based analysis and communities detection on Italian COVID-19 data. The goal of the methodology is to analyze set of homogeneous datasets (i.e. COVID-19 data in several regions) using a statistical test to find similar/dissimilar behaviours, mapping such similarity information on a graph and then using community detection algorithm to visualize and analyze the initial dataset. The methodology includes the following steps: (i) a parallel methodology to build similarity matrices that represent similar or dissimilar regions with respect to data; (ii) an effective workload balancing function to improve performance; (iii) the mapping of similarity matrices into networks where nodes represent Italian regions, and edges represent similarity relationships; (iv) the discovering and visualization of communities of regions that show similar behaviour. The methodology is general and can be applied to world-wide data about COVID-19, as well as to all types of data sets in tabular and matrix format. To estimate the scalability with increasing workloads, we analyzed three synthetic COVID-19 datasets with the size of 90.0[Formula: see text]MB, 180.0[Formula: see text]MB, and 360.0[Formula: see text]MB. Experiments was performed on showing the amount of data that can be analyzed in a given amount of time increases almost linearly with the number of computing resources available. Instead, to perform communities detection, we employed the real data set.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Parallel Network Analysis and Communities Detection (PANC) Pipeline for the Analysis and Visualization of COVID-19 Data\",\"authors\":\"Giuseppe Agapito, Marianna Milano, M. Cannataro\",\"doi\":\"10.1142/s0129626421420020\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A new coronavirus, causing a severe acute respiratory syndrome (COVID-19), was started at Wuhan, China, in December 2019. The epidemic has rapidly spread across the world becoming a pandemic that, as of today, has affected more than 70 million people causing over 2 million deaths. To better understand the evolution of spread of the COVID-19 pandemic, we developed PANC (Parallel Network Analysis and Communities Detection), a new parallel preprocessing methodology for network-based analysis and communities detection on Italian COVID-19 data. The goal of the methodology is to analyze set of homogeneous datasets (i.e. COVID-19 data in several regions) using a statistical test to find similar/dissimilar behaviours, mapping such similarity information on a graph and then using community detection algorithm to visualize and analyze the initial dataset. The methodology includes the following steps: (i) a parallel methodology to build similarity matrices that represent similar or dissimilar regions with respect to data; (ii) an effective workload balancing function to improve performance; (iii) the mapping of similarity matrices into networks where nodes represent Italian regions, and edges represent similarity relationships; (iv) the discovering and visualization of communities of regions that show similar behaviour. The methodology is general and can be applied to world-wide data about COVID-19, as well as to all types of data sets in tabular and matrix format. To estimate the scalability with increasing workloads, we analyzed three synthetic COVID-19 datasets with the size of 90.0[Formula: see text]MB, 180.0[Formula: see text]MB, and 360.0[Formula: see text]MB. Experiments was performed on showing the amount of data that can be analyzed in a given amount of time increases almost linearly with the number of computing resources available. Instead, to perform communities detection, we employed the real data set.\",\"PeriodicalId\":422436,\"journal\":{\"name\":\"Parallel Process. Lett.\",\"volume\":\"81 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Parallel Process. Lett.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s0129626421420020\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Parallel Process. Lett.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s0129626421420020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Parallel Network Analysis and Communities Detection (PANC) Pipeline for the Analysis and Visualization of COVID-19 Data
A new coronavirus, causing a severe acute respiratory syndrome (COVID-19), was started at Wuhan, China, in December 2019. The epidemic has rapidly spread across the world becoming a pandemic that, as of today, has affected more than 70 million people causing over 2 million deaths. To better understand the evolution of spread of the COVID-19 pandemic, we developed PANC (Parallel Network Analysis and Communities Detection), a new parallel preprocessing methodology for network-based analysis and communities detection on Italian COVID-19 data. The goal of the methodology is to analyze set of homogeneous datasets (i.e. COVID-19 data in several regions) using a statistical test to find similar/dissimilar behaviours, mapping such similarity information on a graph and then using community detection algorithm to visualize and analyze the initial dataset. The methodology includes the following steps: (i) a parallel methodology to build similarity matrices that represent similar or dissimilar regions with respect to data; (ii) an effective workload balancing function to improve performance; (iii) the mapping of similarity matrices into networks where nodes represent Italian regions, and edges represent similarity relationships; (iv) the discovering and visualization of communities of regions that show similar behaviour. The methodology is general and can be applied to world-wide data about COVID-19, as well as to all types of data sets in tabular and matrix format. To estimate the scalability with increasing workloads, we analyzed three synthetic COVID-19 datasets with the size of 90.0[Formula: see text]MB, 180.0[Formula: see text]MB, and 360.0[Formula: see text]MB. Experiments was performed on showing the amount of data that can be analyzed in a given amount of time increases almost linearly with the number of computing resources available. Instead, to perform communities detection, we employed the real data set.