D. Pham, Minh-Tan Nguyen, Ha-Nam Nguyen, Tien-Dzung Tran
{"title":"用复杂网络技术分析越南北部的癌症数据","authors":"D. Pham, Minh-Tan Nguyen, Ha-Nam Nguyen, Tien-Dzung Tran","doi":"10.31130/ict-ud.2021.140","DOIUrl":null,"url":null,"abstract":"Data-clustering tools can be employed to generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as the K-mean approach, often require the determination of input parameters such as the cluster number and initial centers to be viable. In this study, we present a network science-based clustering method with fewer parameters that were used to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record, and two nodes are connected by an edge if their similarity is greater than a given threshold as determined by experimental observation. Based on the network created, we employed the network modularity optimization algorithm to detect modules (clusters) within it. Each cluster contains records that are similar to one another in terms of some attributes; therefore, we could derive rules from the cluster for insights into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering data based on network science can therefore be a good option for large-scale relational data-mining problems in the future.","PeriodicalId":114451,"journal":{"name":"Journal of Science and Technology: Issue on Information and Communications Technology","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Analyzing cancer data in North Vietnam by complex network technique\",\"authors\":\"D. Pham, Minh-Tan Nguyen, Ha-Nam Nguyen, Tien-Dzung Tran\",\"doi\":\"10.31130/ict-ud.2021.140\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data-clustering tools can be employed to generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as the K-mean approach, often require the determination of input parameters such as the cluster number and initial centers to be viable. In this study, we present a network science-based clustering method with fewer parameters that were used to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record, and two nodes are connected by an edge if their similarity is greater than a given threshold as determined by experimental observation. Based on the network created, we employed the network modularity optimization algorithm to detect modules (clusters) within it. Each cluster contains records that are similar to one another in terms of some attributes; therefore, we could derive rules from the cluster for insights into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering data based on network science can therefore be a good option for large-scale relational data-mining problems in the future.\",\"PeriodicalId\":114451,\"journal\":{\"name\":\"Journal of Science and Technology: Issue on Information and Communications Technology\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Science and Technology: Issue on Information and Communications Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.31130/ict-ud.2021.140\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Science and Technology: Issue on Information and Communications Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.31130/ict-ud.2021.140","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Analyzing cancer data in North Vietnam by complex network technique
Data-clustering tools can be employed to generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as the K-mean approach, often require the determination of input parameters such as the cluster number and initial centers to be viable. In this study, we present a network science-based clustering method with fewer parameters that were used to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record, and two nodes are connected by an edge if their similarity is greater than a given threshold as determined by experimental observation. Based on the network created, we employed the network modularity optimization algorithm to detect modules (clusters) within it. Each cluster contains records that are similar to one another in terms of some attributes; therefore, we could derive rules from the cluster for insights into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering data based on network science can therefore be a good option for large-scale relational data-mining problems in the future.