D. Pham, Minh-Tan Nguyen, Ha-Nam Nguyen, Tien-Dzung Tran
Analyzing cancer data in North Vietnam by complex network technique
Journal of Science and Technology: Issue on Information and Communications Technology, vol. 21, no. 1. Published 2022-01-25.
DOI: https://doi.org/10.31130/ict-ud.2021.140
Citation count: 1
Abstract
Data-clustering tools can be employed to generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as K-means, often require input parameters such as the number of clusters and the initial centers to be determined in advance. In this study, we present a network science-based clustering method that requires fewer parameters and use it to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record, and two nodes are connected by an edge if their similarity is greater than a given threshold, which is determined by experimental observation. On the resulting network, we apply a modularity optimization algorithm to detect modules (clusters). Each cluster contains records that are similar to one another in terms of some attributes; therefore, we can derive rules from each cluster that give insight into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering based on network science can therefore be a good option for large-scale relational data-mining problems in the future.
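
The pipeline described in the abstract (pairwise similarity, thresholded network, modularity-based clustering, rules per cluster) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the simple-matching similarity, the threshold of 0.5, the toy records and their three attributes, the naive greedy merging used as a stand-in for the modularity optimization algorithm, and the "shared attribute" helper used as a stand-in for rule derivation. The all-pairs loop is quadratic and is only suitable for toy input, not for a dataset of 177,000 records.

```python
from itertools import combinations

def similarity(a, b):
    """Fraction of attributes on which two records agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_network(records, threshold):
    """Return edges (i, j) for record pairs whose similarity exceeds threshold."""
    return {
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if similarity(records[i], records[j]) > threshold
    }

def modularity(edges, communities):
    """Newman modularity: Q = sum over c of (e_c / m) - (d_c / 2m)^2."""
    m = len(edges)
    if m == 0:
        return 0.0
    label = {v: c for c, members in enumerate(communities) for v in members}
    internal = [0] * len(communities)  # e_c: edges inside community c
    degree = [0] * len(communities)    # d_c: total degree of community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(internal, degree))

def greedy_modularity(edges, n_nodes):
    """Naive agglomerative stand-in: merge the pair of communities that
    raises Q the most, and stop when no merge improves Q."""
    communities = [{v} for v in range(n_nodes)]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_merge = None
        for a, b in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (a, b)]
            merged.append(communities[a] | communities[b])
            q = modularity(edges, merged)
            if q > best_q + 1e-12:
                best_q, best_merge = q, merged
        if best_merge is None:
            break
        communities = best_merge
    return communities, best_q

def shared_attributes(records, cluster):
    """Attribute positions (and values) on which every cluster member agrees."""
    members = [records[i] for i in cluster]
    return {
        k: members[0][k]
        for k in range(len(members[0]))
        if all(r[k] == members[0][k] for r in members)
    }

# Hypothetical toy records: (cancer type, family history, living environment).
records = [
    ("lung", "yes", "urban"),
    ("lung", "yes", "urban"),
    ("lung", "yes", "rural"),
    ("liver", "no", "rural"),
    ("liver", "no", "rural"),
    ("liver", "no", "urban"),
]
edges = build_network(records, threshold=0.5)
communities, q = greedy_modularity(edges, len(records))
for cluster in communities:
    print(sorted(cluster), shared_attributes(records, cluster))
```

On this toy input the thresholded network consists of two disconnected triangles, so the greedy merging recovers two clusters, and the attributes shared inside each cluster (cancer type and family history) play the role of a candidate rule. The only parameter the sketch needs is the similarity threshold; no cluster count or initial centers are supplied.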