Analyzing cancer data in North Vietnam by complex network technique

Journal of Science and Technology: Issue on Information and Communications Technology. Pub Date: 2022-01-25. DOI: 10.31130/ict-ud.2021.140

D. Pham, Minh-Tan Nguyen, Ha-Nam Nguyen, Tien-Dzung Tran


Citations: 1

Abstract

Data-clustering tools can be employed to generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as the K-means approach, often require input parameters such as the number of clusters and the initial centers to be determined in advance. In this study, we present a network science-based clustering method with fewer parameters and use it to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record, and two nodes are connected by an edge if their similarity exceeds a given threshold, which is determined by experimental observation. On the resulting network, we apply a network modularity optimization algorithm to detect modules (clusters). Each cluster contains records that are similar to one another in terms of some attributes; therefore, we can derive rules from the clusters for insights into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering data based on network science can therefore be a good option for large-scale relational data-mining problems in the future.
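The pipeline described in the abstract (pairwise similarity, a thresholded network, then modularity-based module detection) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration: the abstract does not state the similarity measure, the threshold value, or which modularity optimization algorithm was used. The sketch uses the fraction of matching attributes as the similarity, a threshold of 0.5, a simple greedy merging heuristic, and a handful of made-up toy records.

```python
from itertools import combinations

def similarity(a, b):
    """Fraction of attributes on which two records agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_network(records, threshold):
    """Link records i < j when their similarity is greater than the threshold."""
    return {(i, j) for i, j in combinations(range(len(records)), 2)
            if similarity(records[i], records[j]) > threshold}

def modularity(n, edges, community):
    """Newman modularity Q for a partition given as a node -> label list."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = [0] * n
    internal = {}   # number of edges inside each community
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if community[i] == community[j]:
            internal[community[i]] = internal.get(community[i], 0) + 1
    total_deg = {}  # sum of node degrees in each community
    for node in range(n):
        label = community[node]
        total_deg[label] = total_deg.get(label, 0) + degree[node]
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total_deg.items())

def greedy_modularity(n, edges):
    """Repeatedly apply the community merge that raises Q the most
    (a simple stand-in heuristic, not necessarily the paper's algorithm)."""
    community = list(range(n))          # start with one community per node
    best_q = modularity(n, edges, community)
    while True:
        best_merge = None
        # Only communities joined by at least one edge are merge candidates.
        pairs = {tuple(sorted((community[i], community[j])))
                 for i, j in edges if community[i] != community[j]}
        for a, b in sorted(pairs):
            trial = [a if c == b else c for c in community]
            q = modularity(n, edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return community, best_q
        community = best_merge

# Toy records with hypothetical attributes:
# (cancer type, family history, living environment, smoking status)
records = [
    ("lung", "yes", "urban", "smoker"),
    ("lung", "yes", "urban", "non-smoker"),
    ("lung", "no", "urban", "smoker"),
    ("liver", "no", "rural", "non-smoker"),
    ("liver", "no", "rural", "smoker"),
    ("liver", "yes", "rural", "non-smoker"),
]

edges = build_network(records, threshold=0.5)   # threshold is an assumption
community, q = greedy_modularity(len(records), edges)
print(sorted(edges))
print(community, q)
```

Note that the number of clusters is never supplied; it falls out of the network structure, which is the advantage over K-means that the abstract points to. The all-pairs comparison is quadratic in the number of records, so a dataset of the size reported would need a more scalable construction than this sketch.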

