Analyzing cancer data in North Vietnam by complex network technique

D. Pham, Minh-Tan Nguyen, Ha-Nam Nguyen, Tien-Dzung Tran
Journal of Science and Technology: Issue on Information and Communications Technology. Published 2022-01-25. DOI: 10.31130/ict-ud.2021.140 (https://doi.org/10.31130/ict-ud.2021.140)

Abstract

Data-clustering tools can generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as the K-means approach, often require input parameters such as the number of clusters and the initial centers to be determined in advance. In this study, we present a network science-based clustering method with fewer parameters, which we used to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record and two nodes are connected by an edge if their similarity exceeds a given threshold, which was determined by experimental observation. On the resulting network, we applied a network modularity optimization algorithm to detect modules (clusters). Each cluster contains records that are similar to one another in terms of some attributes; we could therefore derive rules from the clusters that give insight into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering based on network science can therefore be a good option for large-scale relational data-mining problems in the future.
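The pipeline outlined in the abstract (pairwise similarity, a thresholded network, then modularity-based module detection) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the similarity measure, so a simple matching coefficient is used; the toy records, their attributes, and the threshold of 0.5 are hypothetical; and the naive greedy merging below stands in for whichever modularity optimization algorithm the paper actually employed.

```python
from itertools import combinations

def similarity(a, b):
    """Fraction of attributes on which two records agree (an assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_network(records, threshold):
    """Link two record indices when their similarity exceeds the threshold."""
    edges = set()
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) > threshold:
            edges.add((i, j))
    return edges

def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    label = {node: c for c, members in enumerate(communities) for node in members}
    inside = [0] * len(communities)  # edges with both ends in community c
    degree = [0] * len(communities)  # summed node degree of community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))

def greedy_modularity(n_nodes, edges):
    """Start from singletons; repeatedly apply the merge that raises Q most."""
    communities = [{i} for i in range(n_nodes)]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_merge = None
        for a, b in combinations(range(len(communities)), 2):
            trial = [c for k, c in enumerate(communities) if k not in (a, b)]
            trial.append(communities[a] | communities[b])
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:  # no merge improves Q, so stop
            break
        communities = best_merge
    return communities, best_q

# Hypothetical toy records: (family history, living environment, smoking status).
records = [
    ("yes", "urban", "smoker"),
    ("yes", "urban", "non-smoker"),
    ("yes", "urban", "smoker"),
    ("no", "rural", "non-smoker"),
    ("no", "rural", "smoker"),
    ("no", "rural", "non-smoker"),
]
edges = build_network(records, threshold=0.5)  # threshold is an assumed value
communities, q = greedy_modularity(len(records), edges)
print(sorted(sorted(c) for c in communities), q)
```

On this toy input the records split into an "urban, family history" group and a "rural, no family history" group, which is the kind of cluster from which attribute-based rules could then be read off. The brute-force pairwise comparison and the repeated recomputation of modularity are only practical for small inputs; a dataset of the size mentioned in the abstract would need a more scalable construction and an optimized community detection routine.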