Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation

Yuyang Wang, Yu Sun, Zeyu Liu, Bijia Chen, Hebing Chen, Chao Ren, Xuanwei Lin, Pengzhen Hu, Peiheng Jia, Xiang Xu, Kang Xu, Ximeng Liu, Hao Li, Xiaochen Bo
Journal Article, Quantitative Biology, vol. 14, published 2024-07-06. DOI: 10.1002/qub2.52 (https://doi.org/10.1002/qub2.52)
Citations: 0

Abstract

Copy number variation (CNV) refers to variation in the number of copies of a specific sequence in a genome and is a type of genomic structural variation. The Hi‐C technique, which captures interactions between DNA fragments, has empowered research on the spatial structure of chromatin. We used machine‐learning methods, including a linear transformation model and a graph convolutional network (GCN), to detect CNV events from Hi‐C data and to reveal how CNV relates to three‐dimensional interactions between genomic fragments, in terms of both the one‐dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a chromosome‐specific linear relation between the Hi‐C read count and CNV that is well quantified by the linear transformation model. In addition, the GCN‐based model can accurately extract features of the spatial structure from Hi‐C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments, including dimension reduction, transfer learning, and Hi‐C data perturbation, to comprehensively evaluate the utility and robustness of the GCN‐based model. This work provides a benchmark for using machine learning to infer CNV from Hi‐C data and lays a necessary foundation for a deeper understanding of the relationship between Hi‐C data and CNV.
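The abstract does not spell out either model's formulation. As a minimal sketch only, assuming each genomic bin carries one Hi‐C read count (for example, the row sum of the contact matrix) and a reference CNV value, the chromosome‐specific linear relation can be fitted by ordinary least squares per chromosome. One GCN layer can then treat the Hi‐C contact matrix as a weighted adjacency matrix, using the symmetric normalization of Kipf and Welling. The function names `fit_linear_per_chromosome` and `gcn_layer`, the `(chrom, read_count, cnv)` input format, and the normalization choice are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import defaultdict


def fit_linear_per_chromosome(bins):
    """Fit cnv ~ a * read_count + b separately for each chromosome.

    bins: iterable of (chrom, read_count, cnv) tuples, one per genomic bin
    (a hypothetical input format). Returns {chrom: (a, b)} fitted by
    ordinary least squares; a chromosome whose read counts are all equal
    gets slope 0 and intercept equal to its mean CNV.
    """
    groups = defaultdict(list)
    for chrom, reads, cnv in bins:
        groups[chrom].append((reads, cnv))
    params = {}
    for chrom, pts in groups.items():
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        a = sxy / sxx if sxx else 0.0
        params[chrom] = (a, my - a * mx)
    return params


def gcn_layer(contacts, features, weights):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    contacts: n x n symmetric, non-negative Hi-C contact matrix used as A.
    features: n x f node features H (one row per genomic bin).
    weights:  f x k weight matrix W (learned in practice; fixed here).
    """
    n = len(contacts)
    # Add self-loops so every bin keeps part of its own signal.
    a_hat = [[contacts[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Degrees are >= 1 because of the self-loops, so the sqrt is safe.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features weighted by normalized contact frequency.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n))
            for c in range(len(features[0]))] for i in range(n)]
    # Linear projection followed by a ReLU non-linearity.
    return [[max(0.0, sum(agg[i][c] * weights[c][o] for c in range(len(weights))))
             for o in range(len(weights[0]))] for i in range(n)]
```

Fitting each chromosome separately mirrors the abstract's statement that the linear relation is specific to each chromosome; the GCN step shows how contact frequencies let a bin's representation borrow signal from the bins it interacts with in three dimensions.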