A Computational Method for Identification of Functional SNPs in Human Noncoding Genome Regions Based on Multi-feature Mining

Rong Li, Zhi-e Lou
DOI: 10.53106/160792642023052403021 (https://doi.org/10.53106/160792642023052403021)
Journal: 網際網路技術學刊 (Journal of Internet Technology)
Published: 2023-05-01
Publication type: Journal Article
Citations: 0

Abstract

A single nucleotide polymorphism (SNP) is a variation at a single nucleotide position in the genome. Functional SNPs are among the most important molecular markers in disease research and are widely used in fields such as tumor pathogenesis, disease diagnosis and treatment, prognostic evaluation, and drug development. Noncoding genome regions contain far more functional SNPs than coding regions, and those SNPs are harder to detect. In this work, we propose a computational method based on multi-feature mining to predict functional SNPs in the human noncoding genome. We first analyzed the sequence properties, evolutionary conservation, and epigenetic modification signals of the sample SNPs. We then used statistical methods together with multiple genomic and epigenetic annotation datasets to mine high-dimensional discriminative features. In particular, allele-specific features were designed to distinguish the functions of SNPs at nearby positions. A random forest was used for feature dimension reduction and classification. In 10-fold cross-validation, the area under the receiver operating characteristic curve (AUC) of our method improved by 16.9% and 43.4% over the existing methods GWAVA and CADD, respectively, indicating that allele-specific features help distinguish functional from neutral SNPs located close together.
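The abstract does not say how the allele-specific features are computed or how the AUC was calculated, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes one possible allele-specific feature: the change in a motif log-odds score when the reference base is swapped for the alternate base. Unlike a position-level annotation, this value can differ between alleles and between neighbouring SNPs. The motif matrix `TOY_PWM`, the uniform background, and the function names are all hypothetical. The `auc` helper uses the standard rank interpretation of AUC (the probability that a randomly chosen positive outscores a randomly chosen negative).

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 4-position motif: per-position base probabilities (toy values,
# not taken from the paper or from any real transcription factor).
TOY_PWM: List[Dict[str, float]] = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def motif_score(seq: str, pwm: List[Dict[str, float]] = TOY_PWM) -> float:
    """Sum of per-position log2 odds of `seq` under the motif vs. background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, seq))


def allele_delta(ref_seq: str, offset: int, alt_base: str) -> float:
    """Allele-specific feature: motif score of the alternate-allele window
    minus the score of the reference-allele window."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return motif_score(alt_seq) - motif_score(ref_seq)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Two made-up SNPs in the same 4-bp window "ACGT": they would share any
# window-level annotation, but each alternate allele gets its own delta.
delta_snp1 = allele_delta("ACGT", 1, "T")  # C>T at the second motif position
delta_snp2 = allele_delta("ACGT", 3, "A")  # T>A at the fourth motif position
```

In a pipeline like the one the abstract describes, such per-allele values would be one column among many in the feature matrix passed to the classifier, and `auc` would be evaluated on the held-out fold of each cross-validation split.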