Detection of Fusion Genes from Human Breast Cancer Cell-Line RNA-Seq Data Using Shifted Short Read Clustering

Yoshiaki Sota, S. Seno, Hironori Shigeta, N. Osato, M. Shimoda, S. Noguchi, H. Matsuda
Published in: 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), 2018-10-01. DOI: 10.1109/BIBE.2018.00038 (https://doi.org/10.1109/BIBE.2018.00038)

Abstract

Fusion genes are one of the mechanisms of tumorigenesis, and their identification by RNA-Seq has attracted attention. Various methods for detecting fusion genes have been proposed, but their accuracy is not sufficient. One cause of this problem is the relatively short read length of RNA-Seq data. We therefore propose a method based on shifted short-read clustering (SSC) that, before RNA-Seq data are mapped, identifies shifted reads of the same origin and extends them into representative sequences. We hypothesized that this would increase the percentage of uniquely mapped reads and thereby improve fusion gene detection rates. To test these hypotheses, we applied the SSC method to RNA-Seq data from three cell lines (BT-474, MCF-7, and SKBR-3). When only one base was shifted, the average read lengths of BT-474, MCF-7, and SKBR-3 were extended from 201 to 223 bases (111%), 201 to 214 bases (106%), and 201 to 213 bases (106%), respectively. We further demonstrate the effectiveness of the SSC method by comparing the results of a fusion gene detection tool, STAR-Fusion, with and without applying SSC to the reads. The percentages of uniquely mapped reads for BT-474, MCF-7, and SKBR-3 improved from 88% to 93%, 88% to 94%, and 92% to 95%, respectively. Finally, the fusion gene detection rates for BT-474, MCF-7, and SKBR-3 increased from 48% to 57%, 49% to 53%, and 50% to 53%, respectively. The SSC method is therefore considered effective not only for improving the percentage of uniquely mapped reads but also for fusion gene detection.
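
The abstract does not spell out the SSC algorithm, so the following is only a toy illustration of the general idea of extending reads that are shifted copies of one another. It assumes exact-match overlaps and a fixed shift; the function names `try_extend` and `extend_reads` are hypothetical, and the published method may treat mismatches, base qualities, and paired ends quite differently.

```python
from typing import List, Optional


def try_extend(seq: str, read: str, shift: int) -> Optional[str]:
    """Extend `seq` with `read` if `read` overlaps either end of `seq`
    with an offset of exactly `shift` bases (exact matching only)."""
    k = len(read) - shift  # length of the required overlap
    if shift <= 0 or k <= 0 or k > len(seq):
        return None
    if seq.endswith(read[:k]):       # read continues seq to the right
        return seq + read[k:]
    if seq.startswith(read[shift:]):  # read precedes seq on the left
        return read[:shift] + seq
    return None


def extend_reads(reads: List[str], shift: int = 1) -> List[str]:
    """Greedily chain shifted reads into longer representative sequences."""
    remaining = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    representatives = []
    while remaining:
        seq = remaining.pop(0)
        extended = True
        while extended:
            extended = False
            for i, other in enumerate(remaining):
                merged = try_extend(seq, other, shift)
                if merged is not None:
                    seq = merged
                    remaining.pop(i)
                    extended = True
                    break  # restart the scan with the longer sequence
        representatives.append(seq)
    return representatives


if __name__ == "__main__":
    toy_reads = ["CGTACG", "ACGTAC", "GTACGT", "TTTTGG"]
    print(extend_reads(toy_reads, shift=1))
```

In this toy input, the first three 6-base reads are offset from each other by one base and collapse into a single 8-base representative, while the unrelated read is left unchanged. The greedy scan is quadratic in the number of reads and is meant only to make the idea concrete, not to suggest how real RNA-Seq data would be processed.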