Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-165
E. Crowgey, Pankaj Vats, Karl R. Franke, G. Burnett, Ankit Sethia, T. Harkins, T. Druley
Title: Abstract 165: Enhanced processing of genomic sequencing data for pediatric cancers: GPUs and machine learning techniques for variant detection
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-240
Kenneth B. Thomas, Y. Mou, C. Magnan, T. Gyuris, E. Shinbrot, Fernando Díaz, Steven Lau-Rivera, Segun Jung, V. Funari, L. Weiss
Introduction: Our goal is to improve gene fusion detection via RNA sequencing by combining multiple fusion callers through machine learning techniques.

Background: Gene fusion events are important drivers of malignancy. RNA sequencing (RNAseq) methods for detecting fusions have the advantage that multiple markers can be targeted at one time. Unlike DNA methods, in which it is challenging to capture fusion breakpoints, RNA methods readily identify fusions through chimeric transcripts. While many fusion calling algorithms exist for RNAseq data, the sensitive fusion callers needed for samples of low tumor content often present high false positive rates, a result of aligning chimeric transcripts. Further, there is currently no single feature in NGS data that can be used to filter out false positive fusion calls. To achieve higher accuracy than individual fusion callers can provide, we have weighted and combined the results of multiple fusion callers by systematic and objective means: an ensemble learning approach based on random forest models. Our method selects from data generated by three independent fusion callers, supplemented by metrics obtained from in-house methods. It presents a metric that can be immediately interpreted as the probability that a candidate fusion call is a true fusion call.

Methods: Random forest models were generated with the randomForest package in R, with tuning by the R caret package. The training data consisted of a balanced set of 394 fusion calls from clinical samples of solid tumors. For training, fusion calls with at least 10 supporting reads were deemed true or false based on manual review via IGV and on orthogonal methods, including PCR with Sanger sequencing and the commercial Archer™ fusion CTL and Sarcoma panels. We present the results of training on data from the three well-known fusion callers Arriba, STAR-Fusion, and FusionCatcher, together with additional data from an in-house junction counting method and fusion membership in a list of known fusions (a “white list”). Models were validated by 10-fold cross-validation.

Results: In performance evaluations, false positive and false negative calls were presumed false based on orthogonal determinations. On that basis, our current best model has an accuracy of 94.9% (sensitivity 93.4%, specificity 96.7%). Currently, High Confidence fusion calls (calls with a probability score greater than 70%) are the most common positive calls. These have been confirmed with 100% success.

Conclusion: We have successfully integrated multiple fusion callers by means of random forest models. Our current model is validated for use in our solid tumor fusion calling pipeline.

Citation Format: Kenneth B. Thomas, Yanglong Mou, Christophe Magnan, Tibor Gyuris, Eve Shinbrot, Fernando Lopez Diaz, Steven Lau-Rivera, Segun Jung, Vincent Funari, Lawrence M. Weiss. Gene fusion calling from RNA panel sequencing data: An ensemble learning approach [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 240.
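The relationship between a thresholded probability score and the accuracy, sensitivity, and specificity figures quoted above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline (which was built with randomForest and caret in R): the function names, the toy scores, and the toy truth labels are all hypothetical, and using the 70% High Confidence cutoff as the decision threshold is an assumption made only for the example.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], truth: Sequence[bool], cutoff: float = 0.70
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when calls with score > cutoff are reported as fusions."""
    tp = fp = tn = fn = 0
    for score, is_true_fusion in zip(scores, truth):
        called = score > cutoff
        if called and is_true_fusion:
            tp += 1
        elif called and not is_true_fusion:
            fp += 1
        elif not called and is_true_fusion:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: model probability per candidate call, and the
# orthogonally confirmed truth. These are NOT values from the abstract.
toy_scores = [0.95, 0.82, 0.71, 0.40, 0.65, 0.10, 0.05, 0.88]
toy_truth = [True, True, True, True, False, False, False, False]

toy_counts = confusion_counts(toy_scores, toy_truth)
toy_metrics = summarize(*toy_counts)
print(toy_counts, toy_metrics)
```

Keeping the counting step separate from the summary step makes it easy to re-evaluate the same calls at a different cutoff without touching the metric definitions.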
Title: Abstract 240: Gene fusion calling from RNA panel sequencing data: An ensemble learning approach
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-183
D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim
Background: Early computer-aided detection systems for mammography failed to improve the performance of radiologists. With the remarkable success of deep learning, some recent studies have described computer systems with performance similar or even superior to that of human experts. Among them, Shen et al. (Nature Sci. Rep., 2019) present a promising “end-to-end” training approach. Instead of training a convolutional net with whole mammograms, they first train a “patch classifier” that recognizes lesions in small subimages. They then generalize the patch classifier to a “whole image classifier” using the properties of fully convolutional networks and the end-to-end approach. Using this strategy, the authors obtained a per-image AUC of 0.87 [0.84, 0.90] on the CBIS-DDSM dataset. Standard mammography consists of two views of each breast: bilateral craniocaudal (CC) and mediolateral oblique (MLO). The algorithm proposed by Shen et al. processes only single-view mammography. We extend their work, presenting end-to-end training of a convolutional net for two-view mammography.

Methods: First, we reproduced Shen et al.'s work using the CBIS-DDSM dataset. We trained a ResNet50-based net to classify 224x224-pixel patches using segmented lesions. The weights of the patch classifier were then transferred to the whole-image single-view classifier, obtained by removing the dense layers from the patch classifier and stacking one ResNet block on top. This single-view classifier was trained using full images from the same dataset. In trying to replicate Shen et al.'s work, we obtained an AUC of 0.8524±0.0560, lower than the 0.87 reported in the original paper. We attribute this decrease to the fact that we used only 2260 images with two views, instead of the 2478 images in the original work. Finally, we built the two-view classifier, which receives CC and MLO views as input. This classifier contains two copies of the patch classifier, loaded with the weights from the single-view classifier. The features extracted by the two patch classifiers are concatenated and passed to the ResNet block. The two-view classifier is trained end-to-end using full images, refining all its weights, including those inside the two patch classifiers.

Results: The two-view classifier yielded an AUC of 0.9199±0.0623 in 5-fold cross-validation for classifying mammograms as malignant or non-malignant, using a single model and no test-time data augmentation. This is better than both Shen et al.'s AUC (0.87) and our single-view AUC (0.85). Zhang et al. (PLOS ONE, 2020) present another two-view algorithm (without end-to-end training) with an AUC of 0.95. However, that work cannot be directly compared with ours, as it was tested on a different set of images.

Conclusions: We presented end-to-end training of a convolutional net for two-view mammography. Our system's AUC was 0.92, better than the 0.87 obtained by the previous single-view system.

Citation Format: Daniel G. Petrini, Carlos Shimizu, Gabriel V. Valente, Guilherme Folgueira, Guilherme A. Novaes, Maria L. Katayama, Pedro Serio, Rosimeire A. Roela, Tatiana C. Tucunduva, Maria Aparecida A. Folgueira, Hae Y. Kim. End-to-end training of convolutional network for breast cancer detection in two-view mammography [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 183.
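Because the comparison above rests entirely on AUC values summarized over cross-validation folds, a small sketch of how such a figure can be computed may help. The Python below is an illustration only: the function names, the toy scores and labels, and the toy per-fold values are hypothetical, and treating the "±" as a sample standard deviation across folds is an assumption, since the abstract does not say how the spread was computed.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win
    (the Mann-Whitney formulation of the area under the ROC curve).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarize_folds(fold_aucs: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold AUC values."""
    return mean(fold_aucs), stdev(fold_aucs)


# Hypothetical toy data: classifier outputs and malignant (1) / non-malignant (0) labels.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)

# Hypothetical per-fold AUCs, not values from the abstract.
toy_mean, toy_sd = summarize_folds([0.90, 0.92, 0.94])
print(toy_auc, toy_mean, toy_sd)
```

The pairwise loop is quadratic in the number of images, which is fine for a few thousand mammograms; a rank-based implementation would be the usual choice for larger sets.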
Title: Abstract 183: End-to-end training of convolutional network for breast cancer detection in two-view mammography
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-153
Claire J. Guo, Mary Saltarelli, S. Lambert, H. Fang, Chun Zhang
Title: Abstract 153: Development of a workflow to handle the quality control and analysis of Olink protein biomarker data in early phase oncology clinical trials
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-210
Arpad M. Danos, Wan-Hsin Lin, J. Saliba, Angshumoy Roy, A. Church, Shruti Rao, D. Ritter, Kilannin Krysiak, A. Wagner, Erica K. Barnell, Lana M. Sheta, Adam C. Coffman, S. Kiwala, Joshua F. McMichael, L. Corson, Kevin E. Fisher, H. Williams, Matthew C. Hiemenz, K. Janeway, J. Ji, Kesserwan A. Chimene, L. Fuqua, L. Dyer, Huiling Xu, Jeffrey Jean, L. Satgunaseelan, Liying Zhang, T. Laetsch, D. Parsons, Ryan J. Schmidt, L. Schriml, K. Sund, S. Kulkarni, Subha Madhavan, Xinjie Xu, R. Kanagal-Shamana, M. Harris, Y. Akkari, Nurit Paz Yacov, P. Terraf, M. Griffith, O. Griffith, G. Raca
Childhood cancers are driven by unique profiles of somatic genetic alterations, with a significant contribution from predisposing germline variants. Understanding the genomic landscape of pediatric cancers is complicated by their rarity, the heterogeneity of variation within a given disease, and the complex forms of structural variation they contain. Variants in childhood disease may differ from those in adult versions of the same cancer type, or may have different clinical significance. Currently, pediatric variants are underrepresented in cancer variant databases, and there is an urgent need for publicly available expert curation of them. To address this, the Pediatric Cancer Taskforce (PCT) was formed within the Clinical Genome Resource (ClinGen) Somatic Cancer Clinical Domain Working Group (CDWG) (https://www.clinicalgenome.org/working-groups/somatic/). The PCT is a multi-institutional group of 39 members with broad experience in childhood cancer and variant curation, whose work consists of standardizing and classifying genetic variants in pediatric cancers. The CIViC knowledgebase (www.civicdb.org) is a freely available resource for Clinical Interpretation of Variants in Cancer, which leverages public curation and expert moderation to address the problem of annotating the large volume of clinically actionable cancer variants. PCT curators work together with PCT expert members and the CIViC team on variant curation, and have submitted over 230 Evidence Items and over 10 Assertions to CIViC. To further address issues specific to pediatric curation, the PCT is working with CIViC to develop new pediatric-specific CIViC features and modifications of the data model that will aid pediatric curation. A pediatric user interface and a representation of large-scale structural and copy number variation are being developed for version two of CIViC, expected to be released in 1-2 years, which will enable curation of a new class of structural variants often encountered in pediatric cancer. A novel standard operating procedure (SOP) for childhood cancer curation in CIViC is being developed by PCT experts, curators, and the CIViC team. This SOP will cover topics including curation of structural variants, as well as pediatric-specific variant tiering guidelines that take into account the sparse nature of evidence in pediatric cases. A companion resource, CIViCmine (http://bionlp.bcgsc.ca/civicmine/), will be further developed to incorporate pediatric data. These and other joint efforts of the PCT and CIViC will significantly enhance pediatric variant representation for public use, to support the care of children with cancer.

Citation Format: Arpad Danos, Wan-Hsin Lin, Jason Saliba, Angshumoy Roy, Alanna J. Church, Shruti Rao, Deborah Ritter, Kilannin Krysiak, Alex Wagner, Erica Barnell, Lana Sheta, Adam Coffman, Susanna Kiwala, Joshua F. McMichael, Laura Corson, Kevin Fisher, Heather E. Williams, Matthew Hiemenz, Katherine A. Janeway, Jianling Ji, Kesserwan A. Chimene, Laura Fuqua, Lisa Dyer, Huiling Xu, Jeffrey Jean, Laveniya Satgunaseelan, Liying Zhang, Ted W. Laetsch, Donald W. Parsons, Ryan Schmidt, Lynn M. Schriml, Kristen L. Sund, Shashikant Kulkarni, Subha Madhavan, Xinjie Xu, Rashmi Kanagal-Shamana, Marian Harris, Yasmine Akkari, Nurit Paz Yacov, Panieh Terraf, Malachi Griffith, Obi L. Griffith, Gordana Raca. Advancing knowledgebase representation of pediatric cancer variants through ClinGen/CIViC collaboration [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 210.
Title: Abstract 210: Advancing knowledgebase representation of pediatric cancer variants through ClinGen/CIViC collaboration
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-225
Chenhui Ma, A. Almasan, Evren Gurkan-Cavusoglu
Title: Abstract 225: Computational analysis of 5-fluorouracil antitumor activity in colon cancer using a mechanistic pharmacokinetic/pharmacodynamic model
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.AM2021-220
M. Parikh, O. Elemento, Neel S. Madhukar, Coryandar Gilvary
Title: Abstract 220: Identifying novel oncology targets and positioning existing targets through the prediction of cancer dependencies
Pub Date: 2021-07-01
DOI: 10.1158/1538-7445.am2021-207
Jianjiong Gao, T. Mazor, Ino de Bruijn, Adam Abeshouse, Diana Baiceanu, Ziya Erkoç, Benjamin E. Gross, David M Higgins, P. Jagannathan, Karthik Kalletla, P. Kumari, Ritika Kundra, Xiang Li, James Lindsay, Aaron Lisman, Pieter Lukasse, Divya Madala, Ramyasree Madupuri, Angelica Ochoa, Oleguer Plantalech, Joyce Quach, Sander Y. A. Rodenburg, Anusha Satravada, F. Schaeffer, R. Sheridan, Lucas Sikina, S. O. Sumer, Yichao Sun, P. van Dijk, P. van Nierop, Avery Wang, Manda Wilson, Hongxin Zhang, Gaofei Zhao, Sjoerd van Hagen, K. van Bochove, U. Dogrusoz, Allison P. Heath, A. Resnick, Trevor J Pugh, C. Sander, E. Cerami, N. Schultz
Abstract 207: The cBioPortal for Cancer Genomics. Cancer Res (2021) 81 (13_Supplement): 207. https://doi.org/10.1158/1538-7445.AM2021-207
Pub Date : 2021-07-01 DOI: 10.1158/1538-7445.AM2021-LB019
F. Mehrabadi, S. Malikić, Kerrie L. Marie, Eva Pérez-Guijarro, Erfan Sadeqi Azer, Howard H. Yang, Can Kızılkale, Charli Gruen, Huaitian Liu, C. Marcelus, A. Buluç, Funda Ergün, M. Lee, G. Merlino, Chi-Ping Day, S. C. Sahinalp
Emerging sets of single-cell sequencing data make it appealing to apply existing tumor phylogeny reconstruction methods to analyze the associated intratumor heterogeneity. Unfortunately, tumor phylogeny inference is an NP-hard problem, and existing principled methods typically fail to scale to the thousands of cells and mutations observed in emerging single-cell data sets. Although greedy heuristics exist for building hierarchical clusterings of cells and mutations, they suffer from well-documented accuracy issues. Additionally, even when “optimal” solutions are feasible, existing approaches provide only a single “most likely” tree to depict the evolutionary processes that may have produced an observed collection of cells and mutations. To make matters worse, the vast majority of single-cell sequencing data sets are transcriptomic and, as a result, suffer from considerable variation in coverage across mutational loci. In this paper, we introduce Trisicell, a computational toolkit for scalable tumor phylogeny reconstruction and validation from single-cell genomic, exomic or transcriptomic sequencing data. Trisicell has three components: (i) Trisicell-DnC, a new method for reconstructing tumor phylogenies from genotype matrices derived from single-cell data, (ii) Trisicell-ConT, a new algorithm for constructing the consensus of two or more tumor phylogenies, which may be built from different data types on the same set of cells or by different methods on the same data, and (iii) Trisicell-PF, a new partition function method for assessing the likelihood that any user-defined subtree or set of cells is seeded by a given set of mutations in the phylogeny. Collectively, these tools provide a means of identifying and validating robust portions of a tumor phylogeny, offering the ability to focus on the most important (sub)clones and the genomic alterations that seed the associated clonal expansion. 
We applied Trisicell to a panel of clonal sublines derived from single cells of a parental mouse melanoma model, on which we performed both whole-exome and whole-transcriptome sequencing. The tumor phylogenies of the clonal sublines built by Trisicell-DnC on exomic and transcriptomic mutations were shown by Trisicell-ConT to be highly similar, and the subtrees composed of phenotypically similar clonal sublines were shown by Trisicell-PF to be strongly associated with their seeding mutations. In addition, we applied Trisicell to single-cell whole-transcriptome sequencing data from a tumor derived from the same parental melanoma cell line, which was subjected to anti-CTLA-4 immunotherapy. The phylogenies generated from both studies featured distinct subtrees, strongly associated with phenotypes including cell differentiation status, tumor growth and therapeutic response. These results suggest that Trisicell can be used for scalable tumor phylogeny reconstruction and validation through both single-cell and clonal-subline sequencing data, which may reveal strong phenotypic associations. In particular, they suggest that the developmental state and phenotypic intratumoral heterogeneity of melanoma originate from observable subclonal variation. Citation Format: Farid Rashidi Mehrabadi, Salem Malikic, Kerrie L. Marie, Eva Pérez-Guijarro, Erfan Sadeqi Azer, Howard H. Yang, Can Kizilkale, Charli Gruen, Huaitian Liu, Christina Marcelus, Aydin Buluc, Funda Ergun, Maxwell P. Lee, Glenn Merlino, Chi-Ping Day, S. Cenk Sahinalp. Trisicell: Scalable tumor phylogeny reconstruction and validation reveals developmental origin and therapeutic impact of intratumoral heterogeneity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr LB019.
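The abstract does not describe how Trisicell-ConT builds its consensus, so the following is only a generic illustration of what agreement between two phylogenies over the same cells can mean. It is a minimal sketch of a strict-consensus style comparison: each tree is reduced to its set of clades (the cells below each internal node), and only clades found in every tree are kept. The nested-tuple tree encoding and the names `clades` and `shared_clades` are assumptions made for this example, not part of Trisicell.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a tree is either a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set (clade) found below every internal node."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        # Union of the leaf sets of all children of this internal node.
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def shared_clades(*trees: Tree) -> Set[FrozenSet[str]]:
    """Clades present in every input tree (at least one tree is required)."""
    return set.intersection(*(clades(t) for t in trees))


# Two made-up trees over the same five cells, e.g. one per data type.
exome_tree = ((("c1", "c2"), "c3"), ("c4", "c5"))
rna_tree = ((("c1", "c2"), ("c3", "c4")), "c5")

for clade in sorted(shared_clades(exome_tree, rna_tree), key=len):
    print(sorted(clade))
```

Here the two trees agree only on the pair c1, c2 and on the root, which is the sense in which a consensus highlights the robust portions of a phylogeny.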
{"title":"Abstract LB019: Trisicell: Scalable Tumor Phylogeny Reconstruction and Validation Reveals Developmental Origin and Therapeutic Impact of Intratumoral Heterogeneity","authors":"F. Mehrabadi, S. Malikić, Kerrie L. Marie, Eva Pérez-Guijarro, Erfan Sadeqi Azer, Howard H. Yang, Can Kızılkale, Charli Gruen, Huaitian Liu, C. Marcelus, A. Buluç, Funda Ergün, M. Lee, G. Merlino, Chi-Ping Day, S. C. Sahinalp","doi":"10.1158/1538-7445.AM2021-LB019","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-LB019","url":null,"abstract":"","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89145061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-07-01 DOI: 10.1158/1538-7445.AM2021-181
D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim
Background: Breast cancer (BC) is the second most common cancer among women. BC screening is usually based on mammography interpreted by radiologists. Recently, some researchers have used deep learning to automatically diagnose BC in mammography and so assist radiologists. The progress of BC detection algorithms can be measured by their performance on public datasets. The CBIS-DDSM is a widely used public dataset composed of scanned mammograms, equally divided into malignant and non-malignant (benign) images. Each image is accompanied by a segmentation of the lesion. Shen et al. (Nature Sci. Rep., 2019) presented a BC detection algorithm using an “end-to-end” approach to train deep neural networks. In this algorithm, a patch classifier is first trained to classify local image patches. The patch classifier's weights are then used to initialize the whole-image classifier, which is refined using datasets labeled with the cancer status of the whole image. They achieved an AUC of 0.87 [0.84, 0.90] in classifying CBIS-DDSM images, using their best single-model, single-view breast classifier. They used ResNet (He et al., CVPR 2016) as the basis of their algorithm. Our hypothesis was that replacing the older ResNet with the more recent EfficientNet (Tan et al., arXiv 2019) and MobileNetV2 (Sandler et al., CVPR 2018) would result in greater accuracy. Methods: We tested many different models and concluded that the best model uses EfficientNet-B4 as the base model, with a MobileNetV2 block at the top, followed by a dense layer with two output categories. We trained the patch classifier using 52,528 patches of 224x224 pixels extracted from CBIS-DDSM. From each image, we extracted 20 patches: 10 containing the lesion and 10 from the background (without lesion). The patch classifier weights were then used to initialize the whole-image classifier, which was trained using the end-to-end approach with CBIS-DDSM images resized to 1152x896 pixels, with data augmentation. 
The training was performed using a step learning rate of 1e-4 for the first 20 epochs and 1e-5 for the remaining 10, with a batch size of 4 and 10-fold cross-validation. We used 81% of the dataset for training, 9% for validation and 10% for testing. Results: We obtained an AUC of 0.8963±0.06, using a single-model, single-view classifier and without test-time data augmentation. Conclusions: Using EfficientNet and MobileNetV2 as the basis of the BC detection algorithm (instead of ResNet), we obtained an improvement in classifying CBIS-DDSM images as malignant or non-malignant: the AUC increased from 0.87 to 0.896. Our AUC is also larger than those reported in other recent papers in the literature, such as Shu et al. (IEEE Trans. Med. Imaging, 2020), who achieved an AUC of 0.838 on the same CBIS-DDSM dataset. Citation Format: Daniel G. Petrini, Carlos Shimizu, Gabriel V. Valente, Guilherme Folgueira, Guilherme A. Novaes, Maria L. Katayama, Pedro Serio, Rosimeire A. Roela, Tatiana C. Tucunduva, Maria Aparecida A. Folgueira, Hae Y. Kim. High-accuracy breast cancer detection in mammography using EfficientNet and end-to-end training [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 181.
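The training schedule and data split quoted in the abstract can be written out as a short sketch. The abstract gives the learning rates, epoch counts and the 81% / 9% / 10% split, but not how the split was produced. The code below assumes one reading that is consistent with 10-fold cross-validation: each fold holds out one tenth for testing, and one tenth of the remainder is set aside for validation. The function names, the 0-based epoch index and the `val_share_of_rest` parameter are assumptions made for illustration.

```python
from typing import Tuple


def step_learning_rate(epoch: int) -> float:
    """Learning rate for a 0-based epoch index: 1e-4 for epochs 0-19, 1e-5 for 20-29."""
    if not 0 <= epoch < 30:
        raise ValueError("the schedule described covers 30 epochs (20 + 10)")
    return 1e-4 if epoch < 20 else 1e-5


def split_fractions(n_folds: int = 10,
                    val_share_of_rest: float = 0.1) -> Tuple[float, float, float]:
    """Return (train, validation, test) fractions of the whole dataset.

    Assumed procedure: one fold is held out for testing, then a share of the
    remaining data is set aside for validation.
    """
    test = 1.0 / n_folds
    rest = 1.0 - test
    val = rest * val_share_of_rest
    train = rest - val
    return train, val, test


train, val, test = split_fractions()
print(f"train={train:.2f} validation={val:.2f} test={test:.2f}")
print([step_learning_rate(e) for e in (0, 19, 20, 29)])
```

Under this assumption the arithmetic matches the abstract: 0.9 x 0.9 = 0.81 for training, 0.9 x 0.1 = 0.09 for validation, and 0.10 for testing.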
{"title":"Abstract 181: High-accuracy breast cancer detection in mammography using EfficientNet and end-to-end training","authors":"D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim","doi":"10.1158/1538-7445.AM2021-181","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-181","url":null,"abstract":"","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86986647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}