In this paper, we compare the performance of deep neural network-based image classifiers and fine-tune them with different hyperparameter configurations for automatic COVID-19 diagnosis from a varied and limited chest x-ray image dataset provided by the Deep Learning and Artificial Intelligence Summer School 3 (DLAI3). We show that high accuracy can be obtained using transfer learning combined with a well fine-tuned Convolutional Neural Network. Moreover, we seek smaller deep learning architectures with fewer trainable parameters, which reduce the training and inference time of AI applications for mobile and edge devices while still delivering relatively high performance. The results from the DLAI3 hackathon session show that our model outperforms other submitted models in terms of effectiveness and generalization.
{"title":"Fine-Tuning A Lightweight Convolutional Neural Networks for COVID-19 Diagnosis","authors":"Jaturong Kongmanee, Thanyathorn Thanapattheerakul","doi":"10.1145/3429210.3429218","DOIUrl":"https://doi.org/10.1145/3429210.3429218","url":null,"abstract":"In this paper, we compare the performance of the deep neural network-based image classifiers and fine-tune with different hyperparameter configurations for an automatic COVID-19 diagnosis from various and limited chest x-ray image dataset provided by Deep Learning and Artificial Intelligence Summer School 3 (DLAI3). We show that high accuracy results can be obtained using the transfer learning technique combined with a well fine-tuned Convolutional Neural Network. Moreover, we seek for not only smaller deep learning architectures with less trainable parameters to reduce the training and inference time of AI applications for mobile and edge devices, but also relatively high performance. The results from the DLAI3 hackathon session show that our model outperforms other submitted models in terms of effectiveness and generalization.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122886970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenqi Li, Maggie Wang, Grace Wu, Khadija Rana, Nipon Charoenkitkarn, Jonathan H. Chan
The COVID-19 outbreak calls for quick, accurate, and accessible detection methods. Convolutional neural networks applied to chest x-ray images are a promising solution; however, x-ray device configurations vary and data quality across different datasets is inconsistent. This leads to overfitting on a particular set of training data. This paper explores methods to mitigate overfitting.
{"title":"COVID19 Chest X-Ray Classification with Simple Convolutional Neural Network","authors":"Chenqi Li, Maggie Wang, Grace Wu, Khadija Rana, Nipon Charoenkitkarn, Jonathan H. Chan","doi":"10.1145/3429210.3429216","DOIUrl":"https://doi.org/10.1145/3429210.3429216","url":null,"abstract":"COVID-19 outbreak calls for the urgent need of quick, accurate, and accessible methods for detection. Convolutional neural networks applied to chest x-ray images is a promising solution; however, x-ray device configurations vary and data quality across different datasets are inconsistent. This leads to overfitting on a particular set of training data. This paper aims to explore methods to mitigate overfitting.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124274237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article studies fluid flow characteristics and their dependence on device geometry and fluid parameters. We computationally examined three types of microfluidic channel geometries: a torus, a cylindrical annulus with a square cross-section, and a helix. Several parameters were varied across the simulations: the kinematic viscosity of the fluid, the velocity of the fluid, the curvature radius, the cross-section dimension, and, for helical channels, the pitch. We analyzed the velocity distribution of the primary flow and the shape of the Dean vortices of the secondary inertial flow; for helical channels, we also analyzed S-shaped streamlines within a non-perpendicular cross-section. We also analyzed the dependence of the secondary flow velocity and vorticity on the average velocity of the primary flow. We confirmed that the Dean effect is present in our numerical simulations, and it can be further investigated as a sorting tool for cells in suspension.
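The parameters varied in these simulations (kinematic viscosity, fluid velocity, curvature radius, and cross-section dimension) are the ones that enter the Dean number, a dimensionless group often used to characterize secondary flow in curved channels. The sketch below uses one common convention, De = Re * sqrt(D_h / (2 * R_c)); the abstract does not state which definition the authors use, and the function names and numerical values are illustrative assumptions, not taken from the paper.

```python
import math


def reynolds_number(mean_velocity, hydraulic_diameter, kinematic_viscosity):
    """Reynolds number Re = U * D_h / nu (dimensionless)."""
    return mean_velocity * hydraulic_diameter / kinematic_viscosity


def dean_number(mean_velocity, hydraulic_diameter, kinematic_viscosity,
                curvature_radius):
    """Dean number under the convention De = Re * sqrt(D_h / (2 * R_c)).

    Other conventions (for example, omitting the factor 2) exist, so check
    which one a given paper uses before comparing values.
    """
    re = reynolds_number(mean_velocity, hydraulic_diameter, kinematic_viscosity)
    return re * math.sqrt(hydraulic_diameter / (2.0 * curvature_radius))


if __name__ == "__main__":
    # Illustrative values only (assumptions, not from the paper):
    U = 0.1        # mean primary-flow velocity, m/s
    D_h = 100e-6   # hydraulic diameter of a 100 um square channel, m
    nu = 1e-6      # kinematic viscosity of a water-like fluid, m^2/s
    R_c = 5e-3     # curvature radius of the channel centerline, m

    print("Re =", reynolds_number(U, D_h, nu))
    print("De =", dean_number(U, D_h, nu, R_c))
```

With these illustrative inputs the Reynolds number is about 10 and the Dean number about 1. For a square cross-section the hydraulic diameter equals the side length, which is why the cross-section dimension is passed directly.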
{"title":"Computational study of inertial effects in toroidal and helical microchannels","authors":"K. Kovalcíková, A. Bugánová, I. Cimrák","doi":"10.1145/3429210.3429222","DOIUrl":"https://doi.org/10.1145/3429210.3429222","url":null,"abstract":"Fluid flow characteristics and their dependence on device geometry and fluid parameters were studied in this article. We computationally examined three types of microfluidic channel geometries: torus, cylindrical annulus with squared cross-section, and helix. Several parameters varied over simulations: kinematic viscosity of the fluid, the velocity of the fluid, curvature radius, cross-section dimension, and pitch for helical channels. We analyzed the velocity distribution of the primary flow, as well as shape of Dean vortices for secondary inertial flow, and in case of helical channels, we analyzed also S-shaped streamlines within non-perpendicular cross-section. We analyzed also dependence of the secondary flow velocity and vorticity on average velocity of primary flow. We could confirm that the Dean effect is present in our numerical simulations, and it can be further investigated as a sorting tool for cells in suspension.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130180530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Kawsar Sharif Siam, Afsana Karim, Mohammad Umer Sharif Shohan
HSP90 (heat shock protein 90), a molecular chaperone, has various oncogenic client proteins, which play a significant role in initiating cancer cell hallmarks. The “HSP90 addiction” of cancer cells makes it a suitable target in cancer treatment. Inhibition of HSP90 mitigates tumor progression but results in over-expression of the HSP70 family (the 70-kDa heat shock proteins). The HSP70 family is expressed abundantly in human tumors, and high expression of HSP70 in cancer cells is responsible for tumor progression. It has been found that simultaneous inhibition of heat shock 70 kDa protein 1A (HSP72) and heat shock cognate 71-kDa protein (HSC70), two isoforms of the HSP70 family, leads to the inhibition of HSP90 client proteins. In this study, a molecular docking approach was used to search for the best possible inhibitors of HSP72 and HSC70. Zafirlukast, a potent inhibitor of both isoforms, was used as the reference drug. The binding affinities of Zafirlukast with HSP72 (PDB ID: 5AQZ) and HSC70 (PDB ID: 4H5N) are -10.5 and -9.9 kcal/mol, respectively. One hundred potential inhibitors (anti-diabetic drugs, anti-rheumatic drugs, anti-inflammatory drugs, statins, and small-molecule inhibitors) were screened in silico, and Apoptozole was found to be a potential inhibitor of both HSP72 and HSC70, with strong binding affinities of -11.0 and -10.2 kcal/mol, respectively. Protein-ligand interactions were monitored and visualized with Discovery Studio to better understand the nature of the intermolecular bonds. Furthermore, ADMET properties were obtained from admetSAR 2.0 and compared with those of the reference drug for validation.
{"title":"In-Silico Study for Potential Inhibitors of Both HSP72 and HSC70 Proteins in the Treatment of Cancer","authors":"Mohammad Kawsar Sharif Siam, Afsana Karim, Mohammad Umer Sharif Shohan","doi":"10.1145/3429210.3429226","DOIUrl":"https://doi.org/10.1145/3429210.3429226","url":null,"abstract":"HSP90 (Heat shock protein 90), molecular chaperone contains various oncogenic client proteins, which play a significant role in initiating cancer cell hallmarks. The “HSP90-addiction” of cancer cells, makes it a suitable target in cancer treatment. Inhibition of HSP90 mitigates the tumor progression but results in over-expression of the HSP70 family (The 70-kDa heat shock proteins). HSP70 family is expressed abundantly in human tumors. High expression of HSP70 in cancer cells is responsible for tumor progression. It has been found that, inhibition of both Heat shock 70 kDa protein 1a, HSP72 and Heat shock cognate 71-kDa proteins and HSC70 (two isoforms of the HSP70 family) simultaneously lead to the inhibition of HSP90 client proteins. In this study, molecular docking approach was done in search of the best possible inhibitors of HSP72 and HSC70. Zafirlukast was used as a reference drug that is a potent inhibitor of both the isoforms HSP72 and HSC70. The binding affinity of Zafirlukast with HSP72 (PDB ID-5AQZ) and HSC70 (PDB ID-4H5N) is -10.5 and -9.9 kcal/mol respectively. 100 potential inhibitors (Anti-diabetic drugs, anti-rheumatic drugs, anti-inflammatory, statins and small molecule inhibitors) were screened through In silico approach and Apoptozole was found to be a potential inhibitor of both HSP72 and HSC70 with strong binding affinities of -11.0 and -10.2 kcal/mol respectively. Protein-ligand interaction was monitored and visualized by discovery studio to better understand the nature of intermolecular bonds. 
Furthermore, ADMET properties were obtained from admetSAR 2.0 and were compared with reference drug for validation.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126523446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identification of underlying genetic factors has provided important information on the functional pathways involved in many complex disorders. However, the causal genetic factors identified so far in many complex disorders generally confer less risk than expected from empirical estimates of their heritability. Tandem DNA repeats make up around 6% of the human genome and have been associated with more than 40 monogenic disorders, but their involvement in complex disorders is largely unknown. I will present our novel approach to detecting genome-wide tandem repeat expansions. This approach has led to the identification of rare tandem repeat expansions contributing to autism spectrum disorder and other related conditions. It provides a model for searching for missing heritability in other complex disorders.
{"title":"Genome-wide repeat expansions in complex disorders: beyond the coding sequence","authors":"R. Yuen","doi":"10.1145/3429210.3429231","DOIUrl":"https://doi.org/10.1145/3429210.3429231","url":null,"abstract":"Identification of underlying genetic factors has provided important information on the functional pathways involved in many of complex disorders. However, the casual genetic factors identified in many complex disorders so far generally confer less risk than expected from the empirical estimates of their heritability. Tandem DNA repeats make up around 6% of the human genome and have been associated with more than 40 monogenic disorders, but their involvement in complex disorders is largely unknown. I will present our novel approach to detect genome-wide tandem repeat expansions. This approach has led to the identification of rare tandem repeat expansions contributing to autism spectrum disorder and other related conditions. It provides a model to search for missing heritability in other complex disorders.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127714145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pichahpuk Uthaipaisanwong, Pantakan Puengrang, C. Rangsiwutisak, Photchanathorn Prombun, Athisri Sitthipunya, Natchaphon Rajudom, K. Kusonmano
The advent of next-generation sequencing (NGS) makes it possible to study living organisms by reading genetic material in a high-throughput manner. The technology has opened up microbial research in several areas, such as medicine, agriculture, energy, and the environment, by allowing a whole microbial community in an environment of interest to be studied without culturing. Bioinformatics analysis is needed to characterize and analyze the microbiota in the studied samples. In this tutorial, we give an overview of microbiome analysis based on high-throughput sequencing of the 16S rRNA gene, a commonly used target sequence for classifying bacteria and archaea. After the biological and technological background, microbiome data from a short-read sequencing platform are explained, followed by all the important computational steps of microbiome analysis. The steps include data preprocessing, amplicon sequence variant analysis, taxonomy assignment, data normalization, and diversity analyses. Practical concepts and code for microbiome analysis are demonstrated step by step, providing a basic guideline for beginners.
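Two of the steps listed above, data normalization and diversity analysis, can be illustrated with a small sketch. It assumes total-sum scaling (relative abundance) as the normalization and the Shannon index and observed richness as the alpha-diversity measures; the tutorial itself may use other methods or tools, and the function names and count values here are hypothetical.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: convert raw per-taxon counts to proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props if p > 0)


def observed_richness(counts):
    """Number of taxa (for example, ASVs) observed at least once."""
    return sum(1 for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical ASV count vectors for two samples (not real data).
    even_sample = [25, 25, 25, 25]
    skewed_sample = [97, 1, 1, 1]
    for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
        print(name, observed_richness(sample), round(shannon_index(sample), 3))
```

Both hypothetical samples have the same richness, but the evenly distributed one has the higher Shannon index, which is the distinction the diversity step is meant to capture.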
{"title":"Beginner's guide to microbiome analysis: Bioinformatics guidelines and practical concepts for amplicon-based microbiome analysis.","authors":"Pichahpuk Uthaipaisanwong, Pantakan Puengrang, C. Rangsiwutisak, Photchanathorn Prombun, Athisri Sitthipunya, Natchaphon Rajudom, K. Kusonmano","doi":"10.1145/3429210.3429211","DOIUrl":"https://doi.org/10.1145/3429210.3429211","url":null,"abstract":"The advent of next-generation sequencing (NGS) allows to study living organisms by reading genetic materials in a high-throughput manner. The technology has opened up a field of microbial research in several areas such as medicine, agriculture, energy, and environment, to study a whole microbial community in an environment of interest without culturing. Bioinformatics analysis is a need in order to characterize and analyze microbiota in the studied samples. In this tutorial, we will give an overview of microbiome analysis based on high-throughput 16S rRNA genes sequencing, a commonly-used target sequence to classify bacteria and archaea. With biological and technology backgrounds, microbiome data from short-read sequencing platform will be elucidated followed by all important computational steps for microbiome analysis. The steps include data preprocessing, amplicon sequence variant analysis, taxonomy assignment, data normalization, and diversity analyses. 
Practical concepts and codes for the microbiome analysis will be demonstrated step by step providing a basic guideline for beginner.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121750683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amy Y Chen, Jonathan Jaegerman, Dunja Matić, Hassaan Inayatali, Nipon Charoenkitkarn, Jonathan H. Chan
Covid-19 is a novel epidemic that has hugely impacted countries worldwide [13], and there is a need for quick and accurate screening methods. Current testing methods include the reverse transcription-polymerase chain reaction test and medical diagnosis using computed tomography scans. Both require expensive technologies as well as highly trained practitioners and are thus in short supply [18]. Less developed countries and overloaded hospitals have increased the demand for cheap, easy, and accurate screening methods [4]. X-ray devices are now cheap, portable, and easy to use; however, few professionals are capable of manually identifying Covid-19 from a chest x-ray. We suggest a machine learning model that uses transfer learning to automatically detect Covid-19 from chest x-ray images. The suggested model is built on top of the VGG16 architecture with pre-trained ImageNet weights. Compared with the VGG19, Inception-V3, Inception-ResNet, Xception, ResNet152-V2, and DenseNet201 models, the VGG16 model achieved the highest testing accuracy, 98% after 10 epochs, as well as high positive-class accuracy. Gradient-weighted class activation mapping (Grad-CAM) was also applied to detect the regions that have a greater impact on the model's classification decision.
{"title":"Detecting Covid-19 in Chest X-Rays using Transfer Learning with VGG16","authors":"Amy Y Chen, Jonathan Jaegerman, Dunja Matić, Hassaan Inayatali, Nipon Charoenkitkarn, Jonathan H. Chan","doi":"10.1145/3429210.3429213","DOIUrl":"https://doi.org/10.1145/3429210.3429213","url":null,"abstract":"Covid-19 is a novel epidemic that has hugely impacted countries worldwide [13]; and for which there is a need for quick and accurate screening methods. Current testing methods include the reverse transcription-polymerase chain reaction test and medical diagnosis using computed tomography scans. Both of these require expensive technologies as well as highly-trained practitioners and thus are in short supply [18]. Less developed countries and overloaded hospitals have increased the demand for cheap, easy and accurate screening methods [4]. X-ray devices are now cheap, portable and easy to use; there are few professionals, however, who are capable of manually identifying Covid-19 from a chest x-ray. We suggest implementing a machine learning model that incorporates transfer learning to automatically detect Covid-19 from chest x-ray images. The suggested model is built on top of the VGG16 architecture and pre-trained ImageNet weights. Compared with the VGG19, Inception-V3, Inception-ResNet, Xception, RestNet152-V2, and DenseNet201 models, the VGG16 model achieved the highest testing accuracy of 98% on 10 epochs as well as high positive-class accuracy. 
Gradient-weighted class activation mapping (Grad-CAM) was also applied to detect the regions that have a greater impact on the model classification decision.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126480930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ribonucleotide modifications to mRNA play important roles in biological regulation. Over 170 types of RNA modifications have been experimentally validated. Their detection traditionally relies on specific antibody-based enrichment and analytical chemistry tools; these approaches are labor intensive and can detect only one or a few modifications at a time. This is insufficient to truly assess complete transcriptomes for sequence-specific identification and quantitation of epigenetic signals. Recently, we were the first to use third-generation Oxford Nanopore Technology (ONT) sequencing to directly sequence cellular RNA in its native form at the transcriptomic level. We determined that the method can uncover RNA modifications of any type. Based on the principle that such modifications are absent from cDNA or synthetic unmodified RNA, we conducted a study that compared sequence features of native modified RNA with those of unmodified RNA of the same sequence. We developed a bioinformatics tool, ELIGOS (Epitranscriptional Landscape Inferring from Glitches of ONT Signals), that successfully identified modified RNA bases from native RNA sequences. ELIGOS accurately predicts known classes of RNA methylation sites (AUC > 0.93) in rRNAs from E. coli, yeast, and human cells, using either unmodified in vitro transcribed RNA or our background-error model, which mimics the systematic error in native RNA sequences. The validity of the approach was illustrated in transcriptomes of yeast, mouse, and human cells. We further applied ELIGOS to the detection of DNA adducts and to distinguishing individual alkylated DNA adducts. We analyzed a library of 16 plasmids containing site-specifically inserted O6- or N2-alkyl-deoxyguanosine lesions differing in size, functional group, regiochemistry, and abasic site. Based on the native DNA sequences, ELIGOS can accurately identify the location of individual DNA adducts. Moreover, individual DNA adducts were clearly distinguished from each other at the signal level. The ELIGOS software is publicly available and can be used to detect possible RNA and DNA modification sites at genome scale from native RNA/DNA sequences.
{"title":"Uncovering RNA and DNA Modifications from Native Sequences","authors":"I. Nookaew","doi":"10.1145/3429210.3429232","DOIUrl":"https://doi.org/10.1145/3429210.3429232","url":null,"abstract":"Ribonucleotides modifications to mRNA play important roles biological regulations. Over 170 types of RNA modifications have been experimentally validated. Their detection traditionally relies on specific antibody-based enrichment and analytical chemistry tools; these approaches are labor intensive and can detect only one or a few modifications at a time. This is insufficient to truly assess complete transcriptomes for sequence-specific identification and quantitation of epigenetic signals. Recently, we were the first to use third-generation Oxford Nanopore Technology (ONT) sequencing to directly sequence cellular RNA in native from, at a transcriptomic level. We determined that the method can uncover RNA modifications of any type. Based on the principle that such modifications are absent on cDNA or synthetical unmodified RNA, we conducted a study that compared sequence features of native modified RNA with unmodified RNA of the same sequence. We developed a bioinformatics tool, ELIGOS (Epitranscriptional Landscape Inferring from Glitches of ONT Signals), that successfully identified modified RNA bases from the native RNA sequences. ELIGOS accurately predicts known classes of RNA methylation sites (AUC > 0.93) in rRNAs from E. coli, yeast, and human cells, by using either unmodified in vitro transcribed RNA or our developed background-error model, which mimics the systematic error in native RNA sequences. The validity of the approach was illustrated in transcriptomes of yeast, mouse, and human cells. We further apply ELIGOS in detection of DNA adducts and for distinguishing individual alkylated DNA adducts. 
We analyzed a library of 16 plasmids containing site-specifically inserted O6- or N2-alkyl-deoxyguanosine lesions differing in sizes, functional group, regiochemistries, and abasic site. Based on the native DNA sequences, ELIGOS can accurately identified the location of individual DNA adducts. Moreover, individual DNA adducts were clearly distinguished from each other at the signal level. ELIGOS software is publicly available and can be used to detect possible RNA and DNA modification sites at genome-scale from native RNA/DNA sequences.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120967435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a novel and straightforward approach to improving the accuracy of progressive multiple protein sequence alignment. We trained a decision-making model based on convolutional neural networks and bi-directional long short-term memory networks, and progressively aligned the input protein sequences by calculating different posterior probability matrices. To evaluate this method, we implemented a multiple sequence alignment tool called DLPAlign and compared its performance with eleven leading alignment methods on three empirical alignment benchmarks (BAliBASE, OXBench and SABMark). Our results show that DLPAlign achieves the best total-column scores on the three benchmarks. When evaluated on the 711 low-similarity families with average PID ≤ 30%, DLPAlign improved by about 2.8% over the second-best MSA software. In addition, we compared the performance of DLPAlign and other alignment tools on a real-life application, namely protein secondary structure prediction on four protein sequences related to SARS-CoV-2, and DLPAlign provided the best result in all cases.
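The total-column (TC) score used in the comparison can be sketched as the fraction of reference-alignment columns that a test alignment reproduces exactly. The version below is a simplified illustration, not the authors' evaluation code: it scores every reference column, whereas benchmark suites typically restrict scoring to annotated core regions, and the function names and toy sequences are hypothetical.

```python
def alignment_columns(alignment):
    """Describe each column by the residue index of every sequence in it.

    `alignment` is a list of equal-length aligned rows with '-' for gaps.
    A gap is recorded as None, so two columns compare equal only if they
    place exactly the same residues of every sequence together.
    """
    next_index = [0] * len(alignment)
    columns = []
    for j in range(len(alignment[0])):
        column = []
        for i, row in enumerate(alignment):
            if row[j] == "-":
                column.append(None)
            else:
                column.append(next_index[i])
                next_index[i] += 1
        columns.append(tuple(column))
    return columns


def total_column_score(test, reference):
    """Fraction of reference columns that appear unchanged in `test`."""
    test_columns = set(alignment_columns(test))
    reference_columns = alignment_columns(reference)
    matched = sum(1 for col in reference_columns if col in test_columns)
    return matched / len(reference_columns)


if __name__ == "__main__":
    # Toy two-sequence example (hypothetical sequences).
    reference = ["AC-GT",
                 "ACTGT"]
    test = ["ACG-T",
            "ACTGT"]
    print(total_column_score(test, reference))  # 3 of 5 columns match: 0.6
```

In the toy example the misplaced gap breaks two of the five reference columns, so the score drops to 0.6 even though most residue pairs are still aligned; this all-or-nothing behavior per column is what makes TC stricter than pairwise scores.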
{"title":"DLPAlign: A Deep Learning based Progressive Alignment Method for Multiple Protein Sequences","authors":"Mengmeng Kuang, Yong Liu, Lufei Gao","doi":"10.1145/3429210.3429221","DOIUrl":"https://doi.org/10.1145/3429210.3429221","url":null,"abstract":"This paper proposed a novel and straightforward approach to improve the accuracy of progressive multiple protein sequence alignment method. We trained a decision-making model based on the convolutional neural networks and bi-directional long short term memory networks, and progressively aligned the input protein sequences by calculating different posterior probability matrices. To evaluate this method, we have implemented a multiple sequence alignment tool called DLPAlign and compared its performance with eleven leading alignment methods on three empirical alignment benchmarks (BAliBASE, OXBench and SABMark). Our results show that DLPAlign can get the best total-column scores on the three benchmarks. When evaluated against the 711 low similarity families with average PID ≤ 30%, DLPAlign improved about 2.8% over the second-best MSA software. 
Besides, we compared the performance of DLPAlign and other alignment tools on a real-life application, namely protein secondary structure prediction on four protein sequences related to SARS-COV-2, and DLPAlign provides the best result in all cases.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130873120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sanghamita Bhoumik, Sayantan Chatterjee, Ankur Sarkar, Abhishek Kumar, Ferdin Joe John Joseph
The COVID-19 pandemic has paralyzed the whole world indiscriminately. To contain the infection, effective testing of people plays a vital role. Usually, chest X-ray image-based diagnosis is carried out manually, which is not only time-consuming but also gives asymptomatic patients more opportunity to transmit the virus. This paper proposes chest X-ray image analysis using a fully connected convolutional neural network (CNN) to address this problem. The fully connected CNN with two variants of convolution, especially DSC, has proved efficient in detecting COVID-19 infections.
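The abstract does not expand "DSC"; assuming it stands for depthwise separable convolution, the sketch below compares the trainable parameter count of a standard convolution layer with that of a depthwise separable one. The function names and layer sizes are illustrative assumptions and do not come from the paper.

```python
def standard_conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Parameters of a standard 2D convolution: k * k * C_in * C_out (+ C_out)."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def depthwise_separable_params(kernel_size, in_channels, out_channels, bias=True):
    """Parameters of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel_size * kernel_size * in_channels + (in_channels if bias else 0)
    pointwise = in_channels * out_channels + (out_channels if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
    k, c_in, c_out = 3, 64, 128
    print(standard_conv_params(k, c_in, c_out, bias=False))        # 73728
    print(depthwise_separable_params(k, c_in, c_out, bias=False))  # 8768
```

For this illustrative layer the separable form needs roughly one eighth of the weights, which is the usual motivation for using it in lightweight classifiers.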
{"title":"Covid 19 Prediction from X Ray Images Using Fully Connected Convolutional Neural Network","authors":"Sanghamita Bhoumik, Sayantan Chatterjee, Ankur Sarkar, Abhishek Kumar, Ferdin Joe John Joseph","doi":"10.1145/3429210.3429233","DOIUrl":"https://doi.org/10.1145/3429210.3429233","url":null,"abstract":"COVID 19 pandemic has paralyzed the whole world irrespective of any discrimination. To contain the infection effective testing of people plays a vital role. Usually, chest X-ray image-based diagnosis using manual methods is carried out, which is not only time-consuming but also paves way for asymptomatic patients to transmit the virus at a faster pace. Chest X-ray image analysis using a fully connected convolutional neural network (CNN) has been proposed in this paper to solve the purpose. The fully connected CNN with two variants of convolution especially DSC has proved its efficiency in detecting COVID 19 infections.","PeriodicalId":164790,"journal":{"name":"CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117074425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}