Pub Date : 2012-10-04  DOI: 10.1109/BIBM.2012.6392718
Incorporating semantic similarity into clustering process for identifying protein complexes from Affinity Purification/Mass Spectrometry data
Bingjing Cai, Haiying Wang, Huiru Zheng, Hui Wang
This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data are modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pairwise similarities between bait proteins are computed by combining similarities based on topological features with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed' clusters consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. In applications to real AP-MS datasets, we validate the biological significance of the predicted protein complexes against curated protein complexes using six statistical metrics. The results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification, and that the clustering results obtained by the proposed framework are better than those of several existing clustering methods.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470312
A novel non-contact interactive medical image viewing system
Chen Geng, Jian Yang, Tong Li, Yongtian Wang
Currently, photographic film is the most commonly used medium for viewing medical images in hospitals. However, because digitized medical images must first be printed on photographic film, this viewing pattern is of very limited use, and a homogeneous white light source is also needed to observe anatomic structures clearly, further restricting the circumstances in which it can be applied. In this paper, a novel interactive non-contact system is developed for observing multimodal medical images. A series of gestures is defined for the system, and a depth sensor is used to capture the speckle pattern of an infrared laser projected onto the person in front of the system. By analyzing the morphologic atlas of the depth image, the 3-D structure and motion of the captured scene can be obtained in real time. All kinds of operations on medical images, including transformation, contrast adjustment, and volume rendering, can then be performed through different gesture rules. The system enables flexible observation of medical images directly from digitized data, which greatly reduces the expense of clinical diagnosis. Because no contact with the viewing medium is required, the system can also be used by doctors performing clinical surgery.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470283
CryoEM skeleton length estimation using a decimated curve
Andrew McKnight, Kamal Al-Nasr, Dong Si, Andrey N. Chernikov, N. Chrisochoides, Jing He
Cryo-electron microscopy (cryoEM) is an important biophysical technique that produces 3-dimensional (3D) images at different resolutions. De novo modeling is becoming a promising approach for deriving the atomic structure of proteins from medium-resolution cryoEM 3D images. Measuring distance along a thin skeleton in the 3D image is an important step in de novo modeling. Despite the need for such measurement, little has been investigated about its accuracy or about what constitutes an effective method. We propose a new computational geometric approach to estimate distance along the skeleton. Our preliminary test results show that the method estimated the distance fairly well in eleven cases.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470215
Managing data provenance in genome project workflows
Renato de Paula, M. Holanda, M. E. Walter, Sérgio Lifschitz
In this article, we propose the application of the PROV-DM model to manage data provenance for workflows designed to support genome projects. This provenance model aims at storing details of each execution of the workflow, including raw and produced data, computational tools and their versions, parameters, and so on. In this way, biologists can review details of a particular workflow execution, compare information generated by different executions, and plan new ones more efficiently. In addition, we have created a provenance simulator to facilitate the inclusion of a provenance data model in genome projects. To validate our proposal, we discuss a case study of an RNA-Seq project that aims to identify, measure, and compare RNA expression levels across liver and kidney RNA samples produced by high-throughput automatic sequencers.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470307
Discovery at a distance: Farther journeys in predication space
T. Cohen, D. Widdows, R. Schvaneveldt, T. Rindflesch
In this paper we extend the Predication-based Semantic Indexing (PSI) approach to search efficiently across triple-predicate pathways in a database of predications extracted from the biomedical literature by the SemRep system. PSI circumvents the combinatorial explosion of possible pathways by converting the task of traversing individual predications into the task of measuring the similarity between composite concept vectors. Consequently, search time for single-, double-, or triple-predicate paths is identical once the relevant concept vectors have been constructed. This paper describes the application of PSI to infer double- and triple-predicate pathways connecting example pairs of therapeutically related drugs and diseases, and to use these inferred pathways to guide the search for treatments for other diseases. In an evaluation of the utility of vector-based dual- and triple-predicate path search in a simulated discovery experiment, these approaches are found to be complementary, with the best performance obtained through their application in combination.
Pub Date : 2012-10-04  DOI: 10.1109/BIBM.2012.6392654
Early classification of multivariate time series using a hybrid HMM/SVM model
Mohamed F. Ghalwash, Dusan Ramljak, Z. Obradovic
Early classification of time series has been receiving considerable attention of late, particularly in the context of gene expression. In the biomedical realm, early classification can be of tremendous help, whether by identifying the onset of a disease before it fully takes hold or by determining that a treatment has done its job and can be discontinued. In this paper we present a state-of-the-art model, which we call the Early Classification Model (ECM), that allows early, accurate, and patient-specific classification of multivariate time series. The model integrates the widely used HMM and SVM models, a combination that, while not new per se, has not previously been used for early classification of multivariate time series. It attained very promising results on the datasets we tested: in our experiments on a published dataset of responses to drug therapy in multiple sclerosis patients, ECM used only 40% of a time series on average and was able to outperform some of the baseline models, which needed the full time series for classification.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470299
Mining hub-based protein complexes in massive biological networks
Zhijie Lin, Yan Chen, Shiwei Wu, Yun Xiong, Yangyong Zhu, Guangyong Zheng
Advanced technologies are producing large-scale protein-protein interaction (PPI) data at an ever-increasing pace. Finding protein complexes in large PPI networks is a fundamental problem in bioinformatics. As core proteins that interact with many other proteins, hub proteins play a key role in protein complexes and in cellular activity. In this paper, we propose a novel topological model, the HP*-complex, which defines the hub proteins of a protein complex and extends to encompass their neighborhood, as the initial structure of protein complexes. An algorithm based on this new topological model, called HPCMiner, is developed for identifying protein complexes in large PPI networks. Experimental results on a real dataset show that the proposed algorithm detects many complexes of particular biological significance. Results from a study on synthetic datasets demonstrate that the HPCMiner algorithm scales well with dataset size.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470364
The experimental introduction of professor Fu's three-step therapy on gouty arthritis
Changeai Xie, Nipeng Lin, Jiaxin Zhou, Zhiqi Fan, W. Fu
This article introduces Professor Fu's clinical experience with three-step ladder therapy for gouty arthritis. Guided by overall syndrome differentiation and meridian differentiation, combined with the clinical features of the disease, Professor Fu treats gouty arthritis beginning at the onset of pain. The first step is the application of eye acupuncture and body acupuncture to rapidly relieve the patient's pain; the second step is the application of moxibustion, fire needling, and blood-letting puncture to enhance the efficacy; the third step is the embedding of intradermal needles to consolidate long-term efficacy. Professor Fu's three-step ladder therapy for gouty arthritis has achieved definite effects and provides significant guidance for clinical practice.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470200
Enriching miRNA binding site specificity with sequence profile based filtering of 3'-UTR region of mRNA
Jasjit K. Banwait, H. Ali, D. Bastola
MicroRNAs are small (approximately 22 nt) noncoding RNAs that regulate gene expression either by degrading messenger RNA (mRNA) that has already been transcribed or by repressing its translation. This mechanism of gene regulation, in which the miRNA binds to the 3'-UTR of target mRNAs, has been discovered relatively recently, and this sequence-specific post-transcriptional regulation affects a large set of genes involved in a number of biological pathways. Mapping the 7-nt miRNA seed sequence to the target gene has been the standard way of predicting miRNA targets. In this study, we have generated a profile-based filter to increase the specificity of predicted human miRNA-mRNA relationships, thereby enriching true-positive miRNA target sites in humans based on sequence information.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470223
An efficient overlap graph coarsening approach for modeling short reads
Julia D. Warnke-Sommer, H. Ali
Next-generation sequencing has quickly emerged as the most exciting yet challenging computational problem in bioinformatics. Current sequencing technologies are capable of producing several hundred thousand to several million short sequence reads in a single run. However, current methods for managing, storing, and processing the produced reads remain for the most part simple and lack the sophistication needed to model the reads efficiently and assemble them correctly. These reads are produced at high coverage of the original target sequence, so that many reads overlap. The overlap relationships are used to align and merge reads into contiguous sequences called contigs. In this paper, we present an overlap graph coarsening scheme for modeling reads and their overlap relationships. Our approach differs from previous read analysis and assembly methods, which use a single graph to model read overlap relationships. Instead, we use a series of graphs with different granularities of information to represent the complex read overlap relationships. We present a new graph coarsening algorithm for clustering a simulated metagenomics dataset at various levels of granularity. We also use the proposed graph coarsening scheme along with graph traversal algorithms to find a labeling of the overlap graph that allows for the efficient organization of nodes within the graph data structure.