Pub Date : 2012-10-04  DOI: 10.1109/BIBM.2012.6392718
Incorporating semantic similarity into clustering process for identifying protein complexes from Affinity Purification/Mass Spectrometry data
Bingjing Cai, Haiying Wang, Huiru Zheng, Hui Wang
This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data are modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pairwise similarities between bait proteins are computed by combining similarities based on topological features with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed' clusters consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. In applications to real AP-MS datasets, we validate the biological significance of the predicted protein complexes against curated protein complexes using six statistical metrics. The results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification, and that the clustering results obtained by the proposed framework are better than those of several existing clustering methods.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470312
A novel non-contact interactive medical image viewing system
Chen Geng, Jian Yang, Tong Li, Yongtian Wang
Currently, photographic film is the most commonly used medium for viewing medical images in hospitals. However, because digitized medical images must first be printed on photographic film, this viewing pattern is of very limited use, and a homogeneous white light source is also needed to observe anatomic structures clearly, further restricting the circumstances in which it can be applied. In this paper, a novel interactive non-contact system is developed for observing multimodal medical images. A series of gestures is defined for the system, and a depth sensor is used to capture the speckle pattern of an infrared laser projected onto the person in front of the system. By analyzing the morphologic atlas of the depth image, the 3-D structure and motion of the captured scene can be obtained in real time. All kinds of operations on medical images, including transformation, contrast adjustment, and volume rendering, can then be performed through different gesture rules. The system enables flexible observation of medical images directly from digitized data, which greatly reduces the expense of clinical diagnosis. Because no contact with the viewing medium is required, the system can also be used by doctors performing clinical surgery.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470283
CryoEM skeleton length estimation using a decimated curve
Andrew McKnight, Kamal Al-Nasr, Dong Si, Andrey N. Chernikov, N. Chrisochoides, Jing He
Cryo-electron microscopy (cryoEM) is an important biophysical technique that produces 3-dimensional (3D) images at different resolutions. De novo modeling is becoming a promising approach for deriving the atomic structure of proteins from medium-resolution cryoEM 3D images. Measuring distance along a thin skeleton in the 3D image is an important step in de novo modeling. Despite the need for such measurement, little has been investigated about its accuracy or about what constitutes an effective method. We propose a new computational geometric approach to estimate distance along the skeleton. Our preliminary test results show that the method estimated the distance fairly well in eleven cases.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470215
Managing data provenance in genome project workflows
Renato de Paula, M. Holanda, M. E. Walter, Sérgio Lifschitz
In this article, we propose the application of the PROV-DM model to manage data provenance for workflows designed to support genome projects. This provenance model aims at storing details of each execution of the workflow, including raw and produced data, computational tools and their versions, parameters, and so on. In this way, biologists can review details of a particular workflow execution, compare information generated by different executions, and plan new ones more efficiently. In addition, we have created a provenance simulator to facilitate the inclusion of a provenance data model in genome projects. To validate our proposal, we discuss a case study of an RNA-Seq project that aims to identify, measure, and compare RNA expression levels across liver and kidney RNA samples produced by high-throughput automatic sequencers.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470307
Discovery at a distance: Farther journeys in predication space
T. Cohen, D. Widdows, R. Schvaneveldt, T. Rindflesch
In this paper we extend the Predication-based Semantic Indexing (PSI) approach to search efficiently across triple-predicate pathways in a database of predications extracted from the biomedical literature by the SemRep system. PSI circumvents the combinatorial explosion of possible pathways by converting the task of traversing individual predications into the task of measuring the similarity between composite concept vectors. Consequently, search time for single-, double-, or triple-predicate paths is identical once the relevant concept vectors have been constructed. This paper describes the application of PSI to infer double- and triple-predicate pathways connecting example pairs of therapeutically related drugs and diseases, and to use these inferred pathways to guide the search for treatments for other diseases. In an evaluation of the utility of vector-based dual- and triple-predicate path search in a simulated discovery experiment, these approaches are found to be complementary, with the best performance obtained through their application in combination.
Pub Date : 2012-10-04  DOI: 10.1109/BIBM.2012.6392654
Early classification of multivariate time series using a hybrid HMM/SVM model
Mohamed F. Ghalwash, Dusan Ramljak, Z. Obradovic
Early classification of time series has been receiving considerable attention of late, particularly in the context of gene expression. In the biomedical realm, early classification can be of tremendous help, whether by identifying the onset of a disease before it fully takes hold or by determining that a treatment has done its job and can be discontinued. In this paper we present a state-of-the-art model, which we call the Early Classification Model (ECM), that allows early, accurate, and patient-specific classification of multivariate time series. The model integrates the widely used HMM and SVM models, a combination that, while not new per se, has not previously been used for early classification of multivariate time series. It attained very promising results on the datasets we tested: in our experiments on a published dataset of responses to drug therapy in multiple sclerosis patients, ECM used only 40% of a time series on average and was able to outperform some of the baseline models, which needed the full time series for classification.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470299
Mining hub-based protein complexes in massive biological networks
Zhijie Lin, Yan Chen, Shiwei Wu, Yun Xiong, Yangyong Zhu, Guangyong Zheng
Advanced technologies are producing large-scale protein-protein interaction (PPI) data at an ever-increasing pace. Finding protein complexes in large PPI networks is a fundamental problem in bioinformatics. As core proteins that interact with many other proteins, hub proteins play a key role in protein complexes and in cellular activity. In this paper, we propose a novel topological model, the HP*-complex, which defines the hub proteins of a protein complex and extends to encompass their neighborhood, as the initial structure of protein complexes. An algorithm based on this new topological model, called HPCMiner, is developed for identifying protein complexes in large PPI networks. Experimental results on a real dataset show that the proposed algorithm detects many complexes of particular biological significance. Results from a study on synthetic datasets demonstrate that the HPCMiner algorithm scales well with dataset size.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470364
The experimental introduction of professor Fu's three-step therapy on gouty arthritis
Changeai Xie, Nipeng Lin, Jiaxin Zhou, Zhiqi Fan, W. Fu
This article introduces Professor Fu's clinical experience with three-step ladder therapy for gouty arthritis. Guided by overall syndrome differentiation and meridian differentiation, combined with the clinical features of the disease, Professor Fu treats gouty arthritis beginning at the onset of pain. The first step is the application of eye acupuncture and body acupuncture to rapidly relieve the patient's pain; the second step is the application of moxibustion, fire needling, and blood-letting puncture to enhance the efficacy; the third step is the embedding of intradermal needles to consolidate long-term efficacy. Professor Fu's three-step ladder therapy for gouty arthritis has achieved definite effects and provides significant guidance for clinical practice.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470200
Enriching miRNA binding site specificity with sequence profile based filtering of 3'-UTR region of mRNA
Jasjit K. Banwait, H. Ali, D. Bastola
MicroRNAs are small (approximately 22 nt) noncoding RNAs that regulate gene expression either by degrading messenger RNA (mRNA) that has already been transcribed or by repressing its translation. This mechanism of gene regulation, in which the miRNA binds to the 3'-UTR of target mRNAs, has been discovered relatively recently, and this sequence-specific post-transcriptional regulation affects a large set of genes involved in a number of biological pathways. Mapping the 7-nt miRNA seed sequence to the target gene has been the standard way of predicting miRNA targets. In this study, we have generated a profile-based filter to increase the specificity of predicted human miRNA-mRNA relationships, thereby enriching true-positive miRNA target sites in humans based on sequence information.
Pub Date : 2012-10-04  DOI: 10.1109/BIBMW.2012.6470223
An efficient overlap graph coarsening approach for modeling short reads
Julia D. Warnke-Sommer, H. Ali
Next-generation sequencing has quickly emerged as the most exciting yet challenging computational problem in bioinformatics. Current sequencing technologies are capable of producing several hundred thousand to several million short sequence reads in a single run. However, current methods for managing, storing, and processing the produced reads remain for the most part simple and lack the sophistication needed to model the reads efficiently and assemble them correctly. These reads are produced at high coverage of the original target sequence, so that many reads overlap. The overlap relationships are used to align and merge reads into contiguous sequences called contigs. In this paper, we present an overlap graph coarsening scheme for modeling reads and their overlap relationships. Our approach differs from previous read analysis and assembly methods, which use a single graph to model read overlap relationships. Instead, we use a series of graphs with different granularities of information to represent the complex read overlap relationships. We present a new graph coarsening algorithm for clustering a simulated metagenomics dataset at various levels of granularity. We also use the proposed graph coarsening scheme along with graph traversal algorithms to find a labeling of the overlap graph that allows for the efficient organization of nodes within the graph data structure.