
Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference): Latest Articles

Replication of SCN5A Associations with Electrocardiographic Traits in African Americans from Clinical and Epidemiologic Studies
Janina M Jeff, Kristin Brown-Gentry, Robert Goodloe, Marylyn D Ritchie, Joshua C Denny, Abel N Kho, Loren L Armstrong, Bob McClellan, Ping Mayo, Melissa Allen, Hailing Jin, Niloufar B Gillani, Nathalie Schnetz-Boutaud, Holli H Dilks, Melissa A Basford, Jennifer A Pacheco, Gail P Jarvik, Rex L Chisholm, Dan M Roden, M Geoffrey Hayes, Dana C Crawford

The Nav1.5 sodium channel α subunit, encoded by SCN5A, is the predominant α subunit expressed in the heart and is associated with cardiac arrhythmias. We tested five previously identified SCN5A variants (rs7374138, rs7637849, rs7637849, rs7629265, and rs11129796) for an association with PR interval and QRS duration in two unique study populations: the Third National Health and Nutrition Examination Survey (NHANES III, n = 552) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) and a combined dataset (n = 455) from two biobanks linked to electronic medical records from Vanderbilt University (BioVU) and Northwestern University (NUgene) as part of the electronic Medical Records & Genomics (eMERGE) network. A meta-analysis including all three study populations (n ≈ 4,000) suggests that eight SCN5A associations were significant for both QRS duration and PR interval (p < 5.0E-3), with little evidence for heterogeneity across the study populations. These results suggest that published SCN5A associations replicate across different study designs in a meta-analysis. They represent an important first step in the use of multiple study designs for genetic studies and in the identification and characterization of genetic variants associated with ECG traits in African-descent populations.
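The abstract does not say which meta-analysis model was used. As a rough illustration only, the sketch below assumes a fixed-effect, inverse-variance model with Cochran's Q as the heterogeneity statistic. The function name `fixed_effect_meta` and the per-study numbers are hypothetical and are not taken from the paper.

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Assumption: a fixed-effect model; the paper may have used another one.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return {"beta": pooled_beta, "se": pooled_se, "z": z, "p": p_value, "q": q}


if __name__ == "__main__":
    # Made-up effect sizes (ms per allele) and standard errors for three cohorts.
    result = fixed_effect_meta([2.1, 1.7, 2.4], [0.9, 1.1, 0.6])
    print(result)
```

Under this model, a Q value close to the number of studies minus one is what "little evidence for heterogeneity" would look like.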

DOI: 10.1007/978-3-662-45523-4_76. Published: 2014-01-01. Pages: 939-951. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4290789/pdf/nihms644245.pdf
Citations: 0
Signal detection in genome sequences using complexity based features
M. Kargar, Aijun An, N. Cercone, Kayvan Tirdad, Morteza Zihayat
In this work, we tackle the problem of evaluating complexity methods and measures for finding interesting signals in the whole genome of three prokaryotic organisms. In addition to previous complexity measures, new measures are introduced for representing Open Reading Frames (ORFs). We apply different classification algorithms to determine which complexity measure results in better predictive performance in discriminating genes from pseudo-genes in ORFs. Also, we investigate whether the positions and lengths of windows in ORFs have a significant impact on distinguishing between genes and pseudo-genes. Different classification algorithms are applied for classifying ORFs into genes and pseudo-genes.
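The abstract does not name the complexity measures it evaluates. As one possible illustration, the sketch below computes the Shannon entropy of k-mers over sliding windows of a sequence, which could serve as a per-window feature for a classifier. The choice of measure, the function names, and the toy sequence are all assumptions.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total == 0:
        return 0.0
    counts = Counter(kmers)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return abs(entropy)  # abs() only normalizes -0.0 to 0.0


def windowed_entropy(seq, window, step, k=1):
    """Entropy of each full-length window, moving `step` bases at a time."""
    return [
        shannon_entropy(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy_orf = "ATGAAAAAAAACGTACGTTAG"  # made-up sequence, not real data
    print(windowed_entropy(toy_orf, window=6, step=3))
```

Varying `window` and `step` is one way to probe whether window position and length matter, which is the question the abstract raises.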
DOI: 10.1145/2500863.2500867. Published: 2013-08-11. Pages: 25-33.
Citations: 0
Drug-target interaction prediction for drug repurposing with probabilistic similarity logic
Shobeir Fakhraei, L. Raschid, L. Getoor
The high development cost and low success rate of drug discovery from new compounds highlight the need for methods to discover alternate therapeutic effects for currently approved drugs. Computational methods can be effective in focusing efforts for such drug repurposing. In this paper, we propose a novel drug-target interaction prediction framework based on probabilistic similarity logic (PSL) [5]. Interaction prediction corresponds to link prediction in a bipartite network of drug-target interactions extended with a set of similarities between drugs and between targets. Using probabilistic first-order logic rules in PSL, we show how rules describing link predictions based on triads and tetrads can effectively make use of a variety of similarity measures. We learn weights for the rules based on training data, and report the relative importance of each similarity for interaction prediction. We show that the learned rule weights significantly improve prediction precision. We evaluate our results on a dataset of drug-target interactions obtained from DrugBank [27] augmented with five drug-based and three target-based similarities. We integrate domain knowledge in drug-target interaction prediction and match the performance of the state-of-the-art drug-target interaction prediction systems [22] with our model using simple triad-based rules. Furthermore, we apply techniques that make link prediction in PSL more efficient for drug-target interaction prediction.
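To make the triad-based rules more concrete, here is a minimal sketch of how a single weighted rule of the form SimilarDrug(D1, D2) AND Interacts(D1, T) -> Interacts(D2, T) could be scored. It assumes the Lukasiewicz relaxation and the "distance to satisfaction" penalty commonly associated with PSL. The predicate names, weights, and truth values are hypothetical, and this is not the authors' implementation or the PSL library API.

```python
def lukasiewicz_and(*values):
    """Lukasiewicz t-norm for soft truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, body - head)


def triad_rule_penalty(weight, sim_d1_d2, interacts_d1_t, interacts_d2_t):
    """Weighted penalty of one grounding of the drug-similarity triad rule."""
    body = lukasiewicz_and(sim_d1_d2, interacts_d1_t)
    return weight * distance_to_satisfaction(body, interacts_d2_t)


if __name__ == "__main__":
    # Made-up values: drugs are 0.9 similar, D1 is known to hit T,
    # and the current belief that D2 hits T is 0.5.
    print(triad_rule_penalty(1.0, 0.9, 1.0, 0.5))
```

In this reading, inference would pick truth values for the unknown interactions that minimize the summed penalties over all rule groundings, and a larger learned weight makes violating that rule more costly.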
DOI: 10.1145/2500863.2500870. Published: 2013-08-11. Pages: 10-17.
Citations: 25
MFMS: maximal frequent module set mining from multiple human gene expression data sets
Saeed Salem, C. Ozcaglar
Advances in genomic technologies have allowed vast amounts of gene expression data to be collected. Protein functional annotation and biological module discovery that are based on a single gene expression data set suffer from spurious coexpression. Recent work has focused on integrating multiple independent gene expression data sets. In this paper, we propose a two-step approach for mining maximal frequent collections of highly connected modules from coexpression graphs. We first mine maximal frequent edge-sets and then extract highly connected subgraphs from the edge-induced subgraphs. Experimental results on the collection of modules mined from 52 human gene expression data sets show that coexpression links that occur together in a significant number of experiments have a modular topological structure. Moreover, GO enrichment analysis shows that the proposed approach discovers biologically significant frequent collections of modules.
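The two-step idea can be sketched on a toy scale as follows. This is a brute-force illustration and not the MFMS algorithm itself: step 1 intersects the edge sets of every group of `min_support` graphs and keeps the maximal results, and step 2 replaces "highly connected subgraph" with a simpler stand-in, namely connected components that pass a density threshold. All names, thresholds, and graphs are hypothetical.

```python
from itertools import combinations


def maximal_frequent_edge_sets(graphs, min_support):
    """Edge sets shared by at least min_support graphs, keeping only maximal ones."""
    candidates = set()
    for combo in combinations(graphs, min_support):
        common = frozenset(combo[0]).intersection(*combo[1:])
        if common:
            candidates.add(common)
    # Drop any candidate that is a proper subset of another candidate.
    return [c for c in candidates if not any(c < other for other in candidates)]


def connected_components(edges):
    """Node sets of the connected components of the edge-induced subgraph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def dense_modules(edge_set, min_size=3, min_density=0.8):
    """Components with enough nodes and a high enough edge density."""
    modules = []
    for nodes in connected_components(edge_set):
        n = len(nodes)
        if n < min_size:
            continue
        m = sum(1 for u, v in edge_set if u in nodes and v in nodes)
        if 2.0 * m / (n * (n - 1)) >= min_density:
            modules.append(nodes)
    return modules


if __name__ == "__main__":
    # Three made-up coexpression graphs; edges are (gene, gene) tuples.
    triangle = {("g1", "g2"), ("g1", "g3"), ("g2", "g3")}
    graphs = [triangle | {("g4", "g5")}, triangle | {("g4", "g5"), ("g6", "g7")},
              triangle | {("g6", "g7")}]
    for edge_set in maximal_frequent_edge_sets(graphs, min_support=2):
        print(sorted(edge_set), dense_modules(edge_set))
```

The enumeration in step 1 grows combinatorially with the number of graphs, so it is only meant to show what is being computed, not how to compute it at the scale of 52 data sets.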
DOI: 10.1145/2500863.2500869. Published: 2013-08-11. Pages: 51-57.
Citations: 11
Computational phenotype prediction of ionizing-radiation-resistant bacteria with a multiple-instance learning model
Sabeur Aridhi, Mondher Maddouri, H. Sghaier, E. Nguifo
Ionizing-radiation-resistant bacteria (IRRB) are important in biotechnology. The use of these bacteria for the treatment of radioactive wastes is determined by their surprising capacity to adapt to radionuclides and a variety of toxic molecules. In silico methods are unavailable for the purpose of phenotypic prediction and genotype-phenotype relationship discovery. We analyze basal DNA repair proteins of most known proteome sequences of IRRB and ionizing-radiation-sensitive bacteria (IRSB) in order to learn a classifier that correctly predicts unseen bacteria. In this work, we formulate the problem of predicting IRRB as a multiple-instance learning (MIL) problem and we propose a novel approach for predicting IRRB. We use a local alignment technique to measure the similarity between protein sequences to predict ionizing-radiation-resistant bacteria. The first results are satisfactory and provide a MIL-based prediction system that predicts whether a bacterium belongs to IRRB or to IRSB. The proposed system is available online.
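The abstract mentions a local alignment technique without naming it. A standard choice is the Smith-Waterman algorithm, so the sketch below computes a Smith-Waterman score with a simple match/mismatch scheme and a linear gap penalty, then lifts it to the bag level by taking the best score between any two proteins of two bacteria. The scoring parameters, the max-based bag similarity, and the sequences are assumptions; real protein comparisons would normally use a substitution matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def bag_similarity(bag_a, bag_b):
    """Similarity of two bags (bacteria) as the best score over all protein pairs."""
    return max(smith_waterman_score(p, q) for p in bag_a for q in bag_b)


if __name__ == "__main__":
    # Made-up protein fragments standing in for DNA repair proteins.
    bacterium_1 = ["MKVLAAGIV", "MSTNPKPQR"]
    bacterium_2 = ["TTVLAAGQQ", "MADEEKLPP"]
    print(bag_similarity(bacterium_1, bacterium_2))
```

In the MIL framing, each bacterium is a bag of protein sequences with one label per bag, and a bag-level similarity of this kind could feed a nearest-neighbor style classifier.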
DOI: 10.1145/2500863.2500866. Published: 2013-08-11. Pages: 18-24.
Citations: 2
Mining spatially cohesive itemsets in protein molecular structures
Cheng Zhou, P. Meysman, B. Cule, K. Laukens, Bart Goethals
In this paper we present a cohesive structural itemset miner that aims to discover interesting patterns in a set of data objects within a multidimensional spatial structure by combining the cohesion and the support of the pattern. The usefulness of this algorithm is demonstrated by applying it to find interesting patterns of amino acids in spatial proximity within a set of proteins based on their atomic coordinates in the protein molecular structure. The experiments show that several patterns found by the cohesive structural itemset miner contain amino acids that frequently co-occur in the spatial structure, even if they are distant in the primary protein sequence and only brought together by protein folding. Furthermore, various indications were found that some of the discovered patterns represent common underlying support structures within the proteins.
DOI: 10.1145/2500863.2500871. Published: 2013-08-11. Pages: 42-50.
Citations: 5
A fast and scalable clustering-based approach for constructing reliable radiation hybrid maps
Raed I. Seetan, Ajay Kumar, A. Denton, M. Iqbal, O. Azzam, S. Kianian
The process of mapping markers from radiation hybrid mapping (RHM) experiments is equivalent to the traveling salesman problem and, thereby, has combinatorial complexity. As an additional problem, experiments typically result in some unreliable markers that reduce the overall quality of the map. We propose a clustering approach for addressing both problems efficiently by eliminating unreliable markers without the need for mapping the complete set of markers. Traditional approaches for eliminating markers use resampling of the full data set, which has an even higher computational complexity than the original mapping problem. In contrast, the proposed approach uses a divide and conquer strategy to construct framework maps based on clusters that exclude unreliable markers. Clusters are ordered using parallel processing and are then combined to form the complete map. Using an RHM data set of the human genome, we compare the framework maps from our proposed approaches with published physical maps and with the Carthagene tool. Overall, our approach has a very low computational complexity and produces solid framework maps with good chromosome coverage and high agreement with the physical map marker order.
DOI: 10.1145/2500863.2500868. Published: 2013-08-11. Pages: 34-41.
Citations: 7
Heuristic approaches for time-lagged biclustering
Joana P. Gonçalves, S. Madeira
Identifying patterns in temporal data supports complex analyses in several domains, including stock markets (finance) and social interactions (social science). Clinical and biological applications, such as monitoring patient response to treatment or characterizing activity at the molecular level, are also of interest. In particular, researchers seek to gain insight into the dynamics of biological processes, and potential perturbations of these leading to disease, through the discovery of patterns in time series gene expression data. For many years, clustering has remained the standard technique to group genes exhibiting similar response profiles. However, clustering defines similarity across all time points, focusing on global patterns which tend to characterize rather broad and unspecific responses. It is widely believed that local patterns offer additional insight into the underlying intricate events leading to the overall observed behavior. Efficient biclustering algorithms have been devised for the discovery of temporally aligned local patterns in gene expression time series, but the extraction of time-lagged patterns remains a challenge due to the combinatorial explosion of pattern occurrence combinations when delays are considered. We present heuristic approaches enabling polynomial rather than exponential time solutions for the problem.
DOI: 10.1145/2500863.2500865. Published: 2013-08-11. Pages: 1-9.
Citations: 1
Mining for Variability in the Coagulation Pathway: A Systems Biology Approach
D. Castaldi, D. Maccagnola, D. Mari, F. Archetti
DOI: 10.1007/978-3-642-37189-9_14. Published: 2013-04-03. Pages: 153-164.
Citations: 0
Structured Populations and the Maintenance of Sex
P. Whigham, Grant Dick, A. Wright, H. Spencer
DOI: 10.1007/978-3-642-37189-9_6. Published: 2013-04-03. Pages: 56-67.
Citations: 1