2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops: Latest Articles

A novel quasi-alignment-based method for discovering conserved regions in genetic sequences
Pub Date : 2012-10-04 DOI: 10.1109/BIBMW.2012.6470216
Anurag Nagar, Michael Hahsler
This paper presents an alignment-free technique to efficiently discover similar regions in large sets of biological sequences using position-sensitive p-mer frequency clustering. A set of sequences is broken down into segments, and a frequency distribution over all oligomers of size p (referred to as p-mers) is then obtained to summarize each segment. These summaries are clustered while the order of segments in the set of sequences is preserved in a Markov-type model. Sequence segments within each cluster have very similar DNA/RNA patterns and form a so-called quasi-alignment. This fact can be used for a variety of tasks such as species characterization and identification, phylogenetic analysis, functional analysis of sequences and, as in this paper, discovering conserved regions. Our method is computationally more efficient than multiple sequence alignment since it can apply modern data stream clustering algorithms, which run in time linear in the number of segments, and thus can help discover highly similar regions across a large number of sequences efficiently. In this paper, we apply the approach to efficiently discover and visualize conserved regions in 16S rRNA.
Pages: 662-669
Citations: 2
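The segment summarization step described in the abstract above can be illustrated with a minimal Python sketch. It assumes non-overlapping fixed-length segments, the DNA alphabet ACGT, and illustrative defaults (`segment_len=100`, `p=3`); the function names are hypothetical, and the clustering step and Markov-type model from the paper are not reproduced here.

```python
from collections import Counter
from itertools import product


def pmer_profile(segment, p=3, alphabet="ACGT"):
    """Relative frequency of each p-mer over `alphabet` in one segment."""
    # Enumerate all |alphabet|^p possible p-mers in a fixed order.
    pmers = ["".join(t) for t in product(alphabet, repeat=p)]
    # Count every overlapping window of length p in the segment.
    counts = Counter(segment[i:i + p] for i in range(len(segment) - p + 1))
    # Windows containing symbols outside the alphabet are ignored.
    total = sum(counts[m] for m in pmers)
    return [counts[m] / total if total else 0.0 for m in pmers]


def segment_profiles(sequence, segment_len=100, p=3):
    """Split a sequence into consecutive segments and summarize each one."""
    # Assumption: segments do not overlap; the last one may be shorter.
    segments = [sequence[i:i + segment_len]
                for i in range(0, len(sequence), segment_len)]
    return [pmer_profile(s, p) for s in segments]
```

Each profile is a vector of 4^p relative frequencies in segment order, so the list returned by `segment_profiles` could be passed, one vector at a time, to a clustering routine of the reader's choice.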
A comparison study on protein-protein interaction network models
Pub Date : 2012-10-04 DOI: 10.1109/BIBM.2012.6392732
Mingyu Shao, Yi Yang, J. Guan, Shuigeng Zhou
This paper presents a comprehensive comparison of the performance of major existing models on two PPI datasets, comparing the global and local statistical properties of the original PPI networks with those of the model-reproduced ones. Our experimental results show that the DD model has the best fitting ability, while the iSite and STICKY models also fit the PPI datasets well on most statistical properties.
Pages: 1-4
Citations: 1
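As a rough illustration of what comparing statistical properties of an observed network and a model-generated one can involve, the sketch below computes two commonly used graph statistics in Python. The abstract above does not name the properties the authors used, so the choice of degree distribution (a global property) and average clustering coefficient (a local property) is an assumption, as are the function names and the adjacency-dictionary representation.

```python
def degree_distribution(adj):
    """Fraction of nodes having each degree; adj maps node -> set of neighbors."""
    n = len(adj)
    dist = {}
    for neighbors in adj.values():
        k = len(neighbors)
        dist[k] = dist.get(k, 0.0) + 1.0 / n
    return dist


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        nb = list(neighbors)
        # Count edges that exist among the neighbors of this node.
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```

Running both functions on the original network and on a network generated by a candidate model gives numbers that can be compared side by side; how the differences are scored is left open here.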
Gifts from Chinese Medicine for diabetic nephropathy: Ancient formulas in modern times
Pub Date : 2012-10-04 DOI: 10.1109/BIBMW.2012.6470374
Lei Zhang, W. Mao, Yin Li, G. Su, Xusheng Liu
To provide inspiration for developing new therapies for diabetic nephropathy, our study combed through ancient formulas for diabetic nephropathy recorded in China from the Tang dynasty to the Qing dynasty and discussed their application in modern times. A total of 87 ancient formulas for diabetic nephropathy were collected. Six-Ingredient Rehmannia Pill and Supplemented Kidney Qi Pill were recorded at higher frequency than the others. Radix Astragali Liquid, Xuan Bu Pill, and Ass Hide Glue Decoction were the earliest formulas for diabetic nephropathy, recorded in the Tang dynasty. Ginseng Powder was recorded by doctors in 4 dynasties, the highest recurrence across dynasties. Only formulas with a high recorded frequency, such as Six-Ingredient Rehmannia Pill, Supplemented Kidney Qi Pill, Poria Pill, Four Ingredients Decoction, and Pilose Antler Pill, have been applied or studied in modern times with their original medicinal combination preserved. Most ancient formulas have not been put to good use, and only their sovereign medicinals have been applied. To benefit more from the valuable experience of ancient China, more attention should be paid to applying and studying ancient formulas in their original combinations, not only single Chinese medicinals.
Pages: 507-510
Citations: 0
Clinical Case: Enhancing medical monitoring with visualization and analytics
Pub Date : 2012-10-04 DOI: 10.1109/BIBM.2012.6392639
Michael Farnum, V. Lobanov, Michael Brennan, D. Agrafiotis, J. Kolpak, J. Ciervo, L. Alquier
Monitoring ongoing clinical trials is a crucial, FDA-mandated activity for sponsors. Some of the important goals are the early detection of safety issues, assurance of proper trial conduct at remote sites, and tracking of efficiency and cost. Generating a comprehensive picture of the current state of a trial involves integrating a variety of data sources, including case report forms, laboratory results from contract labs, and safety reports. Since the volume of data is large and is updated repeatedly during the trial, tools to assist in quickly and thoroughly interrogating the data are greatly needed. Of particular importance is the ability to see aggregate views containing multiple patients as well as to drill down to individual patients' data points. Here, we present Clinical Case, a tool that has been developed within the Janssen organization and is used for this purpose. Clinical Case provides the ability to quickly integrate SDTM datasets, define both standard and ad hoc data views, and provide regular data updates. Both standard and user-configured views can be persisted; they can be composed of a variety of interactive graphics, including typical visualizations such as box plots, scatter plots, line charts, tree maps, and heat maps, as well as visualizations specifically designed for clinical data, such as the Hy's Law plot, patient timelines plot, and integrated subject listings. The user can define subsets of patients for filtering and highlighting data and use these to compare data across multiple domains. Integrated into the system is the ability to manually annotate patients and data, as well as to communicate with trial administrators.
Pages: 1-1
Citations: 6
Rough sets and support vector machine for selecting differentially expressed miRNAs
Pub Date : 2012-10-04 DOI: 10.1109/BIBMW.2012.6470255
Sushmita Paul, P. Maji
MicroRNAs, also known as miRNAs, are a class of small non-coding RNAs that repress the expression of a gene post-transcriptionally. In effect, they regulate the expression of a gene or protein. It has been observed that they play an important role in various cellular processes and thus help in carrying out the normal functioning of a cell. However, dysregulation of miRNAs is found to be a major cause of disease. Various studies have also shown the role of miRNAs in cancer and the utility of miRNAs for the diagnosis of cancer and other diseases. A large number of studies have been conducted to identify differentially expressed miRNAs because, unlike with mRNA expression, a modest number of miRNAs might be sufficient to classify human cancers. In this regard, this paper presents a rough set based feature selection algorithm to select miRNAs from expression data that can classify tissue samples into their respective categories with minimal error rate. It selects a set of miRNAs by maximizing both the relevance and significance of miRNAs. The effectiveness of the rough set based algorithm, along with a comparison with other related algorithms, is demonstrated on three miRNA microarray expression data sets using the B.632+ bootstrap error rate of the support vector machine.
Pages: 864-871
Citations: 4
AutoDock-based incremental docking protocol to improve docking of large ligands
Pub Date : 2012-10-04 DOI: 10.1109/BIBMW.2012.6470370
A. Dhanik, J. McMurray, L. Kavraki
It is well known that computer-aided docking of large ligands with many rotatable bonds is extremely difficult. AutoDock is a widely used docking program that can dock small ligands, with up to 5 or 6 rotatable bonds, accurately and quickly. Docking of larger ligands, however, is not very accurate and is computationally expensive. In this paper we present an AutoDock-based incremental docking protocol which docks a large ligand to its target protein in increments. A fragment of the large ligand is first chosen and then docked. The best docked conformations are incrementally grown and docked again, and this process is repeated until all the atoms of the ligand are docked. Each docking operation is performed using AutoDock. However, in each docking operation only a small number of rotatable bonds are allowed to rotate. We performed a systematic docking study on a dataset of 73 protein-ligand complexes derived from the core set of the PDBbind database. The number of rotatable bonds in the ligands varies from 7 to 30. Docking experiments were done to evaluate the docking performance of the incremental protocol in comparison to AutoDock's standard protocol. Results from the study show that, on average over the dataset, docking of large ligands using our incremental protocol is 23-fold faster computationally than docking using AutoDock's standard protocol and also has comparable or better accuracy. We propose that, for docking large ligands, our incremental protocol can be used as an alternative to AutoDock's standard protocol.
Pages: 48-55
Citations: 5
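The grow-and-redock loop described in the abstract above can be sketched as plain control flow in Python. This is only an outline under stated assumptions: `dock` is a hypothetical user-supplied callable standing in for a docking run with few rotatable bonds free, lower scores are assumed to be better, and the fragment order, the `keep` parameter, and `toy_dock` are all illustrative. No real AutoDock interface is used or implied.

```python
def incremental_dock(fragments, dock, keep=2):
    """Grow a ligand fragment by fragment, re-docking after each growth step.

    fragments: ordered list of fragment identifiers; the first is the seed.
    dock: hypothetical callable (partial_ligand, start_pose) -> list of
          (score, pose) pairs, with lower scores assumed to be better.
    keep: number of best poses carried into the next round.
    """
    partial = [fragments[0]]
    # Dock the seed fragment with no starting pose.
    best = sorted(dock(partial, None), key=lambda sp: sp[0])[:keep]
    for frag in fragments[1:]:
        partial = partial + [frag]  # grow the ligand by one fragment
        candidates = []
        for _, pose in best:
            # Re-dock the grown ligand starting from each retained pose.
            candidates.extend(dock(partial, pose))
        best = sorted(candidates, key=lambda sp: sp[0])[:keep]
    return best


def toy_dock(partial, start_pose):
    """Stand-in scorer for illustration only; not a docking program."""
    base = 0 if start_pose is None else start_pose
    return [(-float(len(partial)) + d, base + d) for d in range(3)]


result = incremental_dock(["core", "arm1", "arm2"], toy_dock, keep=2)
```

Keeping several poses per round rather than only the single best one is a design choice made here so that an early poor placement does not fix the rest of the search; the abstract itself only says that the best docked conformations are grown.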
A random walk based approach for improving protein-protein interaction network and protein complex prediction
Pub Date : 2012-10-04 DOI: 10.1109/BIBM.2012.6392693
Chengwei Lei, Jianhua Ruan
Recent advances in high-throughput technology have dramatically increased the quantity of available protein-protein interaction (PPI) data and stimulated the development of many methods for predicting protein complexes, which are important in understanding the functional organization of protein-protein interaction networks in different biological processes. However, automated protein complex prediction from PPI data alone is significantly hindered by the high level of noise, sparseness, and highly skewed degree distribution of PPI networks. Here we present a novel network topology-based algorithm to remove spurious interactions and recover missing ones by computational predictions, and to increase the accuracy of protein complex prediction by reducing the impact of hub nodes. The key idea of our algorithm is that two proteins sharing some high-order topological similarities, which are measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex. Applying our algorithm to a yeast protein-protein interaction network, we found that the interactions in the reconstructed PPI network have more significant biological relevance than those in the original network, as assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species, and known protein complexes. Comparison with several existing methods shows that the network reconstructed by our method has the highest quality. Finally, using two independent graph clustering algorithms, we found that the reconstructed network resulted in significantly improved prediction accuracy of protein complexes.
Pages: 1-6
Citations: 4
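To give a feel for how a random walk can measure topological similarity between two proteins, the following Python sketch uses a generic random walk with restart and compares the resulting visiting-probability vectors by cosine similarity. The abstract above does not specify the authors' procedure, so this is a swapped-in textbook technique rather than their method; the function names, the `restart` value, and the fixed iteration count are assumptions.

```python
import math


def random_walk_with_restart(adj, start, restart=0.5, iters=100):
    """Approximate visiting probabilities of a walker that restarts at `start`.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    prob = {n: 1.0 if n == start else 0.0 for n in adj}
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the start node.
        nxt = {n: (restart if n == start else 0.0) for n in adj}
        for n, neighbors in adj.items():
            if not neighbors:
                # Isolated node: send its remaining mass back to the start.
                nxt[start] += (1.0 - restart) * prob[n]
                continue
            share = (1.0 - restart) * prob[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        prob = nxt
    return prob


def profile_similarity(adj, a, b, restart=0.5):
    """Cosine similarity between the random-walk profiles of nodes a and b."""
    pa = random_walk_with_restart(adj, a, restart)
    pb = random_walk_with_restart(adj, b, restart)
    dot = sum(pa[n] * pb[n] for n in adj)
    norm_a = math.sqrt(sum(v * v for v in pa.values()))
    norm_b = math.sqrt(sum(v * v for v in pb.values()))
    return dot / (norm_a * norm_b)
```

Pairs of nodes with high similarity but no edge would be candidates for missing interactions, and existing edges with low similarity would be candidates for removal; the thresholds for either decision are not given in the abstract and are left open here.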
Modeling semantic influence for biomedical research topics using MeSH hierarchy
Pub Date : 2012-10-04 DOI: 10.1109/BIBM.2012.6392645
Dan He
In this work, we model how biomedical topics influence one another, given that they are organized in a topic hierarchy, MeSH, in which the edges capture a parent-child/subsumption relationship among topics. This information enables studying the influence of topics from a semantic perspective, which might be very important in analyzing topic evolution and is missing from the current literature. We first define a burst-based action for topics, which models upward momentum in popularity (or "elevated occurrences" of the topics), and use it to define two types of influence: accumulation influence and propagation influence. We then propose a model of influence between topics, and develop an efficient algorithm (TIPS) to identify influential topics. Experiments show that our model is successful at identifying influential topics and that the algorithm is very efficient.
Pages: 1-6
Citations: 2
Application of data mining to Zheng studies of Chinese medicine based on CER
Pub Date : 2012-10-04 DOI: 10.1109/BIBMW.2012.6470360
Yefeng Cai, Yue Zhang, Zhao-hui Liang
Comparative effectiveness research (CER) is a new clinical study model characterized by a strategic framework consisting of four categories and three themes. The core strategy of CER is to conduct observational longitudinal research supported by electronic registries and large databases based on real-world practice. Since CER studies do not use a classic randomized controlled trial (RCT) design, the well-developed data analytic methods for RCTs are challenged. Data groups that are not acquired at the same time point, or that differ significantly at baseline, cannot be compared by classic differential statistical methods, or the outcome will lack robust statistical support. In this paper, we describe the characteristics of the Zheng studies of Chinese medicine. Then some data analytic methods based on machine learning are introduced as potential solutions for data processing in CER research of Chinese medicine. Finally, a new strategic framework is introduced to establish the CER methodology for Chinese medicine.
Citations: 0
Identifying context-specific transcription factor targets from prior knowledge and gene expression data
Pub Date : 2012-10-04 DOI: 10.1109/BIBM.2012.6392656
E. Fertig, Alexander V. Favorov, M. Ochs
Numerous methodologies, assays, and databases presently provide candidate targets of transcription factors (TFs). However, TFs rarely regulate their targets universally. The context in which a TF is activated can change the transcriptional response of its targets. Direct multiple regulation, typical of mammalian genes, complicates direct inference of TF targets from gene expression data. We present a novel statistic that infers context-specific TF regulation based upon the CoGAPS algorithm, which infers overlapping gene expression patterns resulting from coregulation. Numerical experiments with simulated data showed that this statistic correctly inferred targets that are common to multiple TFs, except in cases where the signal from a TF is negligible relative to the noise level and the signal from other TFs. The statistic is robust to moderate levels of error in the simulated gene sets, identifying fewer false positives than false negatives. Significantly, the regulatory statistic refines the number of transcription factor targets relevant to cell signaling in gastrointestinal stromal tumors (GIST) to genes consistent with the phosphorylation patterns of TFs identified in previous studies. As formulated, the proposed regulatory statistic has wide applicability to inferring set membership in integrated datasets. This statistic could be naturally extended to account for prior probabilities of set membership or to add candidate gene targets.
Citations: 8