Data and Text Mining in Bioinformatics: Latest Articles

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model

Data and Text Mining in Bioinformatics

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665984

G. Heo, Keeheon Lee, Min Song

Due to recent developments in information technology, the number of publications is increasing exponentially. In response, demand has surged for replacing manual text processing with automatic text processing. About three decades ago, Swanson proposed the ABC model [1], built on text mining, as a form of literature-based knowledge discovery for finding new candidate biomedical hypotheses. Clinical researchers subsequently confirmed the effectiveness of hypotheses found by the ABC model [2]. This success led scholars to try various literature-based knowledge discovery approaches [3, 4, 5]. However, these approaches are not fully automated; they are hybrids of automatic and manual processes, and the manual part requires expert intervention. In addition, they consider only a single perspective. Even approaches that draw on network theory have difficulty capturing the entire network structure of the relationships among concepts and interpreting that structure systematically [6, 7]. This study therefore proposes a novel approach that discovers various relationships by extending the intermediate concept B to a multi-level concept. By applying a graph-based path-finding method based on co-occurrence and the relational entities among concepts, we attempt to systematically analyze the relationships between two concepts, a source node and a target node, across all paths connecting them. As a baseline, we use the results of Swanson's work [8], which suggested blood viscosity, platelet aggregability, and vasoconstriction as the intermediate concepts between Raynaud's disease and fish oils. We compared our intermediate concepts with Swanson's. This study provides distinct perspectives for literature-based discovery: it not only discovers meaningful relationships among concepts in the biomedical literature through graph-based path inference but also generates feasible new hypotheses.
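The multi-level extension of the ABC model described in this abstract can be illustrated with a small path-finding sketch. This is not the authors' implementation: the function names, the document-level co-occurrence rule, and the `max_intermediates` parameter are assumptions made for illustration, and the input is toy data rather than anything extracted from the literature.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Link two concepts whenever they appear in the same document (assumed rule)."""
    graph = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_paths(graph, source, target, max_intermediates=2):
    """Enumerate simple paths source -> B1 -> ... -> Bk -> target, 1 <= k <= max_intermediates.

    A direct source-target edge is skipped because it is already known knowledge.
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in path:
                continue  # keep paths simple (no repeated concept)
            if neighbor == target:
                if len(path) >= 2:  # require at least one intermediate concept
                    paths.append(path + [target])
            elif len(path) - 1 < max_intermediates:
                stack.append((neighbor, path + [neighbor]))
    return sorted(paths, key=lambda p: (len(p), p))


if __name__ == "__main__":
    # Toy documents, each reduced to the set of concepts it mentions.
    toy_documents = [
        ["fish oil", "blood viscosity"],
        ["blood viscosity", "Raynaud's disease"],
        ["fish oil", "platelet aggregability"],
        ["platelet aggregability", "vasoconstriction"],
        ["vasoconstriction", "Raynaud's disease"],
    ]
    graph = build_cooccurrence_graph(toy_documents)
    for path in find_paths(graph, "fish oil", "Raynaud's disease"):
        print(" -> ".join(path))
```

With `max_intermediates=1` the search reduces to the classic single-B ABC pattern; larger values allow chains of intermediate concepts, which is the sense of "multi-level" assumed here.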

Citations: 5

Detecting Phosphorylation Determined Active Protein Interaction Network during Cancer Development by Robust Network Component Analysis

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665991

T. Zeng, Ziming Wang, Luonan Chen

Motivation: In recent disease studies, many key pathogenic genes/proteins have been found to show no significant differential expression, so they tend to be disregarded in conventional differential expression analysis or network analysis. Meanwhile, activity estimated computationally (dry experiment), rather than expression measured in the wet lab (wet experiment), has been proposed as an effective estimate of the actual regulatory power of such important biomolecules, e.g. transcription factors. However, it is still unknown which hidden factor (e.g. phosphorylation) determines this kind of virtual regulatory power, i.e. activity, and how [1]. For cancer development studies in particular, it is urgent to reconstruct the active protein interaction network and detect the underlying phosphorylation pattern in a dynamic manner [2-7]. Methods: Based on the c-Myc mouse model of liver cancer, we first collected protein expression and protein phosphorylation data at several developmental time points. We then constructed a rough background protein interaction network using conditional mutual information. Next, we improved the robustness of conventional network component analysis and used the resulting approach, RNCA (Robust Network Component Analysis), to reconstruct the time-dependent protein interaction networks and simultaneously estimate the activity of target proteins at different times. Finally, considering the different experimental quality of the protein expression and phosphorylation data, we used canonical correlation analysis to detect the maximal correlation between the expression and phosphorylation of a group of proteins (e.g. a protein network module), which could reveal the active protein sub-network and phosphorylation as its determining factor. Results: In the preliminary study, we evaluated the robustness of RNCA by comparing it with other conventional methods. On real biological data, we found the rewired protein interaction network during cancer development, its corresponding active proteins, and protein phosphorylation as their driver. This work can be further used in the early diagnosis of diseases through edge biomarkers [1-2], network biomarkers [3-4], and dynamical network biomarkers [5-7].

Citations: 0

An Exploration of the Collaborative Networks for Clinical and Academic Domains in AIDS Research: A Spatial Scientometric Approach

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665982

Y. Jeong, Dahee Lee, Min Song

This study investigates worldwide collaborative networks from a geographical perspective, based on clinical tests (CT) and academic research (AR) on acquired immunodeficiency syndrome (AIDS). By applying text mining techniques to AIDS-related documents, we extract spatial information and discover co-location pairs for each type of research at two levels: national and city. Co-location networks for CT and AR are analyzed using network features, visualization, and nodes with high betweenness centrality. The results reveal that the CT network is denser, with about twice as many nodes as the AR network. At the national level, the AR network is concentrated on the United States, while the CT network is spread more widely across the world. At the city level, collaboration is more active among closely located cities in the AR network than in the CT network (see Figure 1). The core collaboration centers of the AR network are mainly in the United States and Europe, whereas those of the CT network also include Asian and African cities. Overall, our study highlights the differences between the collaborative networks for CT and AR, which contributes to understanding research trends, including productivity analysis of collaborative work in relation to region.
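The co-location pairs mentioned in this abstract can be sketched as a simple counting step. The sketch below assumes each document has already been reduced to the list of places (countries or cities) found in it; the function names and the use of degree as a network feature are illustrative assumptions, and betweenness centrality, which the study also reports, is not computed here.

```python
from collections import Counter
from itertools import combinations


def colocation_pairs(records):
    """Count how often two places appear together in the same document."""
    counts = Counter()
    for places in records:
        # Deduplicate and sort so each unordered pair has a single key.
        for pair in combinations(sorted(set(places)), 2):
            counts[pair] += 1
    return counts


def degree(counts):
    """Number of distinct co-located partners per place (a basic network feature)."""
    neighbors = {}
    for a, b in counts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {place: len(others) for place, others in neighbors.items()}
```

Running the same two functions once on country names and once on city names would give the two levels of analysis the abstract describes.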

Citations: 0

Visualization of Zoomable Network for Multi-Compounds and Multi-Targets Analysis

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665988

Jaesub Park, Jaeho Kim, Junseok Park, Sunghwa Bae, Hyungseok Kim, Doheon Lee

The recent explosive growth of biological data makes it possible to simulate metabolism at whole-body scale, which creates a need for bioinformatics tools to visualize and analyze it. For such tools, zooming is a key method for visualizing a large and complex network in a single view [1]. Unfortunately, most existing tools depend too heavily on an elaborately constructed hierarchy to provide a zoom function. We therefore developed a zoomable visualization system for large biomolecular networks that requires no preformed hierarchy data.

Citations: 0

Inference of Disease E3s from Integrated Functional Relation Network

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665979

Bumki Min, G. Yi

Recently, the potential of E3 ligases as therapeutic targets has been increasing. A systematic method for deriving disease-related E3s could contribute significantly to meeting this demand. Several disease gene prediction methods have been introduced, but it is hard to obtain E3 ligase-specific information from them. We have developed a unique approach that prioritizes the disease relations of E3s by integrating E3-substrate relations and their neighboring network with known disease genes. We demonstrate the potential of our method by showing that it outperforms previous methods in predicting known disease relations of E3s. We discovered 101 E3s and their functional network, which has 1,285 relations with diseases. Our method will provide promising new opportunities in drug target discovery as well as in the study of disease mechanisms.

Citations: 1

TILD: A Strategy to Identify Cancer-related Genes Using Title Information in Literature Data

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665992

Jeongwoo Kim, H. Kim, Yunku Yeu, Mincheol Shin, Sanghyun Park

Since the genome projects of the 1990s, gene-related research has advanced. These studies revealed that genes can cause disease and that the relations between genes and diseases are important. For this reason, we propose a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from an online database and extracted genes using text mining. Next, we classified the extracted genes into two kinds using title information: genes that appear in the title are classified as hub genes, whereas genes that appear in the body are classified as sub genes connected to the hub genes. We repeated this process for each paper to construct cancer-specific local gene networks. In the last step, we constructed a global cancer-specific gene network by integrating all the local gene networks and calculated a score for each gene based on an analysis of the global network. We assumed that genes in the title have meaningful relations with cancer and that other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by our approach with those inferred by other methods. Our approach found more cancer-related genes than comparable methods.
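The hub/sub construction described in this abstract can be sketched as follows. The abstract does not say how the per-gene score is computed, so the weighted-degree score below is an assumption chosen only to make the example concrete; the function names and the placeholder gene labels are likewise hypothetical.

```python
from collections import defaultdict


def local_network(title_genes, body_genes):
    """Edges for one paper: every title (hub) gene linked to every body (sub) gene."""
    hubs = set(title_genes)
    subs = set(body_genes) - hubs  # a gene in the title counts as a hub only
    return {(hub, sub) for hub in hubs for sub in subs}


def global_scores(papers):
    """Merge all local networks and score each gene by weighted degree (assumed score)."""
    weights = defaultdict(int)
    for title_genes, body_genes in papers:
        for edge in local_network(title_genes, body_genes):
            weights[edge] += 1  # edge weight = number of papers supporting it
    scores = defaultdict(int)
    for (hub, sub), weight in weights.items():
        scores[hub] += weight
        scores[sub] += weight
    return dict(scores)


if __name__ == "__main__":
    # Each paper is (genes found in the title, genes found in the body).
    toy_papers = [
        (["geneA"], ["geneB", "geneC"]),
        (["geneC"], ["geneA"]),
        (["geneA"], ["geneB"]),
    ]
    ranked = sorted(global_scores(toy_papers).items(), key=lambda kv: (-kv[1], kv[0]))
    print(ranked)
```

Any other network measure could replace the weighted degree without changing the local-to-global structure of the sketch.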

Citations: 0

Identification of Coexpressed Gene Modules across Multiple Brain Diseases by a Biclustering Analysis on Integrated Gene Expression Data

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665978

Kihoon Cha, Kimin Oh, Taeho Hwang, G. Yi

It has been reported that several brain diseases can share symptoms at the clinical level, suggesting both the necessity and the possibility of developing therapeutics. In this paper, we carried out an integrated gene expression analysis of several microarray datasets for neurodegenerative diseases and psychiatric disorders to discover what is unique and what is common in their molecular basis. First, we selected and combined three sets of microarray data covering eight brain diseases. Second, we applied a correlation-based biclustering approach, BICLIC [1], to efficiently identify coexpressed gene modules that are correlated in individual brain diseases or in combinations of them. Third, we performed Gene Ontology-based functional enrichment analysis to characterize the functions of the identified cross-disease and disease-specific modules. With this approach, we could examine various sets of genes that are significantly correlated in both single and multiple diseases. In total, 4,307 coexpressed gene modules turned out to be common to two or more brain diseases. Among them, eight modules, comprising different combinations of 16 genes in total, were correlated in more than seven brain diseases. The functional analysis showed that the multi-disease modules were more strongly associated with higher brain functions, such as cognitive functions, than single-disease modules. The results of this study provide valuable resources for further investigating the key molecular players affecting brain diseases in both a transnosological and a disease-specific manner.

Citations: 4

Discriminatory Analysis of Alzheimer's Disease through pathway Activity inference in the Resting-State brain

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665971

Jongan Lee, Younghoon Kim, Y. Jeong, D. Na, Jong-Won Kim, Kwang-H. Lee, Doheon Lee

Duk L. Na, Dept. of Neurology, Samsung Medical Ctr., Sungkyunkwan Univ. School of Medicine, Seoul, South Korea, dukna@skku.edu
Jong-Won Kim, Dept. of Laboratory Medicine and Genetics, Samsung Medical Ctr., Sungkyunkwan Univ. School of Medicine, Seoul, South Korea, kimjw@skku.edu
Kwang-Hyung Lee, Dept. of Bio and Brain Engineering, KAIST, Daejeon, South Korea, khlee@kaist.ac.kr
Doheon Lee, Dept. of Bio and Brain Engineering, KAIST, Daejeon, South Korea, dhlee@kaist.ac.kr

Citations: 1

Identification of a Specific Base Sequence of Pathogenic E. Coli through a Genomic Analysis

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665981

Soobok Joe, Hojung Nam

E. coli sequence type 131 (ST131) is one of the pathogens that cause resistant infections. Comparative genome analyses allow the virulence factors of pathogens to be interpreted. In this study, we therefore analyze the genomic differences between 96 pathogenic E. coli ST131 strains and the non-pathogenic E. coli K-12 in gene elements and their non-coding regulatory elements. Using next-generation whole-genome sequencing data, we investigated genetic variations in protein-coding regions and their regulatory regions. After alignment of the sequence reads, large numbers of single nucleotide variants (SNVs) were observed in both the regulatory and the protein-coding sequences. In the regulatory regions, we found strongly conserved regions, namely ribosome binding sites. In the gene regions, we found conserved start and stop codons, with variation commonly occurring at a specific position within each codon. Apart from these well-conserved regions, variations were randomly distributed across the regulatory regions. Even regions with well-known conserved sequences, such as the -10 and -35 elements of the promoter, showed a similar level of variation. In summary, we found genomic variations between the pathogenic E. coli ST131 strains and the non-pathogenic E. coli K-12, and we determined the numbers of sequence variations in both the protein-coding regions and the regulatory regions. However, we found that the effects of variations on the protein-coding regions are less significant than those on the regulatory regions.

Citations: 1

Mining the Main Health Trend of the General Public based on Opinion Mining of Korean Blogsphere

Data and Text Mining in Bioinformatics

Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665985

Yong-il Lee, Sang-Hyob Nam, Jaeseung Jeong

Social media has become a reasonable proxy for understanding public opinion. People increasingly use internet media and social networking services (SNS) such as Twitter, Facebook, and blogs to share opinions, news, advice, interests, moods, concerns, criticism, facts, and rumors. As a result, public health research is undergoing a major change. Traditional public health studies have depended only on regular clinical reports by health professionals. Such reports have limited practical use, and the general public has great difficulty understanding health information, even their own. Nowadays, over one billion people publish their ideas about many topics, including health conditions, minute by minute. SNS provides researchers with the freshest source of information on public health conditions on a global scale. Much of that data is public and available for mining. This article therefore applies opinion mining to detect public trends and to find valuable opinions within this mass of information. The core of this research is the analysis of adjectives in opinions. Our assumption is that many adjective expressions convey the deep and sincere meaning of their author. The approach can be used both to filter out low-value postings and to track high-value postings at the same time, and it is a simple and feasible criterion. The opinion mining process comprises Korean morpheme analysis, opinion extraction, opinion tagging, and positive/negative score evaluation. The aim of our research is to analyze Korean blog postings.
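The adjective-centered process described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: morpheme analysis is assumed to have been done upstream and to yield `(word, tag)` pairs, English words stand in for Korean morphemes, and the `"ADJ"` tag, the `POLARITY` lexicon, and the `min_adjectives` threshold are all hypothetical.

```python
# Hypothetical polarity lexicon: +1 for positive adjectives, -1 for negative.
POLARITY = {"healthy": 1, "good": 1, "tired": -1, "painful": -1}


def extract_adjectives(tagged_tokens):
    """Opinion extraction step: keep only tokens tagged as adjectives."""
    return [word for word, tag in tagged_tokens if tag == "ADJ"]


def score_post(tagged_tokens, lexicon=POLARITY):
    """Opinion tagging and scoring: count positive and negative adjectives."""
    adjectives = extract_adjectives(tagged_tokens)
    positive = sum(1 for adj in adjectives if lexicon.get(adj, 0) > 0)
    negative = sum(1 for adj in adjectives if lexicon.get(adj, 0) < 0)
    return {"adjectives": len(adjectives), "positive": positive, "negative": negative}


def is_high_value(tagged_tokens, min_adjectives=2):
    """Treat a posting as high value if it has enough adjective expressions.

    The threshold is an assumption made for this sketch.
    """
    return len(extract_adjectives(tagged_tokens)) >= min_adjectives


# Toy posting, already tagged by an assumed upstream morpheme analyzer.
post = [("I", "NOUN"), ("feel", "VERB"), ("tired", "ADJ"), ("but", "CONJ"), ("healthy", "ADJ")]
print(score_post(post))
print(is_high_value(post))
```

Because the same adjective count drives both `score_post` and `is_high_value`, one pass over the tagged tokens is enough to filter out low-value postings and to score the ones that remain.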

Citations: 1