With the recent development of information technology, the number of publications is increasing exponentially. In response, demand has surged for replacing manual text data processing with automatic text data processing. About three decades ago, Swanson proposed the ABC model [1], built on text mining, as a part of literature-based knowledge discovery for finding new possible biomedical hypotheses. Clinical scholars subsequently confirmed the effectiveness of the hypotheses found by the ABC model [2]. This effectiveness led scholars to try various literature-based knowledge discovery approaches [3, 4, 5]. However, these approaches are not fully automated but are hybrids of automatic and manual processes, and the manual process requires the intervention of experts. In addition, they consider only a single perspective. Even approaches involving network theory have difficulty in understanding the entire network structure of the relationships among concepts and in interpreting that structure systematically [6, 7]. Thus, this study proposes a novel approach that discovers various relationships by extending the intermediate concept B to a multi-leveled concept. By applying a graph-based path finding method based on co-occurrence and the relational entities among concepts, we attempt to systematically analyze and investigate the relationships between a source node and a target node across all the paths that connect them. As our baseline, we use the result of Swanson's work [8], which suggested blood viscosity, platelet aggregability, and vasoconstriction as the intermediate concepts or terms between Raynaud's disease and fish oils. We compared the intermediate concepts we found with those of Swanson. This study provides distinct perspectives for literature-based discovery by not only discovering meaningful relationships among concepts in biomedical literature through graph-based path inference but also generating feasible new hypotheses.
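The idea of extending the single intermediate concept B to several levels can be sketched as path enumeration over a co-occurrence graph. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names, the `max_intermediates` parameter, and the toy documents (abstract labels A, B1, B2, C) are all hypothetical, and relational entities are ignored.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Link two concepts whenever they appear together in one document."""
    graph = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_paths(graph, source, target, max_intermediates=2):
    """List simple paths source -> ... -> target that pass through at
    least one and at most `max_intermediates` intermediate concepts."""
    paths = []

    def extend(path):
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt == target:
                # Skip the direct source-target edge: it has no B concept.
                if len(path) >= 2:
                    paths.append(path + [nxt])
            elif nxt not in path and len(path) - 1 < max_intermediates:
                extend(path + [nxt])

    extend([source])
    return paths


# Toy input (invented for illustration): each list holds the concepts
# extracted from one document.
toy_documents = [
    ["A", "B1"],
    ["B1", "C"],
    ["A", "B2"],
    ["B2", "B3"],
    ["B3", "C"],
]
toy_graph = build_cooccurrence_graph(toy_documents)
for path in find_paths(toy_graph, "A", "C", max_intermediates=2):
    print(" -> ".join(path))
```

With `max_intermediates=1` the search reduces to the classic A-B-C pattern; larger values admit multi-level intermediates at the cost of a rapidly growing number of candidate paths.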
{"title":"Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model","authors":"G. Heo, Keeheon Lee, Min Song","doi":"10.1145/2665970.2665984","DOIUrl":"https://doi.org/10.1145/2665970.2665984","url":null,"abstract":"Due to the recent development of Information Technology, the number of publications is increasing exponentially. In response to the increasing number of publications, there has been a sharp surge in the demand for replacing the existing manual text data processing by an automatic text data processing. Swanson proposed ABC model [1] on the top of text mining as a part of literature-based knowledge discovery for finding new possible biomedical hypotheses about three decades ago. The following clinical scholars proved the effectiveness of the possible hypotheses found by ABC model [2]. Such effectiveness let scholars try various literature-based knowledge discovery approaches [3, 4, 5]. However, their trials are not fully automated but hybrids of automatic and manual processes. The manual process requires the intervention of experts. In addition, their trials consider a single perspective. Even trials involving network theory have difficulties in mal-understanding the entire network structure of the relationships among concepts and the systematic interpretation on the structure [6, 7]. Thus, this study proposes a novel approach to discover various relationships by extending the intermediate concept B to a multi-leveled concept. By applying a graph-based path finding method based on co-occurrence and the relational entities among concepts, we attempt to systematically analyze and investigate the relationships between two concepts of a source node and a target node in the total paths. For the analysis of our study, we set our baseline as the result of Swanson [8]'s work. This work suggested the intermediate concept or terms between Raynaud's disease and fish oils as blood viscosity, platelet aggregability, and vasconstriction. 
We compared our results of intermediate concepts with these intermediate concepts of Swanson's. This study provides distinct perspectives for literature-based discovery by not only discovering the meaningful relationship among concepts in biomedical literature through graph-based path interference but also being able to generate feasible new hypotheses.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130530143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motivation: In recent disease studies, many key pathogenic genes/proteins have been found to show no significant differential expression, so they tend to be disregarded in conventional differential expression analysis or network analysis. Meanwhile, activity inferred in dry experiments, rather than expression measured in wet experiments, has been proposed as an effective estimate of the actual regulatory power of such important biomolecules, e.g. transcription factors. However, it is still unknown which hidden factor (e.g. phosphorylation) determines this kind of virtual regulatory power, or activity, and how it does so [1]. For the study of cancer development in particular, it is urgent to reconstruct the active protein interaction network and to detect the underlying phosphorylation pattern in a dynamic manner [2-7]. Methods: Based on the c-Myc mouse model of liver cancer, we first collected protein expression and protein phosphorylation data at several developmental time points. We then constructed a rough protein interaction network as background using conditional mutual information. Next, we improved the robustness of conventional network component analysis and used this advanced approach, RNCA (Robust Network Component Analysis), to simultaneously reconstruct the time-dependent protein interaction networks and estimate the activity of target proteins at different times. Finally, considering the different experimental quality of the protein expression and phosphorylation data, we used canonical correlation analysis to detect the maximal correlation between the expression and phosphorylation of a group of proteins (e.g. a protein network module), which could reveal the active protein sub-network and phosphorylation as its determining factor. Results: In the preliminary study, we evaluated the robustness of RNCA by comparing it with other conventional methods. On the real biological data, we found the rewired protein interaction network during cancer development, its corresponding active proteins, and protein phosphorylation as their driver. This work can be further used in the early diagnosis of diseases through edge biomarkers [1-2], network biomarkers [3-4] and dynamical network biomarkers [5-7].
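The background-network step relies on conditional mutual information, I(X;Y|Z) = Σ p(x,y,z) log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]. The following is a minimal sketch of a plug-in estimate, assuming the measurements have already been discretized into bins; the abstract does not say how the authors estimate it, so the function name and the discretization are assumptions.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) in bits from three equally long
    sequences of discrete (already binned) observations."""
    n = len(x)
    if n == 0 or not (n == len(y) == len(z)):
        raise ValueError("x, y and z must be non-empty and equally long")
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), count in n_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)) written with raw counts.
        ratio = (count * n_z[zi]) / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        cmi += (count / n) * log2(ratio)
    return cmi
```

An edge between two proteins would typically be kept only if the estimate stays clearly above zero for every conditioning protein, since a value near zero suggests that the association is explained by the third variable.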
{"title":"Detecting Phosphorylation Determined Active Protein Interaction Network during Cancer Development by Robust Network Component Analysis","authors":"T. Zeng, Ziming Wang, Luonan Chen","doi":"10.1145/2665970.2665991","DOIUrl":"https://doi.org/10.1145/2665970.2665991","url":null,"abstract":"Motivation: In recent disease study, many key pathogen genes/proteins are found to have not significant differential expressions, and thus, they tend to be disregarded in conventional differential expression analysis or network analysis. Meanwhile, the activity in dry-experiment rather than expression in wet-experiment have been proposed to effectively estimate the actual regulation power of such important biomolecules, e.g. transcriptional factors. But, it is still unknown what and how a hidden factor (e.g. phosphorylation) determines this kind of virtual regulation power as activity [1]. Especially, for the cancer development study, it is emergent to reconstruct the active protein interaction network and detect the underlying phosphorylation pattern in a dynamic manner [2-7]. Methods: Based on the c-Myc mouse model of liver cancer, we have first collected protein expression and protein phosphorylation data at several developmental time points. Then, we constructed a rough protein interaction network as background by conditional mutual information. Next, we improved the conventional network component analysis on its robustness, and used this advanced approach RNCA (Robust Network Component Analysis) to reconstruct the time-dependent protein interaction networks and estimate the activity of target protein at different times simultaneously. Finally, considering the different experiment-qualities of protein expression and phosphorylation data, we used canonical correlation analysis to detect the maximal correlation between the expression and phosphorylation of a group of proteins (e.g. 
protein network module), which could reveal the active protein sub-network and its determinate factor as phosphorylation. Results: In the preliminary study, we have evaluated the robustness of RNCA by comparing with other conventional methods. And on the real biological data, we have found the rewired protein interaction network during cancer development, its corresponding active proteins, and their drivers as protein phosphorylation. This work can be further used in early diagnosis of diseases by edge biomarkers [1-2], network biomarkers [3-4] and dynamical network biomarkers [5-7].","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127710963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study investigates worldwide collaborative networks from a geographical perspective, based on clinical tests (CT) and academic research (AR) on acquired immunodeficiency syndrome (AIDS). By applying text mining techniques to AIDS-related documents, we extract spatial information and discover co-location pairs for each type of research at two levels: the national level and the city level. The co-location networks for CT and AR are analyzed using network features, visualization, and the nodes ranked highest by betweenness centrality. The results reveal that the CT network is denser, with about twice as many nodes as the AR network. At the national level, the AR network is concentrated on the United States, while the CT network is spread more widely throughout the world. At the city level, collaboration is more active among closely located cities in the AR network than in the CT network (see Figure 1). The core collaboration centers of the AR network are mainly situated in the United States and Europe, whereas those of the CT network also include Asian and African cities. Overall, our study points out the differences between the collaborative networks for CT and AR, which contributes to understanding research trends and to analyzing the productivity of collaborative work from a regional perspective.
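The network analysis described above can be illustrated with a small sketch that builds a co-location graph from the places extracted per document and ranks nodes by betweenness centrality, here computed with Brandes' algorithm for unweighted graphs. The function names and the toy records are assumptions made for illustration; the abstract does not state which tools the authors used.

```python
from collections import defaultdict, deque
from itertools import combinations


def colocation_graph(records):
    """Connect two places whenever they are extracted from the same document."""
    graph = defaultdict(set)
    for places in records:
        for a, b in combinations(sorted(set(places)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # nodes in BFS visiting order
        preds = {v: [] for v in graph}      # shortest-path predecessors
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):           # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy records (invented): the cities found in each document.
toy_records = [["Boston", "Seoul"], ["Boston", "Nairobi"], ["Boston", "Paris"]]
ranking = sorted(betweenness_centrality(colocation_graph(toy_records)).items(),
                 key=lambda item: -item[1])
print(ranking[0])
```

In this toy star network every shortest path between two outer cities passes through the center, which is why the center ranks first.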
{"title":"An Exploration of the Collaborative Networks for Clinical and Academic Domains in AIDS Research: A Spatial Scientometric Approach","authors":"Y. Jeong, Dahee Lee, Min Song","doi":"10.1145/2665970.2665982","DOIUrl":"https://doi.org/10.1145/2665970.2665982","url":null,"abstract":"This study investigates the world-wide collaborative networks from a geographical perspective based on clinical tests (CT) and academic researches (AR) on Acquired immune deficiency syndrome or acquired immunodeficiency syndrome (AIDS). By applying text mining technique on the AIDS related documents, we extract the spatial information and are able to discover co-location pairs for each type of research at two levels: national level and city level. Co-location networks for CT and AR are analyzed using network features, visualization, and highly-ranked betweenness centrality nodes. The analysis results reveal that the CT network is more densely compact with about twice as many nodes than the AR network. According to the analysis at the national level, the AR network is rather focused on the United States while the CT network is more spread out throughout the world. At the city level, the collaborative work is more active among closely located cities in the AR network compared to the case of the CT network (see Figure 1). The AR network has core collaboration centers mainly situated in the United States and Europe, but those of the CT network also includes Asian and African cities. 
Overall, our study intuitively points out the differences in the collaborative networks for CT and AR, which contributes to the understanding of the research trend involving the productivity analysis of the collaborative work associated with the regional aspect.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121401009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, interest in E3 ligases as therapeutic targets has been growing. A systematic method for deriving disease-related E3s can contribute significantly to meeting this demand. Several disease gene prediction methods have been introduced, but it is hard to obtain E3 ligase-specific information from them. We have developed an approach that prioritizes the disease relations of E3s by integrating E3-substrate relations and their neighboring network with known disease genes. The potential of our method is demonstrated by its better performance than previous methods in predicting known disease relations of E3s. We discovered 101 E3s and their functional network, which has 1,285 relations with diseases. Our method will provide promising new opportunities in drug target discovery as well as in the study of disease mechanisms.
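One simple way to picture this kind of integration is to collect, for each E3, its substrates together with their network neighbors and to measure how much of that module consists of known disease genes. The sketch below does exactly that and nothing more; the overlap score, the function name, and all toy identifiers are assumptions for illustration and should not be read as the scoring the authors actually use.

```python
def e3_disease_scores(e3_substrates, neighbors, disease_genes):
    """Score each (E3, disease) pair by the fraction of the E3's module
    (substrates plus their network neighbors) that are known disease genes.

    e3_substrates: dict mapping an E3 to a set of substrate genes
    neighbors:     dict mapping a gene to the set of its network neighbors
    disease_genes: dict mapping a disease to its set of known genes
    """
    scores = {}
    for e3, substrates in e3_substrates.items():
        module = set(substrates)
        for substrate in substrates:
            module |= neighbors.get(substrate, set())
        if not module:
            continue
        for disease, genes in disease_genes.items():
            overlap = len(module & genes) / len(module)
            if overlap > 0:
                scores[(e3, disease)] = overlap
    return scores


# Toy input with placeholder identifiers (invented for illustration).
toy_scores = e3_disease_scores(
    e3_substrates={"E3_A": {"S1", "S2"}, "E3_B": {"S3"}},
    neighbors={"S1": {"N1"}, "S3": {"N2"}},
    disease_genes={"disease_X": {"S1", "N1", "N9"}},
)
for pair, score in sorted(toy_scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A raw overlap fraction favors small modules, so a real prioritization would more plausibly use a statistical test or a network propagation score.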
{"title":"Inference of Disease E3s from Integrated Functional Relation Network","authors":"Bumki Min, G. Yi","doi":"10.1145/2665970.2665979","DOIUrl":"https://doi.org/10.1145/2665970.2665979","url":null,"abstract":"Recently, the potential of E3 ligase as a therapeutic target is increasing. The systematic method to derive disease-related E3s can provide significant contribution for this demand. Several disease gene prediction methods have been introduced but it is hard to find E3 ligase-specific information from them. We have developed a unique approach to prioritize the disease relation of E3 by integrating E3-substrate relations and their neighboring network with known disease genes. The potential of our method is demonstrated by showing better performance against the previous methods to predict known disease relations of E3. We could discover 101 E3s and their functional network having 1,285 relations with diseases. Our method will provide new promising chances in drug target discovery field as well as disease mechanism study.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127337664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaesub Park, Jaeho Kim, Junseok Park, Sunghwa Bae, Hyungseok Kim, Doheon Lee
The recent explosive growth of bio-data makes it possible to simulate metabolism at whole-body scale, which creates a need for bioinformatics tools to visualize and analyze it. For such tools, zooming is a key method for visualizing a large and complex network in a single view [1]. Unfortunately, most existing tools depend too heavily on an elaborately constructed hierarchy to provide a zoom function. We therefore developed a zoomable visualization system for large bio-molecule networks that requires no preformed hierarchy data.
{"title":"Visualization of Zoomable Network for Multi-Compounds and Multi-Targets Analysis","authors":"Jaesub Park, Jaeho Kim, Junseok Park, Sunghwa Bae, Hyungseok Kim, Doheon Lee","doi":"10.1145/2665970.2665988","DOIUrl":"https://doi.org/10.1145/2665970.2665988","url":null,"abstract":"Recent explosively increased bio-data enable to simulate the metabolism on whole body scale and it bring about needs of bioinformatics tools for visualizing and analyzing it. For such tools zooming is a key method for visualizing large and complex network in a single view[1]. But unfortunately most of developed tools are too dependent on the elaborately constructed hierarchy to get zoom function. So we developed the zoomable visualization system for the large bio-molecule network without preformed hierarchy data.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132572832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeongwoo Kim, H. Kim, Yunku Yeu, Mincheol Shin, Sanghyun Park
Since the genome project of the 1990s, gene-related research has advanced. These studies revealed that genes are a cause of disease and that the relations between genes and diseases are important. For this reason, we propose a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from an online database and extracted genes using text mining. Next, we classified the extracted genes into two kinds using title information: genes that appear in the title are classified as hub genes, whereas genes that appear in the body are classified as sub genes, which are connected to the hub genes. We repeated this process for each paper to construct cancer-specific local gene networks. In the last step, we constructed a global cancer-specific gene network by integrating all the local gene networks and calculated a score for each gene based on an analysis of the global network. We assumed that genes in the title have meaningful relations with cancer and that the other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by our approach with those inferred by other methods. Our approach found more cancer-related genes than comparable methods.
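The hub/sub construction described above can be sketched in a few lines. The abstract does not specify how genes are scored on the global network, so the weighted degree used below is an assumption, as are the function names and the placeholder gene labels G1 to G3.

```python
from collections import Counter


def local_network(title_genes, body_genes):
    """Connect every title (hub) gene to every body-only (sub) gene."""
    hubs = set(title_genes)
    subs = set(body_genes) - hubs
    return {(hub, sub) for hub in hubs for sub in subs}


def global_gene_scores(papers):
    """Merge the local networks of all papers and score each gene by its
    weighted degree, i.e. the total weight of the edges it takes part in.

    papers: iterable of (title_genes, body_genes) pairs, one per paper
    """
    edge_weight = Counter()
    for title_genes, body_genes in papers:
        # An edge seen in several papers accumulates weight.
        edge_weight.update(local_network(title_genes, body_genes))
    score = Counter()
    for (hub, sub), weight in edge_weight.items():
        score[hub] += weight
        score[sub] += weight
    return score


# Toy corpus (invented): genes found in the title and body of each paper.
toy_papers = [
    ({"G1"}, {"G2", "G3"}),
    ({"G1"}, {"G2"}),
    ({"G3"}, {"G1"}),
]
print(global_gene_scores(toy_papers).most_common())
```

Note that a gene appearing only in titles of papers with no other extracted genes receives no score here, which is one reason a real implementation might weight title occurrences directly.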
{"title":"TILD: A Strategy to Identify Cancer-related Genes Using Title Information in Literature Data","authors":"Jeongwoo Kim, H. Kim, Yunku Yeu, Mincheol Shin, Sanghyun Park","doi":"10.1145/2665970.2665992","DOIUrl":"https://doi.org/10.1145/2665970.2665992","url":null,"abstract":"After genome project in 1990s, researches which are involved with gene have been progressed. These studies unearthed that gene is cause of disease, and relations between gene and disease are important. In this reason, we proposed a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from the online database. We then extracted genes using text mining. In the next step, we classified into two kinds for extracted genes using title information. If genes are located in title, then they are classified as hub genes. In the contrast, if genes are located in body, then they are classified as sub genes which are connected with hub genes. We iterated the processes for each paper to construct the cancer-specific local gene network. In the last step, we constructed global cancer-specific gene network by integrating all local gene network, and calculated a score for each gene based on analysis of the global gene network. We assumed that genes in title have meaningful relations with cancer, and other genes in the body are related with the title genes. For validation, we compared with other methods for the top 20 genes inferred by each approach. 
Our approach found more cancer-related genes than comparable methods.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116457778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It has been reported that several brain diseases share symptoms at the clinical level, suggesting both the need for and the possibility of developing therapeutics. In this paper, we carried out an integrated gene expression analysis on several microarray datasets of neurodegenerative diseases and psychiatric disorders to discover what is unique and what is common in their molecular basis. First, we selected and combined three sets of microarray data covering eight brain diseases. Second, we applied a correlation-based biclustering approach, BICLIC [1], to efficiently identify coexpressed gene modules that are correlated in individual brain diseases or in combinations of them. Third, we performed Gene Ontology-based functional enrichment analysis to characterize the functions of the identified cross-disease and disease-specific modules. With this approach, we could examine various sets of genes that are significantly correlated in single and in multiple diseases. In total, 4,307 coexpressed gene modules turned out to be common to two or more brain diseases. Among them, eight modules, comprising different combinations of 16 genes in total, were correlated in more than seven brain diseases. The functional analysis showed that the multi-disease modules were more strongly associated with higher brain functions, such as cognitive functions, than the single-disease modules. The results of this study provide valuable resources for further investigating the key molecular players that affect brain diseases in either a transnosological or a disease-specific manner.
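Functional enrichment of a gene module against a Gene Ontology term is commonly assessed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed overlap by chance. The abstract does not name the test the authors used, so the sketch below, including its function and parameter names, is an assumption.

```python
from math import comb


def enrichment_p_value(population, annotated, module, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes in the background
    annotated:  background genes annotated with the GO term
    module:     number of genes in the module
    overlap:    module genes annotated with the GO term
    """
    if not (0 <= annotated <= population and 0 <= module <= population):
        raise ValueError("annotated and module must lie within the population")
    upper = min(annotated, module)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, module - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(population, module)


# Toy numbers (invented): 3 of 10 background genes carry the term and a
# 3-gene module contains all of them.
print(enrichment_p_value(population=10, annotated=3, module=3, overlap=3))
```

Because thousands of modules and many terms are tested, the resulting p-values would normally be corrected for multiple testing before any module is called enriched.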
{"title":"Identification of Coexpressed Gene Modules across Multiple Brain Diseases by a Biclustering Analysis on Integrated Gene Expression Data","authors":"Kihoon Cha, Kimin Oh, Taeho Hwang, G. Yi","doi":"10.1145/2665970.2665978","DOIUrl":"https://doi.org/10.1145/2665970.2665978","url":null,"abstract":"It has been reported that several brain diseases could share symptoms at clinical level, suggesting the necessity and possibility to develop therapeutics. In this paper, we carried out an integrated gene expression analysis on several microarray datasets of neurodegenerative diseases and psychiatric disorders to discover the uniqueness and commonness in their molecular basis. First, we selected and combined three sets of microarray data including eight brain diseases. Second, we applied a correlation-based biclustering approach, BICLIC [1], to efficiently identify coexpressed gene modules that are correlated in individual or multiple combinations of brain diseases. Third, Gene ontology-based functional enrichment analysis is performed to analyze functional characteristics of the identified cross-disease or and disease-specific modules. In this approach, we could examine various sets of correlated genes significantly in both single and multiple diseases. As a result, in total, 4,307 coexpressed gene modules were turned out to be common to two or more of brain diseases. Among them, eight modules having different combinations of total 16 genes were involved correlatively in more than seven brain diseases. The functional analysis showed that the multi-disease specific modules were more associated to higher brain functions like cognitive functions than single disease specific modules. 
The results in this study provide valuable resources to further investigate the key molecular players affecting on brain diseases in both transnosological or disease specific manner.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134278638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jongan Lee, Younghoon Kim, Y. Jeong, D. Na, Jong-Won Kim, Kwang-H. Lee, Doheon Lee
Duk L. Na, Dept. of Neurology, Samsung Medical Ctr., Sungkyunkwan Univ. School of Medicine, Seoul, South Korea, dukna@skku.edu
Jong-Won Kim, Dept. of Laboratory Medicine and Genetics, Samsung Medical Ctr., Sungkyunkwan Univ. School of Medicine, Seoul, South Korea, kimjw@skku.edu
Kwang-Hyung Lee, Dept. of Bio and Brain Engineering, KAIST, Daejeon, South Korea, khlee@kaist.ac.kr
Doheon Lee, Dept. of Bio and Brain Engineering, KAIST, Daejeon, South Korea, dhlee@kaist.ac.kr
{"title":"Discriminatory Analysis of Alzheimer's Disease through pathway Activity inference in the Resting-State brain","authors":"Jongan Lee, Younghoon Kim, Y. Jeong, D. Na, Jong-Won Kim, Kwang-H. Lee, Doheon Lee","doi":"10.1145/2665970.2665971","DOIUrl":"https://doi.org/10.1145/2665970.2665971","url":null,"abstract":"Duk L. Na Dept. of Neurology, Samsung Medical Ctr. Sungkyunkwan Univ. School of Medicine Seoul, South Korea dukna@skku.edu Jong-Won Kim Dept. of Laboratory Medicine and Genetics Samsung Medical Ctr. Sungkyunkwan Univ. School of Medicine Seoul, South Korea kimjw@skku.edu Kwang-Hyung Lee Dept. of Bio and Brain Engineering KAIST Daejeon, South Korea khlee@kaist.ac.kr Doheon Lee Dept. of Bio and Brain Engineering KAIST Daejeon, South Korea dhlee@kaist.ac.kr","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"518 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133661745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. coli sequence type 131 (ST131) is one of the pathogens that cause drug-resistant infections. Comparative genome analyses allow the virulence factors of pathogens to be interpreted. In this study, we therefore analyze the genomic differences between 96 pathogenic E. coli ST131 strains and the non-pathogenic E. coli K-12, both in gene elements and in their non-coding regulatory elements. Using next-generation whole-genome sequencing data, we investigated genetic variations in protein-coding regions and their regulatory regions. After alignment of the sequence reads, large numbers of single nucleotide variants (SNVs) were observed in both the regulatory and the protein-coding sequences. In the regulatory regions, we found strongly conserved regions, namely ribosome binding sites. In the gene regions, we found conserved start and stop codons, with a specific position that commonly varies in each codon. Apart from these well-conserved regions, variations were randomly distributed in the regulatory regions. Even regions with well-known conserved sequences, such as the -10 and -35 elements of the promoter, showed a similar level of variation. In addition, the numbers of sequence variations were determined in both the protein-coding regions and the regulatory regions. However, we found that the effects of variations on the protein-coding regions are less significant than those on the regulatory regions.
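Once strain sequences are aligned to the reference, the per-position comparison behind such an analysis can be sketched as follows. This is a deliberately simplified illustration that assumes equal-length, already aligned sequences with `-` marking gaps; real pipelines work from read alignments and variant calls, and the function names and toy sequences are hypothetical.

```python
def snv_counts(reference, strains):
    """Count, for each reference position, how many strains carry a
    different base. Gap characters ('-') are not counted as SNVs."""
    counts = [0] * len(reference)
    for sequence in strains:
        if len(sequence) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        for i, (ref_base, base) in enumerate(zip(reference, sequence)):
            if base != ref_base and base != "-" and ref_base != "-":
                counts[i] += 1
    return counts


def conserved_positions(counts):
    """Positions at which no strain differs from the reference."""
    return [i for i, count in enumerate(counts) if count == 0]


# Toy alignment (invented): a start codon followed by one more codon.
toy_reference = "ATGACC"
toy_strains = ["ATGACT", "ATGGCC", "ATGACC"]
toy_counts = snv_counts(toy_reference, toy_strains)
print(toy_counts, conserved_positions(toy_counts))
```

Summing these counts separately over annotated coding and regulatory intervals would give the per-region variation levels that the abstract compares.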
{"title":"Identification of a Specific Base Sequence of Pathogenic E. Coli through a Genomic Analysis","authors":"Soobok Joe, Hojung Nam","doi":"10.1145/2665970.2665981","DOIUrl":"https://doi.org/10.1145/2665970.2665981","url":null,"abstract":"E. coli sequence type 131 (ST131) is one of pathogens that causes resistant infections. Comparative genome analyses allow interpretations of the virulence factors of pathogens. Thus, in this study, we analysis the genomic differences between the pathogenic E. coli ST131 and the non-pathogenic E. coli K-12. In this study, we identify the genomic differences between 96 E. coli ST131 strains and the E. coli K-12 in gene elements and their non-coding regulation elements. Using next-generation whole-genome sequencing data, we investigated genetic variations of protein-coding regions and their regulation regions. After the alignment of the sequence reads, large numbers of single nucleotide variants (SNVs) were observed in the regulation and protein-coding sequences. In the regulation regions, we found strong conserved regions, in this case, ribosome binding sites. In the gene regions, we found conserved start and stop codons with the specific position varying commonly in each codon. Except for these well-conserved regions, other variations were randomly distributed in regulation regions. Even a region having well-known conserved sequences such as -10 and -35 in the promoter had a similar level of variation. In this study, we found genomic variations between the pathogenic E. coli ST 131 strain and the non-pathogenic E. coli K-12. In addition, the numbers of sequence variations were determined in both the protein-coding regions and the regulation regions. 
However, we found that the effects of variations on the protein-coding regions are less significant than those on the regulation regions.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121448894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
These days, social media has become a reasonable standard for understanding the public's thoughts. In particular, people increasingly use internet media and SNS (Twitter, Facebook, blogs, etc.) to share opinions, news, advice, interests, moods, concerns, criticism, facts, rumors, and everything else. As a result, public health research has begun to change significantly. Traditional public health studies have depended only on regular clinical reports by health professionals. Such reports are limited in practical use, and the general public has great difficulty understanding health information, even their own. Nowadays, over one billion people publish their ideas about many topics, including health conditions, minute by minute. SNS provides researchers with the freshest source of information on public health conditions on a global scale. Much of that data is public and available for mining. This article therefore aims to build an opinion mining application for detecting public trends and finding valuable opinions among the massive volume of information. The core of this research is analyzing the adjectives in opinions. Our assumption is that many adjective expressions convey the deep and sincere meaning of their author. This approach is applicable to both filtering low-value postings and tracking high-value postings simultaneously. It is a simple and feasible criterion. The opinion mining process includes Korean morpheme analysis, opinion extraction, opinion tagging, and positive/negative score evaluation. Our research's aim is to analyze Korean blog postings.
{"title":"Mining the Main Health Trend of the General Public based on Opinion Mining of Korean Blogsphere","authors":"Yong-il Lee, Sang-Hyob Nam, Jaeseung Jeong","doi":"10.1145/2665970.2665985","DOIUrl":"https://doi.org/10.1145/2665970.2665985","url":null,"abstract":"These days, social media has become a reasonable standard for understanding the public's thoughts. In particular, people increasingly use internet media and SNS (Twitter, Facebook, blogs, etc.) to share opinions, news, advice, interests, moods, concerns, criticism, facts, rumors, and everything else. As a result, public health research has begun to change significantly. Traditional public health studies have depended only on regular clinical reports by health professionals. Such reports are limited in practical use, and the general public has great difficulty understanding health information, even their own. Nowadays, over one billion people publish their ideas about many topics, including health conditions, minute by minute. SNS provides researchers with the freshest source of information on public health conditions on a global scale. Much of that data is public and available for mining. This article therefore aims to build an opinion mining application for detecting public trends and finding valuable opinions among the massive volume of information. The core of this research is analyzing the adjectives in opinions. Our assumption is that many adjective expressions convey the deep and sincere meaning of their author. This approach is applicable to both filtering low-value postings and tracking high-value postings simultaneously. It is a simple and feasible criterion. The opinion mining process includes Korean morpheme analysis, opinion extraction, opinion tagging, and positive/negative score evaluation. 
Our research's aim is to analyze Korean blog postings.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125449567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}