Predicting baby feeding method from unstructured electronic health record data
A. Rao, K. Maiden, Ben Carterette, Deborah B. Ehrenthal. DOI: 10.1145/2390068.2390075

Obesity is one of the most important health concerns in the United States and plays a major role in the rising rates of chronic health conditions and health care costs. The percentage of the US population affected by childhood and adult obesity has been on a steady upward trend for the past few decades. According to the Centers for Disease Control and Prevention, 35.7% of US adults and 17% of children aged 2-19 years are obese. Researchers and health care providers in the US and the rest of the world who study obesity are interested in the factors affecting it. One such factor potentially related to the development of obesity is the type of feeding provided to babies. In this work we describe an electronic health record (EHR) data set of babies whose feeding method is contained in the narrative portion of the record. We compare five supervised machine learning algorithms for predicting feeding method as a discrete value from the text in this field, evaluating them on both classification error and the quality of the prediction probability estimates they generate.
{"title":"Session details: Mining clinical data and text","authors":"Hua Xu","doi":"10.1145/3260181","DOIUrl":"https://doi.org/10.1145/3260181","url":null,"abstract":"","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"412 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123565459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Indexing methods for efficient protein 3D surface search
Sungchul Kim, Lee Sael, Hwanjo Yu. DOI: 10.1145/2390068.2390078

This paper exploits efficient indexing techniques for protein structure search, where protein structures are represented as vectors by the 3D Zernike Descriptor (3DZD). The 3DZD compactly represents the surface shape of a protein tertiary structure as a vector, and this simplified representation accelerates structural search. However, further speedup is needed in scenarios where multiple users access the database simultaneously. We address this need by applying two indexing techniques, iDistance and iKernel, to the 3DZDs. The results show that both iDistance and iKernel significantly improve search speed. In addition, we introduce an extended approach in which the index structure is constructed using only the first few components of each 3DZD. To find the top-k similar structures, the top 10×k similar structures are first selected using the reduced index, and the top k among them are then selected using the similarity of the full 3DZDs. With these techniques, search time is reduced by 69.6% using iDistance, 77% using iKernel, 77.4% using extended iDistance, and 87.9% using extended iKernel.
We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol which itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'|<<|D|, a user has initially identified as relevant e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics and our method exploits this by applying a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score. Results from our experiments indicate that our proposed method is 8 and 9 times better (resp., for inclusion and exclusion criteria) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on the average 75% and 70% (resp., for inclusion and exclusion criteria) similar to the correct eligibility criteria.
{"title":"Inferring appropriate eligibility criteria in clinical trial protocols without labeled data","authors":"Angelo C. Restificar, S. Ananiadou","doi":"10.1145/2390068.2390074","DOIUrl":"https://doi.org/10.1145/2390068.2390074","url":null,"abstract":"We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol which itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'|<<|D|, a user has initially identified as relevant e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics and our method exploits this by applying a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score. Results from our experiments indicate that our proposed method is 8 and 9 times better (resp., for inclusion and exclusion criteria) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on the average 75% and 70% (resp., for inclusion and exclusion criteria) similar to the correct eligibility criteria.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"38 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133352105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Lexicon-free and context-free drug names identification methods using hidden markov models and pointwise mutual information
Jacek Małyszko, A. Filipowska. DOI: 10.1145/2390068.2390072

This paper concerns the extraction of medicine names from free-text documents written in Polish. Lexicon-based approaches cannot identify unknown or misspelled medicine names. We present experimental results for two methods: a Hidden Markov Model (HMM) and a Pointwise Mutual Information (PMI)-based approach. The experiment was to identify medicine names without using a lexicon or contextual information. The results show that the HMM may serve as one of several steps in drug name identification (with an F-score slightly below 70% on the test set), while PMI can increase the precision of the HMM's results, though with a significant loss in recall.
{"title":"Session details: Mining biological data and text","authors":"Min Song","doi":"10.1145/3260182","DOIUrl":"https://doi.org/10.1145/3260182","url":null,"abstract":"","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123904599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Extracting structured information from free-text medication prescriptions using dependencies
Andrew D. MacKinlay, Karin M. Verspoor. DOI: 10.1145/2390068.2390076

We explore an information extraction task whose goal is to determine the correct values for fields relevant to prescription drug administration, such as dosage amount, frequency, and route. The data set is a collection of prescriptions from a long-term health-care facility, a small subset of which we have manually annotated with values for these fields. We first examine a rule-based approach that uses a dependency parse of the prescription, achieving accuracies of 60-95% across the different fields and 67.5% when all fields of the prescription are considered together. The outputs of such a system have potential applications in detecting irregularities in dosage delivery.

Rule-based whole body modeling for analyzing multi-compound effects
W. Hwang, Y. Hwang, Sunjae Lee, Doheon Lee. DOI: 10.1145/2390068.2390083

Fundamental properties of biological systems, including robustness, redundancy, and crosstalk, have been cited to explain the limited efficacy and unexpected side effects of drugs. Many pharmaceutical laboratories have begun developing multi-compound drugs to remedy this situation, and some have shown successful clinical results. Simultaneous application of multiple compounds can increase efficacy as well as reduce side effects through pharmacodynamic and pharmacokinetic interactions. However, such an approach incurs overwhelming costs for preclinical experiments and tests, as the number of possible combinations of compound dosages increases exponentially. Computer model-based experiments have emerged as one of the most promising ways to cope with this complexity. Although there have been many efforts to model specific molecular pathways using qualitative and quantitative formalisms, they suffer from unexpected results caused by distant interactions beyond their localized models. Here we propose a rule-based whole-body modeling platform. We tested the platform with a Type 2 diabetes (T2D) model, which involves the malfunction of numerous organs such as the pancreas, circulatory system, liver, and muscle. We manually curated 117 T2D-related rules from the literature and from several types of existing models. Our simulation results show the drug-effect pathways of T2D drugs and how combinations of drugs work at the whole-body scale. We expect the platform to provide insight into identifying effective drug combinations and their mechanisms during drug development.

Protein complex prediction via bottleneck-based graph partitioning
Jaegyoon Ahn, D. Lee, Youngmi Yoon, Yunku Yeu, Sanghyun Park. DOI: 10.1145/2390068.2390079

Detecting protein complexes is one of the essential and fundamental tasks in understanding various biological functions and processes, so precise identification of protein complexes is indispensable. To detect protein complexes more precisely, we propose a novel data structure that employs bottleneck proteins as partitioning points for detecting the complexes; the partitioning process allows the resulting protein complexes to overlap. We applied our algorithm to several protein-protein interaction (PPI) networks of Saccharomyces cerevisiae and Homo sapiens, and validated our results against public databases of protein complexes. Our algorithm produced overlapping protein complexes with a significantly improved F1 score, driven by higher precision.

Clinical entity recognition using structural support vector machines with rich features
Buzhou Tang, Yonghui Wu, Min Jiang, Hua Xu. DOI: 10.1145/2390068.2390073

Named entity recognition (NER) is an important task in natural language processing (NLP) of clinical text. Conditional Random Fields (CRFs), a sequential labeling algorithm, and Support Vector Machines (SVMs), which are based on large-margin theory, are two typical machine learning algorithms that have been widely applied to NER tasks, including clinical entity recognition. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, have not been investigated for clinical text processing. In this study, we applied the SSVM algorithm to the Concept Extraction task of the 2010 i2b2 clinical NLP challenge, which was to recognize entities for medical problems, treatments, and tests from hospital discharge summaries. Using the same training (N = 27,837) and test (N = 45,009) sets as the challenge, our evaluation showed that the SSVM-based NER system required less training time while achieving better performance than the CRF-based system for clinical entity recognition when the same features were used. Our study also demonstrated that rich features, such as unsupervised word representations, improved the performance of clinical entity recognition. When rich features were integrated with SSVMs, our system achieved its highest F-measure of 85.74% on the test set, outperforming the best system reported in the challenge by 0.5%.