Top-k utility-based gene regulation sequential pattern discovery
Morteza Zihayat, Heydar Davoudi, Aijun An
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Published: 2016-12-01
DOI: 10.1109/BIBM.2016.7822529
Citation count: 13
Abstract
Sequential pattern mining has been used in bioinformatics to discover frequent gene regulation sequential patterns in time-course microarray datasets. Although mining frequent sequences is important in biological studies aimed at disease treatment, most existing approaches do not consider the importance of genes with respect to the disease under study when identifying gene regulation sequential patterns. In addition, they focus on the general up/down effects of genes in a microarray dataset and do not take the varying degrees of expression into account during mining. As a result, current techniques return too many sequences, which may not be informative enough for biologists to explore the relationships between the disease and the underlying causes encoded in gene regulation sequences. In this paper, we propose a utility model that considers both the importance of genes with respect to a disease and their degrees of expression in the biological investigation at hand. We then design a new method, called TU-SEQ, for identifying the top-k high-utility gene regulation sequential patterns in a time-course microarray dataset. The evaluation results show that our approach can effectively and efficiently discover key, meaningful gene regulation sequential patterns in a time-course microarray dataset.
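The abstract does not give TU-SEQ's exact utility definitions or its pruning strategy, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that an item is a (gene, level) pair, that an item's utility is a hypothetical disease-importance weight multiplied by the absolute expression degree, and that a pattern's utility in one sequence is the maximum over its occurrences (a convention used by several high-utility sequential pattern miners). The top-k search is plain brute-force enumeration up to a length bound, not the upper-bound pruning a practical miner would need. All names (`item_utility`, `pattern_utility_in_seq`, `top_k_patterns`, `weight`, genes G1 to G3) are hypothetical.

```python
from heapq import heappush, heapreplace
from itertools import product
from typing import Optional

# Hypothetical data model (assumption): a sequence is a list of time points; each
# time point is a frozenset of (gene, level) items, where level is a signed
# expression degree (e.g. +2 = strongly up-regulated, -1 = mildly down-regulated).
Item = tuple[str, int]
Sequence = list[frozenset[Item]]


def item_utility(item: Item, weight: dict[str, float]) -> float:
    """Assumed utility of one item: disease importance x |expression degree|."""
    gene, level = item
    return weight[gene] * abs(level)


def pattern_utility_in_seq(pattern: list[Item], seq: Sequence,
                           weight: dict[str, float]) -> Optional[float]:
    """Maximum utility over all occurrences of `pattern` in `seq`, where each
    pattern item must match at a strictly later time point than the previous
    one. Returns None if the pattern does not occur."""
    # best[j]: best utility of matching the pattern prefix so far, ending at time j
    first = item_utility(pattern[0], weight)
    best = [first if pattern[0] in tp else None for tp in seq]
    for item in pattern[1:]:
        u = item_utility(item, weight)
        new: list[Optional[float]] = [None] * len(seq)
        prefix_max: Optional[float] = None  # max of best[0..j-1]
        for j, tp in enumerate(seq):
            if prefix_max is not None and item in tp:
                new[j] = prefix_max + u
            if best[j] is not None and (prefix_max is None or best[j] > prefix_max):
                prefix_max = best[j]
        best = new
    found = [b for b in best if b is not None]
    return max(found) if found else None


def top_k_patterns(db: list[Sequence], weight: dict[str, float],
                   k: int, max_len: int) -> list[tuple[float, tuple[Item, ...]]]:
    """Brute-force top-k by total utility (sum over the sequences containing it)."""
    items = sorted({it for seq in db for tp in seq for it in tp})
    heap: list[tuple[float, tuple[Item, ...]]] = []  # min-heap of the current top k
    for length in range(1, max_len + 1):
        for pattern in product(items, repeat=length):
            utils = [pattern_utility_in_seq(list(pattern), seq, weight) for seq in db]
            utils = [u for u in utils if u is not None]
            if not utils:
                continue  # pattern occurs in no sequence
            entry = (sum(utils), pattern)
            if len(heap) < k:
                heappush(heap, entry)
            elif entry > heap[0]:
                heapreplace(heap, entry)  # evict the current k-th best
    return sorted(heap, reverse=True)


# Hypothetical toy inputs: importance scores and three time-course sequences.
weight = {"G1": 3.0, "G2": 1.0, "G3": 2.0}
db = [
    [frozenset({("G1", 2), ("G2", -1)}), frozenset({("G3", 1)}), frozenset({("G1", 2)})],
    [frozenset({("G1", 2)}), frozenset({("G3", 1), ("G2", -1)})],
    [frozenset({("G2", -1)}), frozenset({("G1", 1)})],
]

top = top_k_patterns(db, weight, k=3, max_len=2)
for utility, pattern in top:
    print(utility, pattern)  # first line printed: 16.0 (('G1', 2), ('G3', 1))
```

On this toy data, the highest-utility pattern is G1 strongly up-regulated followed later by G3 up-regulated (utility 8 in each of two sequences). It outranks its own one-item prefix (G1, +2), which scores 12. Because extending a pattern can raise its utility, utility-based miners cannot reuse the Apriori-style pruning of frequency-based miners; exhaustive enumeration as above is only viable for tiny inputs.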