Predicting gene function with positive and unlabeled examples
Yiming Chen, Zhoujun Li, Xiaohua Hu, Hongxiang Diao, Junwan Liu
2009 IEEE International Conference on Granular Computing, 2009-09-22
DOI: 10.1109/GRC.2009.5255161 (https://doi.org/10.1109/GRC.2009.5255161)
Abstract
Predicting gene function is usually formulated as a binary classification problem. However, we only know which functions a gene has; we cannot be sure that a gene does not belong to a given function class, which means that only positive examples are available. Selecting a good set of training examples therefore becomes a key step. In this paper, we cluster the genes on an integrated weighted graph by generalizing the clustering coefficient of unweighted graphs to weighted ones, and identify reliable negative samples based on the distance between a gene and the centroids of the positive clusters. The tri-training algorithm is then used to learn three classifiers from labeled and unlabeled examples, and gene function is predicted by combining the three prediction results. The experimental results show that our approach outperforms several classic prediction methods.
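The abstract does not give implementation details, so the following is only a minimal sketch of two of the steps it names: choosing reliable negatives by distance to the centroids of positive clusters, and combining three classifiers' predictions. It assumes that genes are represented as plain feature vectors, that distance is Euclidean, that cluster assignments for the positive genes are already available, and that the three predictions are combined by majority vote. The paper itself clusters on an integrated weighted graph, which is not reproduced here. The function names `reliable_negatives` and `majority_vote` and the parameter `n_select` are hypothetical.

```python
import numpy as np


def reliable_negatives(X_unlabeled, X_pos, pos_cluster_ids, n_select):
    """Return indices of the unlabeled genes farthest from all positive clusters.

    Assumption: rows are feature vectors and distance is Euclidean.
    """
    # One centroid per positive cluster.
    centroids = np.stack([
        X_pos[pos_cluster_ids == c].mean(axis=0)
        for c in np.unique(pos_cluster_ids)
    ])
    # Distance from each unlabeled gene to its nearest positive centroid.
    diffs = X_unlabeled[:, None, :] - centroids[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    # Genes far from every positive cluster are treated as reliable negatives.
    order = np.argsort(-nearest, kind="stable")
    return order[:n_select]


def majority_vote(pred_a, pred_b, pred_c):
    """Combine three binary (0/1) prediction arrays by majority vote.

    Assumption: the combination rule is a simple majority.
    """
    votes = np.asarray(pred_a) + np.asarray(pred_b) + np.asarray(pred_c)
    return (votes >= 2).astype(int)


if __name__ == "__main__":
    # Toy data: two positive clusters and four unlabeled genes.
    X_pos = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    cluster_ids = np.array([0, 0, 1, 1])
    X_unl = np.array([[0.0, 0.5], [5.0, 5.0], [10.0, 10.0], [20.0, 20.0]])
    print(reliable_negatives(X_unl, X_pos, cluster_ids, n_select=2))
    print(majority_vote([1, 0, 1], [1, 0, 0], [0, 1, 1]))
```

In a tri-training setup, the selected negatives and the known positives would form the initial labeled set, and the three classifiers trained on it would be the ones whose outputs are passed to the voting step.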