Mining Biomedical Ontologies and Data Using RDF Hypergraphs
Haishan Liu, D. Dou, R. Jin, P. LePendu, N. Shah
2013 12th International Conference on Machine Learning and Applications, published 2013-12-04
DOI: 10.1109/ICMLA.2013.31 (https://doi.org/10.1109/ICMLA.2013.31)
Citations: 26
Abstract
As researchers analyze huge amounts of data annotated by large biomedical ontologies, one of the major challenges for data mining and machine learning is to leverage ontologies and data together in a systematic and scalable way. In this paper, we address two related problems in mining biomedical ontologies and data: i) how to discover semantic associations with the help of formal ontologies, and ii) how to identify potential errors in the ontologies with the help of data. By representing both ontologies and data as RDF hypergraphs, and subsequently transforming the hypergraphs to their corresponding bipartite forms, we provide a generalized data mining method that scales beyond what existing ontology-based approaches can provide. By performing evaluations on a real-world electronic health dataset and NCBO ontologies, we show that the proposed method is capable of capturing semantic associations while seamlessly incorporating the domain knowledge in ontologies. We also show that our data mining methods can discover and suggest corrections for misinformation in biomedical ontologies.
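To make the hypergraph-to-bipartite step concrete, here is a minimal sketch of one common construction, which may differ from the paper's exact transformation. It assumes each RDF triple is treated as a hyperedge over its subject, predicate, and object, and that the bipartite form introduces one statement node per triple, linked to the three terms it mentions. The function name `triples_to_bipartite`, the `ex:` identifiers, and the sample triples are hypothetical and chosen only for illustration.

```python
def triples_to_bipartite(triples):
    """Convert (subject, predicate, object) triples into a bipartite graph.

    One side holds a statement node per triple (the hyperedge), the other
    side holds the distinct RDF terms. Each edge records the role the term
    plays in the statement.
    """
    statement_nodes = []
    value_nodes = set()
    edges = []  # (statement_id, term, role)
    for i, (s, p, o) in enumerate(triples):
        statement_id = f"t{i}"  # hypothetical naming scheme for statement nodes
        statement_nodes.append(statement_id)
        for role, term in (("subject", s), ("predicate", p), ("object", o)):
            value_nodes.add(term)
            edges.append((statement_id, term, role))
    return statement_nodes, value_nodes, edges


# Illustrative data only: one instance-level annotation and one ontology axiom.
triples = [
    ("ex:patient1", "ex:hasDiagnosis", "ex:Diabetes"),
    ("ex:Diabetes", "rdfs:subClassOf", "ex:MetabolicDisease"),
]
statements, values, edges = triples_to_bipartite(triples)
```

Because the data triple and the ontology triple share the term `ex:Diabetes`, both statement nodes connect to the same value node, which is how a single graph can carry ontology knowledge and instance data together in this sketch.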