{"title":"学习具有多个潜在变量的贝叶斯网络,实现隐式关系表征","authors":"Xinran Wu, Kun Yue, Liang Duan, Xiaodong Fu","doi":"10.1007/s10618-024-01012-3","DOIUrl":null,"url":null,"abstract":"<p>Artificial intelligence applications could be more powerful and comprehensive by incorporating the ability of inference, which could be achieved by probabilistic inference over implicit relations. It is significant yet challenging to represent implicit relations among observed variables and latent ones like disease etiologies and user preferences. In this paper, we propose the BN with multiple latent variables (MLBN) as the framework for representing the dependence relations, where multiple latent variables are incorporated to describe multi-dimensional abstract concepts. However, the efficiency of MLBN learning and effectiveness of MLBN based applications are still nontrivial due to the presence of multiple latent variables. To this end, we first propose the constraint induced and Spark based algorithm for MLBN learning, as well as several optimization strategies. Moreover, we present the concept of variation degree and further design a subgraph based algorithm for incremental learning of MLBN. Experimental results suggest that our proposed MLBN model could represent the dependence relations correctly. Our proposed method outperforms some state-of-the-art competitors for personalized recommendation, and facilitates some typical approaches to achieve better performance.</p>","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"94 24 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning a Bayesian network with multiple latent variables for implicit relation representation\",\"authors\":\"Xinran Wu, Kun Yue, Liang Duan, Xiaodong Fu\",\"doi\":\"10.1007/s10618-024-01012-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Artificial intelligence applications could be more powerful and comprehensive by incorporating the ability of inference, which could be achieved by probabilistic inference over implicit relations. It is significant yet challenging to represent implicit relations among observed variables and latent ones like disease etiologies and user preferences. In this paper, we propose the BN with multiple latent variables (MLBN) as the framework for representing the dependence relations, where multiple latent variables are incorporated to describe multi-dimensional abstract concepts. However, the efficiency of MLBN learning and effectiveness of MLBN based applications are still nontrivial due to the presence of multiple latent variables. To this end, we first propose the constraint induced and Spark based algorithm for MLBN learning, as well as several optimization strategies. Moreover, we present the concept of variation degree and further design a subgraph based algorithm for incremental learning of MLBN. Experimental results suggest that our proposed MLBN model could represent the dependence relations correctly. Our proposed method outperforms some state-of-the-art competitors for personalized recommendation, and facilitates some typical approaches to achieve better performance.</p>\",\"PeriodicalId\":55183,\"journal\":{\"name\":\"Data Mining and Knowledge Discovery\",\"volume\":\"94 24 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Mining and Knowledge Discovery\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10618-024-01012-3\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-024-01012-3","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Learning a Bayesian network with multiple latent variables for implicit relation representation
Artificial intelligence applications could be more powerful and comprehensive by incorporating the ability of inference, which could be achieved by probabilistic inference over implicit relations. It is significant yet challenging to represent implicit relations among observed variables and latent ones like disease etiologies and user preferences. In this paper, we propose the BN with multiple latent variables (MLBN) as the framework for representing the dependence relations, where multiple latent variables are incorporated to describe multi-dimensional abstract concepts. However, the efficiency of MLBN learning and effectiveness of MLBN based applications are still nontrivial due to the presence of multiple latent variables. To this end, we first propose the constraint induced and Spark based algorithm for MLBN learning, as well as several optimization strategies. Moreover, we present the concept of variation degree and further design a subgraph based algorithm for incremental learning of MLBN. Experimental results suggest that our proposed MLBN model could represent the dependence relations correctly. Our proposed method outperforms some state-of-the-art competitors for personalized recommendation, and facilitates some typical approaches to achieve better performance.
期刊介绍:
Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.