{"title":"变量排序对贝叶斯网络结构学习的影响","authors":"Neville K. Kitson, Anthony C. Constantinou","doi":"10.1007/s10618-024-01044-9","DOIUrl":null,"url":null,"abstract":"<p>Causal Bayesian Networks (CBNs) provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions, but the effect of the order in which the variables are read from data is rarely quantified. We show that many commonly-used algorithms, both established and state-of-the-art, are more sensitive to variable ordering than these other factors when learning CBNs from discrete variables. This effect is strongest in hill-climbing and its variants where we explain how it arises, but extends to hybrid, and to a lesser-extent, constraint-based algorithms. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and raises questions about the validity of both many older and more recent results produced by these algorithms in practical applications and their rankings in performance evaluations.</p>","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"44 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The impact of variable ordering on Bayesian network structure learning\",\"authors\":\"Neville K. Kitson, Anthony C. Constantinou\",\"doi\":\"10.1007/s10618-024-01044-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Causal Bayesian Networks (CBNs) provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions, but the effect of the order in which the variables are read from data is rarely quantified. We show that many commonly-used algorithms, both established and state-of-the-art, are more sensitive to variable ordering than these other factors when learning CBNs from discrete variables. This effect is strongest in hill-climbing and its variants where we explain how it arises, but extends to hybrid, and to a lesser-extent, constraint-based algorithms. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and raises questions about the validity of both many older and more recent results produced by these algorithms in practical applications and their rankings in performance evaluations.</p>\",\"PeriodicalId\":55183,\"journal\":{\"name\":\"Data Mining and Knowledge Discovery\",\"volume\":\"44 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Mining and Knowledge Discovery\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10618-024-01044-9\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-024-01044-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
The impact of variable ordering on Bayesian network structure learning
Causal Bayesian Networks (CBNs) provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions, but the effect of the order in which the variables are read from data is rarely quantified. We show that many commonly-used algorithms, both established and state-of-the-art, are more sensitive to variable ordering than these other factors when learning CBNs from discrete variables. This effect is strongest in hill-climbing and its variants where we explain how it arises, but extends to hybrid, and to a lesser-extent, constraint-based algorithms. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and raises questions about the validity of both many older and more recent results produced by these algorithms in practical applications and their rankings in performance evaluations.
期刊介绍:
Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.