A Method for Scalable First-Order Rule Learning on Twitter Data
Monica Senapati, L. Njilla, P. Rao
2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), published 2019-04-01
DOI: 10.1109/ICDEW.2019.000-1 (https://doi.org/10.1109/ICDEW.2019.000-1)
Citations: 4
Abstract
We propose a method for scalable first-order rule learning on large-scale Twitter data. Once rules are learned, probabilistic inference queries can be executed to reason over the data and ascertain its veracity. To overcome slow structure learning on data of this scale, our method combines a divide-and-conquer approach, graph-based modeling, and data-parallel processing on a commodity cluster during rule learning. The first-order predicates (constructed on the posts) are first partitioned in a balanced way by pivoting around users, which reduces the chance of missing relevant rules. Balanced partitions of the ground predicates are created by constructing a weighted graph and applying graph partitioning. Each partition is then processed with an existing structure learning approach to obtain the set of rules for that partition. We report a preliminary evaluation of our method indicating that it offers a promising solution for scalable first-order rule learning on Twitter data.
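The partitioning step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how edges are weighted or which graph partitioner is used, so everything specific here is an assumption made for illustration: ground predicates are modeled as hypothetical `(name, user, argument)` tuples, the edge weight between two users is the number of non-user arguments they share, and a simple greedy heuristic stands in for a real graph partitioning tool. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def group_by_user(ground_predicates):
    """Pivot around users: collect each ground predicate under its user."""
    groups = defaultdict(list)
    for name, user, arg in ground_predicates:
        groups[user].append((name, user, arg))
    return dict(groups)


def build_user_graph(groups):
    """Assumed weighting: number of non-user arguments two users share."""
    args = {u: {p[2] for p in preds} for u, preds in groups.items()}
    weights = {}
    for u, v in combinations(sorted(groups), 2):
        shared = len(args[u] & args[v])
        if shared:
            weights[(u, v)] = shared
    return weights


def greedy_balanced_partition(groups, weights, k):
    """Greedy stand-in for a graph partitioner.

    Users are placed largest first into the partition they are most
    strongly connected to, among partitions that still have room.
    """
    total = sum(len(preds) for preds in groups.values())
    capacity = -(-total // k)  # ceiling of total / k
    parts = [set() for _ in range(k)]
    loads = [0] * k
    for user in sorted(groups, key=lambda u: (-len(groups[u]), u)):
        size = len(groups[user])

        def affinity(i):
            # Total edge weight between this user and partition i.
            return sum(
                weights.get(tuple(sorted((user, other))), 0)
                for other in parts[i]
            )

        # Fall back to all partitions if none has room left.
        candidates = [i for i in range(k) if loads[i] + size <= capacity]
        candidates = candidates or list(range(k))
        best = max(candidates, key=lambda i: (affinity(i), -loads[i], -i))
        parts[best].add(user)
        loads[best] += size
    return parts, loads


# Hypothetical ground predicates; names and constants are made up.
predicates = [
    ("tweeted", "alice", "t1"), ("containsHashtag", "alice", "#a"),
    ("tweeted", "bob", "t2"), ("containsHashtag", "bob", "#a"),
    ("tweeted", "carol", "t3"), ("containsHashtag", "carol", "#b"),
    ("tweeted", "dave", "t4"), ("containsHashtag", "dave", "#b"),
]
groups = group_by_user(predicates)
weights = build_user_graph(groups)
parts, loads = greedy_balanced_partition(groups, weights, k=2)
print(parts, loads)
```

In this toy input, users who share a hashtag end up in the same partition and both partitions hold the same number of ground predicates. Each partition's predicates could then be handed to a separate worker running an existing structure learner, which is the data-parallel step the abstract describes.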