Pub Date: 2025-01-01; Epub Date: 2025-10-24; DOI: 10.1007/s13278-025-01502-2
Abhishek Dalvi, Neil Ashtekar, Vasant G Honavar
We address the problem of estimating causal effects from observational data in the presence of network confounding, a setting where both treatment assignment and observed outcomes of individuals may be influenced by their neighbors within a network structure, resulting in network interference. Traditional causal inference methods often fail to account for these dependencies, leading to biased estimates. To tackle this challenge, we introduce a novel matching-based approach that utilizes principles from hyperdimensional computing to effectively encode and incorporate structural network information. This enables more accurate identification of comparable individuals, thereby improving the reliability of causal effect estimates. Through extensive empirical evaluation on multiple benchmark datasets, we demonstrate that our method either outperforms or performs on par with existing state-of-the-art approaches, including several recent deep learning-based models that are significantly more computationally intensive. In addition to its strong empirical performance, our method offers substantial practical advantages, achieving nearly an order-of-magnitude reduction in runtime without compromising accuracy, making it particularly well-suited for large-scale or time-sensitive applications.
{"title":"C-HDNet: A Fast Hyperdimensional Computing Based Method for Causal Effect Estimation from Networked Observational Data.","authors":"Abhishek Dalvi, Neil Ashtekar, Vasant G Honavar","doi":"10.1007/s13278-025-01502-2","DOIUrl":"10.1007/s13278-025-01502-2","url":null,"PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"15 1","pages":"97"},"PeriodicalIF":2.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12552378/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145378738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
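To make the idea in the abstract above concrete, here is a minimal sketch of matching on hyperdimensional encodings of a node and its neighbours. It is not the C-HDNet algorithm: the dimensionality `DIM`, the bipolar encoding, the sign-threshold bundling, the one-hop neighbourhood, and every function and variable name are assumptions chosen for illustration, and the four-node network is made-up toy data.

```python
import random

DIM = 2048  # hypervector dimensionality (illustrative assumption)


def random_hv(rng, dim=DIM):
    """Draw a random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bundle(vectors):
    """Bundle hypervectors: element-wise sum, then sign (ties go to +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Normalised dot product, in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def encode_nodes(features, neighbors, rng):
    """Encode each node as the bundle of its own and its neighbours' feature codes."""
    item = {}  # one random hypervector per distinct feature value

    def hv(value):
        if value not in item:
            item[value] = random_hv(rng)
        return item[value]

    own = {n: bundle([hv(f) for f in fs]) for n, fs in features.items()}
    return {n: bundle([own[n]] + [own[m] for m in neighbors[n]]) for n in features}


def matched_effect(codes, treated, outcome):
    """Mean outcome gap between each treated node and its most similar control."""
    controls = [n for n in codes if n not in treated]
    gaps = []
    for t in treated:
        best = max(controls, key=lambda c: similarity(codes[t], codes[c]))
        gaps.append(outcome[t] - outcome[best])
    return sum(gaps) / len(gaps)


# Toy data (hypothetical): two pairs of look-alike nodes, one treated node per pair.
features = {"a": ("young", "urban"), "b": ("young", "urban"),
            "c": ("old", "rural"), "d": ("old", "rural")}
neighbors = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
outcome = {"a": 3, "b": 1, "c": 5, "d": 4}
treated = {"a", "c"}

codes = encode_nodes(features, neighbors, random.Random(0))
effect = matched_effect(codes, treated, outcome)
print(effect)
```

In this toy network, node `a` is matched to `b` and node `c` to `d`, because each pair shares identical features and neighbourhoods and therefore identical encodings. A real application would need richer encodings, multi-hop neighbourhoods, and a principled estimator.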
Pub Date: 2025-01-01; Epub Date: 2025-07-17; DOI: 10.1007/s13278-025-01485-0
Elly Hung, Anastasia Mantziou, Gesine Reinert
Multivariate count time series arise in a wide range of applications, including the number of COVID-19 cases recorded each week in different counties of the Republic of Ireland. In this example, it is natural to view the counties as nodes in a network, with edges between counties reflecting proximity. One could then model disease spread on a network through a regression model. Gaussian errors are often assumed for such a model, but for count data this assumption may not be natural. With this motivating example in mind, we develop a model with the following features. We assume that the time series occur on the nodes of a known underlying network where the edges dictate the form of a structural vector autoregression model. In contrast to using a full vector autoregressive model, the network assumption is a means of imposing sparsity. Moreover, we aim for a model that is able to accommodate heterogeneous node dynamics, and to cluster nodes that exhibit similar behaviour. To address these aims, we propose a new Bayesian Poisson network autoregression mixture model that we call a PNARM model, which combines ideas from Poisson network autoregression models, grouped network autoregression models, and non-uniform co-clustering priors.
{"title":"A Bayesian mixture model for Poisson network autoregression.","authors":"Elly Hung, Anastasia Mantziou, Gesine Reinert","doi":"10.1007/s13278-025-01485-0","DOIUrl":"10.1007/s13278-025-01485-0","url":null,"PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"15 1","pages":"70"},"PeriodicalIF":2.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12271270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144675694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
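As a rough illustration of the model class described in the abstract above, the sketch below simulates a generic first-order linear Poisson network autoregression with group-specific coefficients. It is not the PNARM model: there is no Bayesian inference, the group labels are fixed rather than learned, and the coefficient values, the three-node line network, and all names are assumptions made up for the example. Each node's conditional mean is taken to be `b0 + b1 * (mean of neighbours' previous counts) + b2 * (own previous count)`.

```python
import math
import random


def poisson(rng, lam):
    """Sample a Poisson count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def intensities(prev, neighbors, group, coef):
    """Conditional Poisson means given the previous period's counts."""
    lam = {}
    for i, y_prev in prev.items():
        nbrs = neighbors[i]
        nbr_mean = sum(prev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        b0, b1, b2 = coef[group[i]]  # coefficients shared within a group
        lam[i] = b0 + b1 * nbr_mean + b2 * y_prev
    return lam


def simulate(y0, neighbors, group, coef, steps, rng):
    """Simulate `steps` periods; returns a list of {node: count} dicts."""
    series = [dict(y0)]
    for _ in range(steps):
        lam = intensities(series[-1], neighbors, group, coef)
        series.append({i: poisson(rng, lam[i]) for i in lam})
    return series


# Hypothetical three-node line network A - B - C with two behaviour groups.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
group = {"A": 0, "B": 0, "C": 1}
coef = {0: (1.0, 0.3, 0.4), 1: (0.5, 0.2, 0.2)}  # (b0, b1, b2) per group
y0 = {"A": 2, "B": 4, "C": 0}

lam0 = intensities(y0, neighbors, group, coef)
series = simulate(y0, neighbors, group, coef, steps=5, rng=random.Random(1))
print(lam0)
print(series[-1])
```

Only edges of the network enter each node's mean, which is the sparsity the abstract contrasts with a full vector autoregression. Sharing coefficients within a group is one simple way to express heterogeneous but clustered node dynamics.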
Pub Date: 2024-01-13; DOI: 10.1007/s13278-023-01191-9
M. Kasri, Anas El-Ansari, Mohamed El Fissaoui, Badreddine Cherkaoui, Marouane Birjali, A. Beni-Hssane
{"title":"Correction: Public sentiment toward renewable energy in Morocco: opinion mining using a rule-based approach","authors":"M. Kasri, Anas El-Ansari, Mohamed El Fissaoui, Badreddine Cherkaoui, Marouane Birjali, A. Beni-Hssane","doi":"10.1007/s13278-023-01191-9","DOIUrl":"https://doi.org/10.1007/s13278-023-01191-9","url":null,"abstract":"","PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"30 1","pages":"1"},"PeriodicalIF":2.8,"publicationDate":"2024-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-11; DOI: 10.1007/s13278-023-01184-8
Matheus Schmitz, Goran Muric, Daniel Hickey, Keith Burghardt
{"title":"Do users adopt extremist beliefs from exposure to hate subreddits?","authors":"Matheus Schmitz, Goran Muric, Daniel Hickey, Keith Burghardt","doi":"10.1007/s13278-023-01184-8","DOIUrl":"https://doi.org/10.1007/s13278-023-01184-8","url":null,"abstract":"","PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"2 5","pages":"1-12"},"PeriodicalIF":2.8,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to create a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. Then, we compare these methods according to their tweet- and user-level metrics as well as their accuracy in estimating the US population. Our results show that, compared to the other three methods, users collected by the 1% Stream method tend to have more tweets, tweets per day, followers, and friends, fewer likes, and younger accounts, and include more male users. Moreover, the 1% Stream method achieves the minimum error in estimating the US population. However, it is time-consuming, cannot be used for past time frames, and is not suitable when user engagement is part of the study. In situations where these three drawbacks are important, our results support the Bounding Box method as the second-best method.
Supplementary information: The online version contains supplementary material available at 10.1007/s13278-024-01327-5.
{"title":"Comparing methods for creating a national random sample of twitter users.","authors":"Meysam Alizadeh, Darya Zare, Zeynab Samei, Mohammadamin Alizadeh, Mael Kubli, Mohammadhadi Aliahmadi, Sarvenaz Ebrahimi, Fabrizio Gilardi","doi":"10.1007/s13278-024-01327-5","DOIUrl":"10.1007/s13278-024-01327-5","url":null,"PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"14 1","pages":"160"},"PeriodicalIF":2.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11861139/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143524463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-18; DOI: 10.1007/s13278-023-01168-8
Manussawee Nokkaew, K. Nongpong, Tapanan Yeophantong, Pattravadee Ploykitikoon, W. Arjharn, A. Siritaratiwat, Sorawit Narkglom, W. Wongsinlatam, T. Remsungnen, A. Namvong, C. Surawanitkun
{"title":"Analyzing online public opinion on Thailand-China high-speed train and Laos-China railway mega-projects using advanced machine learning for sentiment analysis","authors":"Manussawee Nokkaew, K. Nongpong, Tapanan Yeophantong, Pattravadee Ploykitikoon, W. Arjharn, A. Siritaratiwat, Sorawit Narkglom, W. Wongsinlatam, T. Remsungnen, A. Namvong, C. Surawanitkun","doi":"10.1007/s13278-023-01168-8","DOIUrl":"https://doi.org/10.1007/s13278-023-01168-8","url":null,"abstract":"","PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":" 5","pages":""},"PeriodicalIF":2.8,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138994620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-16; DOI: 10.1007/s13278-023-01160-2
Apurva Sharma, Ajay Kumar Yadav, A. K. Rai
{"title":"A novel and precise approach for similarity-based link prediction in diverse networks","authors":"Apurva Sharma, Ajay Kumar Yadav, A. K. Rai","doi":"10.1007/s13278-023-01160-2","DOIUrl":"https://doi.org/10.1007/s13278-023-01160-2","url":null,"abstract":"","PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"20 3","pages":""},"PeriodicalIF":2.8,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138967188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-16; DOI: 10.1007/s13278-023-01172-y
M. Alfaqeeh, D. B. Skillicorn
{"title":"Community detection in social networks by spectral embedding of typed graphs","authors":"M. Alfaqeeh, D. B. Skillicorn","doi":"10.1007/s13278-023-01172-y","DOIUrl":"https://doi.org/10.1007/s13278-023-01172-y","url":null,"abstract":"","PeriodicalId":21842,"journal":{"name":"Social Network Analysis and Mining","volume":"49 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138966803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}