Pub Date: 2023-01-01. Epub Date: 2023-05-05. DOI: 10.1007/s41109-023-00547-6
Alberto Cottica, Veronica Davidov, Magdalena Góralska, Jan Kubik, Guy Melançon, Richard Mole, Bruno Pinaud, Wojciech Szymański
The use of data and algorithms in the social sciences allows for exciting progress, but also poses epistemological challenges. Operations that appear innocent and purely technical may profoundly influence final results. Researchers working with data can make their process less arbitrary and more accountable by making theoretically grounded methodological choices. We apply this approach to the problem of simplifying networks representing ethnographic corpora, in the interest of visual interpretation. Network nodes represent ethnographic codes, and their edges the co-occurrence of codes in a corpus. We introduce and discuss four techniques to simplify such networks and facilitate visual analysis. We show how the mathematical characteristics of each one are aligned with an identifiable approach in sociology or anthropology: structuralism and post-structuralism; identifying the central concepts in a discourse; and discovering hegemonic and counter-hegemonic clusters of meaning. We then provide an example of how the four techniques complement each other in ethnographic analysis.
{"title":"Operationalizing anthropological theory: four techniques to simplify networks of co-occurring ethnographic codes.","authors":"Alberto Cottica, Veronica Davidov, Magdalena Góralska, Jan Kubik, Guy Melançon, Richard Mole, Bruno Pinaud, Wojciech Szymański","doi":"10.1007/s41109-023-00547-6","DOIUrl":"10.1007/s41109-023-00547-6","url":null,"abstract":"<p><p>The use of data and algorithms in the social sciences allows for exciting progress, but also poses epistemological challenges. Operations that appear innocent and purely technical may profoundly influence final results. Researchers working with data can make their process less arbitrary and more accountable by making theoretically grounded methodological choices. We apply this approach to the problem of simplifying networks representing ethnographic corpora, in the interest of visual interpretation. Network nodes represent ethnographic codes, and their edges the co-occurrence of codes in a corpus. We introduce and discuss four techniques to simplify such networks and facilitate visual analysis. We show how the mathematical characteristics of each one are aligned with an identifiable approach in sociology or anthropology: structuralism and post-structuralism; identifying the central concepts in a discourse; and discovering hegemonic and counter-hegemonic clusters of meaning. We then provide an example of how the four techniques complement each other in ethnographic analysis.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"22"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10161994/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9857388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. Epub Date: 2023-08-21. DOI: 10.1007/s41109-023-00566-3
Gergely Ódor, Jana Vuckovic, Miguel-Angel Sanchez Ndoye, Patrick Thiran
Inferring the source of a diffusion in a large network of agents is a difficult but feasible task if a few agents act as sensors and reveal the time at which they were hit by the diffusion. One of the main limitations of current source identification algorithms is that they assume full knowledge of the contact network, which is rarely the case, especially for epidemics, where the source is called patient zero. Inspired by recent implementations of contact tracing algorithms, we propose a new framework, which we call the Source Identification via Contact Tracing Framework (SICTF). In the SICTF, the source identification task starts at the time of the first hospitalization, and initially we have no knowledge of the contact network other than the identity of the first hospitalized agent. We may then explore the network by contact queries and obtain symptom onset times by test queries in an adaptive way, i.e., both contact and test queries can depend on the outcomes of previous queries. We also assume that some of the agents may be asymptomatic and therefore cannot reveal their symptom onset time. Our goal is to find patient zero with as few contact and test queries as possible. We implement two local search algorithms for the SICTF: the LS algorithm, recently proposed by Waniek et al. in a similar framework, is more data-efficient but can fail to find the true source if many asymptomatic agents are present, whereas the LS+ algorithm is more robust to asymptomatic agents. Through simulations, we show that both LS and LS+ outperform previously proposed adaptive and non-adaptive source identification algorithms adapted to the SICTF, even though these baseline algorithms have full access to the contact network. Extending the theory of random exponential trees, we analytically approximate the source identification probability of the LS and LS+ algorithms and show that our analytic results match the simulations. Finally, we benchmark our algorithms on the Data-driven COVID-19 Simulator (DCS) developed by Lorch et al., marking the first time source identification algorithms have been tested on such a complex dataset.
{"title":"Source identification via contact tracing in the presence of asymptomatic patients.","authors":"Gergely Ódor, Jana Vuckovic, Miguel-Angel Sanchez Ndoye, Patrick Thiran","doi":"10.1007/s41109-023-00566-3","DOIUrl":"10.1007/s41109-023-00566-3","url":null,"abstract":"<p><p>Inferring the source of a diffusion in a large network of agents is a difficult but feasible task, if a few agents act as sensors revealing the time at which they got hit by the diffusion. One of the main limitations of current source identification algorithms is that they assume full knowledge of the contact network, which is rarely the case, especially for epidemics, where the source is called patient zero. Inspired by recent implementations of contact tracing algorithms, we propose a new framework, which we call Source Identification via Contact Tracing Framework (SICTF). In the SICTF, the source identification task starts at the time of the first hospitalization, and initially we have no knowledge about the contact network other than the identity of the first hospitalized agent. We may then explore the network by contact queries, and obtain symptom onset times by test queries in an adaptive way, i.e., both contact and test queries can depend on the outcome of previous queries. We also assume that some of the agents may be asymptomatic, and therefore cannot reveal their symptom onset time. Our goal is to find patient zero with as few contact and test queries as possible. We implement two local search algorithms for the SICTF: the LS algorithm, which has recently been proposed by Waniek et al. in a similar framework, is more data-efficient, but can fail to find the true source if many asymptomatic agents are present, whereas the LS+ algorithm is more robust to asymptomatic agents. By simulations we show that both LS and LS+ outperform previously proposed adaptive and non-adaptive source identification algorithms adapted to the SICTF, even though these baseline algorithms have full access to the contact network. Extending the theory of random exponential trees, we analytically approximate the source identification probability of the LS/ LS+ algorithms, and we show that our analytic results match the simulations. Finally, we benchmark our algorithms on the Data-driven COVID-19 Simulator (DCS) developed by Lorch et al., which is the first time source identification algorithms are tested on such a complex dataset.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"53"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10442312/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10442074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. Epub Date: 2023-07-25. DOI: 10.1007/s41109-023-00573-4
Stefania Ionescu, Anikó Hannák, Nicolò Pagan
Motivation: Social media platforms centered around content creators (CCs) have seen rapid growth over the past decade. Currently, millions of CCs earn livable incomes through platforms such as YouTube, TikTok, and Instagram. As in the job market, it is therefore important to ensure that the success and income of CCs (usually tied to their follower counts) reflect the quality of their work. Since quality cannot be observed directly, two other factors govern the network-formation process: (a) the visibility of CCs (resulting from, e.g., recommender systems and moderation processes) and (b) the decision-making process of seekers (i.e., users focused on finding CCs). Prior virtual experiments and empirical work seem contradictory regarding fairness: the former suggest that the expected number of followers of a CC reflects their quality, while the latter shows that quality does not perfectly predict success.
Results: Our paper extends prior models in order to bridge this gap between theoretical and empirical work. We (a) define a parameterized recommendation process that allocates visibility based on popularity biases, (b) define two metrics of individual fairness (ex-ante and ex-post), and (c) define a metric for seeker satisfaction. Through an analytical approach, we show that our process is an absorbing Markov chain in which exploring only the most popular CCs leads to lower expected times to absorption but higher chances of unfairness for CCs. While increasing exploration helps, doing so only guarantees fair outcomes for the highest- (and lowest-) quality CC. Simulations reveal that CCs and seekers prefer different algorithmic designs: CCs generally have higher chances of fairness under anti-popularity-biased recommendation processes, while seekers are more satisfied with popularity-biased recommendations. Altogether, our results suggest that although the exploration of low-popularity CCs is needed to improve fairness, platforms might not have the incentive to do so, and such interventions do not entirely prevent unfair outcomes.
{"title":"The role of luck in the success of social media influencers.","authors":"Stefania Ionescu, Anikó Hannák, Nicolò Pagan","doi":"10.1007/s41109-023-00573-4","DOIUrl":"10.1007/s41109-023-00573-4","url":null,"abstract":"<p><strong>Motivation: </strong>Social media platforms centered around content creators (CCs) faced rapid growth in the past decade. Currently, millions of CCs make livable incomes through platforms such as YouTube, TikTok, and Instagram. As such, similarly to the job market, it is important to ensure the success and income (usually related to the follower counts) of CCs reflect the quality of their work. Since quality cannot be observed directly, two other factors govern the network-formation process: (a) the <i>visibility</i> of CCs (resulted from, e.g., recommender systems and moderation processes) and (b) the <i>decision-making process</i> of seekers (i.e., of users focused on finding CCs). Prior virtual experiments and empirical work seem contradictory regarding fairness: While the first suggests the expected number of followers of CCs reflects their quality, the second says that quality does not perfectly predict success.</p><p><strong>Results: </strong>Our paper extends prior models in order to bridge this gap between theoretical and empirical work. We (a) define a parameterized recommendation process which allocates visibility based on popularity biases, (b) define two metrics of individual fairness (ex-ante and ex-post), and (c) define a metric for seeker satisfaction. Through an analytical approach we show our process is an absorbing Markov Chain where exploring only the most popular CCs leads to lower expected times to absorption but higher chances of unfairness for CCs. While increasing the exploration helps, doing so only guarantees fair outcomes for the highest (and lowest) quality CC. Simulations revealed that CCs and seekers prefer different algorithmic designs: CCs generally have higher chances of fairness with anti-popularity biased recommendation processes, while seekers are more satisfied with popularity-biased recommendations. Altogether, our results suggest that while the exploration of low-popularity CCs is needed to improve fairness, platforms might not have the incentive to do so and such interventions do not entirely prevent unfair outcomes.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"46"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10368581/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9887900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. Epub Date: 2023-05-11. DOI: 10.1007/s41109-023-00548-5
Carly A Bobak, Yifan Zhao, Joshua J Levy, A James O'Malley
Protecting medical privacy can create obstacles in the analysis and distribution of healthcare graphs and the statistical inferences accompanying them. We propose a graph simulation model that generates networks using degree and property augmentation, and we provide a flexible R package that allows users to create graphs that preserve vertex attribute relationships and approximately retain the topological properties observed in the original graph (e.g., community structure). We illustrate our proposed algorithm using a case study based on Zachary's karate network and a patient-sharing graph generated from Medicare claims data in 2019. In both cases, we find that community structure is preserved and that the normalized root mean square error between the cumulative degree distributions of the generated and original graphs is low (0.0508 and 0.0514, respectively).
{"title":"GRANDPA: GeneRAtive network sampling using degree and property augmentation applied to the analysis of partially confidential healthcare networks.","authors":"Carly A Bobak, Yifan Zhao, Joshua J Levy, A James O'Malley","doi":"10.1007/s41109-023-00548-5","DOIUrl":"10.1007/s41109-023-00548-5","url":null,"abstract":"<p><p>Protecting medical privacy can create obstacles in the analysis and distribution of healthcare graphs and statistical inferences accompanying them. We pose a graph simulation model which generates networks using degree and property augmentation and provide a flexible R package that allows users to create graphs that preserve vertex attribute relationships and approximating the retention of topological properties observed in the original graph (e.g., community structure). We illustrate our proposed algorithm using a case study based on Zachary's karate network and a patient-sharing graph generated from Medicare claims data in 2019. In both cases, we find that community structure is preserved, and normalized root mean square error between cumulative distributions of the degrees across the generated and the original graphs is low (0.0508 and 0.0514 respectively).</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"23"},"PeriodicalIF":1.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10173245/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10115610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. Epub Date: 2023-09-12. DOI: 10.1007/s41109-023-00588-x
Gorm Gruner Jensen, Martin Benedikt Busch, Marco Piovesan, Jan O Haerter
We investigate the development of cooperative behavior in networks over time. In our controlled laboratory experiment, subjects can cooperate by sending costly messages that contain valuable information for the receiver or other subjects in the network. Any message sent can increase the chance that subjects find the information they are looking for and consequently their profit. We find that cooperation emerges spontaneously and remains stable over time. In an additional treatment, we provide a non-binding suggestion about who to contact at the beginning of the experiment. We find that subjects partially follow our recommendation, and this increases their own and others' profit. Despite the removal of suggestions, subjects build long-lasting relationships with the suggested contacts.
Supplementary information: The online version contains supplementary material available at 10.1007/s41109-023-00588-x.
{"title":"Nudging cooperation among agents in an experimental social network.","authors":"Gorm Gruner Jensen, Martin Benedikt Busch, Marco Piovesan, Jan O Haerter","doi":"10.1007/s41109-023-00588-x","DOIUrl":"10.1007/s41109-023-00588-x","url":null,"abstract":"<p><p>We investigate the development of cooperative behavior in networks over time. In our controlled laboratory experiment, subjects can cooperate by sending costly messages that contain valuable information for the receiver or other subjects in the network. Any message sent can increase the chance that subjects find the information they are looking for and consequently their profit. We find that cooperation emerges spontaneously and remains stable over time. In an additional treatment, we provide a non-binding suggestion about who to contact at the beginning of the experiment. We find that subjects partially follow our recommendation, and this increases their own and others' profit. Despite the removal of suggestions, subjects build long-lasting relationships with the suggested contacts.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s41109-023-00588-x.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"62"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497665/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10269051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1007/s41109-023-00544-9
Sinan A Ozbay, Maximilian M Nguyen
We present a simple method to quantitatively capture the heterogeneity in the degree distribution of a network graph using a single parameter σ. Obtained through an exponential transformation of the shape parameter of the Weibull distribution, this control parameter allows the degree distribution to be easily interpolated between highly symmetric and highly heterogeneous distributions on the unit interval. This parameterization of heterogeneity also recovers several other canonical distributions as intermediate special cases, including the Gaussian, Rayleigh, and exponential distributions. We then outline a general graph generation algorithm to produce graphs with a desired amount of heterogeneity. The utility of this formulation of a heterogeneity parameter is demonstrated with examples relating to epidemiological modeling and spectral analysis.
{"title":"Parameterizing network graph heterogeneity using a modified Weibull distribution.","authors":"Sinan A Ozbay, Maximilian M Nguyen","doi":"10.1007/s41109-023-00544-9","DOIUrl":"https://doi.org/10.1007/s41109-023-00544-9","url":null,"abstract":"<p><p>We present a simple method to quantitatively capture the heterogeneity in the degree distribution of a network graph using a single parameter <math><mi>σ</mi></math> . Using an exponential transformation of the shape parameter of the Weibull distribution, this control parameter allows the degree distribution to be easily interpolated between highly symmetric and highly heterogeneous distributions on the unit interval. This parameterization of heterogeneity also recovers several other canonical distributions as intermediate special cases, including the Gaussian, Rayleigh, and exponential distributions. We then outline a general graph generation algorithm to produce graphs with a desired amount of heterogeneity. The utility of this formulation of a heterogeneity parameter is demonstrated with examples relating to epidemiological modeling and spectral analysis.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"20"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10144902/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9784643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1007/s41109-023-00540-z
Shiv G Yücel, Rafael H M Pereira, Pedro S Peixoto, Chico Q Camargo
The COVID-19 pandemic has shed light on how the worldwide spread of infectious diseases is shaped by both human mobility networks and socio-economic factors. However, few studies look at how socio-economic conditions and the complex network properties of human mobility patterns interact, and how they jointly influence outbreaks. We introduce a novel methodology, called the Infection Delay Model, to calculate how the arrival time of an infection varies geographically, considering both effective distance-based metrics and differences in regions' capacity to isolate, a feature associated with socio-economic inequalities. To illustrate an application of the Infection Delay Model, this paper integrates household travel survey data with cell phone mobility data from the São Paulo metropolitan region to assess the effectiveness of lockdowns in slowing the spread of COVID-19. Rather than operating under the assumption that the next pandemic will begin in the same region as the last, the model estimates infection delays under every possible outbreak scenario, allowing for generalizable insights into the effectiveness of interventions to delay a region's first case. The model sheds light on how the effectiveness of lockdowns in slowing the spread of disease is influenced by the interaction of mobility networks and socio-economic levels. We find that a negative relationship emerges between network centrality and the infection delay after a lockdown, irrespective of income. Furthermore, for regions across all income and centrality levels, outbreaks starting in less central locations were more effectively slowed by a lockdown. Using the Infection Delay Model, this paper identifies and quantifies a new dimension of disease risk faced by those most central in a mobility network.
{"title":"Impact of network centrality and income on slowing infection spread after outbreaks.","authors":"Shiv G Yücel, Rafael H M Pereira, Pedro S Peixoto, Chico Q Camargo","doi":"10.1007/s41109-023-00540-z","DOIUrl":"https://doi.org/10.1007/s41109-023-00540-z","url":null,"abstract":"<p><p>The COVID-19 pandemic has shed light on how the spread of infectious diseases worldwide are importantly shaped by both human mobility networks and socio-economic factors. However, few studies look at how both socio-economic conditions and the complex network properties of human mobility patterns interact, and how they influence outbreaks together. We introduce a novel methodology, called the Infection Delay Model, to calculate how the arrival time of an infection varies geographically, considering both effective distance-based metrics and differences in regions' capacity to isolate-a feature associated with socio-economic inequalities. To illustrate an application of the Infection Delay Model, this paper integrates household travel survey data with cell phone mobility data from the São Paulo metropolitan region to assess the effectiveness of lockdowns to slow the spread of COVID-19. Rather than operating under the assumption that the next pandemic will begin in the same region as the last, the model estimates infection delays under every possible outbreak scenario, allowing for generalizable insights into the effectiveness of interventions to delay a region's first case. The model sheds light on how the effectiveness of lockdowns to slow the spread of disease is influenced by the interaction of mobility networks and socio-economic levels. We find that a negative relationship emerges between network centrality and the infection delay after a lockdown, irrespective of income. Furthermore, for regions across all income and centrality levels, outbreaks starting in less central locations were more effectively slowed by a lockdown. Using the Infection Delay Model, this paper identifies and quantifies a new dimension of disease risk faced by those most central in a mobility network.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"16"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9951146/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10872656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. Epub Date: 2023-02-16. DOI: 10.1007/s41109-023-00534-x
Ferenc Béres, Tamás Vilmos Michaletzky, Rita Csoma, András A Benczúr
We investigate automatic methods to assess COVID vaccination views in Twitter content. Vaccine skepticism has long been a controversial topic, and it has become more important than ever with the COVID-19 pandemic. Our main goal is to demonstrate the importance of network effects in detecting vaccine-skeptic content. Towards this end, we collected and manually labeled vaccination-related Twitter content from the first half of 2021. Our experiments confirm that the network carries information that can be exploited to improve the accuracy of classifying attitudes towards vaccination over a content-only classification baseline. We evaluate a variety of network embedding algorithms, which we combine with text embedding to obtain classifiers for vaccine-skeptic content. In our experiments, using Walklets improves the AUC over the best classifier that uses no network information. We publicly release our labels, Tweet IDs, and source code on GitHub.
{"title":"Network embedding aided vaccine skepticism detection.","authors":"Ferenc Béres, Tamás Vilmos Michaletzky, Rita Csoma, András A Benczúr","doi":"10.1007/s41109-023-00534-x","DOIUrl":"10.1007/s41109-023-00534-x","url":null,"abstract":"<p><p>We investigate automatic methods to assess COVID vaccination views in Twitter content. Vaccine skepticism has been a controversial topic of long history that has become more important than ever with the COVID-19 pandemic. Our main goal is to demonstrate the importance of network effects in detecting vaccination skeptic content. Towards this end, we collected and manually labeled vaccination-related Twitter content in the first half of 2021. Our experiments confirm that the network carries information that can be exploited to improve the accuracy of classifying attitudes towards vaccination over content classification as baseline. We evaluate a variety of network embedding algorithms, which we combine with text embedding to obtain classifiers for vaccination skeptic content. In our experiments, by using Walklets, we improve the AUC of the best classifier with no network information by. We publicly release our labels, Tweet IDs and source codes on GitHub.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"11"},"PeriodicalIF":1.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9933796/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10765431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1007/s41109-023-00538-7
Zachary M Boyd, Nick Callor, Taylor Gledhill, Abigail Jenkins, Robert Snellman, Benjamin Webb, Raelynn Wonnacott
Genealogical networks (i.e., family trees) are of growing interest, with the largest known data sets now including well over one billion individuals. Interest in family history also supports an 8.5-billion-dollar industry whose size is projected to double within 7 years [FutureWise report HC-1137]. Yet little mathematical attention has been paid to the complex network properties of genealogical networks, especially at large scales. The structure of genealogical networks is of particular interest due to the practice of forming unions, e.g., marriages, that are typically well outside one's immediate family. In most other networks, including other social networks, no equivalent restriction exists on the distance at which relationships form. To study the effect this has on genealogical networks, we use persistent homology to identify and compare the structure of 101 genealogical and 31 other social networks. Specifically, we introduce the notion of a network's persistence curve, which encodes the network's set of persistence intervals. We find that the persistence curves of genealogical networks have a distinct structure when compared to other social networks. This difference in structure also extends to subnetworks of genealogical and social networks, suggesting that, even with incomplete data, persistent homology can be used to meaningfully analyze genealogical networks. Here we also describe how concepts from genealogical networks, such as common ancestor cycles, are represented using persistent homology. We expect that persistent homology tools will become increasingly important in genealogical exploration as popular interest in ancestry research continues to expand.
{"title":"The persistent homology of genealogical networks.","authors":"Zachary M Boyd, Nick Callor, Taylor Gledhill, Abigail Jenkins, Robert Snellman, Benjamin Webb, Raelynn Wonnacott","doi":"10.1007/s41109-023-00538-7","DOIUrl":"https://doi.org/10.1007/s41109-023-00538-7","url":null,"abstract":"<p><p>Genealogical networks (i.e. family trees) are of growing interest, with the largest known data sets now including well over one billion individuals. Interest in family history also supports an 8.5 billion dollar industry whose size is projected to double within 7 years [FutureWise report HC-1137]. Yet little mathematical attention has been paid to the complex network properties of genealogical networks, especially at large scales. The structure of genealogical networks is of particular interest due to the practice of forming unions, e.g. marriages, that are typically well outside one's immediate family. In most other networks, including other social networks, no equivalent restriction exists on the distance at which relationships form. To study the effect this has on genealogical networks we use persistent homology to identify and compare the structure of 101 genealogical and 31 other social networks. Specifically, we introduce the notion of a network's persistence curve, which encodes the network's set of persistence intervals. We find that the persistence curves of genealogical networks have a distinct structure when compared to other social networks. This difference in structure also extends to subnetworks of genealogical and social networks suggesting that, even with incomplete data, persistent homology can be used to meaningfully analyze genealogical networks. Here we also describe how concepts from genealogical networks, such as common ancestor cycles, are represented using persistent homology. We expect that persistent homology tools will become increasingly important in genealogical exploration as popular interest in ancestry research continues to expand.</p>","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"8 1","pages":"15"},"PeriodicalIF":2.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9353129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-15. DOI: 10.1007/s41109-022-00523-6
Marcos S. Lyra, B. Damásio, Flávio L. Pinheiro, F. Bação
{"title":"Fraud, corruption, and collusion in public procurement activities, a systematic literature review on data-driven methods","authors":"Marcos S. Lyra, B. Damásio, Flávio L. Pinheiro, F. Bação","doi":"10.1007/s41109-022-00523-6","DOIUrl":"https://doi.org/10.1007/s41109-022-00523-6","url":null,"abstract":"","PeriodicalId":37010,"journal":{"name":"Applied Network Science","volume":"7 1","pages":"1-30"},"PeriodicalIF":2.2,"publicationDate":"2022-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45918762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}