Justine Zhang, William L Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec
A community's identity defines and shapes its internal dynamics. Our current understanding of this interplay is mostly limited to glimpses gathered from isolated studies of individual communities. In this work we provide a systematic exploration of the nature of this relation across a wide variety of online communities. To this end we introduce a quantitative, language-based typology reflecting two key aspects of a community's identity: how distinctive, and how temporally dynamic it is. By mapping almost 300 Reddit communities into the landscape induced by this typology, we reveal regularities in how patterns of user engagement vary with the characteristics of a community. Our results suggest that the way new and existing users engage with a community depends strongly and systematically on the nature of the collective identity it fosters, in ways that are highly consequential to community maintainers. For example, communities with distinctive and highly dynamic identities are more likely to retain their users. However, such niche communities also exhibit much larger acculturation gaps between existing users and newcomers, which potentially hinder the integration of the latter. More generally, our methodology reveals differences in how various social phenomena manifest across communities, and shows that structuring the multi-community landscape can lead to a better understanding of the systematic nature of this diversity.
{"title":"Community Identity and User Engagement in a Multi-Community Landscape.","authors":"Justine Zhang, William L Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"2017","pages":"377-386"},"publicationDate":"2017-05-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774974/pdf/nihms933918.pdf"}
Pub Date: 2017-03-09 DOI: 10.1609/icwsm.v11i1.14972
William L. Hamilton, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, J. Leskovec
Loyalty is an essential component of multi-community engagement. When users have the choice to engage with a variety of different communities, they often become loyal to just one, focusing on that community at the expense of others. However, it is unclear how loyalty is manifested in user behavior, or whether certain community characteristics encourage loyalty. In this paper we operationalize loyalty as a user-community relation: users loyal to a community consistently prefer it over all others; loyal communities retain their loyal users over time. By exploring a large set of Reddit communities, we reveal that loyalty is manifested in remarkably consistent behaviors. Loyal users employ language that signals collective identity and engage with more esoteric, less popular content, indicating that they may play a curational role in surfacing new material. Loyal communities have denser user-user interaction networks and lower rates of triadic closure, suggesting that community-level loyalty is associated with more cohesive interactions and less fragmentation into subgroups. We exploit these general patterns to predict future rates of loyalty. Our results show that a user's propensity to become loyal is apparent from their initial interactions with a community, suggesting that some users are intrinsically loyal from the very beginning.
{"title":"Loyalty in Online Communities","authors":"William L. Hamilton, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, J. Leskovec","doi":"10.1609/icwsm.v11i1.14972","DOIUrl":"https://doi.org/10.1609/icwsm.v11i1.14972","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"27 1","pages":"540-543"},"publicationDate":"2017-03-09","publicationTypes":"Journal Article"}
Munmun De Choudhury, Shagun Jhaver, Benjamin Sugar, Ingmar Weber
From the Arab Spring to the Occupy Movement, social media has been instrumental in driving and supporting socio-political movements throughout the world. In this paper, we present one of the first social media investigations of an activist movement around racial discrimination and police violence, known as "Black Lives Matter". Considering Twitter as a sensor for the broader community's perception of the events related to the movement, we study participation over time, the geographical differences in this participation, and its relationship to protests that unfolded on the ground. We find evidence for continued participation across four temporally separated events related to the movement, with notable changes in engagement and language over time. We also find that participants from regions of historically high rates of black victimization due to police violence tend to express greater negativity and make more references to loss of life. Finally, we observe that social media attributes of affect, behavior and language can predict future protest participation on the ground. We discuss the role of social media in enabling collective action around this unique movement and how social media platforms may help understand perceptions on a socially contested and sensitive issue like race.
{"title":"Social Media Participation in an Activist Movement for Racial Equality.","authors":"Munmun De Choudhury, Shagun Jhaver, Benjamin Sugar, Ingmar Weber","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"2016","pages":"92-101"},"publicationDate":"2016-05-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5565729/pdf/nihms891348.pdf"}
Pub Date: 2016-03-31 DOI: 10.1609/icwsm.v10i1.14735
Zhijun Yin, You Chen, D. Fabbri, Jimeng Sun, B. Malin
User-generated content in social media is increasingly acknowledged as a rich resource for research into health problems. One particular area of interest is the semantics individuals evoke, because these can influence when health-related information is disclosed. While there have been multiple investigations into why self-disclosure occurs, much less is known about when individuals choose to disclose information about other people (e.g., a relative), which is a significant privacy concern. In this paper, we introduce a novel framework to investigate how semantics influence disclosure routines for 34 health issues. This framework begins with a supervised classification model to distinguish tweets that communicate personal health issues from confounding concepts (e.g., metaphorical statements that include a health-related keyword). Next, we annotate tweets for each health issue with linguistic and psychological categories (e.g., social processes, affective processes, and personal concerns). Then, we apply a non-negative matrix factorization over a health issue-by-language category space. Finally, the factorized basis space is leveraged to group health issues into natural aggregations based on how they are discussed. We evaluate this framework with four months of tweets (over 200 million) and show that certain semantics correspond to whom a health mention pertains. Our findings show that health issues related to family members, high medical cost, and social support (e.g., Alzheimer's Disease, cancer, and Down syndrome) lead to tweets that are more likely to disclose another individual's health status, while tweets about more benign health issues (e.g., allergy, arthritis, and bronchitis), associated with biological processes (e.g., health and ingestion) and negative emotions, are more likely to contain self-disclosures.
{"title":"#PrayForDad: Learning the Semantics Behind Why Social Media Users Disclose Health Information","authors":"Zhijun Yin, You Chen, D. Fabbri, Jimeng Sun, B. Malin","doi":"10.1609/icwsm.v10i1.14735","DOIUrl":"https://doi.org/10.1609/icwsm.v10i1.14735","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"46 1","pages":"456-465"},"publicationDate":"2016-03-31","publicationTypes":"Journal Article"}
Pub Date: 2016-03-31 DOI: 10.1609/icwsm.v10i1.14758
M. Choudhury, Shagun Jhaver, Benjamin Sugar, Ingmar Weber
From the Arab Spring to the Occupy Movement, social media has been instrumental in driving and supporting socio-political movements throughout the world. In this paper, we present one of the first social media investigations of an activist movement around racial discrimination and police violence, known as "Black Lives Matter". Considering Twitter as a sensor for the broader community's perception of the events related to the movement, we study participation over time, the geographical differences in this participation, and its relationship to protests that unfolded on the ground. We find evidence for continued participation across four temporally separated events related to the movement, with notable changes in engagement and language over time. We also find that participants from regions of historically high rates of black victimization due to police violence tend to express greater negativity and make more references to loss of life. Finally, we observe that social media attributes of affect, behavior and language can predict future protest participation on the ground. We discuss the role of social media in enabling collective action around this unique movement and how social media platforms may help understand perceptions on a socially contested and sensitive issue like race.
{"title":"Social Media Participation in an Activist Movement for Racial Equality","authors":"M. Choudhury, Shagun Jhaver, Benjamin Sugar, Ingmar Weber","doi":"10.1609/icwsm.v10i1.14758","DOIUrl":"https://doi.org/10.1609/icwsm.v10i1.14758","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"24 1","pages":"92-101"},"publicationDate":"2016-03-31","publicationTypes":"Journal Article"}
José Manuel Delgado Valdes, Jacob Eisenstein, Munmun De Choudhury
Exposure to frequent crime incidents has been found to have a negative bearing on the well-being of city residents, even if they are not themselves direct victims. We pursue the research question of whether naturalistic data shared on Twitter may provide a "lens" to understand changes in psychological attributes of urban communities (1) immediately following crime incidents, as well as (2) due to long-term exposure to crime. We analyze half a million Twitter posts from the City of Atlanta in 2014, where the rate of violent crime is three times the national average. In a first study, we develop a statistical method to detect changes in social media psychological attributes in the immediate aftermath of a crime event. Second, we develop a regression model that uses historical (yearlong) crime to predict Twitter negative emotion, anxiety, anger, and sadness. We do not find significant changes in social media affect immediately following crime in Atlanta. However, we do observe significant ability of historical crime to account for heightened negative emotion and anger in the future. Our findings have implications for gauging the utility of social media to infer longitudinal and population-scale patterns of urban well-being.
{"title":"Psychological Effects of Urban Crime Gleaned from Social Media.","authors":"José Manuel Delgado Valdes, Jacob Eisenstein, Munmun De Choudhury","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"2015","pages":"598-601"},"publicationDate":"2015-05-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5648364/pdf/nihms703921.pdf"}
Geoffrey Fairchild, Sara Y Del Valle, Lalindra De Silva, Alberto M Segre
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative, achieving an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed, consistently updated time series data that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven open-source emerging disease detection, monitoring, and repository system.
{"title":"Eliciting Disease Data from Wikipedia Articles.","authors":"Geoffrey Fairchild, Sara Y Del Valle, Lalindra De Silva, Alberto M Segre","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"2015","pages":"26-33"},"publicationDate":"2015-05-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5511739/pdf/nihms875513.pdf"}
Pub Date: 2015-04-21 DOI: 10.1609/icwsm.v9i1.14665
Jose Manuel Delgado Valdes, Jacob Eisenstein, M. Choudhury
Exposure to frequent crime incidents has been found to have a negative bearing on the well-being of city residents, even if they are not themselves direct victims. We pursue the research question of whether naturalistic data shared on Twitter may provide a "lens" to understand changes in psychological attributes of urban communities (1) immediately following crime incidents, as well as (2) due to long-term exposure to crime. We analyze half a million Twitter posts from the City of Atlanta in 2014, where the rate of violent crime is three times the national average. In a first study, we develop a statistical method to detect changes in social media psychological attributes in the immediate aftermath of a crime event. Second, we develop a regression model that uses historical (yearlong) crime to predict Twitter negative emotion, anxiety, anger, and sadness. We do not find significant changes in social media affect immediately following crime in Atlanta. However, we do observe significant ability of historical crime to account for heightened negative emotion and anger in the future. Our findings have implications for gauging the utility of social media to infer longitudinal and population-scale patterns of urban well-being.
{"title":"Psychological Effects of Urban Crime Gleaned from Social Media","authors":"Jose Manuel Delgado Valdes, Jacob Eisenstein, M. Choudhury","doi":"10.1609/icwsm.v9i1.14665","DOIUrl":"https://doi.org/10.1609/icwsm.v9i1.14665","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"69 1","pages":"598-601"},"publicationDate":"2015-04-21","publicationTypes":"Journal Article"}
Geoffrey Fairchild, Lalindra De Silva, S. D. Valle, Alberto Maria Segre
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative, achieving an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed, consistently updated time series data that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven open-source emerging disease detection, monitoring, and repository system.
{"title":"Eliciting Disease Data from Wikipedia Articles","authors":"Geoffrey Fairchild, Lalindra De Silva, S. D. Valle, Alberto Maria Segre","doi":"10.5210/OJPHI.V8I1.6526","DOIUrl":"https://doi.org/10.5210/OJPHI.V8I1.6526","journal":{"name":"Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media","volume":"106 1","pages":"26-33"},"publicationDate":"2015-04-02","publicationTypes":"Journal Article"}