Text mining of veterinary forums for epidemiological surveillance supplementation
Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves
Social Network Analysis and Mining, published 2023-09-25. DOI: 10.1007/s13278-023-01131-7 (https://doi.org/10.1007/s13278-023-01131-7)
Impact factor: 2.3; JCR: Q3 (Computer Science, Information Systems)
Citations: 0
Abstract
Web scraping and text mining are popular computer science methods that public health researchers use to augment traditional epidemiological surveillance. Within veterinary disease surveillance, however, such techniques are still at an early stage of development and have not yet been fully utilised. This study explores the utility of internet-based data for better understanding smallholder farming communities within the UK, using online text extraction and subsequent mining of the extracted data. Livestock fora were web scraped, and the text was analysed with text mining and topic modelling to find common themes, words, and topics, alongside temporal analysis through anomaly detection. Results revealed that key areas in pig forum discussions included identification, age management, containment, and breeding and weaning practices. In discussions about poultry farming, a preference for free-range practices was expressed, along with a focus on feeding practices and addressing red mite infestations. Temporal topic modelling revealed an increase in conversations around pig containment and care, as well as poultry equipment maintenance. Moreover, anomaly detection was found to be particularly effective for tracking unusual spikes in forum activity, which may suggest new concerns or trends. Internet data can be a very effective aid to traditional veterinary surveillance methods, but human validation of these data is crucial. This opens avenues of research through the incorporation of other dynamic social media data, namely Twitter, together with location analysis to highlight spatial patterns.
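The abstract does not say which anomaly detection method the authors used. As a rough illustration of the general idea of flagging unusual spikes in forum activity, the sketch below applies a trailing-window z-score to daily post counts. The function name `flag_spikes`, the 7-day window, the threshold of 3 standard deviations, and the sample counts are all assumptions made for this example, not details from the paper.

```python
from statistics import mean, stdev


def flag_spikes(counts, window=7, threshold=3.0):
    """Return indices of days whose post count exceeds the mean of the
    preceding `window` days by more than `threshold` standard deviations.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # No variation in the trailing window, so a z-score is undefined.
            continue
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Made-up daily post counts with one obvious burst of activity on day 9.
daily_posts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 58, 15, 12]
print(flag_spikes(daily_posts))
```

A flagged day is only a prompt for closer reading of the posts from that day, which fits the abstract's point that human validation of the data remains crucial.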
About the journal:
Social Network Analysis and Mining (SNAM) is a multidisciplinary journal serving researchers and practitioners in academia and industry. It is the main venue for a wide range of researchers and readers from computer science, network science, social sciences, mathematical sciences, medical and biological sciences, and financial, management and political sciences. We solicit experimental and theoretical work on social network analysis and mining using a wide range of techniques from social sciences, mathematics, statistics, physics, network science and computer science. The main areas covered by SNAM include: (1) data mining advances on the discovery and analysis of communities, personalization for solitary activities (e.g. search) and social activities (e.g. discovery of potential friends), the analysis of user behavior in open forums (e.g. conventional sites, blogs and forums) and in commercial platforms (e.g. e-auctions), and the associated security and privacy-preservation challenges; (2) social network modeling, construction of scalable and customizable social network infrastructure, and identification and discovery of complex dynamics, growth, and evolution patterns using machine learning and data mining approaches or multi-agent based simulation; (3) social network analysis and mining for open source intelligence and homeland security. Papers should elaborate on data mining and machine learning or related methods and on issues associated with data preparation and pattern interpretation, both for conventional data (usage logs, query logs, document collections) and for multimedia data (pictures and their annotations, multi-channel usage data). Topics include but are not limited to: applications of social networks in business engineering, scientific and medical domains, homeland security, terrorism and criminology, fraud detection, public sector, politics, and case studies.