Toward a Better Understanding of “Cybersecurity”
J. V. D. Ham. Digital Threats: Research and Practice, 2021-06-08. DOI: 10.1145/3442445
The term “cybersecurity” has gained widespread popularity but has not been properly defined. It is used by many different people to mean different things in different contexts. A better understanding of “cybersecurity” will give us a better understanding of what it means to be “cybersecure,” which in turn will allow us to take more appropriate measures to ensure actual cybersecurity.
Detection of Advanced Web Bots by Combining Web Logs with Mouse Behavioural Biometrics
Christos Iliou, Theodoros Kostoulas, T. Tsikrika, Vasilis Katos, S. Vrochidis, I. Kompatsiaris. Digital Threats: Research and Practice, 2021-06-08. DOI: 10.1145/3447815
Web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint, support the main browser functionalities, and exhibit humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators, because their browserlike fingerprint and humanlike behaviour reduce their detectability. This work proposes a web bot detection framework comprising two detection modules: (i) one that utilises web logs and (ii) one that leverages mouse movements. The framework combines the results of each module in a novel way to capture the different temporal characteristics of the web logs and the mouse movements, as well as the spatial characteristics of the mouse movements. We assess its effectiveness on web bots of two levels of evasiveness: (a) moderate web bots that have a browser fingerprint and (b) advanced web bots that have a browser fingerprint and also exhibit humanlike behaviour. We show that combining web logs with visitors’ mouse movements is more effective and robust in detecting advanced web bots that try to evade detection than using either approach alone.
The Ecosystem of Detection and Blocklisting of Domain Generation
Leigh Metcalf, Jonathan M. Spring. Digital Threats: Research and Practice, 2021-06-08. DOI: 10.1145/3423951
Malware authors use domain generation algorithms to establish more reliable communication methods that can avoid reactive defender blocklisting techniques. Network defense has sought to supplement blocklists with methods for detecting machine-generated domains. We present a repeatable evaluation and comparison of the available open source detection methods. We designed our evaluation with multiple interrelated aspects to improve both interpretability and realism. In addition to evaluating detection methods, we assess the impact of the domain generation ecosystem on prior results about the nature of blocklists and how they are maintained. The evaluation of open source detection methods finds that all methods are inadequate for practical use. The blocklist impact study finds that generated domains decrease the overlap among blocklists; however, while the effect is large in relative terms, the baseline is so small that the core conclusions of the prior work are sustained: blocklist construction is very targeted and context-specific, and as a result blocklists do not overlap much. We recommend that domain generation algorithm detection likewise be narrowly targeted to specific algorithms and specific malware families, rather than attempting general-purpose detection of machine-generated domains.
A Perfect Storm
P. Datta, Mark R. Whitmore, Joseph K. Nwankpa. Digital Threats: Research and Practice, 2021-04-28. DOI: 10.1145/3428157
In an age where news information is created by millions and consumed by billions over social media (SM) every day, issues of information biases, fake news, and echo chambers have dominated the corridors of technology firms, news corporations, policy makers, and society. While multiple disciplines have tried to tackle the issue using their disciplinary lenses, there has hitherto been no integrative model that surfaces the intricate, albeit “dark,” explainable AI confluence of technology and psychology. Investigating information bias anchoring as the overarching phenomenon, this research proposes a theoretical framework that brings together the traditionally fragmented domains of AI technology and human psychology. The proposed Information Bias Anchoring Model reveals how SM news information creates an information deluge leading to uncertainty, and how technological rationality and individual biases intersect to mitigate that uncertainty, often leading to news information biases. The research ends with a discussion of its contributions and of ways to reduce information bias anchoring.
Game Theory based Cyber-Insurance to Cover Potential Loss from Mobile Malware Exploitation
Li Wang, S. Iyengar, Amith K. Belman, P. Sniatala, V. Phoha, C. Wan. Digital Threats: Research and Practice, 2021-04-20. DOI: 10.1145/3409959
The potential for huge losses from malicious exploitation of software calls for the development of principles of cyber-insurance. Estimating what to insure, for how much, and at what premium poses challenges because of uncertainties such as the timing of emergence and the lethality of malicious apps, the human propensity to favor ease by granting more privileges to downloaded apps over the inconvenience of delay or lost functionality, the chance of infection determined by the lifestyle of the mobile device user, and the monetary value of a software compromise. We provide a theoretical framework for cyber-insurance, backed by a game-theoretic formulation, to calculate the monetary value of risk and the insurance premiums associated with software compromise. By establishing the conditions for Nash equilibrium between the strategies of an adversary and the software, we derive probabilities for risk, potential loss, and gain to the adversary from app categories such as lifestyle, entertainment, and education, and their prevalence ratios. Using simulations over a range of possibilities and publicly available malware statistics, we provide insights about the strategies that can be taken by the software and the adversary. We apply our framework to recent mobile malware data (the 2018 ISTR report, covering 2017), which includes the top five Android malware apps: Malapp, Fakeinst, Premiumtext, Maldownloader, and Simplelocker, and the resulting leakage of phone numbers, location information, and installed-app information. The uniqueness of our work stems from developing a mathematical framework for estimating cyber-insurance parameters through a game-theoretic choice of strategies and from showing its efficacy on recent real malicious-app data. These insights will be of great help to researchers and practitioners in the security community.
Perceptions of Human and Machine-Generated Articles
Shubhra Tewari, Renos Zabounidis, Ammina Kothari, Reynold J. Bailey, Cecilia Ovesdotter Alm. Digital Threats: Research and Practice, 2021-04-20. DOI: 10.1145/3428158
Automated journalism technology is transforming news production and changing how audiences perceive the news. As automated text-generation models advance, it is important to understand how readers perceive human-written and machine-generated content. This study used OpenAI’s GPT-2 text-generation model (May 2019 release) and articles from news organizations across the political spectrum to study participants’ reactions to human- and machine-generated articles. As participants read the articles, we collected their facial expression and galvanic skin response (GSR) data together with self-reported perceptions of article source and content credibility. We also asked participants to identify their political affinity and assess the articles’ political tone to gain insight into the relationship between political leaning and article perception. Our results indicate that the May 2019 release of OpenAI’s GPT-2 model generated articles that were misidentified as written by a human close to half the time, while human-written articles were identified correctly as written by a human about 70 percent of the time.
The County Fair Cyber Loss Distribution
Daniel W. Woods, Tyler Moore, A. Simpson. Digital Threats: Research and Practice, 2021-04-15. DOI: 10.1145/3434403
Insurance premiums reflect expectations about the future losses of each insured. Given the dearth of cyber security loss data, market premiums could shed light on the true magnitude of cyber losses despite noise from factors unrelated to losses. To that end, we extract cyber insurance pricing information from the regulatory filings of 26 insurers. We provide empirical observations on how premiums vary by coverage type, amount, and policyholder type and over time. A method using particle swarm optimisation and the expected value premium principle is introduced to iterate through candidate parameterised distributions with the goal of reducing error in predicting observed prices. We then aggregate the inferred loss models across 6,828 observed prices from all 26 insurers to derive the County Fair Cyber Loss Distribution. We demonstrate its value in decision support by applying it to a theoretical retail firm with annual revenue of $50M. The results suggest that the expected cyber liability loss is $428K and that the firm faces a 2.3% chance of experiencing a cyber liability loss between $100K and $10M each year. The method and resulting estimates could help organisations better manage cyber risk, regardless of whether they purchase insurance.
Fake News Sharing
Rohit Valecha, Srikrishna Krishnarao Srinivasan, Tejaswi Volety, K. Hazel Kwon, M. Agrawal. Digital Threats: Research and Practice, 2021-04-15. DOI: 10.1145/3410025
Fake news has become a growing problem for societies, spreading virally and producing harmful impacts in social networks. The problem is even more troubling in the healthcare context. In the healthcare literature, it is well established that threat situations and coping responses facilitate information sharing and seeking among the public. In a similar vein, we argue that threat- and coping-related cues are important indicators of the shareworthiness of fake news in social media. We address the following research question in the context of the Zika virus: How do threat- and coping-related cues influence fake news sharing? We characterize threat situations by threat and severity cues, and coping responses by protection and fear cues. The results indicate a significant positive effect of threat cues and protection cues on fake news sharing. Such an investigation can enable timely monitoring of viral fake messages.
Identifying Real-world Credible Experts in the Financial Domain
Teng-Chieh Huang, Razieh Nokhbeh Zaeem. Digital Threats: Research and Practice, 2021-04-15. DOI: 10.1145/3446783
Establishing a solid mechanism for finding credible and trustworthy people in online social networks is an important first step toward avoiding useless, misleading, or even malicious information. There is a body of existing work studying the trustworthiness of social media users and finding credible sources in specific target domains. However, most of this work lacks the connection between credibility in the real world and credibility on the Internet, which leaves the picture of social media credibility and trustworthiness incomplete. In this article, working in the financial domain, we identify attributes that can distinguish credible users on the Internet who are indeed trustworthy experts in the real world. To ensure objectivity, we gather the list of credible financial experts from real-world financial authorities. We analyze the distribution of attributes of about 10K stock-related Twitter users and their 600K tweets over six months in 2015/2016, and of over 2.6M typical Twitter users and their 4.8M tweets on November 2nd, 2015, comprising 1% of all of Twitter in that time period. Using a random forest classifier, we find which attributes are related to real-world expertise. Our work sheds light on the properties of trustworthy users and paves the way for their automatic identification.
Toward Automated Factchecking
Lev Konstantinovskiy, Oliver Price, Mevan Babakar, A. Zubiaga. Digital Threats: Research and Practice, 2021-04-15. DOI: 10.1145/3412869
In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This article is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics, and annotators than are previous approaches. Our annotation schema has been used to crowdsource the annotation of a dataset with sentences from UK political TV shows. We introduce an approach based on universal sentence representations to perform the classification, achieving an F1 score of 0.83, with over 5% relative improvement over the state-of-the-art methods ClaimBuster and ClaimRank. The system was deployed in production and received positive user feedback.