{"title":"Beyond the west: Revealing and bridging the gap between Western and Chinese phishing website detection","authors":"Ying Yuan , Giovanni Apruzzese , Mauro Conti","doi":"10.1016/j.cose.2024.104115","DOIUrl":null,"url":null,"abstract":"<div><div>Phishing attacks are on the rise, and phishing <em>websites</em> are everywhere, denoting the brittleness of security mechanisms reliant on blocklists. To cope with this threat, many works proposed to enhance Phishing Website Detectors (PWD) with data-driven techniques powered by Machine Learning (ML). Despite achieving promising results both in research and practice, existing solutions mostly focus “on the West”, e.g., they consider websites in English, German, or Italian. In contrast, phishing websites targeting “Eastern” countries, such as China, have been mostly neglected—despite phishing being rampant also in this side of the world.</div><div>In this paper, we scrutinize whether current PWD can simultaneously work against Western and Chinese phishing websites. First, after highlighting the difficulties of practically testing PWD on Chinese phishing websites, we create CghPghrg—a dataset which enables assessment of PWD on Chinese websites. Then, we evaluate 72 PWD developed by industry practitioners and 10 ML-based PWD proposed in recent research on Western and Chinese websites: our results highlight that existing solutions, despite achieving low false positive rates, exhibit unacceptably low detection rates (sometimes inferior to 1%) on phishing websites of different <em>regions</em>. Next, to bridge the gap we brought to light, we elucidate the differences between Western and Chinese websites, and devise an enhanced feature set that accounts for the unique characteristics of Chinese websites. We empirically demonstrate the effectiveness of our proposed feature set by replicating (and testing) state-of-the-art ML-PWD: our results show a small but statistically significant improvement over the baselines. Finally, we review all our previous contributions and combine them to develop practical PWD that simultaneously work on Chinese and Western websites, achieving over 0.98 detection rate while maintaining only 0.01 false positive rate in a cross-regional setting. We openly release all our tools, disclose all our benchmark results, and also perform proof-of-concept experiments revealing that the problem tackled by our paper extends to other “Eastern” countries that have been overlooked by prior research on PWD.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104115"},"PeriodicalIF":4.8000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824004206","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Phishing attacks are on the rise, and phishing websites are everywhere, denoting the brittleness of security mechanisms reliant on blocklists. To cope with this threat, many works proposed to enhance Phishing Website Detectors (PWD) with data-driven techniques powered by Machine Learning (ML). Despite achieving promising results both in research and practice, existing solutions mostly focus “on the West”, e.g., they consider websites in English, German, or Italian. In contrast, phishing websites targeting “Eastern” countries, such as China, have been mostly neglected—despite phishing being rampant also in this side of the world.
In this paper, we scrutinize whether current PWD can simultaneously work against Western and Chinese phishing websites. First, after highlighting the difficulties of practically testing PWD on Chinese phishing websites, we create CghPghrg—a dataset which enables assessment of PWD on Chinese websites. Then, we evaluate 72 PWD developed by industry practitioners and 10 ML-based PWD proposed in recent research on Western and Chinese websites: our results highlight that existing solutions, despite achieving low false positive rates, exhibit unacceptably low detection rates (sometimes inferior to 1%) on phishing websites of different regions. Next, to bridge the gap we brought to light, we elucidate the differences between Western and Chinese websites, and devise an enhanced feature set that accounts for the unique characteristics of Chinese websites. We empirically demonstrate the effectiveness of our proposed feature set by replicating (and testing) state-of-the-art ML-PWD: our results show a small but statistically significant improvement over the baselines. Finally, we review all our previous contributions and combine them to develop practical PWD that simultaneously work on Chinese and Western websites, achieving over 0.98 detection rate while maintaining only 0.01 false positive rate in a cross-regional setting. We openly release all our tools, disclose all our benchmark results, and also perform proof-of-concept experiments revealing that the problem tackled by our paper extends to other “Eastern” countries that have been overlooked by prior research on PWD.
期刊介绍:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.