Research on Malicious URL Detection Technology Based on BERT Model
Authors: Wei-hwa Chang, Fei Du, Yijing Wang
Published in: 2021 IEEE 9th International Conference on Information, Communication and Networks (ICICN), 2021-11-25
DOI: 10.1109/icicn52636.2021.9673860 (https://doi.org/10.1109/icicn52636.2021.9673860)
Citation count: 5
Abstract
In network security, detecting malicious URLs has become increasingly important as they grow in number and variety. Existing malicious URL detection methods do not capture positional and contextual semantics. This paper proposes a malicious URL detection method based on the BERT model. The method first applies preprocessing to address the large number of random character sequences that form words in URLs, using special symbols as separators to segment each URL. It then trains the BERT model to extract features from the resulting short strings and classify the URL. Experimental results show that the method achieves an accuracy of 98.30%, a recall of 95.21%, and an F1 score of 94.33%.
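The segmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the special symbols the authors use, so the separator set below is an assumption, and the name `segment_url` is hypothetical. The sketch covers only the splitting step; the BERT feature extraction and classification are not shown.

```python
import re

# Assumed separator set: common URL punctuation. The paper's actual
# list of special symbols is not given in the abstract.
SEPARATORS = re.compile(r"[/:.?=&\-_@%#+~]+")


def segment_url(url: str) -> list[str]:
    """Split a URL into short strings on runs of special symbols.

    Empty pieces (for example from a leading or trailing separator)
    are dropped.
    """
    return [piece for piece in SEPARATORS.split(url) if piece]


if __name__ == "__main__":
    # Illustrative URL, not taken from the paper's dataset.
    print(segment_url("http://example.com/login.php?user=abc&id=42"))
```

Under these assumptions, the short strings returned by `segment_url` would be the units passed to the model's tokenizer in place of the raw URL.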