Huaxin Li, Zheyu Xu, Haojin Zhu, Di Ma, Shuai Li, Kai Xing
IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. Published 2016-04-10. DOI: 10.1109/INFOCOM.2016.7524528 (https://doi.org/10.1109/INFOCOM.2016.7524528)
Demographics inference through Wi-Fi network traffic analysis
Although privacy leakage through content analysis of Wi-Fi traffic has received increasing attention, privacy inference through analysis of Wi-Fi traffic metadata (e.g., IP, Host) represents a potentially more serious threat to user privacy. First, it is a more efficient and scalable way to infer users' sensitive information, because it does not require inspecting the content of Wi-Fi traffic. Second, metadata-based demographics inference works on both unencrypted and encrypted traffic (e.g., HTTPS traffic). In this study, we present a novel approach to inferring user demographic information by exploiting the metadata of Wi-Fi traffic. We develop a proof-of-concept prototype, the Demographic Information Predictor (DIP) system, and evaluate its performance on a real-world dataset that covers the Wi-Fi access of 28,158 users over 5 months. DIP extracts four kinds of features from real-world Wi-Fi traffic and applies a novel machine-learning-based inference technique to predict user demographics. Our analytical results show that, for unencrypted traffic, DIP can predict the gender and education level of users with an accuracy of 78% and 74%, respectively. Surprisingly, even for HTTPS traffic, user demographics can still be predicted at a precision of 67% and 72%, respectively, which demonstrates the practicality of the proposed privacy inference scheme.
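To make the idea of metadata-only inference concrete, the following is a minimal sketch and not the DIP system itself: the abstract does not specify DIP's four feature kinds or its learning technique. The sketch assumes a single hypothetical feature (per-user host-visit frequencies) and swaps in a simple nearest-centroid classifier with cosine similarity. All function names, host names, and class labels ("A", "B") are illustrative assumptions, and the records are synthetic.

```python
from collections import Counter, defaultdict
import math


def host_features(records):
    """Map each user to a normalized host-visit frequency vector.

    records: iterable of (user_id, host) pairs, i.e. metadata only,
    with no packet content involved.
    """
    counts = defaultdict(Counter)
    for user, host in records:
        counts[user][host] += 1
    features = {}
    for user, c in counts.items():
        total = sum(c.values())
        features[user] = {h: n / total for h, n in c.items()}
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(h, 0.0) for h, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(features, labels):
    """Average the feature vectors of the labeled users in each class."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for user, label in labels.items():
        sizes[label] += 1
        for h, v in features[user].items():
            sums[label][h] += v
    return {
        lab: {h: v / sizes[lab] for h, v in s.items()}
        for lab, s in sums.items()
    }


def predict(vec, cents):
    """Return the class whose centroid is most similar to vec."""
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))


# Synthetic (user, host) records; hypothetical hosts, not real data.
records = [
    ("u1", "sports.example"), ("u1", "sports.example"), ("u1", "news.example"),
    ("u2", "sports.example"), ("u2", "games.example"),
    ("u3", "fashion.example"), ("u3", "fashion.example"), ("u3", "news.example"),
    ("u4", "fashion.example"), ("u4", "recipes.example"),
    ("u5", "sports.example"), ("u5", "games.example"), ("u5", "news.example"),
]
# Known labels for a few users; u5 is the unlabeled user to classify.
labels = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}

features = host_features(records)
cents = centroids(features, labels)
prediction = predict(features["u5"], cents)
print(prediction)
```

Because the feature vector is built only from which hosts a user contacts, the same pipeline would apply to HTTPS traffic wherever the host name remains observable, which is the property the abstract highlights.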