Automated parsing and interpretation of identity leaks

Proceedings of the ACM International Conference on Computing Frontiers. Published: 2016-05-16. DOI: 10.1145/2903150.2903156

Hendrik Graupner, David Jaeger, Feng Cheng, C. Meinel


Citations: 9

Abstract

Identity data leaks on the Internet are more relevant than ever. Almost every month, the news reports a leaked database containing more than a million user accounts, and smaller but no less dangerous leaks occur several times a day. The public availability of such leaked data is a major threat to the victims, but it also creates an opportunity to learn about both the security practices of service providers and the behavior of users when choosing passwords. Our goal is to analyze this data and generate knowledge that can be used to raise security awareness and to improve security. This paper presents a novel approach to the automatic analysis of the vast majority of leaks, large and small. Our contribution is the concept and a prototype implementation of a parser for identity leaks, composed of a syntactic and a semantic module, together with a data analyzer. In this context, we address two major challenges: the large number of different formats and the recognition of unknown data types in leaks. Based on the data collected, this paper shows how easily criminals can collect large numbers of passwords that are stored in plain text or only weakly hashed.
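The abstract names two parser stages, a syntactic module that copes with many different formats and a semantic module that recognizes unknown data types, without describing how they work. The following is a minimal sketch of what such stages could look like, assuming a line-oriented leak file with one record per line. It is not the authors' implementation: the names `CANDIDATE_DELIMITERS`, `PATTERNS`, `guess_delimiter`, `classify_value` and `classify_columns` are hypothetical, and the techniques used (a delimiter-consistency heuristic and regex-based type recognition with a per-column majority vote) are illustrative stand-ins.

```python
import re
from collections import Counter

# Hypothetical candidate field separators (an assumption, not from the paper).
CANDIDATE_DELIMITERS = [":", ";", ",", "\t", "|"]

# Hypothetical type patterns for the semantic step. Hex length alone is
# ambiguous: 32 hex digits may be MD5 or NTLM, 40 may be SHA-1 or RIPEMD-160.
PATTERNS = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")),
    ("md5_like", re.compile(r"^[0-9a-fA-F]{32}$")),
    ("sha1_like", re.compile(r"^[0-9a-fA-F]{40}$")),
    ("bcrypt", re.compile(r"^\$2[aby]?\$\d{2}\$[./A-Za-z0-9]{53}$")),
]


def guess_delimiter(lines):
    """Syntactic step (sketch): choose the delimiter whose per-line count
    is non-zero and identical on the largest number of lines."""
    best, best_score = None, 0
    for d in CANDIDATE_DELIMITERS:
        counts = Counter(line.count(d) for line in lines if line)
        if not counts:
            continue
        sep_count, freq = counts.most_common(1)[0]
        if sep_count > 0 and freq > best_score:
            best, best_score = d, freq
    return best


def classify_value(value):
    """Semantic step (sketch): return the first matching type label."""
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return "other"  # e.g. a plain-text password or a username


def classify_columns(lines, delimiter):
    """Label each column by majority vote over rows of the dominant width."""
    rows = [line.split(delimiter) for line in lines if line]
    width = Counter(len(r) for r in rows).most_common(1)[0][0]
    rows = [r for r in rows if len(r) == width]  # skip malformed rows
    labels = []
    for col in range(width):
        votes = Counter(classify_value(r[col].strip()) for r in rows)
        labels.append(votes.most_common(1)[0][0])
    return labels


if __name__ == "__main__":
    sample = [
        "alice@example.com:5f4dcc3b5aa765d61d8327deb882cf99",
        "bob@example.org:e10adc3949ba59abbe56e057f20f883e",
        "carol@example.net:d8578edf8458ce06fbc5bb76a58c5ca4",
    ]
    d = guess_delimiter(sample)
    print(repr(d), classify_columns(sample, d))
```

Voting per column rather than trusting a single line lets the sketch tolerate stray records, such as a plain-text password that happens to contain the delimiter. Because identical hex lengths can belong to different hash functions, labels like `md5_like` should be read as tentative.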



