Shared Multi-Keyboard and Bilingual Datasets to Support Keystroke Dynamics Research

Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy. Published: 2022-04-14. DOI: 10.1145/3508398.3511516

A. Wahab, Daqing Hou, M. Banavar, S. Schuckers, Kenneth Eaton, Jacob Baldwin, Robert Wright


Citations: 5

Abstract

Keystroke dynamics, which authenticates users based on their typing rhythms, has been shown to be a promising authentication method. Over the years, it has been increasingly applied to prevent transaction fraud, account takeovers, and identity theft. However, because keystroke dynamics is inherently variable, a user's typing patterns may change on a different keyboard or under a different keyboard language setting, which may reduce system accuracy. For example, an algorithm modeled with data collected on a mechanical keyboard may perform significantly differently when tested on an ergonomic keyboard. Similarly, an algorithm modeled with data collected in one language may perform significantly differently when tested in another. Hence, there is a need to study the impact of multiple keyboards and multiple languages on keystroke dynamics performance. This motivated us to develop two free-text keystroke dynamics datasets. The first is a multi-keyboard dataset covering four physical keyboards (mechanical, ergonomic, membrane, and laptop), and the second is a bilingual dataset in English and Chinese. Data were collected from a total of 86 participants using a non-intrusive web-based keylogger in a semi-controlled setting. To the best of our knowledge, these are the first multi-keyboard and bilingual keystroke datasets, together with their data collection software, to be made publicly available for research purposes. We demonstrate the usefulness of our datasets by evaluating the performance of two state-of-the-art free-text algorithms.
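Free-text keystroke dynamics systems commonly describe typing rhythm through timing features such as per-key hold (dwell) times and digraph latencies between consecutive keys. The following is a minimal sketch under stated assumptions, not the datasets' actual schema and not either of the two algorithms evaluated in the paper. It assumes a hypothetical event format of (key, press_ms, release_ms) tuples, uses press-to-press latency (one of several common digraph definitions), and compares two sessions by the mean absolute difference of their shared digraph latencies. That comparison is a simple stand-in for how a profile built on one keyboard could drift on another.

```python
from collections import defaultdict
from statistics import mean


def hold_times(events):
    """Mean hold (dwell) time per key: release time minus press time.

    Assumes a hypothetical event format: (key, press_ms, release_ms).
    """
    holds = defaultdict(list)
    for key, press, release in events:
        holds[key].append(release - press)
    return {key: mean(values) for key, values in holds.items()}


def digraph_latencies(events):
    """Mean press-to-press latency for each consecutive key pair (digraph)."""
    ordered = sorted(events, key=lambda e: e[1])  # order keystrokes by press time
    latencies = defaultdict(list)
    for (k1, p1, _), (k2, p2, _) in zip(ordered, ordered[1:]):
        latencies[(k1, k2)].append(p2 - p1)
    return {pair: mean(values) for pair, values in latencies.items()}


def mean_abs_digraph_diff(session_a, session_b):
    """Average absolute latency difference over digraphs seen in both sessions.

    Returns NaN when the sessions share no digraphs.
    """
    a, b = digraph_latencies(session_a), digraph_latencies(session_b)
    shared = a.keys() & b.keys()
    if not shared:
        return float("nan")
    return mean(abs(a[pair] - b[pair]) for pair in shared)


# Toy sessions with made-up timings (not from the datasets): one user types "the".
mechanical = [("t", 0, 80), ("h", 120, 190), ("e", 230, 300)]
laptop = [("t", 0, 95), ("h", 150, 230), ("e", 280, 350)]
print(mean_abs_digraph_diff(mechanical, laptop))  # prints 25
```

Real free-text verification needs far more keystrokes per user and a proper scoring model; the single distance here only illustrates why timing features gathered on one keyboard may not transfer directly to another.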
