This research focused on enhancing post-incident malware forensic investigation using reinforcement learning (RL). We proposed an advanced MDP-based post-incident malware forensics investigation model and framework to expedite post-incident forensics, and then implemented our RL malware investigation model, built on a structured MDP, within the proposed framework. To identify malware artefacts, the RL agent acquires and examines forensic evidence files, iteratively improving its capabilities using a Q-table and temporal-difference learning. The Q-learning algorithm significantly improved the agent's ability to identify malware, and an epsilon-greedy exploration strategy combined with Q-learning updates enabled efficient learning and decision making. Our experimental testing revealed that the optimal learning rate depends on the complexity of the MDP environment: simpler environments benefit from higher rates for quicker convergence, while complex ones require lower rates for stability. In identifying and classifying malware, our model reduced analysis time compared to human experts while demonstrating robustness and adaptability. The study highlighted the significance of hyperparameter tuning and suggested adaptive strategies for complex environments. Our RL-based approach produced promising results and was validated as an alternative to traditional methods, notably by offering continuous learning and adaptation to new and evolving malware threats, which ultimately enhances post-incident forensic investigations.
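As a reminder of the mechanics the abstract refers to, the following is a minimal tabular Q-learning sketch with epsilon-greedy exploration; the environment interface, states, actions and rewards are hypothetical placeholders, not the paper's forensic MDP.

    import random
    from collections import defaultdict

    def epsilon_greedy(q, state, actions, epsilon):
        # Explore with probability epsilon, otherwise exploit the best known action.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q[(state, a)])

    def q_learning_episode(env, q, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        # env is assumed to expose reset() -> state and step(action) -> (next_state, reward, done).
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(q, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the bootstrapped Q-learning target.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state

    q_table = defaultdict(float)  # the Q-table, keyed by (state, action)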
{"title":"Reinforcement Learning for an Efficient and Effective Malware Investigation during Cyber Incident Response","authors":"Dipo Dunsin, Mohamed Chahine Ghanem, Karim Ouazzane, Vassil Vassilev","doi":"arxiv-2408.01999","DOIUrl":"https://doi.org/arxiv-2408.01999","url":null,"abstract":"This research focused on enhancing post-incident malware forensic\u0000investigation using reinforcement learning RL. We proposed an advanced MDP post\u0000incident malware forensics investigation model and framework to expedite post\u0000incident forensics. We then implement our RL Malware Investigation Model based\u0000on structured MDP within the proposed framework. To identify malware artefacts,\u0000the RL agent acquires and examines forensics evidence files, iteratively\u0000improving its capabilities using Q Table and temporal difference learning. The\u0000Q learning algorithm significantly improved the agent ability to identify\u0000malware. An epsilon greedy exploration strategy and Q learning updates enabled\u0000efficient learning and decision making. Our experimental testing revealed that\u0000optimal learning rates depend on the MDP environment complexity, with simpler\u0000environments benefiting from higher rates for quicker convergence and complex\u0000ones requiring lower rates for stability. Our model performance in identifying\u0000and classifying malware reduced malware analysis time compared to human\u0000experts, demonstrating robustness and adaptability. The study highlighted the\u0000significance of hyper parameter tuning and suggested adaptive strategies for\u0000complex environments. Our RL based approach produced promising results and is\u0000validated as an alternative to traditional methods notably by offering\u0000continuous learning and adaptation to new and evolving malware threats which\u0000ultimately enhance the post incident forensics investigations.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141937230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work presents a procedural method for extracting object heights from LiDAR and aerial imagery. We discuss how to compute heights and consider the future of LiDAR and imagery processing: state-of-the-art (SOTA) object segmentation now allows object heights to be extracted without a deep learning background. Engineers will be keeping track of world data across generations and reprocessing it, using both older procedural methods like the one in this paper and the newer methods discussed here. SOTA methods are moving beyond analysis and into generative AI, so we cover both a procedural methodology and newer approaches based on language models, including point cloud, imagery and text encoding that enable spatially aware AI.
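One common procedural recipe for this task is the normalized digital surface model (nDSM): subtract ground elevation (DTM) from first-return surface elevation (DSM) and aggregate the difference inside a segmentation mask. A minimal numpy sketch under that assumption (the rasters are assumed co-registered, and the mask could come from an off-the-shelf segmenter):

    import numpy as np

    def object_height(dsm, dtm, mask, percentile=95):
        # dsm:  surface elevation raster (e.g., first-return LiDAR), metres
        # dtm:  bare-earth elevation raster, metres
        # mask: boolean array marking the segmented object's pixels
        ndsm = dsm - dtm                      # normalized DSM: height above ground
        heights = ndsm[mask]
        # A high percentile is more robust to edge pixels than the raw maximum.
        return float(np.percentile(heights, percentile))

    # Hypothetical example with synthetic rasters.
    dsm = np.full((4, 4), 12.0); dsm[1:3, 1:3] = 18.0   # a 6 m structure
    dtm = np.full((4, 4), 12.0)
    mask = np.zeros((4, 4), dtype=bool); mask[1:3, 1:3] = True
    print(object_height(dsm, dtm, mask))                # ~6.0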
{"title":"Extracting Object Heights From LiDAR & Aerial Imagery","authors":"Jesus Guerrero","doi":"arxiv-2408.00967","DOIUrl":"https://doi.org/arxiv-2408.00967","url":null,"abstract":"This work shows a procedural method for extracting object heights from LiDAR\u0000and aerial imagery. We discuss how to get heights and the future of LiDAR and\u0000imagery processing. SOTA object segmentation allows us to take get object\u0000heights with no deep learning background. Engineers will be keeping track of\u0000world data across generations and reprocessing them. They will be using older\u0000procedural methods like this paper and newer ones discussed here. SOTA methods\u0000are going beyond analysis and into generative AI. We cover both a procedural\u0000methodology and the newer ones performed with language models. These include\u0000point cloud, imagery and text encoding allowing for spatially aware AI.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141937233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgia Adorni, Francesca Mangili, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci
In modern and personalised education, there is a growing interest in developing learners' competencies and accurately assessing them. In a previous work, we proposed a procedure for deriving a learner model for automatic skill assessment from a task-specific competence rubric, thus simplifying the implementation of automated assessment tools. The previous approach, however, suffered from two main limitations: (i) the ordering between competencies defined by the assessment rubric was only indirectly modelled; (ii) supplementary skills, not under assessment but necessary for accomplishing the task, were not included in the model. In this work, we address issue (i) by introducing dummy observed nodes, strictly enforcing the skills ordering without changing the network's structure. For point (ii), we design a network with two layers of gates, one performing disjunctive operations via noisy-OR gates and the other conjunctive operations via logical ANDs. These changes improve the coherence of the model outcomes and the flexibility of the modelling tool without compromising the model's compact parametrisation, interpretability and straightforward elicitation from experts. We used this approach to develop a learner model for Computational Thinking (CT) skills assessment. The CT-cube skills assessment framework and the Cross Array Task (CAT) are used to exemplify it and demonstrate its feasibility.
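For readers unfamiliar with noisy gates: a noisy-OR node is active unless every active parent cause (and a background leak) independently fails to trigger it, i.e. P(Y=1 | x) = 1 - (1 - leak) * prod_i (1 - p_i)^{x_i}. A small sketch of the two gate types, with illustrative rather than elicited probabilities:

    def noisy_or(active_parent_probs, leak=0.0):
        # Each active parent i triggers the child independently with probability p_i;
        # the child stays off only if every active cause and the leak all fail.
        fail = 1.0 - leak
        for p in active_parent_probs:
            fail *= (1.0 - p)
        return 1.0 - fail

    def logical_and(parent_states):
        # Deterministic conjunctive gate, as in the network's second layer.
        return all(parent_states)

    # Two active skills with illustrative trigger probabilities 0.8 and 0.6:
    print(noisy_or([0.8, 0.6], leak=0.05))  # ~0.924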
{"title":"Rubric-based Learner Modelling via Noisy Gates Bayesian Networks for Computational Thinking Skills Assessment","authors":"Giorgia Adorni, Francesca Mangili, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci","doi":"arxiv-2408.01221","DOIUrl":"https://doi.org/arxiv-2408.01221","url":null,"abstract":"In modern and personalised education, there is a growing interest in\u0000developing learners' competencies and accurately assessing them. In a previous\u0000work, we proposed a procedure for deriving a learner model for automatic skill\u0000assessment from a task-specific competence rubric, thus simplifying the\u0000implementation of automated assessment tools. The previous approach, however,\u0000suffered two main limitations: (i) the ordering between competencies defined by\u0000the assessment rubric was only indirectly modelled; (ii) supplementary skills,\u0000not under assessment but necessary for accomplishing the task, were not\u0000included in the model. In this work, we address issue (i) by introducing dummy\u0000observed nodes, strictly enforcing the skills ordering without changing the\u0000network's structure. In contrast, for point (ii), we design a network with two\u0000layers of gates, one performing disjunctive operations by noisy-OR gates and\u0000the other conjunctive operations through logical ANDs. Such changes improve the\u0000model outcomes' coherence and the modelling tool's flexibility without\u0000compromising the model's compact parametrisation, interpretability and simple\u0000experts' elicitation. We used this approach to develop a learner model for\u0000Computational Thinking (CT) skills assessment. The CT-cube skills assessment\u0000framework and the Cross Array Task (CAT) are used to exemplify it and\u0000demonstrate its feasibility.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141937236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper aims to give readers a high-level overview of the different MCX depth-reduction techniques that utilize ancilla qubits. We also present a brief analysis of how they would perform under different quantum topological settings. The techniques examined are recursion and v-chain, as they are the most commonly used techniques in Qiskit, one of the most popular quantum computing libraries. The paper is aimed at readers without intricate mathematical or physics knowledge of quantum computing.
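A minimal sketch of the comparison, assuming a Qiskit version that still ships the MCXRecursive (one clean ancilla) and MCXVChain (n - 2 ancillas) library gates; depths are measured after transpiling to a common basis:

    from qiskit import QuantumCircuit, transpile
    from qiskit.circuit.library import MCXRecursive, MCXVChain

    def mcx_depth(gate, basis=("cx", "u")):
        # Wrap the gate in a circuit and decompose to a shared basis so depths compare fairly.
        qc = QuantumCircuit(gate.num_qubits)
        qc.append(gate, list(range(gate.num_qubits)))
        return transpile(qc, basis_gates=list(basis)).depth()

    n = 6  # number of control qubits (illustrative)
    print("recursion:", mcx_depth(MCXRecursive(n)))  # n controls + target + 1 clean ancilla
    print("v-chain:  ", mcx_depth(MCXVChain(n)))     # n controls + target + (n - 2) ancillas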
{"title":"Analyzing Quantum Circuit Depth Reduction with Ancilla Qubits in MCX Gates","authors":"Ahmad Bennakhi, Paul Franzon, Gregory T. Byrd","doi":"arxiv-2408.01304","DOIUrl":"https://doi.org/arxiv-2408.01304","url":null,"abstract":"This paper aims to give readers a high-level overview of the different MCX\u0000depth reduction techniques that utilize ancilla qubits. We also exhibit a brief\u0000analysis of how they would perform under different quantum topological\u0000settings. The techniques examined are recursion and v-chain, as they are the\u0000most commonly used techniques in the most popular quantum computing libraries,\u0000Qiskit. The target audience of this paper is people who do not have intricate\u0000mathematical or physics knowledge related to quantum computing.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141937286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial intelligence (AI) coupled with the existing Internet of Things (IoT) enables more streamlined and autonomous operations across various economic sectors. Consequently, the Artificial Intelligence of Things (AIoT) paradigm, with AI techniques at its core, implies additional energy and carbon costs that may become significant with more complex neural architectures. To better understand the energy and Carbon Footprint (CF) of some AIoT components, recent studies have employed conventional metrics; however, these metrics are not designed to capture the energy-efficiency aspects of inference. In this paper, we propose a new metric, the Energy Cost of AIoT Lifecycle (eCAL), to capture the overall energy cost of inference over the lifecycle of an AIoT system. We devise a methodology for determining the eCAL of an AIoT system by analyzing the complexity of data manipulation in the individual components involved in the AIoT lifecycle, and we derive the overall and per-bit energy consumption. With eCAL we show that the better a model is and the more it is used, the more energy efficient each inference becomes: for an example AIoT configuration, the eCAL of making $100$ inferences is $1.43$ times higher than that of making $1000$ inferences. We also evaluate the CF of the AIoT system by calculating the equivalent CO$_{2}$ emissions based on energy consumption and the Carbon Intensity (CI) of different countries. Using 2023 renewable-energy data, our analysis reveals that deploying an AIoT system in Germany results in emitting $4.62$ times more CO$_2$ than in Finland, because the latter uses more low-CI energy sources.
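The eCAL formula itself is not reproduced in the abstract; the 100-vs-1000 inference gap, however, follows the familiar amortization pattern in which fixed lifecycle energy is spread over the number of inferences served. An illustrative sketch with made-up numbers, not the paper's derivation:

    def energy_per_inference(e_fixed_j, e_inf_j, n):
        # Fixed lifecycle energy (data handling, training, ...) is amortized
        # over n inferences; the marginal inference energy is paid every time.
        return (e_fixed_j + n * e_inf_j) / n

    # Illustrative numbers only: 50 kJ fixed cost, 100 J per inference.
    for n in (100, 1000):
        print(n, energy_per_inference(50_000, 100, n))
    # Here the ratio is (500 + 100) / (50 + 100) = 4.0; the paper reports a
    # 1.43x gap for its particular AIoT configuration.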
{"title":"The Energy Cost of Artificial Intelligence of Things Lifecycle","authors":"Shih-Kai Chou, Jernej Hribar, Mihael Mohorčič, Carolina Fortuna","doi":"arxiv-2408.00540","DOIUrl":"https://doi.org/arxiv-2408.00540","url":null,"abstract":"Artificial intelligence (AI)coupled with existing Internet of Things (IoT)\u0000enables more streamlined and autonomous operations across various economic\u0000sectors. Consequently, the paradigm of Artificial Intelligence of Things (AIoT)\u0000having AI techniques at its core implies additional energy and carbon costs\u0000that may become significant with more complex neural architectures. To better\u0000understand the energy and Carbon Footprint (CF) of some AIoT components, very\u0000recent studies employ conventional metrics. However, these metrics are not\u0000designed to capture energy efficiency aspects of inference. In this paper, we\u0000propose a new metric, the Energy Cost of AIoT Lifecycle (eCAL) to capture the\u0000overall energy cost of inference over the lifecycle of an AIoT system. We\u0000devise a new methodology for determining eCAL of an AIoT system by analyzing\u0000the complexity of data manipulation in individual components involved in the\u0000AIoT lifecycle and derive the overall and per bit energy consumption. With eCAL\u0000we show that the better a model is and the more it is used, the more energy\u0000efficient an inference is. For an example AIoT configuration, eCAL for making\u0000$100$ inferences is $1.43$ times higher than for $1000$ inferences. We also\u0000evaluate the CF of the AIoT system by calculating the equivalent CO$_{2}$\u0000emissions based on the energy consumption and the Carbon Intensity (CI) across\u0000different countries. Using 2023 renewable data, our analysis reveals that\u0000deploying an AIoT system in Germany results in emitting $4.62$ times higher\u0000CO$_2$ than in Finland, due to latter using more low-CI energy sources.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141881507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent work in behavioral testing for natural language processing (NLP) models, such as Checklist, is inspired by related paradigms in software engineering testing. These approaches allow the evaluation of general linguistic capabilities and domain understanding, and hence can help assess conceptual soundness and identify model weaknesses. A major challenge, however, is the creation of test cases: current packages rely on a semi-automated approach requiring manual development, which demands domain expertise and can be time consuming. This paper introduces an automated approach to developing test cases that exploits the power of large language models and statistical techniques: it clusters text representations to carefully construct meaningful groups and then applies prompting techniques to automatically generate Minimal Functionality Tests (MFTs). The well-known Amazon Reviews corpus is used to demonstrate our approach. We analyze the behavioral test profiles across four different classification algorithms and discuss the limitations and strengths of those models.
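A minimal sketch of the cluster-then-prompt idea: group review embeddings so each cluster seeds one family of MFTs. TF-IDF stands in here for the LLM-based text representations, and the actual generation call to a language model is left out:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    def cluster_texts(texts, n_clusters=5):
        # Embed the reviews and group them so each cluster can seed one
        # family of behavioral test cases.
        vectors = TfidfVectorizer().fit_transform(texts)
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)

    def mft_prompt(examples):
        # Hypothetical prompt template; the LLM call itself is omitted.
        joined = "\n".join(f"- {e}" for e in examples)
        return ("Given these similar product reviews:\n" + joined +
                "\nWrite minimal functionality test cases (input, expected label).")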
{"title":"Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting","authors":"Ying Li, Rahul Singh, Tarun Joshi, Agus Sudjianto","doi":"arxiv-2408.00161","DOIUrl":"https://doi.org/arxiv-2408.00161","url":null,"abstract":"Recent work in behavioral testing for natural language processing (NLP)\u0000models, such as Checklist, is inspired by related paradigms in software\u0000engineering testing. They allow evaluation of general linguistic capabilities\u0000and domain understanding, hence can help evaluate conceptual soundness and\u0000identify model weaknesses. However, a major challenge is the creation of test\u0000cases. The current packages rely on semi-automated approach using manual\u0000development which requires domain expertise and can be time consuming. This\u0000paper introduces an automated approach to develop test cases by exploiting the\u0000power of large language models and statistical techniques. It clusters the text\u0000representations to carefully construct meaningful groups and then apply\u0000prompting techniques to automatically generate Minimal Functionality Tests\u0000(MFT). The well-known Amazon Reviews corpus is used to demonstrate our\u0000approach. We analyze the behavioral test profiles across four different\u0000classification algorithms and discuss the limitations and strengths of those\u0000models.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141881505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early detection of drought stress is critical for taking timely measures to reduce crop loss before the impact becomes irreversible. The subtle phenotypical and physiological changes induced by drought stress can be captured by non-invasive imaging techniques, and these imaging data serve as a valuable resource for machine learning methods to identify drought stress. While convolutional neural networks (CNNs) are in wide use, vision transformers (ViTs) present a promising alternative for capturing long-range dependencies and intricate spatial relationships, thereby enhancing the detection of subtle indicators of drought stress. We propose an explainable deep learning pipeline that leverages the power of ViTs for drought stress detection in potato crops using aerial imagery. We applied two distinct approaches: a synergistic combination of ViT and support vector machine (SVM), where the ViT extracts intricate spatial features from aerial images and the SVM classifies crops as stressed or healthy; and an end-to-end approach using a dedicated classification layer within the ViT to detect drought stress directly. Our key findings explain the ViT model's decision-making process by visualizing attention maps, which highlight the specific spatial features within the aerial images that the model focuses on as the drought-stress signature. Our findings demonstrate that the proposed methods not only achieve high accuracy in drought stress identification but also shed light on the diverse subtle plant features associated with drought stress. This offers a robust and interpretable solution for drought-stress monitoring, enabling farmers to make informed decisions for improved crop management.
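A minimal sketch of the first approach (ViT features feeding an SVM), assuming recent versions of the transformers and scikit-learn packages and a generic pretrained backbone rather than the paper's fine-tuned model:

    import torch
    from transformers import ViTImageProcessor, ViTModel
    from sklearn.svm import SVC

    # Generic pretrained backbone as a stand-in for the paper's ViT.
    processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
    backbone = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

    def cls_features(images):
        # images: list of PIL images; return the [CLS] token embedding per image.
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            out = backbone(**inputs)
        return out.last_hidden_state[:, 0].numpy()

    # train_images / labels (0 = healthy, 1 = stressed) are assumed to exist:
    # clf = SVC(kernel="rbf").fit(cls_features(train_images), labels)
    # preds = clf.predict(cls_features(test_images))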
{"title":"An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification","authors":"Aswini Kumar Patra, Ankit Varshney, Lingaraj Sahoo","doi":"arxiv-2407.21666","DOIUrl":"https://doi.org/arxiv-2407.21666","url":null,"abstract":"Early detection of drought stress is critical for taking timely measures for\u0000reducing crop loss before the drought impact becomes irreversible. The subtle\u0000phenotypical and physiological changes in response to drought stress are\u0000captured by non-invasive imaging techniques and these imaging data serve as\u0000valuable resource for machine learning methods to identify drought stress.\u0000While convolutional neural networks (CNNs) are in wide use, vision transformers\u0000(ViTs) present a promising alternative in capturing long-range dependencies and\u0000intricate spatial relationships, thereby enhancing the detection of subtle\u0000indicators of drought stress. We propose an explainable deep learning pipeline\u0000that leverages the power of ViTs for drought stress detection in potato crops\u0000using aerial imagery. We applied two distinct approaches: a synergistic\u0000combination of ViT and support vector machine (SVM), where ViT extracts\u0000intricate spatial features from aerial images, and SVM classifies the crops as\u0000stressed or healthy and an end-to-end approach using a dedicated classification\u0000layer within ViT to directly detect drought stress. Our key findings explain\u0000the ViT model's decision-making process by visualizing attention maps. These\u0000maps highlight the specific spatial features within the aerial images that the\u0000ViT model focuses as the drought stress signature. Our findings demonstrate\u0000that the proposed methods not only achieve high accuracy in drought stress\u0000identification but also shedding light on the diverse subtle plant features\u0000associated with drought stress. This offers a robust and interpretable solution\u0000for drought stress monitoring for farmers to undertake informed decisions for\u0000improved crop management.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marie Tcholakian, Karolina Gorna, Maryline Laurent, Hella Kaffel Ben Ayed, Montassar Naghmouchi
Electronic Health Records (EHRs) and medical data are classified as personal data under every privacy law, meaning that any related service that processes such data must provide full security, confidentiality, privacy and accountability. Solutions for health data management -- storing, sharing and processing it -- are emerging quickly and were significantly boosted by the Covid-19 pandemic, which created a need to move services online. EHRs form a crucial part of digital identity data, and the corresponding digital identity trends -- such as self-sovereign identity powered by decentralized ledger technologies like blockchain -- are being researched or implemented in contexts that manage digital interactions between health facilities, patients and health professionals. In this paper, we propose a blockchain-based solution enabling the secure exchange of EHRs between different parties, powered by a self-sovereign identity (SSI) wallet and decentralized identifiers (DIDs). We also make use of a consortium IPFS network for off-chain storage and attribute-based encryption (ABE) to ensure data confidentiality and integrity. Through our solution, we grant users full control over their medical data and enable them to share it in total confidentiality over secure, encrypted communication channels between user wallets. We further use pairwise DIDs to strengthen user privacy and limit possible correlation or identification. Overall, this combination of technologies guarantees the secure exchange, storage and management of EHRs, along with by-design features inherited from the technological stack.
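As a toy illustration of the pairwise-DID idea, a wallet can mint one fresh keypair per relationship so that two relying parties cannot correlate the same user across contexts; the identifier format below is a simplified stand-in (a real did:key multibase-encodes the public key):

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.hazmat.primitives import serialization

    class Wallet:
        # One fresh keypair per peer relationship -> unlinkable identifiers.
        def __init__(self):
            self._keys = {}  # peer -> private key

        def did_for(self, peer):
            if peer not in self._keys:
                self._keys[peer] = Ed25519PrivateKey.generate()
            pub = self._keys[peer].public_key().public_bytes(
                serialization.Encoding.Raw, serialization.PublicFormat.Raw)
            # Simplified identifier; not a spec-compliant did:key.
            return "did:example:" + pub.hex()

    w = Wallet()
    assert w.did_for("hospital-A") != w.did_for("hospital-B")  # no cross-context correlation
    assert w.did_for("hospital-A") == w.did_for("hospital-A")  # stable per relationship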
{"title":"Self-Sovereign Identity for Consented and Content-Based Access to Medical Records using Blockchain","authors":"Marie Tcholakian, Karolina Gorna, Maryline Laurent, Hella Kaffel Ben Ayed, Montassar Naghmouchi","doi":"arxiv-2407.21559","DOIUrl":"https://doi.org/arxiv-2407.21559","url":null,"abstract":"Electronic Health Records (EHRs) and Medical Data are classified as personal\u0000data in every privacy law, meaning that any related service that includes\u0000processing such data must come with full security, confidentiality, privacy and\u0000accountability. Solutions for health data management, as in storing it, sharing\u0000and processing it, are emerging quickly and were significantly boosted by the\u0000Covid-19 pandemic that created a need to move things online. EHRs makes a\u0000crucial part of digital identity data, and the same digital identity trends --\u0000as in self sovereign identity powered by decentralized ledger technologies like\u0000Blockchain, are being researched or implemented in contexts managing digital\u0000interactions between health facilities, patients and health professionals. In\u0000this paper, we propose a blockchain-based solution enabling secure exchange of\u0000EHRs between different parties powered by a self-sovereign identity (SSI)\u0000wallet and decentralized identifiers. We also make use of a consortium IPFS\u0000network for off-chain storage and attribute-based encryption (ABE) to ensure\u0000data confidentiality and integrity. Through our solution, we grant users full\u0000control over their medical data, and enable them to securely share it in total\u0000confidentiality over secure communication channels between user wallets using\u0000encryption. We also use DIDs for better user privacy and limit any possible\u0000correlations or identification by using pairwise DIDs. Overall, combining this\u0000set of technologies guarantees secure exchange of EHRs, secure storage and\u0000management along with by-design features inherited from the technological\u0000stack.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CultureVo, Inc. has developed the Integrated Culture Learning Suite (ICLS) to deliver foundational knowledge of world cultures through a combination of interactive lessons and gamified experiences. This paper explores how generative AI, powered by open-source Large Language Models, is utilized within the ICLS to enhance cultural intelligence. The suite employs generative AI techniques to automate the assessment of learner knowledge, analyze behavioral patterns, and manage interactions with non-player characters using real-time learner assessment. Additionally, the ICLS provides contextual hints and recommends course content by assessing learner proficiency, while generative AI facilitates the automated creation and validation of educational content.
{"title":"CultureVo: The Serious Game of Utilizing Gen AI for Enhancing Cultural Intelligence","authors":"Ajita Agarwala, Anupam Purwar, Viswanadhasai Rao","doi":"arxiv-2407.20685","DOIUrl":"https://doi.org/arxiv-2407.20685","url":null,"abstract":"CultureVo, Inc. has developed the Integrated Culture Learning Suite (ICLS) to\u0000deliver foundational knowledge of world cultures through a combination of\u0000interactive lessons and gamified experiences. This paper explores how\u0000Generative AI powered by open source Large Langauge Models are utilized within\u0000the ICLS to enhance cultural intelligence. The suite employs Generative AI\u0000techniques to automate the assessment of learner knowledge, analyze behavioral\u0000patterns, and manage interactions with non-player characters using real time\u0000learner assessment. Additionally, ICLS provides contextual hint and recommend\u0000course content by assessing learner proficiency, while Generative AI\u0000facilitates the automated creation and validation of educational content.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a new hyper-parameter for Retrieval-Augmented Generation (RAG) systems called Context Window Utilization. RAG systems enhance generative models by incorporating relevant information retrieved from external knowledge bases, improving the factual accuracy and contextual relevance of generated responses. The size of the text chunks retrieved and processed is a critical factor influencing RAG performance. This study aims to identify the optimal chunk size that maximizes answer generation quality. Through systematic experimentation, we analyze the effects of varying chunk sizes on the efficiency and effectiveness of RAG frameworks. Our findings reveal that an optimal chunk size balances the trade-off between providing sufficient context and minimizing irrelevant information. These insights are crucial for enhancing the design and implementation of RAG systems, underscoring the importance of selecting an appropriate chunk size to achieve superior performance.
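A minimal sketch of the kind of sweep described, using a fixed-size character chunker; evaluate is a placeholder for the paper's end-to-end answer-quality measurement (retrieval, generation and grading):

    def chunk(text, size, overlap=0):
        # Fixed-size character chunking; size is the hyper-parameter under study.
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    def sweep_chunk_sizes(corpus, sizes, evaluate):
        # evaluate(chunks) -> answer-quality score for a RAG pipeline built
        # on those chunks; returns a score per candidate chunk size.
        return {s: evaluate(chunk(corpus, s)) for s in sizes}

    # Example: scores = sweep_chunk_sizes(docs_text, [256, 512, 1024, 2048], my_eval)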
{"title":"Introducing a new hyper-parameter for RAG: Context Window Utilization","authors":"Kush Juvekar, Anupam Purwar","doi":"arxiv-2407.19794","DOIUrl":"https://doi.org/arxiv-2407.19794","url":null,"abstract":"This paper introduces a new hyper-parameter for Retrieval-Augmented\u0000Generation (RAG) systems called Context Window Utilization. RAG systems enhance\u0000generative models by incorporating relevant information retrieved from external\u0000knowledge bases, improving the factual accuracy and contextual relevance of\u0000generated responses. The size of the text chunks retrieved and processed is a\u0000critical factor influencing RAG performance. This study aims to identify the\u0000optimal chunk size that maximizes answer generation quality. Through systematic\u0000experimentation, we analyze the effects of varying chunk sizes on the\u0000efficiency and effectiveness of RAG frameworks. Our findings reveal that an\u0000optimal chunk size balances the trade-off between providing sufficient context\u0000and minimizing irrelevant information. These insights are crucial for enhancing\u0000the design and implementation of RAG systems, underscoring the importance of\u0000selecting an appropriate chunk size to achieve superior performance.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"205 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}