
Journal of Innovation in Digital Ecosystems: Latest Articles

Rogue behavior detection in NoSQL graph databases
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.004
Arnaud Castelltort, Anne Laurent

Rogue behaviors are behavioral anomalies that can occur in human activities and can therefore be retrieved from human-generated data. In this paper, we aim to show that NoSQL graph databases are a useful tool for this purpose. Indeed, these database engines rely on property graphs, which can easily represent interactions between people and objects regardless of the volume and complexity of the data. These interactions give rise to fraud rings in the graphs, in the form of sophisticated chains of indirect links between fraudsters that represent successive transactions (money, communications, etc.), from which rogue behaviors are detected. Our work is based on two extensions of such NoSQL graph databases. The first extension allows the handling of time-variant data, while the second is devoted to the management of imprecise queries, through a DSL (to define flexible operators and operations in Scala) and through Cypherf, a declarative flexible query language for NoSQL graph databases. These extensions make it possible to better address and describe sophisticated frauds. Feasibility has been studied to assess our proposal.
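To make the fraud-ring idea concrete, here is a minimal plain-Python sketch. It is not the authors' Cypherf queries or Scala DSL. It assumes transfers are directed edges carrying an amount and an integer timestamp, treats a ring as a cycle of strictly time-ordered transfers (the time-variant aspect), and grades each ring with a hypothetical fuzzy predicate "large amount" (the 1,000 and 10,000 breakpoints are illustrative assumptions), combining the edges of a ring with the minimum as a fuzzy AND.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    src: str
    dst: str
    amount: float
    time: int  # hypothetical integer timestamp, e.g. a day index


def large_amount(x: float, low: float = 1_000.0, high: float = 10_000.0) -> float:
    """Hypothetical fuzzy membership for 'large amount': 0 up to low, 1 from high, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fraud_rings(transfers, max_len=4):
    """Find cycles of strictly time-ordered transfers with at most max_len hops.

    Returns a list of (ring, degree) pairs, where degree is the fuzzy AND
    (minimum) of the 'large amount' memberships along the ring.
    """
    out_edges = {}
    for t in transfers:
        out_edges.setdefault(t.src, []).append(t)

    rings = []

    def extend(path):
        last = path[-1]
        visited = {path[0].src} | {t.dst for t in path}
        for nxt in out_edges.get(last.dst, []):
            if nxt.time <= last.time:
                continue  # each hop must be a later transaction
            if nxt.dst == path[0].src:
                ring = path + [nxt]
                rings.append((ring, min(large_amount(t.amount) for t in ring)))
            elif nxt.dst not in visited and len(path) + 1 < max_len:
                extend(path + [nxt])

    # Because times must increase, each time-ordered ring is reported once,
    # starting from its earliest transfer.
    for t in transfers:
        extend([t])
    return rings
```

A cycle whose transfers are out of time order (for example A to B at time 1, B to C at time 3, C to A at time 2) is not reported, which is the kind of temporal constraint a time-aware graph extension makes expressible.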

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 70-82
Citations: 2
Mining local process models
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.11.001
Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M.P. van der Aalst

In this paper, we describe a method to discover frequent behavioral patterns in event logs. We express these patterns as local process models. Local process model mining can be positioned between process discovery and episode/sequential pattern mining. As in process mining, the technique presented in this paper is able to learn behavioral patterns involving sequential composition, concurrency, choice and loops. However, we do not look at start-to-end models, which distinguishes our approach from process discovery and creates a link to episode/sequential pattern mining. We propose an incremental procedure, based on so-called process trees, for building local process models that capture frequent patterns. Given an event log, we propose five quality dimensions and corresponding metrics for local process models. We show monotonicity properties for some of these quality dimensions, which make it possible to speed up local process model discovery through pruning. We demonstrate through a real-life case study that mining local patterns gives insight into processes for which regular start-to-end process discovery techniques can only learn unstructured, flower-like models.
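The pruning argument in the abstract relies on a quality measure that cannot improve when a pattern is extended. The sketch below illustrates that idea under simplifying assumptions: patterns are plain activity sequences (no concurrency, choice or loops), and "support" is taken to be the number of traces containing the pattern as a subsequence. This is a stand-in chosen for illustration, not the paper's process-tree construction or its exact support definition.

```python
def contains(trace, pattern):
    """True if pattern occurs in trace as a (not necessarily contiguous) subsequence."""
    it = iter(trace)
    return all(activity in it for activity in pattern)  # 'in' consumes the iterator


def support(log, pattern):
    """Number of traces in the log that contain the pattern."""
    return sum(contains(trace, pattern) for trace in log)


def grow_patterns(log, min_support, max_len=3):
    """Level-wise growth of sequential patterns with support-based pruning.

    Appending an activity can never increase this support (anti-monotonicity),
    so patterns below min_support are dropped and never extended.
    """
    alphabet = sorted({a for trace in log for a in trace})
    frontier = [(a,) for a in alphabet if support(log, (a,)) >= min_support]
    found = list(frontier)
    while frontier and len(frontier[0]) < max_len:
        frontier = [
            pattern + (a,)
            for pattern in frontier
            for a in alphabet
            if support(log, pattern + (a,)) >= min_support
        ]
        found.extend(frontier)
    return found
```

For the toy log `[["a","b","c"], ["a","c","b"], ["a","b","d"], ["d","c"]]` with `min_support=2`, only `("a","b")` and `("a","c")` survive at length two, and no length-three candidate is kept, so the search stops early.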

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 183-196
Citations: 98
Evaluating the descriptive power of Instagram hashtags
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.001
Stamatios Giannoulakis, Nicolas Tsapatsoulis

Image tagging is an essential step in developing Automatic Image Annotation (AIA) methods based on the learning-by-example paradigm. However, manual image annotation, even when it is only used to create training sets for machine learning algorithms, requires considerable effort and is subject to human judgment errors and subjectivity. Alternative ways of automatically creating training examples, i.e., pairs of images and tags, are therefore being pursued. In this work, we investigate whether the tags accompanying photos on Instagram can be considered image annotation metadata. If this claim holds, Instagram could serve as a very rich source of training data for the development of AIA techniques, one that is easy to collect automatically. Our hypothesis is that Instagram hashtags, especially those provided by the photo owner/creator, express the content of a photo more accurately than the tags assigned to a photo during explicit image annotation processes such as crowdsourcing. In this context, we explore the descriptive power of hashtags by examining whether other users would use the same hashtags as the owner to annotate an image. For this purpose, 1000 Instagram images were collected, and for each image one to four hashtags, considered the most descriptive of it, were selected from the hashtags used by the photo owner. An online database was built to generate online questionnaires of 20 images each, which were distributed to experiment participants so that they could choose the most suitable hashtag for every image according to their own interpretation. Results show that, on average, 66% of the participants' hashtag choices coincide with those suggested by the photo owners; this can be claimed as initial evidence in support of our hypothesis.
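The headline figure is an agreement rate between participants' choices and the owner's hashtags. A minimal sketch of how such a rate could be computed follows; the abstract does not say whether the 66% is a per-participant mean or a pooled ratio, so this version assumes per-participant averaging, and the data layout (dictionaries keyed by image id) is hypothetical.

```python
from statistics import mean


def participant_agreement(choices, owner_tags):
    """Share of one participant's choices that match one of the owner's hashtags.

    choices: {image_id: chosen_hashtag}; owner_tags: {image_id: set of owner hashtags}.
    """
    hits = sum(1 for image, tag in choices.items() if tag in owner_tags[image])
    return hits / len(choices)


def average_agreement(all_choices, owner_tags):
    """Mean per-participant agreement across all questionnaires."""
    return mean(participant_agreement(c, owner_tags) for c in all_choices)
```

With owner tags `{"img1": {"#sunset", "#beach"}, "img2": {"#coffee"}}`, a participant who picks `#sunset` and `#coffee` scores 1.0, one who picks `#city` and `#coffee` scores 0.5, and the average is 0.75.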

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 114-129
Citations: 75
Occupancy driven building performance assessment
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.008
Dimosthenis Ioannidis, Pantelis Tropios, Stelios Krinidis, George Stavropoulos, Dimitrios Tzovaras, Spiridon Likothanasis

In this paper, we focus on building performance assessment using big data and visual analytics techniques driven by building occupancy. Occupancy is a paramount factor in building performance, specifically in lighting, plug loads and HVAC equipment utilization. Extracting patterns from big data sets that combine building information, energy consumption, environmental measurements and, notably, occupancy information is a powerful analysis technique for obtaining useful semantic information about building performance. To this end, visual analytics techniques are used to visualize these data in a compact and comprehensive way, taking into account the properties of human cognition, perception and sense-making. Visual analytics facilitates detailed spatiotemporal analysis of building performance in terms of occupant comfort, building performance and energy consumption, and exploits innovative data mining techniques and mechanisms that allow analysts to detect patterns and crucial points that would otherwise be difficult to detect, thus helping them further optimize the building's operation. The presented tool has been tested on real data acquired from a building located in southern Europe, demonstrating its effectiveness and its usability for building managers.
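As a small illustration of what "occupancy-driven" assessment can mean in numbers, the sketch below relates hourly energy use to hourly occupant counts. The two indicators (share of energy used while the building is empty, and energy per occupant-hour) and the record format are hypothetical choices made for this example; they are not taken from the presented tool.

```python
def occupancy_kpis(records):
    """Compute two illustrative occupancy-driven indicators from hourly records.

    records: iterable of (energy_kwh, occupants) pairs, one per hour.
    idle_share: fraction of energy consumed in hours with nobody present.
    kwh_per_occupant_hour: total energy divided by total occupant-hours.
    """
    records = list(records)
    total_energy = sum(e for e, _ in records)
    idle_energy = sum(e for e, n in records if n == 0)
    occupant_hours = sum(n for _, n in records)
    return {
        "idle_share": idle_energy / total_energy if total_energy else 0.0,
        "kwh_per_occupant_hour": total_energy / occupant_hours if occupant_hours else 0.0,
    }
```

For four hours of data `[(2.0, 0), (5.0, 10), (6.0, 12), (3.0, 0)]`, 5 of the 16 kWh (31.25%) are consumed with nobody present, a pattern an analyst would want a visualization to surface.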

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 57-69
Citations: 20
PEAS-LI: PEAS with Location Information for coverage in Wireless Sensor Networks
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.11.002
Rachid Beghdad, Mohamed Abdenour Hocini, Narimane Cherchour, Mourad Chelik

Probing Environment and Adaptive Sleeping (PEAS) is one of the most cited coverage protocols for Wireless Sensor Networks (WSNs) in the literature. PEAS maintains only two variables: the number of received messages N and the period of time T needed to receive these messages. Sensor nodes do not keep any information about their neighbors. In this paper, we present PEAS-LI, an extension of PEAS that improves coverage and connectivity. PEAS-LI operates in two steps: first, PEAS is applied as described in Ye et al. (2003); then, neighbors exchange their state and location information in order to estimate coverage precisely and to make their decisions based on the gathered information. The only additional requirement of PEAS-LI is that each node knows its position in the monitored area of interest (AI). The performance evaluation of PEAS-LI shows that it is a robust protocol with a high coverage ratio and that it outperforms PEAS and a set of other protocols.
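The second PEAS-LI step, estimating coverage from neighbors' positions, can be pictured with a generic grid-sampling estimate. In this sketch a node samples points inside its own sensing disk, counts how many are already within sensing range of some active neighbor, and decides to sleep when that fraction exceeds a threshold. The sensing radius, grid step and 0.95 threshold are hypothetical parameters, and this is not the decision rule published for PEAS-LI.

```python
import math


def covered_fraction(node, neighbors, rs, step=0.5):
    """Fraction of grid points in node's sensing disk that lie within rs of an active neighbor."""
    x0, y0 = node
    total = covered = 0
    n = int(rs / step)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            px, py = x0 + i * step, y0 + j * step
            if math.hypot(px - x0, py - y0) > rs:
                continue  # sample point outside this node's sensing disk
            total += 1
            if any(math.hypot(px - nx, py - ny) <= rs for nx, ny in neighbors):
                covered += 1
    return covered / total


def should_sleep(node, active_neighbors, rs, threshold=0.95):
    """Sleep only if active neighbors already cover (almost) all of this node's area."""
    return covered_fraction(node, active_neighbors, rs) >= threshold
```

A neighbor at the same position covers the whole disk, whereas a neighbor exactly one sensing radius away covers well under half of it, so the node stays awake in that case.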

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 163-171
Citations: 5
CAS-based information retrieval in semi-structured documents: CASISS model
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.11.004
Larbi Guezouli, Hassane Essafi

This paper addresses the assessment of similarity between documents or parts of documents. For this purpose, we have developed the CASISS (CAlculation of SImilarity of Semi-Structured documents) method to quantify how similar two given texts are. The method can be employed in a wide range of applications, including content reuse detection, which is a hot and challenging topic. It can also be used to increase the accuracy of the information retrieval process by taking into account not only the presence of query terms in the given document (Content Only search, CO) but also the topology (position continuity) of these terms (Content And Structure search, CAS). Tracking the origin of information in social media, copyright management, plagiarism detection, social media mining and monitoring, and digital forensics are among the other applications that require tools such as CASISS to measure the content overlap between two documents with high accuracy.

CASISS identifies the elements of semi-structured documents using element descriptors. Each semi-structured document is pre-processed before a set of element descriptors, which characterize the content of the elements, is extracted.
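The distinction between Content Only (CO) and Content And Structure (CAS) search can be illustrated on a tiny XML document. In this sketch, CO only asks whether every query term appears somewhere in the document, while the simplified CAS check also requires the terms to appear contiguously within the text of one element and reports which elements match. It uses Python's standard `xml.etree.ElementTree`; the whitespace tokenization and the matching rules are simplifying assumptions, not the CASISS descriptors or similarity measure.

```python
import xml.etree.ElementTree as ET


def tokens(text):
    """Lower-cased whitespace tokens of an element's text (None becomes empty)."""
    return (text or "").lower().split()


def co_match(root, query):
    """Content Only: every query term occurs somewhere in the document."""
    words = set()
    for el in root.iter():
        words.update(tokens(el.text))
    return all(t.lower() in words for t in query)


def cas_matches(root, query):
    """Simplified Content And Structure: tags of elements whose text holds the terms contiguously."""
    q = [t.lower() for t in query]
    hits = []
    for el in root.iter():
        toks = tokens(el.text)
        if any(toks[i:i + len(q)] == q for i in range(len(toks) - len(q) + 1)):
            hits.append(el.tag)
    return hits
```

For `<article><title>graph databases</title><body>fraud detection in graph databases</body><note>databases store graph data</note></article>`, the query `["graph", "databases"]` is a CO match and a CAS match in `title` and `body` only, whereas `["fraud", "data"]` is a CO match with no CAS match, because its terms are scattered across elements.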

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 155-162
Citations: 3
Meaning-based machine learning for information assurance
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.007
Courtney Falk, Lauren Stuart

This paper presents meaning-based machine learning: the use of semantically meaningful input data in machine learning systems in order to produce output that is meaningful to a human user, where the semantic input comes from the Ontological Semantics Technology theory of natural language processing. The paper describes how to bridge knowledge-based natural language processing architectures and traditional machine learning systems, including high-level descriptions of the steps taken. These meaning-based machine learning systems are then applied to unsolved problems in information assurance and security that involve large amounts of natural language text.
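One way to picture the bridge from semantic analysis to conventional machine learning is to turn text into counts of concepts rather than words, so that different wordings with the same meaning yield the same features. The sketch below uses a tiny hypothetical word-to-concept lexicon as a stand-in for an ontology-backed analyzer; the concept names are illustrative and are not taken from Ontological Semantics Technology.

```python
from collections import Counter

# Hypothetical word-to-concept lexicon standing in for an ontology-backed
# semantic analyzer; the concept names are illustrative only.
LEXICON = {
    "password": "CREDENTIAL", "passcode": "CREDENTIAL", "pin": "CREDENTIAL",
    "phish": "THEFT", "steal": "THEFT", "harvest": "THEFT",
    "bank": "FINANCIAL-INSTITUTION", "lender": "FINANCIAL-INSTITUTION",
}


def concept_features(text):
    """Map a text to bag-of-concepts counts, ignoring words the lexicon does not know."""
    return Counter(LEXICON[w] for w in text.lower().split() if w in LEXICON)
```

"Phish the bank password" and "harvest lender passcode" share no content words, yet both map to one CREDENTIAL, one THEFT and one FINANCIAL-INSTITUTION concept; such counts can then be fed to any conventional classifier.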

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 141-147
Citations: 2
An enhanced Graph Analytics Platform (GAP) providing insight in Big Network Data
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.005
Anastasios Drosou, Ilias Kalamaras, Stavros Papadopoulos, Dimitrios Tzovaras

Network graphs, a widely adopted and acknowledged practice for representing inter- and intra-dependent information streams, are nowadays growing vast in both size and complexity due to the rapid expansion of the sources, types and amounts of produced data. In this context, the efficient processing of large amounts of information, also known as Big Data, poses a major challenge for both the research community and a wide variety of industrial sectors, including security, health and financial applications. To serve these emerging needs, this paper presents a Graph Analytics based Platform (GAP) that implements a top-down approach to facilitating Data Mining processes by incorporating state-of-the-art techniques such as behavioural clustering, interactive visualizations and multi-objective optimization. The applicability of this platform is validated on two distinct real-world use cases, which can be considered characteristic examples of modern Big Data problems because of the vast amount of information they deal with: (i) the root cause analysis of a Denial of Service attack in the network of a mobile operator and (ii) the early detection of an emerging event or a hot topic in social media communities. To handle the large volume of data, the proposed application starts with an aggregated overview of the whole network and allows the operator to gradually focus on smaller subsets of the data, using different levels of abstraction. The proposed platform distinguishes between different user behaviors, enabling the analyst to gain insight into the network's operation and to extract meaningful information effortlessly. Dynamic hypothesis formulation techniques based on graph traversal and pattern mining enable the analyst to set concrete network-related hypotheses and to validate or reject them accordingly.
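The top-down workflow (start from an aggregated overview, then focus on a smaller part) can be sketched as two graph operations: collapsing nodes into groups with weighted group-to-group edges, and drilling into the node-level edges of one group. The group assignment is assumed to be given, for example by a prior behavioural clustering step; the function names and data layout are hypothetical and are not GAP's API.

```python
from collections import Counter


def overview(edges, group_of):
    """Collapse node-level edges into weighted edges between groups (self-pairs = intra-group traffic)."""
    return Counter((group_of[u], group_of[v]) for u, v in edges)


def drill_down(edges, group_of, group):
    """Return the node-level edges that touch the chosen group."""
    return [(u, v) for u, v in edges if group in (group_of[u], group_of[v])]
```

With edges `a→b, a→c, c→d, d→e, b→e` and groups `G1 = {a, b}`, `G2 = {c, d, e}`, the overview has three weighted links, `(G1, G1): 1`, `(G1, G2): 2` and `(G2, G2): 2`, and drilling into `G1` returns the three edges that involve `a` or `b`.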

Journal of Innovation in Digital Ecosystems, Volume 3, Issue 2, Pages 83-97
Citations: 20
Using neural networks to aid CVSS risk aggregation — An empirically validated approach
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.002
Alexander Beck, Stefan Rass

Managing risks in large information infrastructures often requires an inevitable simplification of the system to make risk analysis feasible. One common way of “compacting” matters for efficient decision making is to aggregate the vulnerabilities and risks identified for distinct components into an overall risk measure for an entire subsystem and for the system as a whole. Traditionally, this aggregation is done pessimistically: the overall risk is taken as the maximum of all individual risks, following the heuristic that the “security chain” is only as strong as its weakest link. Because that method discards much of the available information, this work proposes a new approach that uses neural networks to emulate the decision making of human experts in the same setting. To validate the concept, we conducted an empirical study of human experts’ risk assessments and trained several candidate networks on the empirical data to identify the best approximation to the opinions of our expert group.
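The Python sketch below contrasts the traditional maximum rule with a learned aggregator. Several parts are hypothetical:

- The one-hidden-layer network (`TinyAggregator`).
- Its plain gradient-descent training.
- The worst-first sorting of inputs.
- The made-up expert ratings in `TOY_DATA`.

The abstract does not describe the candidate networks, their inputs or their training data, so none of these choices should be read as the authors' method.

```python
import math
import random
from typing import List, Sequence, Tuple


def max_aggregate(scores: Sequence[float]) -> float:
    """Traditional pessimistic rule: the system is as risky as its worst component."""
    return max(scores)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyAggregator:
    """Hypothetical one-hidden-layer network: n CVSS scores (0-10) -> aggregate in [0, 10].

    Illustration only; not one of the candidate networks evaluated in the paper.
    """

    def __init__(self, n_inputs: int, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)  # fixed seed for reproducible initial weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    @staticmethod
    def _scale(scores: Sequence[float]) -> List[float]:
        # Sort worst-first so component order does not matter; map the 0-10 range to 0-1.
        return [s / 10.0 for s in sorted(scores, reverse=True)]

    def _forward(self, xs: List[float]) -> Tuple[List[float], float]:
        h = [_sigmoid(sum(w * x for w, x in zip(row, xs)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = _sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, scores: Sequence[float]) -> float:
        return 10.0 * self._forward(self._scale(scores))[1]

    def mean_loss(self, data: Sequence[Tuple[Sequence[float], float]]) -> float:
        """Mean squared error measured on the 0-1 scale."""
        return sum((self.predict(x) / 10.0 - t / 10.0) ** 2 for x, t in data) / len(data)

    def train(self, data: Sequence[Tuple[Sequence[float], float]],
              epochs: int = 1000, lr: float = 0.5) -> float:
        """Per-sample gradient descent on squared error; returns the final mean loss."""
        for _ in range(epochs):
            for scores, target in data:
                xs = self._scale(scores)
                h, y = self._forward(xs)
                # Gradient of the loss w.r.t. the output unit's pre-activation.
                dy = 2.0 * (y - target / 10.0) * y * (1.0 - y)
                for j, hj in enumerate(h):
                    dh = dy * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
                    self.w2[j] -= lr * dy * hj
                    for i, xi in enumerate(xs):
                        self.w1[j][i] -= lr * dh * xi
                    self.b1[j] -= lr * dh
                self.b2 -= lr * dy
        return self.mean_loss(data)


# Made-up (component scores, expert rating) pairs for illustration; not data from the study.
TOY_DATA = [
    ([9.8, 2.0, 1.5], 7.0),
    ([5.0, 5.0, 5.0], 6.5),
    ([3.1, 2.2, 1.0], 2.5),
    ([7.5, 7.2, 6.8], 8.5),
]

if __name__ == "__main__":
    net = TinyAggregator(n_inputs=3)
    print(max_aggregate([9.8, 2.0, 1.5]))
    print(net.train(TOY_DATA))
    print(net.predict([9.8, 2.0, 1.5]))
```

The maximum rule cannot tell apart two systems whose worst component scores match, whatever their other components look like. A learned aggregator can weigh the whole profile of scores. Sorting the inputs worst-first is one simple way to make a fixed-size network ignore component order. It also assumes every system has the same number of components.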

Journal of Innovation in Digital Ecosystems, Vol. 3, No. 2, pp. 148-154, December 2016. DOI: 10.1016/j.jides.2016.10.002
Citations: 11
The importance of socio-technical resources for software ecosystems management
Pub Date : 2016-12-01 DOI: 10.1016/j.jides.2016.10.006
Thaiana Lima, Rodrigo Pereira dos Santos, Jonice Oliveira, Cláudia Werner

A Software Ecosystem (SECO) is often understood as a set of actors interacting among themselves and manipulating artifacts with the support of a common technology platform. SECO approaches are usually designed as an environment whose component repository brings together stakeholders as well as software products and components. As actors manipulate software artifacts, a technical network emerges from their interactions over the component repository, for example when they reuse artifacts, improve code quality, or download, sell and buy components. Technical repositories are essential for storing a SECO’s artifacts. However, the interaction among actors in an emerging social network is a key factor in strengthening the SECO, because it increases actors’ participation, e.g., developing new software, reporting bugs and communicating with suppliers. In the SECO context, both internal and external actors keep the platform’s components updated and documented. They even provide requirements and suggestions for new releases and bug fixes. However, those repositories often lack resources to support actors’ relationships. As a result, they miss the chance to improve reuse processes by stimulating actors’ interactions and information exchange, and by fostering a better understanding of how actors manipulate artifacts. In this paper, we investigate SECOs as component repositories that include socio-technical resources. To this end, we present a survey, initially focused on the Brazilian scenario, that allowed us to identify the relevance of each resource for a SECO based on component repositories. This paper also describes the analysis of the data collected in that survey. Information on other SECO elements extracted from the data is also presented, e.g., the participants’ profiles and how they behave within a SECO.
As our research evolved, we also conducted a study with experts in collaborative development. It evaluated the availability and use of such resources on two platforms, in order to analyze how the most relevant resources are used in real SECO platforms. We conclude that socio-technical resources have aided three things:
- collaboration in software development for SECOs;
- the coordination of teams based on better knowledge of actors’ tasks and interactions;
- the monitoring of SECO platform quality through the orchestration of contributions developed by external actors.
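The small Python sketch below illustrates two networks: the technical network that emerges from repository interactions, and the social network among actors who share artifacts. It assumes a hypothetical interaction log of `(actor, component, action)` records. It links two actors whenever both acted on the same component. This is an illustrative projection only, not the socio-technical resources or the analysis used in the survey.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log record format: (actor, component, action).
Interaction = Tuple[str, str, str]


def technical_network(log: Iterable[Interaction]) -> Dict[str, Set[str]]:
    """Map each component to the set of actors who acted on it (download, reuse, bug report...)."""
    net: Dict[str, Set[str]] = defaultdict(set)
    for actor, component, _action in log:
        net[component].add(actor)
    return dict(net)


def social_network(tech: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Link every pair of actors that share a component; weight = number of shared components."""
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for actors in tech.values():
        for a, b in combinations(sorted(actors), 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    # Hypothetical actors and components, for illustration only.
    log = [
        ("ana", "parser", "download"),
        ("bruno", "parser", "report_bug"),
        ("ana", "ui-kit", "reuse"),
        ("carla", "ui-kit", "commit"),
        ("bruno", "ui-kit", "download"),
    ]
    print(social_network(technical_network(log)))
```

Edge weights give a rough signal of which actors already collaborate around shared artifacts. Resources such as forums or contributor profiles could then build on that signal.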

Journal of Innovation in Digital Ecosystems, Vol. 3, No. 2, pp. 98-113, December 2016. DOI: 10.1016/j.jides.2016.10.006
引用次数: 15