
International Journal of Data Warehousing and Mining: Latest Articles

Concept of Temporal Pretopology for the Analysis for Structural Changes: Application to Econometrics
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-04-01 | DOI: 10.4018/ijdwm.298004
Nazha Selmaoui-Folcher, Jannaï Tokotoko, Samuel Gorohouna, Laïsa Roi, C. Leschi, Catherine Ris
Pretopology is a mathematical model obtained by weakening the axioms of topology. It was first used in the economic, social and biological sciences, then in pattern recognition and image analysis, and more recently in the analysis of complex networks. Pretopology provides a mathematical framework with weak properties, and its non-idempotent operator, called pseudo-closure, lends itself to iterative algorithms. Its formalism generalizes graph-theoretic concepts and allows problems to be modeled in a general way. In this paper, the authors extend this model to analyze complex data with spatiotemporal dimensions. They define the notion of a temporal pretopology based on a temporal function, give an example of a temporal function based on a binary relation, and construct the corresponding temporal pretopology. They define two new notions of temporal substructures that aim to represent the evolution of substructures, and propose algorithms to extract them. They evaluate the proposal on two datasets and on two real-world economic datasets.
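To make the role of the non-idempotent pseudo-closure concrete, here is a minimal sketch in Python. It assumes one common way of deriving a pseudo-closure from a binary relation (an element joins the set as soon as one of its related elements is already in it); this is an illustration only and not necessarily the construction used by the authors. The names `pseudo_closure`, `closure` and the `neighbors` relation are hypothetical.

```python
def pseudo_closure(subset, neighbors):
    """One application of the operator: keep the subset and add every
    element that has at least one related element inside the subset."""
    subset = frozenset(subset)
    return subset | {x for x, related in neighbors.items() if related & subset}


def closure(subset, neighbors):
    """Iterate the pseudo-closure until it stops growing.

    Returns the stable set and the number of applications that changed it.
    """
    current = frozenset(subset)
    steps = 0
    while True:
        expanded = pseudo_closure(current, neighbors)
        if expanded == current:
            return current, steps
        current = expanded
        steps += 1


# Hypothetical relation: each element maps to the elements it is related to.
neighbors = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": set(), "e": set()}

one_step = pseudo_closure({"d"}, neighbors)
stable, steps = closure({"d"}, neighbors)
print(sorted(one_step), sorted(stable), steps)
```

Because a single application only reaches the immediate neighborhood, applying the operator twice can give a larger set than applying it once. That lack of idempotence is what makes step-by-step expansion algorithms possible.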
Citations: 0
Efficient Open Domain Question Answering With Delayed Attention in Transformer-Based Models
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-04-01 | DOI: 10.4018/ijdwm.298005
Wissam Siblini, Mohamed Challal, Charlotte Pasqual
Open Domain Question Answering (ODQA) on a large-scale corpus of documents (e.g. Wikipedia) is a key challenge in computer science. Although Transformer-based language models such as BERT have been shown to outperform humans at extracting answers from small pre-selected passages of text, their high complexity becomes a problem when the search space is much larger. The most common way to deal with this problem is to add a preliminary information retrieval step that strongly filters the corpus and keeps only the relevant passages. In this article, the authors consider a more direct and complementary solution, which consists of restricting the attention mechanism in Transformer-based models to allow more efficient management of computations. The resulting variants are competitive with the original models on the extractive task and, in the ODQA setting, significantly accelerate predictions and sometimes even improve the quality of the response.
Citations: 0
Boat Detection in Marina Using Time-Delay Analysis and Deep Learning
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-04-01 | DOI: 10.4018/ijdwm.298006
Romane Scherrer, Erwan Aulnette, T. Quiniou, J. Kasarhérou, Pierre Kolb, Nazha Selmaoui-Folcher
An autonomous acoustic system based on two bottom-moored hydrophones, a two-input audio board and a small single-board computer was installed at the entrance of a marina to detect boats entering or exiting. The system calculates windowed time-lagged cross-correlations to find the consecutive time delays between the hydrophone signals and to compute a signal that is a function of the boats' angular trajectories. Since its installation, the single-board computer has performed online prediction with a signal-processing-based algorithm that achieved an accuracy of 80%. To improve system performance, a convolutional neural network (CNN) is trained on the acquired data to perform real-time detection. Two classification tasks (binary and multiclass) were considered to detect both a boat and its direction of navigation. Finally, a trained CNN was implemented on a single-board computer to ensure that prediction can be performed in real time.
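The time-delay step can be illustrated with a small pure-Python sketch. It assumes two equally sampled signals and an unnormalized cross-correlation, and it picks, for each window, the lag with the highest score. The function names, window sizes and toy signals are hypothetical and are not taken from the paper.

```python
def best_lag(x, y, max_lag):
    """Return the lag (in samples) at which y best aligns with x.

    A positive lag means the event appears later in y than in x.
    The score is the unnormalized cross-correlation sum of x[i] * y[i + lag].
    """
    n = len(x)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best


def windowed_delays(x, y, window, step, max_lag):
    """Estimate one delay per window, sliding over both signals."""
    return [
        best_lag(x[start:start + window], y[start:start + window], max_lag)
        for start in range(0, len(x) - window + 1, step)
    ]


# Toy signals: one pulse per window, shifted differently in each window.
x = [0.0] * 32
y = [0.0] * 32
x[10], y[13] = 1.0, 1.0   # first window: y lags x by 3 samples
x[20], y[19] = 1.0, 1.0   # second window: y leads x by 1 sample

delays = windowed_delays(x, y, window=16, step=16, max_lag=5)
print(delays)
```

The sequence of per-window delays is the kind of signal that changes as a source moves relative to the two sensors. A real system would typically normalize the correlation and convert lags to seconds using the sampling rate.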
Citations: 0
Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-04-01 | DOI: 10.4018/ijdwm.298007
Erwan Schild, Gautier Durantin, Jean-Charles Lamirel, F. Miconi
Chatbots represent a promising tool to automate the processing of requests in a business context. However, despite major progress in natural language processing technologies, constructing a dataset deemed relevant by business experts is a manual, iterative and error-prone process. To assist these experts during modelling and labelling, the authors propose an active learning methodology called Interactive Clustering. It relies on interactions between computer-guided segmentation of data into intents and response-driven human annotations that impose constraints on clusters to improve relevance. This article applies Interactive Clustering to a realistic dataset and identifies the settings required to reach a relevant segmentation with a minimal number of annotations. The usability of the method is discussed in terms of computation time and of the compromise achieved between business relevance and classification performance during training. In this context, Interactive Clustering appears to be a suitable methodology that combines human and computer initiatives to efficiently develop a usable chatbot.
Citations: 1
Large-Scale System for Social Media Data Warehousing: The Case of Twitter-Related Drug Abuse Events Integration
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.290890
Ferdaous Jenhani, M. Gouider
Social media data have become an integral part of business data and should be integrated into the decision process, so that decisions rest on information that better reflects the true situation of a business in any field. However, social media data are unstructured and generated at a very high frequency, which exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area whose heart is a large-scale system implementing an information extraction process using the Storm and Hadoop frameworks to better manage the volume and frequency of the data. For structured information extraction, mainly of events, we combine a set of techniques from NLP, linguistic rules and machine learning. Finally, we propose a suitable data warehouse conceptual model for event modeling and integration with the enterprise data warehouse using an intermediate table called a Bridge table. For application and experiments, we focus on extracting drug abuse events from Twitter data and modeling them in the Event Data Warehouse.
Citations: 1
A Stock Trading Expert System Established by the CNN-GA-Based Collaborative System
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.309957
J. Wu, Lingyun Sun, Gautam Srivastava, Vicente García Díaz, Jerry Chun‐wei Lin
This article uses a new convolutional neural network framework that performs well for time series feature extraction and stock price prediction. The method is called the stock sequence array convolutional neural network, or SSACNN for short. SSACNN collects data on leading indicators, including historical prices and the associated futures and options, and uses arrays as the input map of the CNN framework. In the financial market, every number has a logic behind it. Leading indicators such as futures and options can reflect changes in many markets, such as an industry's prosperity, so adding leading-indicator data helps predict the trend of stock prices. This study takes the stock markets of the United States and Taiwan as its research objects, uses historical data, futures, and options as data sets to predict stock prices in these two markets, and then uses genetic algorithms to find trading signals, yielding a stock trading system. The experimental results show that the stock trading system proposed in this research can help investors obtain certain returns.
Citations: 2
Semi-Supervised Sentiment Classification on E-Commerce Reviews Using Tripartite Graph and Clustering
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.307904
Xin Lu, Donghong Gu, Haolan Zhang, Zhengxin Song, Qianhua Cai, Hongya Zhao, Haiming Wu
Sentiment classification is an important topic in the field of Natural Language Processing, whose main purpose is to extract the sentiment polarity from unstructured texts. The label propagation algorithm, a semi-supervised learning method, has been widely used in sentiment classification because it describes the relations between samples as a graph. However, current graph construction strategies fail to use the global distribution and cannot properly handle polysemy and synonymy. In this paper, a semi-supervised learning methodology that integrates a tripartite graph with clustering is proposed for graph construction. Experiments on e-commerce reviews demonstrate that the proposed method outperforms baseline methods overall and enables precise sentiment classification with few labeled samples.
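As background for the label propagation idea mentioned in the abstract, the following is a minimal generic sketch in Python. It works on a plain similarity graph of reviews, not on the tripartite graph proposed in the paper: a few labeled nodes are clamped to their polarity and every unlabeled node repeatedly takes the average score of its neighbors. The function name, the review identifiers and the toy graph are hypothetical.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Spread seed polarities (+1.0 / -1.0) over an undirected graph.

    Seed nodes stay fixed; each other node is repeatedly set to the mean
    score of its neighbors. Returns a dict mapping node to final score.
    """
    nodes = sorted({node for edge in edges for node in edge})
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    scores = {node: seeds.get(node, 0.0) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                updated[node] = seeds[node]          # labeled sample: clamp
            else:
                linked = adjacency[node]
                updated[node] = sum(scores[m] for m in linked) / len(linked)
        scores = updated
    return scores


# Toy chain of four reviews; only the two ends carry a label.
edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
seeds = {"r1": 1.0, "r4": -1.0}

scores = propagate_labels(edges, seeds)
predicted = {node: ("positive" if s > 0 else "negative") for node, s in scores.items()}
print(predicted)
```

The quality of such a scheme depends almost entirely on how the graph is built, which is the part the paper addresses.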
Citations: 2
A New Approach for Fairness Increment of Consensus-Driven Group Recommender Systems Based on Choquet Integral
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.290891
Cu Nguyen Giap, Nguyen Nhu Son, Long Giang Nguyen, Hoang Thi Minh Chau, Tran Manh Tuan, Le Hoang Son
Recent years have witnessed the rise of group recommender systems (GRSs) in e-commerce and tourism applications such as Booking.com, Traveloka.com and Amazon. One of the main concerns in GRSs is guaranteeing fairness between the users in a group, which is the aim of the so-called consensus-driven group recommender system. This paper proposes a new, flexible alternative that embeds a fuzzy measure into the aggregation operators of the consensus process to improve the fairness of group recommendation and to account for group member interaction. The Choquet integral is used to build a fuzzy measure based on group member interactions and to seek a fairer recommendation. The empirical results on benchmark datasets show the incremental advances of the proposal in dealing with group member interactions and with the issue of fairness in consensus-driven GRSs.
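For readers unfamiliar with the Choquet integral, here is a small Python sketch of the standard discrete form: scores are sorted in ascending order, and each increment is weighted by the fuzzy measure of the coalition of members whose score is at least that high. The two-member group, the ratings and the measure values are hypothetical and chosen only for illustration; how the paper derives its measure from member interactions is not reproduced here.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` with respect to `measure`.

    scores  : dict mapping member to a non-negative score
    measure : dict mapping frozenset of members to a weight in [0, 1],
              with the full group mapped to 1.0
    """
    order = sorted(scores, key=scores.get)      # ascending by score
    total, previous = 0.0, 0.0
    for index, member in enumerate(order):
        coalition = frozenset(order[index:])    # members scoring at least this much
        total += (scores[member] - previous) * measure[coalition]
        previous = scores[member]
    return total


# Hypothetical two-member group rating one item.
ratings = {"u1": 2.0, "u2": 4.0}
measure = {
    frozenset({"u1"}): 0.25,
    frozenset({"u2"}): 0.25,
    frozenset({"u1", "u2"}): 1.0,
}

group_score = choquet_integral(ratings, measure)
print(group_score)
```

With an additive measure (0.5 for each member) the result would reduce to the plain average of 3.0. Giving single members less weight than their share, as above, pulls the group score toward the lowest rating, which is one way a non-additive measure can encode interaction between members.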
Citations: 2
Density-Based Spatial Anomalous Window Discovery
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.299015
Prerna Mohod, V. Janeja
The focus of this paper is to identify anomalous spatial windows using clustering-based methods. Spatial anomalous windows are contiguous groupings of spatial nodes that are unusual with respect to the rest of the data. Many scan-statistics-based approaches have been proposed for identifying spatial anomalous windows, while clustering techniques have been proposed for identifying groups of points that behave similarly. There are parallels between the two types of approaches, but they have not been used interchangeably. Our work bridges this gap: specifically, we combine the circular scan statistic approach with DBSCAN (Density-Based Spatial Clustering of Applications with Noise). We present experimental results on US crime data. Our results show that our approach is effective in identifying spatial anomalous windows, performs as well as or better than existing techniques, and does better than pure clustering.
Citations: 0
Crime Analyses Using Data Analytics
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.299014
Thanu Dayara, F. Thabtah, Hussein Abdel-jaber, S. Zeidan
One potential approach for crime analysis that has shown promising results is data analytics, particularly descriptive and predictive techniques. Data analytics can explore former criminal incidents seeking hidden correlations and patterns, which potentially could be used in crime prevention and resource management. The purpose of this research is to build a crime analysis model using supervised techniques to predict the arrest status of serious crimes in Chicago. This is based on specific indicators, such as timeframe, location in terms of district, community, and beat, and crime type among others. We used time series and clustering techniques to help us identify influential features. Supervised machine learning algorithms then modelled the subset of features against incidents related to battery and assaults in specific timeframes and locations to predict the arrest status response variable. The models derived from Naïve Bayes, Decision Tree, and Support Vector Machine (SVM) algorithms reveal a high predictive accuracy rate at certain times in some communities within Chicago.
Citations: 0