
International Journal of Data Warehousing and Mining: Latest Articles

RETAD: Vehicle Trajectory Anomaly Detection Based on Reconstruction Error
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2023-01-13; DOI: 10.4018/ijdwm.316460
Chaoneng Li, Guanwen Feng, Yiran Jia, Yunan Li, Jian Ji, Qiguang Miao
Due to the rapid advancement of wireless sensor and location technologies, a large amount of mobile agent trajectory data has become available. Intelligent city systems and video surveillance both benefit from trajectory anomaly detection. The authors propose RETAD, an unsupervised, reconstruction error-based trajectory anomaly detection method for vehicles, to address the shortcomings of conventional anomaly detection methods: features are difficult to extract, models are prone to overfitting, and detection performance is poor. RETAD reconstructs the original vehicle trajectories with an autoencoder based on recurrent neural networks. By minimizing the gap between the reconstructions and the original inputs, the model learns the movement patterns of normal trajectories. Trajectories whose reconstruction error exceeds an anomaly threshold are classified as anomalous. Experimental results show that RETAD outperforms traditional distance-based, density-based, and machine learning classification algorithms on multiple metrics.
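The thresholding rule at the end of this pipeline can be sketched in a few lines. This is a minimal, hypothetical illustration rather than the authors' code: it assumes each trajectory is a list of (x, y) points, takes reconstructions as already produced by some autoencoder (the RNN itself is omitted), measures reconstruction error as the mean squared point distance, and sets the threshold to the mean plus k standard deviations of the errors on normal training trajectories. The abstract does not say how RETAD chooses its threshold, so `fit_threshold` and its `k` parameter are assumptions.

```python
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) coordinates per time step


def reconstruction_error(original: Sequence[Point], reconstructed: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between corresponding points."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (x0, y0), (x1, y1) in zip(original, reconstructed):
        total += (x0 - x1) ** 2 + (y0 - y1) ** 2
    return total / len(original)


def fit_threshold(normal_errors: Sequence[float], k: float = 3.0) -> float:
    """Assumed rule: mean plus k population standard deviations of errors on normal data."""
    return statistics.fmean(normal_errors) + k * statistics.pstdev(normal_errors)


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[bool]:
    """True where the reconstruction error exceeds the anomaly threshold."""
    return [e > threshold for e in errors]


# Example with made-up error values (not results from the paper):
threshold = fit_threshold([0.10, 0.12, 0.09, 0.11, 0.10])
print(flag_anomalies([0.10, 0.50], threshold))  # [False, True]
```

A percentile of the training errors is an equally common threshold choice; either way, the threshold should be fit on trajectories believed to be normal.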
Vol. 19, issue 1, pp. 1-14.
Citations: 0
Personal Health and Illness Management and the Future Vision of Biomedical Clothing Based on WSN
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2023-01-01; DOI: 10.4018/ijdwm.316126
Ge Zhang, Zubin Ning
It is essential to have a fast, reliable, and energy-efficient connection between wireless sensor networks (WSNs). Control specifications, networking layers, media access control, and physical layers should be optimised or co-designed. Health insurance will become more expensive for individuals with lower incomes. There are privacy and cybersecurity issues, an increased risk of malpractice lawsuits, and higher costs in both time and money for doctors and patients. In this paper, personal health biomedical clothing based on wireless sensor networks (PH-BC-WSN) is used to enhance access to quality health care, boost food production through precision agriculture, and improve the quality of human resources. The Internet of Things enables the creation of more efficient healthcare and medical asset monitoring systems. Threats including medical data eavesdropping, data manipulation, fabricated warnings, denial of service, user location tracking, physical interference with devices, and electromagnetic attacks are discussed extensively.
Vol. 12, issue 1, pp. 1-21.
Citations: 0
Loan Default Prediction Based on Convolutional Neural Network and LightGBM
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2023-01-01; DOI: 10.4018/ijdwm.315823
Q. Zhu, Wenhao Ding, Mingsen Xiang, M. Hu, Ning Zhang
As people's consumption patterns change, credit consumption has gradually become a new consumption trend. Frequent loan defaults have drawn increasing attention to default prediction. This paper proposes a new comprehensive loan default prediction method that combines a convolutional neural network (CNN) with the LightGBM algorithm. First, the strong feature extraction ability of the CNN is used to extract features from the original loan data and generate a new feature matrix. Second, with the new feature matrix as input, the parameters of the LightGBM algorithm are tuned through grid search to build the LightGBM model. Finally, the LightGBM model is trained on the new feature matrix, yielding the CNN-LightGBM loan default prediction model. To verify the effectiveness and superiority of the model, a series of experiments compared it with four classical models. The results show that the CNN-LightGBM model outperforms the other models on all evaluation metrics.
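The grid-search step can be illustrated generically. In this sketch, `grid_search` exhaustively tries every parameter combination and keeps the best score; `toy_evaluate` is a hypothetical stand-in for "train LightGBM on the CNN feature matrix with these parameters and return a validation score". `num_leaves` and `learning_rate` are real LightGBM parameter names, but the grid values and the scoring function are made up and are not the paper's settings.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    grid: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in `grid`; return the best parameters and score (higher is better)."""
    names = sorted(grid)  # fixed iteration order for reproducibility
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params: Dict[str, Any]) -> float:
    """Hypothetical scorer that peaks at num_leaves=31, learning_rate=0.05."""
    return -abs(params["num_leaves"] - 31) - 100 * abs(params["learning_rate"] - 0.05)


best, score = grid_search(
    {"num_leaves": [15, 31, 63], "learning_rate": [0.01, 0.05, 0.1]},
    toy_evaluate,
)
print(best)  # {'learning_rate': 0.05, 'num_leaves': 31}
```

Exhaustive search grows multiplicatively with each added parameter, which is why grids are usually kept small or replaced by random search when many parameters are tuned.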
Vol. 24, issue 1, pp. 1-16.
Citations: 2
Top-K Pseudo Labeling for Semi-Supervised Image Classification
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2022-12-30; DOI: 10.4018/ijdwm.316150
Yi Jiang, Hui Sun
In this paper, a top-k pseudo labeling method for semi-supervised self-learning is proposed. Pseudo labeling is a key technique in semi-supervised self-learning: the quality of the generated pseudo labels largely determines the convergence of the neural network and the accuracy it achieves. The authors use a method called top-k pseudo labeling to generate pseudo labels while training a semi-supervised neural network model. The proposed labeling method substantially helps the model learn features from unlabeled data. It is easy to implement and relies only on the neural network's predictions and a hyperparameter k. Experimental results show that the method performs well for semi-supervised learning on the CIFAR-10 and CIFAR-100 datasets. A variant of top-k labeling for supervised learning, named top-k regulation, is also proposed; experimental results show that various models achieve higher test-set accuracy when trained with top-k regulation.
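The abstract does not define top-k pseudo labeling precisely. One plausible reading, sketched below purely as an assumption, is to keep the network's k most probable classes for an unlabeled sample, zero out the rest, and renormalize, so that k = 1 recovers an ordinary hard pseudo label. The function name `top_k_pseudo_label` is hypothetical.

```python
from typing import List, Sequence


def top_k_pseudo_label(probs: Sequence[float], k: int) -> List[float]:
    """Hypothetical top-k pseudo label: keep the k largest class probabilities,
    set the others to zero, and renormalize so the label sums to 1."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the number of classes")
    # Indices of the k most probable classes; ties go to the lower class index.
    top = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    mass = sum(probs[i] for i in top)
    if mass <= 0:
        raise ValueError("the top-k probabilities must have positive total mass")
    keep = set(top)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


softmax_output = [0.5, 0.3, 0.2]  # made-up network prediction for one unlabeled image
print(top_k_pseudo_label(softmax_output, k=1))  # [1.0, 0.0, 0.0]
print(top_k_pseudo_label(softmax_output, k=2))  # roughly [0.625, 0.375, 0.0]
```

Under this reading, larger k keeps more of the network's uncertainty in the target, while k = 1 commits fully to the top prediction.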
Vol. 12, issue 1, pp. 1-18.
Citations: 0
A Method for Generating Comparison Tables From the Semantic Web
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2022-04-01; DOI: 10.4018/ijdwm.298008
A. Giacometti, Béatrice Bouchou-Markhoff, Arnaud Soulet
This paper presents Versus, the first automatic method for generating comparison tables from Semantic Web knowledge bases. To this end, it introduces the contextual reference level, which evaluates whether a feature is relevant for comparing a set of entities. This measure relies on contexts, which are sets of entities similar to the compared entities. Its principle is to favor features whose values for the compared entities are reference (that is, frequent) values in these contexts. The method efficiently evaluates the contextual reference level from a public SPARQL endpoint constrained by a fair-use policy. Using a new benchmark based on Wikidata, the experiments show the value of the contextual reference level for identifying the features that users deem relevant, with high precision and recall. In addition, the proposed optimizations significantly reduce the number of queries required for properties as well as for inverse relations. Interestingly, the experimental study also shows that inverse relations yield a large number of numerical comparison features.
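To make the stated principle concrete (favor features whose values for the compared entities are frequent in the context), here is a toy frequency score. It is an illustrative proxy only, not the contextual reference level as defined in the paper: entities are assumed to be flat dictionaries of feature values, the context is passed in explicitly instead of being retrieved from a SPARQL endpoint, and `reference_score` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Hashable, List

Entity = Dict[str, Hashable]  # assumed flat representation: feature name -> value


def reference_score(feature: str, compared: List[Entity], context: List[Entity]) -> float:
    """Toy proxy: average relative frequency, among context entities, of the values
    that the compared entities take for `feature` (0.0 if the feature is absent)."""
    counts = Counter(e[feature] for e in context if feature in e)
    total = sum(counts.values())
    values = [e[feature] for e in compared if feature in e]
    if total == 0 or not values:
        return 0.0
    return sum(counts[v] / total for v in values) / len(values)


# Hypothetical context: entities similar to the two compared cities.
context = [
    {"type": "city", "country": "FR"},
    {"type": "city", "country": "FR"},
    {"type": "city", "country": "DE"},
    {"type": "city"},
]
compared = [{"type": "city", "country": "FR"}, {"type": "city", "country": "DE"}]
print(reference_score("type", compared, context))     # 1.0
print(reference_score("country", compared, context))  # about 0.5
```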
Vol. 37, issue 1, pp. 1-20.
Citations: 0
Efficient Open Domain Question Answering With Delayed Attention in Transformer-Based Models
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2022-04-01; DOI: 10.4018/ijdwm.298005
Wissam Siblini, Mohamed Challal, Charlotte Pasqual
Open Domain Question Answering (ODQA) over a large-scale document corpus (e.g., Wikipedia) is a key challenge in computer science. Although Transformer-based language models such as BERT have shown an ability to outperform humans at extracting answers from small, pre-selected text passages, their high complexity becomes a problem when the search space is much larger. The most common way to deal with this problem is to add a preliminary information retrieval step that aggressively filters the corpus and keeps only the relevant passages. In this article, the authors consider a more direct and complementary solution: restricting the attention mechanism in Transformer-based models so that computations can be managed more efficiently. The resulting variants are competitive with the original models on the extractive task and, in the ODQA setting, significantly accelerate prediction, sometimes even improving answer quality.
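One way to restrict attention, sketched here as an assumption rather than the authors' exact architecture, is to forbid question and passage tokens from attending to each other in the lower layers and to allow full attention only from a chosen layer onward. The function `attention_mask` and its `delay` parameter are hypothetical; the mask uses True for allowed query-key pairs.

```python
from typing import List, Sequence


def attention_mask(segment_ids: Sequence[int], layer: int, delay: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Hypothetical delayed-attention rule: in layers below `delay`, tokens attend only
    within their own segment (e.g. 0 = question, 1 = passage); from layer `delay`
    onward, attention is unrestricted.
    """
    n = len(segment_ids)
    if layer >= delay:
        return [[True] * n for _ in range(n)]
    return [[segment_ids[i] == segment_ids[j] for j in range(n)] for i in range(n)]


segments = [0, 0, 1]  # two question tokens, one passage token
print(attention_mask(segments, layer=0, delay=2))
# [[True, True, False], [True, True, False], [False, False, True]]
print(attention_mask(segments, layer=2, delay=2))
# [[True, True, True], [True, True, True], [True, True, True]]
```

Because passage tokens never attend to the question below layer `delay`, their representations in those layers do not depend on the question, which is what would allow them to be computed once per passage and reused across questions.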
Vol. 45, issue 1, pp. 1-16.
Citations: 0
Concept of Temporal Pretopology for the Analysis for Structural Changes: Application to Econometrics
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2022-04-01; DOI: 10.4018/ijdwm.298004
Nazha Selmaoui-Folcher, Jannaï Tokotoko, Samuel Gorohouna, Laïsa Roi, C. Leschi, Catherine Ris
Pretopology is a mathematical model obtained by weakening the axioms of topology. It was first used in the economic, social, and biological sciences, then in pattern recognition and image analysis. More recently, it has been applied to the analysis of complex networks. Pretopology makes it possible to work in a mathematical framework with weak properties, and its non-idempotent operator, called the pseudo-closure, makes it possible to implement iterative algorithms. It provides a formalism that generalizes graph theory concepts and allows problems to be modeled universally. In this paper, the authors extend this mathematical model to analyze complex data with spatiotemporal dimensions. They define the notion of a temporal pretopology based on a temporal function, give an example of a temporal function based on a binary relation, and construct a temporal pretopology. They define two new notions of temporal substructures intended to represent the evolution of substructures, and propose algorithms to extract these substructures. They evaluate the proposal on two datasets and on two real economic datasets.
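The pseudo-closure of a pretopology built from a binary relation, and its iteration to a fixed point, can be sketched as follows. The definition used, a(A) = A plus every element related to some element of A, is a standard construction; the time-indexed relations in the example are a hypothetical illustration of how closures can evolve over time and do not reproduce the paper's temporal function or substructure definitions.

```python
from typing import Dict, FrozenSet, Hashable, Set

Relation = Dict[Hashable, Set[Hashable]]  # x -> elements that x is related to


def pseudo_closure(subset: FrozenSet[Hashable], relation: Relation) -> FrozenSet[Hashable]:
    """a(A): A plus every element related to at least one element of A.
    Not idempotent in general, since a(a(A)) can be strictly larger than a(A)."""
    grown = set(subset)
    for x, related in relation.items():
        if related & subset:
            grown.add(x)
    return frozenset(grown)


def closure(subset: FrozenSet[Hashable], relation: Relation) -> FrozenSet[Hashable]:
    """Iterate the pseudo-closure until it stops growing (finite sets always terminate)."""
    current = frozenset(subset)
    while True:
        grown = pseudo_closure(current, relation)
        if grown == current:
            return current
        current = grown


# Hypothetical relations observed at two time steps: the structure grows over time.
relation_by_time: Dict[int, Relation] = {
    1: {"b": {"a"}},
    2: {"b": {"a"}, "c": {"b"}},
}
seed = frozenset({"a"})
for t, relation in relation_by_time.items():
    print(t, sorted(closure(seed, relation)))
# 1 ['a', 'b']
# 2 ['a', 'b', 'c']
```

The loop in `closure` is exactly where non-idempotence matters: a single pseudo-closure step only reaches direct relations, so reaching the whole connected substructure takes repeated steps.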
Vol. 1, issue 1, pp. 1-17.
Citations: 0
Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2022-04-01; DOI: 10.4018/ijdwm.298007
Erwan Schild, Gautier Durantin, Jean-Charles Lamirel, F. Miconi
Chatbots are a promising tool for automating the processing of requests in a business context. However, despite major progress in natural language processing technologies, constructing a dataset that business experts deem relevant is a manual, iterative, and error-prone process. To assist these experts during modelling and labelling, the authors propose an active learning methodology called Interactive Clustering. It relies on interactions between computer-guided segmentation of the data into intents and response-driven human annotations that impose constraints on the clusters to improve their relevance. This article applies Interactive Clustering to a realistic dataset and measures the optimal settings required to obtain a relevant segmentation with a minimal number of annotations. The usability of the method is discussed in terms of computation time and of the compromise achieved between business relevance and classification performance during training. In this context, Interactive Clustering appears to be a suitable methodology that combines human and computer initiatives to efficiently develop a usable chatbot.
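The human annotations in Interactive Clustering act as constraints on clusters. A common way to represent such annotations, assumed here because the abstract does not give the exact format, is as must-link and cannot-link pairs. The sketch below propagates must-links transitively with a union-find structure and rejects annotation sets that contradict themselves; it does not implement the constrained clustering algorithm itself, and `constraint_groups` is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def constraint_groups(
    items: Iterable[Hashable],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> Dict[Hashable, Hashable]:
    """Map each item to a group representative after merging must-link pairs.
    Raises ValueError if a cannot-link pair ends up in the same group.
    Every item mentioned in a constraint must appear in `items`."""
    parent: Dict[Hashable, Hashable] = {x: x for x in items}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)
    for a, b in cannot_link:
        if find(a) == find(b):
            raise ValueError(f"conflicting annotations for {a!r} and {b!r}")
    return {x: find(x) for x in parent}


# Hypothetical annotations on four user questions.
groups = constraint_groups(
    ["q1", "q2", "q3", "q4"],
    must_link=[("q1", "q2"), ("q2", "q3")],
    cannot_link=[("q1", "q4")],
)
print(groups["q1"] == groups["q3"], groups["q1"] == groups["q4"])  # True False
```

Transitivity is the useful part: two must-link annotations already imply that q1 and q3 belong together, so an annotator never needs to be asked about that pair.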
Vol. 17, issue 1, pp. 1-19.
Citations: 1
Boat Detection in Marina Using Time-Delay Analysis and Deep Learning
IF 1.2; Zone 4 (Computer Science); Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2022-04-01; DOI: 10.4018/ijdwm.298006
Romane Scherrer, Erwan Aulnette, T. Quiniou, J. Kasarhérou, Pierre Kolb, Nazha Selmaoui-Folcher
An autonomous acoustic system, based on two bottom-moored hydrophones, a two-input audio board, and a small single-board computer, was installed at the entrance of a marina to detect boats entering and exiting. The system computes windowed time-lagged cross-correlations to find the consecutive time delays between the hydrophone signals and to compute a signal that is a function of the boats' angular trajectories. Since its installation, the single-board computer has performed online prediction with a signal processing-based algorithm that achieved an accuracy of 80%. To improve system performance, a convolutional neural network (CNN) was trained on the acquired data to perform real-time detection. Two classification tasks were considered (binary and multiclass) to detect both a boat and its direction of navigation. Finally, a trained CNN was deployed on a single-board computer to ensure that prediction can be performed in real time.
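The windowed time-lagged cross-correlation step can be sketched with plain lists: within each window, the lag that maximizes the cross-correlation between the two hydrophone signals is taken as that window's time delay. This is a simplified sketch under assumptions: the window length, hop, maximum lag, and the use of an unnormalized correlation are illustrative choices, not values from the paper, and a real deployment would more likely use a vectorized routine such as `scipy.signal.correlate`.

```python
from typing import List, Sequence


def best_lag(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag in samples, within [-max_lag, max_lag], maximizing sum_t x[t] * y[t + lag].
    A positive lag means the event reaches hydrophone y after hydrophone x."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[t] * y[t + lag] for t in range(len(x)) if 0 <= t + lag < len(y))
        if score > best_score:
            best, best_score = lag, score
    return best


def windowed_delays(
    x: Sequence[float], y: Sequence[float], window: int, hop: int, max_lag: int
) -> List[int]:
    """One delay estimate per window: the sequence of consecutive time delays."""
    n = min(len(x), len(y))
    return [
        best_lag(x[start:start + window], y[start:start + window], max_lag)
        for start in range(0, n - window + 1, hop)
    ]


# Synthetic pulses (not real hydrophone data): y lags x by 3, then by 2 samples.
x = [0.0] * 40
y = [0.0] * 40
x[5], y[8] = 1.0, 1.0
x[25], y[27] = 1.0, 1.0
print(windowed_delays(x, y, window=20, hop=20, max_lag=5))  # [3, 2]
```

Under a far-field assumption, a delay τ (after converting the lag from samples to seconds) between two hydrophones spaced d apart corresponds to a bearing θ with sin θ = cτ/d, where c is the speed of sound in water (roughly 1500 m/s). This is one way a sequence of delays maps to an angular trajectory, though the abstract does not state the exact mapping the system uses.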
Vol. 16, issue 1, pp. 1-16.
Citations: 0
Large-Scale System for Social Media Data Warehousing: The Case of Twitter-Related Drug Abuse Events Integration 社交媒体数据仓库的大规模系统:以twitter相关药物滥用事件整合为例
IF 1.2 | Zone 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.290890
Ferdaous Jenhani, M. Gouider
Social media data have become an integral part of business data and should be integrated into the decision-making process, so that decisions rest on information that better reflects the true state of business in any field. However, social media data are unstructured and generated at a very high frequency, which exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area whose core is a large-scale system implementing an information extraction process on the Storm and Hadoop frameworks, to better manage the volume and frequency of these data. For structured information extraction, mainly of events, we combine a set of techniques from NLP, linguistic rules, and machine learning. Finally, we propose a suitable data warehouse conceptual model for modeling events and integrating them with the enterprise data warehouse through an intermediate table called a Bridge table. For application and experiments, we focus on extracting drug abuse events from Twitter data and modeling them in the Event Data Warehouse.
{"title":"Large-Scale System for Social Media Data Warehousing: The Case of Twitter-Related Drug Abuse Events Integration","authors":"Ferdaous Jenhani, M. Gouider","doi":"10.4018/ijdwm.290890","DOIUrl":"https://doi.org/10.4018/ijdwm.290890","url":null,"abstract":"Social media data become an integral part in the business data and should be integrated into the decisional process for better decision making based on information which reflects better the true situation of business in any field. However, social media data are unstructured and generated in very high frequency which exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area which heart is a large scale system implementing an information extraction process using Storm and Hadoop frameworks to better manage their volume and frequency. Concerning structured information extraction, mainly events, we combine a set of techniques from NLP, linguistic rules and machine learning to succeed the task. Finally, we propose the adequate data warehouse conceptual model for events modeling and integration with enterprise data warehouse using an intermediate table called Bridge table. For application and experiments, we focus on drug abuse events extraction from Twitter data and their modeling into the Event Data Warehouse.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"10 1","pages":"1-18"},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75167432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
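The social-media abstract above integrates extracted events with the enterprise data warehouse through a Bridge table. The sketch below shows the general bridge-table pattern in SQLite (Python standard library): a bridge resolves a many-to-many link between event rows and an enterprise dimension, with a weighting factor so that aggregated totals are not double-counted. Every table, column, drug name, event type and weight here is hypothetical; the paper's actual conceptual model is not given in the abstract.

```python
import sqlite3


def build_demo_warehouse():
    """Hypothetical schema: event side, enterprise dimension, and a bridge."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Enterprise-side dimension (hypothetical).
        CREATE TABLE drug_dim (drug_key INTEGER PRIMARY KEY, drug_name TEXT);
        -- One row per event extracted from tweets (hypothetical).
        CREATE TABLE event_fact (
            event_id INTEGER PRIMARY KEY, event_type TEXT, event_date TEXT
        );
        -- Bridge: links each event to one or more drugs.
        CREATE TABLE event_drug_bridge (
            event_id INTEGER REFERENCES event_fact(event_id),
            drug_key INTEGER REFERENCES drug_dim(drug_key),
            weight REAL,
            PRIMARY KEY (event_id, drug_key)
        );
    """)
    con.executemany("INSERT INTO drug_dim VALUES (?, ?)",
                    [(1, "drug_a"), (2, "drug_b")])
    con.executemany("INSERT INTO event_fact VALUES (?, ?, ?)",
                    [(10, "overdose", "2021-05-01"), (11, "misuse", "2021-05-02")])
    # Event 10 mentions both drugs, so its weight is split; each event sums to 1.
    con.executemany("INSERT INTO event_drug_bridge VALUES (?, ?, ?)",
                    [(10, 1, 0.5), (10, 2, 0.5), (11, 1, 1.0)])
    return con


def weighted_events_per_drug(con):
    """Aggregate events per drug through the bridge, using the weights."""
    return con.execute("""
        SELECT d.drug_name, SUM(b.weight) AS weighted_events
        FROM event_fact AS e
        JOIN event_drug_bridge AS b ON b.event_id = e.event_id
        JOIN drug_dim AS d ON d.drug_key = b.drug_key
        GROUP BY d.drug_name
        ORDER BY d.drug_name
    """).fetchall()


if __name__ == "__main__":
    print(weighted_events_per_drug(build_demo_warehouse()))
```

With these made-up rows the weighted totals sum to the number of events (2), whereas an unweighted count through the bridge would report 3 drug mentions; choosing between the two is a modeling decision the bridge makes explicit.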
Journal: International Journal of Data Warehousing and Mining