Pub Date: 2024-01-28; DOI: 10.1007/s41019-023-00240-9
Jiaqi Duan, Xiangfu Meng, Guihong Liu
{"title":"Where To Go at the Next Timestamp","authors":"Jiaqi Duan, Xiangfu Meng, Guihong Liu","doi":"10.1007/s41019-023-00240-9","DOIUrl":"https://doi.org/10.1007/s41019-023-00240-9","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139592182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-19; DOI: 10.1007/s41019-023-00235-6
Xuanhe Zhou, Zhaoyan Sun, Guoliang Li
{"title":"DB-GPT: Large Language Model Meets Database","authors":"Xuanhe Zhou, Zhaoyan Sun, Guoliang Li","doi":"10.1007/s41019-023-00235-6","DOIUrl":"https://doi.org/10.1007/s41019-023-00235-6","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139612922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01; Epub Date: 2024-03-13; DOI: 10.1007/s41019-023-00239-2
Eric Austin, Shraddha Makwana, Amine Trabelsi, Christine Largeron, Osmar R Zaïane
Topic modeling aims to discover latent themes in collections of text documents. It has various applications across fields such as sociology, opinion analysis, and media studies. In such areas, it is essential to have easily interpretable, diverse, and coherent topics. An efficient topic modeling technique should accurately identify both flat and hierarchical topics, which is especially useful in disciplines where topics can be logically arranged into a tree format. In this paper, we propose Community Topic, a novel algorithm that exploits word co-occurrence networks to mine communities and produce topics. We also evaluate the proposed approach using several metrics and compare it with standard baselines, confirming its good performance. Community Topic enables quick identification of flat topics and topic hierarchy, facilitating the on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.
{"title":"Uncovering Flat and Hierarchical Topics by Community Discovery on Word Co-occurrence Network.","authors":"Eric Austin, Shraddha Makwana, Amine Trabelsi, Christine Largeron, Osmar R Zaïane","doi":"10.1007/s41019-023-00239-2","DOIUrl":"10.1007/s41019-023-00239-2","url":null,"abstract":"Topic modeling aims to discover latent themes in collections of text documents. It has various applications across fields such as sociology, opinion analysis, and media studies. In such areas, it is essential to have easily interpretable, diverse, and coherent topics. An efficient topic modeling technique should accurately identify flat and hierarchical topics, especially useful in disciplines where topics can be logically arranged into a tree format. In this paper, we propose Community Topic, a novel algorithm that exploits word co-occurrence networks to mine communities and produces topics. We also evaluate the proposed approach using several metrics and compare it with usual baselines, confirming its good performances. Community Topic enables quick identification of flat topics and topic hierarchy, facilitating the on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10980674/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140337633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-19; DOI: 10.1007/s41019-023-00236-5
A. Forkan, Yongjin Kang, Felip Martí, Abhik Banerjee, Chris McCarthy, Hadi Ghaderi, Breno Costa, Anas Dawod, Dimitrios Georgakopoulos, P. Jayaraman
{"title":"AIoT-CitySense: AI and IoT-Driven City-Scale Sensing for Roadside Infrastructure Maintenance","authors":"A. Forkan, Yongjin Kang, Felip Martí, Abhik Banerjee, Chris McCarthy, Hadi Ghaderi, Breno Costa, Anas Dawod, Dimitrios Georgakopoulos, P. Jayaraman","doi":"10.1007/s41019-023-00236-5","DOIUrl":"https://doi.org/10.1007/s41019-023-00236-5","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138962482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-29; DOI: 10.1007/s41019-023-00234-7
Rob Muspratt, Musa Mammadov
{"title":"Anomaly Detection with Sub-Extreme Values: Health Provider Billing","authors":"Rob Muspratt, Musa Mammadov","doi":"10.1007/s41019-023-00234-7","DOIUrl":"https://doi.org/10.1007/s41019-023-00234-7","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139211057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-17; DOI: 10.1007/s41019-023-00232-9
Sharon Torao Pingi, Duoyi Zhang, Md Abul Bashar, Richi Nayak
Generative adversarial networks (GANs) have demonstrated their effectiveness in generating temporal data to fill in missing values, enhancing the classification performance of time series data. Longitudinal datasets encompass multivariate time series data with additional static features that contribute to sample variability over time. These datasets often encounter missing values due to factors such as irregular sampling. However, existing GAN-based imputation methods that address this type of data missingness often overlook the impact of static features on temporal observations and classification outcomes. This paper presents a novel method, fusion-aided imputer-classifier GAN (FaIC-GAN), tailored for longitudinal data classification. FaIC-GAN simultaneously leverages partially observed temporal data and static features to enhance imputation and classification learning. We present four multimodal fusion strategies that effectively extract correlated information from both static and temporal modalities. Our extensive experiments reveal that FaIC-GAN successfully exploits partially observed temporal data and static features, resulting in improved classification accuracy compared to unimodal models. Our post-additive and attention-based multimodal fusion approaches within the FaIC-GAN model consistently rank among the top three methods for classification.
{"title":"Joint Representation Learning with Generative Adversarial Imputation Network for Improved Classification of Longitudinal Data","authors":"Sharon Torao Pingi, Duoyi Zhang, Md Abul Bashar, Richi Nayak","doi":"10.1007/s41019-023-00232-9","DOIUrl":"https://doi.org/10.1007/s41019-023-00232-9","url":null,"abstract":"Abstract Generative adversarial networks (GANs) have demonstrated their effectiveness in generating temporal data to fill in missing values, enhancing the classification performance of time series data. Longitudinal datasets encompass multivariate time series data with additional static features that contribute to sample variability over time. These datasets often encounter missing values due to factors such as irregular sampling. However, existing GAN-based imputation methods that address this type of data missingness often overlook the impact of static features on temporal observations and classification outcomes. This paper presents a novel method, fusion-aided imputer-classifier GAN (FaIC-GAN), tailored for longitudinal data classification. FaIC-GAN simultaneously leverages partially observed temporal data and static features to enhance imputation and classification learning. We present four multimodal fusion strategies that effectively extract correlated information from both static and temporal modalities. Our extensive experiments reveal that FaIC-GAN successfully exploits partially observed temporal data and static features, resulting in improved classification accuracy compared to unimodal models. Our post-additive and attention-based multimodal fusion approaches within the FaIC-GAN model consistently rank among the top three methods for classification.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135996121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}