Topic sentiment analysis based on deep neural network using document embedding technique
Pub Date: 2023-06-05 | DOI: 10.1007/s11227-023-05423-9
Azam Seilsepour, Reza Ravanmehr, Ramin Nassiri
Sentiment Analysis (SA) is a domain- or topic-dependent task, since polarity terms convey different sentiments in different domains. Hence, machine learning models trained on one domain cannot be employed in others, and existing domain-independent lexicons cannot correctly recognize the polarity of domain-specific polarity terms. Conventional Topic Sentiment Analysis (TSA) approaches perform Topic Modeling (TM) and SA sequentially, classifying sentiments with models previously trained on unrelated datasets, which cannot provide acceptable accuracy. Other researchers perform TM and SA simultaneously using topic-sentiment joint models, which require a list of seed words and their sentiments drawn from widely used domain-independent lexicons; as a result, these methods also cannot correctly determine the polarity of domain-specific terms. This paper proposes a novel supervised hybrid TSA approach, called Embedding Topic Sentiment Analysis using Deep Neural Networks (ETSANet), that extracts the semantic relationships between the hidden topics and the training dataset using a Semantically Topic-Related Documents Finder (STRDF). STRDF discovers the training documents that share a context with a topic, based on the semantic relationships between the training dataset and the Semantic Topic Vector, a newly introduced concept that encompasses the semantic aspects of a topic. A hybrid CNN-GRU model is then trained on these semantically topic-related documents, and a hybrid metaheuristic combining Grey Wolf Optimization and the Whale Optimization Algorithm is employed to fine-tune the hyperparameters of the CNN-GRU network. The evaluation results demonstrate that ETSANet improves the accuracy of state-of-the-art methods by 1.92%.
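The abstract does not spell out STRDF's mechanics, but the core idea (score training documents against a Semantic Topic Vector and keep the semantically related ones) can be illustrated with a minimal NumPy sketch. Everything here, including the weighted-mean topic vector, the cosine threshold, and the function names, is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def semantic_topic_vector(topic_word_vecs, topic_word_weights):
    """Aggregate the embeddings of a topic's top words into one vector
    (weighted mean); a stand-in for the paper's Semantic Topic Vector."""
    w = np.asarray(topic_word_weights, dtype=float)
    return (w[:, None] * topic_word_vecs).sum(axis=0) / w.sum()

def topic_related_documents(doc_vecs, topic_vec, threshold=0.2):
    """Return indices of training documents whose embedding is
    cosine-similar to the topic vector above a chosen threshold."""
    doc_norms = np.linalg.norm(doc_vecs, axis=1)
    sims = doc_vecs @ topic_vec / (doc_norms * np.linalg.norm(topic_vec) + 1e-12)
    return np.where(sims >= threshold)[0]

# Toy usage with random 100-dim embeddings for 1000 documents.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 100))
topic = semantic_topic_vector(rng.normal(size=(10, 100)), rng.random(10))
related = topic_related_documents(docs, topic)
```

The selected subset would then serve as the training data for the downstream CNN-GRU classifier.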
{"title":"Topic sentiment analysis based on deep neural network using document embedding technique.","authors":"Azam Seilsepour, Reza Ravanmehr, Ramin Nassiri","doi":"10.1007/s11227-023-05423-9","DOIUrl":"10.1007/s11227-023-05423-9","url":null,"abstract":"<p><p>Sentiment Analysis (SA) is a domain- or topic-dependent task since polarity terms convey different sentiments in various domains. Hence, machine learning models trained on a specific domain cannot be employed in other domains, and existing domain-independent lexicons cannot correctly recognize the polarity of domain-specific polarity terms. Conventional approaches of Topic Sentiment Analysis perform Topic Modeling (TM) and SA sequentially, utilizing the previously trained models on irrelevant datasets for classifying sentiments that cannot provide acceptable accuracy. However, some researchers perform TM and SA simultaneously using topic-sentiment joint models, which require a list of seeds and their sentiments from widely used domain-independent lexicons. As a result, these methods cannot find the polarity of domain-specific terms correctly. This paper proposes a novel supervised hybrid TSA approach, called Embedding Topic Sentiment Analysis using Deep Neural Networks (ETSANet), that extracts the semantic relationships between the hidden topics and the training dataset using Semantically Topic-Related Documents Finder (STRDF). STRDF discovers those training documents in the same context as the topic based on the semantic relationships between the Semantic Topic Vector, a newly introduced concept that encompasses the semantic aspects of a topic, and the training dataset. Then, a hybrid CNN-GRU model is trained by these semantically topic-related documents. Moreover, a hybrid metaheuristic method utilizing Grey Wolf Optimization and Whale Optimization Algorithm is employed to fine-tune the hyperparameters of the CNN-GRU network. The evaluation results demonstrate that ETSANet increases the accuracy of the state-of-the-art methods by 1.92%.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-39"},"PeriodicalIF":2.5,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241384/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Fechner multiscale local descriptor for face recognition
Pub Date: 2023-06-02 | DOI: 10.1007/s11227-023-05421-x
Jinxiang Feng, Jie Xu, Yizhi Deng, Jun Gao
Inspired by Fechner's law, we propose a Fechner multiscale local descriptor (FMLD) for feature extraction and face recognition. Fechner's law, a well-known law in psychology, states that human perception is proportional to the logarithm of the intensity of the physical stimulus that causes it. FMLD uses significant differences between pixels to mimic how humans perceive changes in their surroundings. A first round of feature extraction is performed in two local domains of different sizes to capture the structural features of the facial image, producing four facial feature images. In a second round, two binary patterns extract local features from the resulting magnitude and direction feature images, outputting four corresponding feature maps. Finally, all feature maps are fused into an overall histogram feature. Unlike existing descriptors, FMLD's magnitude and direction features are not isolated: both derive from the same "perceived intensity", and this close relationship further strengthens the feature representation. In the experiments, we evaluated FMLD on multiple face databases and compared it with leading-edge approaches. The results show that the proposed FMLD performs well in recognizing images with illumination, pose, expression, and occlusion changes. They also indicate that the feature images produced by FMLD significantly improve the performance of a convolutional neural network (CNN), and that the combination of FMLD and a CNN outperforms other advanced descriptors.
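As a rough illustration of the Fechner-style encoding described above (a log-compressed response to local pixel differences, split into magnitude and direction maps and pooled into a histogram), here is a plain NumPy sketch. The difference operators, the log1p compression, and the histogram pooling are illustrative assumptions, not the published FMLD operators.

```python
import numpy as np

def fechner_maps(img, k=1.0):
    """Toy Fechner-style encoding of a grayscale image: pixel differences
    stand in for the 'significant difference' stimulus, and their
    log-compressed magnitude for the perceived intensity."""
    img = img.astype(float)
    dx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal difference
    dy = np.diff(img, axis=0, append=img[-1:, :])  # vertical difference
    magnitude = k * np.log1p(np.hypot(dx, dy))     # Fechner: response ~ log(stimulus)
    direction = np.arctan2(dy, dx)                 # orientation of the change
    return magnitude, direction

def histogram_feature(feature_map, bins=16):
    """Collapse a feature map into a normalized histogram descriptor."""
    hist, _ = np.histogram(feature_map, bins=bins)
    return hist / max(hist.sum(), 1)

img = np.random.default_rng(1).integers(0, 256, (64, 64))
mag, ang = fechner_maps(img)
feature = np.concatenate([histogram_feature(mag), histogram_feature(ang)])
```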
{"title":"A Fechner multiscale local descriptor for face recognition.","authors":"Jinxiang Feng, Jie Xu, Yizhi Deng, Jun Gao","doi":"10.1007/s11227-023-05421-x","DOIUrl":"10.1007/s11227-023-05421-x","url":null,"abstract":"<p><p>Inspired by Fechner's law, we propose a Fechner multiscale local descriptor (FMLD) for feature extraction and face recognition. Fechner's law is a well-known law in psychology, which states that a human perception is proportional to the logarithm of the intensity of the corresponding significant differences physical quantity. FMLD uses the significant difference between pixels to simulate the pattern perception of human beings to the changes of surroundings. The first round of feature extraction is performed in two local domains of different sizes to capture the structural features of the facial images, resulting in four facial feature images. In the second round of feature extraction, two binary patterns are used to extract local features on the obtained magnitude and direction feature images, and four corresponding feature maps are output. Finally, all feature maps are fused to form an overall histogram feature. Different from the existing descriptors, the FMLD's magnitude and direction features are not isolated. They are derived from the \"perceived intensity\", thus there is a close relationship between them, which further facilitates the feature representation. In the experiments, we evaluated the performance of FMLD in multiple face databases and compared it with the leading edge approaches. The results show that the proposed FMLD performs well in recognizing images with illumination, pose, expression and occlusion changes. The results also indicate that the feature images produced by FMLD significantly improve the performance of convolutional neural network (CNN), and the combination of FMLD and CNN exhibits better performance than other advanced descriptors.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-28"},"PeriodicalIF":3.3,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10072649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data quality model for assessing public COVID-19 big datasets
Pub Date: 2023-05-31 | DOI: 10.1007/s11227-023-05410-0
Alladoumbaye Ngueilbaye, Joshua Zhexue Huang, Mehak Khan, Hongzhi Wang
High-quality data are crucial for decision support and evidence-based healthcare, particularly when the relevant knowledge is lacking. For public health practitioners and researchers, reported COVID-19 data need to be accurate and easily available. Each nation has a system in place for reporting COVID-19 data, but the efficacy of these systems has not been thoroughly evaluated, and the COVID-19 pandemic has exposed widespread flaws in data quality. We propose a data quality model (a canonical data model, four adequacy levels, and Benford's law) to assess the quality of the COVID-19 data reporting carried out by the World Health Organization (WHO) in the six Central African Economic and Monetary Community (CEMAC) countries between March 6, 2020, and June 22, 2022, and we suggest potential solutions. The resulting data quality adequacy levels can be interpreted as indicators of the dependability and sufficiency of big dataset inspection. The model effectively identified the quality of the input data for big dataset analytics. Its future development requires scholars and institutions from all sectors to deepen their understanding of its core concepts, improve its integration with other data processing technologies, and broaden the scope of its applications.
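Of the model's three components, the Benford's-law check is straightforward to show concretely: compare the empirical first-digit distribution of reported counts with the Benford distribution P(d) = log10(1 + 1/d). A minimal sketch follows; using a chi-square statistic as the deviation measure is an assumption, and the paper may apply a different test.

```python
import numpy as np

def benford_deviation(counts):
    """Chi-square deviation of the first-digit distribution of reported
    counts from Benford's law, P(d) = log10(1 + 1/d) for d = 1..9."""
    digits = np.array([int(str(abs(int(c)))[0]) for c in counts if int(c) != 0])
    observed = np.bincount(digits, minlength=10)[1:10].astype(float)
    expected = np.log10(1.0 + 1.0 / np.arange(1, 10)) * observed.sum()
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return chi2, observed, expected

# Toy daily case counts; real use would pass one country's full series.
chi2, obs, exp = benford_deviation([12, 134, 29, 876, 15, 203, 41, 98, 1340, 57])
print(f"chi-square deviation from Benford: {chi2:.2f}")
```

A large deviation flags a reporting series for closer inspection rather than proving manipulation.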
{"title":"Data quality model for assessing public COVID-19 big datasets.","authors":"Alladoumbaye Ngueilbaye, Joshua Zhexue Huang, Mehak Khan, Hongzhi Wang","doi":"10.1007/s11227-023-05410-0","DOIUrl":"10.1007/s11227-023-05410-0","url":null,"abstract":"<p><p>For decision-making support and evidence based on healthcare, high quality data are crucial, particularly if the emphasized knowledge is lacking. For public health practitioners and researchers, the reporting of COVID-19 data need to be accurate and easily available. Each nation has a system in place for reporting COVID-19 data, albeit these systems' efficacy has not been thoroughly evaluated. However, the current COVID-19 pandemic has shown widespread flaws in data quality. We propose a data quality model (canonical data model, four adequacy levels, and Benford's law) to assess the quality issue of COVID-19 data reporting carried out by the World Health Organization (WHO) in the six Central African Economic and Monitory Community (CEMAC) region countries between March 6,2020, and June 22, 2022, and suggest potential solutions. These levels of data quality sufficiency can be interpreted as dependability indicators and sufficiency of Big Dataset inspection. This model effectively identified the quality of the entry data for big dataset analytics. The future development of this model requires scholars and institutions from all sectors to deepen their understanding of its core concepts, improve integration with other data processing technologies, and broaden the scope of its applications.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-33"},"PeriodicalIF":2.5,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10230148/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BTDA: Two-factor dynamic identity authentication scheme for data trading based on alliance chain
Pub Date: 2023-05-25 | DOI: 10.1007/s11227-023-05393-y
Fengmei Chen, Bin Zhao, Yilong Gao, Wenyin Zhang
As the market for data trading grows, risks around identity authentication and authority management intensify. To address the centralization of identity authentication, the dynamic nature of identities, and the ambiguity of trading authority in data trading, a two-factor dynamic identity authentication scheme for data trading based on an alliance (consortium) chain, named BTDA, is proposed. First, the use of identity certificates is simplified to avoid heavy computation and difficult storage. Second, a two-factor dynamic authentication strategy built on a distributed ledger provides dynamic identity authentication throughout a data trade. Finally, the proposed scheme is evaluated in a simulation experiment. Theoretical comparison and analysis against similar schemes show that the proposed scheme has lower cost, higher authentication efficiency and security, and easier authority management, and that it can be widely applied across data trading scenarios.
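A minimal sketch of the general two-factor pattern the abstract describes: a long-lived credential plus a per-trade dynamic response, with authentication events appended to a shared ledger. The HMAC construction, the in-memory LEDGER list, and all names are illustrative assumptions; the paper's actual protocol on an alliance chain is more involved.

```python
import hashlib, hmac, os, time

LEDGER = []  # stand-in for the alliance-chain distributed ledger

def register(user_id, secret):
    """Factor 1: a long-lived secret bound to the trader's identity."""
    LEDGER.append({"user": user_id, "cred": hashlib.sha256(secret).hexdigest()})

def authenticate(user_id, secret, session_nonce):
    """Factor 2: a per-trade dynamic response derived from a fresh nonce,
    so a token replayed from an earlier trade is useless."""
    record = next(r for r in LEDGER if r.get("user") == user_id and "cred" in r)
    if hashlib.sha256(secret).hexdigest() != record["cred"]:
        return False
    token = hmac.new(secret, session_nonce, hashlib.sha256).hexdigest()
    LEDGER.append({"user": user_id, "auth": token, "ts": time.time()})  # auditable trail
    return True

secret = os.urandom(32)
register("trader-42", secret)
print(authenticate("trader-42", secret, os.urandom(16)))  # True
```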
{"title":"BTDA: Two-factor dynamic identity authentication scheme for data trading based on alliance chain.","authors":"Fengmei Chen, Bin Zhao, Yilong Gao, Wenyin Zhang","doi":"10.1007/s11227-023-05393-y","DOIUrl":"10.1007/s11227-023-05393-y","url":null,"abstract":"<p><p>With the increase in the market share of data trading, the risks such as identity authentication and authority management are increasingly intensified. Aiming at the problems of centralization of identity authentication, dynamic changes of identities, and ambiguity of trading authority in data trading, a two-factor dynamic identity authentication scheme for data trading based on alliance chain (BTDA) is proposed. Firstly, the use of identity certificates is simplified to solve the problems of large calculation and difficult storage. Secondly, a two-factor dynamic authentication strategy is designed, which uses distributed ledger to achieve dynamic identity authentication throughout the data trading. Finally, a simulation experiment is carried out on the proposed scheme. The theoretical comparison and analysis with similar schemes show that the proposed scheme has lower cost, higher authentication efficiency and security, easier authority management, and can be widely used in various fields of data trading scenarios.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-20"},"PeriodicalIF":3.3,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Driving behavior analysis and classification by vehicle OBD data using machine learning
Pub Date: 2023-05-19 | DOI: 10.1007/s11227-023-05364-3
Raman Kumar, Anuj Jain
The transportation industry's focus on improving performance and reducing costs has driven the integration of IoT and machine learning technologies. The correlation of driving style and behavior with fuel consumption and emissions has highlighted the need to classify drivers' driving patterns, and vehicles now come equipped with sensors that gather a wide range of operational data. The proposed technique collects critical vehicle performance data, including speed, motor RPM, pedal position, calculated engine load, and over 50 other parameters, through the OBD-II interface, the primary diagnostics protocol used by technicians, which exposes real-time data on the vehicle's operation via the car's communication port. These data capture engine operating characteristics and assist with fault detection. The proposed method uses machine learning techniques, namely SVM, AdaBoost, and Random Forest, to classify driver behavior into ten classes based on fuel consumption, steering stability, velocity stability, and braking patterns. Because the data are extracted from the engine's internal sensors via the OBD-II protocol, no additional sensors are needed; the collected data are used to build a model that classifies driver behavior and can provide feedback to improve driving habits. Key driving events, such as high-speed braking, rapid acceleration, deceleration, and turning, are used to characterize individual drivers, and visualization techniques such as line plots and correlation matrices are used to compare drivers' performance. Time-series values of the sensor data are considered in the model, and supervised learning methods are employed to compare all driver classes. The SVM, AdaBoost, and Random Forest classifiers achieve 99%, 99%, and 100% accuracy, respectively. The suggested model offers a practical approach to examining driving behavior and recommending measures that enhance driving safety and efficiency.
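The classification pipeline itself is standard and easy to reproduce in outline: fit SVM, AdaBoost, and Random Forest classifiers on rows of OBD-derived features and compare accuracy. The sketch below substitutes synthetic data for the paper's OBD-II logs; the feature count and all hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for OBD-II rows: speed, RPM, pedal position, engine load, ...
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Compare the three classifiers named in the abstract on held-out data.
for model in (SVC(), AdaBoostClassifier(), RandomForestClassifier()):
    score = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(score, 3))
```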
{"title":"Driving behavior analysis and classification by vehicle OBD data using machine learning.","authors":"Raman Kumar, Anuj Jain","doi":"10.1007/s11227-023-05364-3","DOIUrl":"10.1007/s11227-023-05364-3","url":null,"abstract":"<p><p>The transportation industry's focus on improving performance and reducing costs has driven the integration of IoT and machine learning technologies. The correlation between driving style and behavior with fuel consumption and emissions has highlighted the need to classify different driver's driving patterns. In response, vehicles now come equipped with sensors that gather a wide range of operational data. The proposed technique collects critical vehicle performance data, including speed, motor RPM, paddle position, determined motor load, and over 50 other parameters through the OBD interface. The OBD-II diagnostics protocol, the primary diagnostic process used by technicians, can acquire this information via the car's communication port. OBD-II protocol is used to acquire real-time data linked to the vehicle's operation. This data are used to collect engine operation-related characteristics and assist with fault detection. The proposed method uses machine learning techniques, such as SVM, AdaBoost, and Random Forest, to classify driver's behavior based on ten categories that include fuel consumption, steering stability, velocity stability, and braking patterns. The solution offers an effective means to study driving behavior and recommend corrective actions for efficient and safe driving. The proposed model offers a classification of ten driver classes based on fuel consumption, steering stability, velocity stability, and braking patterns. This research work uses data extracted from the engine's internal sensors via the OBD-II protocol, eliminating the need for additional sensors. The collected data are used to build a model that classifies driver's behavior and can be used to provide feedback to improve driving habits. Key driving events, such as high-speed braking, rapid acceleration, deceleration, and turning, are used to characterize individual drivers. Visualization techniques, such as line plots and correlation matrices, are used to compare drivers' performance. Time-series values of the sensor data are considered in the model. The supervised learning methods are employed to compare all driver classes. SVM, AdaBoost, and Random Forest algorithms are implemented with 99%, 99%, and 100% accuracy, respectively. The suggested model offers a practical approach to examining driving behavior and suggesting necessary measures to enhance driving safety and efficiency.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-20"},"PeriodicalIF":3.3,"publicationDate":"2023-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time-aware neural ordinary differential equations for incomplete time series modeling
Pub Date: 2023-05-18 | DOI: 10.1007/s11227-023-05327-8
Zhuoqing Chang, Shubo Liu, Run Qiu, Song Song, Zhaohui Cai, Guoqing Tu
The Internet of Things realizes the ubiquitous connection of all things, generating countless streams of time-tagged data called time series. However, real-world time series are often plagued by missing values caused by noise or malfunctioning sensors. Existing methods for modeling such incomplete time series typically involve preprocessing steps, such as deletion or missing-data imputation using statistical or machine learning methods. Unfortunately, these methods unavoidably destroy temporal information and introduce accumulated error into the subsequent model. To this end, this paper introduces a novel continuous neural network architecture, named Time-aware Neural Ordinary Differential Equations (TN-ODE), for modeling incomplete time-series data. The proposed method not only supports imputation of missing values at arbitrary time points, but also enables multi-step prediction at desired time points. Specifically, TN-ODE employs a time-aware Long Short-Term Memory as an encoder, which effectively learns the posterior distribution from partially observed data. Additionally, the derivative of the latent states is parameterized with a fully connected network, enabling continuous-time latent dynamics generation. The TN-ODE model is evaluated on both real-world and synthetic incomplete time-series datasets through data interpolation, extrapolation, and classification tasks. Extensive experiments show that TN-ODE outperforms baseline methods in mean squared error on imputation and prediction tasks, as well as in accuracy on the downstream classification task.
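The decoder half of such a latent-ODE model can be sketched compactly: a small fully connected network parameterizes dz/dt, and integrating it forward yields latent states at arbitrary, possibly irregular, query times. The Euler integrator, network sizes, and untrained random weights below are illustrative assumptions; the paper's model uses a learned encoder and would pair this with a proper ODE solver.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 8)), np.zeros(8)

def f(z):
    """Fully connected network parameterizing dz/dt, as in the abstract."""
    return np.tanh(z @ W1 + b1) @ W2 + b2

def integrate(z0, query_times, dt=0.01):
    """Euler-integrate the latent state forward and read it out at the
    (possibly irregular) time points where values are missing or desired."""
    z, t, out = z0, 0.0, []
    for tq in sorted(query_times):
        while t < tq:
            z = z + dt * f(z)   # one Euler step of the learned dynamics
            t += dt
        out.append(z.copy())
    return np.stack(out)

latent_path = integrate(rng.normal(size=8), query_times=[0.13, 0.5, 0.97])
```

A decoder network (omitted here) would map each latent state back to an observation, giving both interpolation and multi-step extrapolation from the same machinery.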
{"title":"Time-aware neural ordinary differential equations for incomplete time series modeling.","authors":"Zhuoqing Chang, Shubo Liu, Run Qiu, Song Song, Zhaohui Cai, Guoqing Tu","doi":"10.1007/s11227-023-05327-8","DOIUrl":"10.1007/s11227-023-05327-8","url":null,"abstract":"<p><p>Internet of Things realizes the ubiquitous connection of all things, generating countless time-tagged data called time series. However, real-world time series are often plagued with missing values on account of noise or malfunctioning sensors. Existing methods for modeling such incomplete time series typically involve preprocessing steps, such as deletion or missing data imputation using statistical learning or machine learning methods. Unfortunately, these methods unavoidable destroy time information and bring error accumulation to the subsequent model. To this end, this paper introduces a novel continuous neural network architecture, named Time-aware Neural-Ordinary Differential Equations (TN-ODE), for incomplete time data modeling. The proposed method not only supports imputation missing values at arbitrary time points, but also enables multi-step prediction at desired time points. Specifically, TN-ODE employs a time-aware Long Short-Term Memory as an encoder, which effectively learns the posterior distribution from partial observed data. Additionally, the derivative of latent states is parameterized with a fully connected network, thereby enabling continuous-time latent dynamics generation. The proposed TN-ODE model is evaluated on both real-world and synthetic incomplete time-series datasets by conducting data interpolation and extrapolation tasks as well as classification task. Extensive experiments show the TN-ODE model outperforms baseline methods in terms of Mean Square Error for imputation and prediction tasks, as well as accuracy in downstream classification task.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-29"},"PeriodicalIF":3.3,"publicationDate":"2023-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10192786/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SiMAIM: identifying sockpuppets and puppetmasters on a single forum-oriented social media site
Pub Date: 2023-05-17 | DOI: 10.1007/s11227-023-05376-z
Ying-Ho Liu, Chia-Yu Kuo
With the Internet becoming indispensable, social media has become an integral part of our lives. With this, however, has come the phenomenon of a single user, called the puppetmaster, registering multiple accounts (sockpuppets) to advertise, spam, or stir controversy on social media sites. The phenomenon is even more evident on forum-oriented social media sites. Identifying sockpuppets is a critical step in stopping these malicious acts, yet identification on a single forum-oriented social media site has seldom been addressed. This paper proposes a Single-site Multiple Accounts Identification Model (SiMAIM) framework to address this research gap. We used Mobile01, Taiwan's most popular forum-oriented social media site, to validate SiMAIM's performance. SiMAIM achieved F1 scores between 0.6 and 0.9 on identifying sockpuppets and puppetmasters across different datasets and settings, and outperformed the compared methods by 6-38% in F1 score.
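The abstract does not describe SiMAIM's internals, but one common ingredient of sockpuppet detection, flagging account pairs with near-identical behavioral fingerprints, can be sketched as follows. The style vectors, cosine threshold, and account names are all hypothetical, not the paper's features.

```python
import numpy as np

def suspect_pairs(style_vecs, account_ids, threshold=0.9):
    """Flag account pairs whose behavioral vectors (e.g., word usage,
    posting-time histograms) are nearly parallel as sockpuppet candidates."""
    X = style_vecs / np.linalg.norm(style_vecs, axis=1, keepdims=True)
    sims = X @ X.T
    pairs = []
    for i in range(len(account_ids)):
        for j in range(i + 1, len(account_ids)):
            if sims[i, j] >= threshold:
                pairs.append((account_ids[i], account_ids[j], float(sims[i, j])))
    return pairs

rng = np.random.default_rng(3)
base = rng.random(12)
vecs = np.vstack([base + rng.normal(scale=0.01, size=12),  # suspected sockpuppet
                  base + rng.normal(scale=0.01, size=12),  # its puppetmaster account
                  rng.random(12)])                         # unrelated account
print(suspect_pairs(vecs, ["acct1", "acct2", "acct3"]))
```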
{"title":"SiMAIM: identifying sockpuppets and puppetmasters on a single forum-oriented social media site.","authors":"Ying-Ho Liu, Chia-Yu Kuo","doi":"10.1007/s11227-023-05376-z","DOIUrl":"10.1007/s11227-023-05376-z","url":null,"abstract":"<p><p>With the Internet becoming indispensable in our lives, social media has become an integral part of our lives. However, with this has come the phenomenon of a single user registering multiple accounts (<i>sockpuppets</i>) to advertise, spam, or cause controversy on social media sites, where the user is called the <i>puppetmaster</i>. This phenomenon is even more evident on forum-oriented social media sites. Identifying sockpuppets is a critical step in stopping the above-mentioned malicious acts. The identification of sockpuppets on a single forum-oriented social media site has seldom been addressed. This paper proposes a <i>Single-site Multiple Accounts Identification Model</i> (<i>SiMAIM</i>) framework to address this research gap. We used Mobile01, Taiwan's most popular forum-oriented social media site, to validate SiMAIM's performance. SiMAIM achieved F1 scores between 0.6 and 0.9 on identifying sockpuppets and puppetmasters under different datasets and settings. SiMAIM also outperformed the compared methods by 6-38% in F1 score.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-32"},"PeriodicalIF":3.3,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10188322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9686586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
KG-MFEND: an efficient knowledge graph-based model for multi-domain fake news detection
Pub Date: 2023-05-15 | DOI: 10.1007/s11227-023-05381-2
Lifang Fu, Huanxin Peng, Shuai Liu
The widespread dissemination of fake news on social media has adverse effects on the public and on social development. Most existing detection techniques are limited to a single domain (e.g., medicine or politics), and because of cross-domain differences such as word usage, they perform poorly in other domains. In the real world, social media releases millions of news pieces across diverse domains every day, so a fake news detection model applicable to multiple domains is of significant practical importance. In this paper, we propose a novel knowledge graph (KG)-based framework for multi-domain fake news detection, named KG-MFEND. The model's performance is enhanced by improving BERT and integrating external knowledge to alleviate word-level domain differences. Specifically, we construct a new KG that encompasses multi-domain knowledge and inject entity triples to build a sentence tree that enriches the news' background knowledge. To address the embedding-space mismatch and knowledge noise, we use soft positions and a visible matrix during knowledge embedding, and to reduce the influence of label noise, we add label smoothing to training. Extensive experiments on real Chinese datasets show that KG-MFEND generalizes strongly in single, mixed, and multiple domains and outperforms the current state-of-the-art methods for multi-domain fake news detection.
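Of the techniques listed, label smoothing is the most self-contained and can be shown concretely. A minimal NumPy sketch follows; the smoothing factor eps=0.1 and the toy probabilities are assumed values, not the paper's settings.

```python
import numpy as np

def smooth_labels(y, n_classes, eps=0.1):
    """Replace one-hot targets with a softened distribution: the true class
    keeps 1 - eps and the remaining eps is spread over all classes, which
    damps the effect of mislabeled (noisy) news items during training."""
    onehot = np.eye(n_classes)[y]
    return onehot * (1.0 - eps) + eps / n_classes

def cross_entropy(probs, targets):
    """Average cross-entropy of predicted probabilities against targets."""
    return -(targets * np.log(probs + 1e-12)).sum(axis=1).mean()

targets = smooth_labels(np.array([0, 1, 1, 0]), n_classes=2)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
print(cross_entropy(probs, targets))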
{"title":"KG-MFEND: an efficient knowledge graph-based model for multi-domain fake news detection.","authors":"Lifang Fu, Huanxin Peng, Shuai Liu","doi":"10.1007/s11227-023-05381-2","DOIUrl":"10.1007/s11227-023-05381-2","url":null,"abstract":"<p><p>The widespread dissemination of fake news on social media brings adverse effects on the public and social development. Most existing techniques are limited to a single domain (e.g., medicine or politics) to identify fake news. However, many differences exist commonly across domains, such as word usage, which lead to those methods performing poorly in other domains. In the real world, social media releases millions of news pieces in diverse domains every day. Therefore, it is of significant practical importance to propose a fake news detection model that can be applied to multiple domains. In this paper, we propose a novel framework based on knowledge graphs (KG) for multi-domain fake news detection, named KG-MFEND. The model's performance is enhanced by improving the BERT and integrating external knowledge to alleviate domain differences at the word level. Specifically, we construct a new KG that encompasses multi-domain knowledge and injects entity triples to build a sentence tree to enrich the news background knowledge. To solve the problem of embedding space and knowledge noise, we use the soft position and visible matrix in knowledge embedding. To reduce the influence of label noise, we add label smoothing to the training. Extensive experiments are conducted on real Chinese datasets. And the results show that KG-MFEND has a strong generalization capability in single, mixed, and multiple domains and outperforms the current state-of-the-art methods for multi-domain fake news detection.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-28"},"PeriodicalIF":3.3,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10184086/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
XAI-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI
Pub Date: 2023-05-12 | DOI: 10.1007/s11227-023-05356-3
Surajit Das, Mahamuda Sultana, Suman Bhattacharya, Diganta Sengupta, Debashis De
Machine learning (ML) has been used to classify heart disease for almost a decade, although understanding the internal workings of the black boxes, i.e., non-interpretable models, remains a demanding problem. Another major challenge in such ML models is the curse of dimensionality, which makes classification over the comprehensive feature vector (CFV) resource intensive. This study focuses on dimensionality reduction using explainable artificial intelligence, without compromising accuracy, for heart disease classification. Four explainable ML models using SHAP were applied to classification, exposing the feature contributions (FC) and feature weights (FW) of each feature in the CFV, and FC and FW were then used to generate the reduced-dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart disease best with explanations, improving model accuracy by 2% over the best existing proposals; (b) explainable classification using FS exhibits better accuracy than most proposals in the literature; (c) accuracy can be preserved while increasing explainability by using the XGBoost classifier; and (d) the top four features responsible for diagnosis, identified by their feature contributions, occur in common across all explanations produced by the five explainable techniques applied to XGBoost. To the best of our knowledge, this is the first attempt to explain XGBoost classification for the diagnosis of heart disease using five explainable techniques.
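The FS-construction step, ranking features by their SHAP contributions on a trained XGBoost model, keeping the top few, and retraining, can be sketched with the shap and xgboost libraries. The synthetic data, the 13-feature count (chosen to echo common heart-disease datasets), and keeping exactly four features are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
import shap      # pip install shap
import xgboost   # pip install xgboost
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a heart-disease table with 13 clinical features.
X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = xgboost.XGBClassifier(n_estimators=100).fit(X_tr, y_tr)
sv = np.asarray(shap.TreeExplainer(full).shap_values(X_tr))
importance = np.abs(sv).mean(axis=tuple(range(sv.ndim - 1)))  # mean |SHAP| per feature
top = np.argsort(importance)[::-1][:4]                        # reduced feature subset (FS)

reduced = xgboost.XGBClassifier(n_estimators=100).fit(X_tr[:, top], y_tr)
print("full CFV accuracy:  ", full.score(X_te, y_te))
print("reduced FS accuracy:", reduced.score(X_te, y_te))
```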
{"title":"XAI-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI.","authors":"Surajit Das, Mahamuda Sultana, Suman Bhattacharya, Diganta Sengupta, Debashis De","doi":"10.1007/s11227-023-05356-3","DOIUrl":"10.1007/s11227-023-05356-3","url":null,"abstract":"<p><p>Machine learning (ML) has been used for classification of heart diseases for almost a decade, although understanding of the internal working of the black boxes, i.e., non-interpretable models, remain a demanding problem. Another major challenge in such ML models is the curse of dimensionality leading to resource intensive classification using the comprehensive set of feature vector (CFV). This study focuses on dimensionality reduction using explainable artificial intelligence, without negotiating on accuracy for heart disease classification. Four explainable ML models, using SHAP, were used for classification which reflected the feature contributions (FC) and feature weights (FW) for each feature in the CFV for generating the final results. FC and FW were taken into account in generating the reduced dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart diseases best with explanations, with an increase in 2% in model accuracy over existing best proposals, (b) explainable classification using FS exhibits better accuracy than most of the literary proposals, and (c) with the increase in explainability, accuracy can be preserved using XGBoost classifier for classifying heart diseases, and (d) the top four features responsible for diagnosis of heart disease have been exhibited which have common occurrences in all the explanations reflected by the five explainable techniques used on XGBoost classifier based on feature contributions. To the best of our knowledge, this is first attempt to explain XGBoost classification for diagnosis of heart diseases using five explainable techniques.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-31"},"PeriodicalIF":3.3,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177719/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Composition of caching and classification in edge computing based on quality optimization for SDN-based IoT healthcare solutions
Pub Date: 2023-05-09 | DOI: 10.1007/s11227-023-05332-x
Seyedeh Shabnam Jazaeri, Parvaneh Asghari, Sam Jabbehdari, Hamid Haj Seyyed Javadi
This paper proposes a novel approach that uses spectral clustering to group patients with e-health IoT devices by similarity and distance, connecting each cluster to an SDN edge node for efficient caching. The proposed MFO-Edge Caching algorithm selects near-optimal data items for caching according to the considered criteria, improving QoS. Experimental results demonstrate that the approach outperforms other methods, reducing the average data-retrieval delay and achieving a cache hit rate of 76%. Emergency and on-demand requests are prioritized when caching response packets, while periodic requests see a lower cache hit ratio of 35%. These results highlight the effectiveness of SDN-edge caching and clustering for optimizing e-health network resources.