Topic sentiment analysis based on deep neural network using document embedding technique
Pub Date: 2023-06-05 | DOI: 10.1007/s11227-023-05423-9
Azam Seilsepour, Reza Ravanmehr, Ramin Nassiri
Sentiment Analysis (SA) is a domain- or topic-dependent task, since polarity terms convey different sentiments in different domains. Hence, machine learning models trained on one domain cannot be employed in others, and existing domain-independent lexicons cannot correctly recognize the polarity of domain-specific polarity terms. Conventional Topic Sentiment Analysis (TSA) approaches perform Topic Modeling (TM) and SA sequentially, classifying sentiments with models previously trained on unrelated datasets, which cannot provide acceptable accuracy. Other researchers perform TM and SA simultaneously using topic-sentiment joint models, which require a list of seed words and their sentiments drawn from widely used domain-independent lexicons; as a result, these methods also cannot correctly determine the polarity of domain-specific terms. This paper proposes a novel supervised hybrid TSA approach, called Embedding Topic Sentiment Analysis using Deep Neural Networks (ETSANet), that extracts the semantic relationships between the hidden topics and the training dataset using a Semantically Topic-Related Documents Finder (STRDF). STRDF discovers the training documents that share a context with a topic, based on the semantic relationships between the training dataset and the Semantic Topic Vector, a newly introduced concept that encompasses the semantic aspects of a topic. A hybrid CNN-GRU model is then trained on these semantically topic-related documents, and a hybrid metaheuristic combining Grey Wolf Optimization and the Whale Optimization Algorithm is employed to fine-tune the hyperparameters of the CNN-GRU network. The evaluation results demonstrate that ETSANet improves the accuracy of state-of-the-art methods by 1.92%.
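The abstract does not spell out STRDF's mechanics, but the core idea (score training documents against a Semantic Topic Vector and keep the semantically related ones) can be illustrated with a minimal NumPy sketch. Everything here, including the weighted-mean topic vector, the cosine threshold, and the function names, is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def semantic_topic_vector(topic_word_vecs, topic_word_weights):
    """Aggregate the embeddings of a topic's top words into one vector
    (weighted mean); a stand-in for the paper's Semantic Topic Vector."""
    w = np.asarray(topic_word_weights, dtype=float)
    return (w[:, None] * topic_word_vecs).sum(axis=0) / w.sum()

def topic_related_documents(doc_vecs, topic_vec, threshold=0.2):
    """Return indices of training documents whose embedding is
    cosine-similar to the topic vector above a chosen threshold."""
    doc_norms = np.linalg.norm(doc_vecs, axis=1)
    sims = doc_vecs @ topic_vec / (doc_norms * np.linalg.norm(topic_vec) + 1e-12)
    return np.where(sims >= threshold)[0]

# Toy usage with random 100-dim embeddings for 1000 documents.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 100))
topic = semantic_topic_vector(rng.normal(size=(10, 100)), rng.random(10))
related = topic_related_documents(docs, topic)
```

The selected subset would then serve as the training data for the downstream CNN-GRU classifier.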
{"title":"Topic sentiment analysis based on deep neural network using document embedding technique.","authors":"Azam Seilsepour, Reza Ravanmehr, Ramin Nassiri","doi":"10.1007/s11227-023-05423-9","DOIUrl":"10.1007/s11227-023-05423-9","url":null,"abstract":"<p><p>Sentiment Analysis (SA) is a domain- or topic-dependent task since polarity terms convey different sentiments in various domains. Hence, machine learning models trained on a specific domain cannot be employed in other domains, and existing domain-independent lexicons cannot correctly recognize the polarity of domain-specific polarity terms. Conventional approaches of Topic Sentiment Analysis perform Topic Modeling (TM) and SA sequentially, utilizing the previously trained models on irrelevant datasets for classifying sentiments that cannot provide acceptable accuracy. However, some researchers perform TM and SA simultaneously using topic-sentiment joint models, which require a list of seeds and their sentiments from widely used domain-independent lexicons. As a result, these methods cannot find the polarity of domain-specific terms correctly. This paper proposes a novel supervised hybrid TSA approach, called Embedding Topic Sentiment Analysis using Deep Neural Networks (ETSANet), that extracts the semantic relationships between the hidden topics and the training dataset using Semantically Topic-Related Documents Finder (STRDF). STRDF discovers those training documents in the same context as the topic based on the semantic relationships between the Semantic Topic Vector, a newly introduced concept that encompasses the semantic aspects of a topic, and the training dataset. Then, a hybrid CNN-GRU model is trained by these semantically topic-related documents. Moreover, a hybrid metaheuristic method utilizing Grey Wolf Optimization and Whale Optimization Algorithm is employed to fine-tune the hyperparameters of the CNN-GRU network. The evaluation results demonstrate that ETSANet increases the accuracy of the state-of-the-art methods by 1.92%.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-39"},"PeriodicalIF":2.5,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241384/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Fechner multiscale local descriptor for face recognition
Pub Date: 2023-06-02 | DOI: 10.1007/s11227-023-05421-x
Jinxiang Feng, Jie Xu, Yizhi Deng, Jun Gao
Inspired by Fechner's law, we propose a Fechner multiscale local descriptor (FMLD) for feature extraction and face recognition. Fechner's law, a well-known law in psychology, states that human perception is proportional to the logarithm of the intensity of the physical stimulus that causes it. FMLD uses significant differences between pixels to mimic how humans perceive changes in their surroundings. A first round of feature extraction is performed in two local domains of different sizes to capture the structural features of the facial image, producing four facial feature images. In a second round, two binary patterns extract local features from the resulting magnitude and direction feature images, outputting four corresponding feature maps. Finally, all feature maps are fused into an overall histogram feature. Unlike existing descriptors, FMLD's magnitude and direction features are not isolated: both derive from the same "perceived intensity", and this close relationship further strengthens the feature representation. In the experiments, we evaluated FMLD on multiple face databases and compared it with leading-edge approaches. The results show that the proposed FMLD performs well in recognizing images with illumination, pose, expression, and occlusion changes. They also indicate that the feature images produced by FMLD significantly improve the performance of a convolutional neural network (CNN), and that the combination of FMLD and a CNN outperforms other advanced descriptors.
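As a rough illustration of the Fechner-style encoding described above (a log-compressed response to local pixel differences, split into magnitude and direction maps and pooled into a histogram), here is a plain NumPy sketch. The difference operators, the log1p compression, and the histogram pooling are illustrative assumptions, not the published FMLD operators.

```python
import numpy as np

def fechner_maps(img, k=1.0):
    """Toy Fechner-style encoding of a grayscale image: pixel differences
    stand in for the 'significant difference' stimulus, and their
    log-compressed magnitude for the perceived intensity."""
    img = img.astype(float)
    dx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal difference
    dy = np.diff(img, axis=0, append=img[-1:, :])  # vertical difference
    magnitude = k * np.log1p(np.hypot(dx, dy))     # Fechner: response ~ log(stimulus)
    direction = np.arctan2(dy, dx)                 # orientation of the change
    return magnitude, direction

def histogram_feature(feature_map, bins=16):
    """Collapse a feature map into a normalized histogram descriptor."""
    hist, _ = np.histogram(feature_map, bins=bins)
    return hist / max(hist.sum(), 1)

img = np.random.default_rng(1).integers(0, 256, (64, 64))
mag, ang = fechner_maps(img)
feature = np.concatenate([histogram_feature(mag), histogram_feature(ang)])
```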
{"title":"A Fechner multiscale local descriptor for face recognition.","authors":"Jinxiang Feng, Jie Xu, Yizhi Deng, Jun Gao","doi":"10.1007/s11227-023-05421-x","DOIUrl":"10.1007/s11227-023-05421-x","url":null,"abstract":"<p><p>Inspired by Fechner's law, we propose a Fechner multiscale local descriptor (FMLD) for feature extraction and face recognition. Fechner's law is a well-known law in psychology, which states that a human perception is proportional to the logarithm of the intensity of the corresponding significant differences physical quantity. FMLD uses the significant difference between pixels to simulate the pattern perception of human beings to the changes of surroundings. The first round of feature extraction is performed in two local domains of different sizes to capture the structural features of the facial images, resulting in four facial feature images. In the second round of feature extraction, two binary patterns are used to extract local features on the obtained magnitude and direction feature images, and four corresponding feature maps are output. Finally, all feature maps are fused to form an overall histogram feature. Different from the existing descriptors, the FMLD's magnitude and direction features are not isolated. They are derived from the \"perceived intensity\", thus there is a close relationship between them, which further facilitates the feature representation. In the experiments, we evaluated the performance of FMLD in multiple face databases and compared it with the leading edge approaches. The results show that the proposed FMLD performs well in recognizing images with illumination, pose, expression and occlusion changes. The results also indicate that the feature images produced by FMLD significantly improve the performance of convolutional neural network (CNN), and the combination of FMLD and CNN exhibits better performance than other advanced descriptors.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-28"},"PeriodicalIF":3.3,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10072649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data quality model for assessing public COVID-19 big datasets
Pub Date: 2023-05-31 | DOI: 10.1007/s11227-023-05410-0
Alladoumbaye Ngueilbaye, Joshua Zhexue Huang, Mehak Khan, Hongzhi Wang
High-quality data are crucial for decision support and evidence-based healthcare, particularly when the relevant knowledge is lacking. For public health practitioners and researchers, reported COVID-19 data need to be accurate and easily available. Each nation has a system in place for reporting COVID-19 data, but the efficacy of these systems has not been thoroughly evaluated, and the COVID-19 pandemic has exposed widespread flaws in data quality. We propose a data quality model (a canonical data model, four adequacy levels, and Benford's law) to assess the quality of the COVID-19 data reporting carried out by the World Health Organization (WHO) in the six Central African Economic and Monetary Community (CEMAC) countries between March 6, 2020, and June 22, 2022, and we suggest potential solutions. The resulting data quality adequacy levels can be interpreted as indicators of the dependability and sufficiency of big dataset inspection. The model effectively identified the quality of the input data for big dataset analytics. Its future development requires scholars and institutions from all sectors to deepen their understanding of its core concepts, improve its integration with other data processing technologies, and broaden the scope of its applications.
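Of the model's three components, the Benford's-law check is straightforward to show concretely: compare the empirical first-digit distribution of reported counts with the Benford distribution P(d) = log10(1 + 1/d). A minimal sketch follows; using a chi-square statistic as the deviation measure is an assumption, and the paper may apply a different test.

```python
import numpy as np

def benford_deviation(counts):
    """Chi-square deviation of the first-digit distribution of reported
    counts from Benford's law, P(d) = log10(1 + 1/d) for d = 1..9."""
    digits = np.array([int(str(abs(int(c)))[0]) for c in counts if int(c) != 0])
    observed = np.bincount(digits, minlength=10)[1:10].astype(float)
    expected = np.log10(1.0 + 1.0 / np.arange(1, 10)) * observed.sum()
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return chi2, observed, expected

# Toy daily case counts; real use would pass one country's full series.
chi2, obs, exp = benford_deviation([12, 134, 29, 876, 15, 203, 41, 98, 1340, 57])
print(f"chi-square deviation from Benford: {chi2:.2f}")
```

A large deviation flags a reporting series for closer inspection rather than proving manipulation.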
{"title":"Data quality model for assessing public COVID-19 big datasets.","authors":"Alladoumbaye Ngueilbaye, Joshua Zhexue Huang, Mehak Khan, Hongzhi Wang","doi":"10.1007/s11227-023-05410-0","DOIUrl":"10.1007/s11227-023-05410-0","url":null,"abstract":"<p><p>For decision-making support and evidence based on healthcare, high quality data are crucial, particularly if the emphasized knowledge is lacking. For public health practitioners and researchers, the reporting of COVID-19 data need to be accurate and easily available. Each nation has a system in place for reporting COVID-19 data, albeit these systems' efficacy has not been thoroughly evaluated. However, the current COVID-19 pandemic has shown widespread flaws in data quality. We propose a data quality model (canonical data model, four adequacy levels, and Benford's law) to assess the quality issue of COVID-19 data reporting carried out by the World Health Organization (WHO) in the six Central African Economic and Monitory Community (CEMAC) region countries between March 6,2020, and June 22, 2022, and suggest potential solutions. These levels of data quality sufficiency can be interpreted as dependability indicators and sufficiency of Big Dataset inspection. This model effectively identified the quality of the entry data for big dataset analytics. The future development of this model requires scholars and institutions from all sectors to deepen their understanding of its core concepts, improve integration with other data processing technologies, and broaden the scope of its applications.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-33"},"PeriodicalIF":2.5,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10230148/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BTDA: Two-factor dynamic identity authentication scheme for data trading based on alliance chain
Pub Date: 2023-05-25 | DOI: 10.1007/s11227-023-05393-y
Fengmei Chen, Bin Zhao, Yilong Gao, Wenyin Zhang
As the market for data trading grows, risks around identity authentication and authority management intensify. To address the centralization of identity authentication, the dynamic nature of identities, and the ambiguity of trading authority in data trading, a two-factor dynamic identity authentication scheme for data trading based on an alliance (consortium) chain, named BTDA, is proposed. First, the use of identity certificates is simplified to avoid heavy computation and difficult storage. Second, a two-factor dynamic authentication strategy built on a distributed ledger provides dynamic identity authentication throughout a data trade. Finally, the proposed scheme is evaluated in a simulation experiment. Theoretical comparison and analysis against similar schemes show that the proposed scheme has lower cost, higher authentication efficiency and security, and easier authority management, and that it can be widely applied across data trading scenarios.
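A minimal sketch of the general two-factor pattern the abstract describes: a long-lived credential plus a per-trade dynamic response, with authentication events appended to a shared ledger. The HMAC construction, the in-memory LEDGER list, and all names are illustrative assumptions; the paper's actual protocol on an alliance chain is more involved.

```python
import hashlib, hmac, os, time

LEDGER = []  # stand-in for the alliance-chain distributed ledger

def register(user_id, secret):
    """Factor 1: a long-lived secret bound to the trader's identity."""
    LEDGER.append({"user": user_id, "cred": hashlib.sha256(secret).hexdigest()})

def authenticate(user_id, secret, session_nonce):
    """Factor 2: a per-trade dynamic response derived from a fresh nonce,
    so a token replayed from an earlier trade is useless."""
    record = next(r for r in LEDGER if r.get("user") == user_id and "cred" in r)
    if hashlib.sha256(secret).hexdigest() != record["cred"]:
        return False
    token = hmac.new(secret, session_nonce, hashlib.sha256).hexdigest()
    LEDGER.append({"user": user_id, "auth": token, "ts": time.time()})  # auditable trail
    return True

secret = os.urandom(32)
register("trader-42", secret)
print(authenticate("trader-42", secret, os.urandom(16)))  # True
```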
{"title":"BTDA: Two-factor dynamic identity authentication scheme for data trading based on alliance chain.","authors":"Fengmei Chen, Bin Zhao, Yilong Gao, Wenyin Zhang","doi":"10.1007/s11227-023-05393-y","DOIUrl":"10.1007/s11227-023-05393-y","url":null,"abstract":"<p><p>With the increase in the market share of data trading, the risks such as identity authentication and authority management are increasingly intensified. Aiming at the problems of centralization of identity authentication, dynamic changes of identities, and ambiguity of trading authority in data trading, a two-factor dynamic identity authentication scheme for data trading based on alliance chain (BTDA) is proposed. Firstly, the use of identity certificates is simplified to solve the problems of large calculation and difficult storage. Secondly, a two-factor dynamic authentication strategy is designed, which uses distributed ledger to achieve dynamic identity authentication throughout the data trading. Finally, a simulation experiment is carried out on the proposed scheme. The theoretical comparison and analysis with similar schemes show that the proposed scheme has lower cost, higher authentication efficiency and security, easier authority management, and can be widely used in various fields of data trading scenarios.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-20"},"PeriodicalIF":3.3,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Driving behavior analysis and classification by vehicle OBD data using machine learning
Pub Date: 2023-05-19 | DOI: 10.1007/s11227-023-05364-3
Raman Kumar, Anuj Jain
The transportation industry's focus on improving performance and reducing costs has driven the integration of IoT and machine learning technologies. The correlation of driving style and behavior with fuel consumption and emissions has highlighted the need to classify drivers' driving patterns, and vehicles now come equipped with sensors that gather a wide range of operational data. The proposed technique collects critical vehicle performance data, including speed, motor RPM, pedal position, calculated engine load, and over 50 other parameters, through the OBD-II interface, the primary diagnostics protocol used by technicians, which exposes real-time data on the vehicle's operation via the car's communication port. These data capture engine operating characteristics and assist with fault detection. The proposed method uses machine learning techniques, namely SVM, AdaBoost, and Random Forest, to classify driver behavior into ten classes based on fuel consumption, steering stability, velocity stability, and braking patterns. Because the data are extracted from the engine's internal sensors via the OBD-II protocol, no additional sensors are needed; the collected data are used to build a model that classifies driver behavior and can provide feedback to improve driving habits. Key driving events, such as high-speed braking, rapid acceleration, deceleration, and turning, are used to characterize individual drivers, and visualization techniques such as line plots and correlation matrices are used to compare drivers' performance. Time-series values of the sensor data are considered in the model, and supervised learning methods are employed to compare all driver classes. The SVM, AdaBoost, and Random Forest classifiers achieve 99%, 99%, and 100% accuracy, respectively. The suggested model offers a practical approach to examining driving behavior and recommending measures that enhance driving safety and efficiency.
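The classification pipeline itself is standard and easy to reproduce in outline: fit SVM, AdaBoost, and Random Forest classifiers on rows of OBD-derived features and compare accuracy. The sketch below substitutes synthetic data for the paper's OBD-II logs; the feature count and all hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for OBD-II rows: speed, RPM, pedal position, engine load, ...
X, y = make_classification(n_samples=2000, n_features=20, n_informative=12,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Compare the three classifiers named in the abstract on held-out data.
for model in (SVC(), AdaBoostClassifier(), RandomForestClassifier()):
    score = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(score, 3))
```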
{"title":"Driving behavior analysis and classification by vehicle OBD data using machine learning.","authors":"Raman Kumar, Anuj Jain","doi":"10.1007/s11227-023-05364-3","DOIUrl":"10.1007/s11227-023-05364-3","url":null,"abstract":"<p><p>The transportation industry's focus on improving performance and reducing costs has driven the integration of IoT and machine learning technologies. The correlation between driving style and behavior with fuel consumption and emissions has highlighted the need to classify different driver's driving patterns. In response, vehicles now come equipped with sensors that gather a wide range of operational data. The proposed technique collects critical vehicle performance data, including speed, motor RPM, paddle position, determined motor load, and over 50 other parameters through the OBD interface. The OBD-II diagnostics protocol, the primary diagnostic process used by technicians, can acquire this information via the car's communication port. OBD-II protocol is used to acquire real-time data linked to the vehicle's operation. This data are used to collect engine operation-related characteristics and assist with fault detection. The proposed method uses machine learning techniques, such as SVM, AdaBoost, and Random Forest, to classify driver's behavior based on ten categories that include fuel consumption, steering stability, velocity stability, and braking patterns. The solution offers an effective means to study driving behavior and recommend corrective actions for efficient and safe driving. The proposed model offers a classification of ten driver classes based on fuel consumption, steering stability, velocity stability, and braking patterns. This research work uses data extracted from the engine's internal sensors via the OBD-II protocol, eliminating the need for additional sensors. The collected data are used to build a model that classifies driver's behavior and can be used to provide feedback to improve driving habits. Key driving events, such as high-speed braking, rapid acceleration, deceleration, and turning, are used to characterize individual drivers. Visualization techniques, such as line plots and correlation matrices, are used to compare drivers' performance. Time-series values of the sensor data are considered in the model. The supervised learning methods are employed to compare all driver classes. SVM, AdaBoost, and Random Forest algorithms are implemented with 99%, 99%, and 100% accuracy, respectively. The suggested model offers a practical approach to examining driving behavior and suggesting necessary measures to enhance driving safety and efficiency.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-20"},"PeriodicalIF":3.3,"publicationDate":"2023-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time-aware neural ordinary differential equations for incomplete time series modeling
Pub Date: 2023-05-18 | DOI: 10.1007/s11227-023-05327-8
Zhuoqing Chang, Shubo Liu, Run Qiu, Song Song, Zhaohui Cai, Guoqing Tu
The Internet of Things realizes the ubiquitous connection of all things, generating countless streams of time-tagged data called time series. However, real-world time series are often plagued by missing values caused by noise or malfunctioning sensors. Existing methods for modeling such incomplete time series typically involve preprocessing steps, such as deletion or missing-data imputation using statistical or machine learning methods. Unfortunately, these methods unavoidably destroy temporal information and introduce accumulated error into the subsequent model. To this end, this paper introduces a novel continuous neural network architecture, named Time-aware Neural Ordinary Differential Equations (TN-ODE), for modeling incomplete time-series data. The proposed method not only supports imputation of missing values at arbitrary time points, but also enables multi-step prediction at desired time points. Specifically, TN-ODE employs a time-aware Long Short-Term Memory as an encoder, which effectively learns the posterior distribution from partially observed data. Additionally, the derivative of the latent states is parameterized with a fully connected network, enabling continuous-time latent dynamics generation. The TN-ODE model is evaluated on both real-world and synthetic incomplete time-series datasets through data interpolation, extrapolation, and classification tasks. Extensive experiments show that TN-ODE outperforms baseline methods in mean squared error on imputation and prediction tasks, as well as in accuracy on the downstream classification task.
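The decoder half of such a latent-ODE model can be sketched compactly: a small fully connected network parameterizes dz/dt, and integrating it forward yields latent states at arbitrary, possibly irregular, query times. The Euler integrator, network sizes, and untrained random weights below are illustrative assumptions; the paper's model uses a learned encoder and would pair this with a proper ODE solver.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 8)), np.zeros(8)

def f(z):
    """Fully connected network parameterizing dz/dt, as in the abstract."""
    return np.tanh(z @ W1 + b1) @ W2 + b2

def integrate(z0, query_times, dt=0.01):
    """Euler-integrate the latent state forward and read it out at the
    (possibly irregular) time points where values are missing or desired."""
    z, t, out = z0, 0.0, []
    for tq in sorted(query_times):
        while t < tq:
            z = z + dt * f(z)   # one Euler step of the learned dynamics
            t += dt
        out.append(z.copy())
    return np.stack(out)

latent_path = integrate(rng.normal(size=8), query_times=[0.13, 0.5, 0.97])
```

A decoder network (omitted here) would map each latent state back to an observation, giving both interpolation and multi-step extrapolation from the same machinery.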
{"title":"Time-aware neural ordinary differential equations for incomplete time series modeling.","authors":"Zhuoqing Chang, Shubo Liu, Run Qiu, Song Song, Zhaohui Cai, Guoqing Tu","doi":"10.1007/s11227-023-05327-8","DOIUrl":"10.1007/s11227-023-05327-8","url":null,"abstract":"<p><p>Internet of Things realizes the ubiquitous connection of all things, generating countless time-tagged data called time series. However, real-world time series are often plagued with missing values on account of noise or malfunctioning sensors. Existing methods for modeling such incomplete time series typically involve preprocessing steps, such as deletion or missing data imputation using statistical learning or machine learning methods. Unfortunately, these methods unavoidable destroy time information and bring error accumulation to the subsequent model. To this end, this paper introduces a novel continuous neural network architecture, named Time-aware Neural-Ordinary Differential Equations (TN-ODE), for incomplete time data modeling. The proposed method not only supports imputation missing values at arbitrary time points, but also enables multi-step prediction at desired time points. Specifically, TN-ODE employs a time-aware Long Short-Term Memory as an encoder, which effectively learns the posterior distribution from partial observed data. Additionally, the derivative of latent states is parameterized with a fully connected network, thereby enabling continuous-time latent dynamics generation. The proposed TN-ODE model is evaluated on both real-world and synthetic incomplete time-series datasets by conducting data interpolation and extrapolation tasks as well as classification task. Extensive experiments show the TN-ODE model outperforms baseline methods in terms of Mean Square Error for imputation and prediction tasks, as well as accuracy in downstream classification task.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-29"},"PeriodicalIF":3.3,"publicationDate":"2023-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10192786/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SiMAIM: identifying sockpuppets and puppetmasters on a single forum-oriented social media site
Pub Date: 2023-05-17 | DOI: 10.1007/s11227-023-05376-z
Ying-Ho Liu, Chia-Yu Kuo
With the Internet becoming indispensable, social media has become an integral part of our lives. With this, however, has come the phenomenon of a single user, called the puppetmaster, registering multiple accounts (sockpuppets) to advertise, spam, or stir controversy on social media sites. The phenomenon is even more evident on forum-oriented social media sites. Identifying sockpuppets is a critical step in stopping these malicious acts, yet identification on a single forum-oriented social media site has seldom been addressed. This paper proposes a Single-site Multiple Accounts Identification Model (SiMAIM) framework to address this research gap. We used Mobile01, Taiwan's most popular forum-oriented social media site, to validate SiMAIM's performance. SiMAIM achieved F1 scores between 0.6 and 0.9 on identifying sockpuppets and puppetmasters across different datasets and settings, and outperformed the compared methods by 6-38% in F1 score.
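The abstract does not describe SiMAIM's internals, but one common ingredient of sockpuppet detection, flagging account pairs with near-identical behavioral fingerprints, can be sketched as follows. The style vectors, cosine threshold, and account names are all hypothetical, not the paper's features.

```python
import numpy as np

def suspect_pairs(style_vecs, account_ids, threshold=0.9):
    """Flag account pairs whose behavioral vectors (e.g., word usage,
    posting-time histograms) are nearly parallel as sockpuppet candidates."""
    X = style_vecs / np.linalg.norm(style_vecs, axis=1, keepdims=True)
    sims = X @ X.T
    pairs = []
    for i in range(len(account_ids)):
        for j in range(i + 1, len(account_ids)):
            if sims[i, j] >= threshold:
                pairs.append((account_ids[i], account_ids[j], float(sims[i, j])))
    return pairs

rng = np.random.default_rng(3)
base = rng.random(12)
vecs = np.vstack([base + rng.normal(scale=0.01, size=12),  # suspected sockpuppet
                  base + rng.normal(scale=0.01, size=12),  # its puppetmaster account
                  rng.random(12)])                         # unrelated account
print(suspect_pairs(vecs, ["acct1", "acct2", "acct3"]))
```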
{"title":"SiMAIM: identifying sockpuppets and puppetmasters on a single forum-oriented social media site.","authors":"Ying-Ho Liu, Chia-Yu Kuo","doi":"10.1007/s11227-023-05376-z","DOIUrl":"10.1007/s11227-023-05376-z","url":null,"abstract":"<p><p>With the Internet becoming indispensable in our lives, social media has become an integral part of our lives. However, with this has come the phenomenon of a single user registering multiple accounts (<i>sockpuppets</i>) to advertise, spam, or cause controversy on social media sites, where the user is called the <i>puppetmaster</i>. This phenomenon is even more evident on forum-oriented social media sites. Identifying sockpuppets is a critical step in stopping the above-mentioned malicious acts. The identification of sockpuppets on a single forum-oriented social media site has seldom been addressed. This paper proposes a <i>Single-site Multiple Accounts Identification Model</i> (<i>SiMAIM</i>) framework to address this research gap. We used Mobile01, Taiwan's most popular forum-oriented social media site, to validate SiMAIM's performance. SiMAIM achieved F1 scores between 0.6 and 0.9 on identifying sockpuppets and puppetmasters under different datasets and settings. SiMAIM also outperformed the compared methods by 6-38% in F1 score.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-32"},"PeriodicalIF":3.3,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10188322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9686586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
KG-MFEND: an efficient knowledge graph-based model for multi-domain fake news detection
Pub Date: 2023-05-15 | DOI: 10.1007/s11227-023-05381-2
Lifang Fu, Huanxin Peng, Shuai Liu
The widespread dissemination of fake news on social media has adverse effects on the public and on social development. Most existing detection techniques are limited to a single domain (e.g., medicine or politics), and because of cross-domain differences such as word usage, they perform poorly in other domains. In the real world, social media releases millions of news pieces across diverse domains every day, so a fake news detection model applicable to multiple domains is of significant practical importance. In this paper, we propose a novel knowledge graph (KG)-based framework for multi-domain fake news detection, named KG-MFEND. The model's performance is enhanced by improving BERT and integrating external knowledge to alleviate word-level domain differences. Specifically, we construct a new KG that encompasses multi-domain knowledge and inject entity triples to build a sentence tree that enriches the news' background knowledge. To address the embedding-space mismatch and knowledge noise, we use soft positions and a visible matrix during knowledge embedding, and to reduce the influence of label noise, we add label smoothing to training. Extensive experiments on real Chinese datasets show that KG-MFEND generalizes strongly in single, mixed, and multiple domains and outperforms the current state-of-the-art methods for multi-domain fake news detection.
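Of the techniques listed, label smoothing is the most self-contained and can be shown concretely. A minimal NumPy sketch follows; the smoothing factor eps=0.1 and the toy probabilities are assumed values, not the paper's settings.

```python
import numpy as np

def smooth_labels(y, n_classes, eps=0.1):
    """Replace one-hot targets with a softened distribution: the true class
    keeps 1 - eps and the remaining eps is spread over all classes, which
    damps the effect of mislabeled (noisy) news items during training."""
    onehot = np.eye(n_classes)[y]
    return onehot * (1.0 - eps) + eps / n_classes

def cross_entropy(probs, targets):
    """Average cross-entropy of predicted probabilities against targets."""
    return -(targets * np.log(probs + 1e-12)).sum(axis=1).mean()

targets = smooth_labels(np.array([0, 1, 1, 0]), n_classes=2)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
print(cross_entropy(probs, targets))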
{"title":"KG-MFEND: an efficient knowledge graph-based model for multi-domain fake news detection.","authors":"Lifang Fu, Huanxin Peng, Shuai Liu","doi":"10.1007/s11227-023-05381-2","DOIUrl":"10.1007/s11227-023-05381-2","url":null,"abstract":"<p><p>The widespread dissemination of fake news on social media brings adverse effects on the public and social development. Most existing techniques are limited to a single domain (e.g., medicine or politics) to identify fake news. However, many differences exist commonly across domains, such as word usage, which lead to those methods performing poorly in other domains. In the real world, social media releases millions of news pieces in diverse domains every day. Therefore, it is of significant practical importance to propose a fake news detection model that can be applied to multiple domains. In this paper, we propose a novel framework based on knowledge graphs (KG) for multi-domain fake news detection, named KG-MFEND. The model's performance is enhanced by improving the BERT and integrating external knowledge to alleviate domain differences at the word level. Specifically, we construct a new KG that encompasses multi-domain knowledge and injects entity triples to build a sentence tree to enrich the news background knowledge. To solve the problem of embedding space and knowledge noise, we use the soft position and visible matrix in knowledge embedding. To reduce the influence of label noise, we add label smoothing to the training. Extensive experiments are conducted on real Chinese datasets. And the results show that KG-MFEND has a strong generalization capability in single, mixed, and multiple domains and outperforms the current state-of-the-art methods for multi-domain fake news detection.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-28"},"PeriodicalIF":3.3,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10184086/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
XAI-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI
Pub Date: 2023-05-12 | DOI: 10.1007/s11227-023-05356-3
Surajit Das, Mahamuda Sultana, Suman Bhattacharya, Diganta Sengupta, Debashis De
Machine learning (ML) has been used to classify heart disease for almost a decade, although understanding the internal workings of the black boxes, i.e., non-interpretable models, remains a demanding problem. Another major challenge in such ML models is the curse of dimensionality, which makes classification over the comprehensive feature vector (CFV) resource intensive. This study focuses on dimensionality reduction using explainable artificial intelligence, without compromising accuracy, for heart disease classification. Four explainable ML models using SHAP were applied to classification, exposing the feature contributions (FC) and feature weights (FW) of each feature in the CFV, and FC and FW were then used to generate the reduced-dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart disease best with explanations, improving model accuracy by 2% over the best existing proposals; (b) explainable classification using FS exhibits better accuracy than most proposals in the literature; (c) accuracy can be preserved while increasing explainability by using the XGBoost classifier; and (d) the top four features responsible for diagnosis, identified by their feature contributions, occur in common across all explanations produced by the five explainable techniques applied to XGBoost. To the best of our knowledge, this is the first attempt to explain XGBoost classification for the diagnosis of heart disease using five explainable techniques.
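The FS-construction step, ranking features by their SHAP contributions on a trained XGBoost model, keeping the top few, and retraining, can be sketched with the shap and xgboost libraries. The synthetic data, the 13-feature count (chosen to echo common heart-disease datasets), and keeping exactly four features are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
import shap      # pip install shap
import xgboost   # pip install xgboost
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a heart-disease table with 13 clinical features.
X, y = make_classification(n_samples=500, n_features=13, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = xgboost.XGBClassifier(n_estimators=100).fit(X_tr, y_tr)
sv = np.asarray(shap.TreeExplainer(full).shap_values(X_tr))
importance = np.abs(sv).mean(axis=tuple(range(sv.ndim - 1)))  # mean |SHAP| per feature
top = np.argsort(importance)[::-1][:4]                        # reduced feature subset (FS)

reduced = xgboost.XGBClassifier(n_estimators=100).fit(X_tr[:, top], y_tr)
print("full CFV accuracy:  ", full.score(X_te, y_te))
print("reduced FS accuracy:", reduced.score(X_te, y_te))
```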
{"title":"XAI-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI.","authors":"Surajit Das, Mahamuda Sultana, Suman Bhattacharya, Diganta Sengupta, Debashis De","doi":"10.1007/s11227-023-05356-3","DOIUrl":"10.1007/s11227-023-05356-3","url":null,"abstract":"<p><p>Machine learning (ML) has been used for classification of heart diseases for almost a decade, although understanding of the internal working of the black boxes, i.e., non-interpretable models, remain a demanding problem. Another major challenge in such ML models is the curse of dimensionality leading to resource intensive classification using the comprehensive set of feature vector (CFV). This study focuses on dimensionality reduction using explainable artificial intelligence, without negotiating on accuracy for heart disease classification. Four explainable ML models, using SHAP, were used for classification which reflected the feature contributions (FC) and feature weights (FW) for each feature in the CFV for generating the final results. FC and FW were taken into account in generating the reduced dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart diseases best with explanations, with an increase in 2% in model accuracy over existing best proposals, (b) explainable classification using FS exhibits better accuracy than most of the literary proposals, and (c) with the increase in explainability, accuracy can be preserved using XGBoost classifier for classifying heart diseases, and (d) the top four features responsible for diagnosis of heart disease have been exhibited which have common occurrences in all the explanations reflected by the five explainable techniques used on XGBoost classifier based on feature contributions. To the best of our knowledge, this is first attempt to explain XGBoost classification for diagnosis of heart diseases using five explainable techniques.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":" ","pages":"1-31"},"PeriodicalIF":3.3,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177719/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9713872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Composition of caching and classification in edge computing based on quality optimization for SDN-based IoT healthcare solutions
Pub Date: 2023-05-09 | DOI: 10.1007/s11227-023-05332-x
Seyedeh Shabnam Jazaeri, Parvaneh Asghari, Sam Jabbehdari, Hamid Haj Seyyed Javadi
This paper proposes a novel approach that uses spectral clustering to group patients with e-health IoT devices by similarity and distance, connecting each cluster to an SDN edge node for efficient caching. The proposed MFO-Edge Caching algorithm selects near-optimal data items for caching according to the considered criteria, improving QoS. Experimental results demonstrate that the approach outperforms other methods, reducing the average data-retrieval delay and achieving a cache hit rate of 76%. Emergency and on-demand requests are prioritized when caching response packets, while periodic requests see a lower cache hit ratio of 35%. These results highlight the effectiveness of SDN-edge caching and clustering for optimizing e-health network resources.