Pub Date: 2024-11-20. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2499
Sanpawat Kantabutra
In many combinatorial optimization problems we want a particular set of k out of n items with certain properties (or constraints). These properties may involve all k items. In the worst case, a deterministic algorithm must scan the other n-k items in the set to verify the k items. If we pick a set of k items at random and verify the properties, it will take about (n/k)^k verifications, which can be a very large number for some values of k and n. In this article we introduce a significantly faster randomized strategy that, with very high probability, picks such a set of k items by amplifying the probability of obtaining a target set of k items, and we show how this probability-boosting technique can be applied to solve three different combinatorial optimization problems efficiently. In all three applications, algorithms that use the probability-boosting technique show superiority over their deterministic counterparts.
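The amplification idea described above can be sketched in a few lines: if a single random pick succeeds with probability p, repeating the pick independently enough times pushes the overall success probability to at least 1 - delta. The sketch below is a generic illustration and not the paper's actual strategy; `sample` and `is_target` are placeholder callables.

```python
import math
import random

def trials_for_confidence(p, delta):
    # Trials needed so at least one succeeds with prob. >= 1 - delta,
    # given per-trial success probability p: solve 1 - (1-p)^t >= 1 - delta.
    return math.ceil(math.log(delta) / math.log(1.0 - p))

def amplified_search(is_target, sample, p, delta, rng):
    # Repeat an independent random pick; overall success probability
    # is boosted to at least 1 - delta.
    for _ in range(trials_for_confidence(p, delta)):
        candidate = sample(rng)
        if is_target(candidate):
            return candidate
    return None
```

The point of the boosting is visible in `trials_for_confidence`: the number of repetitions grows only logarithmically in 1/delta.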
Title: Probability-boosting technique for combinatorial optimization. PeerJ Computer Science 10:e2499. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623238/pdf/
Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2465
Dávid Držík, Frantisek Forgac
This study introduces a new approach to text tokenization, the SlovaK Morphological Tokenizer (SKMT), which integrates the morphology of the Slovak language into the training process of the Byte-Pair Encoding (BPE) algorithm. Unlike conventional tokenizers, SKMT focuses on preserving the integrity of word roots in individual tokens, which is crucial for maintaining lexical meaning. The methodology involves segmenting and extracting word roots from morphological dictionaries and databases, followed by corpus preprocessing and training SKMT alongside a traditional BPE tokenizer. Comparative evaluation against existing tokenizers demonstrates SKMT's outstanding ability to maintain root integrity, achieving 99.7% root integrity compared to SlovakBERT (90.5%) and a pure BPE tokenizer (93.1%). Further validation involved fine-tuning models on a sentiment classification NLP task, where models trained with SKMT achieved an F1-score improvement of 3.5% over those trained with conventional BPE tokenization, followed by a focus on the Semantic Textual Similarity (STS) task. These findings suggest that training language models with the SKMT tokenizer significantly enhances model performance and quality.
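For readers unfamiliar with BPE, its core training loop repeatedly merges the most frequent adjacent token pair in a word-frequency table. The toy sketch below shows one such merge step; it is a generic BPE illustration, not SKMT itself, which would additionally forbid merges that cross a word-root boundary.

```python
from collections import Counter

def most_frequent_pair(vocab):
    # vocab: dict mapping a tuple of tokens -> corpus frequency.
    pairs = Counter()
    for tokens, freq in vocab.items():
        for a, b in zip(tokens, tokens[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(vocab, pair):
    # Apply one BPE merge: concatenate every occurrence of `pair`.
    merged = {}
    for tokens, freq in vocab.items():
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged
```

A morphology-aware variant would reject a candidate pair in `most_frequent_pair` whenever the merge would straddle a root boundary.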
Title: Slovak morphological tokenizer using the Byte-Pair Encoding algorithm. PeerJ Computer Science 10:e2465. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622830/pdf/
Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2493
Yuanzhi Huo, Mengjie Jin, Sicong You
Crafting a lucrative stock trading strategy is pivotal in the realm of investments. However, devising such a strategy is challenging given the intricate and ever-changing state of the stock market. In recent years, with the development of artificial intelligence (AI), some AI technologies have been successfully applied to stock price prediction and asset management. For example, long short-term memory (LSTM) networks can be used to predict stock price variation and reinforcement learning (RL) can be used to control stock trading; however, they are generally used separately and cannot achieve simultaneous prediction and trading. In this study, we propose a hybrid deep learning model to predict stock prices and control stock trading to manage assets. LSTM is responsible for predicting stock prices, while RL is responsible for stock trading based on the predicted price trends. Meanwhile, to reduce uncertainty in the stock market and maximize stock assets, the proposed LSTM model can predict the average directional index (ADX) to comprehend stock trends in advance, and we also propose several constraints to assist asset management, thereby reducing risk and maximizing stock assets. In our results, the hybrid model yields an average R² value of 0.94 when predicting price variations. Moreover, employing the proposed approach, which integrates ADX and constraints, the hybrid model grows stock assets to 1.05 times the initial assets.
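As a hedged illustration of how a predicted price and a predicted ADX value might gate trading decisions, consider the toy rule below. The rule and its thresholds (`adx_min`, `band`) are hypothetical, not the constraints proposed in the study; ADX values above roughly 25 are conventionally read as indicating a strong trend.

```python
def trade_signal(pred_price, price_now, pred_adx, adx_min=25.0, band=0.002):
    # Hypothetical gating rule: only trade when the predicted ADX
    # indicates a sufficiently strong trend; otherwise hold.
    if pred_adx < adx_min:
        return "hold"
    change = (pred_price - price_now) / price_now
    if change > band:
        return "buy"
    if change < -band:
        return "sell"
    return "hold"
```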
Title: A study of hybrid deep learning model for stock asset management. PeerJ Computer Science 10:e2493. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639306/pdf/
Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2453
Ying Hu, Xiongyan Liu, Hao Chen
To mitigate synchronization errors within a multi-PID controller system and enhance its resistance to interference, an improved competitive and cooperative swarm optimizer for constrained multi-objective optimization (CMOCSO) algorithm is employed to optimize the parameters of the multi-PID controller. Initially, a mathematical model representing the constrained multi-objective problem associated with the multi-PID controller is formulated. In this model, the controller parameters are designated as decision variables, the performance index serves as the objective function, and the stability constraints of the system are incorporated. Subsequently, an improved CMOCSO algorithm is introduced, which bifurcates the evolutionary process into two distinct stages using a central point-moving strategy; each stage employs different evolutionary techniques to accelerate convergence, and a novel grouping strategy is implemented to increase the learning efficiency of the population. The efficacy of the algorithm is evaluated on 16 standard test functions, demonstrating its effectiveness in addressing constrained multi-objective problems. Ultimately, the algorithm is applied to optimize the parameters of the multi-PID controller. The simulation results indicate that the proposed method yields superior control performance, reduced synchronization errors, and notable interference resistance.
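For context, each controller being tuned is a standard discrete PID loop whose gains (kp, ki, kd) are the decision variables of the optimization. A minimal textbook implementation (not the paper's code) looks like:

```python
class PID:
    def __init__(self, kp, ki, kd, dt):
        # kp, ki, kd are the gains a CMOCSO-style optimizer would tune.
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measured):
        # One discrete control update: u = kp*e + ki*sum(e*dt) + kd*de/dt.
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

The optimizer evaluates candidate (kp, ki, kd) triples for every controller in the system against the performance index while respecting the stability constraints.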
Title: Optimal tuning of multi-PID controller using improved CMOCSO algorithm. PeerJ Computer Science 10:e2453. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622856/pdf/
Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2514
Oğuz Mısır
The integration of artificial intelligence into the field of robotics enables robots to perform their tasks more meaningfully. In particular, deep-learning methods contribute significantly to robots becoming intelligent cybernetic systems. The effective use of deep learning in mobile cyber-physical systems has enabled mobile robots to become more intelligent, and it can also help mobile robots determine a safe path. The drivable pathfinding problem involves a mobile robot finding a path to a target in a challenging environment with obstacles. In this paper, a semantic-segmentation-based drivable path detection method is presented for use in the indoor navigation of mobile robots. The proposed method uses a perspective transformation strategy based on transforming high-accuracy segmented images into real-world space. This transformation enables the motion space to be divided into grids, based on the image perceived in real-world space. A grid-based RRT* navigation strategy was developed that uses the grid-divided images to enable the mobile robot to avoid obstacles and meet optimal path requirements. Smoothing was performed to improve the path planning of the grid-based RRT* and avoid unnecessary turning angles for the mobile robot. Thus, the mobile robot could reach the target in an optimal manner within the drivable area determined by segmentation. A DeepLabv3+ model with a ResNet50 backbone, which offers superior segmentation ability, is proposed for accurate determination of the drivable path. A Gaussian filter was used to reduce the noise caused by segmentation, and multi-Otsu thresholding was used to improve the masked images across multiple classes. The segmentation model and backbone architecture were compared with other methods in terms of performance; the DeepLabv3+ and ResNet50 combination outperformed the other compared methods by 0.21%-4.18% on many metrics. In addition, a mobile robot design is presented to test the proposed drivable path determination method. This design validates the proposed method using different scenarios in an indoor environment.
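The smoothing step mentioned above is commonly implemented as greedy shortcutting over the occupancy grid: replace a chain of waypoints with a straight segment whenever that segment stays in free cells. The sketch below is a generic version of that idea, not the author's implementation; `grid[y][x]` is assumed nonzero for occupied cells.

```python
def line_free(grid, a, b):
    # Check that the straight segment a -> b stays in free cells.
    (x0, y0), (x1, y1) = a, b
    steps = max(abs(x1 - x0), abs(y1 - y0))
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        if grid[y][x]:
            return False
    return True

def smooth(path, grid):
    # Greedy shortcutting: jump to the farthest waypoint still visible.
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_free(grid, path[i], path[j]):
            j -= 1
        out.append(path[j])
        i = j
    return out
```

Dropping intermediate waypoints in this way removes the unnecessary turning angles an unsmoothed RRT* path tends to contain.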
Title: Drivable path detection for a mobile robot with differential drive using a deep learning based segmentation method for indoor navigation. PeerJ Computer Science 10:e2514. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639217/pdf/
Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2484
Aasma Akram, Fatima Anjum, Sajid Latif, Muhammad Imran Zulfiqar, Mohsin Nazir
The Internet of Things (IoT) paradigm is a foundational and integral factor in the development of smart applications across different sectors. These applications are composed of sets of interconnected modules that exchange data and realize the distributed data flow (DDF) model. Executing these modules on distant cloud data centers is prone to quality-of-service (QoS) degradation. This is where the fog computing philosophy comes in to bridge the gap and bring computation closer to the IoT devices. However, resource management in fog and optimal allocation of fog devices to application modules are critical for better resource utilization and for achieving QoS. A significant challenge in this regard is to manage the fog network dynamically to determine cost-effective placement of application modules on resources. In this study, we propose an optimal placement strategy for smart healthcare application modules on fog resources. The objective of this strategy is to ensure optimal execution in terms of latency, bandwidth, and earliest completion time compared to several baseline techniques. A honey bee inspired strategy is proposed for the allocation and utilization of resources for application module processing. To model the application and measure the effectiveness of our strategy, the Java-based simulation classes of iFogSim were extended, and the conducted experiments demonstrate satisfactory results.
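Stripped of the bee metaphor, placement schemes of this kind ultimately map each application module to the fog or cloud device that minimizes its completion time. The greedy sketch below is a simplified stand-in with hypothetical module workloads and device speeds; a real honey-bee scheme layers randomized scout and forager phases on top of a heuristic like this.

```python
def assign_modules(modules, devices):
    # modules: list of (name, workload in MI); devices: name -> speed (MIPS).
    # Greedy earliest-completion-time placement: schedule the heaviest
    # module first onto whichever device would finish it soonest.
    finish = {d: 0.0 for d in devices}
    placement = {}
    for name, work in sorted(modules, key=lambda m: -m[1]):
        best = min(devices, key=lambda d: finish[d] + work / devices[d])
        finish[best] += work / devices[best]
        placement[name] = best
    return placement, finish
```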
Title: Honey bee inspired resource allocation scheme for IoT-driven smart healthcare applications in fog-cloud paradigm. PeerJ Computer Science 10:e2484. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623239/pdf/
Pub Date: 2024-11-18. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2445
Helen L Smith, Patrick J Biggs, Nigel P French, Adam N H Smith, Jonathan C Marshall
Performance of random forest classification models is often assessed and interpreted using out-of-bag (OOB) samples. Observations which are OOB when a tree is trained may serve as a test set for that tree, and predictions for the OOB observations are used to calculate OOB error and variable importance measures (VIM). OOB errors are popular because they are fast to compute and, for large samples, are a good estimate of the true prediction error. In this study, we investigate how target-based vs. target-agnostic encoding of categorical predictor variables for random forest can bias performance measures based on OOB samples. We show that, when categorical variables are encoded using a target-based encoding method, and when the encoding takes place prior to bagging, the OOB sample can underestimate the true misclassification rate and overestimate variable importance. We recommend using a separate test data set when evaluating variable importance and/or predictive performance of tree-based methods that utilise a target-based encoding method.
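To see why encoding before bagging leaks information, note that a target (mean) encoding computed on the full training set already embeds every observation's label, including observations that will later be OOB for some tree. A minimal smoothed target encoder looks like the sketch below (illustrative only; the smoothing constant is arbitrary):

```python
from collections import defaultdict

def target_encode(categories, labels, smoothing=10.0):
    # Smoothed mean-target encoding. If fitted on the full training set
    # before bagging, every OOB observation's own label has already
    # influenced its encoded value -- the leak described above.
    counts, sums = defaultdict(int), defaultdict(float)
    for cat, y in zip(categories, labels):
        counts[cat] += 1
        sums[cat] += y
    prior = sum(labels) / len(labels)

    def encode(cat):
        n = counts[cat]
        return (sums[cat] + smoothing * prior) / (n + smoothing) if n else prior

    return encode
```

Fitting the encoder only on each tree's in-bag rows (or on a genuinely separate split, as the authors recommend) avoids the optimistic OOB estimates.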
Title: Out of (the) bag-encoding categorical predictors impacts out-of-bag samples. PeerJ Computer Science 10:e2445. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623134/pdf/
Pub Date: 2024-11-18. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2473
Dan Yang, Xiaoling Miao
In the hospitality business, cancellations negatively affect the precise estimation of revenue management. With today's powerful computational advances, it is feasible to develop a model that predicts cancellations to reduce the risks for business owners. Although such models have not yet been tested in real-world conditions, several prototypes were developed and deployed in two hotels; their main goal was to study how these models could be incorporated into a decision support system and to assess their influence on demand-management decisions. In our study, we introduce a tree-based neural network (TNN) that combines a tree-based learning algorithm with a feed-forward neural network as a computational method for predicting hotel booking cancellations. Experimental results indicated that the TNN model significantly improved predictive power on two benchmark datasets compared to tree-based models and baseline artificial neural networks alone. The preliminary success of our study also confirms that tree-based neural networks are promising for dealing with tabular data.
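A tree-and-network combination in the stacking sense can be reduced to its simplest form: feed the tree's prediction to a neural unit alongside the raw features. The sketch below is a deliberately minimal, hypothetical illustration of that idea, not the TNN architecture from the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tnn_predict(x, tree_predict, weights, bias):
    # Hypothetical stacking step: the tree's predicted cancellation
    # probability is appended to the raw booking features and fed to
    # a single neural unit (the last weight scales the tree output).
    feats = list(x) + [tree_predict(x)]
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return sigmoid(z)
```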
Title: Predicting hotel booking cancellations using tree-based neural network. PeerJ Computer Science 10:e2473. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623061/pdf/
Pub Date: 2024-11-18. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2501
Büşra Çalmaz, Belgin Ergenç Bostanoğlu
Clique counting is a crucial task in graph mining, as the count of cliques provides different insights across various domains, including social and biological network analysis, community detection, recommendation systems, and fraud detection. Counting cliques is algorithmically challenging due to combinatorial explosion, especially for large datasets and larger clique sizes. There are comprehensive surveys and reviews of algorithms for counting subgraphs and triangles (three-cliques), but there is a notable lack of reviews addressing k-clique counting algorithms for k > 3. This paper addresses this gap by reviewing clique counting algorithms designed to overcome this challenge. A systematic analysis and comparison of exact and approximation techniques is also provided, highlighting their advantages, disadvantages, and suitability for different contexts. The paper also presents a taxonomy of clique counting methodologies, covering approximate and exact methods as well as parallelization strategies, and aims to enhance understanding of this specific domain and guide future research on k-clique counting in large-scale graphs.
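As a baseline for the exact methods such a survey covers, k-cliques can be counted by recursing on ordered neighbourhoods so that each clique is enumerated exactly once. This textbook-style sketch (exponential in the worst case, reflecting the combinatorial explosion mentioned above) is purely illustrative:

```python
def count_k_cliques(adj, k):
    # adj: node -> set of neighbours (undirected). Each k-clique is
    # counted once by always extending with higher-ordered vertices.
    nodes = sorted(adj)

    def rec(cands, depth):
        if depth == k:
            return 1
        total = 0
        for i, v in enumerate(cands):
            # Candidates after v that are adjacent to v: any clique
            # containing the chosen prefix plus v lies inside this set.
            nxt = [u for u in cands[i + 1:] if u in adj[v]]
            if len(nxt) >= k - depth - 1:  # prune hopeless branches
                total += rec(nxt, depth + 1)
        return total

    return rec(nodes, 0)
```

Approximate methods trade this exhaustive recursion for sampling schemes (e.g., colourful or Turán-style sampling) with provable accuracy guarantees.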
Title: k-Clique counting on large scale-graphs: a survey. PeerJ Computer Science 10:e2501. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622928/pdf/
Pub Date : 2024-11-13 eCollection Date: 2024-01-01 DOI: 10.7717/peerj-cs.2385
Heng Guo
Fuzzy preference modeling in intelligent decision support systems aims to improve the efficiency and accuracy of decision-making by incorporating fuzzy logic and preference modeling techniques. While network public opinion (NPO) can drive judicial reform and progress, malicious public opinion also threatens the independence of the judiciary. To tackle this issue within intelligent decision support systems, this study first surveys current NPO monitoring technologies. Recognizing the complexities of handling large-scale NPO data and mitigating significant interference, it proposes a novel NPO monitoring model for the judicial domain centered on semantic feature analysis. The model accounts for time-series characteristics, binary semantic fitting, and public sentiment intensity, and builds a judicial-domain semantic similarity model on a bidirectional long short-term memory network (S-Bi-LSTM); the semantic similarity between two sentences is produced by a fully connected layer. Empirical evaluations show strong performance: an accuracy of 85.9% and an F1 value of 87.1 on the test set, surpassing existing sentence semantic similarity models. Ultimately, the proposed model strengthens judicial authorities' monitoring of NPO, easing the public relations burden on judicial institutions and fostering a more equitable exercise of judicial power.
{"title":"Design of judicial public opinion supervision and intelligent decision-making model based on Bi-LSTM.","authors":"Heng Guo","doi":"10.7717/peerj-cs.2385","DOIUrl":"10.7717/peerj-cs.2385","url":null,"abstract":"<p><p>Fuzzy preference modeling in intelligent decision support systems aims to improve the efficiency and accuracy of decision-making processes by incorporating fuzzy logic and preference modeling techniques. While network public opinion (NPO) has the potential to drive judicial reform and progress, it also poses challenges to the independence of the judiciary due to the negative impact of malicious public opinion. To tackle this issue within the context of intelligent decision support systems, this study provides an insightful overview of current NPO monitoring technologies. Recognizing the complexities associated with handling large-scale NPO data and mitigating significant interference, a novel judicial domain NPO monitoring model is proposed, which centers around semantic feature analysis. This model takes into account time series characteristics, binary semantic fitting, and public sentiment intensity. Notably, it leverages a bidirectional long short-term memory (Bi-LSTM) network (S-Bi-LSTM) to construct a judicial domain semantic similarity calculation model. The semantic similarity values between sentences are obtained through the utilization of a fully connected layer. Empirical evaluations demonstrate the remarkable performance of the proposed model, achieving an accuracy rate of 85.9% and an F1 value of 87.1 on the test set, surpassing existing sentence semantic similarity models. Ultimately, the proposed model significantly enhances the monitoring capabilities of judicial authorities over NPO, thereby alleviating the burden on public relations faced by judicial institutions and fostering a more equitable execution of judicial power.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2385"},"PeriodicalIF":3.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
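The abstract's pipeline — encode each sentence with a bidirectional LSTM, then score the pair's similarity — can be sketched at toy scale. This is not the paper's S-Bi-LSTM: it is a 1-dimensional pure-Python LSTM with random weights, concatenating the forward and backward final hidden states and scoring two sequences with cosine similarity in place of the paper's trained fully connected layer. All weights, inputs, and names here are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_encode(seq, w, reverse=False):
    """Run a 1-dimensional LSTM over a sequence of scalars; return final hidden state."""
    h = c = 0.0
    for x in (reversed(seq) if reverse else seq):
        i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
        f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
        o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
        g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
        c = f * c + i * g
        h = o * math.tanh(c)
    return h

def bilstm_vector(seq, w_fwd, w_bwd):
    """Sentence vector = [forward final state, backward final state]."""
    return [lstm_encode(seq, w_fwd), lstm_encode(seq, w_bwd, reverse=True)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Random (untrained) weights standing in for learned parameters.
random.seed(0)
keys = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]
w_fwd = {k: random.uniform(-1, 1) for k in keys}
w_bwd = {k: random.uniform(-1, 1) for k in keys}

# Two "sentences" as pre-embedded scalar sequences (hypothetical inputs).
s1, s2 = [0.2, 0.5, 0.1], [0.2, 0.5, 0.2]
sim = cosine(bilstm_vector(s1, w_fwd, w_bwd), bilstm_vector(s2, w_fwd, w_bwd))
```

In the real model each token is an embedding vector, the hidden state is high-dimensional, and the similarity head is a trained fully connected layer rather than cosine similarity, but the data flow — bidirectional encoding, state concatenation, pairwise scoring — is the same.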