The rabbit breeding industry exhibits vast economic potential and growth opportunities. Nevertheless, ineffective prediction of environmental conditions in rabbit houses often leads to the spread of infectious diseases, causing illness and death among rabbits. This paper presents a multi-parameter predictive model for environmental conditions in rabbit houses, including temperature, humidity, illumination, CO2 concentration, NH3 concentration, and dust levels. The model distinguishes between day and night forecasts, thereby improving the adaptive adjustment of environmental data trends. Importantly, the model forecasts the environmental parameters jointly to improve precision, given the high degree of interrelation among them. The model's performance is assessed through RMSE, MAE, and MAPE metrics, yielding values of 0.018, 0.031, and 6.31%, respectively, in predicting rabbit house environmental factors. In experimental comparisons with BERT, Seq2Seq, and conventional Transformer models, the proposed method demonstrates superior performance.
{"title":"Research on Multi-Parameter Prediction of Rabbit Housing Environment Based on Transformer","authors":"Feiqi Liu, Dong Yang, Yuyang Zhang, Chengcai Yang, Jingjing Yang","doi":"10.4018/ijdwm.336286","DOIUrl":"https://doi.org/10.4018/ijdwm.336286","url":null,"abstract":"The rabbit breeding industry exhibits vast economic potential and growth opportunities. Nevertheless, the ineffective prediction of environmental conditions in rabbit houses often leads to the spread of infectious diseases, causing illness and death among rabbits. This paper presents a multi-parameter predictive model for environmental conditions such as temperature, humidity, illumination, CO2 concentration, NH3 concentration, and dust conditions in rabbit houses. The model adeptly distinguishes between day and night forecasts, thereby improving the adaptive adjustment of environmental data trends. Importantly, the model encapsulates multi-parameter environmental forecasting to heighten precision, given the high degree of interrelation among parameters. The model's performance is assessed through RMSE, MAE, and MAPE metrics, yielding values of 0.018, 0.031, and 6.31% respectively in predicting rabbit house environmental factors. Experimentally juxtaposed with Bert, Seq2seq, and conventional transformer models, the method demonstrates superior performance.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"78 11","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139440702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The study quantitatively examines how AI-generated cosmetic packaging design impacts consumer satisfaction, offering strategies for database-driven development and design based on this evaluation. A comprehensive evaluation system consisting of 18 indicators in five dimensions was constructed by combining a literature review and user interviews with expert opinions. On this basis, a questionnaire survey on AI-generated packaging design was conducted based on three types of datasets. In addition, importance-performance analysis was used to analyze satisfaction with the AI-generated packaging design indicators. The study found that while consumers are highly satisfied with the information transmission and creative attraction of AI-generated packaging design, the design's functional availability and user experience still need improvement. It is suggested that the public model be integrated with the data warehouse to build an AI packaging service platform. Focusing on the interpretability and controllability of the design process will also help increase consumer satisfaction and trust.
{"title":"Analyzing AI-Generated Packaging's Impact on Consumer Satisfaction With Three Types of Datasets","authors":"Tao Chen, D. Luh, J. Wang","doi":"10.4018/ijdwm.334024","DOIUrl":"https://doi.org/10.4018/ijdwm.334024","url":null,"abstract":"The study quantitatively examines how AI-generated cosmetic packaging design impact consumer satisfaction, offering strategies for database-driven development and design based on this evaluation. A comprehensive evaluation system consisting of 18 indicators in five dimensions was constructed by combining literature review and user interviews with expert opinions. On this basis, a questionnaire survey on AI-generated packaging design was conducted based on three types of datasets. In addition, importance-performance analysis was used to analyze the satisfaction of AI-generated packaging design indicators. The study found that while consumers are highly satisfied with the information transmission and creative attraction of AI-generated packaging design, the design's functional availability and user experience still have to be improved. It is suggested that the public model be combined into the data warehouse to build an AI packaging service platform. Focusing on the interpretability and controllability of the design process will also help increase consumer satisfaction and trust.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"76 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139218299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing book recommendation methods often overlook the rich information contained in comment text, which can limit their effectiveness. Therefore, a cross-domain recommender system for literary books that leverages multi-head self-attention interaction and knowledge transfer learning is proposed. Firstly, the BERT model is employed to obtain word vectors, and a CNN is used to extract user and item features. Then, higher-level features are captured through the fusion of multi-head self-attention and addition pooling. Finally, knowledge transfer learning is introduced to conduct joint modeling across domains by simultaneously extracting domain-specific features and features shared between domains. On the Amazon dataset, the proposed model achieved MAE and MSE of 0.801 and 1.058 in the “movie-book” recommendation task and 0.787 and 0.805 in the “music-book” recommendation task, respectively. This performance is significantly better than that of other advanced recommendation models. Moreover, the proposed model also generalizes well to the Chinese dataset.
{"title":"A Cross-Domain Recommender System for Literary Books Using Multi-Head Self-Attention Interaction and Knowledge Transfer Learning","authors":"Yuan Cui, Yuexing Duan, Yueqin Zhang, Li Pan","doi":"10.4018/ijdwm.334122","DOIUrl":"https://doi.org/10.4018/ijdwm.334122","url":null,"abstract":"Existing book recommendation methods often overlook the rich information contained in the comment text, which can limit their effectiveness. Therefore, a cross-domain recommender system for literary books that leverages multi-head self-attention interaction and knowledge transfer learning is proposed. Firstly, the BERT model is employed to obtain word vectors, and CNN is used to extract user and project features. Then, higher-level features are captured through the fusion of multi-head self-attention and addition pooling. Finally, knowledge transfer learning is introduced to conduct joint modeling between different domains by simultaneously extracting domain-specific features and shared features between domains. On the Amazon dataset, the proposed model achieved MAE and MSE of 0.801 and 1.058 in the “movie-book” recommendation task and 0.787 and 0.805 in the “music-book” recommendation task, respectively. This performance is significantly superior to other advanced recommendation models. Moreover, the proposed model also has good universality on the Chinese dataset.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"23 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139220014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Outlier detection for batch and streaming data is an important branch of data mining, but existing algorithms have shortcomings. For batch data, outlier detection algorithms that label only a few data points are not accurate enough because they use a histogram strategy to generate feature vectors. For streaming data, outlier detection algorithms are sensitive to data distance, resulting in low accuracy when sparse clusters and dense clusters lie close to each other; moreover, they require parameter tuning, which takes a lot of time. To address this, the authors propose a new outlier detection algorithm, called PDC, which uses probability density to generate feature vectors that train a lightweight machine learning model, which is finally applied to detect outliers. PDC takes advantage of the accuracy and distance-insensitivity of probability density, so it can overcome the aforementioned drawbacks.
{"title":"An Outlier Detection Algorithm Based on Probability Density Clustering","authors":"Wei Wang, Yongjian Ren, Renjie Zhou, Jilin Zhang","doi":"10.4018/ijdwm.333901","DOIUrl":"https://doi.org/10.4018/ijdwm.333901","url":null,"abstract":"Outlier detection for batch and streaming data is an important branch of data mining. However, there are shortcomings for existing algorithms. For batch data, the outlier detection algorithm, only labeling a few data points, is not accurate enough because it uses histogram strategy to generate feature vectors. For streaming data, the outlier detection algorithms are sensitive to data distance, resulting in low accuracy when sparse clusters and dense clusters are close to each other. Moreover, they require tuning of parameters, which takes a lot of time. With this, the manuscript per the authors propose a new outlier detection algorithm, called PDC which use probability density to generate feature vectors to train a lightweight machine learning model that is finally applied to detect outliers. PDC takes advantages of accuracy and insensitivity-to-data-distance of probability density, so it can overcome the aforementioned drawbacks.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"31 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139253281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cardiovascular diseases (CVD) rank among the leading global causes of mortality, and early detection and diagnosis are paramount in minimizing their impact. The application of machine learning (ML) and deep learning (DL) to classifying the occurrence of cardiovascular diseases holds significant potential for reducing diagnostic errors. This research endeavors to construct a model capable of accurately predicting cardiovascular diseases, thereby mitigating the fatalities associated with CVD. In this paper, the authors introduce a novel approach that combines an artificial intelligence network (AIN)-based feature selection (FS) technique with cutting-edge DL and ML classifiers for the early detection of heart diseases based on patient medical histories. The proposed model is rigorously evaluated using two real-world datasets sourced from the University of California. The authors conduct extensive data preprocessing and analysis, and the findings demonstrate that the proposed methodology surpasses the performance of existing state-of-the-art methods, achieving an exceptional accuracy rate of 99.99%.
{"title":"An Intelligent Heart Disease Prediction Framework Using Machine Learning and Deep Learning Techniques","authors":"Nasser Allheeib, Summrina Kanwal, Sultan Alamri","doi":"10.4018/ijdwm.333862","DOIUrl":"https://doi.org/10.4018/ijdwm.333862","url":null,"abstract":"Cardiovascular diseases (CVD) rank among the leading global causes of mortality. Early detection and diagnosis are paramount in minimizing their impact. The application of ML and DL in classifying the occurrence of cardiovascular diseases holds significant potential for reducing diagnostic errors. This research endeavors to construct a model capable of accurately predicting cardiovascular diseases, thereby mitigating the fatality associated with CVD. In this paper, the authors introduce a novel approach that combines an artificial intelligence network (AIN)-based feature selection (FS) technique with cutting-edge DL and ML classifiers for the early detection of heart diseases based on patient medical histories. The proposed model is rigorously evaluated using two real-world datasets sourced from the University of California. The authors conduct extensive data preprocessing and analysis, and the findings from this study demonstrate that the proposed methodology surpasses the performance of existing state-of-the-art methods, achieving an exceptional accuracy rate of 99.99%.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139265941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many existing image-text sentiment analysis methods consider only the interaction between the image and text modalities while ignoring the inconsistency and correlation of image and text data. To address this issue, an aspect-level multimodal sentiment analysis model for images and text using a transformer and multi-layer attention interaction is proposed. Firstly, ResNet50 is used to extract image features, and RoBERTa-BiLSTM is used to extract text and aspect-level features. Then, through an aspect direct interaction mechanism and a deep attention interaction mechanism, multi-level fusion of aspect information and image-text information is carried out to remove text and image content unrelated to the given aspect. The emotional representations of the text data, the image data, and the aspect-level sentiment are concatenated, fused, and passed through fully connected layers. Finally, the designed sentiment classifier performs sentiment analysis on the image-text pairs. This effectively improves the performance of sentiment discrimination for combined image and text content.
{"title":"Image and Text Aspect Level Multimodal Sentiment Classification Model Using Transformer and Multilayer Attention Interaction","authors":"Xiuye Yin, Liyong Chen","doi":"10.4018/ijdwm.333854","DOIUrl":"https://doi.org/10.4018/ijdwm.333854","url":null,"abstract":"Many existing image and text sentiment analysis methods only consider the interaction between image and text modalities, while ignoring the inconsistency and correlation of image and text data, to address this issue, an image and text aspect level multimodal sentiment analysis model using transformer and multi-layer attention interaction is proposed. Firstly, ResNet50 is used to extract image features, and RoBERTa-BiLSTM is used to extract text and aspect level features. Then, through the aspect direct interaction mechanism and deep attention interaction mechanism, multi-level fusion of aspect information and graphic information is carried out to remove text and images unrelated to the given aspect. The emotional representations of text data, image data, and aspect type sentiments are concatenated, fused, and fully connected. Finally, the designed sentiment classifier is used to achieve sentiment analysis in terms of images and texts. This effectively has improved the performance of sentiment discrimination in terms of graphics and text.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"64 16","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139275602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mounting Automatic Identification System (AIS) equipment on low-orbit satellites meets the demand for higher-capacity exchange of AIS data from ships in deep waters that land-based stations cannot cover. Satellite AIS data contain a large number of latent features of ship activity. Using ship satellite AIS data for typical months in the South China Sea in 2020, data mining, geographic information systems, and traffic flow theory are applied to visualize and analyze ship activity in the region. The study shows that the distribution of ship routes in the South China Sea closely matches the recommended routes for merchant ships, and the track belts have clearly characterized widths. The number of ships passing through the southern waters of the Taiwan Strait has increased significantly, and traffic safety management in the South China Sea should therefore focus on the major route belts and important straits.
{"title":"Mining and Analysis of the Traffic Information Situation in the South China Sea Based on Satellite AIS Data","authors":"Tianyu Pu","doi":"10.4018/ijdwm.332864","DOIUrl":"https://doi.org/10.4018/ijdwm.332864","url":null,"abstract":"The loading of Automatic Identification System equipment on low-orbiting satellites can adapt to the demand of exchanging data and information with greater “capacity” brought by the AIS data information of ships in deep waters that cannot be covered by land-based stations. The information in the satellite AIS data contains a large number of potential features of ship activities, and by selecting the ship satellite AIS data of typical months in the South China Sea in 2020. Data mining, geographic information system, and traffic flow theory are used to visualize and analyze the ship activities in the South China Sea. The study shows that the distribution of ship routes in the South China Sea is highly compatible with the recommended routes of merchant ships, and the width of the track belt is obviously characterized. The number of ships passing through the southern waters of the Taiwan Strait has increased significantly, and the focus of traffic safety in the South China Sea should also focus on major route belt and important straits.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"48 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136234799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brainstorming is a widely used problem-solving method that generates a large number of innovative ideas by guiding and stimulating intuitive and divergent thinking. In practice, however, the method is limited by the capacity and particular capabilities of the human brain, especially by the experience and knowledge participants possess. How does the brain create ideas in such a storm-like way? Based on the new discipline of Extenics, the authors propose a new model that explores how ideas are created in the brain, with the goal of helping people think multi-dimensionally and generate more ideas. With the support of information technology and artificial intelligence, more information and knowledge than ever before can be systematically collected to form a basic-element information base and to build human-computer interaction models, making up for the lack of information and knowledge in the human brain. In addition, the authors provide a methodology to help people think positively and multi-dimensionally during brainstorming, guided by Extenics.
{"title":"An Integration Model on Brainstorming and Extenics for Intelligent Innovation in Big Data Environment","authors":"Xingsen Li, Haibin Pi, Junwen Sun, Hao Lan Zhang, Zhencheng Liang","doi":"10.4018/ijdwm.332413","DOIUrl":"https://doi.org/10.4018/ijdwm.332413","url":null,"abstract":"Brainstorming is a widely used problem-solving method that generates a large number of innovative ideas by guiding and stimulating intuitive and divergent thinking. However, in practice, the method is limited by the human brain's capacity or special capabilities, especially by the experience and knowledge they possess. How does our brain create ideas like storming? Based on the new discipline of Extenics, the authors propose a new model that explores the process of how ideas are created in our brain, with the goal of helping people think multi-dimensionally and getting more ideas. With the support of information technology and artificial intelligence, we can systematically collect more information and knowledge than ever before to form a basic-element information base and build human-computer interaction models, to make up for the lack of information and knowledge in the human brain. In addition, the authors provide a methodology to help people think positively in a multidimensional way based on the guidance of Extenics in the brainstorming process.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"18 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135168344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As a new mobile communication technology in the internet of things (IoT) era, 5G is characterized by high speed, low delay, and massive connectivity, and it serves as the network infrastructure for connecting people, machines, and things. Power quality describes the efficiency with which a power grid delivers electricity to users and how well a piece of equipment uses the electricity it receives; maintaining the waveform at nominal voltage and frequency is the goal of power quality research and improvement. The power internet of things is an intelligent service platform that fully uses cutting-edge technology to enable user-machine interaction, data-driven decision-making, real-time analytics, and adaptive software design. An encryption algorithm is the process by which plaintext is converted into ciphertext; the ciphertext may appear completely random, but it can be decrypted using the key produced by the same mechanism.
{"title":"Secure Transmission Method of Power Quality Data in Power Internet of Things Based on the Encryption Algorithm","authors":"Xin Liu, Yingxian Chang, Honglei Yao, Bing Su","doi":"10.4018/ijdwm.330014","DOIUrl":"https://doi.org/10.4018/ijdwm.330014","url":null,"abstract":"As a new mobile communication technology in the era of the internet of things, 5G is characterized by high speed, low delay, and large connection. It is a network infrastructure to realize human-computer and internet of things in the era of the internet of things. Power quality data is the efficiency with which a power grid delivers electricity to users and expresses how well a piece of machinery uses the electricity it receives. The waveform at the nominal voltage and frequency is the goal of power quality research and improvement. The power internet of things (IoT) is an intelligent service platform that fully uses cutting-edge tech to enable user-machine interaction, data-driven decision-making, real-time analytics, and adaptive software design. The process by which plaintext is converted into cipher text is called an encryption algorithm. The cipher text may seem completely random, but it can be decrypted using the exact mechanism that created the encryption key.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44637794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering is a commonly used tool for discovering knowledge in data mining. Density peak clustering (DPC) has recently gained attention for its ability to detect clusters of various shapes in noisy data using just one parameter. DPC has shown advantages over other methods, such as DBSCAN and k-means, but it struggles with datasets that contain both high- and low-density clusters. To overcome this limitation, the paper introduces a new semi-supervised DPC method that improves clustering results with a small set of constraints expressed as must-link and cannot-link pairs. The proposed method combines the constraints with a k-nearest-neighbor graph to filter out peaks and find the center of each cluster. The constraints are also used to support label assignment during the clustering procedure. The efficacy of the method is demonstrated through experiments on well-known UCI datasets and benchmarked against contemporary semi-supervised clustering techniques.
{"title":"Constrained Density Peak Clustering","authors":"Viet-Thang Vu, T. T. Q. Bui, Tien Loi Nguyen, Doan-Vinh Tran, Quan Hong, V. Vu, S. Avdoshin","doi":"10.4018/ijdwm.328776","DOIUrl":"https://doi.org/10.4018/ijdwm.328776","url":null,"abstract":"Clustering is a commonly used tool for discovering knowledge in data mining. Density peak clustering (DPC) has recently gained attention for its ability to detect clusters with various shapes and noise, using just one parameter. DPC has shown advantages over other methods, such as DBSCAN and K-means, but it struggles with datasets that have both high and low-density clusters. To overcome this limitation, the paper introduces a new semi-supervised DPC method that improves clustering results with a small set of constraints expressed as must-link and cannot-link. The proposed method combines constraints and a k-nearest neighbor graph to filter out peaks and find the center for each cluster. Constraints are also used to support label assignment during the clustering procedure. The efficacy of this method is demonstrated through experiments on well-known data sets from UCI and benchmarked against contemporary semi-supervised clustering techniques.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46687825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}