Enhancing suicidal ideation detection through advanced feature selection and stacked deep learning models
Authors: Shiv Shankar Prasad Shukla, Maheshwari Prasad Singh
Journal: Applied Intelligence, vol. 55, no. 4 (impact factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2025-01-13
DOI: 10.1007/s10489-025-06256-0
URL: https://link.springer.com/article/10.1007/s10489-025-06256-0
Citations: 0
Abstract
Detecting suicidal ideation on communication platforms such as social media is critical for suicide prevention, as these platforms are frequently used for emotional expression and can reflect significant behavior changes. Many machine learning and deep learning techniques have been employed to address this issue, using embedding methods such as Count Vector, Term Frequency-Inverse Document Frequency, Bidirectional Encoder Representations from Transformers, and Multilingual Universal Sentence Encoder, all of which generate high-dimensional vectors. Feeding these embeddings directly into models can introduce noise and outliers, which may reduce predictive accuracy. Feature selection that reduces the dimensionality of word embedding vectors has therefore emerged as a promising research direction. This study proposes a feature selection method called Propose Best Feature Selection, which combines Grey Wolf Optimization, Recursive Feature Elimination, and Stepwise Feature Selection. It uses a Voting Classifier to identify and retain the most significant features, reducing dimensionality. The selected features are then fed into a stacked ensemble hybrid model in which a Bi-Directional Gated Recurrent Unit with Attention and a Convolutional Neural Network act as base learners and Extreme Gradient Boosting acts as the meta-classifier. The model achieves an accuracy of 98% on the Reddit dataset and 97% on the Twitter (X) dataset, outperforming similar methods in the field. This work focuses on textual data; future efforts may expand to multimodal analysis that incorporates image-based emotional cues. Scalability to large datasets and real-time applications remains a key limitation.
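The abstract does not spell out how the three selectors are combined, so the following is only a minimal sketch of one plausible reading: each selector nominates a set of embedding dimensions, and a dimension is kept when enough selectors agree. The function names (`vote_on_features`, `keep_columns`), the `min_votes` threshold, and the selector outputs are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def vote_on_features(
    selections: Dict[str, Sequence[int]],
    n_features: int,
    min_votes: int = 2,
) -> List[int]:
    """Return indices of features chosen by at least `min_votes` selectors."""
    votes = [0] * n_features
    for name, chosen in selections.items():
        for idx in set(chosen):  # each selector votes at most once per feature
            if not 0 <= idx < n_features:
                raise IndexError(f"{name} selected out-of-range feature {idx}")
            votes[idx] += 1
    return [i for i, count in enumerate(votes) if count >= min_votes]


def keep_columns(rows: Sequence[Sequence[float]], kept: Sequence[int]) -> List[List[float]]:
    """Project each embedding vector onto the retained feature indices."""
    return [[row[i] for i in kept] for row in rows]


# Hypothetical selector outputs for a 6-dimensional embedding (illustrative only).
selections = {
    "grey_wolf": [0, 2, 3, 5],
    "rfe": [0, 1, 3],
    "stepwise": [3, 4, 5],
}
kept = vote_on_features(selections, n_features=6, min_votes=2)

embeddings = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
]
reduced = keep_columns(embeddings, kept)
print(kept, reduced)
```

In this sketch a higher `min_votes` gives a smaller, more conservative feature set; the reduced vectors would then be the input to the base learners of the stacked model.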
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.