This article describes a publicly available multimodal Bangla sentiment dataset designed to support research in speech processing, sentiment analysis, and low-resource language modeling. The dataset comprises two synchronized modalities: sentiment-annotated Bangla text and corresponding speech recordings. It contains 1,000 manually curated Bangla sentences, evenly split between positive and negative sentiment classes, alongside 4,000 aligned audio recordings produced by four native speakers. Each sentence was recorded independently by all four speakers, ensuring speaker diversity while keeping the textual content constant. The text component reflects natural, everyday Bangla usage and is structured to facilitate sentiment classification and linguistic analysis. The audio was collected under controlled yet realistic acoustic conditions using multiple recording devices, introducing natural variability relevant to real-world speech applications. All samples underwent manual quality verification to ensure accurate text–audio alignment and to remove noisy or duplicated recordings. The dataset is suitable for a wide range of applications, including multimodal sentiment classification, sentiment-aware speech recognition, audio–text alignment, and benchmarking of multimodal learning approaches for low-resource languages. Its modular structure allows straightforward extension with additional speakers, dialects, or sentiment categories. By providing aligned textual and speech data for Bangla, the dataset contributes a valuable resource to the research community and supports broader efforts toward linguistic diversity in artificial intelligence.
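The structure described above (1,000 sentences, two balanced sentiment classes, one recording per sentence from each of four speakers, for 4,000 clips) can be sketched as a set of invariants. The speaker identifiers, field names, and audio path convention below are assumptions made for illustration, not the dataset's actual schema:

```python
from collections import Counter

SPEAKERS = ["spk1", "spk2", "spk3", "spk4"]  # hypothetical speaker IDs

# Toy index mirroring the counts reported for the dataset:
# 500 positive + 500 negative sentences.
sentences = [
    {"id": i, "sentiment": "positive" if i < 500 else "negative"}
    for i in range(1000)
]

# One recording per sentence per speaker; the path format is an assumption.
recordings = [
    {"sentence_id": s["id"], "speaker": spk,
     "path": f"audio/{spk}/{s['id']:04d}.wav"}
    for s in sentences
    for spk in SPEAKERS
]

# Invariants implied by the dataset description.
assert len(sentences) == 1000
assert Counter(s["sentiment"] for s in sentences) == {"positive": 500,
                                                      "negative": 500}
assert len(recordings) == 4000
# Every sentence has exactly one recording from each speaker.
per_sentence = Counter(r["sentence_id"] for r in recordings)
assert all(count == len(SPEAKERS) for count in per_sentence.values())
print("structure checks passed")
```

Checks of this kind are useful when extending the dataset with new speakers or sentiment categories, since the same invariants (balanced classes, complete speaker coverage per sentence) should continue to hold.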
