Music Note Series Precipitation using Two Stacked Deep Long Short Term Memory Model

Carmel Mary Belinda M J, M. Shyamala Devi, J. Pandian, R. Aruna, S. Ravikumar, K. A. Kumar
{"title":"Music Note Series Precipitation using Two Stacked Deep Long Short Term Memory Model","authors":"Carmel Mary Belinda M J, M. Shyamala Devi, J. Pandian, R. Aruna, S. Ravikumar, K. A. Kumar","doi":"10.1109/ICAECT54875.2022.9807884","DOIUrl":null,"url":null,"abstract":"People need to get relieve from their stress and thoughts by engaging themselves with entertainment. Music plays a vital role in changing the people environment to overcome their personal problems. As technology is involved in all the fields, deep learning is extensively contributing its performance in the music industry towards generation of music note sequence. Music confirms to be a tough dynamic than image data as it is temporal with hierarchical structure and cross-temporal dependencies. As music is composed of multiple instruments by being interdependent and being evolved over time, it remains a challenging issue for the researchers to work. Since music is distributed into chords, arpeggios, melodies and each time-step generating multiple outputs, the generation of music note sequences through deep learning network extend higher issue for the researchers in programming. By interpreting the above scenario, a new two-stacked Deep Long Short Term Memory (TSD-LSTM) model is designed for music note sequence generation using audio framework of piano with 16000 frame length. The dataset is extracted from Maestro database repository. The audio framework data is converted into MIDI format using pretty_midi package. The MIDI digital data is parsed with 30513 number of notes resulting with sequence length of 25 frames and vocabulary size of 128 pitch length. The total digital data is ended up with 1282 MIDI information frames. The parsed audio digital data is splitted into training data with 967 MIDI data, validation data with 137 MIDI data and testing data with 178 MIDI data. Model fitting is done with TSD-LSTM. The proposed model design is refined with parameter optimization. 
The project is implemented with Python under NVidia Tesla V100 GPU server with 215938 trainable parameters, training epochs of 300, batch size of 64 along with training rate of 0.001. The music note sequence generation is done with TSD-LSTM and fitted with single stack LSTM, 3 stack LSTM and 4 stack LSTM and the performance is analyzed and compared with metrics like train loss, duration loss, pitch loss and step loss. Implementation results shows that TSD- LSTM model have minimum value for test loss of 0.0567, duration loss of 0.0086, pitch loss of 0.8150 and step loss of 0.0073 compared to other models.","PeriodicalId":346658,"journal":{"name":"2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAECT54875.2022.9807884","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

People need relief from stress, and entertainment such as music plays a vital role in helping them cope with personal problems. As technology reaches every field, deep learning is contributing extensively to the music industry, particularly to the generation of music note sequences. Music proves harder to model than image data because it is temporal, with hierarchical structure and cross-temporal dependencies. Since music is composed of multiple interdependent instruments that evolve over time, it remains a challenging domain for researchers. Because music is organized into chords, arpeggios, and melodies, with each time step producing multiple outputs, generating note sequences with deep learning networks poses significant programming challenges. Motivated by this scenario, a new Two-Stacked Deep Long Short Term Memory (TSD-LSTM) model is designed for music note sequence generation using piano audio with a frame length of 16000. The dataset is extracted from the MAESTRO database repository. The audio data is converted to MIDI format using the pretty_midi package. The MIDI data is parsed into 30513 notes, yielding a sequence length of 25 frames and a vocabulary of 128 pitches, for a total of 1282 MIDI information frames. The parsed data is split into 967 MIDI files for training, 137 for validation, and 178 for testing. Model fitting is done with the TSD-LSTM, and the proposed design is refined through parameter optimization. The project is implemented in Python on an NVidia Tesla V100 GPU server with 215938 trainable parameters, 300 training epochs, a batch size of 64, and a learning rate of 0.001.
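The abstract reports pitch, step, and duration losses, which suggests each parsed MIDI note is represented by its pitch (the 128-entry vocabulary), the time step since the previous note's onset, and its duration. The helper below is a hypothetical sketch of that conversion, assuming note events arrive as `(pitch, start, end)` tuples, the fields pretty_midi exposes as `note.pitch`, `note.start`, and `note.end`; it is an illustration of the representation, not the authors' code.

```python
# Hypothetical sketch: convert MIDI note events into the
# (pitch, step, duration) triples implied by the paper's loss terms.
# Each input note is a (pitch, start_sec, end_sec) tuple.

def notes_to_sequence(notes):
    """Sort notes by onset and derive per-note features.

    pitch    : MIDI pitch number (0-127, the 128-entry vocabulary)
    step     : time elapsed since the previous note's onset
    duration : how long the note sounds (end - start)
    """
    ordered = sorted(notes, key=lambda n: n[1])  # sort by start time
    sequence = []
    prev_start = ordered[0][1] if ordered else 0.0
    for pitch, start, end in ordered:
        sequence.append((pitch, start - prev_start, end - start))
        prev_start = start
    return sequence

# Example: three piano notes (pitch, start_sec, end_sec)
notes = [(60, 0.0, 0.5), (64, 0.5, 1.0), (67, 1.0, 1.75)]
print(notes_to_sequence(notes))
# → [(60, 0.0, 0.5), (64, 0.5, 0.5), (67, 0.5, 0.75)]
```

Windows of 25 consecutive triples would then form the model's input sequences, matching the reported sequence length.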
Music note sequence generation is performed with the TSD-LSTM and compared against single-stack, 3-stack, and 4-stack LSTM models, with performance analyzed using metrics such as train loss, duration loss, pitch loss, and step loss. Implementation results show that the TSD-LSTM model achieves the minimum test loss of 0.0567, duration loss of 0.0086, pitch loss of 0.8150, and step loss of 0.0073 compared to the other models.
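To make the "two-stacked" idea concrete, the following minimal NumPy sketch runs a sequence through two stacked LSTM layers, where the hidden state of layer 1 at each time step feeds layer 2. The hidden size, random weights, and input layout are illustrative assumptions, not the paper's trained parameters; in the actual model, dense heads for pitch (128-way softmax), step, and duration would sit on top of the final hidden state.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step. Gate order in z: input, forget, candidate, output."""
    z = W @ x + U @ h + b                      # shape (4*hidden,)
    H = h.shape[0]
    i = 1 / (1 + np.exp(-z[:H]))               # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))            # forget gate
    g = np.tanh(z[2*H:3*H])                    # candidate cell state
    o = 1 / (1 + np.exp(-z[3*H:]))             # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
IN, H = 3, 8            # 3 input features: pitch, step, duration (hidden size is illustrative)
params = [
    (rng.normal(0, 0.1, (4*H, IN)), rng.normal(0, 0.1, (4*H, H)), np.zeros(4*H)),  # layer 1
    (rng.normal(0, 0.1, (4*H, H)),  rng.normal(0, 0.1, (4*H, H)), np.zeros(4*H)),  # layer 2
]

def two_stack_forward(seq):
    """Run a (T, 3) sequence through two stacked LSTM layers."""
    states = [(np.zeros(H), np.zeros(H)) for _ in params]
    for x in seq:
        inp = x
        for idx, (W, U, b) in enumerate(params):
            h, c = lstm_step(inp, *states[idx], W, U, b)
            states[idx] = (h, c)
            inp = h                            # layer 1 output feeds layer 2
    return states[-1][0]                       # final hidden state of top layer

h_final = two_stack_forward(rng.normal(size=(25, IN)))  # sequence length 25, as in the paper
print(h_final.shape)  # → (8,)
```

Stacking lets the lower layer capture short-range note-to-note transitions while the upper layer operates on that learned representation, which is one motivation for comparing 1-, 2-, 3-, and 4-stack variants as the paper does.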