Music Note Series Precipitation using Two Stacked Deep Long Short Term Memory Model
Carmel Mary Belinda M J, M. Shyamala Devi, J. Pandian, R. Aruna, S. Ravikumar, K. A. Kumar
2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), 2022-04-21
DOI: 10.1109/ICAECT54875.2022.9807884
Citations: 1
Abstract
People seek relief from stress and troubling thoughts by engaging in entertainment, and music plays a vital role in changing people's environment and helping them overcome personal problems. As technology reaches every field, deep learning is contributing extensively to the music industry in generating music note sequences. Music proves harder to model than image data because it is temporal, with a hierarchical structure and cross-temporal dependencies. Because music is composed of multiple interdependent instruments that evolve over time, it remains a challenging problem for researchers. Since music is organized into chords, arpeggios, and melodies, with each time step producing multiple outputs, generating music note sequences with a deep learning network raises further programming challenges. Motivated by this scenario, a new Two-Stacked Deep Long Short Term Memory (TSD-LSTM) model is designed for music note sequence generation from piano audio with a frame length of 16000. The dataset is extracted from the Maestro repository, and the audio data is converted into MIDI format using the pretty_midi package. The MIDI data is parsed into 30513 notes, with a sequence length of 25 frames and a vocabulary of 128 pitches, yielding 1282 MIDI files in total. The parsed data is split into 967 MIDI files for training, 137 for validation, and 178 for testing. Model fitting is performed with TSD-LSTM, and the design is refined through parameter optimization. The project is implemented in Python on an NVidia Tesla V100 GPU server with 215938 trainable parameters, 300 training epochs, a batch size of 64, and a learning rate of 0.001. Music note sequences generated with TSD-LSTM are compared against single-stack, 3-stack, and 4-stack LSTM models on metrics such as training loss, duration loss, pitch loss, and step loss. Implementation results show that the TSD-LSTM model attains the minimum values, with a test loss of 0.0567, duration loss of 0.0086, pitch loss of 0.8150, and step loss of 0.0073, compared to the other models.
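The abstract does not include implementation code, but the parsing step it describes can be sketched with the pretty_midi package it names. In the sketch below, the (pitch, step, duration) note encoding is inferred from the reported pitch, step, and duration losses, and the helper names midi_to_notes and make_sequences are illustrative rather than from the paper:

```python
import numpy as np
import pretty_midi

def midi_to_notes(midi_file: str) -> np.ndarray:
    """Parse one MIDI file into (pitch, step, duration) rows, one per note."""
    pm = pretty_midi.PrettyMIDI(midi_file)
    instrument = pm.instruments[0]  # Maestro files are solo piano
    notes = sorted(instrument.notes, key=lambda note: note.start)

    rows = []
    prev_start = notes[0].start
    for note in notes:
        rows.append([
            note.pitch,               # MIDI pitch in 0-127 (vocabulary of 128)
            note.start - prev_start,  # "step": time since the previous note onset
            note.end - note.start,    # "duration": how long the note is held
        ])
        prev_start = note.start
    return np.array(rows, dtype=np.float64)

def make_sequences(notes: np.ndarray, seq_len: int = 25):
    """Slide a window over the notes: seq_len input notes predict the next note."""
    inputs, targets = [], []
    for i in range(len(notes) - seq_len):
        inputs.append(notes[i : i + seq_len])
        targets.append(notes[i + seq_len])
    return np.array(inputs), np.array(targets)
```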
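The two-stacked architecture itself can likewise be sketched, here assuming Keras/TensorFlow and three output heads matching the reported pitch, step, and duration losses. The 128-unit layer width is an inference rather than a stated figure, though with two 128-unit LSTM layers and the three dense heads the model has exactly the 215938 trainable parameters reported in the abstract; plain mean squared error on the step and duration heads is a simplifying assumption:

```python
import tensorflow as tf

SEQ_LEN = 25      # sequence length reported in the abstract
VOCAB_SIZE = 128  # MIDI pitch vocabulary reported in the abstract

# Each input step is a (pitch, step, duration) triple.
inputs = tf.keras.Input(shape=(SEQ_LEN, 3))

# Two stacked LSTM layers (the "two-stacked" design); 128 units per layer
# is assumed, chosen to reproduce the reported 215938-parameter total.
x = tf.keras.layers.LSTM(128, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(128)(x)

# Three output heads, one per reported loss term.
outputs = {
    "pitch": tf.keras.layers.Dense(VOCAB_SIZE, name="pitch")(x),
    "step": tf.keras.layers.Dense(1, name="step")(x),
    "duration": tf.keras.layers.Dense(1, name="duration")(x),
}

model = tf.keras.Model(inputs, outputs)
model.compile(
    loss={
        "pitch": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        "step": tf.keras.losses.MeanSquaredError(),
        "duration": tf.keras.losses.MeanSquaredError(),
    },
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # rate from the abstract
)
model.summary()  # prints 215,938 trainable parameters with these widths
```

The single-stack, 3-stack, and 4-stack baselines mentioned in the comparison would differ only in the number of LSTM layers preceding the three heads.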