Sandi Baressi Segota, N. Anđelić, I. Lorencin, J. Musulin, D. Štifanić, Z. Car
2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE). Published 2021-10-25. DOI: 10.1109/BIBE52308.2021.9635320
Preparation of Simplified Molecular Input Line Entry System Notation Datasets for use in Convolutional Neural Networks
Simplified Molecular Input Line Entry System (SMILES) is a chemical notation that represents chemical structures in a form easily read by computer programs. This allows many techniques, such as Artificial Neural Networks (ANNs), to be applied to SMILES-formatted data. Among the highest-performing ANN types are Convolutional Neural Networks (CNNs), which are designed to work on images or matrix-shaped data. This paper presents the preparation of a SMILES dataset for use by CNNs. It starts with a brief description of the SMILES format, followed by an explanation of the dataset transformation into an NPY matrix-based format, and an example of utilization in which popular CNN architectures are applied to the transformed dataset. The proposed architecture achieves satisfactory results (AUC=0.92), and the speed of the transformation algorithm also proves satisfactory (0.08 seconds per data point).
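
The abstract does not spell out how a SMILES string becomes a matrix, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes a character-level one-hot encoding (rows are vocabulary symbols, columns are string positions), zero padding to a fixed width, and storage through NumPy's NPY format. The helper names `build_vocab` and `smiles_to_matrix` and the three example molecules are hypothetical.

```python
import io

import numpy as np


def build_vocab(smiles_list):
    """Map every character seen in the dataset to a row index (assumed scheme)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def smiles_to_matrix(smiles, vocab, max_len):
    """One-hot encode a SMILES string into a (len(vocab), max_len) matrix.

    Positions beyond the end of the string stay all-zero (padding).
    Note: splitting by character is a simplification; two-letter atoms
    such as 'Cl' or 'Br' would need a proper tokenizer.
    """
    mat = np.zeros((len(vocab), max_len), dtype=np.uint8)
    for col, ch in enumerate(smiles[:max_len]):
        mat[vocab[ch], col] = 1
    return mat


# Hypothetical toy dataset: ethanol, benzene, acetic acid.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_list)
max_len = max(len(s) for s in smiles_list)

# Stack the per-molecule matrices into one array shaped
# (n_samples, vocab_size, max_len), a layout a CNN can consume.
dataset = np.stack([smiles_to_matrix(s, vocab, max_len) for s in smiles_list])

# Round-trip through the NPY format using an in-memory buffer;
# np.save(path, dataset) would write the same bytes to a .npy file.
buffer = io.BytesIO()
np.save(buffer, dataset)
buffer.seek(0)
restored = np.load(buffer)
```

Each molecule becomes a fixed-size binary "image", which is what lets standard image-oriented CNN architectures be applied. A CNN framework would typically expect an extra channel axis, which can be added with `dataset[:, np.newaxis, :, :]` or similar, depending on the framework's layout convention.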