Title: Single Concatenated Input is Better than Independent Multiple-input for CNNs to Predict Chemical-induced Disease Relation from Literature
Authors: P. Trang, Bui Manh Thang, Dang Thanh Hai
Journal: VNU Journal of Science: Computer Science and Communication Engineering
Publication type: Journal Article
Publication date: 2020-05-30
DOI: https://doi.org/10.25073/2588-1086/VNUCSCE.237
Abstract
Chemical compounds (drugs) and diseases are among the keywords most frequently searched in the PubMed database of biomedical literature by biomedical researchers all over the world (according to a 2009 study). Working with PubMed is essential for researchers to gain insight into drugs' side effects (chemical-induced disease relations, CDR), which are essential for drug safety and toxicity. It is, however, a considerable burden for them, as PubMed is a huge database of unstructured text that is growing very fast (~28 million scientific articles currently, with approximately two deposited per minute). As a result, biomedical text mining has been empirically demonstrated to have great implications for biomedical research communities. Biomedical text has its own distinct, challenging properties, attracting much attention from natural language processing communities. A large-scale study in 2018 showed that, for a biLSTM, incorporating information into independent multiple-input layers outperforms concatenating it into a single input layer, producing better performance than state-of-the-art CDR classification models. This paper demonstrates that for a CNN the reverse holds: concatenation is better for CDR classification. To this end, we develop a CNN-based model with multiple inputs concatenated for CDR classification. Experimental results on the benchmark dataset demonstrate that it outperforms other recent state-of-the-art CDR classification models.
Keywords:
Chemical disease relation prediction, Convolutional neural network, Biomedical text mining
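To make the contrast in the abstract concrete, the following is a minimal sketch of the two input strategies for a convolutional encoder. It is not the authors' model: the choice of word and position embeddings as the two information sources, the dimensions, the random weights, and every function name here are assumptions made purely for illustration.

```python
import random


def conv1d_maxpool(tokens, filters):
    """Valid 1D convolution over the token axis, ReLU, then max-over-time pooling.

    tokens:  list of per-token feature vectors, shape (n_tokens, n_features)
    filters: list of filters, each shaped (width, n_features)
    returns: one pooled activation per filter
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to 0
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            activation = sum(
                w * v
                for w_row, x_row in zip(filt, window)
                for w, v in zip(w_row, x_row)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def single_concatenated_input(word_emb, pos_emb, filters):
    """Join the feature vectors of each token, then run ONE convolution."""
    merged = [w + p for w, p in zip(word_emb, pos_emb)]  # per-token list concatenation
    return conv1d_maxpool(merged, filters)


def independent_multiple_input(word_emb, pos_emb, word_filters, pos_filters):
    """Run a SEPARATE convolution per input, then join the pooled vectors."""
    return conv1d_maxpool(word_emb, word_filters) + conv1d_maxpool(pos_emb, pos_filters)


def random_tensor(rng, *shape):
    """Nested lists of uniform random weights (hypothetical stand-in for trained values)."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [random_tensor(rng, *shape[1:]) for _ in range(shape[0])]


# Assumed toy sizes, not taken from the paper.
rng = random.Random(0)
N_TOKENS, D_WORD, D_POS, N_FILTERS, WIDTH = 12, 8, 4, 5, 3
word_emb = random_tensor(rng, N_TOKENS, D_WORD)
pos_emb = random_tensor(rng, N_TOKENS, D_POS)

concat_vec = single_concatenated_input(
    word_emb, pos_emb, random_tensor(rng, N_FILTERS, WIDTH, D_WORD + D_POS)
)
multi_vec = independent_multiple_input(
    word_emb,
    pos_emb,
    random_tensor(rng, N_FILTERS, WIDTH, D_WORD),
    random_tensor(rng, N_FILTERS, WIDTH, D_POS),
)
print(len(concat_vec), len(multi_vec))  # 5 10
```

In the concatenated variant of this sketch, every filter sees the word and position features of a window jointly, whereas in the independent variant the two sources only meet after pooling. Which layout a real CDR classifier should use is the empirical question the paper addresses.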
References
[1] S.M. Paul, S. Mytelka, C.T. Dunwiddie, C.C. Persinger, B.H. Munos, S.R. Lindborg, A.L. Schacht, How to improve R&D productivity: The pharmaceutical industry's grand challenge, Nat Rev Drug Discov. 9(3) (2010) 203-214. https://doi.org/10.1038/nrd3078.
[2] J.A. DiMasi, New drug development in the United States from 1963 to 1999, Clinical pharmacology and therapeutics 69 (2001) 286-296. https://doi.org/10.1067/mcp.2001.115132.
[3] C.P. Adams, V. Van Brantner, Estimating the cost of new drug development: Is it really $802 million? Health Affairs 25 (2006) 420-428. https://doi.org/10.1377/hlthaff.25.2.420.
[4] R.I. Doğan, G.C. Murray, A. Névéol et al., Understanding PubMed user search behavior through log analysis, Database, 2009.
[5] G.K. Savova, J.J. Masanz, P.V. Ogren et al., Mayo clinical text analysis and knowledge extraction system (cTAKES): Architecture, component evaluation and applications, Journal of the American Medical Informatics Association, 2010.
[6] T.C. Wiegers, A.P. Davis, C.J. Mattingly, Collaborative biocuration-text mining development task for document prioritization for curation, Database 22 (2012) pp. bas037.
[7] N. Kang, B. Singh, C. Bui et al., Knowledge-based extraction of adverse drug events from biomedical text, BMC Bioinformatics 15 (2014).
[8] A. Névéol, R.I. Doğan, Z. Lu, Semi-automatic semantic annotation of PubMed queries: A study on quality, efficiency, satisfaction, Journal of Biomedical Informatics 44 (2011).
[9] L. Hirschman, G.A. Burns, M. Krallinger, C. Arighi, K.B. Cohen et al., Text mining for the biocuration workflow, Database Apr 18, 2012, pp. bas020.
[10] Wei et al., Overview of the BioCreative V Chemical Disease Relation (CDR) Task, In: Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, 2015.
[11] P. Verga, E. Strubell, A. McCallum, Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction, In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 1 (2018) 872-884.
[12] Y. Shen, X. Huang, Attention-based convolutional neural network for semantic relation extraction, In: Proceedings of COLING 2016, the Twenty-sixth International Conference on Computational Linguistics: Technical Papers, The COLING 2016 Organizing Committee, Osaka, Japan, 2016, pp. 2526-2536.
[13] Y. Peng, Z. Lu, Deep learning for extracting protein-protein interactions from biomedical literature, In: Proceedings of the BioNLP 2017 Workshop, Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 29-38.
[14] S. Liu, F. Shen, R. Komandur Elayavilli, Y. Wang, M. Rastegar-Mojarad, V. Chaudhary, H. Liu, Extracting chemical-protein relations using attention-based neural networks, Database, 2018.
[15] H. Zhou, H. Deng, L. Chen, Y. Yang, C. Jia, D. Huang, Exploiting syntactic and semantics information for chemical-disease relation extraction, Database, 2016, pp. baw048.
[16] S. Liu, B. Tang, Q. Chen et al., Drug–drug interaction extraction via convolutional neural networks, Comput. Math. Methods Med. (2016) 1-8. https://doi.org/10.1155/2016/6918381.
[17] L. Wang, Z. Cao, G. De Melo et al., Relation classification via multi-level attention CNNs, In: Proceedings of the Fifty-fourth Annual Meeting of the Association for Computational Linguistics 1 (2016) 1298-1307. https://doi.org/10.18653/v1/P16-1123.
[18] J. Gu, F. Sun, L. Qian et al., Chemical-induced disease relation extraction via convolutional neural network, Database (2017) 1-12. https://doi.org/10.1093/database/bax024.
[19] H.Q. Le, D.C. Can, S.T. Vu, T.H. Dang, M.T. Pilehvar, N. Collier, Large-scale Exploration of Neural Relation Classification Architectures, In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2266-2277.
[20] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, In Proceedings of the IEEE. 86(11) (1998) 2278-2324.
[21] Y. Kim, Convolutional neural networks for sentence classification, ArXiv preprint arXiv:1408.5882.
[22] N.C. Panyam, K. Verspoor, T. Cohn, K. Ramamohanarao, Exploiting graph kernels for high performance biomedical relation extraction, Journal of Biomedical Semantics 9(1) (2018) 7.
[23] H. Zhou, H. Deng, L. Chen, Y. Yang, C. Jia, D. Huang, Exploiting syntactic and semantics information for chemical-disease relation extraction, Database, 2016.