We evaluated the effect of 19 hepatotoxicants and 20 antimetabolites on the expression of genes of the human nuclear receptor (NR) superfamily in human primary hepatocytes, using NR superfamily-related data extracted from the toxicogenomics database Open TG-GATES. A considerable number of the drugs, each given alone, induced a significant fold change in the expression of many NRs. The members of the NR superfamily that changed expression with more than 40% of the drugs consisted of 12 NRs common to both classes (COUP, FXR, HNF4, LRH1, LXR, PPAR PPAR, PXR, ROR, RXR, and TR4), 3 NRs specific to hepatotoxicants (GCNF1, RAR and TR), and 7 NRs specific to antimetabolites (ER, GR, RAR, REVERB, RXR, SHP, and VDR). Nine of these were classified into cluster I, involved in reproduction, development, and growth, whereas 13 were classified into cluster II, involved in nutrient uptake, metabolism, and excretion. These NRs also included members of 6 of the 8 circadian-regulated subfamilies (ROR, Rev-erb, PPAR, FXR, TR, and TR2/TR4), including the circadian oscillator genes Rev-erbs and ROR, and 8 of the 9 NR subfamilies that control the expression of genes for drug-metabolizing enzymes (CAR, FXR, GR, HNF4, LXR, PXR, PPAR, RAR, and VDR). Unsupervised hierarchical clustering of the NRs mobilized by the drugs showed markedly different profiles between hepatotoxicants and antimetabolites. The results suggest that the profile of the expression response is determined by coordinated changes of drug-specific NRs and homeostasis-maintaining core NRs, including circadian-regulated and circadian oscillator NRs and NRs controlling the expression of genes for drug-metabolizing enzymes.
Hierarchical clustering of the hepatotoxicants and antimetabolites based on their effect on NRs showed that hepatotoxicants were classified into two subfamilies, one of which consisted exclusively of those inducing coagulopathy, while antimetabolites were divided into 4 subfamilies in which functionally related drugs were generally classified together, with some exceptions. This classification of drugs by their effect on the NR superfamily prompts a re-examination of the toxicological action profiles of the drugs.
{"title":"Characteristic gene expression profile of nuclear receptor superfamily induced by hepatotoxic and antimetabolic drugs in human primary hepatocytes","authors":"H. Kojo, Y. Eguchi, K. Makino, H. Terada","doi":"10.1273/CBIJ.16.13","DOIUrl":"https://doi.org/10.1273/CBIJ.16.13","url":null,"abstract":"We evaluated the effect of 19 hepatotoxicants and 20 antimetabolites on the expression of genes of the human nuclear receptor (NR) superfamily in human primary hepatocytes, utilizing NR superfamily-related data extracted from the toxicogenomics database Open TG-GATES. A considerable number of the drugs alone induced a significant fold change in the expression of a large number of NRs. The members of the NR superfamily that changed expression with more than 40% of the drugs consisted of 12 NRs common to both classes (COUP, FXR, HNF4, LRH1, LXR, PPAR PPAR, PXR, ROR, RXR, and TR4), 3 NRs specific to hepatotoxicants (GCNF1, RAR and TR), and 7 NRs specific to antimetabolites (ER GR, RAR REVERB RXRSHP, and VDR Nine of these were classified into cluster I involved in reproduction, development, and growth, whereas 13 were classified into cluster II, involved in nutrient uptake, metabolism, and excretion. These were also characterized by containing members of 6 out of 8 circadian-regulated subfamilies (ROR, Rev-erb, PPAR, FXR, TR, and TR2/TR4) including circadian oscillator genes Rev-erbs and ROR and by containing 8 out of 9 NR subfamilies controlling the expression of genes for drug-metabolizing enzymes (CAR, FXR, GR, HNF4, LXR, PXR, PPAR, RAR, and VDR). The unsupervised hierarchical clustering of the NRs mobilized by drugs showed markedly different profiles between hepatotoxicants and antimetabolites. 
The results suggest that the profile of the expression response is determined by coordinated changes of drug-specific NRs and homeostasis-maintaining core NRs including circadian-regulated and circadian oscillator NRs and NRs controlling the expression of genes for drug-metabolizing enzymes. The hierarchical clustering of the hepatotoxicants and antimetabolites based on their effect on NRs showed that hepatotoxicants were classified into two subfamilies, one of which consisted exclusively of those inducing coagulopathy, while antimetabolites were divided into 4 subfamilies where functionally-related drugs were generally classified together but with some exceptions. The classification of drugs based on their effect on the NR superfamily would urge us to re-examine the profile of toxicological actions of the drugs.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"27 1","pages":"13-24"},"PeriodicalIF":0.3,"publicationDate":"2016-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75539988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
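The selection rule in the nuclear receptor abstract above (NRs that changed expression with more than 40% of the drugs, then split into common and class-specific sets) can be sketched as follows. This is a minimal illustration on made-up toy data, not the authors' pipeline: the function names, the drug labels, and the representation of "significantly changed" as a set of NR names per drug are all assumptions.

```python
def responsive_nrs(changed, threshold=0.40):
    """Return NRs flagged as changed with more than `threshold` of the drugs.

    `changed` maps a drug name to the set of NR names whose expression
    was judged significantly changed by that drug (hypothetical input).
    """
    n_drugs = len(changed)
    counts = {}
    for nrs in changed.values():
        for nr in nrs:
            counts[nr] = counts.get(nr, 0) + 1
    return {nr for nr, c in counts.items() if c / n_drugs > threshold}


def split_by_class(hepatotoxicants, antimetabolites, threshold=0.40):
    """Partition responsive NRs into common and class-specific sets."""
    hep = responsive_nrs(hepatotoxicants, threshold)
    anti = responsive_nrs(antimetabolites, threshold)
    return {
        "common": hep & anti,
        "hepatotoxicant_only": hep - anti,
        "antimetabolite_only": anti - hep,
    }


# Toy data only; these drug labels and NR assignments are invented.
toy_hep = {"drug_A": {"PXR", "TR"}, "drug_B": {"PXR", "TR"}, "drug_C": {"PXR"}}
toy_anti = {"drug_X": {"PXR", "VDR"}, "drug_Y": {"PXR", "VDR"}, "drug_Z": {"TR"}}
result = split_by_class(toy_hep, toy_anti)
print(result)
```

In the toy data, TR changes with only one of three antimetabolites, so it falls below the 40% threshold for that class and is reported as hepatotoxicant-specific.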
Nevirapine (NVP), a non-nucleoside reverse-transcriptase inhibitor used to treat HIV-1 infection, can cause severe, life-threatening idiosyncratic drug toxicity (IDT). The IDT caused by NVP or its metabolites is known to be associated with the HLA-B*14:02 haplotype. The molecular mechanism of the HLA-associated IDT, however, has not been elucidated. In this study, we simulated the interaction modes between NVP-related compounds, HLA-B*14:02, and a T-cell receptor to understand the molecular mechanism leading to the onset of IDT.
{"title":"In silico Analysis of Interactions between Nevirapine-related Compounds, HLA-B*14:02 and T-cell Receptor","authors":"Hideto Isogai, N. Hirayama","doi":"10.1273/CBIJ.16.9","DOIUrl":"https://doi.org/10.1273/CBIJ.16.9","url":null,"abstract":"A non-nucleoside reverse-transcriptase inhibitor nevirapine (NVP) used to treat HIV-1 infection can cause severe, life-threatening idiosyncratic drug toxicity (IDT). It is known that the IDT caused by NVP or its metabolites is associated with the HLA-B*14:02 haplotype. The molecular mechanism of the HLA -associated IDT, however, has not been disclosed. In this study, we have simulated the interaction modes between NVP-related compounds, HLA-B*14:02 , and a T-cell receptor in order to understand the molecular mechanism leading to the onset of IDT.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"IM-25 1","pages":"9-12"},"PeriodicalIF":0.3,"publicationDate":"2016-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84704576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carbamazepine (CBZ) is a widely used anticonvulsant and one of the major causative drugs of cutaneous adverse drug reactions (cADRs), such as Stevens-Johnson syndrome and toxic epidermal necrolysis. In East Asians and Europeans, HLA-A*31:01 is associated with CBZ-induced cADRs. We performed in silico docking simulations of CBZ and its metabolites at the peptide-binding groove of HLA-A*31:01 to identify the chemical species responsible for CBZ-induced cADRs.
{"title":"In silico Analysis of Interactions between HLA-A*31:01 and carbamazepine-related Compounds","authors":"H. Miyadera, T. Ozeki, T. Mushiroda, N. Hirayama","doi":"10.1273/CBIJ.16.5","DOIUrl":"https://doi.org/10.1273/CBIJ.16.5","url":null,"abstract":"Carbamazepine (CBZ) is a widely used anticonvulsant and is one of the major causative drugs of cutaneous adverse drug reactions (cADRs), such as Stevens-Johnson syndrome and toxic epidermal necrolysis. For the East Asians and Europeans HLA-A*31:01 is associated with CBZ-induced cADRs. We have undertaken in silico docking simulations of CBZ and its metabolites at the peptide-binding groove of HLA-A*31:01 in order to identify the chemical species responsible for the CBZ-induced cADR.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"24 1","pages":"5-8"},"PeriodicalIF":0.3,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76703476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Allopurinol, the most traditional and widely used medication for hyperuricemia and gout, has been reported as a common cause of severe cutaneous adverse reactions. Allopurinol is rapidly and extensively metabolized to oxipurinol. At least six allopurinol-related impurities have been reported to be present in allopurinol. It is of interest to identify the compound most likely to be responsible for the adverse reactions. Since a strong association between allopurinol-induced adverse reactions and HLA-B*58:01 has been observed, binding of allopurinol-related compounds to HLA-B*58:01 must be important for the onset of the adverse reactions. In this study, using the three-dimensional structure of HLA-B*58:01 constructed by homology modeling, the binding modes and affinities between allopurinol-related compounds and HLA-B*58:01 were estimated by docking simulations. The results indicated that the adverse reactions of allopurinol are most likely due largely to oxipurinol. The results also suggested that the concentrations of several impurities currently approved by the United States Pharmacopeia should be strictly monitored so as not to exceed the limits, because they may bind strongly to HLA-B*58:01 and possibly lead to more severe adverse reactions.
{"title":"In silico Analysis of Interactions between HLA-B*58:01 and Allopurinol-related Compounds","authors":"M. Osabe, M. Tohkin, N. Hirayama","doi":"10.1273/CBIJ.16.1","DOIUrl":"https://doi.org/10.1273/CBIJ.16.1","url":null,"abstract":"Allopurinol, the most traditional and widely used medication for hyperuricemia and gout, has been reported as a common cause of severe cutaneous adverse reactions. Allopurinol is rapidly and extensively metabolized to oxipurinol. At least six allopurinol-related impurities have been reported to be contained in allopurinol. It is of interest to identify the compound which is likely to be responsible to the adverse reactions. Since a strong association between allopurinol-induced adverse reactions and HLA-B*58:01 has been observed, binding of allopurinol-related compounds to HLA-B*58:01 must be important for the onset of the adverse reactions. In this study, using the three-dimensional structure of HLA-B*58:01 constructed by homology modeling, the binding modes and affinities between allopurinol-related compounds and HLA-B*58:01 were simulated by docking simulations. The results have indicated that the adverse reactions of allopurinol should be due very largely to oxipurinol. 
The results also suggested that the concentrations of several impurities currently approved by the United States Pharmacopeia should be strictly monitored not to exceed the limits because they may strongly bind to HLA-B*58:01 and possibly leading to more severe adverse reactions.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"18 1","pages":"1-4"},"PeriodicalIF":0.3,"publicationDate":"2016-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89436844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Passive immune therapy with trastuzumab has proven useful for treating HER-2/neu-overexpressing breast cancers. However, serious problems such as recurrence, presumably due to acquired resistance, occasionally occur. Therefore, several peptide vaccines have been studied to overcome these problems. Several peptides have been shown to elicit a specific immune response and are expected to confer a clinical benefit. The interactions between the peptides and specific HLA molecules are crucial for a proper immune response. Hence, understanding the detailed molecular mechanisms of the interactions is of particular interest from the viewpoint of designing better peptide vaccines. In this study, the interaction modes between these peptides and HLA-A*24:02, the most common allele in Japanese populations, were elucidated by docking simulations. The roles of each amino acid of these peptides in immunization, as deduced from the present study, should be useful for designing more potent and specific peptide vaccines.
{"title":"Insight into the Intermolecular Recognition Mechanism between HLA-A*24:02 and Antitumor Peptides against Breast Cancer","authors":"N. Hirayama","doi":"10.1273/CBIJ.15.1","DOIUrl":"https://doi.org/10.1273/CBIJ.15.1","url":null,"abstract":"Passive immune therapy with“trastizumab” have been proven to be useful to treat HER-2/neu overexpressing breast cancers. However, serious problems such as recurrence presumably due to the resistance acquisition occasionally occur. Therefore, several peptide vaccines have been studied to overcome the problems. Several peptides have been shown to elicit specific immune response and expected to confer a clinical benefit. The interactions between the peptides and specific HLA molecules are crucial for the proper immune response. Hence, understanding the detailed molecular mechanisms of the interactions are of particular interest from the view point of designing better peptide vaccines. In this study, the interaction modes between these peptides and HLA-A*24:02 which is the most common allele in Japanese populations were elucidated by docking simulations. The roles of each amino acid of these peptides in immunization deduced from the present study would be useful for designing more potent and specific peptide vaccines.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"258 1","pages":"1-4"},"PeriodicalIF":0.3,"publicationDate":"2015-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77542266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuji Kato, T. Fujiwara, Y. Komeiji, T. Nakano, H. Mori, Yoshio Okiyama, Y. Mochizuki
A simulation protocol based on fragment molecular orbital-based molecular dynamics (FMO-MD) was applied to a droplet model consisting of a divalent copper ion and 64 water molecules. The total energy and forces were evaluated at the unrestricted Hartree-Fock (UHF) level with three-body fragment correction (FMO3). Two MD runs were performed: one with a six-coordination setting and the other with a five-coordination setting in the first hydration shell. Both runs placed the main peak of the Cu-O radial distribution function at 2.02 Å, in reasonable agreement with the experimental data. The O-Cu-O angular distribution function, in contrast, differed between the two cases.
{"title":"Fragment molecular orbital−based molecular dynamics (FMO-MD) simulations on hydrated Cu(II) ion","authors":"Yuji. Kato, T. Fujiwara, Y. Komeiji, T. Nakano, H. Mori, Yoshio Okiyama, Y. Mochizuki","doi":"10.1273/CBIJ.14.1","DOIUrl":"https://doi.org/10.1273/CBIJ.14.1","url":null,"abstract":"A simulation protocol based on fragment molecular orbital−based molecular dynamics (FMO-MD) was applied to a droplet model consisting of a divalent copper ion and 64 water molecules. The total energy and forces were evaluated at the unrestricted Hartree-Fock (UHF) level with three-body fragment correction (FMO3). Two MD runs were performed: one with a six-coordination setting and the other with a five-coordination setting in the first hydration shell. Both runs resulted in the main peak position of the Cu-O radial distribution function at 2.02 Å, in reasonable agreement with the experimental data. The O-Cu-O angular distribution function showed different characteristics between the two cases.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"2021 1","pages":"1-13"},"PeriodicalIF":0.3,"publicationDate":"2014-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73870486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
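The Cu-O radial distribution function mentioned in the hydrated Cu(II) abstract above can be illustrated with a small histogram routine. This is only a sketch under stated assumptions: the function names, bin width, and toy distances are invented, and the histogram is normalized by shell volume alone (no bulk-density factor), which is enough to locate the main peak but is not the authors' analysis code.

```python
import math


def radial_distribution(distances, r_max=6.0, n_bins=60):
    """Histogram ion-oxygen distances and divide by spherical-shell volume.

    `distances` is a flat list of Cu-O distances (in angstroms) pooled over
    all saved MD frames; the result is an unnormalized g(r)-like profile.
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            # min() guards against floating-point round-up at the upper edge
            counts[min(int(d / dr), n_bins - 1)] += 1
    centers = [(i + 0.5) * dr for i in range(n_bins)]
    g = [c / (4.0 * math.pi * r * r * dr) for c, r in zip(counts, centers)]
    return centers, g


def main_peak(centers, g):
    """Return the bin center at which the profile is largest."""
    best = max(range(len(g)), key=g.__getitem__)
    return centers[best]


# Toy distances only; they are not taken from the simulations.
toy = [1.98, 2.01, 2.03, 2.04, 2.02, 2.9, 3.8, 4.1, 4.4]
centers, g = radial_distribution(toy)
peak = main_peak(centers, g)
print(f"main peak near {peak:.2f} angstrom")
```

With a 0.1 Å bin width, the reported peak position is only resolved to the nearest bin center; a finer grid or a fitted peak would be needed to quote a value such as 2.02 Å.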
Y. Komeiji, T. Fujiwara, Yoshio Okiyama, Y. Mochizuki
The ab initio fragment molecular orbital-based molecular dynamics (FMO-MD) method was extended for simulation of solvated polypeptides by the introduction of an algorithm named dynamic fragmentation with static fragments (DF/SF). In FMO-MD, the force acting on each nucleus is calculated by the FMO method, which requires fragmentation of the simulated molecule. The fragmentation data must be redefined, depending on the time-dependent change of the molecular configuration, and the DF/SF algorithm governs this redefinition. In the DF/SF algorithm, some fragments are manually classified as static and unchanged, while others are considered dynamic and subject to change. Various options of the algorithm were implemented in the ABINIT-MP program. The options were tested and discussed as they applied to FMO-MD simulations of the solvated (Gly)2 dipeptide, in which the two amino acid residues of the peptide were regarded as static (invariable) while surrounding water molecules were regarded as dynamic (variable). Future prospects for the FMO-MD simulation of biopolymers are discussed based upon the tests of the DF/SF algorithm.
{"title":"Dynamic fragmentation with static fragments (DF/SF) algorithm designed for ab initio fragment molecular orbital-based molecular dynamics (FMO-MD) simulations of polypeptides","authors":"Y. Komeiji, T. Fujiwara, Yoshio Okiyama, Y. Mochizuki","doi":"10.1273/CBIJ.13.45","DOIUrl":"https://doi.org/10.1273/CBIJ.13.45","url":null,"abstract":"The ab initio fragment molecular orbital-based molecular dynamics (FMO-MD) method was extended for simulation of solvated polypeptides by the introduction of an algorithm named dynamic fragmentation with static fragments (DF/SF). In FMO-MD, the force acting on each nucleus is calculated by the FMO method, which requires fragmentation of the simulated molecule. The fragmentation data must be redefined, depending on the time-dependent change of the molecular configuration, and the DF/SF algorithm governs this redefinition. In the DF/SF algorithm, some fragments are manually classified as static and unchanged, while others are considered dynamic and subject to change. Various options of the algorithm were implemented in the ABINIT-MP program. The options were tested and discussed as they applied to FMO-MD simulations of the solvated (Gly)2 dipeptide, in which the two amino acid residues of the peptide were regarded as static (invariable) while surrounding water molecules were regarded as dynamic (variable). 
Future prospects for the FMO-MD simulation of biopolymers are discussed based upon the tests of the DF/SF algorithm.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"130 1","pages":"45-57"},"PeriodicalIF":0.3,"publicationDate":"2013-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84739711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
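The division of labor in the DF/SF abstract above (static fragments kept as given, dynamic fragments redefined as the configuration changes) can be sketched as below. The regrouping rule used here, single-linkage clustering of non-static atoms with a distance cutoff, is an assumption made for illustration; the abstract does not state the actual criteria or the ABINIT-MP options, and all names and coordinates are hypothetical.

```python
import math


def refragment(coords, static_fragments, cutoff=1.2):
    """Keep static fragments as given and regroup all other atoms.

    Atoms outside the static fragments are merged into one dynamic fragment
    whenever they are linked by a chain of pair distances below `cutoff`
    (an assumed criterion, in angstroms).
    """
    static_atoms = set().union(*static_fragments)
    dynamic_atoms = [i for i in range(len(coords)) if i not in static_atoms]
    parent = {i: i for i in dynamic_atoms}

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(dynamic_atoms):
        for b in dynamic_atoms[pos + 1:]:
            if math.dist(coords[a], coords[b]) < cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for i in dynamic_atoms:
        groups.setdefault(find(i), []).append(i)
    dynamic = sorted(sorted(members) for members in groups.values())
    return [sorted(f) for f in static_fragments] + dynamic


# Atoms 0-1 stand in for the static solute; atoms 2-4 and 5-7 are two waters.
coords = [
    (0.0, 0.0, 0.0), (1.5, 0.0, 0.0),
    (5.0, 0.0, 0.0), (5.96, 0.0, 0.0), (4.76, 0.93, 0.0),
    (10.0, 0.0, 0.0), (10.96, 0.0, 0.0), (9.76, 0.93, 0.0),
]
before = refragment(coords, [{0, 1}])

# Move one hydrogen (atom 4) next to the second oxygen, as after a proton transfer.
coords[4] = (9.5, -0.8, 0.0)
after = refragment(coords, [{0, 1}])
print(before, after)
```

The static fragment is returned unchanged in both calls, while the dynamic part goes from two three-atom waters to a two-atom and a four-atom fragment, which is the kind of time-dependent redefinition the abstract describes.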
We have developed a couple of optimal damping algorithms (ODAs) for unrestricted Hartree-Fock (UHF) calculations of open-shell molecular systems. A series of equations were derived for both concurrent and alternate constructions of alpha- and beta-Fock matrices in the integral-direct self-consistent-field (SCF) procedure. Several test calculations were performed to check the convergence behaviors. It was shown that the concurrent algorithm provides better performance than does the alternate one.
{"title":"Optimal damping algorithm for unrestricted Hartree-Fock calculations","authors":"J. Yamamoto, Y. Mochizuki","doi":"10.1273/cbij.14.14","DOIUrl":"https://doi.org/10.1273/cbij.14.14","url":null,"abstract":"We have developed a couple of optimal damping algorithms (ODAs) for unrestricted Hartree-Fock (UHF) calculations of open-shell molecular systems. A series of equations were derived for both concurrent and alternate constructions of alpha- and beta-Fock matrices in the integral-direct self-consistent-field (SCF) procedure. Several test calculations were performed to check the convergence behaviors. It was shown that the concurrent algorithm provides better performance than does the alternate one.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"29 1","pages":"14-33"},"PeriodicalIF":0.3,"publicationDate":"2013-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77292539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
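The damping step at the heart of an optimal damping algorithm can be sketched generically: the SCF energy along the line between the previous damped density and the newly generated one is modeled as a quadratic in the mixing factor, and the factor minimizing it on [0, 1] is taken. The code below assumes that quadratic form and that the slope and curvature have already been computed from the Fock and density matrices; it does not reproduce the paper's UHF equations or distinguish its concurrent and alternate variants, and the function names are hypothetical.

```python
def optimal_damping_factor(slope, curvature):
    """Minimize dE(lam) = slope*lam + 0.5*curvature*lam**2 over 0 <= lam <= 1."""

    def delta_e(lam):
        return slope * lam + 0.5 * curvature * lam * lam

    candidates = [0.0, 1.0]
    if curvature > 0.0:
        stationary = -slope / curvature
        if 0.0 < stationary < 1.0:
            candidates.append(stationary)
    return min(candidates, key=delta_e)


def damp(previous, new, lam):
    """Mix two flattened density matrices: (1 - lam)*previous + lam*new."""
    return [(1.0 - lam) * p + lam * n for p, n in zip(previous, new)]


lam = optimal_damping_factor(slope=-1.0, curvature=4.0)
mixed = damp([0.0, 2.0], [1.0, 0.0], lam)
print(lam, mixed)
```

When the stationary point lies beyond 1, the full step is taken, which corresponds to an undamped SCF iteration.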
We are perhaps at a turning point for making cheminformatics accessible to scientists who are not computational chemists. The proliferation of mobile devices has seen the development of software or ‘apps’ that can be used for sophisticated chemistry workflows. These apps can offer capabilities to the practicing chemist that are approaching those of conventional desktop-based software, whereby each app focuses on a relatively small range of tasks. Mobile apps that can pull in and integrate public content from many sources relating to molecules and data are also being developed. Apps for drug discovery are already evolving rapidly and are able to communicate with each other to create composite workflows of increasing complexity, enabling informatics aspects of drug discovery (i.e. accessing data, modeling and visualization) to be done anywhere by potentially anyone. We will describe how these cheminformatics apps can be used productively and some of the future opportunities that we envision.
{"title":"Cheminformatics workflows using mobile apps","authors":"A. Clark, Antony J. Williams, S. Ekins","doi":"10.1273/CBIJ.13.1","DOIUrl":"https://doi.org/10.1273/CBIJ.13.1","url":null,"abstract":"We are perhaps at a turning point for making cheminformatics accessible to scientists who are not computational chemists. The proliferation of mobile devices has seen the development of software or ‘apps’ that can be used for sophisticated chemistry workflows. These apps can offer capabilities to the practicing chemist that are approaching those of conventional desktop-based software, whereby each app focuses on a relatively small range of tasks. Mobile apps that can pull in and integrate public content from many sources relating to molecules and data are also being developed. Apps for drug discovery are already evolving rapidly and are able to communicate with each other to create composite workflows of increasing complexity, enabling informatics aspects of drug discovery (i.e. accessing data, modeling and visualization) to be done anywhere by potentially anyone. We will describe how these cheminformatics apps can be used productively and some of the future opportunities that we envision.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"111 1","pages":"1-18"},"PeriodicalIF":0.3,"publicationDate":"2013-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80580584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuan Tho Dang, Osamu Hirose, Duong Hung Bui, Thammakorn Saethang, Vu Anh Tran, L. A. T. Nguyen, T. K. T. Le, Mamoru Kubo, Yoichi Yamada, K. Satou
One of the most critical and frequent problems in biomedical data classification is imbalanced class distribution, where samples from the majority class significantly outnumber those from the minority class. SMOTE is a well-known general over-sampling method used to address this problem; however, in some cases it fails to improve, or even reduces, classification performance. To address these issues, we have developed a novel minority over-sampling method named safe-SMOTE. Experimental results from two gene expression datasets for cancer classification (i.e., colon-cancer and leukemia) and six imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than both the control method (i.e., no over-sampling) and SMOTE. For example, in the colon-cancer dataset, the sensitivity and specificity achieved by SMOTE (81.36% and 88.63%) were lower than those of the control method (81.59% and 89.50%), whereas safe-SMOTE increased both values (81.82% and 90.50%). Similarly, the G-mean value of the control (85.45%) decreased to 84.91% when SMOTE was employed, but increased to 86.04% when using safe-SMOTE. In the leukemia dataset, SMOTE improved the sensitivity and G-mean values with respect to the control; however, safe-SMOTE achieved noticeably greater improvements on both criteria.
{"title":"A Novel Over-Sampling Method and its Application to Cancer Classification from Gene Expression Data","authors":"Xuan Tho Dang, Osamu Hirose, Duong Hung Bui, Thammakorn Saethang, Vu Anh Tran, L. A. T. Nguyen, T. K. T. Le, Mamoru Kubo, Yoichi Yamada, K. Satou","doi":"10.1273/CBIJ.13.19","DOIUrl":"https://doi.org/10.1273/CBIJ.13.19","url":null,"abstract":"One of the most critical and frequent problems in biomedical data classification is imbalanced class distribution, where samples from the majority class significantly outnumber the minority class. SMOTE is a well-known general over-sampling method used to address this problem; however, in some cases it cannot improve or even reduces classification performance. To address these issues, we have developed a novel minority over-sampling method named safe-SMOTE. Experimental results from two gene expression datasets for cancer classification (i.e., colon-cancer and leukemia) and six imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than both the control method (i.e., no over-sampling) and SMOTE. For example, in the colon-cancer dataset, although the sensitivity and specificity achieved by SMOTE (81.36% and 88.63%) were lower than for the control method (81.59% and 89.50%), safe-SMOTE in contrast had these values increase (81.82% and 90.50%). Similarly, the G-mean value of the control (85.45%) decreased to 84.91% when SMOTE was employed, but increased to 86.04% when using safe-SMOTE. 
In the leukemia dataset, SMOTE was able to improve the sensitivity and G-mean values with respect to the control; however, safe-SMOTE achieved noticeable, even greater improvements for both of these criteria.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":"4 1","pages":"19-29"},"PeriodicalIF":0.3,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90106552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
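Two quantities from the over-sampling abstract above lend themselves to a short sketch: the G-mean, conventionally defined as the geometric mean of sensitivity and specificity, and the interpolation step by which standard SMOTE creates a synthetic minority sample between a minority point and one of its minority-class neighbors. The function names are hypothetical, and the safe-SMOTE variant itself is not shown because the abstract does not describe how it works.

```python
import math
import random


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both as fractions)."""
    return math.sqrt(sensitivity * specificity)


def smote_sample(x, neighbor, rng=random):
    """Create one synthetic point on the segment between x and a neighbor.

    This is the interpolation step of standard SMOTE; neighbor search and
    the choice of how many samples to create are left out.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


# Control-method values quoted in the abstract for the colon-cancer dataset.
control = g_mean(0.8159, 0.8950)
print(f"control G-mean: {control:.4f}")  # about 0.8545

synthetic = smote_sample([0.0, 0.0], [2.0, 4.0], rng=random.Random(0))
print(synthetic)
```

The control value agrees with the 85.45% quoted in the abstract. Recomputing the SMOTE and safe-SMOTE G-means from the quoted sensitivities and specificities gives values about 0.01 percentage points above those reported, which would be consistent with the reported figures being averaged over runs, although the abstract does not say so.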