Pub Date: 2023-12-01 Epub Date: 2023-09-13 DOI: 10.1089/big.2023.0014
S V N Sreenivasu, P Santosh Kumar Patra, Vasujadevi Midasala, G S N Murthy, Krishna Chaitanya Janapati, J N V R Swarup Kumar, Pala Mahesh Kumar
Tongue analysis plays a major role in disease type prediction and classification in Indian ayurvedic medicine. Traditionally, an expert ayurvedic doctor manually inspects the tongue image to identify or predict the disease. However, this is time-consuming and can be imprecise. Owing to recent advances in machine learning models, several researchers have addressed disease prediction from tongue image analysis. However, they have failed to provide sufficient accuracy. In addition, multiclass disease classification with enhanced accuracy is still a challenging task. Therefore, this article focuses on the development of an optimized deep Q-neural network (DQNN) for disease identification and classification from tongue images, hereafter referred to as ODQN-Net. Initially, the multiscale retinex approach is introduced for enhancing the quality of tongue images, which also acts as a noise removal technique. In addition, a local ternary pattern is used to extract the disease-specific and disease-dependent features based on color analysis. Then, the best features are selected from the available feature set using the nature-inspired Remora optimization algorithm with reduced computational time. Finally, the DQNN model is used to classify the type of disease from these pretrained features. The simulation results on a tongue imaging data set showed that the proposed ODQN-Net outperformed state-of-the-art approaches, with an accuracy of 99.17%, an F1-score of 99.75%, and a Matthews correlation coefficient of 99.84%.
{"title":"ODQN-Net: Optimized Deep Q Neural Networks for Disease Prediction Through Tongue Image Analysis Using Remora Optimization Algorithm.","authors":"S V N Sreenivasu, P Santosh Kumar Patra, Vasujadevi Midasala, G S N Murthy, Krishna Chaitanya Janapati, J N V R Swarup Kumar, Pala Mahesh Kumar","doi":"10.1089/big.2023.0014","DOIUrl":"10.1089/big.2023.0014","url":null,"PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"452-465"},"PeriodicalIF":4.6,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10223867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
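The local ternary pattern step named in the ODQN-Net abstract can be illustrated with a minimal sketch. This is the textbook 3x3 formulation applied to a single intensity channel, not necessarily the variant the authors used; the threshold value, the neighbour ordering, and the function names `ltp_codes` and `histogram` are assumptions made for illustration.

```python
from typing import List, Tuple

# Offsets of the 8 neighbours, clockwise from the top-left pixel (assumed ordering).
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(image: List[List[int]],
              threshold: int = 5) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (upper, lower) LTP code maps for the interior pixels of one channel."""
    height, width = len(image), len(image[0])
    upper = [[0] * (width - 2) for _ in range(height - 2)]
    lower = [[0] * (width - 2) for _ in range(height - 2)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            centre = image[r][c]
            up_code = low_code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
                neighbour = image[r + dr][c + dc]
                if neighbour >= centre + threshold:
                    up_code |= 1 << bit    # ternary value +1 goes to the upper pattern
                elif neighbour <= centre - threshold:
                    low_code |= 1 << bit   # ternary value -1 goes to the lower pattern
                # otherwise the ternary value is 0 and sets no bit
            upper[r - 1][c - 1] = up_code
            lower[r - 1][c - 1] = low_code
    return upper, lower


def histogram(codes: List[List[int]]) -> List[int]:
    """256-bin histogram of a code map, usable as a texture feature vector."""
    bins = [0] * 256
    for row in codes:
        for code in row:
            bins[code] += 1
    return bins


if __name__ == "__main__":
    # Hypothetical 3x3 patch: darker on the left, brighter on the right.
    patch = [[10, 20, 30],
             [10, 20, 30],
             [10, 20, 30]]
    up, low = ltp_codes(patch, threshold=5)
    print(up, low)
```

Concatenating the upper and lower histograms gives one feature vector per channel; a feature selection step such as the one the abstract describes would then operate on those vectors.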
Pub Date: 2023-12-01 Epub Date: 2023-10-27 DOI: 10.1089/big.2022.0178
Megan Schröder, Sam H A Muller, Eleni Vradi, Johanna Mielke, Yvonne M F Lim, Fabrice Couvelard, Menno Mostert, Stefan Koudstaal, Marinus J C Eijkemans, Christoph Gerlinger
Sharing individual patient data (IPD) is a simple concept but complex to achieve due to data privacy and data security concerns, underdeveloped guidelines, and legal barriers. Sharing IPD is additionally difficult in big data-driven collaborations such as BigData@Heart in the Innovative Medicines Initiative, due to competing interests between diverse consortium members. One project within BigData@Heart, case study 1, needed to pool data from seven heterogeneous data sets: five randomized controlled trials from three different industry partners, and two disease registries. Sharing IPD was not considered feasible due to legal requirements and the sensitive medical nature of these data. In addition, harmonizing the data sets for a federated data analysis was difficult due to capacity constraints and the heterogeneity of the data sets. An alternative option was to share summary statistics through contingency tables. Here it is demonstrated that this method, combined with anonymization methods to ensure patient anonymity, resulted in minimal loss of information. Although sharing IPD should continue to be encouraged and strived for, our approach achieved a good balance between data transparency and patient privacy. It also allowed a successful collaboration between industry and academia.
{"title":"Sharing Medical Big Data While Preserving Patient Confidentiality in Innovative Medicines Initiative: A Summary and Case Report from BigData@Heart.","authors":"Megan Schröder, Sam H A Muller, Eleni Vradi, Johanna Mielke, Yvonne M F Lim, Fabrice Couvelard, Menno Mostert, Stefan Koudstaal, Marinus J C Eijkemans, Christoph Gerlinger","doi":"10.1089/big.2022.0178","DOIUrl":"10.1089/big.2022.0178","url":null,"PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"399-407"},"PeriodicalIF":4.6,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10733752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"61566098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
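The idea of sharing contingency tables instead of IPD, as described in the BigData@Heart abstract above, can be sketched as follows. The abstract does not say which anonymization methods were applied, so the small-cell suppression shown here, the threshold of 5, the variable names, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple

Cell = Tuple[str, ...]


def contingency_table(records: Iterable[Dict[str, str]],
                      variables: Sequence[str]) -> Dict[Cell, int]:
    """Count how many records fall into each combination of the given variables."""
    return dict(Counter(tuple(rec[v] for v in variables) for rec in records))


def suppress_small_cells(table: Dict[Cell, int],
                         min_count: int = 5) -> Dict[Cell, Optional[int]]:
    """Replace counts below min_count with None so rare combinations are not disclosed."""
    return {cell: (n if n >= min_count else None) for cell, n in table.items()}


if __name__ == "__main__":
    # Hypothetical patient-level rows; only the aggregated table would leave the site.
    rows = ([{"arm": "treatment", "event": "yes"}] * 6
            + [{"arm": "control", "event": "no"}] * 2)
    table = contingency_table(rows, ["arm", "event"])
    print(suppress_small_cells(table))
```

Each data owner would run the aggregation locally and share only the suppressed table, so no individual row crosses organizational boundaries. Suppression alone does not guarantee anonymity when several overlapping tables are released, which is one reason a real deployment would need a fuller disclosure review.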
Yvonne Nartey, C. Crooks, Joe West, Timothy R. Card, Laila J. Tata
{"title":"The incidence and prevalence of coeliac disease in the United Kingdom","authors":"Yvonne Nartey, C. Crooks, Joe West, Timothy R. Card, Laila J. Tata","doi":"10.1370/afm.22.s1.5051","DOIUrl":"https://doi.org/10.1370/afm.22.s1.5051","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"18 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139303896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Changes in Reasons for Visits to Primary Care as a Result of the COVID-19 Pandemic: by INTRePID","authors":"Karen Tu, M. Lapadula","doi":"10.1370/afm.22.s1.5425","DOIUrl":"https://doi.org/10.1370/afm.22.s1.5425","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"11 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139301044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
William Curry, Wen-Jan Tuan, Qiushi Chen, Andrew Chung
{"title":"Breast cancer screening during the COVID-19 Pandemic in the United States: Results from real-world health records data","authors":"William Curry, Wen-Jan Tuan, Qiushi Chen, Andrew Chung","doi":"10.1370/afm.22.s1.4885","DOIUrl":"https://doi.org/10.1370/afm.22.s1.4885","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"48 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139292120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tarin Clay, Melissa Filippi, Elise Robertson, Cory B. Lutgen, Elisabeth F. Callen
{"title":"A Novel Method for Utilizing Electronic Health Record Data in Condition-specific Research","authors":"Tarin Clay, Melissa Filippi, Elise Robertson, Cory B. Lutgen, Elisabeth F. Callen","doi":"10.1370/afm.22.s1.4955","DOIUrl":"https://doi.org/10.1370/afm.22.s1.4955","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"12 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139294842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chance R. Strenth, David Schneider, U. Sambamoorthi, Sravan Mattevada, Kimberly Fulda, Bhaskar Thakur, Anna Espinoza
{"title":"Harmonized Healthcare Database across Family Medicine Institutions","authors":"Chance R. Strenth, David Schneider, U. Sambamoorthi, Sravan Mattevada, Kimberly Fulda, Bhaskar Thakur, Anna Espinoza","doi":"10.1370/afm.22.s1.5404","DOIUrl":"https://doi.org/10.1370/afm.22.s1.5404","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"14 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139291188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Winston R. Liaw, Ben King, Omolola E. Adepoju, Jiangtao Luo, Ioannis Kakadiaris, Todd Prewitt, Jessica Dobbins, Pete Womack
{"title":"Identifying the Factors Associated with the Accumulation of Diabetes Complications to Inform a Prediction Tool","authors":"Winston R. Liaw, Ben King, Omolola E. Adepoju, Jiangtao Luo, Ioannis Kakadiaris, Todd Prewitt, Jessica Dobbins, Pete Womack","doi":"10.1370/afm.22.s1.5071","DOIUrl":"https://doi.org/10.1370/afm.22.s1.5071","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"23 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139291940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Organizations have been investing in analytics that rely on internal and external data to gain a competitive advantage. However, legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health or finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation due to violations of data governance. The era of big data has further intensified the challenge of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and difficult to solve, thus increasing the risk of data exposure and data loss. To mitigate this risk, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach toward data evaluations across the organization. A rule-based system, implementing the corporate data classification policy, will minimize the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception-handling process with appropriate approval for extenuating circumstances. The system was implemented as a proof-of-concept working prototype to showcase the capabilities and provide hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The audience had an average experience of ∼25 years and amassed a total experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety, identified by 90% of the commentators, is the most troubling. In addition, with an approximate average of 60%, it was confirmed that appropriate policies, procedures, and prerequisites for classification are in place while implementation tools are lagging.
{"title":"Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System.","authors":"Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson","doi":"10.1089/big.2022.0201","DOIUrl":"https://doi.org/10.1089/big.2022.0201","url":null,"PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71415222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
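A rule-based classification step with an approval-gated exception path, of the kind the Big Data Confidentiality abstract above describes, might look like the sketch below. The abstract gives no implementation details, so the classification levels, the regular-expression rules on field names, and every class and function name here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]


@dataclass
class Rule:
    pattern: str  # regular expression matched against a field name
    level: str    # classification assigned when the pattern matches


@dataclass
class ExceptionRequest:
    field_name: str
    requested_level: str
    approved_by: Optional[str] = None  # stays None until an approver signs off


def classify(field_name: str, rules: List[Rule], default: str = "internal") -> str:
    """Return the strictest level among all matching rules, or the default."""
    matched = [r.level for r in rules
               if re.search(r.pattern, field_name, re.IGNORECASE)]
    return max(matched, key=LEVELS.index) if matched else default


def effective_level(field_name: str, rules: List[Rule],
                    exceptions: List[ExceptionRequest]) -> str:
    """Apply an approved exception if one exists; otherwise enforce the rule-based level."""
    for exc in exceptions:
        if exc.field_name == field_name and exc.approved_by is not None:
            return exc.requested_level
    return classify(field_name, rules)


if __name__ == "__main__":
    policy = [Rule(r"ssn|passport", "restricted"),
              Rule(r"email|phone", "confidential")]
    pending = [ExceptionRequest("customer_email", "internal")]
    print(effective_level("customer_email", policy, pending))
```

Taking the strictest matching level is a conservative design choice: when rules overlap, the field is treated at the higher sensitivity, and only an explicitly approved exception can lower it.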