Pub Date: 2023-06-01; Epub Date: 2023-12-11; DOI: 10.1109/ichi57859.2023.00139
Aokun Chen, Qian Li, Elizabeth Shenkman, Yonghui Wu, Yi Guo, Jiang Bian
Clinical trials are vital tools for establishing the effectiveness and safety of medications. To maximize generalizability, the study sample should represent both the sample population and the target population. However, clinical trial design tends to favor the evaluation of drug safety and procedure (i.e., internal validity) without clear knowledge of the penalty this imposes on trial generalizability (i.e., external validity). Alzheimer's disease (AD) trials are known to have generalizability issues. Thus, in this study, we explore the effect of eligibility criteria on AD severity and on severe adverse events (SAEs) among eligible patients.
Citation: "Exploring the Effect of Eligibility Criteria on AD Severity and Severe Adverse Event in Eligible Patients." IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 756-759. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273173/pdf/
Pub Date: 2023-06-01; Epub Date: 2023-12-11; DOI: 10.1109/ichi57859.2023.00122
Leilei Su, Yifan Peng, Zezheng Wang, Cong Sun
Offensive language refers to the use of language in a manner that may offend or harm others who are within earshot or view in a public place. Given the importance of identifying such language in social media for promoting emotional well-being, we propose a prompt learning method and compare its performance with fine-tuning on two widely used datasets, HatEval and OffensEval. Experimental results demonstrate that prompt learning can achieve a performance improvement over fine-tuning in a fully supervised setting.
Citation: "Identification of Offensive Language in Social Media Using Prompt Learning." IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 690-691. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811837/pdf/
Pub Date: 2022-06-01; DOI: 10.1109/ichi54592.2022.00101
Chonghao Zhang, Luca Bonomi
The use of deep learning techniques in medical applications holds great promise for advancing health care. However, there are growing privacy concerns regarding what information about individual data contributors (i.e., patients in the training set) these deep models may reveal when shared with external users. In this work, we first investigate the membership privacy risks in sharing deep learning models for cancer genomics tasks, and then study the applicability of privacy-protecting strategies for mitigating these privacy risks.
Citation: "Mitigating Membership Inference in Deep Learning Applications with High Dimensional Genomic Data." IEEE International Conference on Healthcare Informatics, vol. 2022. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9473339/pdf/nihms-1815588.pdf
Pub Date: 2022-06-01; Epub Date: 2022-09-08; DOI: 10.1109/ichi54592.2022.00027
Maksims Kazijevs, Furkan A Akyelken, Manar D Samad
The unpredictability and unknowns surrounding the ongoing coronavirus disease (COVID-19) pandemic have had unprecedented consequences, taking a heavy toll on the lives and economies of all countries. There have been efforts to predict COVID-19 case counts (CCC) using epidemiological data and numerical tokens online, which may allow early preventive measures to slow the spread of the disease. In this paper, we use state-of-the-art natural language processing (NLP) algorithms to numerically encode COVID-19-related tweets originating from eight cities in the United States and predict city-specific CCC up to eight days in the future. A city embedding is proposed to obtain a time series representation of daily tweets posted from a city, which is then used to predict case counts using a custom long short-term memory (LSTM) model. The universal sentence encoder yields the best normalized root mean squared error (NRMSE) of 0.090 (0.039), averaged across all cities, in predicting CCC six days in the future. The R² scores in predicting CCC are above 0.70 and often over 0.80, which suggests a strong correlation between the actual and model-predicted CCC values. Our analyses show that the NRMSE and R² scores are consistently robust across different cities and different numbers of time steps in the time series data. Results show that the LSTM model can learn the mapping between the NLP-encoded tweet semantics and the case counts, which implies that social media text can be directly mined to identify the future course of the pandemic.
Citation: "Mining Social Media Data to Predict COVID-19 Case Counts." IEEE International Conference on Healthcare Informatics, pp. 104-111. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9490453/pdf/nihms-1836082.pdf
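The two metrics reported for the COVID-19 case-count entry above, NRMSE and R², can be illustrated with a minimal Python sketch. The abstract does not say how the RMSE is normalized; dividing by the range of the actual values is an assumption made here, and the function names and sample numbers are hypothetical.

```python
import math


def nrmse(actual, predicted):
    """RMSE divided by the range of the actual values.

    Range normalization is an assumption; the abstract does not state
    which normalization the authors used.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range")
    return math.sqrt(mse) / value_range


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily case counts and model predictions.
actual_counts = [100, 120, 140, 160]
predicted_counts = [110, 120, 130, 160]
print(nrmse(actual_counts, predicted_counts))
print(r2_score(actual_counts, predicted_counts))
```

A lower NRMSE and an R² closer to 1 both indicate predictions that track the observed counts more closely.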
Pub Date: 2022-06-01; Epub Date: 2022-09-08; DOI: 10.1109/ichi54592.2022.00014
Luca Bonomi, Liyue Fan
Sharing time-to-event data is beneficial for enabling collaborative research efforts (e.g., survival studies), facilitating the design of effective interventions, and advancing patient care (e.g., early diagnosis). Despite numerous privacy solutions for sharing time-to-event data, recent research studies have shown that external information may become available (e.g., self-disclosure of study participation on social media) to an adversary, posing new privacy concerns. In this work, we formulate a cohort inference attack for time-to-event data sharing, in which an informed adversary aims at inferring the membership of a target individual in a specific cohort. Our study investigates the privacy risks associated with time-to-event data and evaluates the empirical privacy protection offered by popular privacy-protecting solutions (e.g., binning, differential privacy). Furthermore, we propose a novel approach to privately release individual-level time-to-event data with high utility, while providing indistinguishability guarantees for the input value. Our method, TE-Sanitizer, is shown to provide effective mitigation against the inference attacks and high usefulness in survival analysis. The results and discussion provide domain experts with insights on the privacy and the usefulness of the studied methods.
Citation: "Sharing Time-to-Event Data with Privacy Protection." IEEE International Conference on Healthcare Informatics, vol. 2022. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9473343/pdf/nihms-1815589.pdf
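The two baseline protections named in the time-to-event entry above, binning and differential privacy, can be sketched generically in Python. This is not TE-Sanitizer and not the authors' evaluation setup: it is a textbook Laplace-noised histogram, and the integer time unit, bin width, `max_time` cap, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def bin_time(time_value, bin_width):
    """Coarsen an integer time-to-event value to the start of its bin."""
    return (time_value // bin_width) * bin_width


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_event_histogram(event_times, bin_width, max_time, epsilon, rng):
    """Release per-bin event counts with Laplace noise.

    Assumes each individual contributes exactly one event time, so adding
    or removing one person changes one count by 1 (sensitivity 1) and the
    noise scale is 1 / epsilon. Noise is added to every bin in the fixed
    range [0, max_time), including empty ones; later times are clamped
    into the last bin.
    """
    counts = Counter(bin_time(min(t, max_time - 1), bin_width) for t in event_times)
    scale = 1.0 / epsilon
    return {
        start: counts[start] + laplace_noise(scale, rng)
        for start in range(0, max_time, bin_width)
    }


# Hypothetical event times in months, released in 6-month bins.
rng = random.Random(0)
print(noisy_event_histogram([3, 5, 17, 40], bin_width=6, max_time=36, epsilon=1.0, rng=rng))
```

Wider bins and smaller values of `epsilon` give stronger protection at the cost of less precise survival estimates, which is the privacy and usefulness trade-off the abstract refers to.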
Pub Date: 2022-06-01; Epub Date: 2022-09-08; DOI: 10.1109/ichi54592.2022.00040
Omar A Ibrahim, Sunyang Fu, Maria Vassilaki, Michelle M Mielke, Jennifer St Sauver, Ronald C Petersen, Sunghwan Sohn
Dementia is one of the major health challenges in aging populations, with 50 million people diagnosed worldwide. However, dementia is often underdiagnosed or diagnosed late, resulting in missed opportunities for appropriate care plans. Identifying early signs of dementia is essential for better quality of life in aging populations. Monitoring early signs of individual health changes could help clinicians diagnose dementia in its early stages and apply more effective treatment plans. However, the scarcity of data for dementia cases compared to normal cases (i.e., an imbalanced class distribution) makes it challenging to develop robust supervised learning models. To alleviate this issue, we investigated one-class classification (OCC) techniques, which use only the majority class (i.e., normal cases) in model development, to detect dementia signals from older adults' clinical visits. The OCC models identify abnormalities in older adults' longitudinal health conditions to predict incident dementia. The predictive performance of the OCC was compared with a recent streaming clustering-based technique and demonstrated higher predictive power. Our analysis showed that OCC has promising potential to increase power in predicting dementia.
Citation: "Detection of Dementia Signals from Longitudinal Clinical Visits Using One-Class Classification." IEEE International Conference on Healthcare Informatics, vol. 2022, pp. 211-216. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728104/pdf/nihms-1852693.pdf
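The idea behind one-class classification in the dementia entry above, fitting a model on normal cases only and flagging anything that falls far outside them, can be shown with a deliberately simple Python sketch. The abstract does not name the specific OCC techniques the authors evaluated; the per-feature Gaussian scoring, the class name, the quantile threshold, and the two made-up features below are all assumptions for illustration.

```python
import math
import statistics


class GaussianOneClass:
    """Toy one-class model: learns per-feature mean and spread from
    normal cases only, then flags rows that lie unusually far away."""

    def fit(self, normal_rows, quantile=0.95):
        columns = list(zip(*normal_rows))
        self.means = [statistics.fmean(col) for col in columns]
        # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
        self.stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
        # Threshold is a high quantile of the training scores themselves.
        scores = sorted(self.score(row) for row in normal_rows)
        index = min(len(scores) - 1, int(quantile * len(scores)))
        self.threshold = scores[index]
        return self

    def score(self, row):
        """Euclidean distance from the training mean in z-score units."""
        return math.sqrt(
            sum(((x - m) / s) ** 2 for x, m, s in zip(row, self.means, self.stdevs))
        )

    def is_abnormal(self, row):
        return self.score(row) > self.threshold


# Hypothetical visit-level features for normal cases only.
normal_visits = [(1.0, 10.0), (1.2, 11.0), (0.8, 9.0), (1.1, 10.5), (0.9, 9.5)]
model = GaussianOneClass().fit(normal_visits)
print(model.is_abnormal((5.0, 30.0)))
print(model.is_abnormal((1.0, 10.0)))
```

Because no dementia cases are needed for training, this style of model sidesteps the class imbalance that the abstract identifies as the obstacle for supervised learning.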
Pub Date: 2022-06-01; DOI: 10.1109/ichi54592.2022.00024
Yao Ge, Yuting Guo, Yuan-Chi Yang, Mohammed Ali Al-Garadi, Abeed Sarker
Many research problems involving medical texts have limited amounts of annotated data available (e.g., expressions of rare diseases). Traditional supervised machine learning algorithms, particularly those based on deep neural networks, require large volumes of annotated data, and they underperform when only small amounts of labeled data are available. Few-shot learning (FSL) is a category of machine learning models designed to solve problems for which only small annotated datasets are available. However, no current study compares the performance of FSL models with that of traditional models (e.g., conditional random fields) for medical text at different training set sizes. In this paper, we attempt to fill this gap by comparing multiple FSL models with traditional models for the task of named entity recognition (NER) from medical texts. Using five health-related annotated NER datasets, we benchmarked three traditional NER models based on BERT: BERT-Linear Classifier (BLC), BERT-CRF (BC) and SANER; and three FSL NER models: StructShot & NNShot, Few-Shot Slot Tagging (FS-ST) and ProtoNER. Our benchmarking results show that almost all models, whether traditional or FSL, achieve significantly lower performance than the state of the art when trained on small amounts of data. For the NER experiments we executed, the F1-scores were very low with small training sets, typically below 30%. FSL models that were reported to perform well on non-medical texts significantly underperformed, compared to their reported best, on medical texts. Our experiments also suggest that FSL methods tend to perform worse on datasets from noisy sources of medical text, such as social media (which includes misspellings and colloquial expressions), than on less noisy sources such as medical literature. Our experiments demonstrate that the current state-of-the-art FSL systems are not yet suitable for effective NER in medical natural language processing tasks, and further research is needed to improve their performance. Creation of specialized, standardized datasets replicating real-world scenarios may help to move this category of methods forward.
Citation: "A comparison of few-shot and traditional named entity recognition models for medical text." IEEE International Conference on Healthcare Informatics, vol. 2022, pp. 84-89. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462421/pdf/nihms-1926966.pdf
Pub Date: 2022-06-01; DOI: 10.1109/ichi54592.2022.00035
Akhil Shiju, Zhe He
Drug review websites such as Drugs.com provide users' textual reviews and numeric ratings of drugs. Consumers use these reviews, along with the ratings, to choose a drug. However, the numeric ratings may not always be consistent with the text reviews, so relying purely on the rating score to find positive or negative reviews may not be reliable. Automatic classification of user ratings based on the textual review can create a more reliable rating for drugs. In this project, we built models to classify drug review ratings from textual reviews using traditional machine learning and deep learning. Traditional machine learning models, including Random Forest and Naive Bayes classifiers, were built using TF-IDF features as input. In addition, transformer-based neural network models, including BERT, Bio_ClinicalBERT, RoBERTa, XLNet, ELECTRA, and ALBERT, were built using the raw text as input. The Bio_ClinicalBERT model outperformed the other models with an overall accuracy of 87%. We further identified concepts of the Unified Medical Language System (UMLS) from the postings and analyzed their semantic types stratified by class type. This research demonstrated that transformer-based models can be used to classify drug reviews based solely on textual reviews.
Citation: "Classifying Drug Ratings Using User Reviews with Transformer-Based Language Models." IEEE International Conference on Healthcare Informatics, vol. 2022, pp. 163-169. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9744636/pdf/nihms-1855900.pdf
Complementary and Integrative Health (CIH) has gained increasing popularity in the past decades. The overall goal of this study is to represent information pertinent to music therapy, chiropractic and aquatic exercise in an EHR system. A total of 300 clinical notes were randomly selected and manually annotated. Annotations were made for the status, symptom and frequency of each approach. This set of annotations was used as a gold standard to evaluate the performance of the NLP systems used in this study (specifically BioMedICUS, MetaMap and cTAKES) for extracting CIH concepts. The three NLP systems achieved an average lenient-match F1-score of 0.50 across all three CIH approaches. BioMedICUS achieved the best performance in music therapy, with an F1-score of 0.73. This study is a pilot to investigate CIH representation in clinical notes and lays a foundation for using EHRs in clinical research on CIH approaches.
Citation: "Annotating Music Therapy, Chiropractic and Aquatic Exercise Using Electronic Health Record." Huixue Zhou, Greg Silverman, Zhongran Niu, Jenzi Silverman, Roni Evans, Robin Austin, Rui Zhang. IEEE International Conference on Healthcare Informatics, vol. 2022, pp. 610-611. Pub Date: 2022-06-01; DOI: 10.1109/ichi54592.2022.00121. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10110363/pdf/nihms-1890434.pdf
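The lenient-match F1-score quoted in the CIH annotation entry above can be illustrated with a short Python sketch. The abstract does not define its matching rule; treating any character overlap between a predicted span and a gold span as a match is a common convention and is assumed here, and the function names and spans are hypothetical.

```python
def spans_overlap(a, b):
    """True if two half-open (start, end) character spans share any character."""
    return a[0] < b[1] and b[0] < a[1]


def lenient_f1(gold_spans, predicted_spans):
    """F1 where a span counts as correct if it overlaps any span on the other side.

    Precision is the share of predicted spans that overlap a gold span;
    recall is the share of gold spans that overlap a predicted span.
    """
    matched_predictions = sum(
        1 for p in predicted_spans if any(spans_overlap(p, g) for g in gold_spans)
    )
    matched_gold = sum(
        1 for g in gold_spans if any(spans_overlap(g, p) for p in predicted_spans)
    )
    precision = matched_predictions / len(predicted_spans) if predicted_spans else 0.0
    recall = matched_gold / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and system output as character offsets.
gold = [(0, 13), (20, 32), (40, 50)]
predicted = [(0, 5), (22, 32), (60, 70), (80, 85)]
print(lenient_f1(gold, predicted))
```

Under a strict rule only exact boundary matches would count, so lenient scores are generally at least as high as strict ones for the same system output.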
Pub Date: 2022-06-01; Epub Date: 2022-09-08; DOI: 10.1109/ichi54592.2022.00050
Song Wang, Mingquan Lin, Ying Ding, George Shih, Zhiyong Lu, Yifan Peng
Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workload of radiologists and encourage precise diagnosis. In this work, we present RadText, a high-performance open-source Python radiology text analysis system. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence splitting and word tokenization, named entity recognition, parsing, and negation detection. Improving on existing widely used toolkits, RadText features a hybrid text processing schema and supports raw text processing and local processing, which enable higher accuracy, better usability and improved data privacy. RadText adopts BioC as the unified interface and standardizes the output into a structured representation that is compatible with the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels that we annotated for this work. RadText demonstrates highly accurate classification performance, with an average precision of 0.91, an average recall of 0.94 and an average F1 score of 0.92. We also annotated a test set for the five new disease labels to facilitate future research or applications. We have made our code, documentation, examples and the test set available at https://github.com/bionlplab/radtext.
Citation: "Radiology Text Analysis System (RadText): Architecture and Evaluation." IEEE International Conference on Healthcare Informatics, pp. 288-296. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484781/pdf/nihms-1836549.pdf