Goal
To analyze free-text patient narratives from electronic health records for a deeper understanding of Long COVID experiences, focusing on the impact on patients' lives and the variety of symptoms they encounter.
Dataset
The study analyzed 'journey/major events' attributes from 655 electronic health records of Long COVID patients at Parkview Health's post-COVID clinic, spanning from March 2021 to September 2022. These free-text responses detailed patients' journeys and major events post-COVID-19 infection, discussing symptoms, activities, places visited, and how daily life was affected by Long COVID.
What I did
- Data Preprocessing and Exploration:
  - Preprocessed the text with Python libraries such as gensim, spaCy, and LemmInflect: punctuation removal, tokenization, lemmatization, and stop-word removal.
- Topic Modeling:
  - Implemented Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), Biterm Topic Model (BTM), and Correlation Explanation (CorEx) topic models using Python libraries.
  - Generated models with varying numbers of topics and evaluated them on metrics such as keyword overlap, perplexity, coherence, and Jaccard similarity.
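
One of the evaluation metrics above, Jaccard similarity between topics' top keywords, can be sketched in plain Python. This is a minimal illustration and not the study's actual code: the function names and the keyword lists are hypothetical placeholders, and the real analysis may have computed overlap differently (for example, across models rather than within one model).

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword collections: |A n B| / |A u B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty keyword lists share nothing by convention
    return len(set_a & set_b) / len(union)


def mean_pairwise_jaccard(topics):
    """Average Jaccard similarity over every pair of topics.

    Lower values suggest the topics use more distinct vocabularies.
    """
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0  # fewer than two topics: no pairs to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top keywords per topic, invented for illustration only.
# These are NOT results from the Long COVID dataset.
example_topics = [
    ["fatigue", "sleep", "work", "tired"],
    ["smell", "taste", "loss", "food"],
    ["work", "tired", "job", "return"],
]

score = mean_pairwise_jaccard(example_topics)
print(f"Mean pairwise Jaccard similarity: {score:.3f}")
```

A score like this could be computed for each candidate number of topics and read alongside coherence and perplexity, since low keyword overlap alone does not guarantee interpretable topics.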
Major Findings
Severity of COVID-19: Identified that Long COVID symptoms and recovery needs differed with the severity of patients' initial COVID-19 illness.
Diverse Symptomatology: Uncovered a wide range of symptoms, including neurological issues, mental health conditions, sensory changes, and physical discomforts.
Impact on Quality of Life: Highlighted how persistent Long COVID symptoms significantly affect patients' physical, psychological, and social well-being.
Social Determinants of Health: Revealed the challenges Long COVID patients face in accessing appropriate care, returning to health, and maintaining employment.