Characterizing Long COVID Patients

Tags
Python
Machine Learning
Cluster Analysis
Published
August 19, 2023
Author

Goal

To characterize Long COVID patients and subtype them using cluster analysis, topic modeling, and thematic analysis on structured and unstructured Electronic Health Records (EHR). This research aims to inform clinical practices and interventions for better patient outcomes.

Dataset

Data from Parkview Health's post-COVID clinic included 655 patients, predominantly female and white, with a median age, significant proportions being overweight and obese. Over half had at least one comorbidity, such as hypertension or heart disease. The dataset was skewed towards specific demographics and health conditions, which made it challenging to obtain unbiased clustering results.

What I Did

  • Handling Dataset Bias:
    • Acknowledged the inherent biases in the dataset due to its skewed demographic and health condition composition.
    • Conducted multiple tests using different subsets of the dataset (input variables) to mitigate the influence of biases on clustering results.
  • Cluster Analysis:
    • Implemented k-means and agglomerative hierarchical clustering methods.
    • Used the elbow method and rank correlation to determine the optimal number of clusters.
    • Selected clustering features with a focus on COVID-19-related attributes such as vaccination status, hospitalization status, and the top Long COVID symptoms.
    • Conducted a comparative analysis of the clusters obtained from both clustering methods.
  • Innovative Approach to Patient Subtyping:
    • Focused on a broader set of symptoms rather than limiting the analysis to symptom severity or binary categorizations, as in previous studies.
    • This approach allowed for a more comprehensive understanding of Long COVID patient subtypes.
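The elbow method listed under Cluster Analysis can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the toy patient matrix (six hypothetical binary symptom flags), the deterministic farthest-first initialization, and the "largest second difference" rule for locating the elbow. The project itself may have used a library implementation and a different selection rule.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic init (an assumption, not the project's method): start at
    the first point, then repeatedly add the point farthest from the
    centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, inertia), where inertia is the
    within-cluster sum of squared distances."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return labels, inertia


def elbow_k(inertia_by_k):
    """Heuristic elbow: the k with the largest second difference of inertia."""
    ks = sorted(inertia_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = inertia_by_k[prev_k] - 2 * inertia_by_k[k] + inertia_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical toy data: 30 patients, 6 binary symptom flags, 3 clean groups.
patients = (
    [[1, 1, 0, 0, 0, 0]] * 10
    + [[0, 0, 1, 1, 0, 0]] * 10
    + [[0, 0, 0, 0, 1, 1]] * 10
)

inertia_by_k = {k: kmeans(patients, k)[1] for k in range(1, 6)}
best_k = elbow_k(inertia_by_k)
print(inertia_by_k, best_k)
```

On real EHR features the inertia curve is rarely this clean, so the elbow is usually read from a plot and cross-checked against a second criterion rather than picked automatically.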
 
Major Findings

  • Diverse Patient Subtypes Identified: Uncovered nine distinct patient clusters, each characterized by a unique combination of symptoms, thereby offering a more granular view of Long COVID phenotypes.
  • Model Robustness: Demonstrated a high degree of overlap between the clusters obtained from different clustering methods, enhancing the robustness of the findings.
  • Broad Symptom Coverage: Compared to existing studies, the identified subtypes encompassed a wider array of symptoms, providing a more holistic understanding of Long COVID.
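One way to quantify the overlap between two clusterings of the same patients is the adjusted Rand index (ARI), sketched below using only the standard library. The write-up does not say which overlap measure was used, so ARI is an assumption here, and the two label lists are made-up examples rather than project results.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same items.

    1.0 means identical partitions (cluster IDs may be permuted);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same patients")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of patients placed together in both clusterings, in A, and in B.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = same_a * same_b / comb(n, 2)  # agreement expected by chance
    maximum = (same_a + same_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (e.g. a single cluster)
    return (same_both - expected) / (maximum - expected)


# Hypothetical labels for six patients from two clustering methods.
kmeans_labels = [0, 0, 1, 1, 2, 2]
hierarchical_labels = [1, 1, 0, 0, 2, 2]  # same groups, different IDs
shuffled_labels = [0, 1, 0, 1, 0, 1]      # groups cut across the k-means ones

ari_same = adjusted_rand_index(kmeans_labels, hierarchical_labels)
ari_diff = adjusted_rand_index([0, 0, 0, 1, 1, 1], shuffled_labels)
print(ari_same, ari_diff)
```

Because ARI ignores how clusters are numbered, it suits comparing k-means against hierarchical clustering, where cluster IDs are arbitrary.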