Jun 2020 – Sep 2020
[IEEE] Machine Learning Covid-19 Comorbidity Research
Built random forest and neural network models to predict individuals' COVID-19 status with 90% accuracy
Full abstract: Predictions of COVID-19 Infection Severity Based on Co-associations between the SNPs of Co-morbid Diseases and COVID-19 through Machine Learning of Genetic Data DOI1 DOI2
In this research, a quantitative model is built to predict people's susceptibility to COVID-19 based on their genomes. Identifying people vulnerable to COVID-19 infection is crucial to stopping the spread of the virus. Previous studies have found that individuals with comorbid diseases have a higher chance of being infected and of developing more severe COVID-19 conditions. However, these patterns have only been observed through correlational analyses between patient phenotypes and the severity of their COVID-19 infection. In this study, the genetic variants underlying the observed comorbidity patterns are analyzed through machine learning on COVID-19 GWAS data, which may reveal biological pathways underlying COVID-19 infection that are essential to developing effective, targeted therapeutics. Furthermore, by combining genetic variants with individuals' phenotypes, this study built a neural network model and a random forest (RF) classifier to predict an individual's likelihood of COVID-19 infection. The RF classifier shows that ongoing symptoms are generally better predictors of COVID-19 condition (higher impurity-based feature importance) than diseases or medical histories. In addition, when trained with genomic data, the comorbid disease impact ranking deduced by the resulting RF model is highly consistent with phenotypic comorbidity patterns observed in past studies.
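For readers unfamiliar with the impurity-based feature importance mentioned in the abstract, the idea can be sketched in a few lines of plain Python: a feature scores higher the more a split on it reduces class impurity. This is only an illustration under stated assumptions (binary features, Gini impurity, a single split), not the study's pipeline. The records, the feature names `fever` and `diabetes`, and the helper functions are all hypothetical and made up for the example; they are not the study's data or results. In practice, scikit-learn's `RandomForestClassifier` exposes the tree-averaged version of this quantity as `feature_importances_`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(rows, labels, feature):
    """Reduction in Gini impurity from splitting on one binary feature."""
    absent = [y for r, y in zip(rows, labels) if r[feature] == 0]
    present = [y for r, y in zip(rows, labels) if r[feature] == 1]
    n = len(labels)
    # Impurity after the split, weighted by the size of each branch.
    after = (len(absent) / n) * gini(absent) + (len(present) / n) * gini(present)
    return gini(labels) - after


# Hypothetical toy records (NOT study data): 1 = present, 0 = absent.
rows = [
    {"fever": 1, "diabetes": 1},
    {"fever": 1, "diabetes": 0},
    {"fever": 1, "diabetes": 1},
    {"fever": 0, "diabetes": 0},
    {"fever": 0, "diabetes": 1},
    {"fever": 0, "diabetes": 0},
]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical infection status per record

# Score each feature, then rank from most to least informative.
scores = {f: impurity_decrease(rows, labels, f) for f in ("fever", "diabetes")}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy set the symptom separates the classes perfectly while the comorbidity only partly does, so the symptom ranks first. A random forest averages such decreases over every split in every tree, which is what makes the resulting ranking comparable across features.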