Document Type: Original article
Abstract
Background: Machine learning models could assist physicians in identifying high-risk COVID-19 patients. This study aimed to predict Intensive Care Unit (ICU) admission in hospitalized COVID-19 patients using an Artificial Neural Network (ANN) model combined with the Elastic Net algorithm.
Methods: In this prospective study, the data of 139 COVID-19 patients admitted to Imam Reza Hospital in Tabriz between 20 March and 5 April 2020 were analyzed. The Elastic Net method was used to select the most important features. The selected variables were standardized, and an ANN with one hidden layer was fitted to the data using a gradient descent algorithm. The model was validated with a 70:30 split into training and test sets. The model's predictive power was assessed by calculating the overall accuracy, sensitivity, specificity, and Area Under the ROC curve (AUC).
Results: Based on the variables selected by the Elastic Net, the ANN model was constructed using age, sex, body mass index, diabetes mellitus status, history of heart disease, and vital signs, including systolic and diastolic blood pressure, oxygen saturation, pulse rate, respiratory rate, and body temperature. The overall accuracy of this model was 93.15%, sensitivity 80%, specificity 95.8%, and AUC 0.90. Oxygen saturation, pulse rate, respiratory rate, and age were the most important predictors.
Conclusion: In the investigated sample of patients admitted with COVID-19, the fitted ANN model showed acceptable performance in predicting ICU admission. This finding could be useful for physicians and policymakers.
Keywords: COVID-19, Intensive care units, Neural networks, ROC curve
Introduction
In December 2019, an increasing number of cases of pneumonia of unknown cause were reported in Wuhan, China. A novel coronavirus, later named SARS-CoV-2, was identified as the cause of this disease, which was named COVID-19, and on March 11, 2020, the World Health Organization (WHO) declared COVID-19 a global pandemic with very high morbidity and mortality.
By June 2023, more than 767 million people worldwide had been infected with the disease and more than 6.9 million had died (1). Reports from different countries show wide variation in patient outcomes, ranging from mild infection and complete recovery to hospitalization, the need for the Intensive Care Unit (ICU), and death, with factors such as age, underlying diseases, and sex affecting these outcomes (2-9).
ICU admission is an outcome that is particularly important to predict at the time of hospitalization. From a therapeutic point of view, predicting ICU admission from its influencing factors can help triage patients and guide clinicians' treatment choices at the time of hospitalization, in order to prevent disease progression. From a managerial perspective, it supports the management of health service resources and equipment. If a patient is predicted to need ICU admission in the coming days, facilities can be planned and the number of ICU beds required for the coming days determined, or, if no bed is available, admission can be coordinated with other medical centers.
Prediction models are attracting increasing interest in clinical medicine. Machine Learning (ML) algorithms, as high-performance prediction tools, are used to predict various clinical outcomes.
Several studies have used machine learning with various algorithms and approaches, such as Artificial Neural Networks (ANNs), Decision Trees (DT), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB), to predict ICU admission in COVID-19 patients (6-9). Given the potential utility of machine learning-based decision rules and the urgency of the COVID-19 pandemic, it is important to select machine learning approaches that yield more efficient models. One such approach is to use feature selection algorithms, such as LASSO, Elastic Net, and genetic algorithms, to choose the minimum number of effective and accessible variables and thereby construct a more efficient and applicable model.
In theory, a large number of variables can contribute to predicting outcomes, but in practice, including all of them in the prediction model is not feasible.
High-dimensional datasets pose statistical and computational problems. Irrelevant and redundant features can mislead ML algorithms and reduce learning accuracy (10). Furthermore, many of the evaluated features are not routinely measured and recorded in the hospital registration system, so a model built on them would not be applicable to future patients. In practice, therefore, feature selection methods should be used to select the most important predictors, which can then be introduced to hospital data registration units for routine collection in all relevant patients. In this way, the built model can be applied at a general level to a wide range of future patients.
Given the importance of COVID-19, the large number of cases, and the lack of time and resources to assess accurately all variables that may predict a patient's clinical condition, an efficient model built on the minimum number of appropriate predictors is needed. To the best of our knowledge, no previous study has predicted ICU admission using machine learning algorithms combined with feature selection methods. Given the importance of a more efficient and feasible model for predicting ICU admission in COVID-19 patients, the novelty of this study lies in constructing an ANN model paired with Elastic Net to predict ICU admission.
Thus, this study aimed to construct an ANN model to predict the outcome of ICU admission for hospitalized patients with COVID-19 based on predictors selected by the Elastic Net feature selection method.
Materials and methods
Data
This is a secondary study of data on 139 patients with COVID-19 who were referred to Imam Reza Hospital between March 20 and April 5, 2020; their data were collected during the original study (11). Informed consent for participation and publication had been obtained from the patients or their close relatives.
The present study is a prospective observational study; all 139 patients who had participated in the original study and received the standard course of treatment at Imam Reza Hospital were included. The ethics committee approval code for the current study is IR.TBZMED.REC.1400.238.
ICU admission was set as the outcome of the study, and a wide range of independent variables was considered, including demographic information, anthropometric indices, underlying diseases, vital signs at hospital admission, clinical and laboratory tests, and variables related to disease severity (lung CT scan findings and extent of lung involvement).
Statistical analysis
For preprocessing, the data were first cleaned, and variables with a missingness rate above 20% were removed. The remaining missing values were imputed using Multiple Imputation by Chained Equations (MICE).
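The 20% missingness rule can be illustrated with a minimal sketch. It assumes a hypothetical layout in which each variable is a list of patient values with `None` marking a missing entry; the function name `drop_sparse_columns` and the toy data are illustrative only, and the sketch does not reproduce the MICE imputation step, which the study performed in R with the mice package.

```python
from typing import Dict, List, Optional


def drop_sparse_columns(
    data: Dict[str, List[Optional[float]]], max_missing: float = 0.20
) -> Dict[str, List[Optional[float]]]:
    """Keep only variables whose fraction of missing (None) values is at most max_missing.

    Hypothetical helper: a variable is removed only if its missingness rate is
    strictly above the threshold (20% by default).
    """
    kept = {}
    for name, values in data.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate <= max_missing:
            kept[name] = values
    return kept


# Toy data for five hypothetical patients.
example = {
    "age": [70, 55, None, 61, 48],         # 1/5 = 20% missing -> kept
    "ldh": [None, None, 500, None, 620],   # 3/5 = 60% missing -> removed
    "spo2": [88, 92, 90, None, 85],        # 1/5 = 20% missing -> kept
}
print(sorted(drop_sparse_columns(example)))  # ['age', 'spo2']
```

The retained variables would then be passed to an imputation step; the threshold is exposed as a parameter so the sketch can be rerun with a different cut-off.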
The available data contained more than 50 input variables (Tables 1-4); because of the small sample size and computational constraints, it was not possible to include all of them in the neural network model. In addition, for feasibility and given resource limitations, it is important to select only a limited number of important, well-examined variables for the modeling process. Therefore, to make the model simpler and more practical, all features of interest were first entered into the Elastic Net model to identify the most important predictors of ICU admission. The optimal λ was determined by 10-fold cross-validation with the “lambda.1se” criterion, which selects the largest λ (that is, the most regularized model) whose cross-validated error lies within one standard error of the minimum. Variables retained with nonzero coefficients were used to construct the ANN model.
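The “lambda.1se” rule can be sketched in a few lines. In glmnet, `cv.glmnet` reports this value as `lambda.1se`; the stand-alone function below is a hypothetical re-implementation of the same idea for illustration, and the λ grid, mean cross-validated errors, and standard errors are invented numbers, not results from this study.

```python
from typing import Sequence


def lambda_1se(
    lambdas: Sequence[float], cv_mean: Sequence[float], cv_se: Sequence[float]
) -> float:
    """Return the largest lambda whose mean CV error is within one SE of the minimum.

    Hypothetical helper mirroring the idea behind glmnet's "lambda.1se":
    find the lambda with the lowest mean cross-validated error, add that
    lambda's standard error to form a threshold, and among all lambdas whose
    error does not exceed the threshold pick the most regularized (largest) one.
    """
    i_min = min(range(len(cv_mean)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


# Invented 10-fold CV summary over a small lambda grid.
lambdas = [0.5, 0.2, 0.1, 0.05, 0.01]
cv_mean = [0.90, 0.72, 0.66, 0.64, 0.65]
cv_se = [0.05, 0.04, 0.03, 0.03, 0.04]
print(lambda_1se(lambdas, cv_mean, cv_se))  # 0.1
```

Here the minimum error (0.64) occurs at λ = 0.05, but λ = 0.1 is chosen because its error (0.66) is within one standard error (0.03) of that minimum; the larger λ shrinks more coefficients to zero, which is why the rule tends to yield a smaller predictor set.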
For ANN modeling, a multilayer perceptron ANN was fitted. The variables were standardized before inclusion in the model, and the neural network was fitted to the data with one hidden layer using a gradient descent algorithm. The hyperbolic tangent was used as the activation function of the hidden layer and softmax as the activation function of the output layer.
To evaluate the model on data not used for fitting, and thus detect overfitting, the data were split into training (70%) and test (30%) sets. Train/test validation is a common method in machine learning for determining model validity: the model is built from the training data and its validity is checked on the test data. Accordingly, the model performance reported in this study was obtained on the test set. The random over-sampling method (12) was used to balance the two outcome classes.
The predictive power of the model was assessed by calculating the overall accuracy, sensitivity, specificity, and Area Under the ROC curve (AUC). To identify influential variables, the importance and normalized importance of each variable were calculated and reported. The importance of a predictor measures how much the model output changes in response to changes in that predictor. Normalized importance expresses each variable's importance relative to the variable with the largest importance value, on a scale from 0 to 100, with the most important variable scored as 100.
Analyses were performed in R using the “glmnet” (13), “nnet” (14), and “mice” (15) packages.
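The class balancing and evaluation quantities described above can be made concrete with a short sketch. It is not the study's code (the analysis was done in R); the function names are hypothetical, the toy labels, predictions, scores and raw importances are invented, and the sketch assumes raw importance values have already been computed, since the text does not state how they were derived. AUC is computed here as the probability that a randomly chosen ICU case receives a higher score than a randomly chosen non-ICU case, which is equivalent to the area under the ROC curve.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_oversample(rows: List[list], labels: List[int], seed: int = 0) -> Tuple[List[list], List[int]]:
    """Duplicate randomly drawn rows of the smaller class(es) until all classes are equal in size."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    target = max(len(idx) for idx in by_class.values())
    out_rows, out_labels = list(rows), list(labels)
    for y, idx in by_class.items():
        for _ in range(target - len(idx)):
            j = rng.choice(idx)  # sample with replacement from this class
            out_rows.append(rows[j])
            out_labels.append(y)
    return out_rows, out_labels


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for a binary outcome (1 = ICU admission)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of ICU cases correctly flagged
        "specificity": tn / (tn + fp),  # share of non-ICU cases correctly cleared
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def normalized_importance(importance: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw importances so the largest becomes 100."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


# Toy example (invented values; 1 = ICU admission).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.2, 0.3, 0.1, 0.6]
print(confusion_metrics(y_true, y_pred))  # accuracy = 4/6, sensitivity = 0.5, specificity = 0.75
print(auc(y_true, scores))                # 0.875
print(normalized_importance({"spo2": 0.5, "pulse": 0.25}))  # {'spo2': 100.0, 'pulse': 50.0}
```

In this sketch, over-sampling is meant to be applied to the training rows only, a common practice so that duplicated cases cannot appear in the test set used for the metrics.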
Elastic net method
Elastic Net, a variable selection algorithm, is an extension of the LASSO method that combines the LASSO (L1) and ridge (L2) penalties. It shrinks some coefficients toward zero and sets others exactly to zero. The method is robust to correlations among the predictors and, in that setting, yields lower mean squared errors than LASSO and ridge regression (16,17). Moreover, Elastic Net identifies influential variables more accurately than LASSO and has a lower false positive rate than ridge regression (18). The method minimizes a penalized objective in which the λ parameters determine the amount of shrinkage.
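For reference, one standard form of the Elastic Net objective is shown below, assuming a logistic (binomial) likelihood for the binary ICU outcome $y_i \in \{0, 1\}$, standardized predictor vectors $x_i$ for the $N$ patients, and the parameterization used by the glmnet package; the value of the mixing parameter $\alpha$ used in the study is not reported in this text.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]
```

Writing $\lambda_1 = \alpha\lambda$ and $\lambda_2 = (1-\alpha)\lambda/2$ gives the equivalent two-penalty form $\lambda_1\lVert\beta\rVert_1 + \lambda_2\lVert\beta\rVert_2^2$, in which the two λ values set the amount of L1 and L2 shrinkage; $\alpha = 1$ reduces to LASSO and $\alpha = 0$ to ridge regression.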
ANN model
Each ANN is made up of layers of components called neurons. A neuron is the smallest information-processing unit and forms the basis of network performance. Neurons at the same level form a layer, and the connections between neurons in adjacent layers carry weights that indicate the extent to which the neurons affect each other. A neural network usually has three types of layer: input, middle (hidden), and output. The input layer is connected to one or more hidden layers, which in turn are connected to the output layer; the output layer produces the network's prediction for the desired outcome. Each neuron has a threshold value and an activation function, both of which are involved in the model training process. Learning in the perceptron network is performed by minimizing the mean squared output error using the backpropagation learning algorithm with numerical methods (20).
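The flow of information through a one-hidden-layer perceptron, with the activation functions described in the methods (hyperbolic tangent in the hidden layer, softmax in the output layer), can be sketched as follows. This is an illustrative forward pass only: the function name and all weights are hypothetical, the hidden-layer size is not reported in the text, the threshold of each neuron is represented as an additive bias term, and training by backpropagation is not shown.

```python
import math
from typing import List, Sequence


def tanh_softmax_forward(
    x: Sequence[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[List[float]],
    b_out: List[float],
) -> List[float]:
    """Forward pass of a one-hidden-layer perceptron returning class probabilities."""
    # Hidden layer: weighted sum of the inputs plus a bias (threshold), passed through tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: one linear score per class.
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    # Softmax, shifted by the largest score for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: 2 standardized inputs, 3 hidden units, 2 classes (ICU / no ICU).
w_hidden = [[0.8, -0.5], [0.1, 0.3], [-0.6, 0.9]]
b_hidden = [0.0, 0.1, -0.2]
w_out = [[1.2, -0.4, 0.5], [-1.2, 0.4, -0.5]]
b_out = [0.0, 0.0]
probs = tanh_softmax_forward([1.1, -0.7], w_hidden, b_hidden, w_out, b_out)
print(probs)  # two class probabilities that sum to 1
```

In the study's setting the input vector would hold the 11 standardized predictors, and the weights would be learned from the training data rather than set by hand.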
Results
Of the 139 patients admitted to the hospital, 65 were female (46.8%) and 74 were male (53.2%). Of the patients studied, 23 (16.5%) required ICU admission during their hospital stay. All demographic and clinical variables, medical history, and clinical signs of disease during hospitalization are reported in Tables 1-4.
For model construction, the variables most strongly related to ICU admission were first selected using the Elastic Net algorithm. Based on their coefficients, a large number of variables were removed from the modeling process; of the 50 registered variables, the 11 with the highest predictive value were retained at this stage. The selected variables were age, sex, Body Mass Index (BMI), diabetes mellitus status, history of heart disease, and vital signs, including systolic and diastolic blood pressure, oxygen saturation, pulse rate, respiratory rate, and body temperature. These 11 variables were then entered into the ANN, and the final model was fitted with them. The performance of the fitted model was assessed with the training and test sets method. On the test data, the overall accuracy of the model was 93.15%, sensitivity 80%, specificity 95.8%, and AUC 0.90. Based on the normalized importance index, the variables with the greatest predictive ability were blood oxygen saturation at the time of hospitalization (100%), pulse rate (63.9%), respiratory rate (55.9%), and age (42.2%), followed by systolic (31.3%) and diastolic (23.2%) blood pressure at the time of hospitalization. Ischemic heart disease (21.9%), diabetes (16.6%), and BMI (13.6%) had moderate importance; finally, the least important variables were body temperature at the time of hospitalization (8.5%) and sex (5.6%) (Figure 1).
Table 1. Demographic and clinical characteristics of the patients with COVID-19 hospitalized in Imam Reza Hospital, Tabriz
| Category | Characteristic | Patients admitted to ICU (n = 23) | Hospitalized patients without the need for ICU (n = 116) |
|---|---|---|---|
| Demographic | Age (years) | 67.21±15.26 | 59.26±15.52 |
| | Gender (male) | 11 (47.8%) | 63 (54.3%) |
| | BMI (kg/m²) | 30.17±6.14 | 28.59±4.74 |
| | Smoking (yes) | 1 (7.1%) | 3 (3.2%) |
| Vital signs | SpO2 (%) | 85.07±6.00 | 90.14±4.09 |
| | Respiratory rate (breaths/min) | 25.95±15.42 | 22.29±7.06 |
| | Pulse rate (beats/min) | 83.00±17.34 | 88.80±13.32 |
| | SBP (mmHg) | 128.47±17.08 | 121.84±14.38 |
| | DBP (mmHg) | 81.30±8.81 | 77.29±9.46 |
| | Temperature (°C) | 37.30±0.55 | 37.28±0.70 |
| Laboratory findings | WBC (cells/µL) | 5500.00 (3800.00) | 7900.00 (3400.00) |
| | Neutrophils (%) | 79.00 (15.00) | 83.00 (15.50) |
| | Lymphocytes (%) | 19.65 (13.78) | 15.00 (14.20) |
| | Hemoglobin (g/dL) | 13.20 (3.14) | 11.80 (2.20) |
| | Platelets (×10⁹/L) | 181.00 (104.00) | 157.00 (157.00) |
| | Creatinine (mg/dL) | 1.15 (0.50) | 1.05 (0.32) |
| | AST (units/L) | 35.00 (24.00) | 28.00 (21.75) |
| | ALT (units/L) | 26.00 (17.00) | 23.00 (17.75) |
| | LDH (units/L) | 630.00 (294.25) | 502.00 (203.75) |
| | CPK (units/L) | 169.00 (245.75) | 108.00 (114.50) |
| | Sodium (mEq/L) | 134.00 (4.00) | 136.00 (4.00) |
| | Potassium (mEq/L) | 4.20 (0.90) | 4.00 (0.60) |
| | Magnesium (mEq/L) | 2.00 (0.30) | 2.00 (0.40) |
| | PT (s) | 14.30 (1.00) | 13.60 (1.88) |
| | PTT (s) | 36.00 (8.00) | 33.00 (7.00) |
| | INR (ratio) | 1.05 (0.09) | 1.02 (0.08) |
Continuous variables are described as mean±SD for symmetric variables and median (IQR) for asymmetric variables. Categorical variables are expressed as number (percentage). Variables were described for nonmissing cases.
BMI: Body Mass Index (kg/m²); PCR: Polymerase Chain Reaction; SpO2: Oxygen Saturation; SBP: Systolic Blood Pressure (mmHg); DBP: Diastolic Blood Pressure (mmHg); WBC: White Blood Cell; ALT: Alanine Aminotransferase (units/L); AST: Aspartate Aminotransferase (units/L); LDH: Lactate Dehydrogenase; CPK: Creatine Phosphokinase; PT: Prothrombin Time; PTT: Partial Thromboplastin Time; INR: International Normalized Ratio
Table 2. Clinical symptoms and sensory complications of the patients with COVID-19 hospitalized in Imam Reza Hospital, Tabriz
| Category | Characteristic | Patients admitted to ICU (n = 23) | Hospitalized patients without the need for ICU (n = 116) |
|---|---|---|---|
| Symptoms | Fever | 10 (43.5%) | 57 (49.1%) |
| | Cough | 17 (73.9%) | 90 (77.6%) |
| | Dyspnea | 17 (73.9%) | 79 (68.1%) |
| | Myalgia | 8 (34.8%) | 65 (56.0%) |
| | Diarrhea | 4 (17.4%) | 11 (9.5%) |
| | Abdominal pain | 1 (4.3%) | 6 (5.2%) |
| | Headache | 3 (13.0%) | 16 (13.8%) |
| | Nausea | 2 (8.7%) | 19 (16.4%) |
| | Loss of appetite | 3 (13.0%) | 12 (10.3%) |
| | Weakness | 10 (43.5%) | 23 (19.8%) |
| | Shivering | 2 (8.7%) | 23 (19.8%) |
| Sensory complications | Taste | 5 (35.7%) | 39 (42.4%) |
| | Smell | 5 (35.7%) | 45 (48.9%) |
| | Hearing | 0 (0.0%) | 10 (11.1%) |
Categorical variables are expressed as number (percentage). Variables were described for nonmissing cases.
Table 3. Comorbidities of the patients with COVID-19 hospitalized in Imam Reza Hospital, Tabriz
| Category | Characteristic | Patients admitted to ICU (n = 23) | Hospitalized patients without the need for ICU (n = 116) |
|---|---|---|---|
| Comorbidities | Diabetes | 11 (47.8%) | 32 (27.6%) |
| | Hypertension | 12 (52.2%) | 47 (40.5%) |
| | Hyperlipidemia | 7 (30.4%) | 15 (12.9%) |
| | IHD | 9 (39.1%) | 15 (12.9%) |
| | CHF | 1 (4.3%) | 2 (1.7%) |
| | Hypothyroidism | 0 (0.0%) | 7 (6.0%) |
| | CKD | 1 (4.3%) | 1 (0.9%) |
| | Asthma | 4 (17.4%) | 6 (5.2%) |
| | COPD | 0 (0.0%) | 6 (5.2%) |
IHD: Ischemic Heart Disease; CHF: Chronic Heart Failure; CKD: Chronic Kidney Disease; COPD: Chronic Obstructive Pulmonary Disease. Categorical variables are expressed as number (percentage). Variables were described for nonmissing cases.
Table 4. Lung imaging findings of the patients with COVID-19 hospitalized in Imam Reza Hospital, Tabriz
| Category | Characteristic | Patients admitted to ICU (n = 23) | Hospitalized patients without the need for ICU (n = 116) |
|---|---|---|---|
| Involvement | CT peripheral/bilateral | 10 (90.9%) | 49 (92.5%) |
| | CT peripheral/unilateral | 0 (0%) | 4 (7.5%) |
| | CT central/unilateral | 1 (9.1%) | 0 (0%) |
CT: Computed Tomography. Categorical variables are expressed as number (percentage). Variables were described for nonmissing cases.
Discussion
In the present study, a multilayer perceptron ANN was constructed to predict ICU admission for hospitalized COVID-19 patients based on the predictors selected by the Elastic Net algorithm. The ANN was trained on the selected features, and the data were balanced with the random over-sampling method. The overall accuracy of the fitted ANN model was 93.15%, sensitivity 80%, specificity 95.8%, and AUC 0.90. Oxygen saturation, pulse rate, respiratory rate, age, and blood pressure were the most important predictors.
Because predicting, at the time of hospitalization, the need for ICU admission in the coming days is very important for treatment and resource management, a model that makes this prediction from the minimum number of important variables is highly valuable. In the investigated sample of patients, the fitted ANN model, using a small number of readily available variables, showed acceptable performance in predicting ICU admission. The fitted model can be stored as code and applied to new input data to obtain predictions. It could therefore serve as a tool for predicting ICU admission for future patients in medical centers or other samples, by connecting the fitted model to the data of newly hospitalized cases through programming code.
The variables retained in the final model can be obtained easily, with minimal time and cost. Although adding laboratory information, the severity of lung involvement, and other factors affecting the patient's clinical condition could increase the accuracy of the model, the main advantage of the present model is that it predicts the need for ICU admission with acceptable accuracy from basic information that is available for almost every COVID-19 patient.
According to the results, several prognostic factors for ICU admission were identified in COVID-19 patients admitted to Imam Reza Hospital in Tabriz, some of which are shared with similar studies (6-9); among them, vital signs at the time of hospitalization and patient age were the most important predictors of ICU admission. Vital signs, including systolic and diastolic blood pressure, pulse rate, body temperature, and respiratory rate, reflect body function, and organ dysfunction is to a large extent reflected in these indicators. They can therefore serve as a good representative of the general condition of the body, which is consistent with vital signs being the most important predictors among the wide range of variables evaluated in this study.
Regarding diabetes, many studies have shown the effect of this comorbidity on the outcomes of COVID-19 patients, including a 2020 meta-analysis of the predictors of mortality in hospitalized COVID-19 patients. According to this meta-analysis, diabetes, the second most common comorbidity in patients (after hypertension), was associated with a two-fold increase in the odds of COVID-19 mortality (4). Similarly, in a study of the survival of diabetic patients with COVID-19, the adjusted Hazard Ratio (HR) of death for hospitalized patients was 1.23, which was statistically significant (21).
Consistent with the present results, studies have shown the effect of a history of hypertension and heart disease on patients' clinical condition and disease outcomes (4).
Regarding the other variables, although including them increases the predictive value of the model to some extent, by the principle of parsimony the improvement in model fit is not large enough to justify replacing the current simpler model, which includes a minimum number of important and available variables, with a more complex one.
Numerous studies have predicted ICU admission in patients with COVID-19 using various methods, including Artificial Neural Networks (ANNs), Decision Trees (DT), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) (7-9), with model performance similar to ours. Podder et al (7) reported an accuracy of 98.13% and an AUC of 99% for their stacking ensemble model combining random forest, extra trees, and logistic regression. Saadatmand et al (8) observed that, for predicting ICU admission, ensemble stacking via a neural network achieved an accuracy of over 95%. Subudhi et al (9) fitted ensemble, Gaussian process, linear, naïve Bayes, nearest neighbor, support vector machine, tree-based, discriminant analysis, and neural network models to predict ICU admission using cross-validation, and reported that all models had mean F1 scores ≥0.7.
One study has also evaluated the prediction of mortality risk in hospitalized COVID-19 patients using various ML algorithms combined with a genetic algorithm (22); however, to the best of our knowledge, no previous study has predicted ICU admission using a feature selection method. The innovation of our study was to predict ICU admission for COVID-19 patients with an ANN paired with the Elastic Net feature selection method, to obtain a more efficient and feasible prediction model.
Given the small sample size, this study was regarded as a feasibility study. The results showed that, despite the small sample, the fitted hybrid ANN model had acceptable performance, and that such a model, or other machine learning models combined with feature selection methods, could be used to predict ICU admission or other outcomes, such as mortality, in COVID-19 patients.
Conclusion
Because predicting, at the time of hospitalization, the need for ICU admission in the coming days is vital for treatment and resource management, it is important to develop a model that makes such predictions from the minimum number of available variables.
In the present study, the Elastic Net algorithm was used to address predictor selection, which is one of the challenges in building ML models. According to the results, Elastic Net can select a suitable subset of features for inclusion in the ANN algorithm.
In the investigated sample of patients hospitalized with COVID-19, the ANN combined with Elastic Net provided a reliable model for predicting the patient's need for ICU admission, which may serve as a reference for physicians, clinical workers, and policymakers. The overall accuracy of this model was 93.15%, sensitivity 80%, specificity 95.8%, and AUC 0.90.
Limitations and study suggestions
One of the main limitations of this study was the small sample size and, consequently, the small number of ICU admissions available for modeling. Observational studies with larger sample sizes in hospitalized patients, recording the important demographic and clinical information that affects patients' clinical outcomes, are suggested in order to build a stronger and more reliable prediction model. The strengths of this study are the good quality of the data, the large number of examined variables, and the accurate recording of information. Since such a model provided acceptable results in the studied sample, an ANN could be used to build a tool for predicting the need for ICU admission in other medical centers and other samples. A similar prediction tool could also be constructed for other outcomes, such as mortality.
Ethics approval and consent to participate
The ethics committee of Tabriz University of Medical Sciences approved this study. Approval code for the current study is IR.TBZMED.REC.1400.238.
Acknowledgement
This study was funded by the Tabriz University of Medical Sciences under grant number 67691. As a secondary study, this study was also approved by the ethics committee of Tabriz University of Medical Sciences (approval code IR.TBZMED.REC.1400.238). The ethics committee approval code for the original study was IR.TBZMED.REC.1398.1310. We are indebted to all the patients who participated in this research and made it possible.
Conflict of Interest
The authors declare no conflict of interest.