Emergency departments (EDs) and intensive care units (ICUs) are critical environments where rapid and accurate decision-making is paramount. Predicting patient outcomes, especially mortality, is a crucial aspect of critical care. This article examines a retrospective study conducted at Peking University Third Hospital, which investigated the use of machine learning (ML) models as Critical Care Learning Tools to forecast 7-day mortality in patients admitted through emergency medical services (EMS). Understanding and implementing such predictive tools can significantly enhance clinical decision support and resource allocation in high-pressure emergency care settings.
Study Setting and Patient Cohort
The study was carried out within the emergency system of Peking University Third Hospital, encompassing an 18-bed resuscitation unit and a 15-bed ED-ICU. The research team focused on a retrospective cohort of patients who arrived via EMS and were subsequently admitted to either the resuscitation unit or the ED-ICU between February and December 2015. Inclusion criteria mandated that patients were alive upon EMS arrival. Ethical approval was granted by the Peking University Third Hospital Medical Science Research Ethics Committee, with a waiver for informed consent due to the retrospective nature of the study and the use of anonymized data from standard patient care.
Data Collection and Feature Selection
A panel of expert emergency physicians and epidemiologists played a vital role in designing the study and ensuring data consistency. They developed standardized data extraction strategies for physicians trained in resuscitation to gather information from electronic medical records. The extracted data included a comprehensive range of patient information: demographics, pre-existing conditions (comorbidities), physiological measurements, laboratory results, diagnoses, and length of hospital stay. To maintain data relevance and focus, the variables were limited to those recorded within the first 6 hours of medical contact. The primary outcome of interest was defined as death within seven days of admission, identified from the medical records.
Addressing Missing Data
Recognizing the challenges of real-world data, the researchers addressed missing values in their dataset. After excluding patients with excessive missing data, a small proportion of incompleteness remained (less than 5% of collected features). To handle this, mean imputation was used: each missing value was replaced with the mean of that variable computed from the patients for whom it was recorded. This kept the dataset reasonably complete for model training without discarding otherwise valuable patient records.
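As a rough illustration of this mean-imputation step, the following sketch uses pandas (a tool the study does not mention) with purely hypothetical column names:

```python
import pandas as pd

# Hypothetical extract of the screened features; column names are illustrative only.
records = pd.DataFrame({
    "heart_rate":     [88, 112, None, 74],
    "lactate_mmol_l": [1.9, None, 4.2, 1.1],
})

# Mean imputation: each missing value is replaced by the mean of that variable,
# computed from the patients for whom it was actually recorded.
imputed = records.fillna(records.mean(numeric_only=True))
print(imputed)
```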
Rigorous Feature Screening for Model Accuracy
The Peking University Third Hospital health records database contains a vast amount of clinical data, some of which may be redundant or poorly structured for machine learning purposes. To build effective and efficient models, a rigorous feature screening process was implemented. Candidate variables were carefully examined and discussed in team meetings. Established risk adjustment algorithms, such as DAVROS and SAPS 3, along with published ICU admission criteria, served as initial guides for variable selection. Furthermore, the Delphi method and comprehensive literature reviews were used to assess the potential impact of each variable on the predictive models. Each variable was evaluated based on its clinical significance, representativeness of patient condition, and accessibility within the electronic health records. This thorough screening process ultimately led to the selection of 75 key features for inclusion in the machine learning models.
Machine Learning Model Development and Comparison
The study used the Python 2.7 (Anaconda) platform and the scikit-learn 0.19.1 framework for model training. Python was chosen for its flexibility in experimental design and for debugging across machine learning frameworks. Four distinct machine learning models were chosen for comparison:
- Logistic Regression (LR): A widely used statistical model in data mining and disease prediction. In this study, univariate and multivariate logistic regression analyses were performed to identify variables significantly associated with 7-day mortality. This model helps determine risk factors and predict the probability of death based on patient characteristics.
- Support Vector Machine (SVM): Based on statistical learning theory, SVM aims to find an optimal hyperplane for data classification, maximizing the margin between classes while maintaining accuracy. SVM is effective for classifying linearly separable data and is a valuable tool in predictive modeling.
- Gradient Boosting Decision Tree (GBDT): A powerful algorithm that combines boosting and decision tree methods. GBDT builds multiple decision trees sequentially, with each tree correcting the errors of its predecessors, leading to high predictive accuracy. It excels at capturing complex relationships in data.
- XGBoost (Extreme Gradient Boosting): An enhanced version of GBDT, XGBoost incorporates both linear classifiers and tree-based models. It utilizes second-order Taylor expansion for optimization, unlike traditional GBDT, which only uses first-order derivatives. XGBoost also includes regularization terms to control model complexity and prevent overfitting, making it a robust and highly effective machine learning algorithm. (Its standard regularized objective is sketched just after this list.)
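The study itself does not restate the underlying mathematics; the following is the standard regularized objective from the XGBoost literature, included only to make the contrast with traditional GBDT concrete. At boosting round $t$, the loss is expanded to second order around the current prediction and a complexity penalty is added for the new tree $f_t$:

$$
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i\, f_t^2(\mathbf{x}_i)\Big] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2,
$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ is the number of leaves in the new tree, and $w$ the vector of leaf weights. Traditional GBDT fits each new tree to the first-order term ($g_i$) alone and has no explicit $\Omega$ penalty.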
The LR, SVM, and GBDT algorithms were implemented using the scikit-learn 0.19.1 package. XGBoost was integrated using the XGBoost 0.82 framework with scikit-learn 0.19.1. Specific parameter settings included L1 regularization for LR and a linear kernel for SVM. GBDT and XGBoost parameters were set to their default values, allowing for a standardized comparison across models.
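A minimal sketch of how the four models might be instantiated with the stated settings is shown below. It is written against current scikit-learn and xgboost APIs (the study used scikit-learn 0.19.1 and XGBoost 0.82, whose defaults differ slightly), so treat it as illustrative rather than a reproduction of the original configuration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

models = {
    # L1-regularized logistic regression, as specified in the study
    # (liblinear is a solver that supports the L1 penalty).
    "LR": LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000),
    # SVM with a linear kernel; probability=True enables predict_proba for AUC.
    "SVM": SVC(kernel="linear", probability=True),
    # GBDT and XGBoost left at their library defaults, mirroring the study.
    "GBDT": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(),
}
```

Leaving the boosted models at their defaults keeps the comparison standardized across algorithms, at the cost of not tuning each one to its best possible performance.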
Model Performance Evaluation
To rigorously compare the performance of the four machine learning algorithms (LR, SVM, GBDT, and XGBoost), the study employed ten-fold cross-validation. The patient data, consisting of the 75 selected features, was randomly divided into ten subsets (folds). In each of the ten experiments, one fold was used as the test set, and the remaining nine folds were used for training the models. The average performance across all ten experiments provided a robust measure of each algorithm’s accuracy. Performance metrics included average accuracy (ACC), sensitivity (Se), specificity (Sp), Youden index, and area under the receiver operating characteristic curve (AUC). These metrics comprehensively evaluated the models’ ability to correctly predict 7-day mortality.
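A sketch of such an evaluation loop, under the assumption that `X` holds the 75 features and `y` the binary 7-day mortality labels as NumPy arrays, might look like this (StratifiedKFold is used here to keep the event rate similar across folds; the study describes simple random ten-fold splits):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

def ten_fold_report(model, X, y, seed=42):
    """Average ACC, Se, Sp, Youden index, and AUC over ten folds."""
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = {"ACC": [], "Se": [], "Sp": [], "Youden": [], "AUC": []}
    for train_idx, test_idx in folds.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        prob = model.predict_proba(X[test_idx])[:, 1]
        tn, fp, fn, tp = confusion_matrix(y[test_idx], pred).ravel()
        se, sp = tp / (tp + fn), tn / (tn + fp)   # sensitivity, specificity
        scores["ACC"].append(accuracy_score(y[test_idx], pred))
        scores["Se"].append(se)
        scores["Sp"].append(sp)
        scores["Youden"].append(se + sp - 1)      # Youden index = Se + Sp - 1
        scores["AUC"].append(roc_auc_score(y[test_idx], prob))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```

For instance, `ten_fold_report(models["XGBoost"], X, y)`, reusing the hypothetical `models` dictionary from the earlier sketch, would return the averaged metrics for XGBoost.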
Ethical Considerations
The study adhered to strict ethical guidelines. As stated, the Institutional Ethics Committee approved the research and granted a waiver of informed consent. This was justified because the study was retrospective, using data collected as part of routine patient care, and there was no alteration of patient treatment protocols for research purposes. The researchers affirmed that all research activities were conducted in accordance with relevant regulations and ethical principles.
This study demonstrates the potential of machine learning algorithms as valuable critical care learning tools. By analyzing patient data and predicting mortality risk, these models can aid clinicians in making more informed decisions in emergency and critical care settings. Further research should focus on the practical implementation and integration of these models into clinical workflows to enhance patient care and outcomes.