Abstract:
Pediatric infectious disease is a subspecialty that addresses the diagnosis, prevention, and
treatment of infections in patients from birth to 21 years of age. Because of their developing
immune systems and frequent exposure to pathogens, children are especially vulnerable to infections.
Globally, infectious diseases have a significant impact, causing millions of deaths each year.
Advances in emerging technologies such as machine learning have given new momentum to the
fight against pediatric infectious diseases. This study investigates how machine learning (ML)
can enhance the diagnosis and treatment of pediatric infectious diseases, with the aim of
improving healthcare outcomes for the pediatric population. We employed a quantitative
research design that combines an experimental phase, in which we developed and fine-tuned ML
models including k-nearest neighbors (KNN), naive Bayes (NB), support vector machine (SVM),
logistic regression (LR), random forest (RF), and XGBoost, with a survey to evaluate the
effectiveness of the ML-integrated software framework. We emphasize meticulous data
preprocessing, using KNN imputation to handle missing data and the Synthetic Minority
Oversampling Technique (SMOTE) to address class imbalance. These preprocessing steps are
critical for model performance in complex medical applications. Additionally, z-score
normalization is applied to standardize the features, which helps produce stable and reliable
model outcomes. After conducting the
experiments, we found that random forest performed best, and we integrated it into a framework
designed for practical use in pediatric healthcare settings. To enhance transparency and build
trust among healthcare stakeholders, we integrate SHAP (SHapley Additive exPlanations) with the
random forest model. The resulting software framework, which incorporates this explainable
model, was developed to improve usability, understandability, and transparency. We performed a
usability test with clinicians, which yielded a System Usability Scale (SUS) score of 74.25,
corresponding to grade C on the grade scale and "acceptable" on the acceptability scale. Using
the random forest model, we achieved an accuracy of 0.97 in predicting pediatric infectious
diseases, with a 90/10 train-test split, 5-fold cross-validation, and grid-search hyperparameter
optimization. With
this model integrated into the framework, our study facilitates the analysis of patient data and
the identification of healthcare trends, supporting clinicians who are limited in number and
cannot perform a large volume of diagnoses in a short period of time.
Keywords: pediatric infectious diseases, machine learning, explainability, software framework,
system usability scale.
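The abstract names KNN imputation and z-score normalization without showing how they operate. As a minimal, dependency-free sketch (not the authors' implementation), the functions below fill each missing value with the mean of that feature over the k nearest rows, measuring distance only on co-observed features and scaling it up for the missing ones, and then standardize every column. The function names, the `None` encoding for missing values, and the default k = 3 are illustrative assumptions; scikit-learn's `KNNImputer` and `StandardScaler` provide library versions of the same two steps.

```python
import math
from typing import List, Optional


def knn_impute(rows: List[List[Optional[float]]], k: int = 3) -> List[List[float]]:
    """Replace each None with the mean of that feature over the k nearest donor rows."""
    n_features = len(rows[0])
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(rows):
                # A donor must be a different row that has feature j observed.
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Euclidean distance over co-observed features, scaled up for the
                # missing ones (the same idea as a "nan-Euclidean" distance).
                sq = sum((a - b) ** 2 for a, b in shared)
                candidates.append((math.sqrt(sq * n_features / len(shared)), other[j]))
            if not candidates:
                raise ValueError(f"no donor rows for row {i}, feature {j}")
            candidates.sort(key=lambda c: c[0])
            donors = [v for _, v in candidates[:k]]
            new_row[j] = sum(donors) / len(donors)
        filled.append(new_row)
    return filled


def z_score(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column to mean 0 and (population) standard deviation 1."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(columns, means)]
    # Constant columns carry no information; map them to 0 instead of dividing by 0.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]
```

In a real evaluation, the imputation donors and the normalization statistics should come from the training split only and then be reused on the test split, so that no test information leaks into training.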
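SMOTE addresses class imbalance by creating synthetic minority-class samples rather than duplicating existing ones: each new point lies on the line segment between a minority sample and one of its k nearest minority-class neighbors. The sketch below is a simplified, stdlib-only version that picks the seed sample at random; the function name, seed handling, and default k = 5 are assumptions, and libraries such as imbalanced-learn offer a full implementation.

```python
import random
from typing import List


def smote(minority: List[List[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_synthetic minority samples by interpolating toward minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute the k nearest minority-class neighbors of every minority sample.
    neighbors = []
    for i, a in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j, a=a: sq_dist(a, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random minority "seed" sample
        j = rng.choice(neighbors[i])      # one of its k nearest neighbors
        gap = rng.random()                # uniform in [0, 1)
        synthetic.append([x + gap * (y - x)
                          for x, y in zip(minority[i], minority[j])])
    return synthetic
```

For example, to balance a training split with 90 negative and 10 positive cases, one would call `smote(positives, n_synthetic=80)`. Oversampling should be applied after the train-test split (and inside each cross-validation fold); otherwise synthetic points derived from test cases can inflate the reported accuracy.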
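The evaluation protocol reported in the abstract (a 90/10 train-test split, 5-fold cross-validation, and grid-search hyperparameter optimization of a random forest) maps directly onto scikit-learn. In the sketch below, synthetic data from `make_classification` stands in for the pediatric dataset, and the parameter grid is hypothetical because the abstract does not state the searched values; running it requires scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the paper's pediatric dataset is not available here.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)

# 90/10 train-test split, stratified so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0
)

# Hypothetical search grid; the actual grid is not given in the abstract.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,               # 5-fold cross-validation on the training split only
    scoring="accuracy",
)
search.fit(X_train, y_train)

# With the default refit=True, the best configuration is retrained on the
# full training split before predicting the held-out 10%.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
```

Because the held-out 10% is used only once, for the final accuracy, it gives an estimate not tuned on test data. To combine this protocol with SMOTE without leakage, the oversampler would have to run inside each fold, for example through imbalanced-learn's pipeline.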
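SHAP explains a prediction by assigning each feature its Shapley value: the weighted average, over all coalitions S of the other features, of how much adding that feature changes the model output, with weight |S|!(n - |S| - 1)!/n!. For tree ensembles such as random forests, the `shap` library's `TreeExplainer` computes these values efficiently. The brute-force sketch below only illustrates the definition on a toy model; it assumes that absent features are replaced by a single baseline vector (a simplification of how SHAP handles them), and its cost grows exponentially, so it suits a handful of features only. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep their actual value; the rest use the baseline.
        z = [x[idx] if idx in coalition else baseline[idx] for idx in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For the toy model `f(z) = 2*z[0] + z[1]*z[2]` with input `[1, 2, 3]` and an all-zero baseline, the sketch attributes 2 to the first feature and splits the interaction term's contribution of 6 equally, 3 to each of the other two. The attributions sum to f(x) - f(baseline) = 8, the efficiency property that makes SHAP explanations additive.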
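The SUS score reported above comes from the standard ten-item System Usability Scale questionnaire. Each 1-5 response is converted to a 0-4 contribution (response minus 1 for the odd, positively worded items; 5 minus response for the even, negatively worded items), the contributions are summed, and the sum is multiplied by 2.5 to give a 0-100 score; a study-level score is the mean across participants. The grade and acceptability cut-offs in the sketch are an assumption based on the commonly reproduced Bangor et al. scales, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sus_score(responses: Sequence[int]) -> float:
    """Score one questionnaire: ten 1-5 Likert responses, item 1 first."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100


def study_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Mean SUS score across participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


def grade_and_acceptability(score: float) -> Tuple[str, str]:
    """Map a SUS score to (letter grade, acceptability); cut-offs are assumed."""
    if score >= 90:
        grade = "A"
    elif score >= 80:
        grade = "B"
    elif score >= 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    if score >= 70:
        acceptability = "acceptable"
    elif score >= 50:
        acceptability = "marginal"
    else:
        acceptability = "not acceptable"
    return grade, acceptability
```

With these assumed cut-offs, `grade_and_acceptability(74.25)` returns `("C", "acceptable")`, which matches the interpretation given in the abstract.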