Abstract:
Pneumonia is the leading cause of death in children, killing 5 million children under
the age of five in 2020, with Nigeria, India, Pakistan, the Democratic Republic of
the Congo, and Ethiopia accounting for half of all deaths, according to the WHO
2020 report. In this thesis, we propose a simple convolutional neural network
(CNN) model that detects pneumonia from chest X-ray images, using both public and
local X-ray image datasets. The X-ray images collected from two regional hospitals
in Ethiopia, Merawi Hospital and Felegehiwot Referral Hospital, were processed,
analyzed, and combined in an appropriate ratio.
To assess all images in each dataset, we employed several exploratory data
analysis techniques to compute image quality, similarity, and variance. This
analysis revealed that X-ray images from Merawi Hospital are of lower quality than
those from the other sources because of exposure imbalance during imaging, whereas
images from the online source are of high quality. Next, three classical machine
learning algorithms (SVM, KNN, and logistic regression) and six pretrained models
(VGG16, VGG19, DenseNet121, MobileNetV2, InceptionResNetV2, and Xception) were
selected and trained on the prepared data. In addition, we built a CNN model with
a few convolutional layers using the Keras Sequential API and compared its
performance with that of the pretrained models on selected evaluation metrics.
We found that the proposed CNN model outperformed both the deep transfer learning
and the classical models, achieving a test accuracy, weighted precision, weighted
recall, and weighted F1 score of 93% and an AUC of 96.02%. The model misclassified
7 pneumonia images from Felegehiwot Referral Hospital and 8 normal images from
Merawi Hospital. During model debugging, we observed that the model could not
learn enough from the Felegehiwot pneumonia images during training because the
sample was small (only 177 images), whereas the misclassified normal images from
Merawi Hospital are noisy because of low exposure. On the online test data, the
model performs well, with a test accuracy, weighted precision, weighted recall,
and weighted F1 score of 99% and an AUC of 100%.
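The weighted metrics and AUC reported above can be computed with a few lines of code. This is a minimal pure-Python sketch, assuming integer class labels and, for AUC, binary labels (hypothetically 1 = pneumonia, 0 = normal) with one continuous score per image; the function names are illustrative. The support weighting follows the same definition as scikit-learn's `average='weighted'`, and AUC is computed as the probability that a random positive image scores higher than a random negative one.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    n = len(y_true)
    p_w = r_w = f_w = 0.0
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0  # undefined precision -> 0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support[c] / n  # weight = class share of the true labels
        p_w += w * prec
        r_w += w * rec
        f_w += w * f1
    return p_w, r_w, f_w


def roc_auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(weighted_prf([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

One property worth noting: because each class's recall is weighted by that class's share of the test set, weighted recall always equals overall accuracy, which is consistent with the two being reported as the same value.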
For external validation, the model was evaluated on chest X-ray images from the
Debre Markos comprehensive and specialty hospitals. It correctly classified all
pneumonia images and mislabeled only one normal image as pneumonia. Furthermore,
we discuss the factors that relate clinical and machine learning approaches to
pneumonia diagnosis, as well as the key challenges and issues in applying machine
learning techniques in clinical practice. We therefore expect this work to be
beneficial, especially in developing countries where healthcare resources are
limited.
Keywords: Chest X-ray image, Evaluation metrics, Machine learning algorithm,
Medical image dataset, Pneumonia.