Abstract:
The appearance of the Coronavirus disease 2019 (COVID-19) caused a global crisis
both socially and economically. Reverse Transcription Polymerase Chain Reaction
(RT-PCR) is currently used to test for the presence of the virus in
patients. However, RT-PCR kits are limited in number, too costly to cover the required
volume of testing, and they produce many false-negative results, both in initial tests
and in recovered patients. As a result, many deep learning (DL) algorithms have been
developed and have achieved state-of-the-art performance in detecting the virus and
identifying the severity stage. However, these methods rely on older Convolutional Neural
Network (CNN) models that are computationally expensive and complex. In this
research work, we designed a two-stage DL model using an approach known as Vision
Transformer (ViT) that can detect COVID-19 and determine its severity stage using
thoracic CT images. In the first stage of the DL model, we used a pre-trained ViT model
called ViT_B/32 to classify CT images as COVID or non-COVID. We also designed
our own simple custom CNN model and performed an extensive set of experiments on
the detection of COVID-19 using both the ViT_B/32 and CNN models in the first-stage
network. In the second stage of the DL model, we used a U-Net-like, ViT-based model
known as Vision Transformer for Biomedical Image Segmentation (VITBIS) to
segment both the lung and infection regions of COVID-infected CT images and
computed the severity level of the infection. Transformers with attention mechanisms
are used in both the encoder and decoder parts of VITBIS, in place of a CNN encoder
and decoder architecture. The second-stage DL network contains two sub-models: one
for lung segmentation and the other for lesion segmentation. In the first-stage
network, the ViT model outperformed the CNN models, achieving a 5-fold
cross-validated accuracy of 99.7%. Our custom CNN model was the runner-up, with a
5-fold cross-validated accuracy of 98%, and the pre-trained VGG16 CNN model placed
third with an accuracy of 97%. For the second-stage network, the best
performance of the lung segmentation network was an Intersection over Union
(IOU) of 95.8%, a Dice Similarity Coefficient (DSC) of 96.22%, and a sensitivity of
99.69%. The lesion segmentation network achieved an IOU of 94%, a DSC of 95.23%,
and a sensitivity of 98.3%.
Keywords: COVID-19, Deep Learning, Severity, Thoracic CT
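
The segmentation metrics reported above (IOU, DSC, sensitivity) can be illustrated with a minimal sketch on binary masks. The code below is not the authors' implementation: the function names are hypothetical, masks are assumed to be 2D lists of 0/1 values, and `infection_ratio` assumes that severity is derived from the fraction of lung pixels covered by lesion pixels, which the abstract does not state explicitly.

```python
from itertools import chain


def _flatten(mask):
    """Flatten a 2D binary mask (list of rows of 0/1) into a flat list."""
    return list(chain.from_iterable(mask))


def confusion_counts(pred, truth):
    """Return (true positives, false positives, false negatives) per pixel."""
    p, t = _flatten(pred), _flatten(truth)
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for a, b in zip(p, t) if a and b)
    fp = sum(1 for a, b in zip(p, t) if a and not b)
    fn = sum(1 for a, b in zip(p, t) if not a and b)
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # two empty masks agree perfectly


def dice(pred, truth):
    """Dice Similarity Coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def sensitivity(pred, truth):
    """Sensitivity (recall): TP / (TP + FN)."""
    tp, _, fn = confusion_counts(pred, truth)
    denom = tp + fn
    return tp / denom if denom else 1.0


def infection_ratio(lung_mask, lesion_mask):
    """ASSUMPTION: severity proxy = lesion pixels inside the lung / lung pixels."""
    lung, lesion = _flatten(lung_mask), _flatten(lesion_mask)
    lung_px = sum(1 for v in lung if v)
    infected_px = sum(1 for l, m in zip(lung, lesion) if l and m)
    return infected_px / lung_px if lung_px else 0.0


# Toy 2x4 masks, for illustration only.
truth = [[0, 1, 1, 0], [0, 1, 1, 0]]
pred = [[0, 1, 1, 0], [0, 1, 0, 1]]
lung = [[1, 1, 1, 1], [1, 1, 1, 1]]
lesion = [[0, 1, 1, 0], [0, 0, 0, 0]]

print(iou(pred, truth), dice(pred, truth), sensitivity(pred, truth))
print(infection_ratio(lung, lesion))
```

On the toy masks, the prediction shares three of the four ground-truth pixels and adds one false positive, so IOU is 3/5 while DSC and sensitivity are both 3/4; the hypothetical infection ratio is 2/8. Mapping such a ratio to discrete severity stages would require thresholds that the abstract does not give.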