Diagnosis of COVID-19-like Viral Pneumonia: From Experimental Nuclear Physicist to Data Scientist/AI Engineer

Building a CNN from Scratch for Pneumonia Diagnosis by Classifying Patients’ Chest X-Ray Images

I built a convolutional neural network (CNN) model from scratch to indicate the presence of pneumonia in a person’s lungs. Pneumonia is a worldwide disease, with 3 million cases every year in the United States. The best approach to pneumonia diagnosis is to have the patient undergo a chest X-ray. Even with a chest X-ray, however, we cannot quickly determine the cause of the pneumonia by visual examination of the image. This matters because different types of pneumonia require different treatments. We can determine whether someone has pneumonia by running that person’s chest X-ray images through a classifier. To identify the type of pneumonia the person suffers from, e.g. viral or bacterial, a much finer (more precise) classifier is needed. Such a classifier can support doctors’ judgement when they choose a treatment for their patients.

The purpose of building a pneumonia classifier is to reduce the workload of medical doctors. To achieve this goal, we constructed two models: a baseline and a CNN. The former stacks 4 dense layers. The latter is composed of three pairs of convolutional and max-pooling layers, which form the convolutional base. In the CNN model, the output of the convolutional base is flattened from a 3-D array into a 1-D array and then fed into two dense layers that act as the classifier. Overall, the CNN comprises a convolutional base together with dense layers.
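To make the CNN layout described above more concrete, here is a minimal pure-Python sketch that walks an input through three convolution/max-pooling pairs, a flatten step, and two dense layers, and reports each layer's output shape and parameter count. The input size (64x64x3), the 3x3 unpadded kernels, the 2x2 pooling, the filter counts (32, 64, 128) and the dense sizes (64, 1) are all assumptions chosen for illustration; the actual values used in the project are not stated here.

```python
def conv_output(height, width, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution (assumed setup)."""
    return height - kernel + 1, width - kernel + 1


def pool_output(height, width, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return height // pool, width // pool


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


def describe_cnn(input_shape=(64, 64, 3), filters=(32, 64, 128), dense_units=(64, 1)):
    """Return (layer name, output shape, parameter count) for each layer.

    All default sizes are hypothetical, not taken from the original model.
    """
    height, width, channels = input_shape
    layers = []
    # Convolutional base: one conv + max-pool pair per entry in `filters`.
    for n_filters in filters:
        height, width = conv_output(height, width)
        layers.append(("conv", (height, width, n_filters), conv_params(channels, n_filters)))
        channels = n_filters
        height, width = pool_output(height, width)
        layers.append(("max_pool", (height, width, channels), 0))
    # Flatten the 3-D feature map into a 1-D array for the dense classifier.
    flat = height * width * channels
    layers.append(("flatten", (flat,), 0))
    inputs = flat
    for units in dense_units:
        layers.append(("dense", (units,), dense_params(inputs, units)))
        inputs = units
    return layers


if __name__ == "__main__":
    for name, shape, params in describe_cnn():
        print(f"{name:<9} output={shape} params={params}")
```

Under these assumed sizes, most of the parameters sit in the first dense layer after the flatten step, which is one reason a larger network of this kind can overfit a few thousand images.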

I downloaded all the chest X-ray images from Kaggle and saved them on my laptop’s hard disk. There are more than 4,500 chest X-ray images. 80% of them were randomly selected as the training pool. The remaining 20% was split evenly into testing and validation samples. I ran both the training and the testing samples through the baseline and CNN models. The baseline model reached an accuracy of 95% on the training samples and 96% on the testing samples. The CNN model improved on these figures by 5% on the training samples and by 1% on the testing samples.
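The 80/10/10 split described above can be sketched as follows. This is only an illustration: the function name `split_indices`, the fixed seed, and the example count of 4,500 images are assumptions, and the original project may have split the files differently.

```python
import random


def split_indices(n_items, train_percent=80, seed=42):
    """Shuffle item indices and split them into train / test / validation.

    The training share is `train_percent`; the remainder is divided
    evenly between testing and validation (any odd leftover item goes
    to validation). Seed and percentages are illustrative assumptions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible random order
    n_train = n_items * train_percent // 100
    n_test = (n_items - n_train) // 2
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation


if __name__ == "__main__":
    # 4,500 is a hypothetical round number; the real dataset is slightly larger.
    train, test, validation = split_indices(4500)
    print(len(train), len(test), len(validation))
```

Splitting indices rather than files keeps the sketch independent of how the images are stored on disk; the indices can then be mapped to file paths.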

Although the CNN model is more accurate than the baseline, a common issue, overfitting, arose as the network grew in size. Three solutions to overfitting are listed here. The first two, regularization and dropout, constrain the network: regularization penalizes large weights, and dropout randomly switches off some neurons during training. The third is to use a well-known pretrained model by downloading its pretrained weights. Since a complete discussion of these solutions needs more time, I won’t address them in great detail here.
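As a rough illustration of the first two remedies, the sketch below implements an L2 weight penalty and "inverted" dropout in plain Python. The choice of L2 (rather than L1) regularization, the penalty strength of 0.01 and the dropout rate of 0.5 are assumptions made for the example; they are not values taken from the project.

```python
import random


def l2_penalty(weights, strength=0.01):
    """Extra loss term that grows with the squared size of the weights."""
    return strength * sum(w * w for w in weights)


def dropout(activations, rate=0.5, rng=None, training=True):
    """Inverted dropout: zero each activation with probability `rate`
    during training and rescale the survivors by 1 / (1 - rate), so the
    expected activation is unchanged. At inference time the input is
    returned as-is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    print(l2_penalty([1.0, -2.0]))
    print(dropout([1.0] * 8, rate=0.5, rng=random.Random(0)))
```

In a deep-learning framework these would normally be a regularizer argument on a layer and a dedicated dropout layer rather than hand-written functions.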

Here, I would like to present three recommendations. First, a careless visual reading of chest X-ray images can mistake other lung-related diseases, whether malignant or non-malignant, for pneumonia. Hence, clinicians encourage patients to get follow-up radiographs for further confirmation. Second, as mentioned previously, chest X-ray images can hardly distinguish viral pneumonia from bacterial pneumonia. A ternary classifier is expected to be finer than a binary one. This ternary classifier has three classes: normal, bacteria and virus, and it needs the chest X-ray images to be categorized into three sub-groups as described in Mendeley. The last recommendation relates to the current coronavirus epidemic spreading all over the world. The illness tied to the new coronavirus was originally called novel coronavirus-infected pneumonia (NCIP). The World Health Organization renamed it COVID-19, which is short for coronavirus disease 2019. A severe complication of the coronavirus is pneumonia caused by viral infection in both lungs, leading to inflammation in the tiny air sacs inside your lungs. COVID-19 damages the cells and tissue that line the air sacs in your lungs. These sacs are where the oxygen you breathe is processed and delivered to your blood. The damage causes tissue to break off and clog your lungs. The walls of the sacs can thicken, making it very hard for you to breathe. A chest X-ray may show patchy areas of damage in both your lungs. Hence, we might consider developing a COVID-19 pneumonia classifier.
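To illustrate the ternary classifier from the second recommendation, the sketch below shows how three raw output scores could be turned into probabilities over the classes normal, bacteria and virus. Using a softmax output is an assumption on my part, since how the ternary classifier would be built is not specified here, and the example scores are made up.

```python
import math

# Class names follow the three sub-groups named in the text.
CLASSES = ("normal", "bacteria", "virus")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable class name and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs


if __name__ == "__main__":
    # Hypothetical scores from the last dense layer for one X-ray image.
    label, probs = predict_label([0.1, 2.0, 0.3])
    print(label, [round(p, 3) for p in probs])
```

Compared with a binary classifier, the only structural change in this sketch is that the final layer produces three scores instead of one.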

Regarding future work, I propose four plans. The first is to develop a classifier that separates viral from bacterial pneumonia. The second is to develop a translator that interprets pictures and generates captions like those printed on every X-ray image. This application can reduce the time doctors spend on diagnosis. The third is to develop a label generator that produces more images showing signs of pneumonia. The fourth is to develop a frame maker that aligns every X-ray image for the convenience of handwritten labelling.