Facemask Detection: 3 Class Model

Apr 28, 2021

Introduction

On 31 December 2019, pneumonia of unknown cause was detected in the city of Wuhan in Hubei province, China[1]. Australia’s first case of the novel coronavirus disease, known as COVID-19, was confirmed on 25 January 2020 in Melbourne, Victoria[2]. Since then, COVID-19 has become a global pandemic, with more than 36 million confirmed cases and 1 million deaths worldwide[3].

Problem Definition

COVID-19 is a highly infectious disease that spreads easily from one person to another via mouth and nose secretions. Because these secretions can travel through the air as droplets and can also contaminate surfaces, close contact with infected people poses a high risk of viral transmission. Thus, when physical distancing is not possible, wearing a face mask is important to protect others[4]. Data from the WHO show that in approximately 57 percent of countries, community transmission is the predominant source of new COVID-19 infections[5]. Wearing face masks is therefore a major way to reduce the spread of the virus and keep the community safe.

Methodology

Data Preparation

Two-class Model

Blurred images were removed from the dataset using the OpenCV library, in order to keep the training data consistent. We also attempted to manually remove images of children from the dataset, based on our own judgement of the apparent age of the person in each image. This was done to avoid potential ethics violations involving the use of images of children.
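The article does not say how blur was measured, so the following is only a minimal sketch of one common heuristic: the variance of the Laplacian, where a low variance suggests few sharp edges. It is written with NumPy alone so that it is self-contained; with OpenCV a closely related quantity is often computed as `cv2.Laplacian(gray, cv2.CV_64F).var()`. The function names and the threshold of 100 are assumptions for illustration and would need tuning on real images.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels of a grayscale image."""
    g = np.asarray(gray, dtype=np.float64)
    # Sum of the four neighbours minus four times the centre pixel.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def is_blurry(gray, threshold=100.0):
    """Flag an image as blurry when its Laplacian variance is below the (assumed) threshold."""
    return laplacian_variance(gray) < threshold


# Synthetic examples: a flat image has no edges, a checkerboard has many.
flat = np.full((32, 32), 128, dtype=np.uint8)
sharp = (np.indices((32, 32)).sum(axis=0) % 2 * 255).astype(np.uint8)

print("flat image blurry:", is_blurry(flat))
print("checkerboard blurry:", is_blurry(sharp))
```

Images flagged by `is_blurry` would then be dropped before training.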

Where an image contained a group of people, the individual faces were extracted with the help of Haar Cascades[6], an object detection method used to locate an object of interest in images. We also experimented with converting the input images to grayscale for the two-class model because, in some cases, detecting luminance rather than color yielded better object detection results. The input images were scaled down to a fixed size, both to improve face detection and to reduce the size of the model, since the model works at a fixed input scale during training.

To increase the diversity of the data available for training, data augmentation was performed. This commonly used technique artificially expands the training dataset by cropping, rotating, shearing and horizontally flipping the existing images.
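As a rough illustration of augmentation, the sketch below produces randomly cropped and horizontally flipped variants of an image using NumPy only. It deliberately leaves out rotation and shearing, which need an image library, and the function names, crop fraction and flip probability are assumptions rather than the settings used in this project.

```python
import numpy as np


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window from a random position in the image."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return image[top:top + crop_h, left:left + crop_w]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return image[:, ::-1]


def augment(image, rng, crop_fraction=0.9, flip_probability=0.5):
    """Return one randomly cropped, and possibly mirrored, variant of the image."""
    h, w = image.shape[:2]
    out = random_crop(image, int(h * crop_fraction), int(w * crop_fraction), rng)
    if rng.random() < flip_probability:
        out = horizontal_flip(out)
    return out


rng = np.random.default_rng(seed=0)
# Random pixels stand in for a real 64x64 RGB face crop.
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
variants = [augment(image, rng) for _ in range(4)]
print([v.shape for v in variants])
```

In practice each variant would be resized back to the model's fixed input size before training.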

Three-class Model

Data preparation for the 3-class model built on the approach used for the 2-class model, with two additional tasks: identifying the specific features present when a mask is worn incorrectly, and dealing with class imbalance due to image availability (there were comparatively few ‘incorrect_mask’ images available). While the ‘mask’ and ‘no_mask’ classes are clear in terms of the physical features expected, ‘incorrect_mask’ is more complex. According to an article in ‘The Guardian’, there are a number of ways people commonly wear masks incorrectly[7]. These include, but are not limited to, exposing the chin, exposing the mouth but not the nose, exposing the nose but not the mouth, and leaving the mask hanging off one ear. Each of these variations presents a different combination of visible facial features, such as the nose but not the mouth, or the mouth but not the nose. To optimize performance in this proof-of-concept model, given the limited number of ‘incorrect_mask’ images available, we decided to hand-curate the ‘incorrect_mask’ data to include only the most common form of incorrect mask-wearing in our images: a mask worn with the nose showing.

There were roughly 25 times as many images in the ‘mask’ class as in the ‘incorrect_mask’ class. As we did not have ethics clearance for this project, we were unable to collect additional images of our own to rectify this problem. A model trained on such imbalanced data can achieve a good accuracy score even when no images of the minority class are correctly identified.
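A small worked example makes the risk concrete. The counts below are hypothetical (only the 25:1 ratio between ‘mask’ and ‘incorrect_mask’ comes from the text), and the "model" is an imaginary one that never predicts the minority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of the images that truly belong to cls which were predicted as cls."""
    predictions_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in predictions_for_cls) / len(predictions_for_cls)


# Hypothetical test set with a 25:1 ratio between 'mask' and 'incorrect_mask'.
y_true = ["mask"] * 2500 + ["no_mask"] * 2500 + ["incorrect_mask"] * 100

# Imaginary model: perfect on the two large classes, but it labels every
# 'incorrect_mask' image as 'mask'.
y_pred = ["mask"] * 2500 + ["no_mask"] * 2500 + ["mask"] * 100

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"recall for incorrect_mask: {recall(y_true, y_pred, 'incorrect_mask'):.3f}")
```

Here 5,000 of the 5,100 predictions are correct, so accuracy is about 98%, yet not a single ‘incorrect_mask’ image is identified.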

To address this problem, a two-pronged approach was used. Firstly, data augmentation was used to bring the number of images in each class to roughly the same level. Secondly, weights were applied to each of the 3 classes when the model was trained, to compensate for the small imbalance that still existed after augmentation[8].
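The text does not give the weighting formula, so the sketch below assumes the widely used "balanced" heuristic, in which each class weight is the total number of images divided by the number of classes times that class's image count. The image counts are made up for illustration.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Hypothetical image counts after augmentation (not the project's real numbers).
counts = {"mask": 3000, "incorrect_mask": 2500, "no_mask": 2900}
weights = balanced_class_weights(counts)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

The smallest class receives the largest weight, so its errors count for more in the loss. In Keras, such weights can be passed to `model.fit` through its `class_weight` argument once the class names are mapped to integer class indices.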

Methods and Models

Two-Class Model: Transfer Learning using MobileNet

Training a neural network from scratch requires a very large dataset. To train a model effectively on our limited data, we instead fine-tuned a pre-trained network, MobileNetV2, which was originally trained on the ImageNet dataset. As we were also planning to use the model in applications such as mobile phone apps, it was important that it load and produce output very quickly. The MobileNet family of networks was designed to provide a deep neural network that can run on a personal mobile device, offering the reliability, privacy, and security of a model efficient enough to run solely on the client[9].

Model Performance

The model was compiled and trained on the augmented data using the Adam optimizer, which iteratively updates the weights of the network to reduce the loss. After training, the accuracy of the model was over 99%. Accuracy is a metric for evaluating classification models: it is the percentage of predictions that match the true class. For example, an accuracy of 99% means that 99 out of every 100 predictions were correct. The graph below shows the accuracy and loss (error) of the model, with accuracy on a scale where 1.0 corresponds to 100% (see Figure 1). An epoch is one complete pass of the learning algorithm through the entire training dataset.

*Figure 1. Training error (Loss) and accuracy*

Three-Class Model

The 3-class model was also built on the pre-trained MobileNetV2 model. The MobileNet model was imported, its top layer was discarded, the weights of the base layers were frozen at their current values, and a few extra layers were added on top to create our model. Transfer learning was then used to train these top layers to recognize whether a face in an image is wearing a mask correctly, wearing a mask incorrectly (nose showing), or not wearing a mask.
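The steps above can be sketched with the Keras API in TensorFlow as follows. This is an assumed reconstruction, not the exact architecture used here: the input size, the pooling, dense and dropout layers in the new head, the learning rate and the function name `build_three_class_model` are all illustrative choices.

```python
import tensorflow as tf

CLASS_NAMES = ["mask", "incorrect_mask", "no_mask"]


def build_three_class_model(input_shape=(224, 224, 3), weights="imagenet"):
    """MobileNetV2 base with frozen weights plus a small trainable 3-class head."""
    # include_top=False discards the original ImageNet classification layer.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False  # keep the pre-trained weights fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    # New top layers (an assumed design): pool, one hidden layer, dropout, softmax.
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=[tf.keras.metrics.Recall(name="recall")],
    )
    return model
```

Calling `build_three_class_model()` downloads the ImageNet weights on first use, and only the newly added layers are updated during training. Input images are assumed to be scaled the way MobileNetV2 expects, for example with `tf.keras.applications.mobilenet_v2.preprocess_input`.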

Performance of the three-class model

A number of approaches have been explored to fine-tune the accuracy of our 3-class model including variations in the architecture of the top layers added in transfer learning [10]. The confusion matrix below (see Figure 2) provides an example of the kind of performance currently being achieved by our model. It can be seen that for every class, at least 65% of images in the test set are being labeled correctly by our model.

*Figure 2. Confusion matrix for the 3-class model using MobileNetV2*
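The "percentage labeled correctly" for each class is that class's recall: the diagonal entry of the confusion matrix divided by the sum of its row. The sketch below shows the calculation on made-up counts. They are not the values from Figure 2, and the row and column order is an assumption.

```python
CLASSES = ["mask", "incorrect_mask", "no_mask"]

# Rows are true classes, columns are predicted classes.
# Illustrative counts only; these are NOT the values shown in Figure 2.
confusion = [
    [80, 15, 5],   # true mask
    [25, 65, 10],  # true incorrect_mask
    [3, 2, 95],    # true no_mask
]


def per_class_recall(matrix):
    """Diagonal entry divided by the row total, for each true class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


recalls = per_class_recall(confusion)
macro_recall = sum(recalls) / len(recalls)

for name, value in zip(CLASSES, recalls):
    print(f"{name}: recall = {value:.2f}")
print(f"average recall across classes = {macro_recall:.2f}")
```

Averaging the per-class values gives every class the same influence on the score, regardless of how many test images it has.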

When loss (error) and recall (the percentage of images of a given class that are predicted correctly) are plotted against epochs (the number of complete passes the learning algorithm has made through the training dataset), recall for the model as a whole appears to be approaching 80%.

This suggests that with more training epochs on our current dataset, our model could achieve an average recall of about 0.8 (see Figure 3). Recall was used in preference to accuracy as the performance metric in this case because, for imbalanced classes, accuracy is dominated by the largest classes and does not reflect the performance on each class evenly. Given that the use case for this model involves correctly identifying each of the 3 classes, with no preference for one class over another, recall is more useful for optimization [11].

*Figure 3. Loss and recall against epochs for the current 3-class model*

Viability of and need for a 3-class model

Given the promising results of our 3-class proof of concept (POC) model despite limited training data, we believe the model could be improved with larger and more diverse datasets in which each class is similarly represented. There are different approaches that could be taken to deciding what constitutes incorrect mask-wearing. It may be useful to create several incorrect-mask classes in order to identify the different features present in different types of incorrect mask use, i.e., separate classes for mask and nose present or mask and mouth present. This approach would most likely adapt facial recognition software to recognize the presence of certain facial features in the absence of others, as described in an article from National Geographic [12]. The benefits of a 3-class model over a 2-class model include public education, the ability to target this education where it is needed, and tighter control for businesses that need to make some guarantee of public safety, such as supermarkets and public libraries.

References

[1] Who.int. 2020. Coronavirus Disease (COVID-19) — Events As They Happen. [online] Available at: <https://www.who.int/emergencies/diseases/novel-coronavirus-2019/events-as-they-happen> [Accessed 10 October 2020].

[2] Department of Health. 2020. First Confirmed Case Of Novel Coronavirus In Australia. [online] Available at: <https://www.health.gov.au/ministers/the-hon-greg-hunt-mp/media/first-confirmed-case-of-novel-coronavirus-in-australia> [Accessed 10 October 2020].

[3] Covid19.who.int. 2020. WHO Coronavirus Disease (COVID-19) Dashboard. [online] Available at: <https://covid19.who.int/> [Accessed 10 October 2020].

[4] H. Noi, “Q&A: How is COVID-19 transmitted?,” 14 July 2020. [Online]. Available: https://www.who.int/vietnam/news/detail/14-07-2020-q-a-how-is-covid-19-transmitted. [Accessed 07 Oct 2020].

[5] 07 Oct 2020. [Online]. Available: https://covid19.who.int/table. [Accessed 07 Oct 2020].

[6] “Face Detection using Haar Cascades”. [Online]. Available: https://docs.opencv.org/3.4.3/d7/d8b/tutorial_py_face_detection.html. [Accessed: 08 Oct 2020].

[7] L. Geddes, “The most common ways we’re wearing face masks incorrectly”, the Guardian, 2020. [Online]. Available: https://www.theguardian.com/world/2020/oct/02/the-most-common-ways-were-wearing-face-masks-incorrectly. [Accessed: 07 Oct 2020].

[8] B. Bhatt, “Class Weights for Handling Imbalanced Datasets”, YouTube, 2020. [Online]. Available: https://www.youtube.com/watch?v=Kp31wfHpG2c&t=74s&ab_channel=BhaveshBhatt. [Accessed: 01 Oct 2020].

[9] M. Sandler and A. Howard, “MobileNetV2: The Next Generation of On-Device Computer Vision Networks”, Google AI Blog, 2020. [Online]. Available: https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html. [Accessed: 10 Oct 2020].

[10] J. Brownlee, “How to Control Neural Network Model Capacity With Nodes and Layers”, Machine Learning Mastery, 2020. [Online]. Available: https://machinelearningmastery.com/how-to-control-neural-network-model-capacity-with-nodes-and-layers/. [Accessed: 01 Oct 2020].

[11] S. Ghoneim, “Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on?”, Medium, 2020. [Online]. Available: https://towardsdatascience.com/accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124. [Accessed: 14 Oct 2020].

[12] W. Yan, “Face-mask recognition has arrived - for better or worse”, National Geographic, 2020. [Online]. Available: https://www.nationalgeographic.com/science/2020/09/face-mask-recognition-has-arrived-for-coronavirus-better-or-worse-cvd/. [Accessed: 10 Oct 2020].