GSoC 2019 – Polishing Face Detection and working on Web App

The baby is growing up!

Long time no see! This month I have already had two exams, and the last one takes place this Friday, so this post is coming a bit late and covers everything I have done since the First Evaluations. First, I will talk about all the changes to the Face Recognition module and the new Age and Gender Detection module. After that, I will briefly discuss all the Web App related work I have done in this period.

Making Face Recognition “Great Again”

In my last post, I talked about using K-Means to cluster face embeddings and thus improve the accuracy of Face Recognition models. However, I faced several problems that were serious enough to make me drop this idea.

Red ellipse denotes the “cloud” of hard to recognize faces (face embeddings).

To illustrate the problem, I applied PCA to the 512-dimensional face embeddings generated for the “Pozner” test video and plotted the dimensionally-reduced embeddings in 3D (a sketch of this visualization follows the list below). As you can see, there are 2 well-separated clusters for the 2 most frequently occurring persons in the video (the orange and pink ones). However, our model starts to predict anything but the correct names for the faces inside the red ellipse. There are several reasons for that:

  1. Either it is not a face at all, or it is a face that is hard to recognize (e.g. only one side of the face is visible).
  2. The person in the video is not well-represented in the face recognition dataset.
  3. Finally, face recognition simply doesn’t do its job well.
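
If you want to reproduce this kind of plot, here is a minimal sketch of the visualization, assuming the embeddings and predicted labels are stored in NumPy arrays (the file names here are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection)
from sklearn.decomposition import PCA

# Hypothetical inputs: an (N, 512) array of face embeddings and the
# predicted name for each of them.
embeddings = np.load("pozner_embeddings.npy")
labels = np.load("pozner_labels.npy", allow_pickle=True)

# Reduce the 512 dimensions down to 3 for plotting.
reduced = PCA(n_components=3).fit_transform(embeddings)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
for name in np.unique(labels):
    mask = labels == name
    ax.scatter(reduced[mask, 0], reduced[mask, 1], reduced[mask, 2],
               label=name, s=5)
ax.legend()
plt.show()
```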

And this list doesn’t even include some major problems with K-Means itself, like choosing the number of clusters, which I tried to solve with silhouette analysis. This method works more-or-less well in practice, but it still didn’t improve the situation much.
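
For reference, here is a minimal sketch of silhouette analysis for picking the number of clusters with scikit-learn; the range of candidate k values is my own assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# The same hypothetical (N, 512) embeddings array as in the previous snippet.
embeddings = np.load("pozner_embeddings.npy")

best_k, best_score = None, -1.0
for k in range(2, 11):  # candidate cluster counts (my own choice of range)
    kmeans = KMeans(n_clusters=k, random_state=42).fit(embeddings)
    score = silhouette_score(embeddings, kmeans.labels_)
    if score > best_score:
        best_k, best_score = k, score
print(f"Best k by silhouette score: {best_k} ({best_score:.3f})")
```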

The Power of Image Augmentation

Clearly, my mini-dataset of test celebrities was not perfectly balanced, so I decided to improve it through augmentation. Augmentation also adds variance to the dataset (e.g. by applying transformations to the images), which helps the model generalize better.

For image augmentation I chose a great library called “albumentations”, which I found via the ODS (Open Data Science) Slack community (a great knowledge and networking resource for data analysts). This library provides a convenient and efficient way to augment your dataset with transformations of all sorts (e.g. rotation, flipping, adding noise, changing colors).

Original image vs. a rotated, brightened augmentation with added noise
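
For the curious, a minimal “albumentations” pipeline along these lines might look as follows; the specific transformations and parameters are my own guesses, not the exact ones used in PMR:

```python
import cv2
import albumentations as A

# A small pipeline roughly matching the transformations mentioned above.
transform = A.Compose([
    A.Rotate(limit=25, p=0.7),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
    A.GaussNoise(p=0.3),
])

# albumentations expects RGB images, while OpenCV loads BGR.
image = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)
augmented = transform(image=image)["image"]
```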

Problems with ArcFace

In my last post, I mentioned ArcFace, a new state-of-the-art Face Recognition model that is now part of PMR’s Model Zoo. After days of tests and experiments, I still couldn’t get it to perform at least as well as FaceNet, even though I used the original authors’ code. I am still investigating this problem, and my next step is to change the way data is loaded when training celebrity face embedding datasets with ArcFace, as it currently uses the image loading procedure from the FaceNet library (though that procedure is supposed to be fairly generic). So we won’t look at ArcFace results in this post.

Image Augmentation Improvements

Face Recognition Precision shows the proportion of faces that we were able to correctly detect and recognize; it cannot be higher than Face Detection Precision.

Despite the funny look of the augmented images, this small step can increase accuracy. One drawback of the technique is that the number of embeddings grows, and with it the time needed for search. Currently, I apply 4 types of augmentations to every image, but to reduce the size of the resulting dataset I will cut down the number of augmentations in the future, trying not to lose the accuracy improvements.

Finally, I encourage everybody to check out the “albumentations” library (it can also greatly help you on Kaggle) and add some augmentation to your face datasets!

Detecting Main Persons

Does Courteney Cox look like Eteri Tutberidze? 🤔

Another idea was to detect the most frequently occurring persons and consider only them at the KNN stage, effectively reducing the amount of noise in the predictions. I was mainly interested in the “Friends” test video, as there are 6 recurring persons in it. As you can see in the picture above, some faces are pretty hard to recognize when a person doesn’t look more-or-less directly into the camera.

Of course, successful classification of such cases depends largely on the dataset and how well it reflects all the variance of a particular person’s face. However, it turned out to be quite difficult to choose which persons to exclude, as sometimes a wrong person appeared more often than the right one! And even if most of the wrong persons end up at the bottom of the persons list, it is still problematic to choose the threshold below which a person should be excluded.
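
To make the idea concrete, here is a hypothetical sketch of such a filter; both the function and the min_share parameter are made up for illustration, and picking a good value for min_share is exactly the hard part:

```python
from collections import Counter

# Count how often each name is predicted across the video and keep only
# the names whose share of appearances exceeds a threshold.
def main_persons(predicted_names, min_share=0.05):
    counts = Counter(predicted_names)
    total = sum(counts.values())
    return {name for name, count in counts.items() if count / total >= min_share}
```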

However, I will test this technique again when I run PMR on long videos. I believe that for long (30-60+ minute) videos such as interviews (e.g. Pozner and Putin from my mini test set) it can effectively decrease the noise in predictions, but for shorter ones it introduces more problems than improvements.

Face Tracking

And here comes the killer feature that can greatly help you increase recognition accuracy! It would be pretty dumb not to utilize the sequential nature of video to detect the faces that persist across sequences of frames. We still calculate embeddings for all the faces; the resulting face tracks are used only for computing predictions. The technique works as follows:

  1. Once Face Detection is done, we start processing the detected face boxes.
  2. At each step, we compare face boxes from the current frame to those in the next frame.
  3. If the IoU (Intersection over Union) is higher than a given threshold (in our case, 0.5), then most probably this is the same person.
  4. We continue extending the “track” until the IoU drops below the threshold.
  5. Finally, face_tracking() returns a list of detected persons, where each person is a list of (frame id, box id) tuples describing the sequence of that person’s faces.
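
Here is a minimal sketch of this greedy IoU matching; the (x1, y1, x2, y2) box format and the per-frame input structure are my own assumptions, not PMR’s actual interfaces:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def face_tracking(frames, iou_threshold=0.5):
    """frames: a list of frames, each a list of face boxes.
    Returns a list of tracks, each a list of (frame_id, box_id) tuples."""
    tracks = []
    active = []  # indices of tracks extended in the previous frame
    for frame_id, boxes in enumerate(frames):
        extended = []
        available = set(active)
        for box_id, box in enumerate(boxes):
            # Greedily match the box to the best still-unmatched track.
            best, best_iou = None, iou_threshold
            for t in available:
                prev_frame, prev_box = tracks[t][-1]
                score = iou(frames[prev_frame][prev_box], box)
                if score > best_iou:
                    best, best_iou = t, score
            if best is None:
                tracks.append([])          # no match: start a new track
                best = len(tracks) - 1
            else:
                available.discard(best)    # one box per track per frame
            tracks[best].append((frame_id, box_id))
            extended.append(best)
        active = extended
    return tracks
```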

With Face Tracking I was able to obtain the following results:

  1. Pozner (with augmentations) – 86.2% (1.7% improvement)
  2. Putin (without augmentations) – 88.3% (!) (1.6% improvement)
  3. Friends (with augmentations) – 63% (3% improvement)

Overall, Face Tracking can improve the precision of your system by making predictions more stable. The good thing is that we can also use these face tracks to smooth age and gender predictions, which we will talk about in the next section.

Age and Gender Detection

As it turns out, the task of age and gender detection is pretty similar to face recognition in terms of workflow. As a result, FaceAgeGenderElem and FaceAgeGenderKernel look pretty much like their Face Recognition counterparts, so I am considering creating a common parent class for them to better follow the DRY principle. In any case, given the infrastructure I developed in the previous stages, it was quite hassle-free to add a whole new family of models 🙂

For now, there is only one age and gender detection model, DEX (Deep EXpectation of apparent age). I used the following code and it turned out to work pretty well (see the images below). The model is supposed to be used with a GPU (but it runs fast enough that one could probably also use a CPU), and of course many more models are coming.

Don’t be surprised that our algorithm is over-optimistic – celebrities take good care of themselves!

As already mentioned, I use face tracking to make age and gender predictions more stable: we average all the predicted age values for a particular person and choose the gender by majority vote.
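
A small sketch of this smoothing, assuming the per-face predictions are stored in a dictionary keyed by the (frame id, box id) tuples from face tracking (the names here are hypothetical):

```python
from collections import Counter
from statistics import mean

# `track` is one track from face_tracking(); `predictions` maps each
# (frame_id, box_id) tuple to a (predicted_age, predicted_gender) pair.
def smooth_track(track, predictions):
    ages = [predictions[key][0] for key in track]
    genders = [predictions[key][1] for key in track]
    avg_age = mean(ages)                                     # average the ages
    majority_gender = Counter(genders).most_common(1)[0][0]  # majority vote
    return avg_age, majority_gender
```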

Other Things

Apart from working on Face Recognition and adding Age and Gender Detection, I spent some time fixing and updating things:

  1. I added name labels to the Pozner, Putin and Friends videos. This allowed me to measure the effect of all the tricks I tried.
  2. I had to spend some time to figure out that the DSFD algorithm had been broken by an upgrade to the latest version of PyTorch.
  3. I tried to adapt PMR for Jupyter Notebook, but Jupyter was pretty hostile to multiprocessing 🙁 I will return to this issue later.
  4. I upgraded the functions for outputting all the results in JSON format. Previously, this was handled by a separate node, but now each PipelineElement defines its own way of handling JSON.
  5. I upgraded ImageHandlerElem and ImageOutputHandler to the new app structure (these guys had not been touched since the times of the PoC).

And all of this leads us to a new chapter of PMR development – the Web App!

Working on Web App

Back when I was working on PMR’s PoC, I ran into the problem of the main ML library blocking the Web App. At that point, I decided to follow a friend’s advice and use Python’s Tornado Web Framework, which was designed to serve a huge number of requests quickly and efficiently by processing them asynchronously. Sure, we don’t have a lot of users (yet), but being able to run all the ML processing in the background without blocking the web app is definitely what we need.
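
To give a flavor of the pattern (this is a simplified sketch, not PMR’s actual handler), a Tornado handler can offload heavy work to an executor so the event loop stays responsive:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

import tornado.web

executor = ProcessPoolExecutor()

def run_pipeline(video_path):
    """Placeholder for the heavy, CPU-bound ML processing."""
    ...

class ProcessHandler(tornado.web.RequestHandler):
    async def post(self):
        video_path = self.get_argument("video")
        loop = asyncio.get_running_loop()
        # Offload the blocking work to another process so the event
        # loop keeps serving other requests in the meantime.
        await loop.run_in_executor(executor, run_pipeline, video_path)
        self.write({"status": "done"})
```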

Becoming Friends with “Tornado”

During the PoC I didn’t have much time to learn Tornado properly, so I decided to invest some time in reading about this powerful tool. I chose the Introduction to Tornado book by O’Reilly, which is already pretty outdated (2012). Though some of the techniques the book describes have since been deprecated in favor of newer ones, I still found it helpful, as I learned:

  1. Tornado’s template language, which allows one to write Python code right in the templates (see the small example after this list).
  2. How to work with MongoDB (though I use MySQL in PMR).
  3. How to use Tornado’s asynchronous functions (though the book uses deprecated techniques, I learned the new ones by upgrading the book’s code).
  4. The basics of web security.
  5. How to authenticate via external services with OAuth 2.0, using Tornado’s auth module.
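
That first point is easy to illustrate: Tornado templates let you embed Python expressions and control flow directly (a toy example of mine, not from the book):

```python
from tornado.template import Template

# Expressions go in {{ ... }}, control flow in {% ... %}.
t = Template("Hello, {{ name.upper() }}!"
             "{% if items %} You have {{ len(items) }} items.{% end %}")
print(t.generate(name="world", items=[1, 2, 3]).decode())
# -> Hello, WORLD! You have 3 items.
```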

So I would recommend this relatively short book (~120 pages) to anyone interested in learning Tornado, but don’t forget to consult Tornado’s documentation to stay as up-to-date as possible.

Login with GitHub and Google Account

And the first new Web App feature is a new login system! The user can now log in either with a Google or GitHub account or with credentials obtained through registration.

New sign in panel

It took me some time to implement login with a Google Account, and Tornado’s example helped me greatly. However, adding GitHub authentication support took almost a day, as it is not implemented in Tornado’s auth module.

At first, I searched for a readily available implementation, but all of them were either outdated or super outdated 🙂 In the end, I implemented it myself, based on Tornado’s implementation of Facebook OAuth2 authentication. If you ever need to add GitHub sign-in to your Tornado-based website, feel free to look at the following gist.
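
For a taste of the approach (the gist has the full version), the core idea is to subclass tornado.auth.OAuth2Mixin, point it at GitHub’s endpoints and exchange the authorization code for an access token; the sketch below is simplified:

```python
import urllib.parse

from tornado import escape
from tornado.auth import OAuth2Mixin

class GithubOAuth2Mixin(OAuth2Mixin):
    """A rough sketch, not the real implementation from the gist."""
    _OAUTH_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
    _OAUTH_ACCESS_TOKEN_URL = "https://github.com/login/oauth/access_token"

    async def get_authenticated_user(self, redirect_uri, client_id,
                                     client_secret, code):
        http = self.get_auth_http_client()
        response = await http.fetch(
            self._OAUTH_ACCESS_TOKEN_URL,
            method="POST",
            headers={"Accept": "application/json"},  # ask GitHub for JSON
            body=urllib.parse.urlencode({
                "redirect_uri": redirect_uri,
                "client_id": client_id,
                "client_secret": client_secret,
                "code": code,
            }),
        )
        return escape.json_decode(response.body)
```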

As for other things, I cleaned up some messy parts of the code that had been there since PoC times, and I also updated the way the Web App creates and uses Pipelines.

Future Plans

Currently, I am focused on adding a Facial Expression Detection category of models and creating a REST API, which I expect to finish in the coming days. The REST API will allow a user to log in with his or her credentials, pass pipeline parameters, and get a link from which the processing results can be obtained.

I plan to publish a new post in the middle of next week, right during the Second Evaluations. By that time, I expect to deliver a Web App that supports choosing different pipeline parameters, advanced options for uploading video files, YouTube videos and ordinary images, display of processing progress, and other features described in my PoC.

Thank you!

Yay, you made it this far! Thank you, and sorry for writing yet another lengthy blog post, which (I hope) you found interesting. I would be happy to get any kind of feedback from you!