GSoC 2019 – Poor Man’s Rekognition

Poor Man’s Rekognition GitHub Repo, PMR-Web GitHub Repo

So, my second GSoC has come to an end, and it is time once again to look back at what was done this summer. In this post, I will talk about what I accomplished and the challenges I faced while working on Poor Man’s Rekognition at CCExtractor Development.

If you check my original proposal, you will see that the majority of the goals have been met. Moreover, I faced and successfully handled a few challenges that I hadn’t foreseen initially. The remaining work will be done after GSoC. But for now, let’s recall the key moments of GSoC 2019.

Never Mess with the Structure

One of the first challenges I faced was ever-growing memory consumption whenever I started to read video frames. I understood almost immediately that this was a perfect case for generators: yield frames one at a time instead of reading all of them first and storing them in one array. Still, there were some problems with pyav (a library for handling videos) and with TensorFlow not releasing memory.
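The generator idea can be sketched roughly like this. It is only an illustration, not PMR's actual reader: `read_frames` and `batched` are hypothetical names, and a plain iterable stands in for a real video decoder such as pyav.

```python
from itertools import islice


def read_frames(source):
    """Yield (index, frame) pairs one at a time instead of building a list.

    `source` is any iterable of decoded frames; here it stands in for a
    real video decoder (an assumption made for illustration).
    """
    for index, frame in enumerate(source):
        yield index, frame


def batched(frames, size):
    """Group a lazy frame stream into lists of at most `size` items."""
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Each batch can be processed and dropped before the next one is read.
fake_video = (f"frame-{i}" for i in range(5))
batch_sizes = [len(batch) for batch in batched(read_frames(fake_video), size=2)]
```

With this shape, peak memory depends on the batch size rather than on the length of the video.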

These problems led me to the concept of a Kernel: a base class for all PipelineElements that runs its job in a separate process. Once face detection/recognition or age/gender recognition has finished, GPU memory and RAM are effectively cleared, paving the way for the next elements in the pipeline. Moreover, with this kind of logic, one can write Kernels in basically any language that has an interface with Python.
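Here is a minimal sketch of the "job in a separate process" idea. It is not PMR's real Kernel class: the `Kernel` and `SquareKernel` classes below, the `script` attribute and the JSON-over-stdin protocol are all assumptions chosen to keep the example short.

```python
import json
import subprocess
import sys


class Kernel:
    """Hypothetical base class: runs `script` in a fresh Python process."""

    # Python source executed in the child; subclasses override it.
    # The default simply echoes its input back.
    script = "import json, sys; json.dump(json.load(sys.stdin), sys.stdout)"

    def run(self, payload):
        # Everything the child allocates (RAM, and GPU memory if a
        # framework such as TensorFlow were loaded there) is returned
        # to the OS when the child process exits.
        result = subprocess.run(
            [sys.executable, "-c", self.script],
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            check=True,
        )
        return json.loads(result.stdout)


class SquareKernel(Kernel):
    # Stand-in for a heavy model: squares every number it receives.
    script = (
        "import json, sys\n"
        "values = json.load(sys.stdin)\n"
        "json.dump([v * v for v in values], sys.stdout)\n"
    )


squares = SquareKernel().run([1, 2, 3])
```

Because the parent only exchanges serialized data with the child, the command could in principle launch a program written in another language, which is the flexibility mentioned above.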

Original post – Choosing the Right Structure

GPL and Face Detection

I had quite a hard time choosing a license, as I wasn’t inclined towards the GPL initially. However, after discussing the issue with other people and reading about licenses, I decided to go for GPLv3. This is the same license that CCExtractor has, so enjoy using PMR freely!

Right after handling the license, I continued to work on Face Detection. Currently, PMR supports 4 major face detection algorithms: 2 from the “Average performance, best speed” category (MTCNN and MobileNetSSD) and 2 from the “State-of-the-Art” category (YOLOv3 and DSFD). The latter are heavy-lifters designed to run on a GPU, while the former can run on a CPU.

As with all experiments, I needed a baseline against which the algorithms could be compared. To get one, I labeled several short videos using a great tool called CVAT. These labels are available within PMR, along with all the infrastructure for testing and debugging your algorithms.
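To give an idea of how detections can be scored against hand-made labels, here is a small sketch. It assumes boxes in `(x1, y1, x2, y2)` form and uses intersection over union (IoU) with greedy matching, which is a common choice but not necessarily the metric PMR's test infrastructure uses. The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def precision_recall(predicted, labeled, threshold=0.5):
    """Greedily match predicted boxes to labeled boxes for one frame."""
    unmatched = list(labeled)
    true_positives = 0
    for box in predicted:
        # Pick the still-unmatched label that overlaps this prediction most.
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


labeled = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(1, 1, 10, 10), (50, 50, 60, 60)]
scores = precision_recall(predicted, labeled)
```

Averaging such per-frame scores over the labeled videos gives one number per detector, which is enough for a first comparison.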

Finally, it was right after experimenting with various Face Detection models that I came to the conclusion that there is clearly room for improvement.

Original posts – Choosing the Right License, Working on Face Detection

Finding Similar Frames

The basic idea was to capture groups of similar frames and process them as one frame, based on the premise that faces in these frames don’t move a lot. One could object that while the face remains in the same place across multiple frames, the face itself can change (e.g. facial expression). This can be mitigated by carefully choosing a similarity threshold, making the similarity algorithm more or less strict.

The similar frames finder is powered by several methods: similarity based on SSIM (slower, good results) and simple color histogram comparison using various distance metrics. Almost all the time I choose the latter, as it is fast and gives good results.
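The histogram approach can be sketched in a few lines. This is a simplified illustration and not PMR's code: frames are flat lists of single-channel pixel values instead of color images, the distance is a plain L1 distance, and the names `histogram`, `l1_distance` and `group_similar` as well as the default threshold are made up for the example.

```python
def histogram(pixels, bins=8, max_value=256):
    """Normalized histogram of a non-empty flat list of pixel values."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // max_value] += 1
    total = len(pixels)
    return [count / total for count in counts]


def l1_distance(hist_a, hist_b):
    """Sum of absolute bin differences; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def group_similar(frames, threshold=0.2):
    """Split consecutive frames into groups close to the group's first frame."""
    groups = []
    reference = None
    for index, pixels in enumerate(frames):
        current = histogram(pixels)
        if reference is None or l1_distance(reference, current) > threshold:
            groups.append([index])  # too different: start a new group
            reference = current
        else:
            groups[-1].append(index)
    return groups


frames = [[10] * 4, [12] * 4, [200] * 4, [205] * 4, [10] * 4]
groups = group_similar(frames)
```

Only the first frame of each group then needs to go through the expensive detection step. A lower threshold produces more, smaller groups.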

I also added two different backends for Face Recognition (FaceNet and ArcFace) and two for the subsequent KNN stage (scikit-learn and FAISS). Currently, I mostly use FaceNet and scikit-learn, as the other backends require further tuning, which will be done in the future.
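For readers unfamiliar with the KNN stage, here is a tiny pure-Python sketch of the idea. It does not use scikit-learn or FAISS, and the "Unknown" cut-off via `max_distance` is my assumption for the example rather than a description of how PMR decides. `knn_predict` and its parameters are hypothetical names.

```python
import math
from collections import Counter


def knn_predict(query, gallery, k=3, max_distance=1.0):
    """Label `query` by majority vote among its k nearest gallery embeddings.

    `gallery` is a list of (embedding, name) pairs. Returns "Unknown" when
    even the closest neighbour is farther away than `max_distance`.
    """
    neighbours = sorted(
        (math.dist(query, embedding), name) for embedding, name in gallery
    )[:k]
    if not neighbours or neighbours[0][0] > max_distance:
        return "Unknown"
    votes = Counter(name for _, name in neighbours)
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings"; real face embeddings have far more dimensions.
gallery = [
    ((0.0, 0.0), "alice"),
    ((0.1, 0.0), "alice"),
    ((5.0, 5.0), "bob"),
]
known = knn_predict((0.05, 0.0), gallery)
stranger = knn_predict((20.0, 20.0), gallery)
```

A library backend does the same search much faster on large galleries; the voting logic stays the same.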

Original post – Facial Recognition and First Evaluations

All About Faces

In the next stage, I focused on improving the performance of face recognition algorithms. I added 2 techniques that increased performance on almost all of the videos in the test set:

Image Augmentation – I used a great library called “albumentations” that helped me to generate augmented images of faces. The increase in performance from this small step ranged from 3% to 10%.

Face Tracking – As I said in the original post, “It would be pretty dumb not to utilize the sequential nature of video”! Thus, I decided to search for the same faces across frames based on face position. Once such faces are found, we can do face/age/gender and even emotion recognition based on a majority vote for the same face across the frames.
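The tracking-plus-voting step can be sketched as follows. This is a simplified illustration, not the tracker used in PMR: faces are linked when their box centers move less than `max_shift` pixels between consecutive frames, and `track_faces`, `majority_label` and the threshold value are hypothetical.

```python
from collections import Counter


def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def track_faces(frames, max_shift=20.0):
    """Link detections across consecutive frames by face position.

    `frames` is a list of per-frame lists of (box, label) detections.
    Returns one list of labels per tracked face.
    """
    tracks = []
    for frame_index, detections in enumerate(frames):
        for box, label in detections:
            cx, cy = center(box)
            match = None
            for track in tracks:
                tx, ty = track["center"]
                shift = ((cx - tx) ** 2 + (cy - ty) ** 2) ** 0.5
                # Only extend tracks that were last seen in the previous frame.
                if track["last_frame"] == frame_index - 1 and shift <= max_shift:
                    match = track
                    break
            if match is None:
                match = {"labels": []}
                tracks.append(match)
            match["center"] = (cx, cy)
            match["last_frame"] = frame_index
            match["labels"].append(label)
    return [track["labels"] for track in tracks]


def majority_label(labels):
    """Most frequent label in a track."""
    return Counter(labels).most_common(1)[0][0]


frames = [
    [((0, 0, 10, 10), "alice"), ((100, 100, 110, 110), "bob")],
    [((2, 0, 12, 10), "Unknown"), ((101, 100, 111, 110), "bob")],
    [((4, 0, 14, 10), "alice"), ((102, 100, 112, 110), "bob")],
]
final_labels = [majority_label(labels) for labels in track_faces(frames)]
```

In the toy input the middle frame misses "alice", but the vote over the whole track still recovers the right name.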

Also during this stage, I added an age and gender recognition model and started to work on the Web App.

Original post – Polishing Face Detection

Working on the Web App

Once I became happy with the structure and the model zoo, it was time to work on the Web App. But first, I added the facial expression recognition model, as I stated in my proposal.

I spent quite some time learning the Tornado Python web framework and refreshing my knowledge of HTML, CSS, and JS in order to come up with a web app that could deliver all the power of PMR in a user-friendly way.

During this period I added login via Google and GitHub, came up with a REST API, worked on the UI and added support for MongoDB, where I store all the recognition data. The Web App is clearly far from perfect, but the needed foundation is there.

Original posts – Web App, Continuing Working on Web App

Final Demo

Ok, enough talking, here are three short videos showcasing all the power of PMR 🙂

Notice how well PMR was able to handle Unknown cases. The whole process is not perfect, but I am still happy with the results. And this is just the beginning!

Things to Be Done

As I have already said, I met almost all of the goals that I had initially proposed. Partly, I didn’t do everything because you can’t foresee all the challenges you will face.

The problems that I described in this post took me quite some time to solve, and thus I still need to finish the Web App (some visualization tools, a pipeline builder and a face dataset editor), the Native Binding (it is hard to come up with a good one, partly because Python itself very often acts as glue code), the celebrities dataset and the documentation.

I plan to continue working on this project after GSoC, as it has really become a part of my daily routine, and all of these goals will eventually be accomplished.

You can run PMR using the instructions provided in the GitHub repo. The instructions for PMR-Web and documentation for the whole project are coming soon!

Finally, I would like to thank my mentor Johannes Lochter for believing in me and the whole CCExtractor Development community for being really cool guys 🙂