GSoC 2019 – Choosing the Right License and Building the DL Rig

It has been almost two weeks since my last posts, and I am happy to have your attention again! In my first GSoC post, I said that I would publish a new post every Sunday, but it turns out that this format does not suit me well. Instead, I will publish two posts every two weeks: one describing technical stuff and another covering various non-technical things related to the project. This post falls squarely into the latter category.

Choosing the Right License

Free as in Freedom. GPL’s logo reminds me of Sonic the Hedgehog 😃

Life is all about the right choices 🙂 Previously, I considered the GPL (GNU General Public License) the only game in town, and the others (e.g. MIT, Apache 2.0) looked to me like more or less the same thing. It turns out I was wrong by a large margin.

First of all, what makes software “open-source”? According to the Open Source Initiative website, anyone must be allowed to freely use, modify, and share it. Multiple licenses allow that, but in this post I will briefly describe only three of the most popular open-source licenses and explain my motivation for choosing one of them.

  1. The MIT License – The simplest license, which allows almost everything (by prohibiting very little). According to Choose a License, it permits commercial and private use; its limitations are that it provides no warranty and accepts no liability, and its only condition is that the license and copyright notice are preserved.
  2. Apache License, Version 2.0 – A more restrictive license than MIT, yet still 100% open-source. Its text explicitly deals with patents and trademarks, which makes Apache 2.0 appealing for commercial use.
  3. GNU General Public License v3.0 (GPLv3) – The mighty GPL! The brainchild of Richard Stallman, one of the biggest (if not the biggest) open-source influencers, who clearly changed the way we think about software. The GPL is famous for its strong copyleft requirements: anyone who distributes a fork or a modified version must release it under the same license and make its source code available to the recipients.

Licenses Compatibility

One question that remained unanswered even after my small research was: how can we combine software packages with different licenses?

It turns out that, because of the GPL’s strong copyleft, it is hard to use GPL-licensed code within projects under other open-source licenses that do not impose a strong copyleft. For example, according to the Apache website, you can include Apache 2.0 licensed code in GPLv3 projects, but not the other way around.

As a result, a lot of commercial software simply avoids GPL-licensed code because of the copyleft requirements. On the other hand, the GPL has enabled hundreds of projects to remain truly free (a notable example is Linux, which is licensed under GPLv2). Consequently, I devised a rule of thumb for myself: you can use MIT and Apache 2.0 code within GPLv3 projects, but think twice before doing it the other way around.
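
The rule of thumb above can be written down as a tiny lookup. This is only a sketch of my personal heuristic and not legal advice: the function name `check_reuse` and the SPDX-style license identifiers are assumptions made for illustration, and any combination the rule does not mention is reported as such rather than guessed.

```python
# Sketch of the rule of thumb from this post; not legal advice.
# License names use SPDX-style identifiers (an illustrative choice).
PERMISSIVE = {"MIT", "Apache-2.0"}
STRONG_COPYLEFT = {"GPL-3.0"}


def check_reuse(dependency_license: str, project_license: str) -> str:
    """Return a verdict for reusing code under one license inside a project under another."""
    if dependency_license == project_license:
        return "ok"
    # Permissive code may flow into a GPLv3 project.
    if dependency_license in PERMISSIVE and project_license in STRONG_COPYLEFT:
        return "ok"
    # GPLv3 code inside a permissively licensed project: the copyleft gets in the way.
    if dependency_license in STRONG_COPYLEFT and project_license in PERMISSIVE:
        return "think twice"
    # Anything else (e.g. MIT vs Apache 2.0) is outside the rule of thumb.
    return "not covered by this rule"


if __name__ == "__main__":
    pairs = [("MIT", "GPL-3.0"), ("Apache-2.0", "GPL-3.0"), ("GPL-3.0", "Apache-2.0")]
    for dependency, project in pairs:
        print(f"{dependency} code in a {project} project: {check_reuse(dependency, project)}")
```

For a real project, the license texts themselves (and, if in doubt, a lawyer) have the final word; the table only mirrors the heuristic I use when picking dependencies.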

Final Decision

After a discussion in our CCExtractor Slack channel, I decided to go for GPLv3. At first, I was more inclined to use Apache 2.0, as I do believe that the open-source world will only benefit from cooperating with businesses, but since the most widespread license in CCExtractor’s community is the GPL, I decided to do the same. In the end, PMR should be a truly open-source alternative to its proprietary counterpart, and the GPL will make sure that it remains non-closed and non-private.

Building a Deep Learning Rig

After a lot of thinking and experimenting with various Deep Learning algorithms, I came to the conclusion that I need my own Deep Learning Rig in order to test my ideas quickly (and also to start competing on Kaggle 🙂).

After reading various guides (the best of which cover choosing HW and choosing a GPU), I decided to go with the following setup:

The first version of my DL Rig!

Let’s briefly discuss each component:

  1. An SSD is a must-have if you want your OS and data to load fast. However, I am thinking about swapping this SSD for an NVMe one (namely the Samsung 970 Evo Plus 500 GB), which is much faster but also twice as expensive.
  2. Tim Dettmers recommends buying as much RAM as you have VRAM (GPU memory). However, I think there is no such thing as too much RAM, and I am definitely interested in buying additional GPUs in the future.
  3. This motherboard has 3 PCIe (x16) slots, meaning that I will be able to add 2 more GPUs in the future.
  4. The main hero of this build! An RTX 2080 with a blower design, which will allow combining several GPUs without making them throttle. However, I am still not sure whether I should buy the RTX 2080 or go for the RTX 2070, as the latter has a better performance-per-dollar ratio.
  5. Even when you have an SSD, a 1 TB HDD is a good thing to have for downloading and storing datasets.
  6. I was choosing between this CPU and the i3-9100F and went with the former, because the i5 is not much more expensive and having 2 more cores is always a good thing.
  7. A 700-watt power supply unit. I do hope that it will be powerful enough for future upgrades!
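
Whether 700 W is enough for more GPUs is simple arithmetic, so here is a rough sketch of the power budget. All wattages are assumed nominal figures rather than measurements of this build, and the 20% reserve and the helper name `psu_headroom` are likewise my own illustrative choices.

```python
# All wattages below are assumed nominal figures, not measurements of this build.
ASSUMED_GPU_WATTS = 215   # roughly the reference board power of an RTX 2080
ASSUMED_CPU_WATTS = 65    # typical TDP of a mid-range desktop CPU
ASSUMED_OTHER_WATTS = 50  # drives, RAM, fans, motherboard (a rough guess)


def psu_headroom(psu_watts, gpu_count, reserve_fraction=0.2):
    """Watts left after keeping a safety reserve and powering all components."""
    usable = psu_watts * (1 - reserve_fraction)
    load = gpu_count * ASSUMED_GPU_WATTS + ASSUMED_CPU_WATTS + ASSUMED_OTHER_WATTS
    return usable - load


if __name__ == "__main__":
    for gpus in (1, 2, 3):
        print(f"{gpus} GPU(s): headroom {psu_headroom(700, gpus):.0f} W")
```

A negative result means the PSU would be the first thing to upgrade. With these assumed numbers, a third GPU pushes the headroom below zero, so the figures are worth re-checking against the real spec sheets before buying more cards.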

Again, this is just the first version of my DL PC, and some of these parts may change in the future. Meanwhile, I am waiting for NVIDIA to announce their new RTX Super GPUs. According to rumors, they won’t be drastically different from the current RTX line-up, but I expect this release to drive the prices of the current RTX generation down.

I will publish a detailed post with the final configuration and all my recommendations when I get all the components, presumably at the end of June. With this Rig, I am going to retrain all the models used in PMR in order to play with different hyperparameters and squeeze everything out of them!

Small Advice to Myself

What is particularly great about GSoC is that it requires one to build up self-discipline. This is your project, and you are the employee and the employer at the same time. Thus, you need not only to think about implementation details but also to have a clear view of your project, meet deadlines and, most importantly, actually deliver what you promised to your community and yourself.

You could say that there are two evaluations and you have to report your progress to your mentor, but clearly you need a lot of motivation and, more importantly, self-discipline to wake up every morning and continue to work on your project, no matter how hard it is. Thus, I would like to share a few pieces of advice that stem from my everyday GSoC routine.

Start your Day with Small Things

I have noticed that if I start my day with a running session I stay more focused during the whole day. Moreover, starting your day by fixing a small bug or adding a few features can greatly boost your self-esteem and help achieve bigger things throughout the day. A fruitful day starts with a few commits.

Start Simple

This is something terribly obvious, yet I find myself constantly forgetting this simple principle. For example, I am currently working on adding benchmarks to PMR (to test the Face Detection and Face Recognition algorithms against hand-labeled videos) as well as time-benchmarking particular parts of the code (inference, data loading).

At first, I was trying to come up with a good benchmarking structure that would cover the already fast-growing codebase of PMR right from the start. Of course, I failed, as Face Detection, Face Recognition and time benchmarking all require different functionality and API design. Instead, I simply started by adding benchmarking to Face Detection, and I have already got it working pretty well!
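
To show what “starting simple” can look like for the time-benchmarking part, here is a minimal sketch. It is not PMR’s actual benchmarking code: the `StageTimer` class and the `load_frames` and `detect_faces` functions are hypothetical stand-ins for data loading and inference.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations per named stage."""

    def __init__(self):
        self.durations = defaultdict(list)

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed code raises.
            self.durations[stage].append(time.perf_counter() - start)

    def mean_seconds(self):
        """Average duration in seconds for every stage measured so far."""
        return {stage: sum(values) / len(values) for stage, values in self.durations.items()}


def load_frames(count):
    # Hypothetical stand-in for data loading.
    return [[0] * 64 for _ in range(count)]


def detect_faces(frame):
    # Hypothetical stand-in for model inference.
    return sum(frame)


if __name__ == "__main__":
    timer = StageTimer()
    with timer.measure("data loading"):
        frames = load_frames(100)
    for frame in frames:
        with timer.measure("inference"):
            detect_faces(frame)
    for stage, seconds in timer.mean_seconds().items():
        print(f"{stage}: {seconds * 1e6:.1f} microseconds on average")
```

A single context manager like this covers one narrow need and can be wrapped around any stage later, which is easier than designing one structure for every benchmark up front.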

Conclusion

I hope that you enjoyed this rather informal monologue about all the GSoC-related stuff. Feel free to ask me anything in the comments or via email. Stay tuned, more is coming!

This post is also available on my Medium blog.