So it is August and, frankly speaking, I have been slacking a bit: I am back in my home country, and it is pretty hard to focus on work when there is great summer weather, friends, and a bicycle around 😛. In this post, I will talk about new features of the Web App, and at the end of the week I will publish a final (😢) GSoC-related post describing everything I’ve accomplished so far.
As I have already mentioned, August was not my most productive month. Apart from the constant distractions, I also needed JS and HTML refreshers. Still, I have finished two major components that lay a foundation for future work on the Web App.
Interactive Task List
First of all – PMR-Web now supports all types of files:
- Videos – .mp4, .avi, etc.
- Images – either a single image or a .zip archive with a pack of images
- YouTube videos – the video is downloaded in the background using the PyTube library and AJAX.
Also, the task list supports pagination, as you can see from the GIF above. It took me a while to figure out how to handle it – basically, every time you change a page, your browser sends a request to the server for that page’s tasks. In the future, I might add caching so that we don’t re-request pages whose tasks haven’t changed.
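The page-change request can be sketched roughly like this. The endpoint name, page size, and response shape below are my illustrative assumptions, not PMR-Web’s actual API:

```javascript
// Minimal sketch of the paginated task fetch (assumed endpoint and fields).
const PAGE_SIZE = 10;

// Translate a 1-based page number into the offset/limit the server expects.
function pageToRange(page, pageSize = PAGE_SIZE) {
  const offset = (page - 1) * pageSize;
  return { offset, limit: pageSize };
}

// Fired whenever the user clicks a page button: ask the server for just
// that slice of tasks and re-render the list from the response.
async function loadPage(page) {
  const { offset, limit } = pageToRange(page);
  const resp = await fetch(`/tasks?offset=${offset}&limit=${limit}`);
  return resp.json(); // e.g. { tasks: [...], total: 57 }
}
```

Caching would then be a matter of keying the responses by page number and invalidating entries when a task changes.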
You might have noticed in the GIF above that the task name is now a link. Let’s check out what is hidden behind it!
Let’s look at it step by step:
- Video player with an overlay for bounding boxes and names
- Person’s characteristics, like age, gender, and facial expression
- Task details
I had to spend quite a lot of time on this page, as I encountered some non-trivial problems:
One of the most “digestible” types of information is visual – that is just how humans work. That is why, right from the beginning, I came up with VideoOutputHandlerElem to draw bounding boxes with detected faces on the video. However, looping over all the frames in a video takes quite a while, which is why I decided to create an HTML overlay for the video that could do the same thing without rendering a new video.
However, it turned out to be not an easy task. My initial try was to simply create new bounding boxes as `<div>` elements and put them as an overlay on the video. I was able to create new boxes, but then I realized it would be terrible performance-wise – just think about creating and deleting divs every 100–200 ms! Turns out there was a better solution.
The `<canvas>` element – that was exactly what we needed. I was able to lay a canvas over the `<video>` and draw and clear boxes on it.
Tip: don’t set the width or height of `<canvas>` via CSS; set them as attributes instead. I spent a lot of time figuring out why my `<canvas>` was so blurry.
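The overlay logic can be sketched as follows. I am assuming here that the recognition data is an array of per-frame entries like `{ time, boxes: [{x, y, w, h, name}] }` with coordinates normalized to [0, 1] – the names and shapes are illustrative, not PMR’s actual format:

```javascript
// Scale a normalized box to canvas pixels.
function toPixels(box, canvasWidth, canvasHeight) {
  return {
    x: box.x * canvasWidth,
    y: box.y * canvasHeight,
    w: box.w * canvasWidth,
    h: box.h * canvasHeight,
    name: box.name,
  };
}

// Clear and redraw the boxes for the current playback position. Redrawing a
// handful of rectangles per frame is cheap compared to churning <div>s.
function drawOverlay(video, canvas, framesData) {
  // Set width/height as ATTRIBUTES (not CSS) so the canvas isn't blurry.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");

  function render() {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    const frame = framesData.find(
      (f) => Math.abs(f.time - video.currentTime) < 0.1
    );
    if (frame) {
      for (const b of frame.boxes.map((b) =>
        toPixels(b, canvas.width, canvas.height)
      )) {
        ctx.strokeRect(b.x, b.y, b.w, b.h);
        ctx.fillText(b.name, b.x, b.y - 4);
      }
    }
    requestAnimationFrame(render); // redraw in sync with the display
  }
  render();
}
```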
I was happy with the canvas solution until I ran into a problem with watching the video in full-screen mode – nothing helped to keep the overlay on top of the `<video>` in this mode!
Turns out that full-screen mode is a very special thing in the newest versions of HTML – you can get almost any HTML element into full-screen mode, but nothing else can overlay it, and you can’t enter full-screen unless the user has specifically clicked or otherwise interacted with that element!
I spent a lot of time thinking about how to handle this issue, and the only thing that came to my mind was to get the whole `<canvas>` into full-screen mode. Long story short – I was able to do that, but I had to create custom controls for the player in order to override the behavior of the “Full-Screen” button.
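One common pattern for the custom button looks roughly like this – here I assume a wrapper element around the player (so the exact structure may differ from PMR’s), and the handler must run inside a click listener because the Fullscreen API requires a user gesture:

```javascript
// Sketch of a custom "Full-Screen" toggle. `wrapper` is an assumed container
// holding both the <video> and the overlay <canvas>; `doc` is injectable so
// the logic can be exercised outside a browser.
function toggleFullscreen(wrapper, doc = document) {
  if (doc.fullscreenElement) {
    // Already in full-screen: leave it.
    return doc.exitFullscreen();
  }
  // Enter full-screen on the container so the overlay stays on top.
  return wrapper.requestFullscreen();
}
```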
Big Data Ain’t a Joke
Everything was great and I was happy with my new video player, until I noticed that pages with long videos were taking a long time to load (and that was happening on localhost!). Sure, the longer the video, the more time it takes to send from server to client, but that was not it – the JSON files with recognition data were quite large for long videos (one 4-minute video had a 10.5 MB JSON file).
At that moment, the recognition data for each video was stored in the MySQL database as a JSON column alongside the other entries. It turns out MySQL is not particularly good at handling loosely structured data such as ours. Thus, I decided to move this already-growing part of the data to MongoDB.
MongoDB is a document-oriented database that is much less strict about data structure than MySQL. I had been thinking about using it right from the beginning, and it turned out that a combination of MySQL for user data and MongoDB for recognition-related data works best.
With MongoDB’s aggregations, I no longer have to fetch the whole JSON document from the database, as they allow querying just a part of an array (e.g. JSON data for frames from the 10th to the 25th second).
Every 15 seconds, the video player queries PMR’s backend for more JSON data. The backend, in turn, calculates the range of the array slice that corresponds to the requested 15-second timeframe. This way, JSON data is loaded asynchronously, lifting from the server the burden of querying and sending all the JSON data at once.
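The windowed query can be sketched with MongoDB’s `$slice` aggregation operator. I am assuming one document per video with a `frames` array and a known fps – the field and collection names are illustrative, not PMR’s real schema:

```javascript
// Sketch of the 15-second windowed recognition query (assumed schema).
const WINDOW_SECONDS = 15;

// Convert the requested window start (in seconds) into an array slice:
// how many frame entries to skip, and how many to return.
function windowToSlice(startSecond, fps, windowSeconds = WINDOW_SECONDS) {
  const skip = Math.floor(startSecond * fps);
  const count = Math.ceil(windowSeconds * fps);
  return [skip, count];
}

// Build the aggregation pipeline: $slice pulls only the needed part of the
// frames array, so the server never loads the whole multi-megabyte document.
function buildPipeline(videoId, startSecond, fps) {
  const [skip, count] = windowToSlice(startSecond, fps);
  return [
    { $match: { videoId } },
    { $project: { frames: { $slice: ["$frames", skip, count] } } },
  ];
}
```

For example, at 25 fps a request starting at second 30 becomes a slice of 375 entries beginning at index 750.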
Summing up, MongoDB is a valuable addition to the project as I am planning to use it for other types of data too.
Demo of Video Player
Enough talking – let’s see how all of this performs in real life!
In the beginning, I was really afraid that `<canvas>` wouldn’t perform well when updated several times a second, but as you can see, it works pretty well. If you hover over a bounding box, you can see all the recognized data in the box to the left of the player. Finally, as I have already mentioned, it works in full-screen mode too (but without showing the extra recognized data).
Thanks for reading yet another PMR-related blog post! In the coming days, I will work on a blog post summarizing everything I have done during GSoC 2019. I will also prepare PMR-Web for demonstration and write READMEs for both PMR-Web and PMR.
See you soon!