Machine Learning for AOM/AV1 and its Application in RTC

We continue our review of the All Things RTC conference with a play-by-play of the informative, invaluable, and inspiring presentations that were given during the event. Recently, we went back over the keynote speech by Debargha Mukherjee, discussing the history of AV1 and its ongoing impact on the industry.

In parallel to this, Zoe Liu, co-founder and president of the startup Visionular, also gave a talk titled “Machine Learning for AOM/AV1 and its Application in RTC.”

Here, she used Debargha’s presentation as a launchpad, showing how her company has begun harnessing machine learning to advance AV1 development and to lay the groundwork for AV2.

Zoe discussed how Visionular is one of the 42 members of AOM, the organization founded to build royalty-free codecs and promote a royalty-free ecosystem. Her company has focused on AV1’s coding efficiency and its potential for real-time communication.

Machine Learning Informs Encoding Progress
Machine learning can be used to improve encoder performance and speed and to enhance video quality at several levels, including frame prediction and frame synthesis, helping optimize the AV1 codec for future use cases.

Machine Learning for Encoder Speed
By applying machine learning to the encoder’s block-partitioning decisions, Visionular has seen speed improvements of 30–50% on average. Searching over the many possible partitions of each frame is one of the most time-consuming parts of encoding; a trained network can instead predict a partition map for a new frame directly, far faster than an exhaustive search.
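The talk does not describe Visionular’s actual model, but the idea can be sketched as a small learned classifier that decides “split or don’t split” for each block, pruning the recursive partition search. Everything below (the feature set, the logistic model, the function names) is a hypothetical illustration, not AV1 encoder code:

```python
import numpy as np

def block_features(block):
    # Simple hand-crafted features; a real encoder would feed these
    # (or raw pixels) to a trained CNN rather than a toy model.
    h, w = block.shape
    halves = [block[:h // 2, :w // 2], block[:h // 2, w // 2:],
              block[h // 2:, :w // 2], block[h // 2:, w // 2:]]
    sub_var = np.array([b.var() for b in halves])
    return np.array([block.var(), sub_var.mean(), sub_var.std()])

def predict_split(block, weights, bias):
    # Logistic "split / don't split" decision standing in for the
    # exhaustive rate-distortion search over all partition types.
    z = block_features(block) @ weights + bias
    return 1.0 / (1.0 + np.exp(-z)) > 0.5

def partition(block, weights, bias, min_size=8):
    # Recursively partition only where the model predicts a split,
    # skipping the costly RD evaluation of the other partition modes.
    h, w = block.shape
    if h <= min_size or not predict_split(block, weights, bias):
        return [(h, w)]
    leaves = []
    for sub in (block[:h // 2, :w // 2], block[:h // 2, w // 2:],
                block[h // 2:, :w // 2], block[h // 2:, w // 2:]):
        leaves += partition(sub, weights, bias, min_size)
    return leaves
```

With toy weights that key on variance, a flat block stays whole while a noisy, textured block is split down to the minimum size, which is the kind of behavior a trained partition predictor learns from encoded data.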

Machine Learning for Mode Selection
Mode selection is another immediate application of machine learning to AV1: empirical, hand-tuned encoding rules can be replaced by neural-network-based decisions. This lets the encoder choose and switch between modes faster and more consistently, with less manual tuning.
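As a minimal sketch of the idea, the hand-written threshold rules can be swapped for a learned classifier that scores a few candidate intra modes from block statistics. The mode subset, features, and weights here are illustrative assumptions, not AV1’s actual mode-decision logic:

```python
import numpy as np

MODES = ["DC_PRED", "V_PRED", "H_PRED"]  # a tiny subset of AV1's intra modes

def mode_features(block):
    # Directional gradient energy: variation across columns (gx) means
    # vertical structure that V_PRED handles well; variation down rows
    # (gy) favors H_PRED; neither favors the DC average.
    gx = np.abs(np.diff(block, axis=1)).mean()
    gy = np.abs(np.diff(block, axis=0)).mean()
    return np.array([1.0, gx, gy])  # bias term plus two features

def select_mode(block, W):
    # Linear classifier standing in for hand-tuned if/else rules; the
    # encoder would then run a full RD search only on the top-scoring
    # mode (or a short list of candidates).
    scores = W @ mode_features(block)
    return MODES[int(np.argmax(scores))]

# Toy, hand-set weights standing in for trained parameters:
# one row per mode, columns matching [bias, gx, gy].
W_TOY = np.array([[0.5, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
```

Pruning the candidate list this way is what trades a small risk of a suboptimal mode for a large reduction in rate-distortion evaluations.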

Machine Learning for Frame Prediction
Machine learning can also be applied to the video frames themselves: frames that arrive with degraded quality or gaps in the feed can have their quality restored or their missing areas filled in for better consistency. Thanks to neural-network training, overall video appearance can be improved, and entirely new frames can even be synthesized from the video’s history.
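To make the “fill in missing areas” idea concrete, here is a deliberately crude stand-in for the trained inpainting networks the talk alludes to: a diffusion fill that repeatedly replaces each missing pixel with the average of its four neighbours. The function name and parameters are assumptions for illustration only:

```python
import numpy as np

def inpaint(frame, mask, iters=50):
    # Iteratively fill masked (missing) pixels from the average of their
    # 4-neighbours. A learned model would instead predict the missing
    # content from surrounding pixels and previous frames.
    out = frame.copy()
    out[mask] = out[~mask].mean()  # rough initial guess from valid pixels
    for _ in range(iters):
        padded = np.pad(out, 1, mode="edge")
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[mask] = neigh[mask]  # only missing pixels are updated
    return out
```

A neural inpainter follows the same contract (frame plus mask in, completed frame out) but can reconstruct texture and motion that simple averaging cannot.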

These are just a few of the growing applications of machine learning to AV1. Zoe also highlighted dav1d, the open-source AV1 decoder, which is exceptionally fast, scalable, and production ready, and is supported by most major browser platforms, with more being added every day.

Zoe finished by introducing the Visionular team, a small group of codec engineers and machine learning experts. They all contribute to the AOM organization and are finding new ways to shrink video size while maintaining or improving quality.

Click here to watch the full presentation.