85% of the data consumed over the internet is video. About 2.8 exabytes of data are transferred over the internet through streaming video. This growth is driven by VOD platforms like Netflix, video communication platforms like Zoom, social platforms like TikTok, esports, and live streaming, to name a few.
The COVID-19 pandemic has accelerated video consumption and has pushed companies to move from offline to live online formats. With video consumption exploding day to day, we need to be prepared for the upcoming demand.
In this article we discuss the latest advancements in video streaming technology and how they can help improve the streaming experience.
Artificial Intelligence has been disrupting all kinds of industries, and video streaming is no exception. By training on a large number of images, AI models can learn to generate a high-resolution image from a low-resolution one. This method is called Super-Resolution.
Super-Resolution belongs to the family of generative algorithms, which can generate information that was not present in the input. As can be seen in the above figure, the network takes the image on the left and imagines the finer details to re-create the image on the right. This is possible because the model has been trained on a large set of images and has learned how to upscale a new image it has not seen before.
The same concept can be extended to video with minor modifications. For video, several previously generated high-resolution frames and the current low-resolution frame are used together to generate the current high-resolution frame, but the concept is otherwise the same. This makes it possible to send a video at 480p and watch it on the client device at, say, 1080p.
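The recurrent data flow described here can be sketched in a few lines. This is only an illustration of how frames are fed through, not a real model: `toy_generator` is a hypothetical stand-in for a trained network (it just does nearest-neighbour upscaling and blends with the previous output), and for brevity it uses only the single most recent high-resolution frame rather than several.

```python
from typing import List, Optional

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def upscale_nearest(frame: Frame, factor: int) -> Frame:
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    out: Frame = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def toy_generator(low_res: Frame, prev_high_res: Optional[Frame], factor: int) -> Frame:
    """Hypothetical stand-in for a trained network (NOT a real model).

    It upscales the current frame and averages it with the previous
    high-resolution output, only to mimic the recurrent data flow.
    """
    upscaled = upscale_nearest(low_res, factor)
    if prev_high_res is None:
        return upscaled
    return [
        [0.5 * cur + 0.5 * prev for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(upscaled, prev_high_res)
    ]


def super_resolve_video(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Feed each low-res frame plus the previous high-res output to the generator."""
    outputs: List[Frame] = []
    prev: Optional[Frame] = None
    for frame in frames:
        prev = toy_generator(frame, prev, factor)
        outputs.append(prev)
    return outputs
```

In a real system the stand-in would be replaced by a trained network running on the client device; the loop structure, where each output becomes an input for the next frame, is the part this sketch is meant to show.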
This technology is possible thanks to recent advancements in deep learning and the large amount of compute power available on client devices. A recent paper, TecoGAN, has produced high-resolution results that are eerily similar to real-world images, as can be seen in the image above. Using this technology can save up to 30% of bandwidth consumption, thus improving the overall user experience.
Video streaming follows a client-server model. The content is delivered from the edge locations of a Content Delivery Network (CDN), which cache the content from the server. Client devices, i.e., mobiles, laptops, and TVs, fetch the content from the CDN to start playing the video. Video streaming via a CDN has a few limitations:
- Video streaming using a CDN is expensive
- Viewership spikes lead to heavy buffering
- Delivery of content to remote locations suffers because of poor CDN coverage
For these reasons, it is difficult for a video streaming company to scale its service effectively and provide a good user experience. These problems can be addressed by extending a CDN with P2P streaming, also known as a Hybrid CDN.
Adding a peer-to-peer layer over the traditional CDN reduces the load on the main CDN by distributing content from neighboring peers, which act like a CDN themselves and bring the edge much closer to the user. With less load on the CDN, expenses are 40% lower, and the user experience improves as well because re-buffering drops when content is fetched from a nearby peer rather than a far-away CDN. More info about how the technology works is available here.
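A minimal sketch of the peer-first, CDN-fallback idea follows. The names `fetch_segment` and `make_cache` are hypothetical, and the in-memory dictionaries stand in for real network calls; an actual hybrid CDN also handles peer discovery, timeouts, and integrity checks, none of which is modeled here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes a segment name and returns its bytes, or None on a miss.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_segment(segment: str, peers: List[Fetcher], cdn: Fetcher) -> Tuple[bytes, str]:
    """Try nearby peers first; fall back to the CDN if no peer has the segment."""
    for index, peer in enumerate(peers):
        data = peer(segment)
        if data is not None:
            return data, f"peer-{index}"
    data = cdn(segment)
    if data is None:
        raise LookupError(f"segment {segment!r} not available from peers or CDN")
    return data, "cdn"


def make_cache(store: Dict[str, bytes]) -> Fetcher:
    """Build a fetcher backed by an in-memory dict (stands in for a network call)."""
    return store.get


# Hypothetical setup: the CDN holds everything, one peer holds only the first segment.
cdn = make_cache({"seg1.ts": b"one", "seg2.ts": b"two"})
peer = make_cache({"seg1.ts": b"one"})
first = fetch_segment("seg1.ts", [peer], cdn)   # served by the peer
second = fetch_segment("seg2.ts", [peer], cdn)  # peer misses, CDN serves it
```

Every segment served by a peer is one request the CDN does not have to answer, which is where the reduced load comes from.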
We have seen that content is delivered to client devices via a CDN, which caches the content from the server. A CDN consists of data centers at multiple locations that serve content. These locations are called Points of Presence (PoPs). Ideally we would want as many PoPs as possible, but not all CDNs have equal reach worldwide. For example, some of the most popular CDNs do not have a single PoP in China.
In addition, the performance of every CDN varies over time and is inconsistent. A CDN can have an outage at any time, in keeping with Murphy’s law. To deal with these risks, the ideal option is to have access to multiple CDNs at once. This can be done either by reaching out to multiple CDNs and cutting a deal with each of them individually, or by using a Multi-CDN provider who manages all of this for you.
Powering your video streaming service with a Multi-CDN provides advantages that help improve the overall user experience:
- Prevention of sudden stoppage of service by switching to a working CDN
- Fetching content from the highest-performing CDN at that point in time, using Real User Monitoring (RUM) metrics to improve Quality of Service (QoS)
- Mid-stream switching can help in reducing CDN costs
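The selection logic behind these points can be sketched as below. The `CdnStats` fields and the scoring formula (throughput discounted by error rate) are assumptions made for illustration; a real Multi-CDN provider would use its own RUM metrics and weighting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CdnStats:
    """Hypothetical RUM snapshot for one CDN."""
    name: str
    available: bool          # False while the CDN is having an outage
    throughput_mbps: float   # recent measured download throughput
    error_rate: float        # fraction of failed requests, 0.0 to 1.0


def pick_cdn(stats: List[CdnStats]) -> CdnStats:
    """Skip CDNs that are down, then pick the best-scoring one.

    The score is an assumed formula: throughput discounted by error rate.
    """
    healthy = [s for s in stats if s.available]
    if not healthy:
        raise RuntimeError("no CDN is currently available")
    return max(healthy, key=lambda s: s.throughput_mbps * (1.0 - s.error_rate))


# Made-up measurements, for illustration only.
snapshot = [
    CdnStats("cdn-a", True, 50.0, 0.10),
    CdnStats("cdn-b", True, 48.0, 0.01),
    CdnStats("cdn-c", False, 90.0, 0.00),  # fastest on paper, but currently down
]
best = pick_cdn(snapshot)
```

Calling `pick_cdn` again before each segment request, with fresh measurements, is one simple way to get mid-stream switching.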
A live stream of a sports match can lag a television broadcast by 25–30 seconds, which impedes the user experience. This is due to the way the technology around video streaming is designed. The most prominent way of streaming video is HTTP-based streaming, where the video content is delivered in the following steps:
- The video is encoded with a codec, say H.264, to reduce the video file size
- Then the video is converted to a streamable format like HLS or MPEG-DASH
- The content from the server is now distributed via the CDN
- The video player buffers a few segments of the video before starting to play
HLS and MPEG-DASH deliver video in segments, which the video player downloads sequentially. A segment generally contains around 10 seconds of video. The encoder has to finish encoding an entire segment before making it available to the CDN. The CDN has to receive the entire segment before passing it on to the video player. The video player has to buffer at least a few segments before it starts playing to maintain the user experience.
This entire process leads to a delay of 25–30 seconds. Also, because of the multiple formats involved, i.e., .ts for HLS and .mp4 for MPEG-DASH, double the storage, double the encoding, and double the CDN capacity are needed. CMAF tries to solve both of these problems.
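To see where a delay of this size comes from, here is a deliberately simplified latency model. It assumes the delay is the encoder waiting for one full segment plus the player buffering a given number of segments, with everything else (CDN transfer, network round trips) lumped into an optional `network_seconds` term. The choice of two buffered segments in the example is an assumption, not a figure from any specification.

```python
def segment_latency_seconds(
    segment_seconds: float,
    buffered_segments: int,
    network_seconds: float = 0.0,
) -> float:
    """Rough end-to-end delay for segment-based HTTP streaming (simplified model)."""
    encode_wait = segment_seconds                      # encoder finishes a whole segment first
    player_buffer = segment_seconds * buffered_segments  # player buffers before starting playback
    return encode_wait + player_buffer + network_seconds


# 10-second segments with an assumed buffer of two segments.
delay = segment_latency_seconds(segment_seconds=10, buffered_segments=2)
```

With these assumed inputs the model gives 30 seconds, and the formula makes it clear that the segment duration multiplies through every stage, which is why shortening the unit of transfer is the obvious lever.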
CMAF (Common Media Application Format) provides a consistent format, fragmented MP4 (fMP4), which is supported by both HLS and MPEG-DASH. It can also be delivered using chunked transfer encoding. This means the encoder does not wait for the entire segment before transmitting. The segment is divided into smaller chunks, and each chunk is sent as soon as it is encoded and is passed on by the CDN to the video player as it arrives. The video player takes care of assembling the chunks and playing the video segment.
Thus CMAF helps in reducing the encoding and storage costs as well as reducing the latency using chunked transfer encoding.
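The effect of chunking on when data first becomes available can be illustrated with two small generators. This is a toy model, not an encoder: `frame_times` is a hypothetical list of the times (in seconds) at which each frame finishes encoding, and the functions only report when each piece could be handed to the CDN.

```python
from typing import Iterator, List, Tuple

Piece = Tuple[float, List[float]]  # (time the piece becomes available, frames in it)


def emit_whole_segment(frame_times: List[float]) -> Iterator[Piece]:
    """Segment-based delivery: nothing is available until the last frame is encoded."""
    yield frame_times[-1], list(frame_times)


def emit_chunks(frame_times: List[float], frames_per_chunk: int) -> Iterator[Piece]:
    """Chunked delivery: each chunk is available as soon as its last frame is encoded."""
    for start in range(0, len(frame_times), frames_per_chunk):
        chunk = frame_times[start:start + frames_per_chunk]
        yield chunk[-1], chunk


# Toy 10-second segment with one frame finishing per second (an unrealistically
# low frame rate, chosen only to keep the numbers readable).
times = [float(t) for t in range(1, 11)]
first_whole = next(emit_whole_segment(times))[0]   # first data after the full segment
first_chunk = next(emit_chunks(times, 2))[0]       # first data after one chunk
```

In this toy setup the first data is available after one chunk instead of after the whole segment, while the total amount of video sent stays the same.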
Video encoding is the process of compressing video using codecs like H.264, which take advantage of the information shared across consecutive frames and store only the information newly added in each frame. To deal with varied network conditions and volatile internet connections, the video content is encoded at multiple resolutions and the player switches between them intelligently. This technique is called Adaptive Bitrate (ABR) streaming.
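A very simple form of the switching decision is shown below: pick the highest-bitrate rendition that fits within a fraction of the measured throughput. The function name, the 0.8 safety factor, and the ladder values are assumptions for illustration; production players use more elaborate heuristics that also consider buffer level.

```python
from typing import List, Tuple

Rendition = Tuple[str, int]  # (label, bitrate in kbps)


def choose_rendition(
    ladder: List[Rendition],
    throughput_kbps: float,
    safety: float = 0.8,
) -> Rendition:
    """Pick the highest bitrate that fits within a safety margin of the throughput.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = throughput_kbps * safety
    affordable = [r for r in ladder if r[1] <= budget]
    if affordable:
        return max(affordable, key=lambda r: r[1])
    return min(ladder, key=lambda r: r[1])


# Hypothetical ladder; the values are illustrative only.
ladder = [("480p", 1750), ("720p", 3000), ("1080p", 5800)]
on_good_network = choose_rendition(ladder, throughput_kbps=8000)
on_medium_network = choose_rendition(ladder, throughput_kbps=4000)
on_poor_network = choose_rendition(ladder, throughput_kbps=1000)
```

The safety factor leaves headroom so that a small dip in throughput does not immediately cause re-buffering.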
The below image depicts multiple resolutions and the bitrate allocated to each. The key point is that content at the same resolution can be encoded at multiple bitrates, i.e., a 1920 x 1080 video can be encoded at 5800 kbps, 6800 kbps, or even 4800 kbps. This is because video compression is lossy, so the lower the chosen bitrate, the less information is available in the compressed video. The loss shows up as encoding artifacts similar to those in the image above, which you may have noticed while watching a video on a streaming platform.
We now know that a video at a particular resolution can be encoded at different bitrates. In practice, a generic bitrate is chosen so that there are few artifacts without consuming a lot of bandwidth. The above picture shows a generic bitrate pattern that is chosen for all videos. But there is a problem with this one-size-fits-all approach. Some videos, such as high-octane action films, carry rich information and might need a higher bitrate, while a simple cartoon can look just as good at a lower bitrate.
Netflix therefore proposed encoding each video resolution at multiple bitrates and then choosing the best bitrate that can render the video without artifacts. This is done using a metric called VMAF, a score of the perceived visual quality of the rendered video. With its help, a custom bitrate ladder can be designed for each video. Thus a cartoon can be published at a lower bitrate while keeping similar quality, whereas an action film can be published at a higher bitrate without artifacts, improving the user experience.
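The per-title selection step can be sketched as choosing, for each video, the lowest bitrate whose quality score reaches a target. This is a simplified illustration and not Netflix's actual pipeline: the function name, the target of 93, and the scores below are made-up inputs on a 0 to 100 scale, standing in for scores that would be measured on real trial encodes.

```python
from typing import Dict


def pick_bitrate(scores_by_bitrate: Dict[int, float], target_score: float) -> int:
    """Return the lowest bitrate (kbps) whose quality score reaches the target.

    Falls back to the highest bitrate if no encode reaches the target.
    """
    good_enough = [
        bitrate for bitrate, score in scores_by_bitrate.items()
        if score >= target_score
    ]
    return min(good_enough) if good_enough else max(scores_by_bitrate)


# Made-up quality scores for trial encodes at 1080p, for illustration only.
cartoon_scores = {4800: 96.0, 5800: 97.0, 6800: 98.0}
action_scores = {4800: 88.0, 5800: 93.0, 6800: 96.0}

cartoon_bitrate = pick_bitrate(cartoon_scores, target_score=93.0)
action_bitrate = pick_bitrate(action_scores, target_score=93.0)
```

With these illustrative inputs the cartoon lands on the lowest bitrate while the action film needs a higher one, which is the behaviour a per-title ladder is meant to capture.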
To summarize, we have seen multiple techniques that can help with scaling, improve user experience, and reduce latency in video streaming. The advent of 5G, the world of Virtual Reality (VR) and Augmented Reality (AR), and the rise of esports will make video bandwidth consumption skyrocket. It is therefore time to adopt the latest technologies to catch up with and serve this growth.