Machine Learning Is Getting Better At Video Editing: Makes People Disappear

Researchers at Virginia Tech and Facebook AI have come up with an improved technique that lets machine learning edit videos like never before. In their paper, titled “Flow-edge Guided Video Completion”, they present a new flow-based video completion algorithm.

Video completion in this context refers to filling missing regions of a pre-recorded video with newly synthesised content. A successful video completion algorithm has plenty of use cases, from automating VFX workflows to removing watermarks.

Previous video completion methods propagated colours along local flow connections between adjacent frames. However, because motion boundaries form impenetrable barriers, not all missing regions in a video can be reached this way. The researchers address this problem by introducing non-local flow connections to temporally distant frames, which allow video content to be propagated across motion boundaries. The method is validated on the DAVIS dataset.

So far, ML techniques have struggled to synthesise sharp flow edges, especially in complex scenes. It is also challenging to keep the output temporally coherent when the camera itself is moving. In this work, the researchers appear to have managed to perform video completion seamlessly.

How It Works

As shown above, the algorithm works as follows:

  • A binary mask is applied to the colour video to figure out which parts need to be synthesised.
  • Forward and backward flow are computed between adjacent and non-adjacent frames.
  • Flow edges are extracted and completed.
  • These completed edges act as a guide for piecewise-smooth flow completion.
  • Candidate pixels are computed for each missing pixel by estimating a confidence score as well as a binary validity indicator.
  • The frame with the most missing pixels is chosen and filled with image inpainting.
  • This is repeated until no missing pixels remain.
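
The outer loop in the steps above can be sketched in a few lines of Python. This is only a toy illustration, not the authors' implementation: it assumes zero motion (so a pixel is borrowed from the same location in other frames rather than along a completed flow field), it averages candidates instead of weighting them by confidence, and it replaces real image inpainting with the mean of the frame's known pixels. The function names `propagate`, `inpaint_frame` and `complete_video` are hypothetical.

```python
from statistics import mean


def count_missing(mask):
    """Number of pixels still marked as missing (True) in one frame's mask."""
    return sum(cell for row in mask for cell in row)


def propagate(frames, masks):
    """Fill missing pixels from other frames, ASSUMING zero motion.

    A real system would follow completed optical flow to find the matching
    location in each source frame instead of reusing (y, x).
    """
    # Snapshot the masks so one pass only borrows pixels known before the pass.
    missing_before = [[row[:] for row in m] for m in masks]
    for t, (frame, mask) in enumerate(zip(frames, masks)):
        for y, row in enumerate(mask):
            for x, missing in enumerate(row):
                if not missing:
                    continue
                candidates = [frames[s][y][x] for s in range(len(frames))
                              if s != t and not missing_before[s][y][x]]
                if candidates:
                    frame[y][x] = mean(candidates)  # simple fusion (assumption)
                    mask[y][x] = False


def inpaint_frame(frame, mask):
    """Stand-in for image inpainting: use the mean of the frame's known pixels."""
    known = [frame[y][x] for y, row in enumerate(mask)
             for x, missing in enumerate(row) if not missing]
    fill = mean(known) if known else 0.0
    for y, row in enumerate(mask):
        for x, missing in enumerate(row):
            if missing:
                frame[y][x] = fill
                mask[y][x] = False


def complete_video(frames, masks):
    """Alternate propagation and single-frame inpainting until nothing is missing."""
    frames = [[row[:] for row in f] for f in frames]  # work on copies
    masks = [[row[:] for row in m] for m in masks]
    while True:
        propagate(frames, masks)
        remaining = [count_missing(m) for m in masks]
        if max(remaining) == 0:
            return frames
        # Pick the frame with the most missing pixels and inpaint it;
        # the next pass propagates the newly synthesised pixels to other frames.
        worst = remaining.index(max(remaining))
        inpaint_frame(frames[worst], masks[worst])
```

Each frame here is a list of rows of grey values, and each mask is a same-shaped list of booleans where `True` marks a missing pixel. The loop always terminates, because every iteration that does not finish fully fills at least one frame.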

As illustrated below, flow completion requires optical flow estimation on the input video. Missing regions have zero value (white). This is followed by edge extraction and then piecewise-smooth flow completion, using the edges as guidance.
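
To make the edge-extraction step concrete, here is a minimal Python sketch of finding motion boundaries in a flow field. It is an illustration under stated assumptions: the flow is stored as rows of `(dx, dy)` tuples, and a simple forward-difference threshold on the flow magnitude stands in for whatever edge detector the authors actually use. The names `flow_magnitude` and `extract_edges` and the `threshold` parameter are hypothetical.

```python
import math


def flow_magnitude(flow):
    """Per-pixel magnitude of a flow field stored as rows of (dx, dy) tuples."""
    return [[math.hypot(dx, dy) for dx, dy in row] for row in flow]


def extract_edges(magnitude, threshold=1.0):
    """Mark pixels where the magnitude jumps relative to the right or lower neighbour.

    This forward-difference threshold is a simple stand-in for a real edge detector.
    """
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(magnitude[y][x + 1] - magnitude[y][x]) if x + 1 < w else 0.0
            down = abs(magnitude[y + 1][x] - magnitude[y][x]) if y + 1 < h else 0.0
            edges[y][x] = max(right, down) > threshold
    return edges
```

For a flow field whose left half is static and whose right half moves by `(3, 4)` pixels, the only edge pixels are those on the boundary between the two halves, which is the kind of sharp motion boundary the completed flow is meant to preserve.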

For evaluation, the researchers used the DAVIS dataset, which contains a total of 150 video sequences. Following the evaluation protocol, 60 sequences in 2017-test-dev and 2017-test-challenge were used for training the flow edge completion network.

Masks were adopted from the testing split of the NVIDIA Irregular Mask Dataset. During training, the authors write, edge images and corresponding flow magnitude images were first cropped to 256×256 patches. They were then corrupted with a randomly chosen mask, resized to 256×256. The Adam optimiser was used with a learning rate of 0.001, and training the network on a single NVIDIA P100 GPU took 12 hours.
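
The data preparation described above can be sketched roughly as follows. This is a hedged illustration rather than the authors' pipeline: it assumes the edge image and the flow magnitude image share the same crop window, that "corrupting" means zeroing the masked pixels, and that the mask is resized with nearest-neighbour sampling. The helper names (`resize_nearest`, `corrupt`, `make_training_sample`) are hypothetical.

```python
import random

PATCH = 256  # patch size quoted in the article


def resize_nearest(mask, size):
    """Resize a 2D mask to size x size with nearest-neighbour sampling (assumption)."""
    h, w = len(mask), len(mask[0])
    return [[mask[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def corrupt(patch, mask):
    """Zero out masked pixels; a truthy mask value marks a hole (assumption)."""
    return [[0.0 if m else v for v, m in zip(prow, mrow)]
            for prow, mrow in zip(patch, mask)]


def make_training_sample(edge_image, magnitude_image, masks, rng, size=PATCH):
    """Crop both images at the same random window, then corrupt them with one mask."""
    h, w = len(edge_image), len(edge_image[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)

    def crop(image):
        return [row[left:left + size] for row in image[top:top + size]]

    mask = resize_nearest(rng.choice(masks), size)  # randomly chosen mask
    return corrupt(crop(edge_image), mask), corrupt(crop(magnitude_image), mask), mask
```

Passing a seeded `random.Random` instance as `rng` keeps the crops and mask choices reproducible between runs.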

Why Is This Work Important

Getting an algorithm to differentiate between two objects in a dynamic setting is tricky. Imagine a person walking: the background remains partly visible even through the sweeping motion of the feet. Video completion algorithms have to fill in the missing pieces of information. The applications of these techniques extend to removing scratches from videos, video editing and special effects workflows (removing unwanted objects), watermark and logo removal, and video stabilisation (filling the exterior after shake removal instead of cropping).

Though the work gives a new direction for automated VFX workflows, there are limitations such as frame rate and detection of objects in fast-paced environments (think: boxing match).

This work by Chen Gao and his peers was also presented at the recently concluded 16th edition of the European Conference on Computer Vision (ECCV).

Know more about this work here.

