Stitch it in Time: GAN-Based Facial Editing of Real Videos

Tel Aviv University

This page contains many wide videos that may not display well on a cellphone. Viewing in a desktop browser is recommended.

Abstract

The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing. However, replicating their success with videos has proven challenging. High-quality facial video datasets are lacking, and working with videos introduces a fundamental barrier to overcome: temporal coherency. We propose that this barrier is largely artificial. The source video is already temporally coherent, and deviations from this state arise in part due to careless treatment of individual components in the editing pipeline. We leverage the natural alignment of StyleGAN and the tendency of neural networks to learn low-frequency functions, and demonstrate that they provide a strongly consistent prior. We draw on these insights and propose a framework for semantic editing of faces in videos, demonstrating significant improvements over the current state-of-the-art. Our method produces meaningful face manipulations, maintains a higher degree of temporal consistency, and can be applied to challenging, high-quality, talking-head videos that current methods struggle with.

BibTeX

If you find our work useful, please cite our paper:

@misc{tzaban2022stitch,
      title={Stitch it in Time: GAN-Based Facial Editing of Real Videos},
      author={Rotem Tzaban and Ron Mokady and Rinon Gal and Amit H. Bermano and Daniel Cohen-Or},
      year={2022},
      eprint={2201.08361},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Our method can apply semantic manipulations to real facial videos without requiring any temporal components.

So, what's the gist?

Video outputs at different stages of our editing pipeline

Effects of removing / replacing pipeline components

tl;dr?

What does it look like, compared to the alternatives?

+Smile

+Young

+Old

+Old

Our method can be applied not only to real videos, but also to animated media!

Our method extends beyond human faces: we demonstrate editing a video of a dog (using an AFHQ-based generator).

Additional Examples

Editing a video with camera movement

Our method can also perform non-constant edits, where the edit changes over the course of the video.
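One way to picture a non-constant edit is the common latent-direction setup, in which every frame has its own latent code and an edit is a shift along a semantic direction. The sketch below varies the shift strength per frame. The function name, the list-based latent codes, and the linear ramp are illustrative assumptions, not taken from the authors' code.

```python
from typing import List, Sequence


def apply_time_varying_edit(
    latents: Sequence[Sequence[float]],
    direction: Sequence[float],
    strengths: Sequence[float],
) -> List[List[float]]:
    """Shift each frame's latent code along one editing direction.

    latents:   one latent code per frame (hypothetical plain-list format)
    direction: a semantic editing direction with the same length as a code
    strengths: one edit strength per frame; varying it over time gives a
               non-constant edit
    """
    if len(strengths) != len(latents):
        raise ValueError("need exactly one strength per frame")
    edited = []
    for frame_code, strength in zip(latents, strengths):
        if len(frame_code) != len(direction):
            raise ValueError("direction and latent code lengths differ")
        # Frame t moves by strengths[t] * direction.
        edited.append([w + strength * d for w, d in zip(frame_code, direction)])
    return edited


# Toy data: 5 frames with 4-dimensional latent codes (assumed sizes).
num_frames, latent_dim = 5, 4
latents = [[0.0] * latent_dim for _ in range(num_frames)]
direction = [1.0, 0.0, 0.0, 0.0]
# Ramp the edit linearly from 0 (no edit) on the first frame to 2 on the last.
strengths = [2.0 * t / (num_frames - 1) for t in range(num_frames)]
edited = apply_time_varying_edit(latents, direction, strengths)
```

With a constant `strengths` list this reduces to an ordinary uniform edit. A smooth ramp like the one above changes the edit gradually, so neighboring frames receive nearly identical shifts.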
