A Seoul National University master's student and developer has trained a face-generating model that translates ordinary face photographs into cartoon images in the distinctive style of Lee Mal-nyeon.
The student (GitHub username bryandlee) used webcomic images by South Korean cartoonist Lee Mal-nyeon (이말년) as input data, building a dataset of malnyun cartoon faces and then testing popular deep generative models on it. By combining a pretrained face-generating model with specialized training techniques, they were able to train a generator at 256×256 resolution in just 10 hours on a single RTX 2080 Ti GPU, using only 500 manually annotated images.
Since the cascade classifier for human faces provided in OpenCV, a library of programming functions mainly aimed at real-time computer vision, did not work well in the cartoon domain, the student manually annotated 500 input cartoon face images.
To reduce the heavy data and compute requirements of training GANs, the student incorporated FreezeD, a simple yet effective baseline for transfer learning of GANs proposed earlier this year by researchers from KAIST (Korea Advanced Institute of Science and Technology) and POSTECH (Pohang University of Science and Technology). Extending the idea, the developer also tried freezing the early layers of the generator in transfer-learning settings, a variant dubbed FreezeG (freeze generator), and found that "it worked pretty well."
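The freezing step can be sketched in PyTorch with a toy generator standing in for StyleGAN2; the block count and layer shapes below are illustrative assumptions, not the real architecture:

```python
import torch
import torch.nn as nn

# Toy stand-in for a StyleGAN2 generator: a stack of "resolution blocks".
# The block count and layer shapes are illustrative, not the real model.
class ToyGenerator(nn.Module):
    def __init__(self, n_blocks=6):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Conv2d(8, 8, 3, padding=1) for _ in range(n_blocks))

def freeze_early_blocks(gen, n_frozen):
    """FreezeG idea: freeze the first n_frozen blocks, fine-tune the rest."""
    for i, block in enumerate(gen.blocks):
        for p in block.parameters():
            p.requires_grad = i >= n_frozen

gen = ToyGenerator()
freeze_early_blocks(gen, n_frozen=4)
# Only parameters with requires_grad=True are handed to the optimizer.
trainable = [p for p in gen.parameters() if p.requires_grad]
```

FreezeD applies the same trick to the discriminator's early layers; in either case, freezing shrinks the set of parameters the optimizer updates, which is what makes fine-tuning cheap on small datasets.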
For example, when basing the generator on a StyleGAN2 model trained on the FFHQ dataset, fine-tuning the pretrained model took only about 10 hours before it learned to generate convincing cartoon images.
Encouraged by the promising results on the malnyun cartoon faces, the student further tested FreezeG on datasets involving larger geometric transformations, such as face2simpsons. Here, however, the connections between the original and generated images became less apparent.
The student also experimented with U-GAT-IT, an image-to-image translation method that has achieved great success in the face2anime task, by attaching the output of the FFHQ-trained StyleGAN2 to a trained U-GAT-IT model to explore the learned space. Although the StyleGAN2 model was trained mostly on Caucasian faces and the U-GAT-IT model on Asian faces, in combination the two generated acceptable results.
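The composition described above amounts to chaining the two networks: sample a face from the FFHQ-trained StyleGAN2, then feed it to the U-GAT-IT translator. A minimal sketch with hypothetical stand-in modules (toy 32×32 resolution, placeholder layers rather than the actual pretrained weights):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two pretrained networks: `stylegan2_ffhq`
# maps a 512-dim latent to a face image, and `ugatit` translates that face
# into the cartoon domain. Toy 32x32 resolution keeps the sketch tiny.
stylegan2_ffhq = nn.Sequential(nn.Linear(512, 3 * 32 * 32), nn.Tanh())
ugatit = nn.Identity()  # in practice: a trained U-GAT-IT generator

@torch.no_grad()
def sample_cartoon(z):
    face = stylegan2_ffhq(z).view(-1, 3, 32, 32)  # photo-domain face
    return ugatit(face)                           # cartoon translation

z = torch.randn(1, 512)
out = sample_cartoon(z)
```

Because the two models never see each other's training data, mismatches such as the Caucasian/Asian face distribution gap noted above can surface at the interface between them.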
The student-developer noted that the proposed method is in fact a pseudo-translation method: the input image must first be projected into the learned latent space, and the projected vector is then fed back through the generator to produce the target image. This consequently limits performance to in-domain images of the original GAN.
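That projection-then-generation loop can be sketched as a small optimization problem; the linear "generators" and all shapes below are stand-ins for illustration, not the actual models:

```python
import torch
import torch.nn as nn

# Tiny stand-ins: `g_src` is the original (photo-domain) generator and
# `g_tgt` the fine-tuned cartoon-domain generator sharing its latent space.
torch.manual_seed(0)
g_src = nn.Linear(16, 64)
g_tgt = nn.Linear(16, 64)

def project_then_translate(target, steps=200, lr=0.1):
    """Invert `target` into g_src's latent space, then decode with g_tgt."""
    z = torch.zeros(1, 16, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((g_src(z) - target) ** 2).mean()  # reconstruction loss
        loss.backward()
        opt.step()
    with torch.no_grad():
        return g_tgt(z)  # "translated" output for the projected latent

photo = torch.randn(1, 64)  # stand-in for a real input photo
cartoon = project_then_translate(photo)
```

The limitation follows directly from the structure: if the input cannot be reconstructed well by `g_src` (i.e., it lies outside the original GAN's domain), the projected latent is a poor description of it, and the translation degrades accordingly.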
Reporter: Yuan Yuan | Editor: Michael Sarazen