We’re becoming increasingly adept at training computers to manipulate images and videos, putting the words of Trump advisors into the mouths of French singers and turning beach scenes into pornographic reveries. So it was only a matter of time before such technology was turned on the internet’s favourite subject: the cat.
A project from Nvidia and Cornell University is the next step in accurately “translating” images, and it has been demonstrated by manipulating a video of a dog so that the canine transforms into a cat. Not just one cat, either, but four different breeds of cat, each moving its head in the same way as the original husky.
Its creators call the technique the Multimodal Unsupervised Image-to-image Translation (MUNIT) framework, and pitch it as an improvement over previous methods because it allows a given input image to be translated into a range of different outputs. As a video of the framework in action shows, a cat can be “translated” into a number of different dogs, and vice versa.
“Image-to-image translation refers to transforming an image from one domain to another (e.g., cats to dogs, sketches to shoes, summer to winter) while keeping the underlying structure unchanged,” Xun Huang, lead author of a study on the research and a PhD student at Cornell, tells Alphr.
“Our framework is unsupervised, which means it does not need to see examples of corresponding images (e.g., this cat should be transformed to that dog), but it can discover the relationship on its own. It is also multimodal, which means one cat could be transformed into multiple dogs, while previous works only support one-to-one mapping.”
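To make that idea concrete, here is a minimal sketch – our illustration, not the researchers’ released code – of how such a framework splits an image into a “content” code and a “style” code, then produces several different cats from one dog by sampling different styles. All class names, layer sizes and the random “image” tensor below are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Extracts a spatial 'content' code capturing pose and structure."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Rebuilds an image from a content code plus a low-dimensional style vector."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.style_proj = nn.Linear(style_dim, 128)
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, content, style):
        # Inject the style by shifting the content features with a projected
        # style vector (the published model uses a more elaborate mechanism).
        s = self.style_proj(style).unsqueeze(-1).unsqueeze(-1)
        return self.net(content + s)

dog = torch.randn(1, 3, 256, 256)       # stand-in for a real husky photo
dog_content = ContentEncoder()(dog)     # the pose/structure all outputs will share
decode_as_cat = Decoder()               # imagine this was trained on cat photos

# "Multimodal": each randomly sampled style vector yields a different-looking cat
# from the same dog content, rather than a single fixed output.
for _ in range(4):
    cat_style = torch.randn(1, 8)
    fake_cat = decode_as_cat(dog_content, cat_style)
    print(fake_cat.shape)               # torch.Size([1, 3, 256, 256])
```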
The researchers aren’t only interested in swapping the bodies of housepets. They have also used the MUNIT framework to manipulate pictures of landscapes in different seasons, images of shoes and handbags from drawn sketches, and street scenes from computer-generated driving scenarios.
“This technique provides more freedom for image manipulation,” says Huang. “Previously, the manipulation process was deterministic: you got a single output image from your input. With our method you can choose which output you want from a distribution of possible outputs. You can also control the style of output by providing an example image.
“In practice, this technique can be used to aid the designing process, to make games/movies, and to help the development of self-driving cars.”
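Huang’s “example image” control rests on re-normalising content features with statistics taken from a style source, an operation known as adaptive instance normalisation. The toy sketch below – again our illustration, with made-up feature tensors – takes those statistics straight from a reference photo’s features; the published model derives them from a learned style code instead.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Re-normalise content features to match the style features' per-channel mean/std."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return (content_feat - c_mean) / c_std * s_std + s_mean

# Stand-ins for feature maps of a dog photo (content) and a reference cat photo (style):
dog_features = torch.randn(1, 128, 64, 64)
reference_cat_features = torch.randn(1, 128, 64, 64)

styled = adain(dog_features, reference_cat_features)
print(styled.shape)  # torch.Size([1, 128, 64, 64]) -- same structure, cat-like statistics
```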
The uncanny, Men-in-Black-villain appearance of the “translated” cats suggests the framework could do with some refining. Nevertheless, the project shows just how far unsupervised image manipulation is progressing. Will it undermine the reality of what we see on our screens? Perhaps. For now, at least, you can see what Fido looks like as a Ginger Tom.
The code for the study is available on GitHub. Found via Prosthetic Knowledge.