Something You Should Know About DeepFake
There’s been some serious hype lately about an AI algorithm called DeepFake, which can swap the face of anyone in a video with anybody else’s. Some of the examples I’ve seen are amazingly realistic. In this post, I’m going to walk through how this algorithm works, in both theory and code.
Last month, a user named deepfakes made a name for himself by releasing convincing, not-safe-for-work videos that swapped the face of the person in the original footage with that of a celebrity such as Taylor Swift. To do this, he trained his deep learning algorithm on only a few publicly accessible videos, using his personal computer. The practice has erupted in recent weeks.
There are whole communities on GitHub and Reddit dedicated to maintaining and improving this algorithm. The subreddit has grown enormously, and there is even a desktop application called FakeApp that lets anyone recreate one of these videos with their own dataset, no programming knowledge needed. Because of that, people have put Nicolas Cage’s face on everything and spread it around, because it’s the Internet.
How Does the DeepFake Algorithm Work?
Suppose we have a video of Harrison Ford in Indiana Jones, and we want to replace his face with Nicolas Cage’s, because why the heck not. Our first move is to collect training data: specifically, pictures of Harrison Ford and Nicolas Cage, which will be used to train our models later on.
Possible sources for these images include Google, DuckDuckGo, or Bing image search. Luckily, the face swap repository has scripts that automatically download large numbers of images from one of these sources to our home directory. Once we have a couple hundred images, we can place each set in its respective folder.
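The repository’s scripts handle the downloading for you, but for illustration, here’s a minimal sketch of the sorting step. It assumes a hypothetical naming convention where each downloaded file starts with the person’s name (e.g. `harrison_ford_001.jpg`); the function name and convention are my own.

```python
import os
import shutil

def organize_dataset(source_dir, people, dest_root="data"):
    """Sort downloaded images into one folder per person.

    Hypothetical helper: assumes each file name starts with the
    person's name, e.g. 'harrison_ford_001.jpg'.
    """
    # Create one destination folder per person.
    for person in people:
        os.makedirs(os.path.join(dest_root, person), exist_ok=True)
    # Move each file into the folder matching its name prefix.
    for fname in os.listdir(source_dir):
        for person in people:
            if fname.startswith(person):
                shutil.move(os.path.join(source_dir, fname),
                            os.path.join(dest_root, person, fname))
                break

# organize_dataset("downloads", ["harrison_ford", "nicolas_cage"])
```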
Notice, though, that these pictures are full of things unrelated to our characters: the surroundings contain all kinds of objects and distractions that have nothing to do with them.
So we’re going to want to perform face detection on each of these images. Pretty much every camera made in the past decade has some sort of real-time face detection algorithm built in. The Open Computer Vision library, OpenCV, lets us easily detect faces in images with a single function call.
What Method Is It Using?
But the way it does this in the background is by using a method called Histogram of Oriented Gradients, or HOG for short. It starts by converting our image to black and white to simplify it; we don’t need color data to find faces. Then it looks at every single pixel in the image, one at a time.
For each pixel, it looks at the pixels that surround it directly. The goal is to determine how dark the current pixel is relative to its immediate neighbors. It then draws an arrow indicating the direction in which the image gets darker. Once this is done for every pixel in the image, each pixel has been replaced by an arrow.
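This per-pixel step can be sketched in a few lines of NumPy. Assuming the grayscale image is a 2-D array, we compute for each pixel how strongly, and along which angle, the brightness changes (the function name is my own):

```python
import numpy as np

def gradient_arrows(gray):
    """Per-pixel gradient magnitude and direction, the raw ingredient of HOG.

    gray: 2-D float array (grayscale image).
    """
    # Brightness change along rows (gy) and columns (gx).
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)                      # how sharply it changes
    direction = np.degrees(np.arctan2(gy, gx)) % 180  # angle of the change
    return magnitude, direction
```

On an image that brightens steadily from left to right, every arrow points horizontally (direction 0°) with the same strength, which is exactly the invariance described above.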
These arrows, called gradients, show the flow from light to dark across the entire image. If we analyzed the pixels directly, very dark pictures and very bright pictures of the same individual would have completely different pixel values. But since we only consider the direction in which the brightness changes, both kinds of pictures end up with the same representation.
There’s a catch, though: recording the gradient for every single pixel takes far too much room and gives us too much detail. So the image is broken down into tiny squares, and the arrows inside each square are summarized into a single dominant direction.
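Continuing the sketch, here’s one way that summarization step might look: for each 16×16 cell we build a small histogram of arrow directions, weighted by strength, and keep only the strongest bin. This is a hypothetical, simplified helper; real HOG implementations keep the full histogram per cell rather than just its peak.

```python
import numpy as np

def dominant_directions(direction, magnitude, cell=16, bins=9):
    """Reduce a pixel-level gradient field to one arrow per cell.

    direction, magnitude: 2-D arrays as produced by a per-pixel
    gradient step; direction is in degrees within [0, 180).
    """
    h, w = direction.shape
    out = np.zeros((h // cell, w // cell))
    bin_width = 180.0 / bins
    for i in range(h // cell):
        for j in range(w // cell):
            d = direction[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = magnitude[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            # Histogram of directions, weighted by gradient strength.
            hist, _ = np.histogram(d, bins=bins, range=(0, 180), weights=m)
            out[i, j] = hist.argmax() * bin_width  # keep the strongest arrow
    return out
```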