How to Create an AI Clone

In this article, I will walk through, step by step, how to use the AI Clone function of a2e.ai.

What is AI Clone?

An AI clone lets you create videos of a specific user without that person being physically present or speaking. The AI trains a neural network that can synthesize the audio and images of that user. This training needs only a short video clip of the user.

A2E’s AI clone is also referred to as “instant avatar” or “small sample learning” on other avatar creation websites. In academia, this method is called “fine-tuning”.

We have trained a large AI model on audio and video data collected from thousands of people of different ethnicities and languages. This model is referred to as the base model. Each base model is assigned a version, e.g. V3, which you can see in the video creation tab.

Given video data of a new user (who is most likely not part of our base model's training data), we resume training the base model using only the new user’s data, so that it learns the facial appearance, lip motion, and audio rhythm of the new user. The result of this training process is called a “fine-tuned” model. A fine-tuned model can only create videos of that specific user, whereas a base model can create videos of many users. In return, a fine-tuned model produces much better video and audio of that user than the base model used directly.
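To make the idea of fine-tuning concrete, here is a toy sketch in Python. It is not A2E's model or training code: the "model" is just a line y = w*x + b, and the data sets `population` and `new_user` are made up for illustration. It only shows the pattern described above: train a base model on many users, then continue training from the base weights on one user's data.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, steps, lr=0.1):
    """Plain gradient descent on the MSE, starting from the given weights."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(11)]

# Hypothetical "many users": each user follows a different slope.
population = [(x, slope * x) for slope in (1.0, 2.0, 3.0, 4.0) for x in xs]

# Hypothetical new user who was not part of the base training data.
new_user = [(x, 5.0 * x + 0.5) for x in xs]

# Base model: trained from scratch on everyone.
base_w, base_b = train(0.0, 0.0, population, steps=2000)

# Fine-tuned model: training continues from the base weights,
# but only on the new user's data.
tuned_w, tuned_b = train(base_w, base_b, new_user, steps=2000)

base_loss = mse(base_w, base_b, new_user)
tuned_loss = mse(tuned_w, tuned_b, new_user)
print(f"error on new user, base model:       {base_loss:.4f}")
print(f"error on new user, fine-tuned model: {tuned_loss:.4f}")
```

In this toy setting the fine-tuned line fits the new user far better than the base line, but it fits the original population worse, which mirrors the trade-off between a fine-tuned model and a base model.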

Step 1: Prepare Data

High-quality data is the key to a high-quality AI clone. Make sure your data is a video clip in which the target user is speaking from start to end.

Criterion	Ideal	Bad
Video Length	60-120 seconds	Less than 60 seconds; longer than 300 seconds
Face	Front-facing face only	Side face; swinging face; upside-down face
# of persons	Only 1 person	Frames with no person at all; frames with more than 1 person
Audio Sync	Audio in sync with lip motion	Audio not in sync with lip motion
Speaking Volume	Mild and comfortable, like normal speech	Too loud; too quiet
Noise	No noise at all	Background noise; air conditioner noise; other people speaking; street noise
Resolution	1080p	Less than 720p
Trim	Starts and ends at frames in the middle of the person’s speech	Starts or ends at a frame where the person has stopped speaking
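If you prepare many clips, the numeric rows of the table can be checked automatically. The sketch below is only an illustration: `ClipInfo` and `check_clip` are hypothetical names, and the per-frame face counts are assumed to come from whatever tool you already use. The table does not say how to treat values between "ideal" and "bad", so the sketch reports those as notes rather than problems.

```python
from dataclasses import dataclass


@dataclass
class ClipInfo:
    """Hypothetical summary of a clip, filled in by your own tooling."""
    duration_s: float         # clip length in seconds
    height_px: int            # vertical resolution, e.g. 1080 for 1080p
    min_faces_per_frame: int  # fewest faces seen in any frame
    max_faces_per_frame: int  # most faces seen in any frame


def check_clip(clip):
    """Return (problems, notes) based on the table's thresholds."""
    problems, notes = [], []

    if clip.duration_s < 60:
        problems.append("shorter than 60 seconds")
    elif clip.duration_s > 300:
        problems.append("longer than 300 seconds")
    elif clip.duration_s > 120:
        notes.append("longer than the ideal 60-120 seconds")

    if clip.height_px < 720:
        problems.append("resolution below 720p")
    elif clip.height_px < 1080:
        notes.append("resolution below the ideal 1080p")

    if clip.min_faces_per_frame == 0:
        problems.append("some frames contain no person")
    if clip.max_faces_per_frame > 1:
        problems.append("some frames contain more than 1 person")

    return problems, notes


problems, notes = check_clip(ClipInfo(90, 1080, 1, 1))
print(problems, notes)
```

Rows such as face angle, lip sync, noise, and trim still need to be checked by watching and listening to the clip.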

Step 2: Train Voice and Video

2.1 Train the Voice Model

Click the “Voice Clone” tab and type a voice name that you can remember.

Upload the audio that you extracted from the video file. Several free tools can extract audio from a video file, e.g. CapCut, VLC media player, or ffmpeg.
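As one possible way to do the extraction with ffmpeg, here is a small Python sketch that builds and runs the command. It assumes ffmpeg is installed and on your PATH. The file names, the mono 44.1 kHz setting, and the WAV output are my own assumptions, not documented A2E requirements, so adjust them to whatever the upload page accepts.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path, audio_path, sample_rate=44100):
    """Build an ffmpeg command that copies only the audio track to a WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # mono (assumption: one speaker, one channel)
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-c:a", "pcm_s16le",      # uncompressed 16-bit PCM
        str(audio_path),
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
    return Path(audio_path)


# Example with hypothetical file names; print the command without running it.
print(" ".join(build_extract_command("my_clip.mp4", "my_voice.wav")))
```

Calling `extract_audio("my_clip.mp4", "my_voice.wav")` would then produce the audio file to upload.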

Alternatively, you can record live by clicking “Start Recording”.

The uploaded audio file should follow these rules:

Criterion	Ideal	Bad
Length	2-5 minutes	Less than 1 minute
Noise	No noise	Background noise; air conditioner noise; other people speaking; street noise
# of persons	Only 1 person	Audio containing more than 1 person speaking
Speaking Volume	Mild and comfortable, like normal speech	Too loud; too quiet
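Before uploading, you can measure the length and overall loudness of a WAV file with Python's standard library. This is a rough sketch that assumes 16-bit PCM WAV input. The function names are hypothetical, and the table gives no numeric volume limits, so the sketch only reports the loudness level and leaves the judgement to you. Lengths between 1 and 2 minutes or above 5 minutes are not covered by the table and are labeled "in between".

```python
import array
import math
import sys
import wave


def wav_stats(wav_file):
    """Return (duration in seconds, RMS level from 0.0 to 1.0) of a 16-bit WAV."""
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        n_frames = w.getnframes()
        rate = w.getframerate()
        samples = array.array("h", w.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    duration = n_frames / rate
    if not samples:
        return duration, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms / 32768.0


def rate_length(duration_s):
    """Classify the audio length using the table's thresholds."""
    if duration_s < 60:
        return "bad"
    if 120 <= duration_s <= 300:
        return "ideal"
    return "in between"
```

For example, `duration, level = wav_stats("my_voice.wav")` followed by `rate_length(duration)` tells you whether the recording falls in the 2-5 minute range.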

Click “Start Training” and agree to the credits that you will be charged.

Wait for about 1 minute. After that, you can use the voice in English. If you want to use other languages, e.g. Chinese, you will need to wait for 24 hours.

2.2 Train the Video Model

Click the “Video Twin” tab.

Click create new. Upload the video of the user.

Hover the mouse over the thumbnail picture. Click Animate.

Wait for about 2 hours. Then go to the create video tab and refresh your browser. You will see the thumbnail of your video clone.

Step 3: Use the AI Clone

Click the create video tab. You will see the picture of your video clone with a diamond sticker.

Select your AI clone. Type some text in the text box on the left. Select the appropriate language. Select “Voice Clone / XXX” from the Voices drop-down menu, where XXX is the voice name you chose.

Click “Preview Audio”. Wait until the spinning progress indicator finishes, then listen to the audio.

If you are happy with the generated audio, click “Generate Video”. To listen to the generated audio again, click the speaker icon.

Wait several minutes for your result. You will receive a success notice by email. Go to the “My Assets” tab to view or download your result.