Introducing the A2E Caption Removal API


Captions and text overlays get in the way when you want to reuse a video for a different purpose. Traditional caption removal tools fall short because they require you to select text regions manually, which is tedious and inefficient. The problem is worse with dynamic captions, which are common in today’s short videos. Our solution is a fully automated API: OCR detects the text, and a deep learning-based inpainting technique fills in the detected regions, leaving the video clean and ready for reuse with no manual effort.

Original video with captions, and the resulting video after caption removal.

Let’s go through the caption removal API step by step. This example uses the following video for illustration. Download the video and host it at a URL that the A2E API can access.

Input video for the demo.

First, get your login token, and then get your API token:

Next, call {{base_url_us}}/api/v1/userCaptionRemoval/start, providing the source_url and the name.
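
As a rough sketch, the start request could be built in Python like this. The post does not state the HTTP method, the body format, or the authorization scheme, so the POST method, the JSON body, and the Bearer header below are all assumptions. BASE_URL and the token are placeholders standing in for {{base_url_us}} and your own API token.

```python
import json
import urllib.request

# Placeholder: substitute the real value of {{base_url_us}}.
BASE_URL = "https://example.invalid"


def build_start_request(base_url, api_token, source_url, name):
    """Build (but do not send) the request that starts a caption removal task.

    Assumptions: POST, JSON body, Bearer token header.
    """
    payload = json.dumps({"source_url": source_url, "name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/userCaptionRemoval/start",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_start_request(
    BASE_URL,
    "YOUR_API_TOKEN",
    "https://example.invalid/input.mp4",  # hypothetical video URL
    "caption removal demo",               # hypothetical task name
)
# To actually send it: urllib.request.urlopen(req), then parse the JSON body.
```

Building the request separately from sending it makes it easy to inspect the URL and payload before spending processing time on a real video.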

The “_id” field of the response contains the task ID, here 66ed3ff114f81f76b90f9ef1. Use this ID to query the task status by sending an HTTP GET request to {{base_url_us}}/api/v1/userCaptionRemoval/66ed3ff114f81f76b90f9ef1

At first the status is “processing”, which means we need to wait. Because the caption removal pipeline runs several AI algorithms, the current processing speed is about 1.5 frames per second. In other words, a 1-minute (60-second) video at 30 FPS contains 30*60=1800 frames and takes about 1800/1.5=1200 seconds (20 minutes) to process.
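
The same arithmetic can be wrapped in a small helper to estimate the wait for other videos. This is only a sketch: the function name is made up for illustration, and the default of 1.5 frames per second is the approximate speed quoted above, which may change.

```python
def estimate_wait_seconds(duration_seconds, fps, processing_fps=1.5):
    """Estimate processing time as total frames divided by processing speed."""
    total_frames = duration_seconds * fps
    return total_frames / processing_fps


# The 1-minute, 30 FPS example from the text.
print(estimate_wait_seconds(60, 30))  # 1200.0
```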

Wait until “current_status” shows “completed”, then download the result from the “result_url” field of the response JSON.
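
A polling loop for this step might look like the sketch below. The field names “current_status” and “result_url” come from the response described above. Everything else is an assumption: the Bearer header, the possibility that the fields sit under a "data" key, the 30-second polling interval, and the helper names themselves.

```python
import json
import time
import urllib.request


def fetch_status(base_url, api_token, task_id):
    """Send the GET status request and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/userCaptionRemoval/{task_id}",
        headers={"Authorization": f"Bearer {api_token}"},  # assumed auth scheme
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(fetch, poll_seconds=30, max_polls=200, sleep=time.sleep):
    """Call fetch() until the task is completed, then return its result_url."""
    for _ in range(max_polls):
        body = fetch()
        # Assumption: the task fields may be nested under "data" or top-level.
        task = body["data"] if isinstance(body.get("data"), dict) else body
        if task.get("current_status") == "completed":
            return task["result_url"]
        sleep(poll_seconds)
    raise TimeoutError("task did not complete within the polling budget")


# Example wiring (not executed here; needs a real base URL and token):
# url = wait_for_result(
#     lambda: fetch_status(BASE_URL, "YOUR_API_TOKEN", "66ed3ff114f81f76b90f9ef1")
# )
```

Passing the fetch function in as an argument keeps the loop independent of the network call, so it can be exercised with canned responses before pointing it at the real endpoint.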

Result video from the caption removal API. The text is removed automatically.