Captions and text overlays get in the way when you want to reuse videos for different purposes. Traditional caption removal tools fall short: they require manually selecting text regions, which is tedious and inefficient. This is even harder with dynamic captions, which are common in today’s short videos. Our solution is a fully automated API that removes captions with no manual effort. It uses OCR to detect the text, then a deep learning-based inpainting technique fills in the detected regions, leaving your video clean and ready for reuse.
Let’s go through the caption removal API step by step. This example uses the following video for illustration. Download the video and host it at a URL that the A2E API can access.
First, get your login token, and then get your API token:
Next, call {{base_url_us}}/api/v1/userCaptionRemoval/start, providing the source_url (the URL of your hosted video) and a name for the task.
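The start call can be sketched in Python as follows. This is a minimal sketch under several assumptions that the text above does not confirm: that the endpoint expects an HTTP POST, that source_url and name travel as a JSON body, and that the API token is sent as a bearer token in the Authorization header. The function names and the example host are hypothetical; substitute your own value for {{base_url_us}}.

```python
import json
import urllib.request

START_PATH = "/api/v1/userCaptionRemoval/start"  # path taken from the text above


def build_start_request(base_url, api_token, source_url, name):
    """Build (but do not send) the request that starts a caption removal task."""
    # Assumption: the two fields are sent as a JSON object in the request body.
    body = json.dumps({"source_url": source_url, "name": name}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + START_PATH,
        data=body,
        method="POST",  # assumption: the source does not name the HTTP method
        headers={
            "Content-Type": "application/json",
            # Assumption: the API token is passed as a bearer token.
            "Authorization": f"Bearer {api_token}",
        },
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would then look like `send(build_start_request(base_url, api_token, "https://example.com/video.mp4", "demo"))`, where the video URL and name are placeholders. Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before anything goes over the network.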
The response contains the task ID in the “_id” field, in this example 66ed3ff114f81f76b90f9ef1. Use this ID to query the task status by sending an HTTP GET request to {{base_url_us}}/api/v1/userCaptionRemoval/66ed3ff114f81f76b90f9ef1
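The status query can be sketched the same way. The GET method and the URL pattern come from the text above; the bearer-token header, the function names, and the example host are assumptions made for illustration.

```python
import json
import urllib.request


def build_status_request(base_url, api_token, task_id):
    """Build the GET request that queries one caption removal task."""
    url = f"{base_url.rstrip('/')}/api/v1/userCaptionRemoval/{task_id}"
    return urllib.request.Request(
        url,
        method="GET",
        # Assumption: the API token is passed as a bearer token.
        headers={"Authorization": f"Bearer {api_token}"},
    )


def fetch_status(base_url, api_token, task_id):
    """Send the status request and return the decoded JSON response."""
    request = build_status_request(base_url, api_token, task_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For the task above, `fetch_status(base_url, api_token, "66ed3ff114f81f76b90f9ef1")` would return the parsed status document.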
At first the status shows “processing”, which means the task is still running and we need to wait. Because many AI algorithms are involved in the caption removal process, the current processing speed is about 1.5 frames per second. In other words, a 1-minute (60-second) video at 30 FPS contains 30 × 60 = 1800 frames, so you will need to wait about 1800 / 1.5 = 1200 seconds (20 minutes).
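The same estimate can be written as a small helper, which is handy for choosing a polling interval or timeout. The function name is hypothetical, and the default of 1.5 frames per second is only the current figure quoted above, so treat the result as a rough guide.

```python
def estimated_wait_seconds(duration_seconds, fps, frames_per_second_processed=1.5):
    """Estimate processing time as total frames divided by processing speed."""
    total_frames = duration_seconds * fps
    return total_frames / frames_per_second_processed


# The example from the text: a 60-second video at 30 FPS.
print(estimated_wait_seconds(60, 30))  # 1200.0
```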
Wait until “current_status” shows “completed”, then download the result from the “result_url” field of the response JSON.
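The waiting step can be sketched as a polling loop. This sketch assumes that “current_status” and “result_url” sit at the top level of the dictionary returned by the status call; if the API wraps them in an outer object, unwrap it first. The function name, the polling interval, and the timeout are illustrative choices, and `fetch_status` is any zero-argument callable you supply that returns the latest status as a dictionary.

```python
import time


def wait_for_result(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                    sleep=time.sleep):
    """Poll until current_status is "completed", then return result_url.

    Any status other than "completed" is treated as "keep waiting", since the
    text only names "processing" and "completed".
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("current_status") == "completed":
            return status["result_url"]
        if waited >= timeout_seconds:
            raise TimeoutError(f"task not completed after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

Once the URL is returned, the standard library can save the file, for example with `urllib.request.urlretrieve(result_url, "clean_video.mp4")`, where the output filename is a placeholder. Given the processing speed of about 1.5 frames per second, a polling interval of tens of seconds is usually enough.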