Subtitle overlay

Burn animated word-by-word captions onto a video. Pick a preset, tweak typography, position and animation, and download a captioned MP4 - all in your browser.

  • Free
  • Runs in your browser
  • No watermark
  • Per-word animation
  • No signup

Render done. Auto-post the captioned reel with Storrito.

Drop the captioned MP4 into Storrito, pick the day and time, and it auto-posts as an Instagram Reel, Story, or TikTok video. Add link, poll or quiz stickers if it's a Story. Free mode included.

Try Storrito free →

How to use it

  1. Drop a video

    Click the preview area or drop an mp4/mov/webm onto the page. The video stays on your device.

  2. Add a transcript

If you transcribed this clip in our Subtitle Generator, the words load automatically. Otherwise, drop a Whisper-shape JSON file onto the page or pick one via the prompt that appears.

  3. Style and render

    Pick a preset, fine-tune typography, colors, position, and animation. Hit Render & download to get an MP4 with captions burned in.

Why burn captions in the browser

Most short-form social video lives or dies on the strength of its captions. Viewers scroll with the sound off; the words on screen are doing the work. The traditional path to animated, word-by-word captions is a desktop editor (Premiere, DaVinci, CapCut) or a paid SaaS that uploads your footage. For a 30-second TikTok or a 60-second Reel, neither is necessary - modern browsers can decode, composite, and re-encode video locally with a few hundred kilobytes of JavaScript.

This tool wraps that pipeline in a preset-driven UI: drop a video and a transcript, pick a look, render an MP4 you can post. The preview you see while scrubbing the player is exactly what gets burned into the file - same draw function, same fonts, same animation curves.

What's actually happening under the hood

When you drop a video, the browser hands the file to mediabunny for container demuxing. The live preview plays the file in a standard <video> element and overlays a <canvas> driven by a requestAnimationFrame loop that reads video.currentTime, finds the active word, and draws the chosen typography and animation each frame.
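A minimal sketch of that preview loop, assuming a flat per-word list with start/end times in seconds; helper names like `activeWordIndex` and `drawWord` are illustrative, not the tool's actual API:

```javascript
// Find the word whose [start, end) interval contains time t (seconds).
// Returns -1 when no word is active. Assumes words are sorted by start.
function activeWordIndex(words, t) {
  for (let i = 0; i < words.length; i++) {
    if (t >= words[i].start && t < words[i].end) return i;
    if (words[i].start > t) break; // sorted, so we can stop early
  }
  return -1;
}

// Per-frame overlay: clear the canvas, find the active word, draw it.
// Defined but not invoked here; in the browser it runs once per video.
function startOverlayLoop(video, canvas, words, drawWord) {
  const ctx = canvas.getContext('2d');
  function frame() {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    const i = activeWordIndex(words, video.currentTime);
    if (i !== -1) drawWord(ctx, words[i], video.currentTime);
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}
```

Because the loop reads the clock from the <video> element itself, the overlay stays in sync even when you scrub or pause.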

When you click Render & download, mediabunny's Conversion API replays the video frame-by-frame through a process callback. Each frame goes onto a working canvas, the same overlay draw function lays the captions on top, and the result is fed back to the encoder. Audio passes through unchanged - no re-encoding - so the original audio quality is preserved.
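In shape, the burn-in pass looks something like the sketch below. The `processFrame` hook and its arguments are assumptions about what a per-frame callback could look like, not mediabunny's documented API; `wordProgress` stands in for whatever animation curve a preset uses:

```javascript
// Pure helper: how far (0..1) time t is through a word's pop-in
// animation window. The same curve drives preview and render.
function wordProgress(word, t, animMs = 200) {
  const p = (t - word.start) / (animMs / 1000);
  return Math.min(1, Math.max(0, p));
}

// Hypothetical per-frame callback: paint the decoded frame onto the
// working canvas, lay the active captions on top, and let the encoder
// pick the canvas back up. Audio is never touched by this path.
function processFrame(ctx, frameBitmap, t, words, drawWord) {
  ctx.drawImage(frameBitmap, 0, 0);
  for (const w of words) {
    if (t >= w.start && t < w.end) drawWord(ctx, w, wordProgress(w, t));
  }
}
```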

The fonts are loaded once at module init via the FontFace API, fetched from the same Google Fonts CDN the rest of the web uses. Pacifico, Anton, Bebas Neue and friends are deliberately familiar choices, unlikely to surprise the eye.
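A sketch of that init step, assuming fonts are registered with the document so both the preview canvas and the render canvas can use them; the URL helper, the font list, and `fontUrlFor` are illustrative, while the FontFace calls are the standard browser ones:

```javascript
// Build a Google Fonts CSS2 URL for a list of families
// (spaces in family names become '+').
function googleFontsUrl(families) {
  const parts = families.map((f) => 'family=' + f.replace(/ /g, '+'));
  return 'https://fonts.googleapis.com/css2?' + parts.join('&') + '&display=swap';
}

// Load each family via the FontFace API and register it with the
// document. Defined but not invoked here; `fontUrlFor` would resolve
// a family name to its woff2 URL.
async function loadCaptionFonts(families, fontUrlFor) {
  for (const name of families) {
    const face = new FontFace(name, `url(${fontUrlFor(name)})`);
    await face.load();
    document.fonts.add(face);
  }
}
```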

Pairing with the Subtitle Generator

The companion Subtitle Generator tool produces exactly the JSON shape this tool expects - per-word start/end times plus the text. If you transcribe a video there, then open this tool and drop the same video in, the per-word data loads automatically from a local IndexedDB cache. No second upload of the JSON is needed.
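That hand-off could look roughly like the sketch below. The key scheme and the store name are assumptions, not the tool's actual schema:

```javascript
// Derive a stable-ish cache key from file identity. Name, size, and
// mtime together are usually enough to re-match the same local file.
function transcriptCacheKey(file) {
  return `${file.name}:${file.size}:${file.lastModified}`;
}

// Look the key up in IndexedDB; resolves to the cached per-word list
// or null when this clip was never transcribed locally.
function loadCachedTranscript(db, file) {
  return new Promise((resolve, reject) => {
    const req = db
      .transaction('transcripts')
      .objectStore('transcripts')
      .get(transcriptCacheKey(file));
    req.onsuccess = () => resolve(req.result ?? null);
    req.onerror = () => reject(req.error);
  });
}
```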

If you already have a transcript from somewhere else (the OpenAI Whisper CLI, faster-whisper, WhisperX, AssemblyAI's word-level output), drop that JSON file on the page and it'll be normalised to the same shape on load. The accepted formats are listed in the prompt that appears once a video is in.
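A minimal sketch of that normalisation, assuming the Whisper CLI's nested segments/words shape on one side and a flat word list on the other; treat the exact field handling as an assumption about one common case, not the full set of accepted formats:

```javascript
// Normalise a transcript to a flat per-word list: [{ text, start, end }].
// Handles an already-flat `words` array and the Whisper-style shape of
// segments containing nested word arrays.
function normaliseTranscript(json) {
  if (Array.isArray(json.words)) return json.words; // already flat
  const out = [];
  for (const seg of json.segments ?? []) {
    for (const w of seg.words ?? []) {
      // Whisper pads words with leading spaces; trim for display.
      out.push({ text: w.word.trim(), start: w.start, end: w.end });
    }
  }
  return out;
}
```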

When to reach for something else

If you need diarisation across multiple speakers, scripted multilingual captions with manual timing tweaks, or you're captioning a 30-minute talk, this isn't the right tool - reach for a desktop editor or a server-side workflow. For short-form social where the captions are the point, browser-local rendering is fast enough, private by default, and free.

Frequently asked questions

Is my video uploaded anywhere?

No. Audio passthrough, frame composition, and re-encoding all happen in your browser via the WebCodecs API. Your video never touches a server.

What's a 'Whisper-shape' JSON?

It's the per-word JSON format OpenAI's Whisper CLI emits, with segments containing nested word arrays. Our Subtitle Generator exports the same shape, and most modern transcription pipelines (faster-whisper, WhisperX) emit a compatible shape, so no adapter code is needed.
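An illustrative excerpt of that shape, with made-up timing values:

```json
{
  "segments": [
    {
      "start": 0.0,
      "end": 1.4,
      "text": " Captions do the work.",
      "words": [
        { "word": " Captions", "start": 0.0, "end": 0.52 },
        { "word": " do", "start": 0.52, "end": 0.71 },
        { "word": " the", "start": 0.71, "end": 0.84 },
        { "word": " work.", "start": 0.84, "end": 1.4 }
      ]
    }
  ]
}
```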

How long does rendering take?

Roughly real-time on a modern laptop with hardware video encoding - a 1-minute clip renders in about a minute. Older machines or unusual resolutions can take noticeably longer; the progress bar tracks it.

Will the rendered MP4 look exactly like the preview?

Yes. Both the live preview and the burn-in render call the same draw function, frame-for-frame, so what you see is what you ship.

Can I save and reuse my custom style?

Not yet. Presets are a starting point; tweak from there. We're considering a saved-style feature - if you'd find it useful, drop us a line via the chat.

What if the rendered MP4 has no audio or fails to encode?

Some unusual codecs decode for playback but can't be re-encoded in the browser. Try a clip with H.264 video and AAC audio - those work everywhere. An MP4 from a phone or screen recorder is almost always fine.

MP4 in hand. Schedule the post.

Auto-post your captioned clip as a Reel, Story or TikTok with Storrito. Web Story editor with interactive stickers. Free mode to try it.

Try Storrito free →

Free mode included. Instagram Stories, Reels and TikTok videos.