Subtitle generator

Pick a video, get an editable transcript with per-word timings back. Whisper runs locally in your browser - no upload, no signup, no API key.

  • Free
  • Runs in your browser
  • Per-word timings
  • No signup

Captions ready. Schedule the Reel or Story with Storrito.

Drop your captioned clip into Storrito, pick a day and time, and it auto-posts as an Instagram Reel, Story or TikTok video. Add link, poll or quiz stickers if you're posting it as a Story. Free mode included.

Try Storrito free →

How to use it

  1. Pick a video

    Drop in an mp4, mov or webm clip. Audio is extracted in your browser - the file never leaves your device.

  2. Generate the transcript

    Whisper runs locally on your CPU. The progress bar tracks the model download (first time only) and the transcription.

  3. Edit and download

    Click any word to seek the video to that point, fix typos in place, then download an .srt or per-word .json.

Why captions still drive watch time on Instagram, TikTok, and Reels

Most people scroll social video with the sound off. They are on the train, in a meeting, in bed at 11pm with someone asleep next to them. If your clip cannot hold attention with the words on screen alone, the thumb keeps moving, and the algorithm reads the early drop-off as a signal that the video is not worth pushing. Captions fix that: a viewer who can follow along with the sound off has a reason to keep watching.

The hard part has never been "should I add captions." It is "how do I add captions without paying a monthly fee or learning a video editor I will only open once a month." This is the gap the Subtitle Generator fills.

How the Subtitle Generator turns a video into an editable transcript

Drop a video onto the page and the tool pulls the audio out of the file inside your browser, runs the words through a transcription model, and gives you back a transcript with a time for every word. Click any word in the transcript and the video jumps to that moment. Edit any word that came out wrong and the timing follows along. When the transcript looks right, download it as a subtitle file you can use in any video editor, or as a per-word file you can drop into the Subtitle Overlay tool.

Nothing leaves your computer. The video, the audio, the transcript, all of it stays in your browser tab. There is no upload, no signup, and no watermark on the result.

A few practical things to know:

  • Pick the language at the top of the panel before you click Generate. Picking the right language up front gives a much cleaner first pass than letting the model guess.
  • The first run downloads the transcription model, which is around 140 megabytes at the recommended quality. After that it is cached, so later runs skip the download.
  • Aim for clear speech, one speaker, and short clips. Five minutes or less is the sweet spot.

Why editing the transcript matters before you export

The transcript editor on the right side is the part most people skip and then regret. The model is good but not perfect. It will write "they're" when the speaker said "their," hear a brand name as an ordinary word, and sometimes drop a beat at the end of a sentence.

The fix is fast. Click into the transcript and edit it like a normal text box; the timing data updates underneath you as you go. If you want to listen back while you edit, leave the "Play video when clicking a word" box checked and the video will jump and play from each word you click. Once the words look right, hit the chevron next to the export button. Use the subtitle file in editors like CapCut or DaVinci Resolve, or use the per-word file in the Subtitle Overlay tool, which animates the captions onto the video for you.
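If you would rather build the subtitle file yourself from the per-word export, the conversion is short. The sketch below is not the tool's own exporter: it assumes a flat list of `{word, start, end}` entries with times in seconds, and the four-words-per-caption grouping (`max_words`) is an arbitrary choice made for illustration.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 4) -> str:
    """Group words into short captions and render them as SRT cues.

    max_words is a hypothetical knob, not a setting of the tool.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(str(w["word"]).strip() for w in chunk)
        start = srt_timestamp(float(chunk[0]["start"]))
        end = srt_timestamp(float(chunk[-1]["end"]))
        # Each cue: running index, time range, caption text, blank line.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Made-up sample data, only to show the call.
sample_words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " world", "start": 0.4, "end": 0.9},
    {"word": " again", "start": 1.0, "end": 1.5},
]
srt_text = words_to_srt(sample_words, max_words=2)
```

Short captions of a few words each tend to suit vertical video, but the right grouping depends on the clip, so treat `max_words` as something to tune.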

When to skip the Subtitle Generator and use a paid service

The Subtitle Generator is built for the clip a marketer or creator cuts in five minutes between meetings. It is not the right tool for a 45-minute podcast, a panel with three people talking over each other, or a heavily accented voiceover where word-perfect transcription matters legally.

For those cases a paid service that runs on a server farm will give you better accuracy and proper speaker separation. For the 30-second TikTok and the 60-second Instagram Reel, which is what most teams post day to day, browser-local transcription is good enough, free, and private.

Frequently asked questions

Is my video uploaded anywhere?

No. The whole pipeline runs in your browser - audio extraction, model download (cached locally afterwards), transcription. Your video, audio and transcript never touch a server; the only network traffic is the model download.

What languages are supported?

Auto-detect plus English, Spanish, German, French, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Russian, Hindi and Arabic. Quality varies by language and model size; English is the strongest.

How accurate is it?

It depends on the model:

  • Tiny (~75 MB) is rough.
  • Base (~140 MB) is good for most clear-speech material.
  • Small (~470 MB) is the best the browser-side build supports.

Larger models are more accurate but take longer to download and run.

How long can the video be?

The practical browser limit is around 5 minutes. Longer clips will work, but transcription gets slow and memory pressure rises. For long-form content, split the video first or use a desktop tool.

What does the per-word JSON look like?

It's the same shape OpenAI's Whisper CLI emits: a top-level `text` plus a `segments` array, each with `start`, `end`, `text` and a nested `words` array of `{word, start, end}` entries. Other tools in the Whisper ecosystem, such as faster-whisper and WhisperX, organise their output into the same segments and words, so the file is easy to adapt to other pipelines.
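As a rough illustration of that shape, here is a small hand-written sample and a helper that flattens it into one list of words. The text and timings are made up, times are assumed to be in seconds, and the leading space on each word follows what Whisper's own output usually looks like; check a real export before relying on either detail.

```python
import json
from typing import Any, Dict, List

# Hand-written sample in the described shape; values are invented.
SAMPLE_JSON = """
{
  "text": " Captions keep viewers watching.",
  "segments": [
    {
      "start": 0.0,
      "end": 1.8,
      "text": " Captions keep viewers watching.",
      "words": [
        {"word": " Captions", "start": 0.0, "end": 0.5},
        {"word": " keep", "start": 0.5, "end": 0.8},
        {"word": " viewers", "start": 0.8, "end": 1.3},
        {"word": " watching.", "start": 1.3, "end": 1.8}
      ]
    }
  ]
}
"""


def flatten_words(transcript: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every word entry from every segment, in order."""
    return [
        word
        for segment in transcript.get("segments", [])
        for word in segment.get("words", [])
    ]


transcript = json.loads(SAMPLE_JSON)
all_words = flatten_words(transcript)
```

A flat word list like `all_words` is usually the easiest starting point for anything that works word by word, such as animated captions.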

Why are some word timings off after I edit?

If you insert words that weren't spoken, those new words get approximate timings carved out of the duration of the word you were editing. Real-word edits (typo fixes, splits on space, trailing-space merges) keep their original timings exactly.
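To make "carved out" concrete, here is one plausible way to share a word's time span among the words that replace it. This is an assumption for illustration, not the tool's actual rule: the function name `split_timing` is hypothetical, and it gives each new word a slice proportional to its character count.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def split_timing(start: float, end: float, new_words: List[str]) -> List[Word]:
    """Divide the span [start, end] among new_words by character count.

    Assumed heuristic: longer words get a proportionally longer slice.
    """
    words = [w for w in new_words if w]  # ignore empty strings
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    result: List[Word] = []
    cursor = start
    for i, w in enumerate(words):
        if i == len(words) - 1:
            word_end = end  # last word ends exactly where the original did
        else:
            word_end = cursor + duration * len(w) / total_chars
        result.append(
            {"word": w, "start": round(cursor, 3), "end": round(word_end, 3)}
        )
        cursor = word_end
    return result


# One spoken word from 1.0 s to 2.0 s is replaced by two typed words.
pieces = split_timing(1.0, 2.0, ["to", "day"])
```

Whatever the exact rule, timings produced this way are estimates, which is why inserted words can look slightly early or late against the audio.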

Captions in hand. Schedule the post.

Auto-post the captioned clip as a Reel, Instagram Story or TikTok with Storrito. Web Story editor with interactive stickers. Free mode to try it.
