Learn Without Walls
AI for Creators & Researchers: Day 20 of 30

Captions

Record, caption, edit, enhance, and publish — all in one mobile app.

~30 min | Free with limits | Paid: $10–31/mo
You're walking and a thought lands. You want to capture it as a talking-head video before it evaporates. Opening a camera app, recording, transferring to a laptop, editing, exporting, re-uploading: by the time you finished, the thought would be gone. What if the whole production pipeline fit in your pocket?

Why this tool matters

Captions is the all-in-one AI video studio designed for phone-first creators. It combines a teleprompter for recording, AI-powered auto-captions in over 100 languages, eye-contact correction (you can look at the script and the AI makes you appear to be looking at the camera), auto-editing (removing silences, ums, and retakes), AI-generated B-roll to illustrate your words, color and lighting enhancement, and direct publishing to TikTok, Reels, and Shorts. All of it runs on an iPhone or Android device; most of it runs locally.

This tool is best understood as the answer to “what replaces CapCut for creators who want AI in the loop.” CapCut is the dominant free mobile video editor, used by essentially every creator shooting on their phone. Captions is the AI-augmented competitor: slightly pricier, dramatically smarter. For creators who value speed (record-to-publish in 15 minutes) and consistency (every video has the same caption style, the same pacing, the same polish), Captions is a genuine productivity unlock.

Because it runs on your phone, Captions unlocks a different shape of content: the just-captured thought, the in-the-moment response, the on-location reflection. For educators and researchers who already do thinking-out-loud content, Captions makes it frictionless to ship.

Setup

Before you start

Installation: download Captions from the App Store or Google Play. Sign up free to evaluate. Paid tiers: Pro ($10/mo) for extended features; Scale ($31/mo) for AI avatars and advanced generation.

Hardware: a recent iPhone (Pro models preferred for low-light quality) or mid-range Android. Earbuds with a built-in mic dramatically improve audio quality — use them for any published content.

Walkthrough

Step 1: Create a new project and pick a template

Open Captions. Start a new project. Pick a caption style from the library — bold centered-yellow is the TikTok standard; cleaner black-on-white works for LinkedIn. You can customize later; the point now is consistency, not perfection.

Step 2: Use the teleprompter

If you have a script, paste it into the Teleprompter. Captions scrolls the text over the camera preview as you record. The scroll speed adapts to your reading speed. Position the phone at eye level for natural-looking delivery.

Step 3: Record

Tap record. Deliver your lines. Don't worry about stumbles, long pauses, or restarts — Captions will clean them up. If you blow a take completely, restart from the beginning of that sentence; don't stop the recording.

Step 4: Run the first AI edit pass (silences and retakes)

After recording, tap AI Edit. Captions identifies and removes: long silences (>1 second), filler words (um, uh, like), and detected retakes (where you said the same sentence twice in a row). Review the result; undo individual cuts if they lost a beat you wanted.
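To make the three cleanup rules concrete, here is a minimal Python sketch of how such a pass could work on a transcript with word timestamps. It is an illustration only, not how Captions is implemented: the `Word` type, `plan_cuts`, and `drop_retakes` are hypothetical names, and the filler list and 1-second threshold are simply the values mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


FILLERS = {"um", "uh", "like"}  # assumption: the filler list named in this step
MAX_SILENCE = 1.0               # assumption: the ">1 second" threshold from this step


def normalize(text: str) -> str:
    """Lowercase and drop punctuation and spaces so 'Um,' matches 'um'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def plan_cuts(words: list[Word]) -> list[tuple[float, float]]:
    """Return (start, end) spans to remove: filler words and long silences."""
    cuts = []
    for i, word in enumerate(words):
        if normalize(word.text) in FILLERS:
            cuts.append((word.start, word.end))
        if i + 1 < len(words):
            gap = words[i + 1].start - word.end
            if gap > MAX_SILENCE:
                cuts.append((word.end, words[i + 1].start))
    return cuts


def drop_retakes(sentences: list[str]) -> list[str]:
    """Keep only the last take when the same sentence is said twice in a row."""
    kept: list[str] = []
    for sentence in sentences:
        if kept and normalize(kept[-1]) == normalize(sentence):
            kept[-1] = sentence  # the later take replaces the earlier one
        else:
            kept.append(sentence)
    return kept
```

A word-level rule like this would cut every "like", including meaningful ones, which is one reason to review the cuts and undo any that removed something you wanted.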

Step 5: Apply eye-contact correction (if needed)

If you read from the teleprompter and your eyes drifted off-camera, enable Eye Contact. The AI re-renders your eyes to look straight at the lens. It's subtle and remarkable — and also slightly eerie if you study it too closely. Use it for teleprompter recordings, not for authentic talking-head vlogs where off-camera eyes read as human.

Step 6: Add captions + B-roll + publish

Auto-captions are already in place. Customize the timing and styling. Add AI B-roll by tapping Add B-roll — Captions generates a short clip illustrating each of your key points. Preview the full video. Export directly to TikTok, Reels, Shorts, or as a file.

Your turn

Exercise 1

Basic: A captured thought, shipped in 15 minutes

~15 min | Level: Beginner

Think of something you've wanted to say to your audience this week. Write 3-5 sentences of script. Open Captions. Record with the teleprompter. Let AI Edit clean it up. Add captions and export.

Publish it to one platform where your audience is. Note the total elapsed time from idea to published post. Anything under 20 minutes is a huge win.

Exercise 2

Advanced: Five-video batch day

~90 min | Level: Advanced

Pick a batch day on your calendar. Plan five short videos in advance — five ideas, five 60–90-word scripts, five hooks. Wear the same outfit. Pick a good light source.

In one 90-minute session: record, AI-edit, caption, and export all five. Save them as drafts and schedule them across the next two weeks of publishing. Batching is the productivity move that separates creators who ship consistently from those who don't; Captions makes batching viable on a phone.

After publishing all five over two weeks, write a 150-word reflection: what did the batch day feel like compared with recording one at a time? Which of the five performed best, and why? What do you want to do differently on your next batch day?
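If you draft the five scripts in a text file before the batch day, a quick word count helps keep each one inside the 60–90-word target. The helper below is a small optional sketch; `check_script` is a hypothetical name, and the only numbers it uses are the range from this exercise.

```python
MIN_WORDS, MAX_WORDS = 60, 90  # the script length range from this exercise


def check_script(script: str) -> tuple[int, bool]:
    """Return (word_count, within_range) for one script."""
    count = len(script.split())
    return count, MIN_WORDS <= count <= MAX_WORDS
```

Run it on each draft and trim or expand any script that falls outside the range before you record.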

Pitfalls and pro tips

Eye-contact correction at length feels unsettling. For a 20-second clip where the speaker glanced at the teleprompter a few times, eye-contact correction is invisible and useful. For a 3-minute video of sustained direct eye contact, the correction starts to read as unnatural. Use it sparingly.

AI B-roll is decorative, not illustrative. The AI-generated B-roll is useful filler but rarely adds real information. For instructional content, shoot real B-roll (a photo, a screen recording, a demonstration) and layer it in. AI B-roll is fine for talking-head expansions of a concept, risky for how-to content.

Battery and storage. A batch day with Captions will drain a phone battery in 2 hours and chew through 10+ GB of storage. Plug in during the session; clear cache between projects; or invest in an external battery pack and a sync-to-cloud workflow.

How it compares

Among alternatives

Captions's competitors include CapCut (free, more manual, the dominant mobile video editor), InShot (similar to CapCut, older), Splice (faster UI, fewer AI features), and Descript's mobile app (Day 13, desktop-first but growing on mobile). For AI-first phone-based creation, Captions leads. For free, flexible, non-AI mobile editing, CapCut still dominates. Most creators use both: CapCut for heavy edits, Captions for speed and talking-head polish.

When to use — and when not to

Use Captions when you're a phone-first creator shipping talking-head content, when speed from idea to published matters more than cinematic polish, or when you need consistent caption styling and auto-editing at batch scale.

Do not use Captions when you're producing cinematic video (use Runway, covered on Day 16, or a desktop editor like Descript or Premiere instead), when you need multi-track audio and complex layering (desktop wins), or when authenticity of your real delivery matters more than polish (skip Eye Contact correction; embrace the human).

Further reading