Descript
Delete a word in the script; the video cuts that word out. It's that simple.
You just recorded a 20-minute podcast episode. There are 43 filler words (“um,” “like,” “you know”) scattered throughout. There's a remark at 4:12 you want to cut, and an awkward 15-second pause at 11:30 you want to trim. In a traditional audio editor, that can easily mean 90 minutes of scrubbing the waveform. In Descript, it is about five minutes of editing what feels like a Word document.
Why this tool matters
Descript is an audio and video editor where the edit surface is the transcript. Import a recording; Descript transcribes it; the transcript becomes a document. Delete a word from the document and the corresponding audio (and video) disappears. Cut a paragraph, rearrange a sentence, remove every filler word with one button — all at the speed of word processing.
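Conceptually, this works because every word in the transcript carries a start and end time in the recording, so deleting words just means playing back the spans of the words that remain. The Python sketch below illustrates that mapping under stated assumptions: the `Word` type, the `keep_segments` helper, and the sample timings are all hypothetical. Treat it as a mental model, not as Descript's implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcript word with hypothetical timings in seconds."""
    text: str
    start: float
    end: float
    deleted: bool = False


def keep_segments(words):
    """Return the (start, end) audio spans that survive the text edit.

    Runs of consecutive kept words become one span, so the natural gaps
    between them are preserved; a deleted word splits the span in two.
    """
    segments = []
    prev_kept = False
    for w in words:
        if w.deleted:
            prev_kept = False
            continue
        if prev_kept:
            # No deletion since the last kept word: extend the current span.
            segments[-1] = (segments[-1][0], w.end)
        else:
            # First word at the start or after a deletion: open a new span.
            segments.append((w.start, w.end))
        prev_kept = True
    return segments


if __name__ == "__main__":
    script = [
        Word("So", 0.0, 0.3),
        Word("um", 0.4, 0.7, deleted=True),  # the word deleted in the script
        Word("today", 0.8, 1.2),
        Word("we", 1.25, 1.4),
        Word("talk", 1.45, 1.8),
    ]
    print(keep_segments(script))  # [(0.0, 0.3), (0.8, 1.8)]
```

Playing those spans back to back is what you hear as the cut; the seam where “um” used to be is exactly where a short silence or crossfade can help.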
For anyone making long-form audio or video — podcasts, video essays, tutorials, courses, interviews — Descript compresses what used to take days of scrubbing waveforms into an afternoon of reading and editing text. The UX metaphor is so natural that many users never realize they've bypassed the entire traditional audio-editing learning curve.
Beyond the transcript edit, Descript includes Studio Sound (one-click audio enhancement that removes background noise and adds warmth), Overdub (AI voice cloning of yourself, so you can “re-record” a word by typing it), Eye Contact (adjusts your gaze in video so you appear to be looking at the camera), and Filler Word Removal (the feature you'll use most and thank the universe for).
Setup
Account: the free tier at descript.com includes 1 hour of transcription per month and basic editing, enough to evaluate the tool on real work. At the time of writing, the Creator tier ($12/mo) unlocks more transcription, Studio Sound, and Overdub, and Pro ($24/mo) adds AI actions and higher limits.
Hardware: Descript runs as a desktop app on Mac and Windows. A decent headset microphone matters more than the app for the quality of your output.
Walkthrough
Step 1: Import a recording
Open Descript, create a new project, and drag in an audio or video file. Descript transcribes it; audio typically takes about half the recording's length, and video takes longer. When it finishes, you see a document: speaker names, paragraphs, and everything each speaker said.
Step 2: Listen while you read
Click any word; playback jumps to that moment. Press spacebar to play from there. This is the habit that makes the rest of the tool work: always edit audio by reading, not by scrubbing.
Step 3: Delete text; the audio deletes too
Select a sentence you want to cut. Press Delete. The audio (and video) cut instantly. Listen: does it flow? If the cut is abrupt, Descript lets you re-add a short silence or a crossfade with one click.
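A crossfade hides an abrupt cut by briefly blending the end of the audio before the cut into the start of the audio after it. The sketch below is a generic linear crossfade over plain lists of samples, included only to illustrate the technique; the `crossfade` function and sample values are hypothetical, and Descript's own fade shapes and lengths may differ.

```python
def crossfade(a, b, n):
    """Join sample lists a and b, overlapping the last n samples of a with the first n of b.

    Linear fade: a's tail ramps down while b's head ramps up, so the result
    is len(a) + len(b) - n samples long.
    """
    if n < 0 or n > len(a) or n > len(b):
        raise ValueError("overlap must fit inside both clips")
    if n == 0:
        return a + b  # plain butt splice, no blending
    mixed = []
    for i in range(n):
        t = (i + 1) / (n + 1)  # fade position, strictly between 0 and 1
        mixed.append(a[len(a) - n + i] * (1 - t) + b[i] * t)
    return a[: len(a) - n] + mixed + b[n:]


if __name__ == "__main__":
    # Hypothetical clips: full-level audio cut into silence.
    print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 3))  # [0.75, 0.5, 0.25]
```

Real fades are measured in milliseconds: at a 44.1 kHz sample rate, a 10 ms crossfade spans 441 samples.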
Step 4: Remove filler words in one pass
Tools → Remove Filler Words. Descript scans the whole transcript and highlights every “um,” “uh,” “like,” and “you know.” Review the highlights (some “likes” are real words, not filler). One click then removes the ones you left flagged from both the transcript and the audio at once.
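The review step matters because a word list alone cannot tell filler from meaning. The Python sketch below flags candidates from a small hand-written list, then drops only the ones the reviewer did not rescue. The filler list, the `flag_fillers` and `remove_fillers` names, and the whitespace tokenization are assumptions for illustration; this is not how Descript's detector is built, just a way to see why review is needed.

```python
FILLERS = {"um", "uh", "like"}   # hypothetical single-word candidates
PHRASES = [("you", "know")]      # hypothetical multi-word candidates


def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(".,!?;:\"'")


def flag_fillers(tokens):
    """Return the sorted indices of tokens that are candidate fillers."""
    norm = [normalize(t) for t in tokens]
    flagged = set()
    for i, word in enumerate(norm):
        if word in FILLERS:
            flagged.add(i)
        for phrase in PHRASES:
            if tuple(norm[i:i + len(phrase)]) == phrase:
                flagged.update(range(i, i + len(phrase)))
    return sorted(flagged)


def remove_fillers(tokens, keep=()):
    """Drop flagged tokens, except indices the reviewer chose to keep."""
    drop = set(flag_fillers(tokens)) - set(keep)
    return [t for i, t in enumerate(tokens) if i not in drop]


if __name__ == "__main__":
    tokens = "Um, I like this idea, you know, it's like really simple".split()
    print(flag_fillers(tokens))  # [0, 2, 5, 6, 8]
    # The reviewer rescues index 2: in "I like this idea", "like" is a real verb.
    print(" ".join(remove_fillers(tokens, keep={2})))
    # I like this idea, it's really simple
```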
Step 5: Clean the sound with Studio Sound
Select the audio track. Toggle Studio Sound. Descript removes background noise (HVAC, room echo, keyboard clicks) and applies podcast-style EQ and compression. Compare before and after. The difference on a mediocre recording is shocking.
Step 6: Publish or export
When the edit is done, Export as MP4 (video), MP3 (audio), or publish directly to YouTube, Descript's hosted player, or a podcast host. The transcript itself exports as SRT captions, clean text, or a Word doc. One recording → five deliverables.
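Descript writes caption files for you, but it helps to know what an SRT file contains: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, and a blank line between cues. The Python sketch below produces that format from hypothetical `(start, end, text)` caption tuples; the function names and sample captions are illustrative and not part of any Descript API.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        # Cue number, timing line, caption text; cues are separated by a blank line.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    captions = [  # hypothetical caption cues
        (0.0, 2.5, "Welcome to the show."),
        (2.5, 5.25, "Today we talk about editing."),
    ]
    print(to_srt(captions))
```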
Your turn
Basic: Clean a 5-minute voice memo
Record yourself on your phone talking about one topic for 5 minutes — a work idea, a summary of something you read, a mini-tutorial. Import to Descript. Run Remove Filler Words. Delete the sentences that didn't quite land. Apply Studio Sound.
Listen to the before and after. The before was a rough voice memo. The after is the opening of a podcast episode. You could publish it today.
Advanced: Produce a real 10-minute episode
Record a genuine 15-minute interview or monologue. Import to Descript. Edit until it's a tight 10 minutes: remove filler words, cut sections that didn't land, rearrange a paragraph or two if that improves the flow.
Add: an intro sting (5 seconds of music), a brief verbal intro (3 sentences), chapter markers at natural transitions, captions exported as SRT, and a cover image.
Publish it: as a podcast RSS feed from Descript's built-in host, or as a YouTube upload with captions, or as a blog post with embedded player.
Write a 150-word reflection: where in the edit did the transcript-first workflow unlock something that would have been tedious in a traditional audio editor? Which Descript feature will you use every time going forward?
This exercise builds the muscle that turns you into a creator, not just a consumer, of audio.
Pitfalls and pro tips
Transcript accuracy depends heavily on audio quality. Descript's speech recognition is strong, but a noisy recording (HVAC hum, room echo, an inexpensive laptop mic) produces a noisy transcript. Invest in a USB microphone under $100 before investing more time in Descript.
Over-editing makes speech sound unnatural. Removing every “um” and every 200 ms pause produces audio that sounds oddly machine-like. Leave some thinking pauses in: good editing makes you sound like your best self, not like a synthesizer.
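One way to keep thinking pauses while still fixing dead air, such as the 15-second pause in the opening example, is to cap long gaps instead of deleting every gap. The sketch below retimes hypothetical `(text, start, end)` word tuples so that no gap between words exceeds `max_pause` seconds; the 0.6-second default is an arbitrary illustrative value, not a Descript setting.

```python
def cap_pauses(words, max_pause=0.6):
    """Retime (text, start, end) words so no inter-word gap exceeds max_pause.

    Gaps at or below max_pause are left untouched, so natural thinking
    pauses survive; longer gaps are trimmed down to exactly max_pause.
    """
    result = []
    shift = 0.0      # total seconds removed so far
    prev_end = None
    for text, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap > max_pause:
                shift += gap - max_pause
        result.append((text, start - shift, end - shift))
        prev_end = end   # compare against original timings, not shifted ones
    return result


if __name__ == "__main__":
    # Hypothetical timings: a short natural pause, then 15 seconds of dead air.
    words = [("So", 0.0, 0.4), ("anyway", 0.9, 1.4), ("next", 16.4, 16.8)]
    for text, start, end in cap_pauses(words):
        print(f"{text}: {start:.2f}-{end:.2f}")
    # So: 0.00-0.40
    # anyway: 0.90-1.40
    # next: 2.00-2.40
```

The half-second pause survives intact, while the 15-second gap shrinks to 0.6 seconds.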
Overdub requires training data and consent. Descript's Overdub lets you generate new audio in your own voice by typing. It needs about 10 minutes of training audio, plus a recorded consent statement from the person whose voice is being cloned. Use it sparingly: correcting a single mispronounced word works well, while regenerating whole paragraphs is likely to sound uncanny to listeners who know you.
How it compares
Descript's pure-audio competitors include Adobe Podcast (Day 15; stronger on audio enhancement, weaker on editing UX), Riverside.fm (better for recording remote interviews, weaker as an editor), and Alitu (podcaster-focused, simpler). On the video side, Descript competes with CapCut, Premiere Pro, and Final Cut, none of which is built around the transcript-first metaphor (Premiere Pro now offers text-based editing, but as one feature inside a timeline editor). If you make audio or video regularly, Descript is the single tool most likely to change how you work; for one-off or highly stylized video, dedicated editors still win.
When to use — and when not to
Use Descript when you are making audio or video at any regular cadence: a podcast, a YouTube channel, course recordings, tutorial videos, client case-study interviews. The transcript-first edit metaphor compounds: the more you use it, the faster you get.
Do not use Descript when you're making music or heavily mixed multitrack audio (a proper DAW like Logic or Ableton wins), or when you're making highly composed cinematic video with complex color grading and VFX (Premiere Pro and DaVinci Resolve remain the pros' tools).