Turn recordings into structured assets, not walls of text.
A Windows utility built for heavy media workflows. Export word-level JSON, sentence clips, and unique stills, all offline.
Built for creators, editors, and researchers who need structured transcript outputs.
Most tools give you a flat text file. EchoText treats your media as data, decomposed into addressable word-level and sentence-level assets for your NLE (non-linear editor), database, or research engine.
Structured outputs for real production work.
Export transcript data, sentence clips, still images, and previews from one desktop workflow instead of stitching together separate tools.
Process Folders & Volumes
Batch process entire directories of audio and video. Drop a folder, get a structured asset library out the other end.
Word-by-Word JSON
Timestamps for every spoken word, suited to automated subtitling and search indexing.
Sentence-level Clips
Automatically export WAV or MP4 snippets for every individual sentence.
Unique PNG Stills
EchoText captures the exact frame associated with each sentence export.
In-app Preview
Review transcriptions and clips instantly within the utility before export.
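As a rough illustration of why word-level JSON suits automated subtitling, the sketch below groups timed words into SRT subtitle cues. The field names (`word`, `start`, `end`, with times in seconds) and the sample data are assumptions made for this example; they are not EchoText's documented export schema, so adjust them to match the files you actually get.

```python
import json

# Hypothetical word-level data: the keys "word", "start", and "end" are
# assumed for illustration and may differ from EchoText's real JSON export.
SAMPLE = json.loads("""
[
  {"word": "Turn",       "start": 0.00, "end": 0.32},
  {"word": "recordings", "start": 0.32, "end": 0.95},
  {"word": "into",       "start": 0.95, "end": 1.10},
  {"word": "structured", "start": 1.10, "end": 1.70},
  {"word": "assets",     "start": 1.70, "end": 2.25}
]
""")


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=4):
    """Group consecutive words into numbered SRT cues of up to max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_time(chunk[0]["start"])   # cue starts with its first word
        end = srt_time(chunk[-1]["end"])      # and ends with its last word
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates cues in the SRT format.
    return "\n".join(cues)


if __name__ == "__main__":
    print(words_to_srt(SAMPLE, max_words=3))
```

Grouping by a fixed word count keeps the sketch short. A real subtitle pass would more likely break cues on sentence boundaries or pauses between words.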
Recent upgrades worth knowing about.
EchoText is moving fast. These are the highlights that change what the tool can do.
Unique Images Mode
Export only visually distinct PNG stills from video sources when transcript files are not the goal.
Stable Video Preview
Video preview now renders reliably on Windows, making mixed file selections safer to inspect before processing.
Sentence MP4 Clips
Video sources can now export sentence-level MP4 clips using the same timing data that drives JSON output.
Offline-First. Privacy-Native.
EchoText starts with the bundled tiny.en model for instant local processing, lets you download small.en for a free quality bump, and reserves large-v3 for Pro when accuracy matters most.
What you need to run EchoText.
The Windows installer bundles the full app runtime. You only need a 64-bit Windows machine with enough memory and disk space for the models you want to use.
64-bit Windows 10 or 11
EchoText ships as a Windows desktop installer targeting x64-compatible systems.
Intel or AMD CPU
No dedicated GPU required for the standard Windows workflow.
8 GB System RAM Recommended
Regular system memory, not VRAM. Leave extra disk space if you plan to download larger Whisper models later.
Simple. One-time. Yours.
EchoText Personal
- Bundled tiny.en plus optional small.en download
- Word JSON, sentence JSON, transcript text, and preview
- Sentence clips and unique image export
- Personal, non-commercial use only
EchoText Pro
- large-v3 transcription accuracy
- Commercial-use license for client and business work
- Same export workflow, just with the best model
- One-time license key
Process audio and video into outputs you can ship.
Use Personal for free local work, then upgrade to Pro when you need commercial rights and large-v3 accuracy.