Version 3.4.0 — Windows Only

Turn recordings into structured assets, not walls of text.

The precision-engineered Windows utility for heavy media workflows. Export word-level JSON, sentence clips, and unique stills — all offline.

ECHOTEXT — PROCESSOR
EchoText desktop app screenshot showing the main transcript processing interface
Output / word_level.json
{"word":"structured","start":0.12,"end":0.74,"conf":0.99}
{"word":"offline","start":0.76,"end":1.18,"conf":1.00}
{"word":"precise","start":1.20,"end":1.64,"conf":0.98}
The Architecture of Clarity

Built for creators, editors, and researchers who need structured transcript outputs instead of a wall of text.

Most tools give you a flat text file. EchoText treats your media as data — decomposed into addressable word-level and sentence-level assets for your NLE, database, or research engine.
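
As a rough illustration of the database case, the sketch below loads word records into SQLite and looks up where a word was spoken. It assumes the word-level export is one JSON object per line with the `word`, `start`, `end`, and `conf` fields shown in the sample output; the table layout, function names, and the `demo.mp4` source label are hypothetical and not part of EchoText.

```python
import json
import sqlite3

# Sample records copied from the word_level.json preview on this page.
# Assumption: the export holds one JSON object per line.
SAMPLE = """\
{"word":"structured","start":0.12,"end":0.74,"conf":0.99}
{"word":"offline","start":0.76,"end":1.18,"conf":1.00}
{"word":"precise","start":1.20,"end":1.64,"conf":0.98}
"""

def load_words(conn, source_name, jsonl_text):
    """Insert every word record into a (hypothetical) words table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words "
        "(source TEXT, word TEXT, start_s REAL, end_s REAL, conf REAL)"
    )
    rows = []
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            rec = json.loads(line)
            rows.append(
                (source_name, rec["word"], rec["start"], rec["end"], rec["conf"])
            )
    conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def find_word(conn, word):
    """Return (source, start_s, end_s) for each exact match of a word."""
    cur = conn.execute(
        "SELECT source, start_s, end_s FROM words "
        "WHERE word = ? ORDER BY source, start_s",
        (word,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_words(conn, "demo.mp4", SAMPLE)
    print(find_word(conn, "offline"))
```

Each hit carries its own start and end time, so a query result can point straight at the moment in the source file rather than at a position in a text dump.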

Features

Structured outputs for real production work.

Export transcript data, sentence clips, still images, and previews from one desktop workflow instead of stitching together separate tools.

Process Folders & Volumes

Batch process entire directories of audio and video. Drop a folder, get a structured asset library out the other end.

Word-by-Word JSON

Start and end timestamps, plus a confidence score, for every word spoken. Well suited to automated subtitling and search indexing.

{ "word": "EchoText", "start": 0.12, "end": 0.85, "conf": 0.99 }
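
For the subtitling case, here is a minimal sketch that turns word records into SRT cues. It assumes one JSON object per line with the `word`, `start`, and `end` fields shown above; the function names and the fixed words-per-cue grouping are illustrative choices, not EchoText behavior.

```python
import json

# Sample records in the same shape as the word-level output on this page.
# Assumption: one JSON object per line.
SAMPLE = """\
{"word":"structured","start":0.12,"end":0.74,"conf":0.99}
{"word":"offline","start":0.76,"end":1.18,"conf":1.00}
{"word":"precise","start":1.20,"end":1.64,"conf":0.98}
"""

def parse_words(jsonl_text):
    """Parse one JSON object per non-blank line into a list of dicts."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

def srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_words=7):
    """Group consecutive words into numbered cues of at most max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        start = srt_time(chunk[0]["start"])   # first word opens the cue
        end = srt_time(chunk[-1]["end"])      # last word closes it
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)

if __name__ == "__main__":
    print(words_to_srt(parse_words(SAMPLE), max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a real subtitle pass would more likely break cues on sentence boundaries or pauses between words.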

Sentence-level Clips

Automatically export WAV or MP4 snippets for every individual sentence.

Unique PNG Stills

EchoText captures the exact frame associated with each sentence export.

In-app Preview

Review transcriptions and clips instantly within the utility before export.

What's New

Recent upgrades that change what EchoText can do.

EchoText is moving fast. These are only the highlights.

1.10.0

Unique Images Mode

Export only visually distinct PNG stills from video sources when transcript files are not the goal.

1.9.x

Stable Video Preview

Video preview now renders reliably on Windows, making mixed file selections safer to inspect before processing.

1.8.0

Sentence MP4 Clips

Video sources can now export sentence-level MP4 clips using the same timing data that drives JSON output.

Offline-First. Privacy-Native.

EchoText starts with the bundled tiny.en model for instant local processing, lets you download small.en for a free quality bump, and reserves large-v3 for Pro when accuracy matters most.

100% Local — No Cloud Dependency
System Requirements

What you need to run EchoText.

The Windows installer bundles the full app runtime. You only need a 64-bit Windows machine with enough memory and disk space for the models you want to use.

64-bit Windows 10 or 11

EchoText ships as a Windows desktop installer targeting x64-compatible systems.

Intel or AMD CPU

No dedicated GPU required for the standard Windows workflow.

8 GB System RAM Recommended

Regular system memory, not VRAM. Leave extra disk space if you plan to download larger Whisper models later.

Pricing

Simple. One-time. Yours.

EchoText Personal

  • Bundled tiny.en plus optional small.en download
  • Word JSON, sentence JSON, transcript text, and preview
  • Sentence clips and unique image export
  • Personal, non-commercial use only
$0 / Always Free
Download Personal
Precision Tier

EchoText Pro

  • large-v3 transcription accuracy
  • Commercial-use license for client and business work
  • Same export workflow, just with the best model
  • One-time license key
$29 / One-Time License
Launch Price
Upgrade to Pro — large-v3 + Commercial Use

Process audio and video into outputs you can ship.

Use Personal for free local work, then upgrade to Pro when you need commercial rights and large-v3 accuracy.

Secure Checkout
Offline Engine
One-Time License