01 · the promise · scene 01 / 12

Paste a civic URL.
Get a clip whose every quote is provably real.

Name: vidtranscript
Author: Alec Meeker

what vidtranscript is

vidtranscript is a civic-video pipeline with code-enforced correctness. One operator, with no engineer and no CLI, turns a pasted civic-video URL or uploaded footage into a speaker-attributed transcript, grounded quotes that are provably spoken (with an article draft), and broadcast-looking captioned social video. The load-bearing claims are decided by deterministic code, not by the model.

Paste a civic URL. Get a trustworthy clip. The quote is provably real — and the code, not the model, guarantees it.


video frame · 9:16

Finished 9:16 vertical render: karaoke caption with active-word highlight, ANTONIO REYNOSO lower-third, FOREIGN POLICY topic banner, BUSHWICK DAILY watermark.

render_frame_captions_01_reynoso.png — the finished clip you post

02 · the problem · the night you lose

It's 11:14 p.m. You have a three-hour recording and a 9 a.m. deadline.

You scrub the council meeting at 2× looking for the one exchange that matters. Roll call, procedural motions, dead air. Somewhere in those 2,410 utterances is your story — and you have to find it, attribute it, and quote it correctly before the city wakes up.

the turn

So you reach for the fast way. That's where it gets dangerous.

localhost:5174 / view / audio.m4a · 179:57 · 2,410 utterances

Transcript viewer dimmed: a three-hour recording, audio.m4a 179:57, 2,410 utterances — the wall the operator stares down.

transcript_viewer_desktop.png — the wall, before the tool tames it

03 · the danger · the shortcut that backfires

A generic AI clip tool will hand you a quote nobody said — and you'll ship it.

A language model writes text that is fluent. Fluent is not the same as spoken. It will smooth a paraphrase into a clean sentence and present it as a verbatim quote. You publish it, and the source says: I never said that.

⟵ generic tool

The model writes the quote it thinks you want.

"We must protect every resident in this city."

→ fabrication risk: a word nobody spoke, in your byline.

🔒

vidtranscript ⟶

Code rebuilds the quote from the recording's own words.

mechanism stays sealed — it unlocks at scene 07, where you watch it work.

[three stat counters, values not captured: journalists concerned about trust in AI tools; concerned about accuracy; use AI for transcription monthly]

illustrative · external research · CNTI / Digital Content Next

read this exactly

These percentages are external industry research, never a vidtranscript metric. They describe the fear that blocks adoption — the fear the next four scenes answer mechanically.

04 · the solution · one paste → three outputs

One source in. Three reviewable outputs out.

A speaker-attributed transcript: Word-level timing, every utterance attributed to a human-confirmed speaker.
Grounded quotes + an article draft: Every quote provably spoken; a draft you edit, never one that decides for you.
A branded captioned clip (9:16 / 16:9 / 1:1): Broadcast-looking social video in three aspect ratios. verified · 3 ratios

no terminal · no config · no engineer

The whole loop happens in the /stories web UI. Built first for Bushwick Daily — a real Brooklyn newsroom. verified · 4 event profiles · a new format is a JSON file

localhost:5174 / stories — ingest a video

stories_ingest_panel_element.png — the whole input, one row

the fan-out

Run pipeline → ① transcript · ② grounded quotes + draft · ③ captioned clip

05 · time-to-value · second to trust

Paste to a publishable clip + draft in one sitting — at cents per meeting.

Read the free captions to find where the news actually is, then spend paid ASR only on the newsworthy chapters. First value lands the moment the transcript is on screen — the rest is review, not waiting.

≈$0.20–0.60	per meeting, hybrid chapter mode	in-code · pending billed test
≈$1.85	full 3-hour diarization	in-code estimate
$0.00	per render, every render	architectural fact

honest about money

The per-meeting figures are in-code estimates that have not yet been checked against an invoice (shown in dashed amber). Only $0.00/render is a hard architectural fact (solid green): the renderer is ffmpeg, with no paid render API.

localhost:5174 / stories

The /stories front door: the ingest panel, the transcript-extract picker, and a live wall of story cards — the loop begins at a single screen.

stories_frontdoor_desktop.png — the loop starts at one screen

06 · watch it run · the spine: paste to clip

A real source, the real UI: paste → transcript → grounded quotes → approved cut → clip.

Paste & run: A civic URL or uploaded footage, one profile, one button. (/stories · run pipeline)
Watch it work: The job log streams; no black box, no terminal. (job-running state)
The transcript appears: Speaker-attributed, word-level timing. First value, on screen. (/view/[id])
You label the speakers: The model suggests; you confirm. Names are never auto-applied. (/label)
The workbench, 30 quotes: Every grounded quote, playable, in one editorial screen. (cut · 30 quotes)
The payoff, render + draft: A captioned clip and an article draft you download. (render · done)

localhost:5174 / stories

The live pipeline UI, cross-fading through the real run: front door, transcript, speaker-labeling, the quote workbench, and the render panel.

real UI · cross-fading the live run (not a mockup)

the walkthrough ends here

The run ends at render + draft. Auto-recap / debate-reel is not part of this — it is CLI-only and never demoed as a web feature.

07 · trust gate 1 · the headline · safety as a feature

A published quote cannot contain a word nobody spoke.

The model can be wrong about which quote matters. It structurally cannot invent the words. Here is the one motion that makes the whole product visible — scrub it, and watch the quote rebuild itself from the recording's own word-chips.

the recording's words → the published quote

This is a sanctuary city, and we will not turn our neighbors over.

✗ "deport" — no matching word-chip in the transcript → REJECTED, never published.

transcript_id=audio_bc9c… · start_word 14302 · end_word 14338 · ✓ exact subsequence verified


the mechanism (mono = the machine that guarantees it)

model text → exact normalized-token subsequence match vs the Deepgram words[] array → store start_word/end_word indices → display text is regenerated from the array, never retyped. No match → rejected_quotes[].
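
The mechanism above can be sketched in a few lines of Python. This is a minimal illustration, not vidtranscript's actual code: the `normalize` rule, the `GroundedQuote` type, and the `ground_quote` function are hypothetical names, the transcript is simplified to a plain list of word strings rather than the full Deepgram words[] objects, and "subsequence" is assumed to mean a contiguous run of words.

```python
import re
from dataclasses import dataclass
from typing import Optional


def normalize(token: str) -> str:
    """Lowercase and drop everything except letters, digits and apostrophes.
    (Assumed normalization rule; the real one may differ.)"""
    return re.sub(r"[^a-z0-9']", "", token.lower())


@dataclass
class GroundedQuote:
    start_word: int  # index of the first matched word in the transcript
    end_word: int    # index of the last matched word (inclusive)
    text: str        # display text regenerated from the transcript words


def ground_quote(model_text: str, words: list[str]) -> Optional[GroundedQuote]:
    """Return the first exact contiguous match of model_text in words,
    or None if no match exists (the caller would file it as rejected)."""
    target = [t for t in (normalize(w) for w in model_text.split()) if t]
    if not target:
        return None
    norm_words = [normalize(w) for w in words]
    n = len(target)
    for start in range(len(norm_words) - n + 1):
        if norm_words[start:start + n] == target:
            end = start + n - 1
            # The display text comes from the transcript, never from the model.
            return GroundedQuote(start, end, " ".join(words[start:end + 1]))
    return None


transcript_words = [
    "This", "is", "a", "sanctuary", "city,", "and", "we", "will",
    "not", "turn", "our", "neighbors", "over.",
]
accepted = ground_quote(
    "this is a sanctuary city and we will not turn our neighbors over",
    transcript_words,
)
rejected = ground_quote("we will not deport our neighbors", transcript_words)
```

Here `accepted` spans word indices 0 through 12 and carries the transcript's own punctuation, while `rejected` is `None` because "deport" has no matching word in the array.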

localhost:5174 / stories/[id] — cut (quote row)

A single quote-approval row: speaker Claire Valdez, the quote text, a why-this-cut note, and the play-then-approve gate.

story_quote_row_element.png — one quote, anchored to its audio

the honest boundary (stated plainly)

Grounding proves the quote matches the transcript — not that the transcript perfectly matches the audio. No ASR is perfect, and we don't claim it is. That is exactly why a human approves every quote by ear (scene 08). verified · implemented on live path

This is the one beat the whole page is built toward: the model proposes, the code disposes.

08 · trust gates 2 & 3 · safety as a feature

The tool protects your credibility for you — and the gates live on the server, not just the screen.

2 · Approve by ear

Approve unlocks only after the clip plays through to its end.

Server-enforced: the render API returns HTTP 409 on any unapproved quote — it holds even if the front end is bypassed.

UNAPPROVED CUT → HTTP 409

3 · Render fair (debates)

Airtime is recomputed on the final trimmed cut, moderators excluded.

HTTP 409 until the operator types a justification. Render is never forbidden — it requires a conscious, logged choice.

IMBALANCED → 409 UNTIL JUSTIFIED

＋ · You stay the editor

Both gates add friction on purpose, then resolve once you've done the human work.

White-flash markers at every splice show the audience where two moments were joined.

FRICTION → RESOLVED
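
A rough sketch of how the two server-side gates could be ordered, assuming a framework-free check that returns a status code and a message. The `Quote` and `Cut` types, their field names, and `check_render_gates` are hypothetical and only illustrate the behavior described above: 409 on any unapproved quote, and 409 on an imbalanced debate cut until a justification is present.

```python
from dataclasses import dataclass, field
from http import HTTPStatus


@dataclass
class Quote:
    quote_id: str
    approved: bool = False  # set only after the operator has played the clip


@dataclass
class Cut:
    quotes: list[Quote] = field(default_factory=list)
    is_debate: bool = False
    imbalanced: bool = False   # result of the airtime recomputation
    justification: str = ""    # operator's typed reason, if any


def check_render_gates(cut: Cut) -> tuple[int, str]:
    """Run on the server before any render starts, so a bypassed
    front end still hits the same checks."""
    unapproved = [q.quote_id for q in cut.quotes if not q.approved]
    if unapproved:
        return HTTPStatus.CONFLICT.value, "unapproved quotes: " + ", ".join(unapproved)
    # Only the presence of a justification is checked, not its wording.
    if cut.is_debate and cut.imbalanced and not cut.justification.strip():
        return HTTPStatus.CONFLICT.value, "airtime imbalanced: justification required"
    return HTTPStatus.OK.value, "render allowed"
```

Keeping the check in one function on the server means the screen-level lock is a convenience, while the 409 is the actual rule.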

honest limit on the fairness gate (identical in both decks)

Airtime is recomputed on the final cut with moderators excluded. One honest limit: where speakers talk over each other, each overlapping word is attributed to a single speaker, so airtime can undercount crosstalk. roadmap · typed-justification UI form
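
To make the airtime recomputation and its crosstalk limit concrete, here is a small sketch under stated assumptions: each word in the final cut is a dict with hypothetical `speaker`, `start`, and `end` fields (seconds), and `airtime_by_speaker` and `airtime_shares` are illustrative names. The threshold at which a cut counts as imbalanced is not stated on this page, so it is left out.

```python
from collections import defaultdict


def airtime_by_speaker(words: list[dict], moderators: set[str]) -> dict[str, float]:
    """Sum spoken seconds per speaker over the words of the final cut,
    skipping moderators."""
    totals: dict[str, float] = defaultdict(float)
    for w in words:
        if w["speaker"] in moderators:
            continue
        # Each word carries exactly one speaker, so overlapping speech is
        # credited to one person only: the crosstalk undercount noted above.
        totals[w["speaker"]] += w["end"] - w["start"]
    return dict(totals)


def airtime_shares(totals: dict[str, float]) -> dict[str, float]:
    """Convert per-speaker seconds into fractions of total non-moderator airtime."""
    total = sum(totals.values())
    if total == 0:
        return {}
    return {speaker: seconds / total for speaker, seconds in totals.items()}


final_cut = [
    {"speaker": "A", "start": 0.0, "end": 1.0},
    {"speaker": "B", "start": 1.0, "end": 1.5},
    {"speaker": "MOD", "start": 1.5, "end": 3.0},
    {"speaker": "A", "start": 3.0, "end": 3.5},
]
totals = airtime_by_speaker(final_cut, moderators={"MOD"})
shares = airtime_shares(totals)
```

With this sample cut, speaker A holds 1.5 seconds and speaker B holds 0.5 seconds, so the shares are 0.75 and 0.25; the moderator's 1.5 seconds do not count.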

localhost:5174 / stories/[id] — render

Render panel and the product's own gate hint: Approve unlocks only after you have played a quote through. Format checkboxes, Render branded video, and the article-draft header.

story_render_player_element.png — the rule, in the product's own words

what we will NOT show you

There is no fairness-acknowledgment UI form today. The gate is server-enforced (HTTP 409); the typed-justification interstitial is roadmap. We don't mock UI that doesn't exist.

09 · proof + candor · honest about the edges

The core path runs live for a real Brooklyn newsroom. Here's exactly what isn't built yet.

the proof

The full loop — capture → transcript → grounded quotes → approved cut → captioned clip — runs live for Bushwick Daily. verified · reference newsroom, a proof point — not an adoption metric

The honest edges — equal weight to the proof (authored once, identical in both decks)

roadmap	No fairness-ack UI form yet. The render gate is server-enforced via HTTP 409; today the justification is applied by PATCH and only its presence is validated, not its wording. The typed-justification interstitial is roadmap.
emerging	Auto-recap / debate-reel is CLI-only. Implemented but not exposed over HTTP and not in the web app. Never demoed as a web feature.
scoping	Single-operator by design. No auth, CORS locked to localhost, in-memory non-durable jobs that do not survive a backend restart.
partial	No scheduled capture-health probe. The check is implemented and cheap, but no scheduler runs it; "daily" is documented intent, not an automated job.
partial	HLS capture is partial. Lead with YouTube civic capture, which is proven; HLS flows through the same generic calls but has no HLS-specific handling.

localhost:5174 /

The transcripts library: a dense wall of 44 transcribed sources with speaker chips and N-unlabeled work-queue badges — a working newsroom tool, not a toy demo.

home_desktop.png — 44 real transcripts · density = a working tool

10 · unit economics · objection, pre-empted

Rendering is free. The only metered spend is transcription + editorial — and you can query every cent.

The renderer is ffmpeg, so every render costs $0.00 — an architectural fact, not a promotion. A content-hash check means you never pay to transcribe the same audio twice, and a per-vendor ledger lets you query the cost of any run. It runs local-first, single-operator: your footage is never pooled into someone else's platform.

$0.00	per render	architectural fact
$0.0052	per audio-minute, Deepgram ledger rate	pending billed confirmation
1	queryable per-vendor cost ledger	implemented
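
The content-hash check and the per-vendor ledger described above can be sketched together. This is an illustration under assumptions, not the product's storage layer: `TranscriptionCache`, `content_hash`, and the `run_asr` callable are hypothetical, SHA-256 is an assumed hash choice, state is held in memory, and the rate is the $0.0052 per audio-minute figure quoted on this page (pending billed confirmation).

```python
import hashlib
from collections import defaultdict
from typing import Callable

DEEPGRAM_RATE_PER_MIN = 0.0052  # page's ledger rate, not yet confirmed by invoice


def content_hash(audio_bytes: bytes) -> str:
    """Identify audio by its bytes, so a renamed copy still hits the cache."""
    return hashlib.sha256(audio_bytes).hexdigest()


class TranscriptionCache:
    def __init__(self) -> None:
        self._transcripts: dict[str, dict] = {}
        self.ledger: list[dict] = []  # one entry per paid transcription

    def transcribe(self, audio_bytes: bytes, minutes: float,
                   run_asr: Callable[[bytes], dict],
                   vendor: str = "deepgram") -> dict:
        key = content_hash(audio_bytes)
        if key in self._transcripts:
            return self._transcripts[key]  # same audio: no second charge
        transcript = run_asr(audio_bytes)
        self._transcripts[key] = transcript
        self.ledger.append({
            "vendor": vendor,
            "hash": key,
            "minutes": minutes,
            "cost_usd": round(minutes * DEEPGRAM_RATE_PER_MIN, 4),
        })
        return transcript

    def cost_by_vendor(self) -> dict[str, float]:
        """Total estimated spend per vendor across all ledger entries."""
        totals: dict[str, float] = defaultdict(float)
        for entry in self.ledger:
            totals[entry["vendor"]] += entry["cost_usd"]
        return dict(totals)
```

Submitting the same bytes twice calls `run_asr` once and writes one ledger row, which is the behavior the "never pay twice" claim depends on.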

video frame · 9:16

A second finished 9:16 render frame: JULIE WON lower-third, IMMIGRATION topic banner, karaoke caption — proving the $0 render looks broadcast-grade and tracks identity across a multi-speaker debate.

render_frame_captions_02_won_sanctuary.png — $0 still looks broadcast-grade

11 · what you do next · the only ask

the one thing to do

Run your first real source.
Pick a meeting you already covered.

A source you already reported is the fastest trust-build: you already know what was said, so you can check the output against your own memory of the room. Open the front door and go — no login.

run your first real source → · request a demo →

The one beta ask, after your first run: did the output match what was actually said?

Built by Alec Meeker · alec.meeker@gmail.com

Open /stories: No login. The front door is the whole app.
Paste a URL or upload: A civic-video link, or footage from disk.
Pick a profile + key terms: One of four event profiles; boost the names that matter.
Run the pipeline, watch the log: No black box; the job streams as it works.
Label the speakers: You decide; the model only suggests.
Review & play each quote: Approve is locked until you've heard it.
Render + draft, download: A publishable asset, the value milestone, in one sitting.

a publishable asset · one sitting

Story detail on a phone: the player, render panel, and the 30-quote approval list stacked — the first run is reachable anywhere.

story_detail_mobile.png — reachable anywhere · do this right now

12 · appendix · optional depth · on request

For the tester who wants to verify the claim, not just hear it.

How the guarantees actually work

Quote grounding: Exact normalized-token subsequence match with ±1-utterance slack. No fuzzy fallback. Stored as start_word/end_word indices; the render engine never sees the model's string, it reads the word array.
Server-side render gates: Approval and fairness are enforced by the render API with HTTP 409. Bypassing the front end does not bypass the gate.
One orchestrator: run_capture() handles both pasted-URL and uploaded-file inputs through a single path.
Hardened civic capture: YouTube civic capture is proven and looks like a viewer, not a scraper; HLS is partial (same generic calls, no HLS-specific handling).

localhost:5174 / label

Speaker labeling: Speaker 1 of 41, a sample utterance and video frame, and a Who-is-this-speaker field pre-seeded with an LLM suggestion the human must confirm.

label_speakers_desktop.png — the AI suggests, you decide
