01 · the promise · scene 01 / 12
what vidtranscript is
vidtranscript is a civic-video pipeline whose correctness is enforced in code. One operator, with no engineer and no CLI, pastes a civic-video URL or uploads footage and gets three outputs: a speaker-attributed transcript, grounded quotes that were provably spoken plus an article draft, and broadcast-looking captioned social video. The load-bearing claims are decided by deterministic code, not by the model.
Paste a civic URL. Get a trustworthy clip. The quote is provably real — and the code, not the model, guarantees it.
render_frame_captions_01_reynoso.png — the finished clip you post
02 · the problem · the night you lose
You scrub the council meeting at 2× looking for the one exchange that matters. Roll call, procedural motions, dead air. Somewhere in those 2,410 utterances is your story — and you have to find it, attribute it, and quote it correctly before the city wakes up.
So you reach for the fast way. That's where it gets dangerous.

transcript_viewer_desktop.png — the wall, before the tool tames it
03 · the danger · the shortcut that backfires
A language model writes text that is fluent. Fluent is not the same as spoken. It will smooth a paraphrase into a clean sentence and present it as a verbatim quote. You publish it, and the source says: I never said that.
The model writes the quote it thinks you want.
→ fabrication risk: a word nobody spoke, in your byline.
Code rebuilds the quote from the recording's own words.
mechanism stays sealed — it unlocks at scene 07, where you watch it work.
illustrative · external research · CNTI / Digital Content Next
These percentages are external industry research, never a vidtranscript metric. They describe the fear that blocks adoption — the fear the next four scenes answer mechanically.
04 · the solution · one paste → three outputs
The whole loop happens in the /stories web UI. Built first for Bushwick Daily, a real Brooklyn newsroom.
verified · 4 event profiles · a new format is a JSON file

stories_ingest_panel_element.png — the whole input, one row
Run pipeline → ① transcript · ② grounded quotes + draft · ③ captioned clip
05 · time-to-value · second to trust
Read the free captions to find where the news actually is, then spend paid ASR only on the newsworthy chapters. First value lands the moment the transcript is on screen — the rest is review, not waiting.
The per-meeting figures are estimates measured in code, not yet invoiced (dashed amber). Only $0.00/render is a hard architectural fact (solid green) — the renderer is ffmpeg, with no paid render API.

stories_frontdoor_desktop.png — the loop starts at one screen
06 · watch it run · the spine: paste to clip
real UI · cross-fading the live run (not a mockup)
The run ends at render + draft. Auto-recap / debate-reel is not part of this — it is CLI-only and never demoed as a web feature.
07 · trust gate 1 · the headline · safety as a feature
The model can be wrong about which quote matters. It structurally cannot invent the words. Here is the one motion that makes the whole product visible — scrub it, and watch the quote rebuild itself from the recording's own word-chips.
This is a sanctuary city, and we will not turn our neighbors over.
✗ "deport" — no matching word-chip in the transcript → REJECTED, never published.
transcript_id=audio_bc9c… · start_word 14302 · end_word 14338 · ✓ exact subsequence verified
model text → exact normalized-token subsequence match vs the Deepgram words[] array → store start_word/end_word indices → display text is regenerated from the array, never retyped. No match → rejected_quotes[].
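The chain above can be sketched in a few lines of Python. This is a minimal illustration, not the shipped implementation: the function and class names are hypothetical, the normalization rule (lowercase, strip punctuation) is an assumption, the match is assumed to be a contiguous run of words with an inclusive end_word, and each word is assumed to carry the `word` and `punctuated_word` fields that Deepgram returns when punctuation is enabled.

```python
import re
from dataclasses import dataclass
from typing import Optional


def normalize(token: str) -> str:
    """Assumed normalization: lowercase, keep only letters, digits, apostrophes."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


@dataclass
class GroundedQuote:
    start_word: int  # index of the first matched word in the transcript array
    end_word: int    # index of the last matched word (inclusive, by assumption)
    text: str        # display text rebuilt from the transcript, never from the model


def ground_quote(model_text: str, words: list[dict]) -> Optional[GroundedQuote]:
    """Return word indices for an exact match, or None if the quote must be rejected."""
    needle = [t for t in (normalize(p) for p in model_text.split()) if t]
    haystack = [normalize(w["word"]) for w in words]
    if not needle:
        return None
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if haystack[start:start + n] == needle:
            end = start + n - 1
            # Regenerate the display text from the array itself.
            display = " ".join(
                w.get("punctuated_word", w["word"]) for w in words[start:end + 1]
            )
            return GroundedQuote(start, end, display)
    return None  # caller would append the model text to rejected_quotes[]
```

A model string containing a word that is absent from the array, such as "deport" in the example above, finds no window and returns None, so nothing the model typed can reach the published text.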

story_quote_row_element.png — one quote, anchored to its audio
Grounding proves the quote matches the transcript — not that the transcript perfectly matches the audio. No ASR is perfect, and we don't claim it is. That is exactly why a human approves every quote by ear (scene 08). verified · implemented on live path
08 · trust gates 2 & 3 · safety as a feature
Approve unlocks only after the clip plays through to its end.
Server-enforced: the render API returns HTTP 409 on any unapproved quote, so the gate holds even if the front end is bypassed.
UNAPPROVED CUT → HTTP 409
Airtime is recomputed on the final trimmed cut, moderators excluded.
HTTP 409 until the operator supplies a justification. Render is never forbidden; it requires a conscious, logged choice.
IMBALANCED → 409 UNTIL JUSTIFIED
Both gates add friction on purpose, then resolve once you've done the human work.
White-flash markers at every splice show the audience where two moments were joined.
FRICTION → RESOLVED
Airtime is recomputed on the final cut with moderators excluded. One honest limit: where speakers talk over each other, overlapping words are single-attributed, so airtime can undercount crosstalk.
typed-justification UI form on roadmap
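As a rough sketch of the airtime recomputation, assuming each transcript word carries `speaker`, `start`, and `end` fields (as diarized Deepgram words do) and that the final cut is a list of inclusive (start_word, end_word) segments. The function names and the segment representation are hypothetical, and the threshold that counts as "imbalanced" is deliberately left out because the source does not state it.

```python
from collections import defaultdict


def airtime_by_speaker(words, segments, moderators=frozenset()):
    """Sum seconds spoken per speaker over the words kept in the final cut."""
    totals = defaultdict(float)
    for start_word, end_word in segments:
        for w in words[start_word:end_word + 1]:
            if w["speaker"] in moderators:
                continue  # moderators are excluded from the balance check
            # Each word has exactly one speaker, so crosstalk is single-attributed.
            totals[w["speaker"]] += w["end"] - w["start"]
    return dict(totals)


def airtime_shares(totals):
    """Convert per-speaker seconds into fractions of the total airtime."""
    total = sum(totals.values())
    if total == 0:
        return {}
    return {speaker: seconds / total for speaker, seconds in totals.items()}
```

Because the sum runs over the segments of the trimmed cut rather than the whole meeting, trimming a clip changes the result, which is why the check has to be recomputed after every edit.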

story_render_player_element.png — the rule, in the product's own words
There is no fairness-acknowledgment UI form today. The gate is server-enforced (HTTP 409); the typed-justification interstitial is roadmap. We don't mock UI that doesn't exist.
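The two server-side checks can be pictured as one pure decision function. This is a sketch under stated assumptions: `Quote`, `render_gate`, and the message strings are hypothetical names, the 200 success code is assumed, and only the 409 responses and the presence-only justification check come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    quote_id: str
    approved: bool


def render_gate(quotes: list[Quote], imbalanced: bool,
                justification: Optional[str]) -> tuple[int, str]:
    """Return (HTTP status, reason) for a render request."""
    unapproved = [q.quote_id for q in quotes if not q.approved]
    if unapproved:
        # Gate 2: every quote in the cut must be human-approved.
        return 409, "unapproved quotes: " + ", ".join(unapproved)
    if imbalanced and not (justification or "").strip():
        # Gate 3: only the presence of a justification is checked, not its wording.
        return 409, "airtime imbalanced: justification required"
    return 200, "ok"
```

Keeping the decision on the server means a request that skips the front end gets the same 409 as one that comes through the UI.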
09 · proof + candor · honest about the edges
The full loop — capture → transcript → grounded quotes → approved cut → captioned clip — runs live for Bushwick Daily. verified · reference newsroom, a proof point — not an adoption metric
| roadmap | No fairness-ack UI form yet. The render gate is server-enforced via HTTP 409; today the justification is applied by PATCH and only its presence is validated, not its wording. The typed-justification interstitial is roadmap. |
| emerging | Auto-recap / debate-reel is CLI-only. Implemented but not exposed over HTTP and not in the web app. Never demoed as a web feature. |
| scoping | Single-operator by design. No auth, CORS locked to localhost, in-memory non-durable jobs that do not survive a backend restart. |
| partial | No scheduled capture-health probe. The check is implemented and cheap, but no scheduler runs it; "daily" is documented intent, not an automated job. |
| partial | HLS capture is partial. Lead with YouTube civic capture, which is proven; HLS flows through the same generic calls but has no HLS-specific handling. |

home_desktop.png — 44 real transcripts · density = a working tool
10 · unit economics · objection, pre-empted
The renderer is ffmpeg, so every render costs $0.00 — an architectural fact, not a promotion. A content-hash check means you never pay to transcribe the same audio twice, and a per-vendor ledger lets you query the cost of any run. It runs local-first, single-operator: your footage is never pooled into someone else's platform.
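One way the content-hash check and the per-vendor ledger could fit together is sketched below. Everything here is illustrative: the class name, the SHA-256 choice, the in-memory storage, and the ledger row shape are assumptions, and `asr` stands in for whatever paid transcription call the pipeline makes.

```python
import hashlib


class TranscriptCache:
    """Skip paid transcription when the exact same audio bytes were seen before."""

    def __init__(self):
        self._by_hash = {}   # content hash -> stored transcript
        self.ledger = []     # rows of (run_id, vendor, cost_usd)

    def transcribe(self, audio: bytes, run_id: str, vendor: str,
                   cost_usd: float, asr):
        key = hashlib.sha256(audio).hexdigest()
        if key in self._by_hash:
            # Cache hit: record the run at zero cost, do not call the vendor.
            self.ledger.append((run_id, vendor, 0.0))
        else:
            self._by_hash[key] = asr(audio)
            self.ledger.append((run_id, vendor, cost_usd))
        return self._by_hash[key]

    def cost_of_run(self, run_id: str) -> dict:
        """Total recorded cost per vendor for one run."""
        totals = {}
        for rid, vendor, cost in self.ledger:
            if rid == run_id:
                totals[vendor] = totals.get(vendor, 0.0) + cost
        return totals
```

Hashing the bytes rather than the URL or filename means a re-uploaded copy of the same recording is still recognized, while any re-encode counts as new audio.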

render_frame_captions_02_won_sanctuary.png — $0 still looks broadcast-grade
11 · what you do next · the only ask
the one thing to do
A source you already reported is the fastest trust-build: you already know what was said, so you can check the output against your own memory of the room. Open the front door and go — no login.
run your first real source → · request a demo →
The one beta ask, after your first run: did the output match what was actually said?
Built by Alec Meeker · alec.meeker@gmail.com
/stories · No login. The front door is the whole app.
story_detail_mobile.png — reachable anywhere · do this right now
12 · appendix · optional depth · on request
start_word/end_word indices; the render engine never sees the model's string and reads the word array instead.
run_capture() handles both pasted-URL and uploaded-file inputs through a single path.
label_speakers_desktop.png — the AI suggests, you decide