Content Pipeline

The content pipeline is the core of the system. Every piece of content moves through a state machine, one status at a time, from idea to posted video.

What It Does

The pipeline takes a raw topic (a headline, a trending movie, a Reddit post) and transforms it step by step into a fully produced short-form video ready for YouTube. Each step is a discrete stage that can be triggered independently, retried on failure, or run as part of the full automated sequence.

How It Works

The State Machine


draft --> scripted --> voiced --> assembled --> review --> approved --> posted
                                                 |
                                              rejected

Each status corresponds to one pipeline stage. Content progresses forward one stage at a time. If a stage fails, the item can be retried at that stage without re-running the earlier ones.
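The transitions in the diagram can be sketched as a small lookup table. This is a minimal illustration, not the system's actual code: the names Status, ALLOWED, and advance are hypothetical, and only the status values themselves come from this document.

```python
from enum import Enum


class Status(str, Enum):
    DRAFT = "draft"
    SCRIPTED = "scripted"
    VOICED = "voiced"
    ASSEMBLED = "assembled"
    REVIEW = "review"
    APPROVED = "approved"
    POSTED = "posted"
    REJECTED = "rejected"


# Forward transitions taken from the diagram; "review" is the only branch point.
ALLOWED = {
    Status.DRAFT: {Status.SCRIPTED},
    Status.SCRIPTED: {Status.VOICED},
    Status.VOICED: {Status.ASSEMBLED},
    Status.ASSEMBLED: {Status.REVIEW},
    Status.REVIEW: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.POSTED},
    Status.POSTED: set(),
    Status.REJECTED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return target if the diagram allows current -> target, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the transitions in one table makes it easy to reject skips such as draft to posted in a single place.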

Stage Breakdown

draft — A topic has been fetched from a source (RSS, Reddit, API) and filtered by the AI. The content record exists with a source_headline, source_url, and source_type. No script, no media yet.

scripted — The AI (GPT-4o-mini) has generated a hook (the opening line) and a full voiceover script. The hook, script, and duration_seconds fields are populated. Duration is calibrated to the target word count based on speaking pace.

voiced — OpenAI TTS has generated an MP3 audio file, and Whisper has produced word-level subtitle timing. The audio_path points to the file in Supabase Storage. subtitle_data contains the word-by-word timestamps.

assembled — FFmpeg has composed the final video: background + subtitles + audio + optional music + branded thumbnail. The video_path and thumbnail_path are populated. Status moves to review for human approval.

review — Waiting for you to watch the video and decide. From here you can:

Approve — moves to approved (ready for posting)
Reject — moves to rejected (archived)
Retry a specific stage — resets to that stage and re-runs it

approved — Ready to post. Can be posted immediately or scheduled for a specific time via scheduled_for.

posted — Successfully uploaded to YouTube. The posted_platforms field contains the YouTube video ID and timestamp.
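As a rough sketch, the content record described in the stages above could be modelled like this. The field names come from this document; the class name ContentRecord, the field types, the REQUIRED_FIELDS table, and the missing_fields helper are assumptions made for illustration and may not match the real schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentRecord:
    # Populated when the topic is fetched (draft).
    source_headline: str
    source_url: str
    source_type: str
    status: str = "draft"
    # Populated by script generation (scripted).
    hook: Optional[str] = None
    script: Optional[str] = None
    duration_seconds: Optional[float] = None
    # Populated by TTS and Whisper (voiced).
    audio_path: Optional[str] = None
    subtitle_data: Optional[list] = None
    # Populated by FFmpeg assembly (assembled).
    video_path: Optional[str] = None
    thumbnail_path: Optional[str] = None
    # Used after approval and posting.
    scheduled_for: Optional[str] = None
    posted_platforms: Optional[dict] = None
    # Populated when a stage fails (see Error Handling).
    error_message: Optional[str] = None


# Fields each status is described as populating (assumed to be mandatory).
REQUIRED_FIELDS = {
    "scripted": ["hook", "script", "duration_seconds"],
    "voiced": ["audio_path", "subtitle_data"],
    "assembled": ["video_path", "thumbnail_path"],
    "posted": ["posted_platforms"],
}


def missing_fields(record: ContentRecord, status: str) -> List[str]:
    """List the fields that should be set for `status` but are still None."""
    return [
        name
        for name in REQUIRED_FIELDS.get(status, [])
        if getattr(record, name) is None
    ]
```

A check like missing_fields can be run before changing a record's status, so that an item never reaches voiced without an audio_path, for example.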

Full Pipeline Run

When the full pipeline is triggered (either manually or by the daily cron), it runs all stages in sequence:

Fetch topics for all active channels
Generate scripts for up to 20 draft items
Generate voice for up to 20 scripted items
Assemble video for up to 1 voiced item (memory-limited)

The assembly limit of one item per run is a safeguard against running out of memory on the 512 MB Render starter plan. Voiced items that are not assembled in one run are picked up on the next.
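The per-run limits can be illustrated with a small in-memory sketch. It assumes items are plain dictionaries with a status key and that each stage is a callable supplied by the caller; STAGES, run_full, and handlers are hypothetical names. Topic fetching and the move from assembled to review are left out to keep the example short.

```python
from typing import Callable, Dict, List

# (status the stage consumes, status it produces, per-run limit from this document)
STAGES = [
    ("draft", "scripted", 20),
    ("scripted", "voiced", 20),
    ("voiced", "assembled", 1),  # kept at 1 because assembly is memory-limited
]


def run_full(
    items: List[Dict], handlers: Dict[str, Callable[[Dict], None]]
) -> Dict[str, int]:
    """Run each stage once, in order, and report how many items it processed."""
    processed: Dict[str, int] = {}
    for source, target, limit in STAGES:
        # Select at most `limit` items currently waiting for this stage.
        batch = [item for item in items if item["status"] == source][:limit]
        for item in batch:
            handlers[target](item)  # hypothetical callable doing the stage's work
            item["status"] = target
        processed[target] = len(batch)
    return processed
```

With three draft items, one run of this sketch scripts and voices all three but assembles only one; the remaining two voiced items are assembled by later runs, one per run.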

Where to Find It

Library (sidebar) — Browse all content by status
Review Queue (sidebar) — Content waiting for approval
Pipeline (sidebar) — Trigger pipeline stages and see run history
Content detail page — View any individual content item, retry stages, approve/reject
API: POST /pipeline/run-full (all stages), or individual stage endpoints
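A full run can be triggered from a script with the standard library. Only the POST /pipeline/run-full route comes from this document. The base URL, the empty JSON body, and the absence of authentication are assumptions, so adjust them to match the actual deployment. The helper name build_run_full_request is hypothetical.

```python
import urllib.request


def build_run_full_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the run-full endpoint."""
    url = base_url.rstrip("/") + "/pipeline/run-full"
    return urllib.request.Request(
        url,
        data=b"{}",  # assumed: the endpoint accepts an empty JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_run_full_request("http://localhost:8000")), where the host and port are placeholders.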

Error Handling

If any stage fails, the content item's status is set to failed and its error_message field records what went wrong. You can retry any individual stage from the content detail page or via the Pipeline API. Retrying resets the status to the appropriate stage and re-runs the processing.
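The fail-and-retry behaviour described above might look roughly like the following. This is a sketch under assumptions: items are plain dictionaries, run_stage and retry_stage are hypothetical names, and "the appropriate stage" is taken to mean the status the item held before the failed stage ran.

```python
from typing import Callable, Dict


def run_stage(item: Dict, target_status: str, work: Callable[[Dict], None]) -> bool:
    """Run one stage; on any exception mark the item failed and keep the message."""
    try:
        work(item)
    except Exception as exc:
        item["status"] = "failed"
        item["error_message"] = f"{type(exc).__name__}: {exc}"
        return False
    item["status"] = target_status
    item["error_message"] = None
    return True


def retry_stage(
    item: Dict, input_status: str, target_status: str, work: Callable[[Dict], None]
) -> bool:
    """Reset the item to the stage's input status, then run the stage again."""
    item["status"] = input_status
    item["error_message"] = None
    return run_stage(item, target_status, work)
```

Catching the exception at the stage boundary keeps one bad item from aborting the rest of the batch, and the stored message gives the retry screen something concrete to show.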

Dependencies

OPENAI_API_KEY — Required for script generation, TTS, and Whisper
ffmpeg / ffprobe — Required for video assembly and audio duration detection
PEXELS_API_KEY — Only if using slideshow backgrounds
YOUTUBE_CLIENT_ID, YOUTUBE_CLIENT_SECRET, YOUTUBE_REFRESH_TOKEN — Only for posting
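A startup check for these dependencies could be sketched as follows. The variable and binary names are the ones listed above; the function name missing_dependencies and its posting and slideshow flags are hypothetical, and the real application may validate its configuration differently.

```python
import os
import shutil
from typing import List, Mapping, Optional

ALWAYS_REQUIRED_ENV = ["OPENAI_API_KEY"]
POSTING_ENV = ["YOUTUBE_CLIENT_ID", "YOUTUBE_CLIENT_SECRET", "YOUTUBE_REFRESH_TOKEN"]
BINARIES = ["ffmpeg", "ffprobe"]


def missing_dependencies(
    env: Optional[Mapping[str, str]] = None,
    posting: bool = False,
    slideshow: bool = False,
) -> List[str]:
    """Return the names of unset environment variables and binaries not on PATH."""
    env = os.environ if env is None else env
    names = list(ALWAYS_REQUIRED_ENV)
    if slideshow:
        names.append("PEXELS_API_KEY")  # only needed for slideshow backgrounds
    if posting:
        names.extend(POSTING_ENV)  # only needed for posting to YouTube
    missing = [name for name in names if not env.get(name)]
    # shutil.which returns None when the executable cannot be found on PATH.
    missing += [binary for binary in BINARIES if shutil.which(binary) is None]
    return missing
```

Calling missing_dependencies(posting=True) before a run gives a single list to report, instead of failing partway through a stage.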