Video Assembly

Video assembly is the most complex pipeline stage. It uses FFmpeg to compose the final video from multiple layers: background, subtitles, audio, optional music, and optional overlays.

What It Does

This stage takes all the pieces produced by earlier stages (audio file, subtitle timing, channel visual config) and combines them into a finished MP4 video ready for review. It also generates a branded thumbnail image. Both files are uploaded to Supabase Storage.

How It Works

The assembly follows these steps in order:

1. Download audio from Supabase Storage
2. Fetch slideshow images (if configured): extract keywords from the script, then fetch stock photos from Pexels
3. Fetch background music (if configured): download a music track
4. Generate background: either a slideshow video with Ken Burns effect, or a static image (solid color or template PNG)
5. Download overlay (if configured): a transparent PNG layered on top
6. Generate ASS subtitles: word-level highlighting with configurable colors and font
7. Mix audio: blend voiceover with background music (if present)
8. Assemble with FFmpeg: compose all layers into the final MP4
9. Generate branded thumbnail: hook text overlaid on background with accent color branding
10. Upload video and thumbnail to Supabase Storage
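
The ordering above, including which steps are skipped when a feature is not configured, can be sketched as a small planning function. This is only an illustration: the function name, the boolean flags, and the step labels are hypothetical and are not taken from the actual worker code.

```python
def plan_assembly_steps(use_slideshow: bool, use_music: bool, use_overlay: bool) -> list[str]:
    """Return the assembly steps in execution order for a given configuration.

    The step labels are illustrative names, not real function names.
    """
    steps = ["download_audio"]
    if use_slideshow:
        steps.append("fetch_slideshow_images")
    if use_music:
        steps.append("fetch_background_music")
    steps.append("generate_background")  # slideshow video or static image
    if use_overlay:
        steps.append("download_overlay")
    steps.append("generate_ass_subtitles")
    if use_music:
        steps.append("mix_audio")  # only needed when a music track exists
    steps += ["assemble_with_ffmpeg", "generate_thumbnail", "upload_outputs"]
    return steps
```

With every optional feature turned off, the plan reduces to six steps: download, background, subtitles, assembly, thumbnail, upload.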

Background Modes

Static Background

A single image for the entire video. Either:

A solid color (from visual_config.background_color, default #1a1a2e)
A template PNG uploaded to channel assets (from visual_config.background_image)

Slideshow (Ken Burns Effect)

Multiple stock photos with smooth pan/zoom transitions:

Images fetched from Pexels based on keywords extracted from the script
Each image gets a zoompan effect (alternating zoom-in and zoom-out)
Duration per image is auto-calculated (total duration divided by image count)
Configurable: image count (default 5) and zoom range (default 1.0x to 1.3x)
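
As a rough sketch of the arithmetic described above, the per-image duration and an FFmpeg zoompan filter string could be derived as follows. The helper names are hypothetical, the choice that even-indexed images zoom in and odd-indexed images zoom out is an assumption, and the real pipeline may build its filter expression differently.

```python
def seconds_per_image(total_duration: float, image_count: int = 5) -> float:
    """Split the total video duration evenly across the slideshow images."""
    if image_count < 1:
        raise ValueError("image_count must be at least 1")
    return total_duration / image_count


def zoompan_filter(index: int, seconds: float, fps: int = 24,
                   zoom_min: float = 1.0, zoom_max: float = 1.3,
                   size: str = "1080x1920") -> str:
    """Build a zoompan filter string for one image.

    Assumption: even indices zoom in, odd indices zoom out. The zoom changes
    linearly with the output frame number (`on`) and the crop stays centered.
    """
    frames = max(1, round(seconds * fps))
    span = zoom_max - zoom_min
    if index % 2 == 0:
        zoom = f"{zoom_min}+{span:.4f}*on/{frames}"
    else:
        zoom = f"{zoom_max}-{span:.4f}*on/{frames}"
    return (
        f"zoompan=z='{zoom}'"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={size}:fps={fps}"
    )
```

For a 30-second voiceover and the default of 5 images, each image is shown for 6 seconds, which is 144 frames at 24 FPS.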

Subtitles

Word-level highlighting using ASS (Advanced SubStation Alpha) format:

Words grouped into lines of approximately 5 words for readability
Font: Montserrat Bold, 72pt
Default color: white (subtitle_color)
Highlight/accent color: configurable (subtitle_highlight)
3px outline for readability on any background
Positioned at bottom-center of the frame

The subtitle system uses the word-level timing data from the voice generation stage. As each word is spoken, it is highlighted in the accent color while surrounding words remain in the default color.
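
One way to produce this kind of highlighting is to emit one ASS Dialogue event per spoken word, repeating the whole line each time with only the current word wrapped in a color override. The sketch below assumes the timing data arrives as (word, start, end) tuples in seconds and that a style named Default (carrying the font, base color, outline, and position) is defined elsewhere in the file. Both are assumptions, as are the helper names.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc, centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def ass_color(hex_color: str) -> str:
    """Convert #RRGGBB to the ASS &HBBGGRR& form (channel order is reversed)."""
    value = hex_color.lstrip("#")
    r, g, b = value[0:2], value[2:4], value[4:6]
    return f"&H{(b + g + r).upper()}&"


def highlight_events(words, highlight_hex, words_per_line=5):
    """Return Dialogue lines in which the currently spoken word is recolored.

    `words` is a list of (text, start_seconds, end_seconds) tuples.
    """
    events = []
    for i in range(0, len(words), words_per_line):
        line = words[i:i + words_per_line]
        for j, (_, start, end) in enumerate(line):
            parts = []
            for k, (text, _, _) in enumerate(line):
                if k == j:
                    # \c sets the primary color; \r resets to the style default
                    parts.append("{\\c" + ass_color(highlight_hex) + "}" + text + "{\\r}")
                else:
                    parts.append(text)
            events.append(
                f"Dialogue: 0,{ass_time(start)},{ass_time(end)},Default,,0,0,0,,{' '.join(parts)}"
            )
    return events
```

Because each event covers only one word's time span, the line disappears during any silence between words. A fuller version might extend each event to the start of the next word.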

Branded Thumbnails

Instead of extracting a random frame, the system generates a custom thumbnail:

Background: First slideshow image (darkened 45%) or solid color
Accent bar: 12px bar at top in the channel’s accent color
Hook text: First 7 words of the hook in large uppercase font with text shadow
Fallback: if Pillow is not available, the thumbnail is extracted from the first frame of the video instead
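
The text preparation and fallback decision can be sketched with the standard library alone. The helper names are hypothetical, and reading "darkened 45%" as multiplying each color channel by 0.55 is an assumption about how the darkening is applied.

```python
import importlib.util


def hook_text(hook: str, max_words: int = 7) -> str:
    """Take the first `max_words` words of the hook and uppercase them."""
    return " ".join(hook.split()[:max_words]).upper()


def darken(rgb: tuple, amount: float = 0.45) -> tuple:
    """Darken an RGB color by scaling every channel down by `amount`."""
    return tuple(round(channel * (1 - amount)) for channel in rgb)


def thumbnail_strategy() -> str:
    """Pick the branded path only when Pillow (module name `PIL`) is importable."""
    if importlib.util.find_spec("PIL") is not None:
        return "branded"
    return "first_frame"
```

Checking for the module with find_spec avoids importing Pillow until the branded path is actually taken.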

Video Specifications

Property	Value
Dimensions	1080 x 1920 (vertical, 9:16)
Codec	H.264 (libx264)
Preset	ultrafast
Quality	CRF 23
Audio	AAC, 128 kbps
FPS	24 (slideshow) or source
Container	MP4 with faststart flag
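
To show how the values in this table map onto FFmpeg options, here is a minimal sketch of an argument list. It assumes the background has already been rendered to a video file and ignores overlays and music. The function name and file paths are hypothetical, and the real command is likely more involved.

```python
def build_ffmpeg_args(background: str, audio: str, subtitles: str, output: str) -> list[str]:
    """Build an FFmpeg command line matching the encoding settings in the table."""
    return [
        "ffmpeg", "-y",
        "-i", background,            # input 0: background video
        "-i", audio,                 # input 1: voiceover (or mixed audio)
        "-map", "0:v:0", "-map", "1:a:0",
        # Scale to the vertical frame, then burn in the ASS subtitles
        "-vf", f"scale=1080:1920,ass={subtitles}",
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",   # moves the index to the front for streaming
        "-shortest",                 # stop when the shorter stream ends
        output,
    ]
```

The list can be passed to subprocess.run(args, check=True). Subtitle paths containing characters such as colons or backslashes need escaping before they are placed inside the filter string.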

Where to Find It

Dashboard: Content detail page shows the assembled video in an inline player
Trigger: Pipeline page, “Assemble Video” button
API: POST /pipeline/assemble-video

Configuration

Background, subtitle, and slideshow settings are all controlled via visual_config on the channel record. Music settings are in music_config. See the Configuration Reference for the full list of fields.
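
As an illustration of how channel settings might be layered over the defaults mentioned on this page, consider the sketch below. The keys background_color, background_image, subtitle_color, and subtitle_highlight come from this page. The slideshow_* keys are hypothetical placeholders, so check the Configuration Reference for the real field names.

```python
DEFAULT_VISUAL_CONFIG = {
    "background_color": "#1a1a2e",   # documented default
    "background_image": None,        # template PNG from channel assets, if any
    "subtitle_color": "#FFFFFF",     # white
    "subtitle_highlight": None,      # accent color; default not stated on this page
    # Hypothetical key names for the slideshow options:
    "slideshow_image_count": 5,
    "slideshow_zoom_min": 1.0,
    "slideshow_zoom_max": 1.3,
}


def resolve_visual_config(channel_config=None) -> dict:
    """Overlay a channel's visual_config on the defaults (channel values win)."""
    return {**DEFAULT_VISUAL_CONFIG, **(channel_config or {})}
```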

Dependencies

ffmpeg and ffprobe — Required (installed in the Worker Docker image)
PEXELS_API_KEY — Only if using slideshow mode for stock photo backgrounds
JAMENDO_CLIENT_ID — Only if using background music
Pillow (Python library) — For branded thumbnail generation (falls back gracefully if not installed)
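
A preflight check along these lines can catch a missing dependency before a job starts. This is a sketch, not part of the documented pipeline: the function name and flags are hypothetical, and Pillow is left out because thumbnail generation has its own fallback.

```python
import os
import shutil


def missing_dependencies(use_slideshow: bool, use_music: bool, env=None) -> list[str]:
    """List required tools and environment variables that are not available."""
    env = os.environ if env is None else env
    # ffmpeg and ffprobe must be on PATH in every mode
    missing = [tool for tool in ("ffmpeg", "ffprobe") if shutil.which(tool) is None]
    if use_slideshow and not env.get("PEXELS_API_KEY"):
        missing.append("PEXELS_API_KEY")
    if use_music and not env.get("JAMENDO_CLIENT_ID"):
        missing.append("JAMENDO_CLIENT_ID")
    return missing
```

An empty list means the configured features have everything they need.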