Video Assembly
Video assembly is the most complex pipeline stage. It uses FFmpeg to compose the final video from multiple layers: background, subtitles, audio, optional music, and optional overlays.
What It Does
This stage takes all the pieces produced by earlier stages (audio file, subtitle timing, channel visual config) and combines them into a finished MP4 video ready for review. It also generates a branded thumbnail image. Both files are uploaded to Supabase Storage.
How It Works
The assembly follows these steps in order:
- Download audio from Supabase Storage
- Fetch slideshow images (if configured) — extracts keywords from the script, fetches stock photos from Pexels
- Fetch background music (if configured) — downloads a music track
- Generate background — either a slideshow video with Ken Burns effect, or a static image (solid color or template PNG)
- Download overlay (if configured) — transparent PNG layered on top
- Generate ASS subtitles — word-level highlighting with configurable colors and font
- Mix audio — blend voiceover with background music (if present)
- Assemble with FFmpeg — compose all layers into the final MP4
- Generate branded thumbnail — hook text overlaid on background with accent color branding
- Upload video and thumbnail to Supabase Storage
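The ordering above, including which steps are skipped when an optional feature is off, can be sketched as a small planning function. This is only an illustration: the `AssemblyOptions` flags and the step names are hypothetical and are not taken from the actual worker code.

```python
from dataclasses import dataclass


@dataclass
class AssemblyOptions:
    """Hypothetical flags describing which optional layers are configured."""
    slideshow: bool = False
    music: bool = False
    overlay: bool = False


def plan_steps(opts: AssemblyOptions) -> list[str]:
    """Return the assembly steps in execution order, skipping optional ones."""
    steps = ["download_audio"]
    if opts.slideshow:
        steps.append("fetch_slideshow_images")
    if opts.music:
        steps.append("fetch_background_music")
    steps.append("generate_background")
    if opts.overlay:
        steps.append("download_overlay")
    steps.append("generate_ass_subtitles")
    if opts.music:
        # Mixing only applies when a music track was fetched.
        steps.append("mix_audio")
    steps += ["assemble_with_ffmpeg", "generate_thumbnail", "upload_outputs"]
    return steps


if __name__ == "__main__":
    for name in plan_steps(AssemblyOptions(slideshow=True, music=True)):
        print(name)
```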
Background Modes
Static Background
A single image for the entire video. Either:
- A solid color (from visual_config.background_color, default #1a1a2e)
- A template PNG uploaded to channel assets (from visual_config.background_image)
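As a rough sketch, choosing between the two static options might look like the following. The field names and the default color come from this page. The precedence (image wins over color) and the helper names are assumptions made for illustration.

```python
import re

DEFAULT_BACKGROUND_COLOR = "#1a1a2e"
_HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def resolve_static_background(visual_config: dict) -> tuple[str, str]:
    """Return ("image", path) or ("color", "#rrggbb").

    Assumption: a configured background_image takes precedence over the color.
    """
    image = visual_config.get("background_image")
    if image:
        return ("image", image)
    color = visual_config.get("background_color") or DEFAULT_BACKGROUND_COLOR
    if not _HEX_COLOR.match(color):
        raise ValueError(f"expected #rrggbb, got {color!r}")
    return ("color", color)


def color_source(color: str, width: int = 1080, height: int = 1920) -> str:
    """Build an FFmpeg lavfi `color` source string for a solid background."""
    return f"color=c=0x{color[1:]}:s={width}x{height}"


if __name__ == "__main__":
    kind, value = resolve_static_background({})
    print(kind, value, color_source(value))
```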
Slideshow (Ken Burns Effect)
Multiple stock photos with smooth pan/zoom transitions:
- Images fetched from Pexels based on keywords extracted from the script
- Each image gets a zoompan effect (alternating zoom-in and zoom-out)
- Duration per image is auto-calculated (total duration divided by image count)
- Configurable: image count (default 5), zoom range (default 1.0 to 1.3x)
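The per-image timing and the alternating zoom direction can be illustrated with a small helper that builds an FFmpeg `zoompan` filter string. This is a sketch, not the pipeline's actual filter graph: the linear zoom expression and the function names are assumptions, while the defaults (5 images, 1.0 to 1.3x, 24 fps, 1080x1920) come from this page.

```python
def seconds_per_image(total_duration: float, image_count: int = 5) -> float:
    """Total duration divided evenly across the images."""
    if image_count < 1:
        raise ValueError("image_count must be at least 1")
    return total_duration / image_count


def zoompan_filter(index: int, seconds: float, fps: int = 24,
                   zoom_min: float = 1.0, zoom_max: float = 1.3,
                   size: str = "1080x1920") -> str:
    """Build a zoompan filter; even indexes zoom in, odd indexes zoom out."""
    frames = max(1, round(seconds * fps))
    span = zoom_max - zoom_min
    if index % 2 == 0:
        # Zoom in: grow linearly from zoom_min to zoom_max over the clip.
        zoom = f"{zoom_min}+{span:.4f}*on/{frames}"
    else:
        # Zoom out: shrink linearly from zoom_max to zoom_min.
        zoom = f"{zoom_max}-{span:.4f}*on/{frames}"
    # Keep the crop window centered while the zoom level changes.
    return (f"zoompan=z='{zoom}':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
            f":d={frames}:s={size}:fps={fps}")


if __name__ == "__main__":
    per_image = seconds_per_image(60.0)
    for i in range(2):
        print(zoompan_filter(i, per_image))
```

For a 60-second voiceover and the default 5 images, each image is shown for 12 seconds, which is 288 frames at 24 fps.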
Subtitles
Word-level highlighting using ASS (Advanced SubStation Alpha) format:
- Words grouped into lines of approximately 5 words for readability
- Font: Montserrat Bold, 72pt
- Default color: white (subtitle_color)
- Highlight/accent color: configurable (subtitle_highlight)
- 3px outline for readability on any background
- Positioned at bottom-center of the frame
The subtitle system uses the word-level timing data from the voice generation stage. As each word is spoken, it is highlighted in the accent color while surrounding words remain in the default color.
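One way to express this in ASS is to emit one Dialogue event per spoken word, repeating the whole line each time with an inline color override on the active word. The sketch below assumes the timing data is a list of `{"word", "start", "end"}` dictionaries with times in seconds and a style named `Default`. Both are hypothetical, since the actual data shape is not documented here.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc, centiseconds)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def ass_color(hex_color: str) -> str:
    """Convert #RRGGBB to the ASS override form &HBBGGRR&."""
    r, g, b = hex_color[1:3], hex_color[3:5], hex_color[5:7]
    return f"&H{b}{g}{r}&".upper()


def highlight(word: str, color: str) -> str:
    """Wrap a word in a primary-color override, then reset to the style."""
    return "{\\c" + color + "}" + word + "{\\r}"


def dialogue_events(words: list[dict], highlight_hex: str,
                    per_line: int = 5) -> list[str]:
    """Emit one Dialogue event per word, grouped into lines of per_line words."""
    color = ass_color(highlight_hex)
    events = []
    for i in range(0, len(words), per_line):
        line = words[i:i + per_line]
        for j, w in enumerate(line):
            # Hold the highlight until the next word starts so the line
            # does not flicker during short pauses.
            end = line[j + 1]["start"] if j + 1 < len(line) else w["end"]
            text = " ".join(
                highlight(x["word"], color) if k == j else x["word"]
                for k, x in enumerate(line)
            )
            events.append(
                f"Dialogue: 0,{ass_time(w['start'])},{ass_time(end)},"
                f"Default,,0,0,0,,{text}"
            )
    return events


if __name__ == "__main__":
    sample = [{"word": w, "start": i * 0.5, "end": i * 0.5 + 0.4}
              for i, w in enumerate("Hello world this is a test".split())]
    print("\n".join(dialogue_events(sample, "#ffd700")))
```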
Branded Thumbnails
Instead of extracting a random frame, the system generates a custom thumbnail:
- Background: First slideshow image (darkened by 45%) or a solid color
- Accent bar: 12px bar at top in the channel’s accent color
- Hook text: First 7 words of the hook in large uppercase font with text shadow
- Falls back to first-frame extraction if Pillow is not available
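A minimal sketch of the solid-color case, including the Pillow fallback, could look like this. The 7-word limit, uppercase text, and 12px accent bar come from this page. The accent color value, the thumbnail size, the text position, and the use of Pillow's built-in default font are placeholders chosen for the example.

```python
try:
    from PIL import Image, ImageDraw, ImageFont
except ImportError:  # Pillow is optional
    Image = None


def thumbnail_text(hook: str, max_words: int = 7) -> str:
    """First max_words words of the hook, uppercased."""
    return " ".join(hook.split()[:max_words]).upper()


def render_thumbnail(hook: str, accent: str = "#e94560",
                     background: str = "#1a1a2e",
                     size: tuple[int, int] = (1080, 1920)):
    """Return a Pillow image, or None if Pillow is not installed.

    A None result signals the caller to fall back to first-frame extraction.
    """
    if Image is None:
        return None
    img = Image.new("RGB", size, background)
    draw = ImageDraw.Draw(img)
    # 12px accent bar across the top (rectangle coordinates are inclusive).
    draw.rectangle([0, 0, size[0] - 1, 11], fill=accent)
    font = ImageFont.load_default()  # placeholder for the real brand font
    text = thumbnail_text(hook)
    x, y = 60, size[1] // 2
    draw.text((x + 3, y + 3), text, font=font, fill="black")  # text shadow
    draw.text((x, y), text, font=font, fill="white")
    return img


if __name__ == "__main__":
    image = render_thumbnail("this is the one secret nobody tells you")
    print("fallback to first frame" if image is None else image.size)
```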
Video Specifications
| Property | Value |
|---|---|
| Dimensions | 1080 x 1920 (vertical, 9:16) |
| Codec | H.264 (libx264) |
| Preset | ultrafast |
| Quality | CRF 23 |
| Audio | AAC, 128kbps |
| FPS | 24 (slideshow) or source |
| Container | MP4 with faststart flag |
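The table maps onto FFmpeg arguments roughly as follows. This sketch assumes a video background, a single audio file, and an ASS subtitle file as inputs. The function name and file names are hypothetical, and the real command likely has additional inputs and filters for overlays.

```python
def build_assemble_command(background: str, audio: str,
                           subtitles: str, output: str) -> list[str]:
    """Build an FFmpeg argument list matching the specification table."""
    return [
        "ffmpeg", "-y",
        "-i", background,              # input 0: background video
        "-i", audio,                   # input 1: voiceover (or mixed) audio
        # Scale to vertical 9:16 and burn in subtitles (requires libass).
        "-vf", f"scale=1080:1920,ass={subtitles}",
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",     # move the index up front for streaming
        "-shortest",
        output,
    ]


if __name__ == "__main__":
    cmd = build_assemble_command("background.mp4", "voice.m4a",
                                 "subs.ass", "final.mp4")
    print(" ".join(cmd))
    # To run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues. Subtitle paths containing characters such as `:` or `'` would still need escaping inside the filter string.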
Where to Find It
- Dashboard: Content detail page shows the assembled video in an inline player
- Trigger: Pipeline page, “Assemble Video” button
- API: POST /pipeline/assemble-video
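For scripted use, a request to this endpoint could be prepared as below. Only the method and path come from this page. The base URL, the `content_id` field, and the absence of authentication headers are assumptions, so check the API reference for the real request shape before relying on this.

```python
import json
import urllib.request


def build_assemble_request(base_url: str, content_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /pipeline/assemble-video.

    `content_id` is a hypothetical payload field used for illustration.
    """
    body = json.dumps({"content_id": content_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/pipeline/assemble-video",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_assemble_request("http://localhost:8000", "example-id")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req)
```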
Configuration
Background, subtitle, and slideshow settings are all controlled via visual_config on the channel record. Music settings are in music_config. See the Configuration Reference for the full list of fields.
Dependencies
- ffmpeg and ffprobe: Required (installed in the Worker Docker image)
- PEXELS_API_KEY: Only if using slideshow mode for stock photo backgrounds
- JAMENDO_CLIENT_ID: Only if using background music
- Pillow (Python library): For branded thumbnail generation (falls back gracefully if not installed)
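A quick preflight check for these dependencies might look like the sketch below. The function names and the `slideshow`/`music` flags are hypothetical. The binary and environment variable names are the ones listed above.

```python
import importlib.util
import os
import shutil


def missing_dependencies(slideshow: bool = False, music: bool = False,
                         env=None) -> list[str]:
    """List required binaries and environment variables that are absent."""
    env = os.environ if env is None else env
    missing = [tool for tool in ("ffmpeg", "ffprobe")
               if shutil.which(tool) is None]
    if slideshow and not env.get("PEXELS_API_KEY"):
        missing.append("PEXELS_API_KEY")
    if music and not env.get("JAMENDO_CLIENT_ID"):
        missing.append("JAMENDO_CLIENT_ID")
    return missing


def pillow_available() -> bool:
    """Pillow is optional; without it thumbnails use first-frame extraction."""
    return importlib.util.find_spec("PIL") is not None


if __name__ == "__main__":
    print("missing:", missing_dependencies(slideshow=True, music=True))
    print("pillow:", pillow_available())
```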