Video Assembly

Video assembly is the most complex pipeline stage. It uses FFmpeg to compose the final video from multiple layers: background, subtitles, audio, optional music, and optional overlays.

What It Does

This stage takes all the pieces produced by earlier stages (audio file, subtitle timing, channel visual config) and combines them into a finished MP4 video ready for review. It also generates a branded thumbnail image. Both files are uploaded to Supabase Storage.

How It Works

The assembly follows these steps in order:

  1. Download audio from Supabase Storage
  2. Fetch slideshow images (if configured) — extracts keywords from the script, fetches stock photos from Pexels
  3. Fetch background music (if configured) — downloads a music track
  4. Generate background — either a slideshow video with Ken Burns effect, or a static image (solid color or template PNG)
  5. Download overlay (if configured) — transparent PNG layered on top
  6. Generate ASS subtitles — word-level highlighting with configurable colors and font
  7. Mix audio — blend voiceover with background music (if present)
  8. Assemble with FFmpeg — compose all layers into the final MP4
  9. Generate branded thumbnail — hook text overlaid on background with accent color branding
  10. Upload video and thumbnail to Supabase Storage
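Step 7 (mixing audio) can be pictured as a single FFmpeg invocation. The sketch below only builds the argument list. The function name, the file paths, and the 0.15 music volume are assumptions for illustration, not values taken from the pipeline.

```python
from typing import List, Optional


def build_mix_command(voice_path: str, music_path: Optional[str], out_path: str,
                      music_volume: float = 0.15) -> List[str]:
    """Return ffmpeg arguments that blend a voiceover with optional music."""
    if music_path is None:
        # No music configured: encode the voiceover on its own.
        return ["ffmpeg", "-y", "-i", voice_path,
                "-c:a", "aac", "-b:a", "128k", out_path]
    # Lower the music level, then mix the two streams.
    # duration=first makes the mix end when the voiceover (input 0) ends.
    graph = (
        f"[1:a]volume={music_volume}[bg];"
        "[0:a][bg]amix=inputs=2:duration=first[mix]"
    )
    return [
        "ffmpeg", "-y",
        "-i", voice_path,
        "-stream_loop", "-1", "-i", music_path,  # loop music shorter than the voice
        "-filter_complex", graph,
        "-map", "[mix]",
        "-c:a", "aac", "-b:a", "128k",
        out_path,
    ]
```

The returned list can be passed to subprocess.run. Looping the music and ending on the first input is one way to keep the mix exactly as long as the voiceover.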

Background Modes

Static Background

A single image for the entire video. Either:

  • A solid color (from visual_config.background_color, default #1a1a2e)
  • A template PNG uploaded to channel assets (from visual_config.background_image)
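As a rough illustration of the two static options, the sketch below picks the FFmpeg input arguments for a background. It assumes the template image takes precedence when both fields are set and that the PNG has already been downloaded to a local path. Both are assumptions, as is the function name.

```python
from typing import List, Optional

DEFAULT_BACKGROUND_COLOR = "#1a1a2e"  # documented default


def static_background_input(visual_config: dict,
                            image_path: Optional[str] = None) -> List[str]:
    """Return ffmpeg input arguments for a static background."""
    if visual_config.get("background_image") and image_path:
        # Loop the single template image so it lasts for the whole video.
        return ["-loop", "1", "-i", image_path]
    color = visual_config.get("background_color") or DEFAULT_BACKGROUND_COLOR
    # ffmpeg's color source accepts 0xRRGGBB, so swap the leading '#'.
    ffmpeg_color = "0x" + color.lstrip("#")
    return ["-f", "lavfi", "-i", f"color=c={ffmpeg_color}:s=1080x1920"]
```

Both inputs are endless on their own, so the final command needs to stop at the end of the audio (for example with -shortest).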

Slideshow (Ken Burns Effect)

Multiple stock photos with smooth pan/zoom transitions:

  • Images fetched from Pexels based on keywords extracted from the script
  • Each image gets a zoompan effect (alternating zoom-in and zoom-out)
  • Duration per image is auto-calculated (total duration divided by image count)
  • Configurable: image count (default 5), zoom range (default 1.0 to 1.3x)
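The per-image timing and the alternating zoom direction can be sketched as follows. This is a minimal illustration that builds one FFmpeg zoompan filter string per image using a linear zoom ramp. The function name and the linear ramp are assumptions, and the real pipeline may build its filters differently.

```python
from typing import List


def ken_burns_filters(total_duration: float, image_count: int = 5,
                      zoom_min: float = 1.0, zoom_max: float = 1.3,
                      fps: int = 24, size: str = "1080x1920") -> List[str]:
    """Build one zoompan filter per image, alternating zoom-in and zoom-out."""
    if image_count < 1:
        raise ValueError("image_count must be at least 1")
    per_image = total_duration / image_count   # seconds shown per image
    frames = max(2, round(per_image * fps))    # output frames per image
    span = zoom_max - zoom_min
    filters = []
    for index in range(image_count):
        progress = f"on/{frames - 1}"          # 'on' is zoompan's output frame counter
        if index % 2 == 0:
            zoom = f"{zoom_min}+{span:.4f}*{progress}"   # zoom in
        else:
            zoom = f"{zoom_max}-{span:.4f}*{progress}"   # zoom out
        filters.append(
            f"zoompan=z='{zoom}'"
            ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"  # keep the zoom centered
            f":d={frames}:s={size}:fps={fps}"
        )
    return filters
```

For a 60-second voiceover with the default five images, each image is shown for 12 seconds, which is 288 frames at 24 fps.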

Subtitles

Word-level highlighting using ASS (Advanced SubStation Alpha) format:

  • Words grouped into lines of approximately 5 words for readability
  • Font: Montserrat Bold, 72pt
  • Default color: white (subtitle_color)
  • Highlight/accent color: configurable (subtitle_highlight)
  • 3px outline for readability on any background
  • Positioned at bottom-center of the frame

The subtitle system uses the word-level timing data from the voice generation stage. As each word is spoken, it is highlighted in the accent color while surrounding words remain in the default color.
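One common way to get this effect in ASS is to emit one Dialogue event per spoken word, repeating the whole line and wrapping only the active word in a color override. The sketch below follows that approach. The shape of the timing data (text, start, end in seconds), the gold accent default, and the function names are assumptions, and the style header (font, outline, position) is omitted.

```python
from typing import Dict, List


def ass_color(hex_color: str) -> str:
    """Convert '#RRGGBB' to the ASS override form '&HBBGGRR&'."""
    h = hex_color.lstrip("#").upper()
    return f"&H{h[4:6]}{h[2:4]}{h[0:2]}&"


def ass_time(seconds: float) -> str:
    """Format seconds as the ASS timestamp H:MM:SS.cc (centiseconds)."""
    total_cs = round(seconds * 100)
    hours, rem = divmod(total_cs, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def highlight_events(words: List[Dict], base: str = "#FFFFFF",
                     accent: str = "#FFD700", per_line: int = 5) -> List[str]:
    """Return ASS Dialogue lines that highlight each word while it is spoken."""
    accent_on = "{\\c" + ass_color(accent) + "}"
    accent_off = "{\\c" + ass_color(base) + "}"
    events = []
    for start in range(0, len(words), per_line):
        line = words[start:start + per_line]
        for active, word in enumerate(line):
            parts = [
                accent_on + w["text"] + accent_off if i == active else w["text"]
                for i, w in enumerate(line)
            ]
            events.append(
                f"Dialogue: 0,{ass_time(word['start'])},{ass_time(word['end'])},"
                f"Default,,0,0,0,,{' '.join(parts)}"
            )
    return events
```

With this scheme the line disappears during any silence between one word's end and the next word's start. Extending each event to the next word's start time would avoid that flicker.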

Branded Thumbnails

Instead of extracting a random frame, the system generates a custom thumbnail:

  • Background: First slideshow image (darkened 45%) or solid color
  • Accent bar: 12px bar at top in the channel’s accent color
  • Hook text: First 7 words of the hook in large uppercase font with text shadow
  • Falls back to first-frame extraction if Pillow is not available
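A minimal sketch of this thumbnail logic with Pillow is shown below. The function names, the default accent color, the 1080 x 1920 size, and the text position are all assumptions. It also uses Pillow's built-in default font rather than a large branded one, so it illustrates the layering order, not the final look.

```python
from typing import Optional, Tuple

try:
    from PIL import Image, ImageDraw, ImageEnhance
    HAS_PILLOW = True
except ImportError:
    HAS_PILLOW = False


def hook_text(hook: str, max_words: int = 7) -> str:
    """Return the first few words of the hook in uppercase."""
    return " ".join(hook.split()[:max_words]).upper()


def render_thumbnail(hook: str, out_path: str, accent: str = "#e94560",
                     background_color: str = "#1a1a2e",
                     image_path: Optional[str] = None,
                     size: Tuple[int, int] = (1080, 1920)) -> bool:
    """Render a branded thumbnail; return False if Pillow is unavailable."""
    if not HAS_PILLOW:
        return False  # caller falls back to first-frame extraction
    if image_path:
        img = Image.open(image_path).convert("RGB").resize(size)
        img = ImageEnhance.Brightness(img).enhance(0.55)  # darken by 45%
    else:
        img = Image.new("RGB", size, background_color)
    draw = ImageDraw.Draw(img)
    draw.rectangle([0, 0, size[0] - 1, 11], fill=accent)  # 12px accent bar at top
    text = hook_text(hook)
    draw.text((62, 202), text, fill="black")  # offset copy acts as a text shadow
    draw.text((60, 200), text, fill="white")
    img.save(out_path)
    return True
```

Returning False instead of raising lets the caller switch to the first-frame fallback without special error handling.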

Video Specifications

Property     Value
Dimensions   1080 x 1920 (vertical, 9:16)
Codec        H.264 (libx264)
Preset       ultrafast
Quality      CRF 23
Audio        AAC, 128 kbps
FPS          24 (slideshow) or source
Container    MP4 with faststart flag
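These settings map onto standard FFmpeg options. The sketch below shows one plausible final command for the background, subtitle, and audio layers. The function name and file paths are hypothetical, and the optional overlay layer is left out because it would require a filter graph with a second video input.

```python
from typing import List


def build_encode_command(background_args: List[str], audio_path: str,
                         subtitles_path: str, out_path: str) -> List[str]:
    """Return ffmpeg arguments matching the documented output settings."""
    return [
        "ffmpeg", "-y",
        *background_args,                 # input 0: background video or image
        "-i", audio_path,                 # input 1: mixed audio
        # Scale to the vertical frame, then burn in the ASS subtitles.
        "-vf", f"scale=1080:1920,ass={subtitles_path}",
        "-map", "0:v", "-map", "1:a",
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",        # move the index to the front for streaming
        "-shortest",                      # stop when the shorter stream ends
        out_path,
    ]
```

The ass filter is only present in FFmpeg builds compiled with libass, so it is worth confirming that for the build in use.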

Where to Find It

  • Dashboard: Content detail page shows the assembled video in an inline player
  • Trigger: Pipeline page, “Assemble Video” button
  • API: POST /pipeline/assemble-video

Configuration

Background, subtitle, and slideshow settings are all controlled via visual_config on the channel record. Music settings are in music_config. See the Configuration Reference for the full list of fields.
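For orientation, here is a sketch of how channel settings might be merged with defaults. It uses only the four visual_config field names mentioned on this page. The hex value for white, the None placeholders, and the merge function are assumptions, and the slideshow and music_config field names are deliberately left out because they are not given here.

```python
from typing import Optional

DEFAULT_VISUAL_CONFIG = {
    "background_color": "#1a1a2e",   # documented default
    "background_image": None,        # optional template PNG
    "subtitle_color": "#FFFFFF",     # white, written as hex here
    "subtitle_highlight": None,      # accent color, no default documented
}


def resolve_visual_config(channel_config: Optional[dict]) -> dict:
    """Overlay the channel's non-null settings on top of the defaults."""
    merged = dict(DEFAULT_VISUAL_CONFIG)
    overrides = {k: v for k, v in (channel_config or {}).items() if v is not None}
    merged.update(overrides)
    return merged
```

Ignoring null values means a channel record with unset fields still resolves to the documented defaults.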

Dependencies

  • ffmpeg and ffprobe — Required (installed in the Worker Docker image)
  • PEXELS_API_KEY — Only if using slideshow mode for stock photo backgrounds
  • JAMENDO_CLIENT_ID — Only if using background music
  • Pillow (Python library) — For branded thumbnail generation (falls back gracefully if not installed)
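A startup check along these lines can surface missing dependencies before a job runs. This is a sketch using only the Python standard library. The function names and the use_slideshow and use_music flags are illustrative assumptions.

```python
import importlib.util
import os
import shutil
from typing import List, Mapping, Optional


def missing_dependencies(use_slideshow: bool, use_music: bool,
                         env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List required tools and environment variables that are missing."""
    env = os.environ if env is None else env
    # ffmpeg and ffprobe must be on PATH in every mode.
    missing = [tool for tool in ("ffmpeg", "ffprobe") if shutil.which(tool) is None]
    if use_slideshow and not env.get("PEXELS_API_KEY"):
        missing.append("PEXELS_API_KEY")
    if use_music and not env.get("JAMENDO_CLIENT_ID"):
        missing.append("JAMENDO_CLIENT_ID")
    return missing


def pillow_available() -> bool:
    """Pillow is optional: without it, thumbnails use the fallback path."""
    return importlib.util.find_spec("PIL") is not None
```

Pillow is reported separately because its absence only changes how the thumbnail is produced and does not block assembly.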