Skip to Content
System ReferenceTopic Sourcing

Topic Sourcing

Topic sourcing is the first pipeline stage. It discovers content ideas from external sources and uses AI to select the best ones.

What It Does

The system pulls raw topics from RSS feeds, Reddit, and external APIs like TMDB. It deduplicates them against recent content to avoid repeats, then uses AI to select the most promising topics for short-form video. Selected topics become draft content records, ready for script generation.

How It Works

  1. The system queries all active sources for a channel from the sources table
  2. For each source, it fetches raw topics (headlines, titles, descriptions)
  3. Topics are deduplicated against the last 200 content items (by MD5 hash) to avoid repeats
  4. The AI (GPT-4o-mini) filters the remaining topics, selecting the best N (where N = the channel’s posting_cadence)
  5. Selected topics become draft content records in the database

Source Types

RSS Feeds

Pull headlines from any RSS/Atom feed. Good for news sites, blogs, and industry publications.

Config FieldTypeExample
urlstring"https://www.ign.com/articles.rss"
namestring"IGN"

Reddit

Pull hot or top posts from any subreddit. No API key required — uses Reddit’s public JSON API.

Config FieldTypeExample
subredditstring"gaming"
sortstring"hot", "top", or "new"
limitnumber25
timestring"week", "month", "all" (for sort: "top" only)

Posts with score below 10 and stickied posts are automatically filtered out.

External APIs

TMDB (Movies and TV trends):

Config FieldTypeExample
providerstring"tmdb"
endpointstring"trending/all/week"
namestring"TMDB Trending"

Requires the TMDB_API_KEY environment variable.

AI Topic Filtering

After fetching raw topics, GPT-4o-mini selects the best ones based on:

  • Would it generate curiosity in a scrolling viewer?
  • Is there enough substance for a 20-30 second script?
  • Is it surprising, nostalgic, funny, fascinating, or opinion-provoking?
  • Is it evergreen? (Content should still work in 3 months)
  • Avoids: breaking tragedy, highly political, legally risky, or too niche

The filtering prompt can be customized per-channel via prompt templates (purpose: topic_filter).

Where to Find It

  • Dashboard: Channel Settings, Sources tab — add, edit, and test sources
  • Trigger: Pipeline page, “Fetch Topics” button
  • API: POST /pipeline/fetch-topics (all channels) or POST /pipeline/fetch-source/{source_id} (single source)

Configuration

Sources are managed per-channel in the sources table. Each source has:

FieldTypeDescription
channel_iduuidWhich channel this source belongs to
typestring"rss", "reddit", or "api"
configjsonType-specific configuration (see tables above)
activebooleanWhether this source is included in topic fetching
namestringDisplay name for the source

Dependencies

  • OPENAI_API_KEY — Required for AI topic filtering
  • TMDB_API_KEY — Only if using TMDB sources
  • REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET — Optional; Reddit’s public API works without authentication
Last updated on