feat: multi-step cron architecture — check-research route + ingest refactor + pg_cron #603

Closed
codercatdev wants to merge 9 commits into dev from feat/check-research-cron

@codercatdev
Contributor

Summary

Splits the blocking 10+ minute NotebookLM research step into a fire-and-forget + polling architecture, matching the existing check-renders pattern. All cron schedules moved to Supabase pg_cron.

New Status Flow

ingest (daily 10:00 UTC) → creates doc as "researching" (returns in ~15s)
check-research (every 5min) → polls NotebookLM → enriches script → "script_ready"
check-renders (every 5min) → audio_gen → rendering → video_gen → uploading → published

When ENABLE_NOTEBOOKLM_RESEARCH is NOT set, ingest goes straight to "script_ready" — no change to existing behavior.
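
The flow above can be read as a linear state machine. The following is a minimal sketch, not code from this PR: `nextStatus` and `ownerOf` are hypothetical helpers, and it assumes that check-renders is the route that picks up docs in "script_ready".

```typescript
// Statuses named in the flow above, in the order a doc moves through them.
type PipelineStatus =
  | "researching"
  | "script_ready"
  | "audio_gen"
  | "rendering"
  | "video_gen"
  | "uploading"
  | "published";

const STATUS_ORDER: PipelineStatus[] = [
  "researching",
  "script_ready",
  "audio_gen",
  "rendering",
  "video_gen",
  "uploading",
  "published",
];

// Hypothetical helper: the status that follows `status`, or null at the end.
function nextStatus(status: PipelineStatus): PipelineStatus | null {
  const i = STATUS_ORDER.indexOf(status);
  return i < STATUS_ORDER.length - 1 ? STATUS_ORDER[i + 1] : null;
}

// Hypothetical helper: which cron route advances a doc in this status.
// Assumption: check-renders handles everything from "script_ready" onward.
function ownerOf(status: PipelineStatus): "check-research" | "check-renders" | null {
  if (status === "researching") return "check-research";
  if (status === "published") return null; // terminal, nothing left to poll
  return "check-renders";
}
```

Keeping the order in one array makes it easy to see that "researching" is an optional prefix: with research disabled, a doc simply starts at the second entry.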

Files Changed

NEW: app/api/cron/check-research/route.ts (690 lines)

Polls NotebookLM research status for docs in "researching" state. When research completes:

  1. Imports research sources into notebook
  2. Generates 5 infographics (architecture, comparison, workflow, timeline, pros/cons)
  3. Waits for infographic completion
  4. Gets notebook summary + infographic URLs
  5. Re-generates enriched Gemini script with research data
  6. Runs Claude critic pass
  7. Updates Sanity doc to "script_ready" with enriched content
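
The seven steps above could be orchestrated roughly as follows. This is a sketch under stated assumptions: `ResearchClient`, `ScriptTools`, `DocStore`, and every method on them are hypothetical stand-ins for the NotebookLM, Gemini/Claude, and Sanity helpers, whose real names and signatures are not shown in this PR description. It also assumes the critic pass returns a revised script.

```typescript
// All interfaces here are hypothetical; the route's real helpers may differ.
interface ResearchClient {
  importSources(notebookId: string, taskId: string): Promise<void>;
  startInfographics(notebookId: string, kinds: readonly string[]): Promise<string[]>;
  waitForInfographics(notebookId: string, ids: string[]): Promise<void>;
  getSummary(notebookId: string): Promise<string>;
  getInfographicUrls(notebookId: string, ids: string[]): Promise<string[]>;
}

interface ScriptTools {
  generateScript(research: { summary: string; infographicUrls: string[] }): Promise<string>;
  critique(script: string): Promise<string>; // assumed to return the revised script
}

interface DocStore {
  markScriptReady(
    docId: string,
    patch: { script: string; infographicUrls: string[] },
  ): Promise<void>;
}

interface ResearchingDoc {
  _id: string;
  researchNotebookId: string;
  researchTaskId: string;
}

// The five infographic types listed in step 2.
const INFOGRAPHIC_KINDS = ["architecture", "comparison", "workflow", "timeline", "pros/cons"] as const;

async function enrichCompletedResearch(
  doc: ResearchingDoc,
  research: ResearchClient,
  tools: ScriptTools,
  store: DocStore,
): Promise<void> {
  const nb = doc.researchNotebookId;
  await research.importSources(nb, doc.researchTaskId);              // step 1
  const ids = await research.startInfographics(nb, INFOGRAPHIC_KINDS); // step 2
  await research.waitForInfographics(nb, ids);                       // step 3
  const summary = await research.getSummary(nb);                     // step 4
  const infographicUrls = await research.getInfographicUrls(nb, ids);
  const draft = await tools.generateScript({ summary, infographicUrls }); // step 5
  const script = await tools.critique(draft);                        // step 6
  await store.markScriptReady(doc._id, { script, infographicUrls }); // step 7
}
```

Passing the clients in as parameters keeps the ordering testable with fakes, without calling any external service.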

Patterns from check-renders:

  • CRON_SECRET bearer auth
  • fetchCache = 'force-no-store' + maxDuration = 60
  • ✅ Stuck detection (>30min in "researching" → flagged)
  • ✅ Idempotent — safe to run on overlap
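
Two of these patterns, bearer auth and stuck detection, reduce to small pure checks. The sketch below is illustrative only: `isAuthorized` and `isStuck` are hypothetical names, and which timestamp the route compares against (for example, when the doc entered "researching") is an assumption.

```typescript
// Hypothetical helper: compare the Authorization header against CRON_SECRET.
// Fails closed when the secret is not configured.
function isAuthorized(authHeader: string | null, cronSecret: string | undefined): boolean {
  if (!cronSecret) return false;
  return authHeader === `Bearer ${cronSecret}`;
}

// 30 minutes, per the stuck-detection threshold described above.
const STUCK_THRESHOLD_MS = 30 * 60 * 1000;

// Hypothetical helper: true when a doc has been "researching" for more than
// 30 minutes. `enteredResearchingAt` is an ISO timestamp; the field it comes
// from is an assumption.
function isStuck(enteredResearchingAt: string, now: Date = new Date()): boolean {
  const elapsedMs = now.getTime() - new Date(enteredResearchingAt).getTime();
  return elapsedMs > STUCK_THRESHOLD_MS;
}
```

Because both checks depend only on their inputs, overlapping cron runs reach the same decision for the same doc, which is part of what makes the route safe to re-run.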

MODIFIED: app/api/cron/ingest/route.ts

  • Removed blocking conductResearch() call (was 10+ minutes)
  • Now creates notebook + starts research in ~15s (fire-and-forget)
  • Stores researchNotebookId and researchTaskId on Sanity doc
  • Creates doc as "researching" when NotebookLM enabled, "script_ready" when not
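
As an illustration of the fire-and-forget change, the sketch below starts research and returns immediately with the two IDs, leaving polling to check-research. `NotebookClient`, its methods, and `buildIngestDoc` are hypothetical; only the field names `researchNotebookId` and `researchTaskId` and the two status values come from this PR.

```typescript
// Hypothetical client; the real NotebookLM wrapper is not shown in this PR text.
interface NotebookClient {
  createNotebook(title: string): Promise<string>;                    // returns notebook ID
  startResearch(notebookId: string, topic: string): Promise<string>; // returns task ID
}

interface NewVideoDoc {
  title: string;
  status: "researching" | "script_ready";
  researchNotebookId?: string;
  researchTaskId?: string;
}

async function buildIngestDoc(
  topic: string,
  researchEnabled: boolean,
  client: NotebookClient,
): Promise<NewVideoDoc> {
  // Research disabled: existing behavior, straight to "script_ready".
  if (!researchEnabled) {
    return { title: topic, status: "script_ready" };
  }
  // Research enabled: start the task but do NOT wait for it to finish.
  const researchNotebookId = await client.createNotebook(topic);
  const researchTaskId = await client.startResearch(researchNotebookId, topic);
  return { title: topic, status: "researching", researchNotebookId, researchTaskId };
}
```

The stored IDs are what allow a later, separate invocation to resume work on the same notebook.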

MODIFIED: sanity/schemas/documents/automatedVideo.ts

  • Added "researching" to pipeline status options
  • Added researchNotebookId field (hidden)
  • Added researchTaskId field (hidden)
  • Added trendScore and trendSources fields
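
For orientation, the schema additions might look like the plain field definitions below. This is a sketch, not the contents of automatedVideo.ts: the titles, the field types for `trendScore` and `trendSources`, and the shortened status list are assumptions (the commit notes describe trendScore as 0-100 and trendSources as comma-separated, and place "researching" between draft and script_ready).

```typescript
// Minimal local type so the sketch stands alone; a real schema would use
// Sanity's own field definitions.
interface FieldSketch {
  name: string;
  title: string;
  type: "string" | "number";
  hidden?: boolean;
}

// New fields added by this PR. Titles and types are assumptions.
const researchFields: FieldSketch[] = [
  { name: "researchNotebookId", title: "Research Notebook ID", type: "string", hidden: true },
  { name: "researchTaskId", title: "Research Task ID", type: "string", hidden: true },
  { name: "trendScore", title: "Trend Score", type: "number" },    // 0-100
  { name: "trendSources", title: "Trend Sources", type: "string" }, // comma-separated
];

// Excerpt of the status options; later pipeline statuses are omitted here.
const statusOptionsExcerpt = [
  { title: "Draft", value: "draft" },
  { title: "Researching", value: "researching" },
  { title: "Script Ready", value: "script_ready" },
];
```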

NEW: supabase/migrations/002_cron_schedules.sql

Idempotent pg_cron migration for all pipeline schedules:

| Job | Schedule | Route |
| --- | --- | --- |
| ingest-daily | 0 10 * * * | /api/cron/ingest |
| check-research | */5 * * * * | /api/cron/check-research |
| check-renders | */5 * * * * | /api/cron/check-renders |
| sponsor-outreach | 0 9 * * 1,4 | /api/cron/sponsor-outreach |

Prerequisites: Set app.site_url and app.cron_secret as Supabase config vars.
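
To make the shape of the migration concrete, the sketch below renders one idempotent statement pair per job from the table above. It is not the contents of 002_cron_schedules.sql. It assumes the routes accept GET (hence pg_net's `net.http_get`), that the two config vars are read with `current_setting`, and that existing jobs are looked up in `cron.job` before unscheduling; `renderJobSql` is a hypothetical helper.

```typescript
interface CronJob {
  name: string;
  schedule: string; // standard 5-field cron expression, UTC
  route: string;
}

// The four jobs listed in the table above.
const JOBS: CronJob[] = [
  { name: "ingest-daily", schedule: "0 10 * * *", route: "/api/cron/ingest" },
  { name: "check-research", schedule: "*/5 * * * *", route: "/api/cron/check-research" },
  { name: "check-renders", schedule: "*/5 * * * *", route: "/api/cron/check-renders" },
  { name: "sponsor-outreach", schedule: "0 9 * * 1,4", route: "/api/cron/sponsor-outreach" },
];

// Hypothetical helper: SQL that unschedules the job if present, then
// schedules it again, so re-running the migration leaves one copy of each job.
function renderJobSql(job: CronJob): string {
  return [
    "DO $$ BEGIN",
    `  IF EXISTS (SELECT 1 FROM cron.job WHERE jobname = '${job.name}') THEN`,
    `    PERFORM cron.unschedule('${job.name}');`,
    "  END IF;",
    "END $$;",
    `SELECT cron.schedule('${job.name}', '${job.schedule}', $job$`,
    "  SELECT net.http_get(",
    `    url := current_setting('app.site_url') || '${job.route}',`,
    "    headers := jsonb_build_object(",
    "      'Authorization', 'Bearer ' || current_setting('app.cron_secret')",
    "    )",
    "  );",
    "$job$);",
  ].join("\n");
}

const migrationSketch = JOBS.map(renderJobSql).join("\n\n");
```

Reading the site URL and secret from settings at call time keeps them out of the migration file itself.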

Testing

  • npx tsc --noEmit passes (only pre-existing errors in unrelated analytics.tsx)
  • All cron routes have CRON_SECRET auth
  • Stuck detection flags docs after 30 minutes
  • Ingest returns in <30s with research enabled

Miriad and others added 9 commits March 5, 2026 02:40
…undle

Move deploySite/deployFunction/getOrCreateBucket into remotion-deploy.ts
(one-time CLI setup). remotion.ts now only imports renderMediaOnLambda +
getRenderProgress — no @rspack/binding dependency chain in serverless routes.

Also remove @remotion/bundler, @remotion/cli, @rspack/core, @rspack/binding
from serverExternalPackages in next.config.ts (no longer needed).

The main @remotion/lambda entry point re-exports from @remotion/bundler
which pulls in @rspack/core → @rspack/binding (native binary). The /client
subpath only exports renderMediaOnLambda + getRenderProgress without the
bundler dependency chain. This is the official Remotion approach for
serverless environments.

All cron jobs are triggered by Supabase pg_cron + pg_net calling
the HTTP endpoints directly. Removes vercel.json cron config to
avoid confusion and potential double-triggering.

Co-authored-by: content <content@miriad.systems>

Adds 002_cron_schedules.sql with idempotent schedules for:
- ingest-daily (10:00 UTC)
- check-research (every 5 min)
- check-renders (every 5 min)
- sponsor-outreach (Mon/Thu 09:00 UTC)

Uses DO blocks for safe unschedule on re-runs.

Co-authored-by: research <research@miriad.systems>

New cron route that polls docs in "researching" status:
- Queries Sanity for docs with researchNotebookId
- Polls NotebookLM research status
- On completion: imports sources, generates infographics,
  gets summary, re-generates enriched Gemini script, runs critic
- Updates Sanity doc to "script_ready" with enriched data
- Stuck detection: flags docs >30min in "researching"
- Follows check-renders patterns (auth, fetchCache, maxDuration)

Co-authored-by: research <research@miriad.systems>

When ENABLE_NOTEBOOKLM_RESEARCH is set:
- Creates notebook + adds sources + starts research (~10s)
- Does NOT poll for completion (was blocking 10+ min)
- Generates basic script without research data
- Creates Sanity doc with status "researching"
- Stores researchNotebookId + researchTaskId on doc
- check-research cron will poll and enrich later

When research is NOT enabled:
- Behavior unchanged — straight to "script_ready"

Removes conductResearch() import (blocking call).

Co-authored-by: research <research@miriad.systems>

…hema

Adds:
- "researching" status option (between draft and script_ready)
- researchNotebookId field (hidden, stores NotebookLM notebook UUID)
- researchTaskId field (hidden, stores deep research task UUID)
- trendScore field (0-100 from trend discovery)
- trendSources field (comma-separated signal sources)

Co-authored-by: research <research@miriad.systems>
@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| codingcat-dev | Ready | Preview, Comment | Mar 5, 2026 3:16am |
