SDK Samples

Indexed Cross-Document Search (Server-Side)

Same indexed search as the build-time sample, but with the index living in Postgres and queries served by an API route. Demonstrates the production path beyond a static JSON dump.

How it works

The extractors are the same as in the build-time sibling (pdfjs-dist, mammoth, SheetJS, JSZip); only the index storage differs. A one-shot pnpm seed-search command applies the migration and upserts each extracted unit into a denormalized search_units table. The table has a generated tsvector column (title at weight A, unit label at B, content at C) and a GIN index on that column for fast lookup. The browser calls GET /api/search?q=…, the route matches a plainto_tsquery query against the indexed tsvector column, and ts_headline generates highlighted snippets server-side. The driver is node-postgres, and connection pooling is handled by pgbouncer through Vercel's Neon integration.
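As a rough illustration of the pieces described above, the sketch below shows what the migration and the search query might look like. It is not the sample's actual code: apart from search_units, every column name, the 'english' text search configuration, the <mark> highlight tags, the result cap of 50, and the helper name buildSearchQuery are assumptions made for the example.

```typescript
// Hypothetical sketch. Only the table name search_units and the functions
// plainto_tsquery / ts_headline come from the description; everything else
// (columns, 'english' config, <mark> tags, limits) is assumed.

// Migration: a denormalized table with a generated, weighted tsvector
// column (title = A, unit label = B, content = C) and a GIN index on it.
const MIGRATION_SQL = `
CREATE TABLE IF NOT EXISTS search_units (
  id         text PRIMARY KEY,
  doc_id     text NOT NULL,
  title      text NOT NULL,
  unit_label text NOT NULL,
  content    text NOT NULL,
  tsv tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(unit_label, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'C')
  ) STORED
);
CREATE INDEX IF NOT EXISTS search_units_tsv_idx
  ON search_units USING GIN (tsv);
`;

// Parameterized query in the { text, values } shape that node-postgres accepts.
interface SearchQuery {
  text: string;
  values: unknown[];
}

// Returns null for a blank search term so the route can skip the database.
function buildSearchQuery(q: string, limit: number = 20): SearchQuery | null {
  const term = q.trim();
  if (term === "") return null;

  // Clamp the limit to 1..50; fall back to 20 for NaN or Infinity.
  const capped = Number.isFinite(limit)
    ? Math.min(Math.max(Math.trunc(limit), 1), 50)
    : 20;

  return {
    text: `
      SELECT id, doc_id, title, unit_label,
             ts_rank(tsv, query) AS rank,
             ts_headline('english', content, query,
               'StartSel=<mark>, StopSel=</mark>, MaxWords=30, MinWords=10') AS snippet
      FROM search_units, plainto_tsquery('english', $1) AS query
      WHERE tsv @@ query
      ORDER BY rank DESC
      LIMIT $2`,
    values: [term, capped],
  };
}
```

With node-postgres, an API route could pass the result straight to pool.query(built.text, built.values) and return the rows as JSON. Because the user's input only ever travels as the $1 parameter, it is never spliced into the SQL text. Note that Postgres does not HTML-escape ts_headline output, so a real route would likely need to sanitize snippets before rendering them.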

Why this vs the build-time sample

Build-time JSON (/indexed-search): zero infra, ships to the client, fixed corpus, every visitor downloads the full index, rebuilds require a redeploy. Right for fixed reference material: docs, marketing pages, public catalogs.

Server-side Postgres (this sample): real-time updates as documents are added/changed, scales with corpus size (GIN index keeps queries fast), per-tenant data isolation if you need it, server bears the index cost. Right for actual document repositories: internal knowledge bases, customer-facing search, anything that grows.

The viewer + highlight layer (SearchViewer.tsx) is identical between the two; only the data source for the sidebar changes.
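One way to picture the "only the data source changes" point is a small interface that both samples could satisfy. This is a sketch under assumptions, not the samples' real code: SearchHit, IndexedUnit, SearchSource, Fetcher, searchUrl, staticSource and apiSource are all hypothetical names, the JSON response shape is assumed, and the substring match in the static variant is a stand-in for whatever matching the build-time sample actually performs.

```typescript
// Hypothetical names throughout; only GET /api/search?q=… comes from the text.
interface SearchHit {
  id: string;
  title: string;
  unitLabel: string;
  snippet: string;
}

interface IndexedUnit {
  id: string;
  title: string;
  unitLabel: string;
  content: string;
}

// The sidebar depends on this interface only, not on where results come from.
interface SearchSource {
  search(q: string): Promise<SearchHit[]>;
}

// Minimal fetch-like signature so the sketch does not depend on a global fetch.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Builds the request path, URL-encoding the user's query.
function searchUrl(q: string): string {
  return `/api/search?q=${encodeURIComponent(q)}`;
}

// Build-time variant: search an index already downloaded to the client.
// Naive case-insensitive substring match, used here purely as a placeholder.
function staticSource(index: IndexedUnit[]): SearchSource {
  return {
    async search(q) {
      const needle = q.trim().toLowerCase();
      if (needle === "") return [];
      return index
        .filter((u) => u.content.toLowerCase().indexOf(needle) !== -1)
        .map((u) => ({
          id: u.id,
          title: u.title,
          unitLabel: u.unitLabel,
          snippet: u.content.slice(0, 160),
        }));
    },
  };
}

// Server-side variant: delegate matching and snippets to the API route.
function apiSource(fetcher: Fetcher): SearchSource {
  return {
    async search(q) {
      const res = await fetcher(searchUrl(q));
      if (!res.ok) {
        throw new Error(`search request failed with status ${res.status}`);
      }
      // Assumes the route returns a JSON array shaped like SearchHit[].
      return (await res.json()) as SearchHit[];
    },
  };
}
```

In a browser, apiSource((url) => fetch(url)) would wire the server-side variant to the real network, while tests can pass a stub fetcher. The viewer would receive a SearchSource either way.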
