Voice + RAG · Production showcase

Speak the question.
Hear the answer.

Upload PDFs, ask out loud, hear cited answers streamed back as natural speech. Full voice loop in production — mic capture, Whisper STT, hybrid retrieval, low-latency TTS.

Showcase mode · 5 queries / 3 docs / 5-min session TTL

Pipeline

Five stages, end to end

Every voice query travels this exact path. No black box.

01

Upload PDF

Document is split into semantic chunks via LangChain text splitters.
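To make the chunking step concrete, here is a minimal, dependency-free sketch of overlap-based splitting. It is not the LangChain splitter the app uses; the function name, the word-window approach, and the default sizes are all assumptions chosen for illustration.

```python
def split_into_chunks(text: str, chunk_words: int = 120, overlap_words: int = 20) -> list[str]:
    """Split text into word windows that overlap by `overlap_words`.

    Hypothetical stand-in for a real text splitter: it ignores sentence
    and paragraph boundaries and only shows the size/overlap idea.
    """
    if not 0 <= overlap_words < chunk_words:
        raise ValueError("overlap_words must be >= 0 and smaller than chunk_words")

    words = text.split()
    step = chunk_words - overlap_words
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        # Stop once a window has reached the end of the document.
        if start + chunk_words >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, so retrieval can still find it.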

02

Embed Locally

FastEmbed (BAAI/bge-small-en-v1.5) runs on the server — no external embedding API.

03

Hybrid Retrieval

The question is embedded, and its vector-search results are fused with full-text search results via pgvector + tsvector.

04

AI Synthesis

Processor Agent (GPT-4.1-mini) writes a grounded answer with citations.

05

Stream Speech

GPT-4o-mini-TTS streams PCM audio over SSE into the Web Audio API.
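SSE is a text protocol, so raw PCM bytes have to be encoded before they go on the wire. The sketch below shows one way that framing could look. The event names (`audio`, `done`), the JSON field names, the base64 encoding, and the 24 kHz 16-bit sample format are assumptions, not details confirmed by this page.

```python
import base64
import json
from typing import Iterator


def pcm_to_sse_events(pcm: bytes, chunk_bytes: int = 4096, sample_rate: int = 24000) -> Iterator[str]:
    """Frame raw 16-bit PCM as Server-Sent Events (hypothetical wire format)."""
    if chunk_bytes <= 0 or chunk_bytes % 2:
        # Keep every chunk aligned to whole 16-bit samples.
        raise ValueError("chunk_bytes must be a positive even number")

    for seq, offset in enumerate(range(0, len(pcm), chunk_bytes)):
        payload = {
            "seq": seq,
            "sample_rate": sample_rate,
            "pcm16_b64": base64.b64encode(pcm[offset:offset + chunk_bytes]).decode("ascii"),
        }
        # An SSE event is "event:" and "data:" lines ended by a blank line.
        yield f"event: audio\ndata: {json.dumps(payload)}\n\n"

    # Tell the client the stream is complete.
    yield "event: done\ndata: {}\n\n"
```

On the browser side, a client would decode each `pcm16_b64` payload, convert the samples to floats, and schedule them back to back in the Web Audio API.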

FastEmbed local · pgvector + tsvector · OpenAI Agents SDK · SSE audio streaming

What makes it different

Engineered for production, not demos

Sub-1.5s first audible word

Audio starts streaming before the LLM has finished writing, so the first word is heard while the rest of the answer is still being generated.
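One common way to start speech early is to cut the LLM token stream at sentence boundaries and hand each finished sentence to TTS straight away. The sketch below illustrates that idea only; the function name and the boundary rule are assumptions, and the app's actual hand-off between the agent and TTS is not described here.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (naive rule:
# abbreviations such as "e.g. " will also trigger a split).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could be sent to the TTS request immediately, so synthesis of the first sentence overlaps with generation of the second.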

Tenant-isolated retrieval

Every chunk row carries a session_id. Vector and keyword queries both filter on it in their WHERE clause before fusion, so one session never sees another session's chunks.
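As an illustration of filtering before fusion, the sketch below builds the two queries with the session filter applied inside each one. The table name `chunks`, the columns `id`, `embedding`, and `tsv`, and the psycopg-style named parameters are hypothetical; only the `session_id` column comes from the description above.

```python
def build_hybrid_queries(session_id: str, query_embedding: list[float],
                         query_text: str, limit: int = 20) -> dict[str, tuple[str, dict]]:
    """Return (sql, params) pairs for the vector and keyword searches."""
    # pgvector accepts a '[x,y,z]' text literal cast to ::vector.
    vector_literal = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"

    vector_sql = (
        "SELECT id FROM chunks "
        "WHERE session_id = %(session_id)s "          # tenant filter first
        "ORDER BY embedding <=> %(embedding)s::vector "  # cosine distance
        "LIMIT %(limit)s"
    )
    keyword_sql = (
        "SELECT id FROM chunks "
        "WHERE session_id = %(session_id)s "          # same filter here
        "AND tsv @@ plainto_tsquery('english', %(query_text)s) "
        "ORDER BY ts_rank(tsv, plainto_tsquery('english', %(query_text)s)) DESC "
        "LIMIT %(limit)s"
    )
    return {
        "vector": (vector_sql, {"session_id": session_id,
                                "embedding": vector_literal, "limit": limit}),
        "keyword": (keyword_sql, {"session_id": session_id,
                                  "query_text": query_text, "limit": limit}),
    }
```

Because each query is scoped on its own, the fusion step only ever receives ids that belong to the current session.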

Hybrid search, not just vectors

pgvector HNSW + tsvector GIN, merged via RRF. Catches both semantic similarity and exact-term matches.
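Reciprocal Rank Fusion (RRF) scores each chunk by summing 1 / (k + rank) over every result list it appears in. A minimal sketch, assuming the commonly used constant k = 60 (the value used by the app is not stated) and hypothetical chunk ids:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked id lists into one list sorted by RRF score, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c1", "c2", "c3"]   # nearest neighbours by embedding
keyword_hits = ["c3", "c1", "c4"]  # best full-text matches
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c2', 'c4']
```

Chunks found by both searches (c1, c3) rise above chunks found by only one, and only ranks are compared, so the two searches do not need comparable raw scores.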

Local embeddings

FastEmbed runs on the server in ONNX. No OpenAI cost for ingestion, full data control.

9 distinct voices

Choose from coral, alloy, echo, fable, onyx, nova, sage, shimmer, and verse to fit the use case.

Stack

Built with

Next.js 16 · React 19 · FastAPI · PostgreSQL 17 · pgvector · FastEmbed (ONNX) · OpenAI Whisper · GPT-4.1-mini · GPT-4o-mini-TTS · OpenAI Agents SDK · Server-Sent Events · Web Audio API

Ready to talk to your documents?

Upload a PDF, pick a voice, ask anything. The demo runs on a 5-minute session.

Open the app