Sub-1.5s first audible word
Audio starts streaming before the LLM has finished writing — perceived latency drops to near-zero.
Upload PDFs, ask out loud, hear cited answers streamed back as natural speech. Full voice loop in production — mic capture, Whisper STT, hybrid retrieval, low-latency TTS.
Showcase mode · 5 queries / 3 docs / 5-min session TTL
Every voice query travels this exact path. No black box.
Document is split into semantic chunks via LangChain text splitters.
FastEmbed (BAAI/bge-small-en-v1.5) runs on the server — no external embedding API.
The question is embedded, and pgvector similarity results are fused with tsvector full-text results.
Processor Agent (GPT-4.1-mini) writes a grounded answer with citations.
GPT-4o-mini-TTS streams PCM audio over SSE into the Web Audio API.
FastEmbed local · pgvector + tsvector · OpenAI Agents SDK · SSE audio streaming
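SSE is a text-only protocol, so raw PCM bytes have to be encoded before they go on the wire. Below is a minimal sketch of that last step, assuming base64-in-JSON framing. The event names `audio` and `done`, the `seq` and `pcm` fields, and the `pcm_to_sse` helper are all hypothetical; the app's real wire format is not documented here.

```python
import base64
import json
from typing import Iterable, Iterator


def pcm_to_sse(chunks: Iterable[bytes]) -> Iterator[str]:
    """Wrap raw PCM chunks in Server-Sent Events frames (assumed framing)."""
    for seq, chunk in enumerate(chunks):
        # SSE carries UTF-8 text only, so the binary audio is base64-encoded.
        payload = json.dumps(
            {"seq": seq, "pcm": base64.b64encode(chunk).decode("ascii")}
        )
        # A blank line terminates each SSE event.
        yield f"event: audio\ndata: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client can stop listening.
    yield "event: done\ndata: {}\n\n"


# Two tiny fake PCM chunks stand in for the TTS output stream.
frames = list(pcm_to_sse([b"\x00\x01", b"\x02\x03"]))
```

On the browser side, each `audio` event would be base64-decoded and queued into the Web Audio API as it arrives, which is what lets playback begin before the full answer exists.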
Every chunk row carries a session_id. Both the vector query and the keyword query filter on it in their WHERE clause before fusion, so one session never sees another session's chunks.
pgvector HNSW + tsvector GIN, merged via Reciprocal Rank Fusion (RRF). Catches both semantic similarity and exact-term matches.
FastEmbed runs on the server with ONNX Runtime. No OpenAI embedding cost at ingestion, and full control over the data.
Coral, alloy, echo, fable, onyx, nova, sage, shimmer, verse — pick a voice that fits the use case.
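The session filter and rank fusion described above can be sketched roughly as follows. This is an illustration under assumptions, not the app's actual code: the table and column names (`chunks`, `embedding`, `tsv`), the cosine-distance operator, and the RRF constant `k=60` (a commonly used default) are not stated in the text. Only `session_id` is.

```python
from collections import defaultdict

# Hypothetical schema: chunks(id, session_id, embedding vector, tsv tsvector).
# Both queries filter on session_id BEFORE any results are fused.
# <=> is pgvector's cosine-distance operator (the metric is an assumption).
VECTOR_SQL = """
SELECT id FROM chunks
WHERE session_id = %(session_id)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""

KEYWORD_SQL = """
SELECT id FROM chunks
WHERE session_id = %(session_id)s
  AND tsv @@ plainto_tsquery('english', %(question)s)
ORDER BY ts_rank(tsv, plainto_tsquery('english', %(question)s)) DESC
LIMIT %(k)s
"""


def rrf_merge(rankings, k=60, top_n=5):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


# Example ranked ids, as the two session-scoped queries might return them.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]
merged = rrf_merge([vector_hits, keyword_hits])
print(merged)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both lists (`c1`, `c3`) rise to the top, while a chunk found by only one retriever still makes the cut. That is how the fusion catches both semantic and exact-term matches.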
Upload a PDF, pick a voice, ask anything. The demo runs on a 5-minute session.
Open the app