Voice AI
Semantic Search for Voice Agents
Your voice agent needs context in under 10 milliseconds. Network round-trips to cloud vector databases add 300-900ms of dead air. Moss runs retrieval locally, inside your agent runtime, so your agent responds without the pause.
<10ms p99 retrieval latency
0ms network round-trips
100x faster than cloud vector DBs
The Problem
Dead air kills voice agents
Every voice agent follows the same loop: listen, retrieve context, generate a response, speak. Traditional RAG pipelines send retrieval queries over the network to a cloud vector database. That round-trip adds 300-900ms of latency. In a voice conversation, that delay is silence. Users notice gaps as short as 200ms. By the time your agent retrieves context from Pinecone or Qdrant, the conversation already feels broken. The LLM is not the bottleneck. Retrieval is.
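To make the arithmetic concrete, here is a minimal sketch that compares the retrieval figures quoted above against the 200ms threshold at which users start to notice a gap. The function name and scenario labels are illustrative assumptions, not part of Moss; only the 200ms, 300-900ms, and 10ms figures come from the text.

```python
# Threshold taken from the text: users notice gaps as short as 200ms.
NOTICEABLE_GAP_MS = 200


def added_silence_is_noticeable(retrieval_ms: float,
                                threshold_ms: float = NOTICEABLE_GAP_MS) -> bool:
    """Return True when retrieval alone adds a pause users are likely to notice."""
    return retrieval_ms >= threshold_ms


# Retrieval latencies quoted in the text (labels are illustrative).
scenarios = {
    "cloud vector DB, low end": 300,
    "cloud vector DB, high end": 900,
    "local retrieval, upper bound": 10,
}

for name, ms in scenarios.items():
    verdict = "noticeable" if added_silence_is_noticeable(ms) else "not noticeable"
    print(f"{name}: {ms}ms -> {verdict}")
```

Even the low end of the cloud range exceeds the threshold before ASR, the LLM, or TTS have contributed anything, which is why retrieval is the step to move off the network.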
Solution
How Moss Solves This
Local retrieval, zero network hops
Moss runs search inside your agent runtime. No network round-trip to a cloud database. Retrieval completes in under 10ms, locally.
Built for streaming conversation
Designed for the real-time loop of ASR, retrieval, LLM, and TTS. Moss fits into the critical path without adding perceptible latency.
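The loop of ASR, retrieval, LLM, and TTS can be sketched as a chain of awaited stages with per-stage timing, which shows where retrieval sits on the critical path. Every stage below is a hypothetical stub with placeholder data, not a Moss or vendor API; in a real agent the `retrieve` stub would be replaced by your retrieval call.

```python
import asyncio
import time
from typing import Awaitable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


async def transcribe(audio: bytes) -> str:
    """Hypothetical ASR stub: returns a fixed transcript."""
    return "what is your refund policy"


async def retrieve(query: str) -> List[str]:
    """Hypothetical in-process retrieval stub using a keyword match."""
    knowledge_base = {"refund": "Placeholder refund policy text."}
    return [text for key, text in knowledge_base.items() if key in query]


async def generate(query: str, context: List[str]) -> str:
    """Hypothetical LLM stub: reports how much context it received."""
    return f"Answer based on {len(context)} document(s)."


async def speak(text: str) -> bytes:
    """Hypothetical TTS stub: encodes the reply as bytes."""
    return text.encode("utf-8")


async def handle_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run one conversational turn and record each stage's latency in ms."""
    timings: Dict[str, float] = {}

    async def timed(name: str, awaitable: Awaitable[T]) -> T:
        start = time.perf_counter()
        result = await awaitable
        timings[name] = (time.perf_counter() - start) * 1000.0
        return result

    # Each stage waits on the previous one, so every delay adds to the pause.
    transcript = await timed("asr", transcribe(audio))
    context = await timed("retrieval", retrieve(transcript))
    reply = await timed("llm", generate(transcript, context))
    speech = await timed("tts", speak(reply))
    return speech, timings


if __name__ == "__main__":
    _, stage_timings = asyncio.run(handle_turn(b""))
    for stage, ms in stage_timings.items():
        print(f"{stage}: {ms:.3f}ms")
```

Because the stages run in sequence, the time spent in `retrieval` is added directly to the silence the user hears before the agent starts speaking.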
Works with every voice stack
Drop-in integration with LiveKit, Pipecat, VAPI, ElevenLabs, and Hume AI. Install the SDK and start querying in minutes.
from moss import MossClient, QueryOptions

# PROJECT_ID and PROJECT_KEY are your Moss project credentials
client = MossClient(PROJECT_ID, PROJECT_KEY)

# The await calls below must run inside an async function,
# such as your agent's turn handler

# Create an index with your knowledge base
await client.create_index("voice-kb", knowledge_base_docs)

# Load the index into memory for sub-10ms queries
await client.load_index("voice-kb")

# In your voice agent's retrieval step
results = await client.query(
    "voice-kb", transcript_text,
    QueryOptions(top_k=3, alpha=0.8)
)
# results.docs returned in <10ms, with no network hop
FAQ