How does Moss achieve sub-10ms retrieval for voice agents?

Moss loads a compact index into memory inside your agent runtime. Queries execute locally as function calls with zero network overhead. End-to-end latency including embedding inference is 3.1ms at p50 and 5.4ms at p99, benchmarked on 100,000 documents.
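Percentile figures like these can be checked against your own workload with a small timing harness. The sketch below is a generic illustration, not Moss's benchmark code: `measure_latency_ms` and the dictionary-backed `toy_index` are hypothetical stand-ins for any in-process search function.

```python
import statistics
import time


def measure_latency_ms(search_fn, queries, warmup=10):
    """Time each call to search_fn and return (p50, p99) in milliseconds."""
    for q in queries[:warmup]:
        search_fn(q)  # warm caches before timing
    samples = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99
    cuts = statistics.quantiles(samples, n=100)
    return cuts[49], cuts[98]


# Hypothetical stand-in for an in-memory index: a dict lookup, not Moss itself.
toy_index = {f"q{i}": [f"doc{i}"] for i in range(1000)}
queries = list(toy_index)

p50, p99 = measure_latency_ms(toy_index.get, queries)
print(f"p50={p50:.4f}ms p99={p99:.4f}ms")
```

To time a real retrieval path, replace `toy_index.get` with your own query function and use transcripts that are representative of production traffic.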

Which voice agent frameworks does Moss integrate with?

Moss has dedicated integration packages: pipecat-moss for Pipecat (pipeline processor), vapi-moss for VAPI (Custom Knowledge Base webhook), and elevenlabs-moss for ElevenLabs (client tool). LiveKit integrates via the core Python SDK with a context injection pattern. Install the relevant package and follow the integration guide.
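The page does not show what the context injection pattern looks like in code. As a rough sketch of the general idea, retrieved passages are appended to the system prompt before each LLM call. The helper `inject_context` and its `max_chars` budget are hypothetical and use no Moss or LiveKit API.

```python
def inject_context(system_prompt, retrieved_docs, max_chars=2000):
    """Append retrieved passages to the system prompt, within a size budget."""
    lines = []
    used = 0
    for i, text in enumerate(retrieved_docs, start=1):
        entry = f"[{i}] {text.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before the context grows past the budget
        lines.append(entry)
        used += len(entry)
    if not lines:
        return system_prompt  # nothing retrieved: leave the prompt unchanged
    return system_prompt + "\n\nRelevant context:\n" + "\n".join(lines)


docs = ["Store hours are 9am to 5pm.", "Returns are accepted within 30 days."]
prompt = inject_context("You are a helpful voice assistant.", docs)
print(prompt)
```

Capping the injected text keeps the prompt small, which matters in a voice loop because longer prompts tend to slow down LLM generation.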

Can Moss handle real-time conversation workloads?

Yes. Moss is designed for the real-time loop of voice agents: ASR transcription, context retrieval, LLM generation, and TTS. Retrieval adds less than 10ms to each turn, which is imperceptible in conversation.
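To see how much each stage contributes to a turn, you can wrap the loop with per-stage timers. The sketch below uses hypothetical stub functions (`transcribe`, `retrieve`, `generate`, `synthesize`) in place of real ASR, retrieval, LLM, and TTS calls. It illustrates the shape of the loop and is not code from any Moss integration.

```python
import asyncio
import time


async def transcribe(audio):  # stand-in for ASR
    return audio.decode("utf-8")


async def retrieve(transcript):  # stand-in for local retrieval
    return [f"context for: {transcript}"]


async def generate(transcript, context):  # stand-in for the LLM
    return f"answer to '{transcript}' using {len(context)} passage(s)"


async def synthesize(text):  # stand-in for TTS
    return text.encode("utf-8")


async def run_turn(audio):
    """Run one conversational turn and record per-stage wall time in ms."""
    timings = {}

    async def timed(name, coro):
        start = time.perf_counter()
        result = await coro
        timings[name] = (time.perf_counter() - start) * 1000.0
        return result

    transcript = await timed("asr", transcribe(audio))
    context = await timed("retrieval", retrieve(transcript))
    reply = await timed("llm", generate(transcript, context))
    speech = await timed("tts", synthesize(reply))
    return speech, timings


speech, timings = asyncio.run(run_turn(b"what are your hours"))
print(list(timings))
```

With real stages plugged in, the `timings` dictionary shows whether retrieval stays within the sub-10ms budget on your hardware.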

Does Moss work offline for voice agents?

Yes. Once an index is loaded, all queries run entirely in-memory on the device with no network required. This makes Moss ideal for voice agents that need to work in low-connectivity environments or on-device.

How does Moss compare to Pinecone or Qdrant for voice agent retrieval?

Moss delivers 3.1ms p50 latency versus 432ms for Pinecone and 597ms for Qdrant. The difference is architectural: Moss runs locally inside your runtime, while Pinecone and Qdrant require network round-trips to cloud infrastructure.
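The ratio between these figures can be worked out directly. The snippet below uses only the three numbers quoted above. The answer gives a percentile only for the Moss figure, so the comparison assumes the Pinecone and Qdrant numbers are measured the same way.

```python
# Latencies in milliseconds, as quoted in the answer above.
LATENCY_MS = {"Moss": 3.1, "Pinecone": 432.0, "Qdrant": 597.0}


def slowdown_vs(baseline, other, latencies=LATENCY_MS):
    """Return how many times slower `other` is than `baseline`."""
    return latencies[other] / latencies[baseline]


for name in ("Pinecone", "Qdrant"):
    print(f"{name}: {slowdown_vs('Moss', name):.0f}x the latency of Moss")
```

Both ratios come out above 100.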

Voice AI

Semantic Search for Voice Agents

Your voice agent needs context in under 10 milliseconds. Network round-trips to cloud vector databases add 300-900ms of dead air. Moss runs retrieval locally, inside your agent runtime, so your agent responds without the pause.

Get Started →

Talk to an Engineer

<10ms p99 retrieval latency

0ms network round-trips

100x faster than cloud vector DBs

The Bottom Line

The Problem

Dead air kills voice agents

Every voice agent follows the same loop: listen, retrieve context, generate a response, speak. Traditional RAG pipelines send retrieval queries over the network to a cloud vector database. That round-trip adds 300-900ms of latency. In a voice conversation, that delay is silence. Users notice gaps as short as 200ms. By the time your agent retrieves context from Pinecone or Qdrant, the conversation already feels broken. The LLM is not the bottleneck. Retrieval is.
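The arithmetic behind this claim is simple enough to write down. The sketch below takes the figures from the paragraph above (a 200ms perceptible gap and a 300-900ms cloud round-trip), plus the 10ms upper bound this page gives for local retrieval, and expresses each as a share of the perceptible-gap budget. The constant and function names are illustrative.

```python
PERCEPTIBLE_GAP_MS = 200         # gap users notice, per the paragraph above
CLOUD_RETRIEVAL_MS = (300, 900)  # cloud round-trip range, per the paragraph above
LOCAL_RETRIEVAL_MS = 10          # upper bound this page gives for local retrieval


def share_of_budget(latency_ms, budget_ms=PERCEPTIBLE_GAP_MS):
    """Return retrieval latency as a fraction of the perceptible-gap budget."""
    return latency_ms / budget_ms


cloud_best = share_of_budget(CLOUD_RETRIEVAL_MS[0])
cloud_worst = share_of_budget(CLOUD_RETRIEVAL_MS[1])
local = share_of_budget(LOCAL_RETRIEVAL_MS)

print(f"cloud: {cloud_best:.0%} to {cloud_worst:.0%} of the budget")
print(f"local: at most {local:.0%} of the budget")
```

Under these figures, cloud retrieval alone takes 150% to 450% of the perceptible gap, while local retrieval takes at most 5%, leaving the rest for ASR, LLM generation, and TTS.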

Solution

How Moss Solves This

Local retrieval, zero network hops

Moss runs search inside your agent runtime. No network round-trip to a cloud database. Retrieval completes in under 10ms, locally.

Built for streaming conversation

Designed for the real-time loop of ASR, retrieval, LLM, and TTS. Moss fits into the critical path without adding perceptible latency.

Works with every voice stack

Drop-in integration with LiveKit, Pipecat, VAPI, ElevenLabs, and Hume AI. Install the SDK and start querying in minutes.

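import asyncio

from moss import MossClient, QueryOptions

# PROJECT_ID, PROJECT_KEY, knowledge_base_docs, and transcript_text are
# placeholders that you supply.

async def main():
    client = MossClient(PROJECT_ID, PROJECT_KEY)

    # Create an index with your knowledge base
    await client.create_index("voice-kb", knowledge_base_docs)

    # Load the index into memory for sub-10ms queries
    await client.load_index("voice-kb")

    # In your voice agent's retrieval step
    results = await client.query(
        "voice-kb", transcript_text,
        QueryOptions(top_k=3, alpha=0.8)
    )
    # results.docs returned in <10ms, with no network hop
    return results

# `await` is only valid inside a coroutine, so run the steps via asyncio.
asyncio.run(main())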

FAQ

Frequently asked questions

Ship Real-Time Retrieval in Minutes

Join 1,000+ teams building the future of conversational AI. Get started for free or talk to our founders.

No credit card required

5-minute setup

Deploy in production today
