Benchmarks
Vector Search Latency, Measured
Head-to-head p50, p95, and p99 query latency for Moss against the leading cloud and self-hosted vector databases.
The Bottom Line
Moss returns semantic search results in 3.1ms at p50 — roughly 193× faster than Qdrant (597.6ms) and 100-200× faster than the other systems tested. The gap widens at p95 and p99, where the other systems' tail latencies fall between roughly 420ms and 935ms while Moss stays under 5.4ms.
Results
Query latency (lower is better)
| System | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|
| Moss | 3.1 | 4.3 | 5.4 |
| ChromaDB | 351.8 | 423.5 | 538.5 |
| Qdrant | 597.6 | 682.0 | 771.4 |
| Pinecone | 432.6 | 732.1 | 934.2 |
Methodology
How these numbers were measured
Each system was queried with the same 1,000-query workload on an FAQ-style corpus, using the same embedding model and top-k=5. Indexes were warmed before measurement to exclude cold-start effects from all reported percentiles.
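The harness described above can be sketched as follows. This is a minimal illustration, not the actual benchmark code: `search_fn` is a hypothetical stand-in for each system's query call, and the warm-up size is an assumed value.

```python
import time

def percentile(samples, pct):
    """Nearest-rank percentile over a list of latency samples."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, max(0, round(pct / 100 * len(ranked)) - 1))
    return ranked[idx]

def run_benchmark(search_fn, queries, warmup=100, top_k=5):
    """Warm the index with an unmeasured pass, then time every query
    end-to-end and report p50/p95/p99 in milliseconds."""
    for q in queries[:warmup]:              # warm-up pass, not measured
        search_fn(q, top_k)
    latencies_ms = []
    for q in queries:                       # measured pass: full workload
        start = time.perf_counter()
        search_fn(q, top_k)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
```

Running the same `queries` list through each system's `search_fn` keeps the workload identical across systems, as the methodology requires.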
Cloud systems (Pinecone) were queried from a client in the same AWS region as the managed index to minimize network overhead. Self-hosted systems (ChromaDB, Qdrant) ran on the same hardware as the Moss process. Moss ran locally in-process against a fully loaded index — the configuration a production conversational agent would ship.
Latency is measured end-to-end from the caller’s query invocation to the return of ranked results, including any embedding generation that happens inside the runtime. It reflects what the application actually experiences, not just the raw similarity-search step.
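The end-to-end measurement boundary can be made concrete with a small sketch. Assumptions: `embed_fn` and `index_search` are hypothetical placeholders for the runtime's embedding and similarity-search steps, not real APIs.

```python
import time

def timed_query(embed_fn, index_search, text, top_k=5):
    """Measure latency from the caller's perspective: the timed region
    covers embedding generation AND similarity search, so the number
    reflects what the application actually experiences."""
    start = time.perf_counter()
    vector = embed_fn(text)                    # embedding counts toward latency
    results = index_search(vector, top_k)      # similarity search + ranking counts
    elapsed_ms = (time.perf_counter() - start) * 1000
    return results, elapsed_ms
```

Timing only the `index_search` call would understate what a production caller sees, which is why the boundary sits around both steps.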
FAQ