Benchmarks
Vector Search Latency, Measured
Head-to-head p50, p95, and p99 query latency for Moss against the leading cloud and self-hosted vector databases.
The Bottom Line
Moss returns semantic search results in 3.1ms at p50 — roughly 193× faster than Qdrant (597.6ms) and 100-200× faster than the other systems tested. The gap widens at p95 and p99, where the other systems' tail latencies fall between roughly 420ms and 935ms while Moss stays under 5.4ms.
Results
Query latency (lower is better)
| System | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|
| Moss | 3.1 | 4.3 | 5.4 |
| ChromaDB | 351.8 | 423.5 | 538.5 |
| Qdrant | 597.6 | 682.0 | 771.4 |
| Pinecone | 432.6 | 732.1 | 934.2 |
Methodology
How these numbers were measured
Each system was queried with the same 1,000-query workload on an FAQ-style corpus, using the same embedding model and top-k=5. Indexes were warmed before measurement to exclude cold-start effects from all reported percentiles.
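The harness described above can be sketched as follows. This is a minimal illustration, not the actual benchmark code: `search_fn` is a hypothetical stand-in for each system's query call, and the warm-up size is an assumed value.

```python
import time

def percentile(samples, pct):
    """Nearest-rank percentile over a list of latency samples."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, max(0, round(pct / 100 * len(ranked)) - 1))
    return ranked[idx]

def run_benchmark(search_fn, queries, warmup=100, top_k=5):
    """Warm the index with an unmeasured pass, then time every query
    end-to-end and report p50/p95/p99 in milliseconds."""
    for q in queries[:warmup]:              # warm-up pass, not measured
        search_fn(q, top_k)
    latencies_ms = []
    for q in queries:                       # measured pass: full workload
        start = time.perf_counter()
        search_fn(q, top_k)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
```

Running the same `queries` list through each system's `search_fn` keeps the workload identical across systems, as the methodology requires.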
Cloud systems (Pinecone) were queried from a client in the same AWS region as the managed index to minimize network overhead. Self-hosted systems (ChromaDB, Qdrant) ran on the same hardware as the Moss process. Moss ran locally in-process against a fully loaded index — the configuration a production conversational agent would ship.
Latency is measured end-to-end from the caller’s query invocation to the return of ranked results, including any embedding generation that happens inside the runtime. It reflects what the application actually experiences, not just the raw similarity-search step.
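The end-to-end measurement boundary can be made concrete with a small sketch. Assumptions: `embed_fn` and `index_search` are hypothetical placeholders for the runtime's embedding and similarity-search steps, not real APIs.

```python
import time

def timed_query(embed_fn, index_search, text, top_k=5):
    """Measure latency from the caller's perspective: the timed region
    covers embedding generation AND similarity search, so the number
    reflects what the application actually experiences."""
    start = time.perf_counter()
    vector = embed_fn(text)                    # embedding counts toward latency
    results = index_search(vector, top_k)      # similarity search + ranking counts
    elapsed_ms = (time.perf_counter() - start) * 1000
    return results, elapsed_ms
```

Timing only the `index_search` call would understate what a production caller sees, which is why the boundary sits around both steps.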
FAQ