Question 1

Why is Moss faster than vector databases?

Accepted Answer

Moss runs retrieval inside the same process as your agent, so there are no network hops, no remote vector store, and no embedding API calls on the critical path. Hosted vector databases such as Pinecone or Qdrant Cloud typically require a network round trip for every query, which can add 100–500 ms of latency before inference begins. Because Moss never leaves your runtime, it can return results in under 10 ms.
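To make "in-process retrieval" concrete, here is a minimal sketch of a search index that lives in the same Python process as the caller. It is not Moss's implementation or API: the `InProcessIndex` class, the brute-force cosine scoring, and the three-dimensional toy vectors are all assumptions chosen for illustration. The point is only that a query is a plain function call over memory, with no socket involved.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InProcessIndex:
    """Hypothetical in-memory index (illustration only, not the Moss API)."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs held in local memory

    def add(self, doc_id, vector):
        self._items.append((doc_id, vector))

    def query(self, vector, top_k=3):
        # Brute-force scan: score every stored vector, keep the best top_k.
        scored = [(cosine(vector, v), doc_id) for doc_id, v in self._items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


index = InProcessIndex()
index.add("refund-policy", [0.9, 0.1, 0.0])
index.add("shipping-times", [0.1, 0.9, 0.0])
index.add("warranty", [0.7, 0.2, 0.1])

# The query is an ordinary function call; nothing leaves the process.
results = index.query([1.0, 0.0, 0.0], top_k=2)
print(results)
```

A production index would use an approximate nearest-neighbor structure rather than a linear scan, but the latency argument is the same: the work happens in local memory.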

Question 2

Do I still need Pinecone or other vector databases?

Accepted Answer

No, not for retrieval on the critical path. Moss replaces the vector database for those workloads. You still keep your LLM, voice stack, and existing application code. The only change is where retrieval happens.

Teams moving from Pinecone, Qdrant, Chroma, or Weaviate to Moss typically do so to reduce latency, simplify infrastructure, and lower cost at the same time.

Question 3

Can Moss run fully on device or at the edge?

Accepted Answer

Yes. Moss compiles to WebAssembly and can run in the browser, on mobile and desktop devices, on edge runtimes, or in the cloud. Once an index is loaded, queries execute entirely in memory on the device with no required network connection. Indexes automatically sync whenever connectivity is available.

Question 4

How does Moss reduce latency in production systems?

Accepted Answer

Moss reduces latency by eliminating the retrieval network hop. In a typical voice AI pipeline, retrieval can add 100–500 ms of perceived response time. By running search directly inside the agent runtime, Moss cuts retrieval latency to under 10 ms. That allows the downstream LLM and TTS stages to start sooner, so responses arrive faster and conversations feel real-time.
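The effect on a voice pipeline can be sketched as simple arithmetic. The retrieval figures below (100–500 ms remote, 10 ms in-process) come from the answer above; the LLM and TTS figures are placeholder assumptions, not measurements, and the model treats the stages as strictly sequential.

```python
def time_to_first_audio(retrieval_ms, llm_first_token_ms, tts_first_audio_ms):
    """Sum sequential stage latencies (assumes no overlap between stages)."""
    return retrieval_ms + llm_first_token_ms + tts_first_audio_ms


# Assumed, illustrative values for the stages Moss does not change.
LLM_FIRST_TOKEN_MS = 300
TTS_FIRST_AUDIO_MS = 150

remote_best = time_to_first_audio(100, LLM_FIRST_TOKEN_MS, TTS_FIRST_AUDIO_MS)
remote_worst = time_to_first_audio(500, LLM_FIRST_TOKEN_MS, TTS_FIRST_AUDIO_MS)
in_process = time_to_first_audio(10, LLM_FIRST_TOKEN_MS, TTS_FIRST_AUDIO_MS)

# Milliseconds saved versus the best and worst remote cases.
saved_ms = (remote_best - in_process, remote_worst - in_process)
print(remote_best, remote_worst, in_process, saved_ms)
```

Under these assumptions the saving equals the removed retrieval time, so it matters most when retrieval is a large share of the total budget.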

Question 5

How does Moss handle data privacy and security?

Accepted Answer

Moss is designed with privacy built in. By default, data stays on the device or within your own infrastructure, without requiring a round trip to a third-party vector store. User queries are not stored in a centralized system. This architecture is well suited for regulated industries like healthcare and finance, and supports SOC 2 and HIPAA compliance paths for enterprise deployments.

Question 6

What does Moss replace in my current stack?

Accepted Answer

Moss replaces the vector database, the embedding API call on the hot path, and the custom retrieval glue typically required between them. You still keep your existing LLM provider, voice stack (LiveKit, Vapi, ElevenLabs), LLM framework (LangChain, DSPy), and frontend. Moss simply plugs in as the retrieval layer wherever your agent is running.

Question 7

How does Moss scale for high-volume voice or AI workloads?

Accepted Answer

Because Moss runs directly in process, scaling retrieval works the same way as scaling the rest of your application. Each agent instance includes its own local search runtime, allowing query throughput to scale linearly with your deployment. There’s no shared vector database to become a bottleneck, no per-query cloud retrieval cost, and no rate limits at the retrieval layer.
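As a rough illustration of the scaling claim, the sketch below compares aggregate query capacity when each instance carries its own index against a setup where all instances share one store with a fixed ceiling. The per-instance rate and the shared-store limit are made-up numbers, and both function names are hypothetical.

```python
def in_process_capacity(instances, per_instance_qps):
    """Each instance serves its own queries, so capacity grows with instance count."""
    return instances * per_instance_qps


def shared_store_capacity(instances, per_instance_qps, store_limit_qps):
    """Demand grows with instance count, but the shared store caps what is served."""
    return min(instances * per_instance_qps, store_limit_qps)


PER_INSTANCE_QPS = 200   # assumed queries per second one agent instance issues
STORE_LIMIT_QPS = 5000   # assumed ceiling of a single shared vector store

rows = [
    (n,
     in_process_capacity(n, PER_INSTANCE_QPS),
     shared_store_capacity(n, PER_INSTANCE_QPS, STORE_LIMIT_QPS))
    for n in (1, 10, 100)
]
for n, local, shared in rows:
    print(f"{n:>3} instances: in-process {local} qps, shared store {shared} qps")
```

The two columns match until demand reaches the shared limit, after which only the in-process figure keeps growing.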

Question 8

How do I get started and test latency?

Accepted Answer

Try the live latency demo on this page to see real-time end-to-end retrieval performance. To get started, install the SDK with `npm install @moss-dev/moss` or `pip install moss`, create a project key in the Moss Portal, and add a few lines of code to your agent. Most teams can get end-to-end retrieval latency under 10 ms within minutes.
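To check the latency figure in your own environment, a small timing harness is enough. The sketch below uses only the Python standard library and deliberately does not call the Moss SDK, whose API is not shown here: `fake_retrieve` is a hypothetical stand-in that you would replace with your real retrieval call.

```python
import statistics
import time


def measure_latency_ms(fn, runs=200, warmup=20):
    """Call fn repeatedly and report p50, p95, and max latency in milliseconds."""
    for _ in range(warmup):  # warm caches before timing
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[int(0.95 * (len(samples) - 1))],
        "max": samples[-1],
    }


# Hypothetical stand-in for a retrieval call; swap in your own query here.
DOCS = {f"doc-{i}": f"text {i}" for i in range(10_000)}


def fake_retrieve():
    return [key for key in DOCS if key.endswith("42")][:5]


stats = measure_latency_ms(fake_retrieve)
print({name: round(value, 3) for name, value in stats.items()})
```

Reporting percentiles rather than a single average makes tail latency visible, which is usually what users notice in a live conversation.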

Your Voice AI breaks when retrieval is slow.

Built for real-time AI systems at scale

Why Moss is fundamentally different

Replace your vector database

Run search where your AI runs

Retrieve in <10 ms

Ship real-time retrieval in minutes

Your bottleneck is not your model. It is retrieval.

Built for modern AI stacks

Voice AI

LLM frameworks

Frontend AI

Built for real-time AI applications

Voice AI and Copilots

Eliminate latency from your AI stack
