Voice Platform
Moss + Pipecat: Real-Time Search for Voice Pipelines
Pipecat is an open-source framework for building voice and multimodal agents. The pipecat-moss package provides MossRetrievalService, a pipeline processor that queries Moss on every user turn and injects context before the LLM generates a response. No tool calling needed. Sub-10ms retrieval keeps conversations flowing without dead air.
Benefits
Why Use Moss with Pipecat
- MossRetrievalService is a native Pipecat pipeline processor: insert it between the STT and LLM stages
- Context injection pattern: search runs automatically on every user utterance, with no LLM tool-calling overhead
- Sub-10ms retrieval eliminates dead air in the STT -> retrieval -> LLM -> TTS pipeline
- Works with Deepgram STT, Cartesia TTS, OpenAI, Anthropic, and other Pipecat plugins
- Pre-load indexes at pipeline startup with load_index() for the fastest first-query latency
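The context-injection pattern above can be sketched in plain Python. This is an illustration only: retrieve() and build_prompt() are hypothetical names, and the toy keyword-overlap scoring stands in for Moss's actual hybrid search.

```python
# Illustrative sketch of the context-injection pattern: retrieval runs on
# every user turn and the results are prepended to the LLM prompt, so the
# model never has to decide to call a tool. All names here are hypothetical.

def retrieve(index: dict[str, str], utterance: str, top_k: int = 2) -> list[str]:
    """Toy keyword-overlap retrieval standing in for a Moss index query."""
    words = set(utterance.lower().split())
    scored = [
        (len(words & set(text.lower().split())), text)
        for text in index.values()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

def build_prompt(index: dict[str, str], utterance: str) -> str:
    """Inject retrieved context ahead of the user's turn."""
    context = retrieve(index, utterance)
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{context_block}\n\nUser: {utterance}"

index = {
    "doc1": "Returns are accepted within 30 days of purchase.",
    "doc2": "Shipping takes 3-5 business days.",
}
print(build_prompt(index, "how long do returns take"))
```

Because the retrieval step runs unconditionally, latency stays bounded by the search itself rather than by an extra LLM round-trip deciding whether to call a tool.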
Integration
Quick Start
import asyncio

from moss import MossClient
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat_moss import MossRetrievalService

async def main():
    # Initialize the Moss retrieval service
    client = MossClient("your-project-id", "your-project-key")
    moss_service = MossRetrievalService(client)

    # Pre-load the index at startup for the fastest first query
    await moss_service.load_index("knowledge-base")

    # Insert into the Pipecat pipeline (transport, stt, llm, and tts
    # are configured elsewhere)
    pipeline = Pipeline([
        transport.input(),
        stt,                    # Deepgram STT
        moss_service.query(     # Moss retrieval (sub-10ms)
            index_name="knowledge-base",
            top_k=5,
            alpha=0.8,
        ),
        llm,                    # OpenAI LLM
        tts,                    # Cartesia TTS
        transport.output(),
    ])

    task = PipelineTask(pipeline)
    await task.run()

asyncio.run(main())

Setup
Get Started in 3 Steps
Install pipecat-moss
Run pip install pipecat-moss to install the Pipecat pipeline processor for Moss.
Create and load your index
Use the Moss SDK to create_index() with your knowledge base documents, then call moss_service.load_index() at pipeline startup.
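Before calling create_index(), long documents are usually split into retrieval-sized pieces so each indexed entry fits a context window. A generic word-based chunker (not part of the Moss SDK; sizes here are placeholders) might look like:

```python
# Illustrative pre-processing before create_index(): split long documents
# into overlapping word-based chunks. The overlap keeps sentences that
# straddle a boundary retrievable from either side.

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words with `overlap` words shared."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```

Each chunk then becomes one document passed to create_index(), so a single retrieved result carries a coherent, self-contained span of text.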
Insert into your pipeline
Add moss_service.query() between your STT and LLM processors. Moss retrieves context on every user turn and injects it into the LLM prompt automatically.
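The top_k parameter in the Quick Start controls how many results are injected per turn. Assuming alpha is a hybrid-search weight blending vector and keyword scores (an assumption worth checking against the Moss docs), its effect can be sketched as:

```python
# Hedged sketch of hybrid-score blending. Whether Moss's `alpha` works this
# way is an assumption; the formula below is the common hybrid-search
# convention (alpha = 1.0 -> pure vector similarity, alpha = 0.0 -> pure
# keyword matching).

def hybrid_score(dense: float, sparse: float, alpha: float = 0.8) -> float:
    return alpha * dense + (1 - alpha) * sparse

# At alpha = 0.8, a document that matches semantically but not lexically
# still ranks well:
hybrid_score(0.9, 0.1)
```

A high alpha like the 0.8 in the Quick Start favors semantic matches, which suits conversational speech where users rarely repeat a document's exact wording.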