Pipecat integration
This guide shows how to use Pipecat as the LLM pipeline inside a Speech Engine brain server. Speech Engine handles the voice loop — speech-to-text, turn-taking, and text-to-speech — while Pipecat handles text generation through a composable pipeline of processors (LLM calls, RAG, function calls, guardrails, content filters).
This guide is Python only because Pipecat is a Python framework on the server side. There is no
Node equivalent for the pipeline processors; a pipecat-client-js package exists, but it is a
browser client that talks to a Pipecat server, not a way to build pipelines in TypeScript.
Architecture
The Speech Engine SDK runs as the outer layer — its on_transcript callback fires every time the user finishes speaking. Inside the callback, you build a Pipecat pipeline, feed the conversation history in as an LLMContextFrame, and stream the pipeline’s text output back to Speech Engine. ElevenLabs converts the text to speech and plays it to the user.
The Pipecat pipeline runs only for the duration of one turn. When a new transcript arrives, the previous pipeline is cancelled before the next one runs — this is how Speech Engine’s interruption handling propagates into the pipeline.
When to use this pattern
Pipecat shines when your brain needs more than a single LLM call:
- Composable processors for retrieval-augmented generation, function calls, or guardrails
- Frame-based middleware that can inspect, transform, or block traffic at every step
- Reusable pipeline fragments shared across multiple agents
If your brain is “transcript in, LLM call out”, the Speech Engine quickstart is simpler. Reach for Pipecat when the pipeline itself is the interesting part.
Prerequisites
- A Speech Engine. Follow the Speech Engine quickstart to create one.
- Python 3.10+ (required by pipecat-ai).
- A public HTTPS tunnel for the brain server (e.g. ngrok).
Install dependencies
pipecat-ai[openai] pulls in the OpenAI LLM service. Swap the extra for another provider (anthropic, google, etc.) if you prefer.
Build the Pipecat brain
The brain has two pieces: a TextSink processor that drains streamed text into an asyncio.Queue, and a run_pipecat_brain coroutine that builds a one-turn pipeline and yields chunks as an async iterator.
The pipeline contains only the LLM service and the sink — no STT or TTS processors, because Speech Engine handles those. LLMContextFrame is the input; LLMTextFrame chunks are the output.
run_pipecat_brain is an async generator. Each yielded chunk goes straight to Speech Engine, so the agent starts speaking before the full response is ready.
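The queue-draining shape described above can be sketched with the standard library alone. This is not the Pipecat code itself: `fake_pipeline` is a hypothetical stand-in for the pipeline plus TextSink, `stream_turn` stands in for run_pipecat_brain, and using `None` as an end-of-turn sentinel is an assumption of this sketch.

```python
import asyncio
from typing import AsyncIterator, Optional


async def fake_pipeline(queue: asyncio.Queue, chunks: list[str]) -> None:
    """Hypothetical stand-in for the Pipecat pipeline and its TextSink."""
    for chunk in chunks:
        await queue.put(chunk)   # the sink would enqueue each text chunk
        await asyncio.sleep(0)   # yield control, as a real LLM stream would
    await queue.put(None)        # sentinel (assumed): the turn is complete


async def stream_turn(chunks: list[str]) -> AsyncIterator[str]:
    """Yield chunks as they arrive instead of waiting for the full response."""
    queue: asyncio.Queue[Optional[str]] = asyncio.Queue()
    producer = asyncio.create_task(fake_pipeline(queue, chunks))
    try:
        while True:
            chunk = await queue.get()
            if chunk is None:    # sentinel seen: stop iterating
                break
            yield chunk
    finally:
        # Runs on normal completion and when the consumer is cancelled,
        # so the producer never outlives the turn.
        producer.cancel()
        await asyncio.gather(producer, return_exceptions=True)
```

Because the consumer receives each chunk as soon as it is enqueued, the first words can be spoken while later chunks are still being generated.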
Wire it into the Speech Engine server
The Speech Engine SDK’s send_response accepts a string or any async iterable of strings, so you can pass run_pipecat_brain(transcript) directly. Convert the ConversationMessage objects Speech Engine provides into plain dicts before passing them to the brain.
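As a rough illustration of the conversion step, the sketch below maps message objects to the role/content dicts that chat-style LLM APIs typically expect. The `role` and `content` attribute names are assumptions, and `ConversationMessageStub` is a hypothetical stand-in for the SDK's ConversationMessage; check the SDK reference for the actual fields.

```python
from dataclasses import dataclass


@dataclass
class ConversationMessageStub:
    """Hypothetical stand-in; the real ConversationMessage may differ."""
    role: str
    content: str


def to_llm_messages(history) -> list[dict]:
    """Convert message objects into plain dicts, skipping empty messages."""
    return [
        {"role": message.role, "content": message.content}
        for message in history
        if message.content  # assumed: empty turns add nothing to the context
    ]
```

Keeping the conversion in one small function means only this helper changes if the SDK's message shape turns out to be different.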
The Speech Engine SDK cancels the previous turn’s task when a new transcript arrives, which cancels the async generator and the underlying PipelineTask via the try/finally block in run_pipecat_brain.
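The propagation path can be demonstrated without either SDK. In this sketch, `slow_brain` is a hypothetical stand-in for run_pipecat_brain, and `TurnRunner` imitates what the text says the Speech Engine SDK does (cancel the previous turn's task before starting the next); neither name comes from a real library.

```python
import asyncio
from typing import AsyncIterator, Optional

events: list[str] = []  # records cleanup order for the demo


async def slow_brain(turn: str) -> AsyncIterator[str]:
    """Stand-in brain: streams chunks and cleans up in finally."""
    try:
        for i in range(100):
            await asyncio.sleep(0.01)
            yield f"{turn}-chunk-{i}"
    finally:
        # In the real brain this is where the PipelineTask would be cancelled.
        events.append(f"cleanup:{turn}")


async def speak(turn: str) -> None:
    async for _chunk in slow_brain(turn):
        pass  # a real server would forward each chunk to Speech Engine


class TurnRunner:
    """Hypothetical helper that keeps at most one turn task alive."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    async def stop(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
            # Wait until the cancelled task has run its cleanup.
            await asyncio.gather(self._task, return_exceptions=True)

    async def on_transcript(self, turn: str) -> None:
        await self.stop()  # interrupt the previous turn first
        self._task = asyncio.create_task(speak(turn))


async def demo() -> list[str]:
    events.clear()
    runner = TurnRunner()
    await runner.on_transcript("first")
    await asyncio.sleep(0.03)             # let the first turn start streaming
    await runner.on_transcript("second")  # cancels "first" mid-stream
    await asyncio.sleep(0.03)
    await runner.stop()
    return list(events)
```

Cancelling the task raises `CancelledError` at the `await` inside the generator, so the `finally` block runs before the next turn starts.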
Run the server
Connect to the Speech Engine from a browser using the same token endpoint and client code shown in the quickstart. The Pipecat pipeline runs server-side; the browser sees a normal Speech Engine conversation.
Extend the pipeline
A text-only Pipecat pipeline can include any frame processor that operates on LLMTextFrame or LLMContextFrame. A few common additions:
- Guardrails: a FrameProcessor placed before the LLM that inspects LLMContextFrame and replaces or blocks unsafe context.
- Function calls: register tools on the OpenAILLMService and Pipecat handles tool-call frames natively. The final assistant text still arrives as LLMTextFrame.
- Multi-stage reasoning: chain two OpenAILLMService instances, with a custom processor in between that rewrites the context for the second pass.
- Output filtering: a FrameProcessor placed after the LLM that inspects each LLMTextFrame and drops or rewrites disallowed content before it reaches TextSink.
The pipeline shape stays the same — Pipeline([processor_a, llm, processor_b, sink]) — and run_pipecat_brain does not change.
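To make the output-filtering idea concrete, here is a sketch of the text logic such a processor might wrap. It deliberately avoids the Pipecat API: `filter_chunk`, `filtered`, the blocklist, and the `<internal>` drop marker are all hypothetical, and a real processor would apply the same function to the text of each LLMTextFrame.

```python
import re
from typing import AsyncIterator, Optional

# Hypothetical policy: redact these words, drop chunks carrying this marker.
_DISALLOWED = re.compile(r"\b(password|secret)\b", re.IGNORECASE)
_DROP_MARKER = "<internal>"


def filter_chunk(chunk: str) -> Optional[str]:
    """Return the rewritten chunk, or None if it should be dropped."""
    if _DROP_MARKER in chunk:
        return None
    return _DISALLOWED.sub("[redacted]", chunk)


async def filtered(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Apply filter_chunk to a stream, skipping dropped chunks."""
    async for chunk in chunks:
        cleaned = filter_chunk(chunk)
        if cleaned is not None:
            yield cleaned
```

One limitation of filtering chunk by chunk: a disallowed word that is split across two chunks will not match, so stricter policies may need to buffer to word or sentence boundaries first.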
Production considerations
- Cancellation safety: PipelineTask.cancel() can deadlock if called before the pipeline has fully started (pipecat-ai/pipecat#4276). The try/finally pattern above is safe because cancel() runs only after at least one frame has been queued.
- Prompt injection: speech-to-text output is user input. Validate or normalize the transcript before feeding it to the LLM, especially if any downstream processor uses the text in tool calls or database queries.
- Brain server authentication: set a shared secret on the Speech Engine and check it in the brain server to prevent unauthorized connections to your /ws endpoint.
- LLM provider: pipecat-ai[openai] includes OpenAILLMService. For Anthropic, install pipecat-ai[anthropic] and use AnthropicLLMService; the rest of the pipeline is unchanged.
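Two of the considerations above, transcript normalization and the shared-secret check, can be sketched with the standard library. Both helpers are hypothetical: how the secret actually reaches the brain server (header, query parameter, or handshake field) depends on the Speech Engine SDK and is not assumed here, and the normalization rules and length cap are illustrative defaults rather than recommendations from the SDK.

```python
import hmac
import unicodedata
from typing import Optional


def is_authorized(presented: Optional[str], expected: str) -> bool:
    """Compare a presented secret to the expected one in constant time."""
    if not presented or not expected:
        return False  # reject missing secrets rather than comparing empties
    return hmac.compare_digest(presented.encode(), expected.encode())


def normalize_transcript(text: str, max_chars: int = 2000) -> str:
    """Normalize STT output before it is placed in the LLM context."""
    text = unicodedata.normalize("NFKC", text)   # fold compatibility forms
    text = " ".join(text.split())                # collapse all whitespace runs
    text = "".join(ch for ch in text if ch.isprintable())  # drop control chars
    return text[:max_chars]                      # cap length (assumed limit)
```

`hmac.compare_digest` avoids leaking how many leading characters matched through timing differences, which a plain `==` comparison does not guarantee.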
Next steps
Build the brain server and browser client end-to-end.
Use Speech Engine as the voice layer for a LiveKit room.
Classes, methods, and events for the Speech Engine Python SDK.
Classes, methods, and events for the Speech Engine JavaScript SDK.