LiveKit integration | ElevenLabs Documentation

This guide shows how to use ElevenLabs Speech Engine as the voice layer for a LiveKit room. A LiveKit Agents worker joins the room as a participant, subscribes to the user’s audio track, opens a WebSocket to Speech Engine, and publishes Speech Engine’s synthesized audio back to the room as its own track.

Architecture

Speech Engine accepts two kinds of WebSocket connections:

The brain WebSocket that the ElevenLabs API connects to. Your server runs this with the Speech Engine SDK (engine.serve() / engine.attach()) and receives transcripts to respond to.
The conversation WebSocket that clients connect to. Browsers connect via a WebRTC token; non-browser clients (like a LiveKit Agents worker) connect via a signed URL and stream raw PCM audio in both directions.

The LiveKit worker uses the second connection. It acts as a “client” of Speech Engine on behalf of the participants in the LiveKit room.

The brain server is unchanged from the Speech Engine quickstart — the LiveKit worker replaces the browser as the audio source but the LLM logic stays the same.

When to use this pattern

Reach for the LiveKit bridge when the room itself is part of the experience:

Multi-participant sessions where users speak with the agent alongside each other
Existing LiveKit deployments where switching transports would break clients
Voice agents sharing a room with screen share, video, or text chat
SIP-to-LiveKit dispatched calls that need an AI agent on the line

If you only need a browser-to-Speech-Engine voice loop with no other participants, the WebRTC client in the Speech Engine quickstart is simpler — Speech Engine speaks WebRTC directly to the browser, no LiveKit room required.

Prerequisites

A LiveKit project (either LiveKit Cloud or a self-hosted server). The worker needs LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET.
An ElevenLabs Speech Engine. Follow the Speech Engine quickstart to create one and run the brain server.
Python 3.9+ or Node.js 18+.

The Node bridge worker uses @livekit/rtc-node, which is currently in Developer Preview. For production deployments, prefer the Python worker.

Configure Speech Engine audio formats

LiveKit’s AudioStream resamples incoming Opus tracks to whatever PCM sample rate you request, so you can match Speech Engine’s input directly. Update the Speech Engine to accept 16 kHz PCM for ASR input and emit 24 kHz PCM for TTS output.

import asyncio
import os

from elevenlabs import AsyncElevenLabs

elevenlabs = AsyncElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])


async def update_engine():
    # Match the rates the LiveKit worker requests: 16 kHz in, 24 kHz out.
    await elevenlabs.speech_engine.update(
        speech_engine_id="seng_8k3m9xr4hjnfg983brhmhkd98n6",
        asr={"user_input_audio_format": "pcm_16000"},
        tts={"agent_output_audio_format": "pcm_24000"},
    )


asyncio.run(update_engine())

Speech Engine PCM is signed 16-bit little-endian throughout. See the audio format reference for other supported rates.
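Because the PCM is 16-bit mono, byte counts follow directly from the sample rate. The sketch below is illustrative only: the helper names and the 20 ms frame duration are assumptions chosen for the example, not part of the Speech Engine or LiveKit APIs.

```python
import struct

BYTES_PER_SAMPLE = 2  # signed 16-bit PCM, one channel


def frame_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes in one mono 16-bit PCM frame of the given duration."""
    return sample_rate * duration_ms // 1000 * BYTES_PER_SAMPLE


def split_frames(pcm: bytes, sample_rate: int, duration_ms: int = 20):
    """Yield fixed-duration frames; a shorter final frame is yielded as-is."""
    size = frame_bytes(sample_rate, duration_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def decode_samples(pcm: bytes) -> tuple:
    """Unpack little-endian signed 16-bit samples, ignoring a trailing odd byte."""
    count = len(pcm) // BYTES_PER_SAMPLE
    return struct.unpack(f"<{count}h", pcm[:count * BYTES_PER_SAMPLE])
```

With these helpers, a 20 ms frame is 640 bytes at 16 kHz and 960 bytes at 24 kHz.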

Build the bridge worker

The worker is a long-running process that connects to your LiveKit server, waits for jobs, joins assigned rooms, and bridges audio between the room and Speech Engine.

Install dependencies

$ pip install "livekit-agents" "livekit-api" "elevenlabs" "aiohttp" "python-dotenv"

Mint a Speech Engine signed URL

The worker requests a short-lived signed URL for the Speech Engine conversation WebSocket. The signed URL embeds the engine ID and a one-time signature, so the worker can open the WebSocket without exposing your API key.

import os

from elevenlabs import AsyncElevenLabs

elevenlabs = AsyncElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])


async def signed_url() -> str:
    # The signed URL is short-lived, so request a fresh one per session.
    response = await elevenlabs.conversational_ai.conversations.get_signed_url(
        agent_id=os.environ["SPEECH_ENGINE_ID"],
    )
    return response.signed_url

Define the worker entrypoint

Each time the worker is dispatched to a room, its entrypoint runs. The entrypoint connects to the room, opens a Speech Engine conversation WebSocket, and starts two audio bridges: one for caller audio going to Speech Engine, and one for synthesized audio coming back.

import asyncio
import base64
import json
import os

import aiohttp
from dotenv import load_dotenv
from elevenlabs import AsyncElevenLabs
from livekit import agents, rtc
from livekit.agents import JobContext, WorkerOptions, cli

load_dotenv()

elevenlabs = AsyncElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
SPEECH_ENGINE_ID = os.environ["SPEECH_ENGINE_ID"]

USER_INPUT_RATE = 16000
AGENT_OUTPUT_RATE = 24000


async def signed_url() -> str:
    response = await elevenlabs.conversational_ai.conversations.get_signed_url(
        agent_id=SPEECH_ENGINE_ID,
    )
    return response.signed_url


async def entrypoint(ctx: JobContext):
    # Resolved once the Speech Engine WebSocket is open.
    el_ws_ready: asyncio.Future[aiohttp.ClientWebSocketResponse] = (
        asyncio.get_running_loop().create_future()
    )
    # Strong references to per-track pump tasks so they are not garbage-collected.
    user_tasks = set()

    async def pump_user_audio(track: rtc.Track):
        el_ws = await el_ws_ready
        stream = rtc.AudioStream(
            track, sample_rate=USER_INPUT_RATE, num_channels=1,
        )
        async for event in stream:
            payload = base64.b64encode(bytes(event.frame.data)).decode()
            await el_ws.send_str(json.dumps({"user_audio_chunk": payload}))

    # Register the subscriber BEFORE ctx.connect() so we don't miss tracks
    # that get auto-subscribed during the connection handshake.
    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        if track.kind != rtc.TrackKind.KIND_AUDIO:
            return
        if participant.identity == ctx.room.local_participant.identity:
            return
        task = asyncio.create_task(pump_user_audio(track))
        user_tasks.add(task)
        task.add_done_callback(user_tasks.discard)

    await ctx.connect()

    # Publish a track for the agent's synthesized audio.
    source = rtc.AudioSource(sample_rate=AGENT_OUTPUT_RATE, num_channels=1)
    track = rtc.LocalAudioTrack.create_audio_track("elevenlabs-agent", source)
    await ctx.room.local_participant.publish_track(
        track,
        rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_MICROPHONE),
    )

    # Open the Speech Engine conversation WebSocket.
    http = aiohttp.ClientSession()
    el_ws = await http.ws_connect(await signed_url())
    await el_ws.send_str(json.dumps({"type": "conversation_initiation_client_data"}))
    el_ws_ready.set_result(el_ws)

    async def el_to_room():
        async for msg in el_ws:
            if msg.type != aiohttp.WSMsgType.TEXT:
                continue
            event = json.loads(msg.data)
            etype = event.get("type")
            if etype == "audio":
                pcm = base64.b64decode(event["audio_event"]["audio_base_64"])
                samples_per_channel = len(pcm) // 2  # 2 bytes per 16-bit sample
                frame = rtc.AudioFrame(
                    pcm, AGENT_OUTPUT_RATE, 1, samples_per_channel,
                )
                await source.capture_frame(frame)
            elif etype == "interruption":
                # Drop queued audio so the agent stops speaking immediately.
                source.clear_queue()
            elif etype == "ping":
                event_id = event.get("ping_event", {}).get("event_id")
                await el_ws.send_str(json.dumps({
                    "type": "pong", "event_id": event_id,
                }))

    pump_task = asyncio.create_task(el_to_room())

    async def cleanup():
        pump_task.cancel()
        for task in list(user_tasks):
            task.cancel()
        await el_ws.close()
        await http.close()

    ctx.add_shutdown_callback(cleanup)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(
        entrypoint_fnc=entrypoint,
        agent_name="elevenlabs-bridge",
    ))

The worker filters out its own published audio in the track_subscribed handler by comparing against the local participant’s identity. Without this check, the worker would try to send its own synthesized audio back to Speech Engine.
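The message shapes the worker exchanges with Speech Engine can be isolated into small pure functions, which makes them easy to unit-test without a live WebSocket. This is a sketch based only on the fields the worker above reads and writes; the helper names are hypothetical, and the real protocol may carry additional fields.

```python
import base64
import json
from typing import Optional, Tuple


def encode_user_audio(pcm: bytes) -> str:
    """Wrap raw PCM from the room as a user_audio_chunk message."""
    return json.dumps({"user_audio_chunk": base64.b64encode(pcm).decode("ascii")})


def decode_agent_audio(message: str) -> Optional[Tuple[bytes, int]]:
    """Return (pcm, samples_per_channel) for an audio event, else None."""
    event = json.loads(message)
    if event.get("type") != "audio":
        return None
    pcm = base64.b64decode(event["audio_event"]["audio_base_64"])
    return pcm, len(pcm) // 2  # 16-bit mono: two bytes per sample


def pong_for(message: str) -> Optional[str]:
    """Build the pong reply for a ping event, else None."""
    event = json.loads(message)
    if event.get("type") != "ping":
        return None
    event_id = event.get("ping_event", {}).get("event_id")
    return json.dumps({"type": "pong", "event_id": event_id})
```

Keeping the JSON handling separate from the socket loop also means a malformed message can be rejected in one place.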

Two ordering details matter for correctness:

Listener timing: the track_subscribed listener (TrackSubscribed in TypeScript) is registered before ctx.connect(). LiveKit auto-subscribes to existing tracks during the connection handshake, and a listener registered afterwards may miss the event. The audio pump awaits a Future (a Promise in TypeScript) that resolves to the Speech Engine WebSocket, so the handler can pick up the track immediately and start forwarding audio as soon as the connection is open.
Capture serialization (TypeScript only): @livekit/rtc-node’s AudioSource.captureFrame throws InvalidState if called concurrently. The TypeScript handler serializes captures with a promise chain. In Python, the single async for loop in el_to_room is naturally sequential and does not need this.
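The listener-timing pattern can be reproduced with plain asyncio, independent of LiveKit. In this minimal sketch the string "fake-ws" stands in for the WebSocket and the `pump` coroutine stands in for the per-track audio pump; both are placeholders for illustration.

```python
import asyncio


async def demo() -> list:
    loop = asyncio.get_running_loop()
    ws_ready: asyncio.Future = loop.create_future()
    sent = []

    async def pump(track_name: str):
        ws = await ws_ready  # parks here until the connection exists
        sent.append((ws, track_name))

    # The "track subscribed" event fires before the socket is open.
    pump_task = asyncio.create_task(pump("mic"))
    await asyncio.sleep(0)  # let the pump start and block on the future
    assert sent == []       # nothing forwarded yet

    ws_ready.set_result("fake-ws")  # the connection is now open
    await pump_task
    return sent
```

The pump starts as soon as the track arrives but forwards nothing until the future resolves, so no subscription event is lost and no audio is sent to a socket that does not exist yet.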

Start the worker

$ python bridge.py dev

dev enables hot reload and colored logs. Use start in production for JSON logs and graceful shutdown.

The worker connects to your LiveKit server and waits for job assignments. It does not join any rooms until it is dispatched.

Dispatch the worker to a room

Because the worker has an agent_name, it uses explicit dispatch — it only joins rooms when your backend tells it to. The simplest pattern is to include a RoomAgentDispatch in the LiveKit access token that the browser uses to connect.

import os

from dotenv import load_dotenv
from flask import Flask, jsonify, request
from livekit.api import (
    AccessToken,
    RoomAgentDispatch,
    RoomConfiguration,
    VideoGrants,
)

load_dotenv()

app = Flask(__name__)


@app.route("/api/livekit-token")
def get_token():
    room_name = request.args.get("room", "demo-room")
    identity = request.args.get("identity", "web-user")

    token = (
        AccessToken(
            os.environ["LIVEKIT_API_KEY"],
            os.environ["LIVEKIT_API_SECRET"],
        )
        .with_identity(identity)
        .with_grants(VideoGrants(room_join=True, room=room_name))
        # Ask LiveKit to dispatch the named agent when this participant joins.
        .with_room_config(
            RoomConfiguration(
                agents=[RoomAgentDispatch(agent_name="elevenlabs-bridge")],
            ),
        )
    )

    return jsonify(token=token.to_jwt(), url=os.environ["LIVEKIT_URL"])


if __name__ == "__main__":
    app.run(port=3002)

When a browser uses this token to create or join a room, LiveKit dispatches the bridge worker into the same room automatically.

Connect from the browser

The browser only needs the standard LiveKit client — it does not interact with Speech Engine directly.

App.tsx

import { Room, RoomEvent, Track } from "livekit-client";
import { useCallback, useState } from "react";

export default function App() {
  const [room] = useState(() => new Room());

  const join = useCallback(async () => {
    const response = await fetch("/api/livekit-token");
    const { token, url } = await response.json();

    // Attach the agent's audio track as soon as it is subscribed.
    room.on(RoomEvent.TrackSubscribed, (track) => {
      if (track.kind === Track.Kind.Audio) {
        document.body.appendChild(track.attach());
      }
    });

    await room.connect(url, token);
    await room.localParticipant.setMicrophoneEnabled(true);
  }, [room]);

  return <button onClick={join}>Start conversation</button>;
}

When the button is clicked, the browser fetches a LiveKit token, joins the room with the microphone enabled, and starts receiving the agent’s audio track. The worker is dispatched, opens its Speech Engine session, and bridges audio in both directions.

Audio format reference

Speech Engine supports the following audio formats. Configure them on the engine via asr.user_input_audio_format and tts.agent_output_audio_format.

Format	Sample rate	Encoding	Notes
`pcm_8000`	8 kHz	Signed 16-bit LE PCM	ASR input only.
`pcm_16000`	16 kHz	Signed 16-bit LE PCM	Recommended for LiveKit user input.
`pcm_22050`	22.05 kHz	Signed 16-bit LE PCM
`pcm_24000`	24 kHz	Signed 16-bit LE PCM	Recommended for LiveKit agent output.
`pcm_44100`	44.1 kHz	Signed 16-bit LE PCM	TTS output requires Independent Publisher tier or above.
`pcm_48000`	48 kHz	Signed 16-bit LE PCM	ASR input only.
`ulaw_8000`	8 kHz	μ-law	Used by Twilio Media Streams.

AudioStream and AudioSource in LiveKit handle resampling for you — you can request any sample rate from AudioStream and the SDK converts from the underlying 48 kHz Opus track.
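The format names in the table encode both the encoding and the sample rate, so a worker can derive its rate constants from the configured format instead of hard-coding them. A small sketch, assuming only the naming pattern shown in the table; `parse_format` and `bytes_per_second` are hypothetical helpers, not SDK functions.

```python
def parse_format(name: str) -> tuple:
    """Split e.g. 'pcm_16000' into (sample_rate_hz, bytes_per_sample)."""
    encoding, _, rate = name.partition("_")
    if encoding not in ("pcm", "ulaw") or not rate.isdigit():
        raise ValueError(f"unrecognized audio format: {name!r}")
    # 16-bit PCM uses two bytes per sample; mu-law uses one.
    bytes_per_sample = 2 if encoding == "pcm" else 1
    return int(rate), bytes_per_sample


def bytes_per_second(name: str) -> int:
    """Mono data rate for the given format."""
    rate, width = parse_format(name)
    return rate * width
```

For the recommended pair, this gives 32,000 bytes per second of user input at pcm_16000 and 48,000 bytes per second of agent output at pcm_24000.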

Production considerations

Explicit dispatch: Always set agent_name / agentName on WorkerOptions. Auto-dispatch fires the worker for every room created on your LiveKit project, which is rarely what you want.
Brain server authentication: Set a shared secret on the Speech Engine and verify it in your brain server, so only the Speech Engine can reach your endpoint:
```
await elevenlabs.speech_engine.update(
    speech_engine_id="seng_8k3m9xr4hjnfg983brhmhkd98n6",
    speech_engine={"request_headers": {"x-api-key": os.environ["SHARED_SECRET"]}},
)
```
The brain server then checks request.headers["x-api-key"] before accepting the WebSocket upgrade.
Token server: Mint LiveKit and Speech Engine tokens server-side. Never expose LIVEKIT_API_SECRET or ELEVENLABS_API_KEY to the browser.
Event loop hygiene: Keep CPU-bound work off the worker’s event loop. AudioSource.capture_frame and AudioStream iteration are time-sensitive; long synchronous calls will delay or drop interruption events. Use asyncio.to_thread() (Python) or worker_threads (Node) for blocking work.
Shutdown: Register ctx.add_shutdown_callback / ctx.addShutdownCallback to close the ElevenLabs WebSocket cleanly. By default, the room (and the job) is terminated when the last non-agent participant leaves.
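The event loop hygiene point can be illustrated with asyncio.to_thread (Python 3.9+). In this sketch, `blocking_work` is a hypothetical stand-in for any synchronous call, and `audio_loop` stands in for the time-sensitive audio handling; neither is part of the bridge worker itself.

```python
import asyncio
import threading
import time


def blocking_work(n: int) -> tuple:
    """Stand-in for blocking or CPU-heavy work."""
    time.sleep(0.05)
    return n * 2, threading.get_ident()


async def main() -> tuple:
    ticks = 0

    async def audio_loop():
        # Simulates the audio path that must keep running.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    loop_task = asyncio.create_task(audio_loop())
    # Runs in a worker thread, so the event loop stays free for audio_loop.
    result, worker_thread = await asyncio.to_thread(blocking_work, 21)
    loop_task.cancel()
    return result, worker_thread, ticks
```

Calling `blocking_work(21)` directly inside `main` would stall `audio_loop` for the full duration of the call; routing it through asyncio.to_thread lets the loop keep ticking.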

Next steps

Speech Engine quickstart

Build the brain server that responds to transcripts.

Pipecat integration

Use Pipecat as the LLM pipeline behind Speech Engine.

Python SDK reference

Classes, methods, and events for the Speech Engine Python SDK.

JavaScript SDK reference

Classes, methods, and events for the Speech Engine JavaScript SDK.
