Turn any Python function into a real-time audio and video stream over WebRTC or WebSockets.
```bash
pip install fastrtc
```
To use the built-in pause detection (see ReplyOnPause) and text-to-speech (see Text To Speech), install the `vad` and `tts` extras:

```bash
pip install "fastrtc[vad, tts]"
```
- 🗣️ Automatic Voice Detection and Turn Taking built-in; only worry about the logic for responding to the user.
- 💻 Automatic UI - Use the `.ui.launch()` method to launch the WebRTC-enabled built-in Gradio UI.
- 🔌 Automatic WebRTC Support - Use the `.mount(app)` method to mount the stream on a FastAPI app and get a WebRTC endpoint for your own frontend!
- ⚡️ WebSocket Support - Use the `.mount(app)` method to mount the stream on a FastAPI app and get a WebSocket endpoint for your own frontend!
- 📞 Automatic Telephone Support - Use the `fastphone()` method of the stream to launch the application and get a free temporary phone number!
- 🤖 Completely customizable backend - A `Stream` can easily be mounted on a FastAPI app, so you can extend it to fit your production application. See the Talk To Claude demo for an example of how to serve a custom JS frontend.
See the Cookbook for examples of how to use the library.
| Demo | Video |
|------|-------|
| Stream both your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation! | gemini-audio-video-first.mp4 |
| Talk to Gemini in real time using Google's voice API. | gemini-live-chat.mp4 |
| Talk to ChatGPT in real time using OpenAI's voice API. | openai-live-chat.mp4 |
| Say "computer" before asking your question! | 2025-02-20_00-05-11.mp4 |
| Create and edit HTML pages with just your voice! Powered by SambaNova Systems. | llama-code-editor.mp4 |
| Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude. | talk-to-claude.mp4 |
| Have Whisper transcribe your speech in real time! | whisper-realtime.mp4 |
| Run the YOLOv10 model on a user's webcam stream in real time! | yolov10-stream.mp4 |
| Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations. | talk-to-moshi.mp4 |
| A code editor built with Llama 3.3 70b that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code! | hey-llama-final.mp4 |
This is a shortened version of the official usage guide.
- `.ui.launch()`: Launch a built-in UI for easily testing and sharing your stream. Built with Gradio.
- `.fastphone()`: Get a free temporary phone number to call into your stream. Hugging Face token required.
- `.mount(app)`: Mount the stream on a FastAPI app. Perfect for integrating with your existing production system.
```py
from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]):
    # The function will be passed the audio until the user pauses.
    # Implement any iterator that yields audio.
    # See "LLM Voice Chat" for a more complete example.
    yield audio

stream = Stream(
    handler=ReplyOnPause(echo),
    modality="audio",
    mode="send-receive",
)
```
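To try it out, launch the built-in Gradio UI (the same `.ui.launch()` method covered in the deployment section below):

```py
stream.ui.launch()
```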
```py
from fastrtc import (
    ReplyOnPause, AdditionalOutputs, Stream,
    audio_to_bytes, aggregate_bytes_to_16bit
)
import gradio as gr
import numpy as np
from groq import Groq
import anthropic
from elevenlabs import ElevenLabs

groq_client = Groq()
claude_client = anthropic.Anthropic()
tts_client = ElevenLabs()


# See "Talk to Claude" in Cookbook for an example of how to keep
# track of the chat history.
def response(
    audio: tuple[int, np.ndarray],
):
    # Transcribe the user's speech with Groq's Whisper endpoint.
    prompt = groq_client.audio.transcriptions.create(
        file=("audio-file.mp3", audio_to_bytes(audio)),
        model="whisper-large-v3-turbo",
        response_format="verbose_json",
    ).text
    # Generate a text reply with Claude.
    response = claude_client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    response_text = " ".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
    # Stream the reply back as 24 kHz PCM audio via ElevenLabs TTS.
    iterator = tts_client.text_to_speech.convert_as_stream(
        text=response_text,
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        output_format="pcm_24000",
    )
    for chunk in aggregate_bytes_to_16bit(iterator):
        audio_array = np.frombuffer(chunk, dtype=np.int16).reshape(1, -1)
        yield (24000, audio_array)

stream = Stream(
    modality="audio",
    mode="send-receive",
    handler=ReplyOnPause(response),
)
```
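The example imports `AdditionalOutputs` and `gradio` without using them; they come into play when you want to surface extra data, such as the chat transcript, alongside the audio in the built-in UI. A minimal sketch of that pattern, assuming the handler also yields the transcript; treat the exact wiring as illustrative and see the FastRTC docs on Additional Outputs for the authoritative API:

```py
# Inside the handler, after yielding the audio chunks, emit extra data:
#     yield AdditionalOutputs([{"role": "assistant", "content": response_text}])

stream = Stream(
    modality="audio",
    mode="send-receive",
    handler=ReplyOnPause(response),
    # Gradio component that displays the extra output in the built-in UI.
    additional_outputs=[gr.Chatbot(type="messages")],
    # Called with the previous and newly yielded values; keep the latest here.
    additional_outputs_handler=lambda prev, current: current,
)
```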
```py
from fastrtc import Stream
import numpy as np

def flip_vertically(image):
    # Video handlers map an input frame (a numpy array) to an output frame.
    return np.flip(image, axis=0)

stream = Stream(
    handler=flip_vertically,
    modality="video",
    mode="send-receive",
)
```
```py
from fastrtc import Stream
import gradio as gr
import cv2
from huggingface_hub import hf_hub_download
from .inference import YOLOv10

model_file = hf_hub_download(
    repo_id="onnx-community/yolov10n", filename="onnx/model.onnx"
)

# git clone https://huggingface.co/spaces/fastrtc/object-detection
# for the YOLOv10 implementation
model = YOLOv10(model_file)

def detection(image, conf_threshold=0.3):
    image = cv2.resize(image, (model.input_width, model.input_height))
    new_image = model.detect_objects(image, conf_threshold)
    return cv2.resize(new_image, (500, 500))

stream = Stream(
    handler=detection,
    modality="video",
    mode="send-receive",
    # The slider's current value is passed to the handler as `conf_threshold`.
    additional_inputs=[
        gr.Slider(minimum=0, maximum=1, step=0.01, value=0.3)
    ],
)
```
Run:

```py
stream.ui.launch()
```
Telephone:

```py
stream.fastphone()
```
FastAPI:

```py
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()
stream.mount(app)

# Optional: Add routes
@app.get("/")
async def _():
    return HTMLResponse(content=open("index.html").read())

# uvicorn app:app --host 0.0.0.0 --port 8000
```
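If you prefer starting the server from Python rather than the uvicorn CLI shown in the comment, the standard uvicorn API works too (a small sketch; host and port are arbitrary):

```py
import uvicorn

# Equivalent to: uvicorn app:app --host 0.0.0.0 --port 8000
uvicorn.run(app, host="0.0.0.0", port=8000)
```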