
app.py Gemini/Twilio w/ robust error handling, faster image encoding, and UI status updates #88

Open. Wants to merge 1 commit into base: main.

ahundt commented Feb 26, 2025

Note: this code worked with gradio_webrtc==0.0.28, modulo the previously discussed bugs googleapis/python-genai#380 and aiortc/aiortc#1258. However, it currently crashes with fastrtc==0.0.6 when run locally on an M3 Mac with the following project configuration:

```toml
[project]
name = "gemini-audio-video-chat"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastrtc[vad, tts]==0.0.6",
    "google-genai==0.3.0",
    "twilio",
    "opencv-python",
    "dotenv",
]
```

The console output doesn't show any major errors:

```
athundt@Andrews2024MBP|~/source/gemini-audio-video-chat on ui_improvements!?
± uv run app.py
/Users/athundt/source/gemini-audio-video-chat/.venv/lib/python3.13/site-packages/google_crc32c/__init__.py:29: RuntimeWarning: As the c extension couldn't be imported, `google-crc32c` is using a pure python implementation that is significantly slower. If possible, please configure a c build environment and compile the extension
  warnings.warn(_SLOW_CRC32C_WARNING, RuntimeWarning)
2025-02-26 16:53:48,133 - INFO - Attempting to get Twilio credentials (attempt 1)...
2025-02-26 16:53:48,190 - INFO - -- BEGIN Twilio API Request --
2025-02-26 16:53:48,190 - INFO - POST Request: https://api.twilio.com/2010-04-01/Accounts/ACfa954a3e72949b7b8c02f42beb438966/Tokens.json
2025-02-26 16:53:48,190 - INFO - Headers:
2025-02-26 16:53:48,190 - INFO - Content-Type : application/x-www-form-urlencoded
2025-02-26 16:53:48,190 - INFO - Accept : application/json
2025-02-26 16:53:48,190 - INFO - User-Agent : twilio-python/9.4.6 (Darwin x86_64) Python/3.13.2
2025-02-26 16:53:48,190 - INFO - X-Twilio-Client : python-9.4.6
2025-02-26 16:53:48,190 - INFO - Accept-Charset : utf-8
2025-02-26 16:53:48,190 - INFO - -- END Twilio API Request --
2025-02-26 16:53:48,499 - INFO - Response Status Code: 201
2025-02-26 16:53:48,499 - INFO - Response Headers: {'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '1192', 'Connection': 'keep-alive', 'Date': 'Wed, 26 Feb 2025 21:53:48 GMT', 'Twilio-Concurrent-Requests': '1', 'Twilio-Request-Id': 'RQ16281ff7d4de919554b87046cba1e036', 'Twilio-Request-Duration': '0.049', 'X-Home-Region': 'us1', 'X-API-Domain': 'api.twilio.com', 'Strict-Transport-Security': 'max-age=31536000', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1fecb697c6f121d7ce54a35628ac154e.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P2', 'X-Amz-Cf-Id': '1dr27ZIYkNBQo-G61YyOD_cwC3txTht7xO5zdrFQMw5zbBtR-eGJFA==', 'X-Powered-By': 'AT-5000', 'X-Shenanigans': 'none', 'Vary': 'Origin'}
2025-02-26 16:53:48,499 - INFO - Twilio credentials response: {'iceServers': [{'url': 'stun:global.stun.twilio.com:3478', 'urls': 'stun:global.stun.twilio.com:3478'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:3478?transport=udp', 'urls': 'turn:global.turn.twilio.com:3478?transport=udp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:3478?transport=tcp', 'urls': 'turn:global.turn.twilio.com:3478?transport=tcp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:443?transport=tcp', 'urls': 'turn:global.turn.twilio.com:443?transport=tcp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}], 'iceTransportPolicy': 'relay'}
2025-02-26 16:53:48,499 - INFO - Twilio TURN server available.
2025-02-26 16:53:48,566 - INFO - -- BEGIN Twilio API Request --
2025-02-26 16:53:48,566 - INFO - POST Request: https://api.twilio.com/2010-04-01/Accounts/ACfa954a3e72949b7b8c02f42beb438966/Tokens.json
2025-02-26 16:53:48,566 - INFO - Headers:
2025-02-26 16:53:48,566 - INFO - Content-Type : application/x-www-form-urlencoded
2025-02-26 16:53:48,566 - INFO - Accept : application/json
2025-02-26 16:53:48,566 - INFO - User-Agent : twilio-python/9.4.6 (Darwin x86_64) Python/3.13.2
2025-02-26 16:53:48,566 - INFO - X-Twilio-Client : python-9.4.6
2025-02-26 16:53:48,566 - INFO - Accept-Charset : utf-8
2025-02-26 16:53:48,566 - INFO - -- END Twilio API Request --
2025-02-26 16:53:48,689 - INFO - Response Status Code: 201
2025-02-26 16:53:48,689 - INFO - Response Headers: {'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '1192', 'Connection': 'keep-alive', 'Date': 'Wed, 26 Feb 2025 21:53:48 GMT', 'Twilio-Concurrent-Requests': '1', 'Twilio-Request-Id': 'RQ07d1dc4e5762a2408a3cbdd683b7513b', 'Twilio-Request-Duration': '0.058', 'X-Home-Region': 'us1', 'X-API-Domain': 'api.twilio.com', 'Strict-Transport-Security': 'max-age=31536000', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 7c52bc60e0da5f557ed6047264a41c18.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P2', 'X-Amz-Cf-Id': 'DzRq4ZHXRB2auwP9sAUGH160f8FYLqBIaRBDiLyxi0k8AmWMLymIRQ==', 'X-Powered-By': 'AT-5000', 'X-Shenanigans': 'none', 'Vary': 'Origin'}
* Running on local URL:  http://127.0.0.1:7860
2025-02-26 16:53:48,831 - INFO - HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-02-26 16:53:48,845 - INFO - HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
2025-02-26 16:53:48,855 - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
```

However, the current app.py also fails in the same way with fastrtc==0.0.6 when run locally. The suggested Hugging Face Spaces version, b88286b, failed the same way.

Continuing from this discussion:
https://huggingface.co/spaces/freddyaboulton/gemini-audio-video-chat/discussions/1

See also the bugs previously discussed:
googleapis/python-genai#380 and aiortc/aiortc#1258

This commit improves the Gemini and Twilio integration with a focus on better error handling, UI feedback, connection stability, and faster image encoding.

- **Faster, Robust Image Encoding:** Enhanced `encode_image` with comprehensive input validation (NaN/Inf values, array shape), normalization, and faster OpenCV-based JPEG encoding with error handling.
- **Synchronous Twilio Check (Pre-UI):** Implemented a synchronous Twilio TURN server availability check *before* Gradio initialization to avoid race conditions. The check retries with exponential backoff, so the status is accurate before the UI loads.
- **UI Status Updates:**
    - Added an immediate Twilio status update on UI load.
    - The Gemini connection status is displayed and updated to keep users informed.
- **Robust Gemini Connection:** Improved the Gemini connection logic with more comprehensive error handling and UI feedback on connection failures.
- **Improved Shutdown:** Made the `GeminiHandler.shutdown` method more robust to ensure proper cleanup and prevent lingering issues.
- **API Key Validation:** Added API key validation to improve the user experience.
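The synchronous pre-UI check with exponential backoff can be sketched with the standard library alone. This is a minimal sketch, not the code from this PR. `retry_with_backoff` and its `fetch` argument are hypothetical names. In the app, `fetch` would wrap whichever Twilio call returns the ICE server configuration. Returning `None` after the last attempt lets the caller pick a fallback before Gradio builds the UI.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` until it succeeds, doubling the wait after each failure.

    Hypothetical helper: returns the first successful result, or None if every
    attempt raised, so the caller can decide on a fallback (e.g. no TURN server).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            logging.info("Fetching TURN credentials (attempt %d)...", attempt)
            return fetch()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            logging.warning("Attempt %d failed: %s", attempt, exc)
            if attempt < max_attempts:
                # Delays grow as base_delay * 1, 2, 4, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return None


# Usage: a fake fetch that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"iceServers": []}


delays: list = []
result = retry_with_backoff(flaky_fetch, max_attempts=5, base_delay=0.5, sleep=delays.append)
```

Injecting `sleep` keeps the sketch testable. The usage lines record the delays (0.5 s, then 1.0 s) instead of actually waiting.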

ahundt (Author) commented Feb 26, 2025

Also, with gradio_webrtc==0.0.28, the version that worked best, I got the app to run for a couple of minutes without crashing while on a network with much less packet loss.

freddyaboulton (Owner) commented

The infinite spinner should be fixed now if you install the latest version (0.0.9!)

tanquangduong commented

Thanks @freddyaboulton! With the latest version (0.0.9) the infinite spinner is fixed, but there is a new bug: `fastrtc.utils.WebRTCError: timed out during handshake`

freddyaboulton (Owner) left a comment

Hi @ahundt! Thanks for the contribution, and sorry for the delay in getting this review to you.

Since you started working on the original gradio_webrtc demo, the library has gained some API improvements that should simplify this code. Some are already used in the current gemini_audio_video/app.py file, and I'd like them incorporated into this demo before merging, namely:

- `async def generator()` and `async def receive_audio()` are no longer needed. `async def generator()` becomes `async def startup()`, and there is no need for `receive_audio()` (or `generator()`) anymore.
- Instead of catching `CancelledError` in the `emit` functions, you can use `wait_for_item`. Errors are also propagated to the UI automatically now, so you shouldn't need to catch them and return `None`.
- The demo will not run on Spaces if the Twilio credentials are not set, so I don't think you need the Twilio setup. You also don't need Twilio locally; you can use this pattern to only call it on Spaces.
- To close the connection in `shutdown`, you can do `await self.connection._websocket.close()`. `shutdown` can now be async.
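The `wait_for_item` and async `shutdown` suggestions can be illustrated with a small standard-library sketch. `wait_for_item_sketch`, `SketchHandler`, and `FakeWebSocket` are hypothetical names, not fastrtc's API. The helper only mirrors the idea behind fastrtc's `wait_for_item`, as we understand it: return the next queued item, or `None` after a short timeout. The real signature and default timeout may differ. The real handler would also close `self.connection._websocket` instead of a fake.

```python
import asyncio
from typing import Any, Optional


async def wait_for_item_sketch(queue: asyncio.Queue, timeout: float = 0.1) -> Optional[Any]:
    """Return the next item from `queue`, or None if none arrives within `timeout` seconds."""
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None


class FakeWebSocket:
    """Hypothetical stand-in for the Gemini session's websocket."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class SketchHandler:
    """Hypothetical handler showing the emit/shutdown shape, not fastrtc's base class."""

    def __init__(self) -> None:
        self.audio_queue: asyncio.Queue = asyncio.Queue()
        self.websocket = FakeWebSocket()

    async def emit(self) -> Optional[Any]:
        # No try/except around CancelledError: a timeout just means "nothing to send yet".
        return await wait_for_item_sketch(self.audio_queue)

    async def shutdown(self) -> None:
        # Because shutdown is async, cleanup can be awaited directly.
        await self.websocket.close()


async def demo() -> tuple:
    # Create the handler inside the running loop so its queue uses that loop.
    handler = SketchHandler()
    empty = await handler.emit()  # queue is empty, so None after the timeout
    await handler.audio_queue.put(b"\x00\x01")
    chunk = await handler.emit()  # an item is waiting, so it is returned
    await handler.shutdown()
    return empty, chunk, handler.websocket.closed


empty, chunk, closed = asyncio.run(demo())
```

Returning `None` on timeout lets the caller treat "no data yet" as a normal case, so no exception handling is needed on the hot path.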

Separately, does `update_gemini_status_sync` work? 👀 I'd be surprised if you could update a Gradio component like that from the stream.

Also, can you move the image/audio encoding functions to a separate utils file?
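Moving these helpers into their own module also makes them easy to test in isolation. Below is a minimal sketch of the validation half of `encode_image` as described in the PR. It rests on two assumptions: frames arrive as NumPy arrays, and float frames use the [0.0, 1.0] range. `validate_and_normalize_frame` is a hypothetical name, and the PR's actual normalization may differ.

```python
import numpy as np


def validate_and_normalize_frame(frame: np.ndarray) -> np.ndarray:
    """Validate a video frame and return it as uint8 pixels ready for JPEG encoding.

    Hypothetical helper; assumes float frames use the [0.0, 1.0] range.
    """
    if not isinstance(frame, np.ndarray):
        raise TypeError(f"expected a NumPy array, got {type(frame).__name__}")
    # Accept (H, W) grayscale or (H, W, C) with 1, 3, or 4 channels.
    if frame.ndim not in (2, 3) or (frame.ndim == 3 and frame.shape[2] not in (1, 3, 4)):
        raise ValueError(f"unexpected frame shape {frame.shape}")
    if frame.size == 0:
        raise ValueError("empty frame")
    if frame.dtype == np.uint8:
        return frame  # already in the 0-255 range JPEG encoders expect
    if not np.issubdtype(frame.dtype, np.floating):
        raise TypeError(f"unsupported dtype {frame.dtype}")
    # Reject NaN/Inf before scaling, since they cannot be mapped to pixel values.
    if not np.isfinite(frame).all():
        raise ValueError("frame contains NaN or Inf values")
    # Clip out-of-range values, then scale [0.0, 1.0] to [0, 255].
    return (np.clip(frame, 0.0, 1.0) * 255.0).round().astype(np.uint8)


# Usage: out-of-range values are clipped, and NaN is rejected.
pixels = validate_and_normalize_frame(np.array([[0.0, 0.5], [1.0, 2.0]]))
try:
    validate_and_normalize_frame(np.array([[np.nan, 0.0]]))
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

The resulting `uint8` array can then be passed to OpenCV's `cv2.imencode(".jpg", frame)`. OpenCV assumes BGR channel order, so RGB frames may need a `cv2.cvtColor` conversion first.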

Thank you!
