Question
People commonly ask on the Zoom Developer Forum:
What are the best practices for implementing real-time AI transcription and translation for multilingual Zoom meetings using Zoom APIs? How can I keep latency low while maintaining high accuracy, and what options work best for large-scale meetings in corporate or educational environments?
Answer
You can capture raw audio via the Zoom Meeting SDK, transcribe and translate it with external AI services, and return the results to participants:
- Capture and process audio: Use the Windows or Linux Zoom Meeting SDK to access raw meeting audio in real time, then stream it to a speech-to-text provider that supports streaming recognition (e.g., Google Cloud Speech-to-Text, Amazon Transcribe). Feed the resulting transcript into a translation service for multilingual output.
- Deliver results to participants: Send the translated text (and optionally synthesized audio) back to clients over WebSockets so captions update in real time.
Zoom Developer Forum Examples
Some examples of this question are: