Zoom Developer Forum

What are best practices for implementing AI transcription and translation in Zoom meetings?

Updated at:
September 8, 2025
Written By:
Gerry Saporito

Question

People commonly ask on the Zoom Developer Forum:

What are the best practices for implementing real-time AI transcription and translation for multilingual Zoom meetings using Zoom APIs? How can I keep latency low while maintaining high accuracy, and what options work best for large-scale meetings in corporate or educational environments?

Answer

You can capture raw meeting audio through the Zoom Meeting SDK and pass it to an AI transcription service. The workflow has two steps:

  1. Capture and process audio: Use the Windows or Linux Zoom Meeting SDK to access raw meeting audio in real time, then stream it to a speech-to-text provider that supports streaming recognition (e.g., Google Cloud Speech-to-Text, Amazon Transcribe). Feed the resulting transcript into a translation service to produce multilingual output.

  2. Deliver results to participants: Send translated text (and optionally synthesized audio) back to clients over WebSockets for real-time updates.


Zoom Developer Forum Examples

Some examples of this question are:
