Zoom Developer Forum

How to transcribe a Zoom meeting in real time

Updated at: October 14, 2025
Written By: Aydin Schwartz

Question

People commonly ask on the Zoom Developer Forum:

How can developers access real-time transcription of a Zoom meeting via an API or other means, considering user consent and the need for speaker identification, while also understanding the limitations of existing solutions?

Answer

Zoom does not currently provide a direct API endpoint for accessing a real-time transcript of an ongoing meeting. However, developers can capture the meeting audio in real time through one of several alternative approaches and transcribe it themselves. Each method has its own advantages and disadvantages:

  1. Use Zoom’s RTMP Live-Streaming API

    Pros:
    • No need for third-party services.
    • Lighter weight compared to running a Zoom bot.
    Cons:
    • Must be initiated by the end user for each meeting.
    • Requires setting up and maintaining an RTMP server for receiving the stream.
    • Participants will see a “live” badge, which may be disruptive.
    • Lacks speaker separation, making it difficult to identify who is speaking.
  2. Build a Zoom Bot

    Pros:
    • Can capture separate audio streams for each participant, allowing for accurate speaker diarization and labeling.
    • Does not alert participants as much as other methods.
    Cons:
    • Requires significant infrastructure, including spinning up servers to run the Zoom client for the bot.
    • More expensive to maintain compared to live streaming.
    • Developers must handle the encoding of raw audio and video themselves.
  3. Develop a Desktop Application to Capture System Audio

    Pros:
    • Cost-effective solution for capturing audio.
    Cons:
    • Requires separate applications for different operating systems (Windows, macOS, Linux).
    • Capturing system audio on macOS can be particularly challenging.
    • Running on users’ machines may impact performance and device resources.
    • Lacks speaker separation.
  4. Utilize Recall.ai

    Description: A unified API that deploys meeting bots to capture audio and video in real time across various platforms.
    Pros:
    • Handles server orchestration and provides real-time raw audio and transcripts via a simple API.
    • Supports speaker diarization and works across different meeting platforms, including Zoom.
    • Compatible with any Zoom plan, including Free.
    Cons:
    • Introduces a third-party service into your technology stack.
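
Several of the options above differ mainly in whether they provide speaker separation. As a rough illustration of why separate per-participant audio streams (option 2) make speaker labeling straightforward, the sketch below assumes each participant's stream has already been transcribed into timestamped segments by some speech-to-text service. The Segment type and both function names are hypothetical and are not part of any Zoom SDK.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    speaker: str  # known up front, because each audio stream belongs to one participant
    start: float  # seconds from the start of the meeting
    end: float
    text: str


def merge_transcripts(per_speaker: Dict[str, List[Segment]]) -> List[Segment]:
    """Interleave per-participant segments into one timeline ordered by start time."""
    merged = [seg for segments in per_speaker.values() for seg in segments]
    # Break ties on speaker name so the ordering is deterministic.
    merged.sort(key=lambda seg: (seg.start, seg.speaker))
    return merged


def format_transcript(segments: List[Segment]) -> str:
    """Render one line per segment, prefixed with its start time and speaker."""
    return "\n".join(
        f"[{seg.start:.1f}s] {seg.speaker}: {seg.text}" for seg in segments
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    streams = {
        "Alice": [Segment("Alice", 0.0, 2.0, "Hi"), Segment("Alice", 5.0, 6.0, "Bye")],
        "Bob": [Segment("Bob", 2.5, 4.0, "Hello")],
    }
    print(format_transcript(merge_transcripts(streams)))
```

With a single mixed stream, as in the RTMP and system-audio options, the speaker field would not be known in advance and would have to be inferred by a separate diarization step.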

By considering these alternatives, developers can implement a solution that best fits their needs for real-time transcription during Zoom meetings.
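
For the RTMP route, one possible starting point is to configure and start the live stream through Zoom's Meeting API on behalf of the meeting host. The sketch below only builds the two HTTP requests and does not send them. The endpoint paths and field names are believed to match the Meeting API's livestream endpoints, but they are assumptions here and should be verified against the current Zoom API reference. The RTMP URL, stream key, page URL and access token are placeholders.

```python
import json
import urllib.request

ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id, access_token, stream_url, stream_key, page_url):
    """Build (but do not send) the requests that configure and start a live stream.

    Assumed endpoints, to be checked against Zoom's documentation:
      PATCH /meetings/{meetingId}/livestream         sets the RTMP destination
      PATCH /meetings/{meetingId}/livestream/status  starts or stops the stream
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    configure = urllib.request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream",
        data=json.dumps(
            {"stream_url": stream_url, "stream_key": stream_key, "page_url": page_url}
        ).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    start = urllib.request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream/status",
        # Additional "settings" fields may be required; only the action is shown.
        data=json.dumps({"action": "start"}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    return configure, start


if __name__ == "__main__":
    configure, start = build_livestream_requests(
        meeting_id="123456789",                 # placeholder meeting ID
        access_token="HOST_OAUTH_TOKEN",        # placeholder OAuth token
        stream_url="rtmp://rtmp.example.com/live",
        stream_key="example-key",
        page_url="https://example.com/live",
    )
    for req in (configure, start):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Each request could then be sent with urllib.request.urlopen once the RTMP server at stream_url is ready to receive the stream. Audio arriving there is a single mixed track, so a transcription pipeline built on it would still lack speaker separation.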

