Best Practices

Zoom's real-time transcription API

Amanda Zhu

February 5, 2023

What is the best way to receive a streaming transcript from Zoom?

Unfortunately, Zoom does not provide an API to receive real-time transcription. You must work with the live audio streams from Zoom and pipe them through a third-party transcription provider.

Now, you may be wondering how to access the live audio streams from Zoom. Here are four ways to do this:

Use the Zoom RTMP live-streaming API

Pros:

  • Doesn’t require any third-party services.

Cons:

  • Needs to be initiated on a per-meeting basis.
  • You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor.
  • Audio latency of 5 to 10 seconds.
  • Participants can get spooked by the “live” badge that appears in the meeting (even if it’s a private meeting).
  • No speaker separation.
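
To give a feel for the receiving side, here is a minimal sketch that uses ffmpeg as a single-publisher RTMP listener and turns the incoming stream into raw PCM audio. It assumes ffmpeg is installed with RTMP support. The listen URL, the 16 kHz mono format, and the helper names `build_ffmpeg_command` and `stream_pcm` are illustrative choices, not anything Zoom prescribes. A production setup would still need a real RTMP server plus the scaling and monitoring mentioned above.

```python
import subprocess
from typing import Iterator, List


def build_ffmpeg_command(listen_url: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that accepts one RTMP publisher and emits raw PCM."""
    return [
        "ffmpeg",
        "-listen", "1",           # act as the RTMP server and wait for a publisher
        "-i", listen_url,         # hypothetical URL, e.g. rtmp://0.0.0.0:1935/live/zoom
        "-vn",                    # drop the video track, keep audio only
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the requested rate
        "-f", "s16le",            # raw signed 16-bit little-endian PCM
        "pipe:1",                 # write the audio to stdout
    ]


def stream_pcm(listen_url: str, chunk_bytes: int = 3200) -> Iterator[bytes]:
    """Yield PCM chunks from ffmpeg's stdout until the stream ends.

    3200 bytes is 100 ms of 16 kHz, 16-bit mono audio.
    """
    proc = subprocess.Popen(build_ffmpeg_command(listen_url), stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)
            if not chunk:
                break
            yield chunk
    finally:
        proc.terminate()
        proc.wait()
```

You would point Zoom's custom live-streaming settings at the listen URL, then iterate over `stream_pcm(...)` and forward each chunk to your transcription provider.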

Build a desktop app to capture users’ computer audio

Pros:

  • Cost effective to run.

Cons:

  • You need to build a separate app for Windows, Mac and Linux.
  • It is especially difficult to tap into computer audio on Mac.
  • The app runs on users’ computers, so it can slow them down.
  • No speaker separation.

Build a Zoom bot

Pros:

  • Gives you a separate audio stream per participant, for perfect diarization / speaker labels.

Cons:

  • It is very heavyweight: you need to spin up multiple servers to run the Zoom client for the bot.
  • Running the infrastructure for a Zoom bot costs more than live streaming.
  • You need to encode the raw video and audio yourself.
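
For audio, the simplest version of "encoding it yourself" is wrapping the raw samples in a WAV container so that other tools can read them. The sketch below does this with Python's standard `wave` module. It assumes the bot receives 16-bit mono PCM at 32 kHz; that format and the helper name `pcm_to_wav` are assumptions for illustration, so check what your Zoom client actually delivers. Video is a much bigger job and needs a real encoder.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 32000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)          # header sizes are filled in on close
    return buf.getvalue()
```

Running this once per participant stream gives you one playable file per speaker, which keeps the speaker labels intact for later transcription.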

Use Recall.ai

It’s a unified API that lets you send meeting bots to video conferencing platforms to capture the audio, video, and transcription in real time.

Pros:

  • Handles spinning up the servers and delivering the real-time raw audio and transcript, so all you interact with is a simple API.
  • Works on any Zoom plan (including Free).
  • Gets speaker diarization / speaker labels.
  • Works regardless of meeting platform.
  • Already integrated with transcription providers so you can receive the live transcript directly.

Cons:

  • It’s another third-party service in your stack.

Once you receive the audio stream from Zoom using any of the methods above, you can choose a transcription provider to pipe the audio to. If you use Recall.ai, you’ll be able to skip this step as Recall.ai integrates with transcription providers natively. This saves you the pain of juggling live audio streams.
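
If you do handle the audio yourself, the piping step usually comes down to regrouping whatever chunk sizes your source produces into fixed-size frames and handing each frame to the provider. Here is a minimal sketch of that loop. The `send_frame` callback is a hypothetical stand-in for your provider's streaming client (typically a WebSocket send), and the 3200-byte frame size assumes 100 ms of 16 kHz, 16-bit mono audio.

```python
from typing import Callable, Iterable, Iterator


def frames(chunks: Iterable[bytes], frame_bytes: int = 3200) -> Iterator[bytes]:
    """Regroup arbitrarily sized PCM chunks into fixed-size frames."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= frame_bytes:
            yield bytes(buffer[:frame_bytes])
            del buffer[:frame_bytes]
    if buffer:
        yield bytes(buffer)  # flush the final partial frame


def pipe_audio(chunks: Iterable[bytes], send_frame: Callable[[bytes], None]) -> int:
    """Forward every frame to the provider and return how many were sent."""
    sent = 0
    for frame in frames(chunks):
        send_frame(frame)  # hypothetical: replace with your provider's send call
        sent += 1
    return sent
```

For a quick local check, you can pass `list.append` as the callback, for example `pipe_audio(audio_chunks, received.append)`, and inspect the frames it collects.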

If you want to try out Recall.ai, request an API key here!