Zoom Transcript API: Tutorial, Common Pitfalls, and Workarounds

Vivek Sahu

May 28, 2024

Introduction

Zoom provides a seamless platform for video conferencing, but there are often scenarios where we need to access the transcripts of a recorded meeting. This could be useful for various purposes, such as sharing meeting notes over email, translating the meeting content into multiple languages, or analyzing and summarizing the discussion using advanced language models like OpenAI ChatGPT.

Fortunately, Zoom offers APIs that allow developers to retrieve transcripts for recorded meetings, albeit with some limitations. However, Zoom's official documentation does not provide a detailed guide to using these APIs or explain the caveats they come with.

In this article, we will explore the process of accessing these transcripts via the Zoom API. We will cover the necessary setup, including creating a Zoom App and obtaining the required credentials, as well as providing a Python code example that demonstrates how to interact with the Zoom API to retrieve the meeting transcripts. We will also look into its limitations and some workarounds that can be employed.

Note that while we are using Python here, the code can easily be translated into other languages. At the heart of it, all we are doing is calling Zoom APIs.

Overview of steps

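We will do three things:

  • Create a Server-to-Server OAuth app in the Zoom App Marketplace.
  • Record a meeting to the cloud.
  • Write Python code that fetches the transcript through the Zoom API.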

Brief on Zoom Apps and OAuth2

Zoom employs the OAuth 2.0 protocol for authorizing access to its APIs. OAuth is an industry-standard authorization mechanism that allows applications to securely access resources on behalf of a user or service account without exposing sensitive credentials like passwords.

To access the Zoom APIs, you need to create a Server-to-Server OAuth app within the Zoom App Marketplace. This type of application is suitable for machine-to-machine communication, which is precisely what we need in this case.

Pre-requisites

Create a Server-to-Server OAuth App

Follow these steps:

1. Let's go to the Zoom App Marketplace. Click on Develop and then click on Server-to-Server App.

Step 1

2. Give a name to your app. In this example, we've called it MyTranscriptionApp. You can give it your own name. Click on Create.

Step 2

3. You should now be in the App Credentials section: Notice your Account ID, Client ID, and Client Secret in the App Credentials. These client values are specific to your app. When we make Zoom API calls later, we will use these.

Step 3

4. Provide details in the Information section.

Step 4

5. Next, jump to Scopes. Here, we define the scopes that this app's tokens have access to. This in turn determines which APIs can be called using the token.

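In this tutorial, we'll be using the following endpoints:

  • GET /users/{userId}/recordings (List Recordings)
  • GET /meetings/{meetingId}/recordings (Get Recording)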

That means we'll need the following scopes, which Zoom mentions in the documentation for those endpoints:

  • cloud_recording:read:list_user_recordings:admin
  • cloud_recording:read:list_recording_files:admin

Step 5

6. Next, go to the Activation section and click Activate your app. Your app won't provide the access_tokens without this step.

Before Activation:

Step 6A

After Activation:

Step 6B

Start and complete a meeting

Start a Zoom meeting. Ensure that you have enabled Cloud recording by expanding Options, checking the option for Automatically record meeting, and selecting the In the cloud radio button.

Step 7

Join the meeting alone, or add more people. Talk for a couple of seconds during the meeting, and then end the meeting.

It will take a few minutes for the recording to show up as Zoom needs to process it. For this article, we'll monitor the Recordings Tab for the recording.

You can also use the recording.transcript_completed webhook event to be notified when the recording transcript is available.
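
If you go the webhook route, the handler mostly needs to pull the transcript URL out of the event body. Here is a minimal sketch of that step. The payload shape used below (event, payload.object.recording_files, download_url) mirrors the Get Recording response we use later in this article and is an assumption you should check against Zoom's webhook reference. extract_transcript_urls and sample_event are hypothetical names made up for this example.

```python
def extract_transcript_urls(event_body):
    """Return transcript download URLs from a parsed webhook body (a dict)."""
    # Ignore every event except the one announcing a finished transcript
    if event_body.get("event") != "recording.transcript_completed":
        return []
    # Assumed layout: the recording object sits under payload.object
    recording = event_body.get("payload", {}).get("object", {})
    return [
        f["download_url"]
        for f in recording.get("recording_files", [])
        if f.get("recording_type") == "audio_transcript" and "download_url" in f
    ]


# Hand-written sample body for illustration only, not captured from Zoom
sample_event = {
    "event": "recording.transcript_completed",
    "payload": {
        "object": {
            "id": 123456789,
            "recording_files": [
                {"recording_type": "audio_only", "download_url": "https://example.com/a.m4a"},
                {"recording_type": "audio_transcript", "download_url": "https://example.com/t.vtt"},
            ],
        }
    },
}

print(extract_transcript_urls(sample_event))
```

Keeping this as a pure function means it can be unit-tested without a running web server and dropped into whichever web framework receives the webhook.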

Write code to fetch transcripts

Now that we've set up our Server-to-Server OAuth App and completed a meeting, let's write code to fetch details from the Zoom API.

In this section, we'll create a Zoom SDK Python Client that can interact with the Zoom API to fetch transcripts.

Create a Zoom SDK Client

We'll create a simple Zoom SDK Client.

$ mkdir my-zoom-sdk-client
$ cd my-zoom-sdk-client
$ python3 -m venv venv
$ source venv/bin/activate
$ echo "requests" >> requirements.txt
$ pip install -r requirements.txt

Create a file called main.py and copy the following code:

"""
MyZoomSDKClient
"""
import requests


class MyZoomSDKClient:

    # Initialize the client with account_id, client_id, and client_secret
    def __init__(self, client_id, client_secret, account_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.account_id = account_id
        self.access_token = None

    # Generate a new access token only if one does not exist, or if the user explicitly asks to generate 
    # (using 'force=True' means generating a new token)
    def get_access_token(self, force=False):
        if self.access_token and not force:
            return self.access_token

        access_token_uri = "https://zoom.us/oauth/token"
        data = {
            "grant_type": "account_credentials",
            "account_id": self.account_id,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
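        response = self._make_request(access_token_uri, data=data, method='POST')
        # Cache the token so later calls reuse it until force=True is passed
        self.access_token = response.json()["access_token"]
        return self.access_token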

    def _make_request(self, uri, headers=None, method='GET', data=None):
        try:
            if method == 'POST':
                response = requests.post(uri, headers=headers, data=data, timeout=30)
            else:
                response = requests.get(uri, headers=headers, timeout=30)
        except requests.RequestException as e:
            # Network-level failure (DNS, connection, timeout)
            raise Exception('Could not reach the Zoom API. Please try again later. Error Message: ' + str(e))

        # Treat any non-2xx status as a failure and surface the status and body to aid debugging
        if response.status_code // 100 != 2:
            raise Exception(f'Invalid response from Zoom API (HTTP {response.status_code}): {response.text}')
        return response

  • __init__ simply initializes our client with the account_id, client_id and client_secret. (Remember we saw these earlier in Step 3 of the Create a Server-to-Server OAuth App section?)
  • _make_request is a utility function that lets us call any REST API.
  • get_access_token generates a new access token by calling the Zoom OAuth API. Passing force=True makes it request a fresh access_token instead of reusing the earlier one. This matters because access tokens expire after a limited time (Zoom documents one hour for Server-to-Server OAuth tokens).
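
Since tokens expire, a long-running script would have to remember to pass force=True at the right moment. One way to avoid that is to track the expiry time. The sketch below assumes the token response carries the standard OAuth 2.0 expires_in field (in seconds). CachedToken and its parameters are hypothetical names for this example, not part of any Zoom library.

```python
import time


class CachedToken:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=60):
        # fetch_token: callable returning a dict like
        # {"access_token": "...", "expires_in": 3600}
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            body = self._fetch_token()
            self._token = body["access_token"]
            # Assumption: fall back to one hour if expires_in is missing
            lifetime = body.get("expires_in", 3600)
            self._expires_at = now + lifetime - self._safety_margin
        return self._token
```

In our client, fetch_token could be a small function that performs the POST from get_access_token and returns response.json(). The clock is injectable only so that the expiry logic can be tested without waiting an hour.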

Let's see if all works well.

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> access_token = client.get_access_token()
>>> print(access_token)
>>> exit()
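
Pasting secrets into an interactive session is fine for a quick test, but for anything longer-lived you may prefer to read them from the environment. A minimal sketch follows. The variable names ZOOM_CLIENT_ID, ZOOM_CLIENT_SECRET and ZOOM_ACCOUNT_ID are our own choice for this example (Zoom does not prescribe them), and load_zoom_credentials is a hypothetical helper.

```python
import os


def load_zoom_credentials(env=os.environ):
    """Return (client_id, client_secret, account_id) read from environment variables."""
    # Hypothetical variable names chosen for this example
    names = ("ZOOM_CLIENT_ID", "ZOOM_CLIENT_SECRET", "ZOOM_ACCOUNT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Same order as the MyZoomSDKClient constructor arguments
    return tuple(env[name] for name in names)


# Usage with the client from main.py:
#   client = MyZoomSDKClient(*load_zoom_credentials())
```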

List all meetings

We'll modify the code to access the List Recordings API.

GET /users/{userId}/recordings

Update main.py to call the List Recordings API and return the parsed data. Add the following methods to the MyZoomSDKClient class:

#...
    # Generate authorization header using access_token
    def get_authorization_header(self):
        return { "Authorization": f"Bearer {self.get_access_token()}" }

    # List id and topic of all my meetings
    def list_my_meetings(self):
        list_meetings_uri = "https://api.zoom.us/v2/users/me/recordings"
        headers = self.get_authorization_header()

        response = self._make_request(list_meetings_uri, headers=headers)
        return [(x['id'], x['topic']) for x in response.json()['meetings']]
#...
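
One pitfall with the call above: as far as we understand the endpoint, it returns only a limited date window by default and splits long result sets into pages. The sketch below shows one way to request an explicit date range and follow the pages. The query parameter names (from, to, page_size, next_page_token) are assumptions to verify against the List Recordings reference. iter_recorded_meetings and fetch_page are hypothetical names for this example.

```python
def iter_recorded_meetings(fetch_page, from_date, to_date, page_size=30):
    """Yield every recorded meeting in the date range, following next_page_token.

    fetch_page is a callable that takes a dict of query parameters and
    returns the parsed JSON body of one List Recordings response.
    """
    params = {"from": from_date, "to": to_date, "page_size": page_size}
    while True:
        body = fetch_page(params)
        yield from body.get("meetings", [])
        token = body.get("next_page_token")
        if not token:
            # An empty or missing token means this was the last page
            break
        params = {**params, "next_page_token": token}


# Possible wiring with the client from this article (not executed here):
#   fetch = lambda p: requests.get(
#       "https://api.zoom.us/v2/users/me/recordings",
#       headers=client.get_authorization_header(), params=p, timeout=30).json()
#   for meeting in iter_recorded_meetings(fetch, "2024-05-01", "2024-05-28"):
#       print(meeting["id"], meeting["topic"])
```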

Let's see it in action:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> print(meetings)
>>> exit()

Get Meeting Details

Let's modify the code to access the Get Recording API.

GET /meetings/{meetingId}/recordings

Update main.py to add the following function:

#...
    # Get meeting details by ID
    def get_meeting_details(self, meeting_id):
        if meeting_id in self.meetings:
            return self.meetings[meeting_id]

        meeting_details_url = f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings"
        headers = self.get_authorization_header()

        self.meetings[meeting_id] = self._make_request(meeting_details_url, headers=headers).json()
        return self.meetings[meeting_id]
#...
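
The code above passes the numeric meeting ID. If you instead want one specific instance of a recurring meeting, the same endpoint also accepts a meeting UUID, and Zoom's documentation says a UUID that starts with "/" or contains "//" must be double URL-encoded before it goes into the path. A minimal sketch of that rule, with encode_meeting_uuid as a hypothetical helper name:

```python
from urllib.parse import quote


def encode_meeting_uuid(meeting_uuid):
    """URL-encode a meeting UUID for use in a request path.

    UUIDs that start with "/" or contain "//" are encoded twice,
    following the rule described in Zoom's documentation.
    """
    once = quote(meeting_uuid, safe="")
    if meeting_uuid.startswith("/") or "//" in meeting_uuid:
        return quote(once, safe="")
    return once


print(encode_meeting_uuid("/ajXp112QmuoKj4854875=="))
```

The result can then be placed in the URL where meeting_id is used in get_meeting_details.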

Also, add this new property to the client:

class MyZoomSDKClient:

    # Initialize the client with account_id, client_id, and client_secret
    def __init__(self, client_id, client_secret, account_id):
        # ...
        self.meetings = {}

Let's run this code to get our meeting details:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> my_meeting = client.get_meeting_details(meetings[0][0])
>>> print(my_meeting)
>>> exit()

Get Transcriptions

Finally, let's add the following code to fetch the transcripts. Transcripts are available as part of the Get Recording API. The API response contains a recording_files property, which holds the download_url for each video recording, audio recording, transcript, and so on. We can check the recording_type of each entry to find out what kind of file it is. We are interested in the entry whose recording_type is audio_transcript.

Modify main.py to add the following function:

#...
    # Get audio transcript for a given meeting
    def get_audio_transcript(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        # Pick the first file whose recording_type is audio_transcript, if any
        aud_ts_uri = next((i['download_url'] for i in meeting_details.get('recording_files', []) if i.get('recording_type') == 'audio_transcript'), None)
        if aud_ts_uri is None:
            raise Exception('No audio transcript found for this meeting. It may still be processing, or transcription may be disabled.')
        headers = self.get_authorization_header()

        response = self._make_request(aud_ts_uri, headers=headers)
        # response.content contains the audio transcript in bytes, so we need to decode it to read the content.
        decoded_content = response.content.decode("utf-8")

        # Print the audio transcript line by line; splitlines() handles both \r\n and \n line endings
        for line in decoded_content.strip().splitlines():
            print(line)
#...

Let's run the following to fetch the transcripts:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> client.get_audio_transcript(meetings[0][0])

This is an example output. Your output should be in a similar format:

...
WEBVTT

1
00:00:16.239 --> 00:00:27.079
John: Hi, this is a test for audio transcripts.

2
00:00:28.079 --> 00:00:30.239
Wewake: I am going to say.

3
00:00:30.869 --> 00:00:32.329
Wewake: a few words.

4
00:00:32.629 --> 00:00:41.649
Wewake: I expect this to be recorded and available to me later.

5
00:00:44.779 --> 00:00:46.509
Wewake: That is it.

6
00:00:52.219 --> 00:00:54.909
Wewake: So simple.

7
00:00:56.319 --> 00:01:00.269
Wewake: We are done.

8
00:01:00.749 --> 00:01:01.629
Wewake: Bye.
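
Printing the file is enough to eyeball it, but for summarizing or translating you will probably want structured data. Here is a rough sketch of a parser for output shaped like the example above. It assumes each cue is a numbered block with one timing line followed by "Speaker: text", which is what we observed here and may not hold for every transcript (a cue without a speaker prefix is kept with speaker set to None). parse_vtt is a hypothetical helper, not part of any Zoom library.

```python
import re

# Matches a cue timing line such as "00:00:16.239 --> 00:00:27.079"
CUE_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_vtt(vtt_text):
    """Parse transcript text into a list of cue dicts."""
    cues = []
    # Normalise line endings, then split into blank-line-separated blocks
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if not match:
                continue
            # Everything after the timing line is the spoken text
            text = " ".join(lines[i + 1:])
            speaker, sep, spoken = text.partition(": ")
            if not sep:
                speaker, spoken = None, text
            cues.append({
                "start": match.group(1),
                "end": match.group(2),
                "speaker": speaker,
                "text": spoken,
            })
            break  # one timing line per block
    return cues


sample = (
    "WEBVTT\r\n\r\n"
    "1\r\n00:00:16.239 --> 00:00:27.079\r\nJohn: Hi, this is a test for audio transcripts.\r\n\r\n"
    "2\r\n00:00:28.079 --> 00:00:30.239\r\nWewake: I am going to say.\r\n"
)
for cue in parse_vtt(sample):
    print(cue["start"], cue["speaker"], cue["text"])
```

To use it with the client, get_audio_transcript would need to return decoded_content instead of printing it.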

Bonus: Closed Caption

We can also retrieve the closed captions from the recording_files. For this, closed captioning must be enabled during the meeting. Here is the code to fetch them:

#...
    # Get closed captions
    def get_closed_captions(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        # Pick the first file whose recording_type is closed_caption, if any
        cc_uri = next((i['download_url'] for i in meeting_details.get('recording_files', []) if i.get('recording_type') == 'closed_caption'), None)
        if cc_uri is None:
            raise Exception('No closed captions found for this meeting. Closed captioning may not have been enabled.')
        headers = self.get_authorization_header()

        response = self._make_request(cc_uri, headers=headers)

        # response.content contains the closed captions in bytes, so we need to decode it to read the content.
        decoded_content = response.content.decode("utf-8")

        # Print the closed captions line by line; splitlines() handles both \r\n and \n line endings
        for line in decoded_content.strip().splitlines():
            print(line)
#...

Run it:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> client.get_closed_captions(meetings[0][0])

Complete Code for the Zoom Client

"""
MyZoomSDKClient

Simple Zoom SDK to get access_token, recordings, transcripts, and closed captions
for a Zoom meeting (with cloud recording). Requires client_id, client_secret and account_id.
"""
import requests


class MyZoomSDKClient:

    # Initialize the client with account_id, client_id, and client_secret
    def __init__(self, client_id, client_secret, account_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.account_id = account_id
        self.access_token = None
        self.meetings = {}

    # Generate a new access token only if one does not exist, or if the user explicitly asks to generate 
    # (using 'force=True' means generating a new token)
    def get_access_token(self, force=False):
        if self.access_token and not force:
            return self.access_token

        access_token_uri = "https://zoom.us/oauth/token"
        data = {
            "grant_type": "account_credentials",
            "account_id": self.account_id,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
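        response = self._make_request(access_token_uri, data=data, method='POST')
        # Cache the token so later calls reuse it until force=True is passed
        self.access_token = response.json()["access_token"]
        return self.access_token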

    # Generate authorization header using access_token
    def get_authorization_header(self):
        return { "Authorization": f"Bearer {self.get_access_token()}" }

    # List id and topic of all my meetings
    def list_my_meetings(self):
        list_meetings_uri = "https://api.zoom.us/v2/users/me/recordings"
        headers = self.get_authorization_header()

        response = self._make_request(list_meetings_uri, headers=headers)
        return [(x['id'], x['topic']) for x in response.json()['meetings']]

    # Get meeting details by ID
    def get_meeting_details(self, meeting_id):
        if meeting_id in self.meetings:
            return self.meetings[meeting_id]

        meeting_details_url = f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings"
        headers = self.get_authorization_header()

        self.meetings[meeting_id] = self._make_request(meeting_details_url, headers=headers).json()
        return self.meetings[meeting_id]

    # Get audio transcript for a given meeting
    def get_audio_transcript(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        # Pick the first file whose recording_type is audio_transcript, if any
        aud_ts_uri = next((i['download_url'] for i in meeting_details.get('recording_files', []) if i.get('recording_type') == 'audio_transcript'), None)
        if aud_ts_uri is None:
            raise Exception('No audio transcript found for this meeting. It may still be processing, or transcription may be disabled.')
        headers = self.get_authorization_header()

        response = self._make_request(aud_ts_uri, headers=headers)
        # response.content contains the audio transcript in bytes, so we need to decode it to read the content.
        decoded_content = response.content.decode("utf-8")

        # Print the audio transcript line by line; splitlines() handles both \r\n and \n line endings
        for line in decoded_content.strip().splitlines():
            print(line)


    # Get closed captions
    def get_closed_captions(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        # Pick the first file whose recording_type is closed_caption, if any
        cc_uri = next((i['download_url'] for i in meeting_details.get('recording_files', []) if i.get('recording_type') == 'closed_caption'), None)
        if cc_uri is None:
            raise Exception('No closed captions found for this meeting. Closed captioning may not have been enabled.')
        headers = self.get_authorization_header()

        response = self._make_request(cc_uri, headers=headers)

        # response.content contains the closed captions in bytes, so we need to decode it to read the content.
        decoded_content = response.content.decode("utf-8")

        # Print the closed captions line by line; splitlines() handles both \r\n and \n line endings
        for line in decoded_content.strip().splitlines():
            print(line)

    def _make_request(self, uri, headers=None, method='GET', data=None):
        try:
            if method == 'POST':
                response = requests.post(uri, headers=headers, data=data, timeout=30)
            else:
                response = requests.get(uri, headers=headers, timeout=30)
        except requests.RequestException as e:
            # Network-level failure (DNS, connection, timeout)
            raise Exception('Could not reach the Zoom API. Please try again later. Error Message: ' + str(e))

        # Treat any non-2xx status as a failure and surface the status and body to aid debugging
        if response.status_code // 100 != 2:
            raise Exception(f'Invalid response from Zoom API (HTTP {response.status_code}): {response.text}')
        return response

Limitations of the Zoom Transcript API

You might have noticed how we made some specific choices when we created the Zoom App to get transcription. This is because the native Zoom APIs have several limitations:

1. You need to be on a Paid Plan

The Zoom account hosting the meeting must be on the Pro, Business, or Enterprise tier. The free Basic plan will not work.

2. You need to enable Zoom Cloud Recording

Transcripts are only produced if the user records their meeting using Zoom Cloud Recording (not Zoom local recording). Zoom also has live captions that appear without recording the meeting, but they do not produce a transcript file after the call unless the meeting is being recorded.

3. Only the Meeting Host can access the transcript

The recorded meeting is only stored in the host's account, so other users cannot access the meeting and transcript details.

4. Meeting Host must enable recording settings for the transcript to be generated

The audio transcript feature in Zoom is disabled by default. You must enable this setting in Zoom for transcription to be produced.

5. Transcripts are only in English

At the time of writing, English is the only language supported by Zoom’s transcript feature.

6. Transcripts can take a long time to become available

After the meeting has ended, it typically takes about twice the duration of the recorded meeting for all the recordings to become available. For example, the transcript of a 30-minute meeting will typically be available about 1 hour after the meeting ends. Depending on the load on Zoom's servers, processing can take up to 24 hours.
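
If you cannot use the webhook, the alternative is to poll until the transcript shows up. Below is a minimal sketch of polling with exponential backoff. wait_for_transcript and get_recording are hypothetical names for this example, and the delays are arbitrary starting points rather than values recommended by Zoom.

```python
import time


def wait_for_transcript(get_recording, max_attempts=8, first_delay=30.0, sleep=time.sleep):
    """Poll until an audio_transcript file appears; return its download_url or None.

    get_recording is a callable returning the parsed Get Recording response.
    The wait doubles after every unsuccessful attempt.
    """
    delay = first_delay
    for attempt in range(max_attempts):
        recording = get_recording()
        for f in recording.get("recording_files", []):
            if f.get("recording_type") == "audio_transcript":
                return f["download_url"]
        if attempt < max_attempts - 1:
            sleep(delay)
            delay *= 2
    return None
```

Note that get_meeting_details in our client caches the first response in self.meetings, so polling through it would keep returning the same stale data. The get_recording callable should call the API directly or clear the cached entry before each attempt.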

7. Transcripts are not available in real-time

Zoom Cloud transcripts are only available after the meeting is done, so you won’t be able to get the transcription in real-time.

Workarounds

To work around the above limitations, there are a few options:

Workaround 1: Use the Zoom RTMP Live-Streaming API

Zoom allows live streaming a meeting to custom apps using the Real-Time Messaging Protocol (RTMP). This lets a custom app receive the audio of the meeting, which can then be piped to any third-party transcription service to get real-time transcripts.

Advantages:

  • You get real-time transcripts.
  • You can get transcripts in any language.


Disadvantages:

  • The setup is more complex. You need to set up a live-streaming server.
  • You also need to integrate with a 3rd-party service for transcription, which can add to the cost and complexity.
  • This also requires a Zoom paid plan.
  • This also requires you to be the host.

Workaround 2: Build a Zoom Meeting Bot

Zoom allows meeting bots to access the video and audio contents of the meetings.

Advantages:

  • You can get real-time transcripts.
  • You don't need a paid Zoom plan.
  • You don't need to be the host to access transcriptions.


Disadvantages:

  • The meeting bot shows up as a participant in the call, although this is now seen as acceptable by many.
  • Developing a meeting bot is quite complex since Zoom does not provide an API to call to build one.
  • There is a scale problem. Meeting bots require a separate instance of a Zoom client for each meeting. As the number of meetings grows, this can become extremely difficult to manage, and it may blow up infrastructure costs. It typically takes 6-12 months to build a stable meeting bot in its most basic form.


The major disadvantages here can be overcome by using an out-of-the-box API provider for meeting bots like Recall.ai. Recall.ai handles the running of the VMs. All the user needs to do is call an API endpoint to launch a bot. Zoom released a Meeting Bot Starter Kit in partnership with Recall.ai that makes it much easier to get started.

Conclusion

We looked at how to programmatically access Zoom transcripts using the Zoom API, several limitations of the Zoom API, and workarounds you can use to overcome those limitations.