Table of Contents
- Introduction
- Overview of steps
- Brief on Zoom Apps and OAuth2
- Pre-requisites
- Create a Server-to-Server OAuth App
- Start and complete a meeting
- Write code to fetch transcripts
- Create a Zoom SDK Client
- List all meetings
- Get Meeting Details
- Get Transcriptions
- Bonus: Closed Caption
- Complete Code for the Zoom Client
- Limitations of the Zoom Transcript API
- Workarounds
- Workaround 1: Use the Zoom RTMP Live-Streaming API
- Workaround 2: Build a Zoom Meeting Bot
Introduction
Zoom provides a seamless platform for video conferencing, but there are often scenarios where we need to access the transcripts of a recorded meeting. This could be useful for various purposes, such as sharing meeting notes over email, translating the meeting content into multiple languages, or analyzing and summarizing the discussion using advanced language models like OpenAI ChatGPT.
Fortunately, Zoom offers APIs that allow developers to retrieve transcripts for the recorded meetings, albeit with some limitations. However, the Zoom official documentation does not provide a detailed guide for using these APIs, or the caveats they come with.
In this article, we will explore the process of accessing these transcripts via the Zoom API. We will cover the necessary setup, including creating a Zoom App and obtaining the required credentials, as well as providing a Python code example that demonstrates how to interact with the Zoom API to retrieve the meeting transcripts. We will also look into its limitations and some workarounds that can be employed.
Note that while we are using Python here, the code can easily be translated into other languages. At the heart of it, all we are doing is calling Zoom APIs.
Overview of steps
We will simply do 3 things:
- Create a Zoom App to allow access to some of their APIs
- Create and join a meeting with cloud recording enabled
- Programmatically fetch the audio transcripts using Python
Brief on Zoom Apps and OAuth2
Zoom employs the OAuth 2.0 protocol for authorizing access to its APIs. OAuth is an industry-standard authorization mechanism that allows applications to securely access resources on behalf of a user or service account without exposing sensitive credentials like passwords.
To access the Zoom APIs, you need to create a Server-to-Server OAuth app within the Zoom App Marketplace. This type of application is suitable for machine-to-machine communication, which is precisely what we need in this case.
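To make the machine-to-machine flow concrete, here is a minimal sketch of what a token request for a Server-to-Server OAuth app can look like. The client built later in this article sends the client ID and secret as form fields; OAuth 2.0 also defines HTTP Basic authentication as a way to send client credentials, and this sketch assumes Zoom's token endpoint accepts that form. The function name build_token_request is hypothetical, and the function only builds the request parts without sending anything.

```python
import base64


def build_token_request(account_id, client_id, client_secret):
    """Return (url, headers, form_data) for an account_credentials token request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # HTTP Basic credentials are "client_id:client_secret", base64-encoded.
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(raw).decode("ascii")
    url = "https://zoom.us/oauth/token"
    headers = {"Authorization": f"Basic {basic}"}
    data = {"grant_type": "account_credentials", "account_id": account_id}
    return url, headers, data


if __name__ == "__main__":
    url, headers, data = build_token_request("my-account", "my-client-id", "my-secret")
    print(url, data)
```

The returned values could be passed to requests.post(url, headers=headers, data=data). No user is involved at any point, which is what distinguishes this flow from a user-facing OAuth login.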
Pre-requisites
- Zoom Paid Plan
- Python 3.6 or later installed on your system (the code in this article uses f-strings, which were introduced in Python 3.6).
- If you are not the account admin, then your account should have:
- Permissions to view and edit Server-to-Server OAuth apps
- Permissions for the scopes that you will add to the app. We will detail the scopes later in the article.
Create a Server-to-Server OAuth App
Follow these steps:
1. Go to the Zoom App Marketplace. Click on Develop and then click on Server-to-Server App.
2. Give a name to your app. In this example, we've called it MyTranscriptionApp. You can give it your own name. Click on Create.
3. You should now be in the App Credentials section. Notice your Account ID, Client ID, and Client Secret in the App Credentials. These values are specific to your app. When we make Zoom API calls later, we will use these.
4. Provide details in the Information section.
5. Next, jump to Scopes. Here, we define the scopes that this app's tokens have access to. This in turn defines which APIs can be called using the token.
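In this tutorial, we'll be using the following endpoints:
- List Recordings: GET /users/{userId}/recordings
- Get Recording: GET /meetings/{meetingId}/recordings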
That means we'll need the following scopes, which Zoom mentions in the documentation for those endpoints:
cloud_recording:read:list_user_recordings:admin
cloud_recording:read:list_recording_files:admin
6. Next, go to the Activation section and click Activate your app. Your app won't provide access tokens without this step.
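Once the app is activated, it can help to confirm that a token actually carries the two scopes listed in step 5. The sketch below assumes the token endpoint's JSON response includes a space-delimited scope field, as described by the OAuth 2.0 specification; the helper name missing_scopes is hypothetical.

```python
REQUIRED_SCOPES = {
    "cloud_recording:read:list_user_recordings:admin",
    "cloud_recording:read:list_recording_files:admin",
}


def missing_scopes(token_response, required=REQUIRED_SCOPES):
    """Return the required scopes that are absent from a token response dict.

    Assumes `token_response["scope"]` is a space-delimited string (an assumption
    about the response shape, not something this article confirms).
    """
    granted = set(token_response.get("scope", "").split())
    return sorted(set(required) - granted)


if __name__ == "__main__":
    sample = {"access_token": "...", "scope": "cloud_recording:read:list_user_recordings:admin"}
    print(missing_scopes(sample))
```

An empty list means both scopes were granted. Anything else points back to the Scopes section of the app.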
Start and complete a meeting
Start a Zoom meeting. Ensure that you have enabled Cloud recording by expanding Options, checking the option for Automatically record meeting, and selecting the In the cloud radio button.
Join the meeting alone, or add more people. Talk for a couple of seconds during the meeting, and then end the meeting.
It will take a few minutes for the recording to show up as Zoom needs to process it. For this article, we'll monitor the Recordings Tab for the recording.
You can also use the recording.transcript_completed webhook event to be notified when the recording transcript is available.
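As a rough illustration of the webhook route, the sketch below inspects an already-parsed webhook body and extracts the meeting ID when the event is recording.transcript_completed. The payload shape used here (an event field plus payload.object.id) is an assumption and should be checked against Zoom's webhook reference. The function name is hypothetical, and request signature verification is deliberately left out.

```python
TRANSCRIPT_EVENT = "recording.transcript_completed"


def meeting_id_from_event(event):
    """Return the meeting ID if `event` announces a finished transcript, else None.

    Assumed payload shape: {"event": "...", "payload": {"object": {"id": ...}}}
    """
    if event.get("event") != TRANSCRIPT_EVENT:
        return None
    return event.get("payload", {}).get("object", {}).get("id")


if __name__ == "__main__":
    body = {"event": TRANSCRIPT_EVENT, "payload": {"object": {"id": 123456789}}}
    print(meeting_id_from_event(body))
```

A web framework handler would parse the request body as JSON, call this function, and start the transcript download only when it returns an ID.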
Write code to fetch transcripts
Now that we've set up our Server-to-Server OAuth App and completed a meeting, let's write code to fetch details from the Zoom API.
In this section, we'll create a Zoom SDK Python Client that can interact with the Zoom API to fetch transcripts.
Create a Zoom SDK Client
We'll create a simple Zoom SDK Client.
$ mkdir my-zoom-sdk-client
$ cd my-zoom-sdk-client
$ python3 -m venv venv
$ source venv/bin/activate
$ echo "requests" >> requirements.txt
$ pip install -r requirements.txt
Create a file called main.py
and copy the following code:
"""
MyZoomSDKClient
"""
import requests


class MyZoomSDKClient:
    # Initialize the client with client_id, client_secret, and account_id
    def __init__(self, client_id, client_secret, account_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.account_id = account_id
        self.access_token = None

    # Generate a new access token only if one does not exist, or if the user explicitly asks to generate
    # (using 'force=True' means generating a new token)
    def get_access_token(self, force=False):
        if self.access_token and not force:
            return self.access_token
        access_token_uri = "https://zoom.us/oauth/token"
        data = {
            "grant_type": "account_credentials",
            "account_id": self.account_id,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
        response = self._make_request(access_token_uri, data=data, method='POST')
        # Store the token so that later calls reuse it instead of requesting a new one every time
        self.access_token = response.json()["access_token"]
        return self.access_token

    # Utility to call any REST API; raises if the response status is not 2xx
    def _make_request(self, uri, headers=None, method='GET', data=None):
        try:
            if method == 'POST':
                response = requests.post(uri, headers=headers, data=data)
            else:
                response = requests.get(uri, headers=headers)
            if response.status_code // 100 != 2:
                raise Exception('Invalid response from Zoom API. Please try again later.')
            return response
        except Exception as e:
            raise Exception('Invalid response from Zoom API. Please try again later. Error Message: ' + str(e))
- __init__ simply initializes our client with the account_id, client_id and client_secret. (Remember we saw these earlier in Step 3 of the Create a Server-to-Server OAuth App section?)
- _make_request is just a utility function that allows us to call any REST API.
- get_access_token generates a new access token by calling the Zoom OAuth API.
- force is a simple maneuver to ensure a new access_token is generated instead of reusing the earlier access_token. Access tokens expire after a certain time.
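Because access tokens expire, a client that caches one indefinitely will eventually start failing. The sketch below shows one possible way to refresh automatically. It assumes the token response carries the standard OAuth 2.0 expires_in field (lifetime in seconds). The class name ExpiringToken and the fetch_token callable are hypothetical and are not part of the client above.

```python
import time


class ExpiringToken:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=60):
        # fetch_token: callable returning the token endpoint's JSON as a dict (assumed)
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin
        self._token = None
        self._expires_at = 0.0

    def get(self, force=False):
        now = self._clock()
        if force or self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # If expires_in is missing, treat the token as already stale so the
            # next call fetches a fresh one.
            lifetime = payload.get("expires_in", 0)
            self._expires_at = now + max(lifetime - self._safety_margin, 0)
        return self._token


if __name__ == "__main__":
    cache = ExpiringToken(lambda: {"access_token": "demo-token", "expires_in": 3600})
    print(cache.get())
```

The safety margin refreshes the token slightly early, so a request that starts just before expiry does not fail midway.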
Let's see if all works well.
$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> access_token = client.get_access_token()
>>> print(access_token)
>>> exit()
List all meetings
We'll modify the code to access the List Recordings API.
GET /users/{userId}/recordings
Update main.py to call the List Recordings API and return parsed data. Use the following code:
#...
    # Generate authorization header using access_token
    def get_authorization_header(self):
        return {"Authorization": f"Bearer {self.get_access_token()}"}

    # List id and topic of all my meetings
    def list_my_meetings(self):
        list_meetings_uri = "https://api.zoom.us/v2/users/me/recordings"
        headers = self.get_authorization_header()
        response = self._make_request(list_meetings_uri, headers=headers)
        return [(x['id'], x['topic']) for x in response.json()['meetings']]
#...
Let's see it in action:
$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> print(meetings)
>>> exit()
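list_my_meetings reads a single response, so it may miss recordings that fall outside the endpoint's default date window or beyond the first page. The sketch below shows one way to walk all pages. It assumes the List Recordings endpoint accepts from, to, page_size and next_page_token query parameters and returns a next_page_token field, so verify these against the API reference. The get_page callable is hypothetical: it takes a dict of query parameters and returns the parsed JSON response.

```python
def iter_recorded_meetings(get_page, start_date, end_date, page_size=30):
    """Yield (id, topic) for every recorded meeting between two 'yyyy-mm-dd' dates.

    get_page: hypothetical callable(params_dict) -> parsed JSON dict for one page.
    """
    params = {"from": start_date, "to": end_date, "page_size": page_size}
    while True:
        page = get_page(params)
        for meeting in page.get("meetings", []):
            yield meeting["id"], meeting["topic"]
        token = page.get("next_page_token")
        if not token:
            break  # an empty or missing token is taken to mean "last page"
        params = {**params, "next_page_token": token}


if __name__ == "__main__":
    # Stand-in for a real HTTP call, for demonstration only.
    def one_page(params):
        return {"meetings": [{"id": 1, "topic": "Demo"}], "next_page_token": ""}

    print(list(iter_recorded_meetings(one_page, "2024-01-01", "2024-01-31")))
```

With the client from this article, get_page could wrap requests.get on the List Recordings URI, passing the authorization header and params=params, and return the result of .json().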
Get Meeting Details
Let's modify the code to access the Get Recording API.
GET /meetings/{meetingId}/recordings
Update main.py to add the following function:
#...
    # Get meeting details by ID (responses are cached in self.meetings)
    def get_meeting_details(self, meeting_id):
        if meeting_id in self.meetings:
            return self.meetings[meeting_id]
        meeting_details_url = f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings"
        headers = self.get_authorization_header()
        self.meetings[meeting_id] = self._make_request(meeting_details_url, headers=headers).json()
        return self.meetings[meeting_id]
#...
Also, add this new property to the client:
class MyZoomSDKClient:
    # Initialize the client with client_id, client_secret, and account_id
    def __init__(self, client_id, client_secret, account_id):
        # ...
        self.meetings = {}
Let's run this code to get our meeting details:
$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> my_meeting = client.get_meeting_details(meetings[0][0])
>>> print(my_meeting)
>>> exit()
Get Transcriptions
Finally, let's add the following code to fetch the transcripts. Transcripts are available as part of the Get Recording API. The API response contains a recording_files property, which holds the download_url for each video recording, audio recording, transcript, and more. We can check each file's recording_type to find the type of recording. We are interested in the files whose recording_type is audio_transcript.
Modify main.py to add the following function:
#...
    # Get audio transcript for a given meeting
    def get_audio_transcript(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)
        aud_ts_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type'] == 'audio_transcript'][0]
        headers = self.get_authorization_header()
        response = self._make_request(aud_ts_uri, headers=headers)
        # response.content contains audio transcript in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r')  # format content
        # Print the audio transcript
        for line in final_formatted_content:
            print(line)
#...
Let's run the following to fetch the transcripts:
$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> client.get_audio_transcript(meetings[0][0])
This is an example output. Your output should be in a similar format:
...
WEBVTT
1
00:00:16.239 --> 00:00:27.079
John: Hi, this is a test for audio transcripts.
2
00:00:28.079 --> 00:00:30.239
Wewake: I am going to say.
3
00:00:30.869 --> 00:00:32.329
Wewake: a few words.
4
00:00:32.629 --> 00:00:41.649
Wewake: I expect this to be recorded and available to me later.
5
00:00:44.779 --> 00:00:46.509
Wewake: That is it.
6
00:00:52.219 --> 00:00:54.909
Wewake: So simple.
7
00:00:56.319 --> 00:01:00.269
Wewake: We are done.
8
00:01:00.749 --> 00:01:01.629
Wewake: Bye.
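The output above is WebVTT text: a cue number, a timing line, and a "Speaker: words" line per cue. For summarizing or translating, structured data is usually easier to work with than raw text. The sketch below is one possible parser for this layout, not a full WebVTT implementation. It assumes cues are separated by blank lines, as in a standard WebVTT file, and the function name parse_transcript is hypothetical.

```python
import re

# Matches timing lines such as "00:00:16.239 --> 00:00:27.079"
CUE_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_transcript(vtt_text):
    """Parse WebVTT text into a list of dicts with start, end, speaker, and text."""
    cues = []
    lines = [line.strip() for line in vtt_text.splitlines()]
    for i, line in enumerate(lines):
        match = CUE_TIMING.fullmatch(line)
        if not match:
            continue
        # Collect the text lines after the timing line, up to the next blank line.
        text_lines = []
        for following in lines[i + 1:]:
            if not following:
                break
            text_lines.append(following)
        text = " ".join(text_lines)
        # Split "Speaker: words" on the first ": "; a cue without that prefix gets
        # speaker None. Text that itself contains ": " will be mis-split.
        speaker, sep, spoken = text.partition(": ")
        if not sep:
            speaker, spoken = None, text
        cues.append({"start": match.group(1), "end": match.group(2),
                     "speaker": speaker, "text": spoken})
    return cues


if __name__ == "__main__":
    sample = "WEBVTT\r\n\r\n1\r\n00:00:16.239 --> 00:00:27.079\r\nJohn: Hi, this is a test.\r\n"
    print(parse_transcript(sample))
```

Each returned dict can then be filtered by speaker, joined into plain text for a language model, or written out as CSV.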
Bonus: Closed Caption
We can also retrieve the closed captions from the recording_files. For this, closed captions need to be enabled during the meeting. Here is the code to fetch them:
#...
    # Get closed captions
    def get_closed_captions(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)
        cc_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type'] == 'closed_caption'][0]
        headers = self.get_authorization_header()
        response = self._make_request(cc_uri, headers=headers)
        # response.content contains closed caption in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r')  # format content
        # Print the closed captions
        for line in final_formatted_content:
            print(line)
#...
Run it:
$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> client.get_closed_captions(meetings[0][0])
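Both get_audio_transcript and get_closed_captions take element [0] of a filtered list, so they raise an IndexError when the meeting has no file of that type, for example when the transcript is still being processed or captions were not enabled. A more defensive lookup could look like the sketch below. The helper name find_download_url is hypothetical, and it relies only on the recording_files, recording_type and download_url fields already used in this article.

```python
def find_download_url(meeting_details, recording_type):
    """Return the first download_url whose recording_type matches, or None."""
    for recording_file in meeting_details.get("recording_files", []):
        if recording_file.get("recording_type") == recording_type:
            return recording_file.get("download_url")
    return None


if __name__ == "__main__":
    details = {"recording_files": [
        {"recording_type": "audio_only", "download_url": "https://example.com/audio"},
    ]}
    # No transcript yet, so this prints None instead of raising IndexError.
    print(find_download_url(details, "audio_transcript"))
```

Callers can then decide whether a missing file means "try again later" or "this meeting has no captions".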
Complete Code for the Zoom Client
"""
MyZoomSDKClient
Simple Zoom SDK to get access_token, recordings, transcripts, and closed captions
for a Zoom meeting (with cloud recording). Requires client_id, client_secret and account_id.
"""
import requests


class MyZoomSDKClient:
    # Initialize the client with client_id, client_secret, and account_id
    def __init__(self, client_id, client_secret, account_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.account_id = account_id
        self.access_token = None
        self.meetings = {}

    # Generate a new access token only if one does not exist, or if the user explicitly asks to generate
    # (using 'force=True' means generating a new token)
    def get_access_token(self, force=False):
        if self.access_token and not force:
            return self.access_token
        access_token_uri = "https://zoom.us/oauth/token"
        data = {
            "grant_type": "account_credentials",
            "account_id": self.account_id,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
        response = self._make_request(access_token_uri, data=data, method='POST')
        # Store the token so that later calls reuse it instead of requesting a new one every time
        self.access_token = response.json()["access_token"]
        return self.access_token

    # Generate authorization header using access_token
    def get_authorization_header(self):
        return {"Authorization": f"Bearer {self.get_access_token()}"}

    # List id and topic of all my meetings
    def list_my_meetings(self):
        list_meetings_uri = "https://api.zoom.us/v2/users/me/recordings"
        headers = self.get_authorization_header()
        response = self._make_request(list_meetings_uri, headers=headers)
        return [(x['id'], x['topic']) for x in response.json()['meetings']]

    # Get meeting details by ID (responses are cached in self.meetings)
    def get_meeting_details(self, meeting_id):
        if meeting_id in self.meetings:
            return self.meetings[meeting_id]
        meeting_details_url = f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings"
        headers = self.get_authorization_header()
        self.meetings[meeting_id] = self._make_request(meeting_details_url, headers=headers).json()
        return self.meetings[meeting_id]

    # Get audio transcript for a given meeting
    def get_audio_transcript(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)
        aud_ts_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type'] == 'audio_transcript'][0]
        headers = self.get_authorization_header()
        response = self._make_request(aud_ts_uri, headers=headers)
        # response.content contains audio transcript in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r')  # format content
        # Print the audio transcript
        for line in final_formatted_content:
            print(line)

    # Get closed captions
    def get_closed_captions(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)
        cc_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type'] == 'closed_caption'][0]
        headers = self.get_authorization_header()
        response = self._make_request(cc_uri, headers=headers)
        # response.content contains closed caption in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r')  # format content
        # Print the closed captions
        for line in final_formatted_content:
            print(line)

    # Utility to call any REST API; raises if the response status is not 2xx
    def _make_request(self, uri, headers=None, method='GET', data=None):
        try:
            if method == 'POST':
                response = requests.post(uri, headers=headers, data=data)
            else:
                response = requests.get(uri, headers=headers)
            if response.status_code // 100 != 2:
                raise Exception('Invalid response from Zoom API. Please try again later.')
            return response
        except Exception as e:
            raise Exception('Invalid response from Zoom API. Please try again later. Error Message: ' + str(e))
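The interactive sessions in this article paste the credentials straight into the Python prompt. For a script that runs unattended, reading them from environment variables is a common alternative. The sketch below assumes three hypothetical variable names (ZOOM_CLIENT_ID, ZOOM_CLIENT_SECRET, ZOOM_ACCOUNT_ID); neither Zoom nor the client above defines them.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret, account_id) read from environment variables.

    The variable names are hypothetical; pick whatever fits your deployment.
    """
    names = ("ZOOM_CLIENT_ID", "ZOOM_CLIENT_SECRET", "ZOOM_ACCOUNT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


if __name__ == "__main__":
    demo_env = {"ZOOM_CLIENT_ID": "id", "ZOOM_CLIENT_SECRET": "secret", "ZOOM_ACCOUNT_ID": "account"}
    print(load_credentials(demo_env))
```

The tuple is ordered to match the constructor, so the client can be created with MyZoomSDKClient(*load_credentials()).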
Limitations of the Zoom Transcript API
You might have noticed how we made some specific choices when we created the Zoom App to get transcription. This is because the native Zoom APIs have several limitations:
1. You need to be on a Paid Plan
The Zoom account hosting the meeting must be on the Pro, Business, or Enterprise tier. The free Basic plan will not work.
2. You need to enable Zoom Cloud Recording
Transcripts are only produced if the user records their meeting using Zoom Cloud Recording (not Zoom local recording). Zoom also has live captions that appear without recording the meeting, but it does not produce a transcript file after the call unless the meeting is being recorded.
3. Only the Meeting Host can access the transcript
The recorded meeting is only stored in the host's account, so other users cannot access the meeting and transcript details.
4. Meeting Host must enable recording settings for the transcript to be generated
The audio transcript feature in Zoom is disabled by default. You must enable this setting in Zoom for transcription to be produced.
5. Transcripts are only in English
English is the only language supported by Zoom’s transcript feature right now.
6. Transcripts can take a long time to become available
After the meeting has ended, it typically takes about 2 times the duration of the recorded meeting for all the recordings to be available. For example, the transcript of a 30 minute long meeting will be available 1 hour after the meeting is done. Depending on the load on Zoom's servers, processing time can take up to 24 hours.
7. Transcripts are not available in real-time
Zoom Cloud transcripts are only available after the meeting is done, so you won’t be able to get the transcription in real-time.
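Limitations 6 and 7 mean that code calling the API right after a meeting should expect the transcript to be missing for a while. One way to handle that is to poll with an increasing delay, as in the sketch below. The fetch_details callable is hypothetical: it is assumed to return a fresh Get Recording response as a dict on every call. Note that get_meeting_details in this article caches its result in self.meetings, so it would need to bypass that cache to be usable here.

```python
import time


def wait_for_transcript(fetch_details, attempts=10, first_delay=30.0,
                        max_delay=600.0, sleep=time.sleep):
    """Poll until an audio_transcript file appears; return its download_url or None."""
    delay = first_delay
    for attempt in range(attempts):
        details = fetch_details()
        for recording_file in details.get("recording_files", []):
            if recording_file.get("recording_type") == "audio_transcript":
                return recording_file.get("download_url")
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
    return None


if __name__ == "__main__":
    ready = {"recording_files": [{"recording_type": "audio_transcript",
                                  "download_url": "https://example.com/transcript"}]}
    print(wait_for_transcript(lambda: ready))
```

The default delays are arbitrary. Given that processing can take up to 24 hours, a long-running job or the recording.transcript_completed webhook is likely a better fit than a tight polling loop.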
Workarounds
To work around the above limitations, there are a few options:
Workaround 1: Use the Zoom RTMP Live-Streaming API
Zoom allows live streaming a meeting to custom apps using the Real-Time Messaging Protocol (RTMP). This allows you to use a custom app to fetch the audio of the meeting. This audio can then be piped to any 3rd-party transcription service to get real-time transcripts.
Advantages:
- You get real-time transcripts.
- You can get transcripts in any language.
Disadvantages:
- The setup is more complex. You need to set up a live-streaming server.
- You also need to integrate with a 3rd-party service for transcription, which can add to the cost and complexity.
- This also requires a Zoom paid plan.
- This also requires you to be the host.
Workaround 2: Build a Zoom Meeting Bot
Zoom allows meeting bots to access the video and audio contents of the meetings.
Advantages:
- You can get real-time transcripts.
- You don't need a paid Zoom plan.
- You don't need to be the host to access transcriptions.
Disadvantages:
- The meeting bot shows up as a participant in the call, although this is now seen as acceptable by many.
- Developing a meeting bot is quite complex since Zoom does not provide an API to call to build one.
- There is a scale problem. Meeting bots require a separate instance of a Zoom client for each meeting. As the number of meetings grows, this can become extremely difficult to manage, and infrastructure costs can balloon. It typically takes 6-12 months to build a stable meeting bot in its most basic form.
The major disadvantages here can be overcome by using an out-of-the-box API provider for meeting bots like Recall.ai. Recall.ai handles the running of the VMs. All the user needs to do is call an API endpoint to launch a bot. Zoom released a Meeting Bot Starter Kit in partnership with Recall.ai that makes it much easier to get started.
Conclusion
We looked at how to programmatically access Zoom transcripts using the Zoom API, several limitations of the Zoom API, and workarounds you can use to overcome those limitations.