There are several ways to programmatically get recordings from Zoom, each with different levels of effort, cost, and maintenance. In this guide, I’ll walk through one of the simplest options: using an API to add an external bot to a Zoom meeting to get MP4 video and MP3 audio recordings.

If you’re interested in getting transcripts from Zoom, check out our blog on how to get transcripts from Zoom or, for a full exploration of the options to get transcripts, see our blog on all of the APIs to get transcripts from Zoom.

For a full exploration of the options to get recordings from Zoom, you can jump to the appendix.

Architecture overview

[Figure: architecture diagram]

The dashed arrows in the architecture diagram are things you might choose to do as logical next steps.

This blog outlines a minimal backend prototype showing how to get Zoom recordings.

A sneak peek at the end result

https://www.loom.com/share/cd2c1024fd894463be5a2e5890603904?sid=8d627520-852e-42f1-90a3-37165d1b3ae3

The demo shared above is a more full-featured app that includes a bot joining a Zoom meeting, streaming transcripts in real time, displaying the video recording after the call, and providing a link to download the video. If you want to take a look at the code, you can check out my sample repo which covers both recordings, as mentioned in this blog post, and transcripts, as mentioned in our how to get transcripts from Zoom blog post. In this blog I will go over the backend necessary to power an app like this, but will not be covering the frontend, transcripts, or database.

Prerequisites

Before getting started, make sure you have the following:

A Recall.ai API key (which you can generate for free and store in your .env file)
A publicly accessible URL to receive webhooks (e.g. via ngrok during development)
A basic backend server capable of:
Sending HTTP POST requests (to start the bot)
Receiving HTTP POST requests (to handle webhooks)
Performing authenticated fetch() requests to Recall.ai’s API to retrieve media

Step 1: Setting up your backend to launch bots and receive meeting data

In order to get meeting data, you first must send a Zoom bot into the meeting. You’ll need to pass a Zoom meeting link to the backend (startRecall.ts). In the backend, a unique externalId is generated to tie the session together. The backend then passes the Zoom meeting link to the Create Bot endpoint to launch a bot.

Here’s an example of how to call the Create Bot endpoint with a POST request from a Node.js backend using fetch():

//@title pages/api/startRecall.ts
const botResp = await fetch(`https://${RECALL_REGION}.recall.ai/api/v1/bot`, {
  method: 'POST',
  headers: {
    authorization: process.env.RECALL_API_KEY ?? '',
    // The body is JSON, so declare it as such
    'content-type': 'application/json'
  },
  body: JSON.stringify({
    meeting_url : zoomLink,
    external_id : externalId,
    metadata    : { external_id: externalId },
    webhook_url : 'https://your-app/api/webhook',
    recording_config: {
      video_mixed_mp4 : {},
      audio_mixed_mp3 : {}
    }
  })
})

As you can see, when calling this endpoint, you’ll need to provide:

Meeting URL: The meeting the bot will join
Webhook URL: Which you’ll use to subscribe to events
Recording configurations: The recording types you want (video, audio, mixed, individual, etc.) and when you want them (post-meeting or real-time)

Once the bot is created, store the bot ID and external ID from the metadata to make it easier to retrieve recordings later. You’ll also want to copy the webhook URL which you’ll use to subscribe to events later.
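To make the request easier to test without calling the API, the payload can be assembled by a small builder. This is only a minimal sketch: `buildCreateBotRequest` and `CreateBotRequest` are hypothetical names, and the field names simply mirror the example above rather than an authoritative schema.

```typescript
// Hypothetical helper: assembles the Create Bot request shown above without sending it.
interface CreateBotRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildCreateBotRequest(
  region: string,      // e.g. 'us-east-1'
  apiKey: string,
  zoomLink: string,
  externalId: string,  // your own ID that ties the session together
  webhookUrl: string
): CreateBotRequest {
  // Reject obviously malformed links before spending an API call on them
  if (!/^https:\/\//.test(zoomLink)) {
    throw new Error('zoomLink must be an https URL')
  }
  return {
    url: `https://${region}.recall.ai/api/v1/bot`,
    method: 'POST',
    headers: { authorization: apiKey, 'content-type': 'application/json' },
    body: JSON.stringify({
      meeting_url: zoomLink,
      external_id: externalId,
      metadata: { external_id: externalId },
      webhook_url: webhookUrl,
      recording_config: { video_mixed_mp4: {}, audio_mixed_mp3: {} }
    })
  }
}
```

You would then pass `url`, `method`, `headers`, and `body` to fetch() and persist the bot ID from the response alongside `externalId`.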

To receive the necessary webhooks, you’ll also need to set up a webhook listener for the events you specified in the bot config above. You'll handle the webhook events in a webhook.ts file, using conditionals to check for each event type you're interested in as shown below:

//@title pages/api/webhook.ts
import type { NextApiRequest, NextApiResponse } from 'next'
import { prisma } from '../../lib/prisma'

// Disable Next.js body parsing so the raw request body can be read
export const config = { api: { bodyParser: false } }

export default async function handler (
  req: NextApiRequest,
  res: NextApiResponse
) {
  // Collect the raw body from the request stream
  const chunks: Uint8Array[] = []
  for await (const c of req) chunks.push(c)
  const raw    = Buffer.concat(chunks)
  const rawStr = raw.toString('utf8')

  let ev: any
  try { ev = JSON.parse(rawStr) }
  catch { res.status(400).end('Invalid JSON'); return }

  // Handle post-call recording metadata
  if (ev.recording?.id) {
    const externalId = ev.external_id
    const meeting = await prisma.meeting.findUnique({ where: { externalId } })
    if (!meeting) return res.status(404).end('Meeting not found')

    // Store recording metadata (e.g. to retrieve MP4/MP3 later via download_url)
    await prisma.meeting.update({
      where: { id: meeting.id },
      data : {
        recordingId: ev.recording.id,
        videoUrl   : ev.video_url ?? null
      }
    })
    return res.status(200).end('Post-call stored')
  }
  res.status(200).end('No action')
}

Once this endpoint is running, you’ll start receiving webhooks for both real-time updates and final media notifications.

To subscribe to the status change webhook events, copy the webhook URL that you saved previously and navigate to the Recall.ai dashboard. From there, go to Webhooks in the sidebar and then select + Add Endpoint. Paste your webhook URL in the Endpoint URL field. Scroll down to Subscribe to Events, select bot.status_change, and click Create. By subscribing to the bot.status_change events, you’ll be notified when the bot is recording, when it’s processing the recording, and when it’s finished and the data is ready to be fetched.
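Once those events arrive, your webhook handler has to decide what each one means for your app. Here is a rough sketch of that decision as a pure function. The payload shape (`event`, `data.bot_id`, `data.status.code`) and the status codes other than `done` are my assumptions for illustration, so check them against the Recall.ai docs before relying on them. `nextAction` and `NextAction` are hypothetical names.

```typescript
// Assumed payload shape for a bot.status_change webhook; verify against the Recall.ai docs.
interface StatusChangeEvent {
  event: string
  data: { bot_id: string; status: { code: string } }
}

type NextAction = 'mark_recording' | 'mark_processing' | 'fetch_media' | 'ignore'

// Type guard: checks the fields we rely on before touching them.
function isStatusChange(ev: unknown): ev is StatusChangeEvent {
  if (typeof ev !== 'object' || ev === null) return false
  const e = ev as { event?: unknown; data?: { bot_id?: unknown; status?: { code?: unknown } } }
  return e.event === 'bot.status_change'
    && typeof e.data?.bot_id === 'string'
    && typeof e.data?.status?.code === 'string'
}

// Maps a status code to what the backend should do next.
// 'in_call_recording' and 'call_ended' are assumed code names.
function nextAction(ev: unknown): NextAction {
  if (!isStatusChange(ev)) return 'ignore'
  switch (ev.data.status.code) {
    case 'in_call_recording': return 'mark_recording'
    case 'call_ended':        return 'mark_processing'
    case 'done':              return 'fetch_media'
    default:                  return 'ignore'
  }
}
```

Keeping this mapping separate from the HTTP handler means you can unit test it with plain objects and only call the media endpoints when it returns `fetch_media`.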

[Figure: webhook flow]

Step 2: Receiving recordings post-call

[Figure: architecture diagram (step 2)]

Receiving video recordings post-call

Though the API supports streaming audio and video in real time, video and audio recordings (MP4/MP3) are available only after the meeting ends. Once you receive the bot.done Status Change event, you can retrieve the video recording using the Retrieve Video Mixed endpoint by specifying the recording_id.

Here is an example of a call to the Retrieve Video Mixed endpoint:

const videoUrl = `https://us-east-1.recall.ai/api/v1/video_mixed?recording_id=RECORDING_ID_HERE`

const res = await fetch(videoUrl, {
  headers: {
    'Authorization': `Bearer ${process.env.RECALL_API_KEY ?? ''}`,
    'Accept': 'application/json'
  }
})

if (!res.ok) {
  throw new Error(`Failed to fetch video: ${res.statusText}`)
}

const data = await res.json()
const downloadUrl = data.results?.[0]?.data?.download_url

As always, make sure to specify your region in the request (us-east-1 is just an example).

Sample response (a single video object; the code above reads it from results[0]):

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "recording": {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "metadata": {
      "additionalProp": "string"
    }
  },
  "created_at": "2025-08-01T04:05:14.009Z",
  "status": {
    "code": "processing",
    "sub_code": "string",
    "updated_at": "2025-08-01T04:05:14.009Z"
  },
  "metadata": {
    "additionalProp": "string"
  },
  "data": {
    "download_url": "string"
  },
  "format": "mp4"
}

You can use the download_url to access the video after the call.
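Since the sample above shows a status of `processing`, it’s worth guarding against reading a URL before the file is ready. The sketch below is one way to do that. It accepts either a single object or a `results` list, and it assumes `done` is the status code that signals a finished file (that value is my assumption, not something shown in the sample). `extractDownloadUrl` is a hypothetical helper name.

```typescript
// Shape of one media object, following the sample response above.
interface MediaObject {
  status?: { code?: string }
  data?: { download_url?: string | null }
}

type MediaResponse = MediaObject | { results: MediaObject[] }

// True when the payload is a paginated list rather than a single object.
function isList(p: MediaResponse): p is { results: MediaObject[] } {
  return Array.isArray((p as { results?: unknown }).results)
}

// Returns a download URL only when the media has finished processing.
function extractDownloadUrl(payload: MediaResponse): string | null {
  const item: MediaObject | undefined = isList(payload) ? payload.results[0] : payload
  if (!item) return null
  // 'done' as the ready status is an assumption; the sample above shows 'processing'.
  if (item.status?.code !== 'done') return null
  return item.data?.download_url ?? null
}
```

If this returns null, you can retry later or wait for the next webhook rather than storing an empty URL.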

Receiving audio recordings post-call

To retrieve audio, follow the same steps as above, but call the Retrieve Audio Mixed endpoint instead of the Retrieve Video Mixed endpoint.
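Because the two calls differ only in the path, a small URL builder can cover both. This is a sketch under the assumption that the audio endpoint lives at `/api/v1/audio_mixed` and accepts the same `recording_id` filter as the video example above. `mixedMediaUrl` is a hypothetical helper name.

```typescript
type MixedMedia = 'video_mixed' | 'audio_mixed'

// Builds the request URL for mixed video or audio, filtered by recording ID.
function mixedMediaUrl(region: string, kind: MixedMedia, recordingId: string): string {
  // Encode the ID so unexpected characters cannot break the query string
  const query = `recording_id=${encodeURIComponent(recordingId)}`
  return `https://${region}.recall.ai/api/v1/${kind}?${query}`
}
```

Calling `mixedMediaUrl('us-east-1', 'audio_mixed', recordingId)` gives you the URL to pass to the same authenticated fetch() shown for video.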

For a full list of available recording configurations (e.g., video_separate, audio_separate), see the appendix.

Bonus: Receiving real-time video and audio streams

You can also get real-time video and audio streams. Real-time streams are made available via RTMP for observer room use cases, or via WebSocket for automated processing, such as facial recognition or deepfake detection.

Check out the appendix to see each option in detail.

Why did I choose this implementation?

As mentioned earlier, this is one of many ways to get Zoom recordings. So why did I choose this method?

No media infrastructure required: Recall.ai handles ingestion, recording, and processing, so you don’t need to manage WebRTC or Zoom SDKs.

Real-time streaming available: Get raw audio and video frames via WebSocket or webhook with minimal setup.

Flexible output formats: Supports mixed/per-participant recordings, raw audio streams, and RTMP.

Participant metadata available: Always includes speaker names, roles, and join/leave events.

Cross-platform support: Works with Zoom, Google Meet, Microsoft Teams, and other platforms using the same API (see the list of all supported platforms).

Minimal backend logic: Your app only needs to manage bot sessions and store metadata.

Why you might choose another option

This option uses Recall.ai, a third-party API, to offload the majority of the engineering work. It might not be the ideal method if you:

Are cost conscious: A third-party API like Recall.ai has additional costs based on usage.
Don’t want vendor reliance: Third-party APIs can be a sticking point for some companies.
Don’t want a bot in the meeting: If you prefer not to have a bot in the meeting, you might want to consider:
The Desktop Recording SDK (API Docs), which is the most discreet option for capturing in-person or on-screen conversations across any platform.
Meeting Direct Connect, which connects directly via Zoom’s Real-Time Media Streams API, without showing a bot in the meeting. Setup requires you to create a Zoom Marketplace app, obtain an RTMS-enabled webhook, and have your users OAuth the Zoom app.

Other options to get recordings from Zoom

Now that we’ve walked through the simplest Recall.ai-based API option in detail, you can find all the other options to get recordings from Zoom listed in the appendix.

Conclusion

By this point, you’ve explored the primary options for capturing recordings from Zoom and you’ve walked through how to get recordings from Zoom using Recall.ai’s Zoom Meeting Bot API. With a minimal backend and no media infrastructure, you now have a working prototype that gets post-call recordings from Zoom. Hopefully this has provided a jumping-off point for whatever you’re building. If you have any questions, we’re always happy to help, and if you want to build anything we’ve gone over in the tutorial, you can sign up and try Recall.ai for free.

Methods to get recordings from Zoom

While Recall.ai is the best fit if you want flexibility, speed, and reliability, here are all the other viable options you might consider.

Alternative 1: Zoom Cloud Recording API

Why you’d use this

You want a Zoom native way to fetch recordings without building bots or custom infrastructure. You’re comfortable waiting until well after the meeting ends.

How it works

Meeting hosts must enable Cloud Recording in Zoom.
The host presses the record button during each meeting.
After a call ends and Zoom processes it, the recording.completed webhook fires.
Your backend fetches MP4/MP3 files via Zoom’s REST API using the recordingId.
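To give a feel for the last two steps, here is a rough sketch of pulling download URLs out of a recording.completed event. The interfaces cover only the subset of the payload I’m assuming here (`payload.object.recording_files` with `file_type` and `download_url`), so treat the field names as something to confirm in Zoom’s webhook reference. `pickRecordingUrls` is a hypothetical helper name.

```typescript
// Assumed subset of Zoom's recording.completed payload; check field names against Zoom's docs.
interface ZoomRecordingFile {
  id: string
  file_type: string      // e.g. 'MP4'; audio-only files are, to my knowledge, tagged 'M4A'
  download_url: string
}

interface ZoomRecordingCompleted {
  event: string
  payload: { object: { uuid: string; recording_files: ZoomRecordingFile[] } }
}

// Returns the download URLs for the wanted file types, or [] for any other event.
function pickRecordingUrls(
  ev: ZoomRecordingCompleted,
  wanted: string[] = ['MP4', 'M4A']
): string[] {
  if (ev.event !== 'recording.completed') return []
  return ev.payload.object.recording_files
    .filter(f => wanted.indexOf(f.file_type) !== -1)
    .map(f => f.download_url)
}
```

Downloading the files themselves still requires an authenticated request with the appropriate Zoom OAuth scopes, which this sketch leaves out.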

Benefits

Zoom native: This is a first-party Zoom integration. Because it’s managed by Zoom, you don’t need to worry about maintaining bots, browser automation, or custom media infrastructure. Zoom is responsible for ingesting, encoding, and storing the recordings, and you fetch the finished files.
No additional participant: Recordings are generated directly by Zoom, not by a bot joining the meeting. This makes it useful in cases where an in-meeting participant isn’t desirable or allowed.
Variety of meeting artifacts: Depending on account settings, recordings can include not just MP4/MP3 media, but also chat logs, participant lists, and other metadata that can be valuable for post-meeting workflows.

Drawbacks

Delayed, post-meeting availability: Recordings are only accessible after the meeting ends, and Zoom requires additional time to process them. There is no way to access live media through this method. Availability typically lags behind the meeting’s duration—for example, a 1-hour meeting may take another hour or more before the MP4 is ready.
Host setup and action required: Cloud Recording must be enabled at the account or user level, and the host (or a participant with recording privileges) has to remember to start the recording during the meeting. If you are not the host you cannot record the meeting.
Access restrictions: Only hosts or users with appropriate permissions (via Zoom OAuth scopes) can retrieve recordings, which limits usefulness in cross-organization contexts.
Paid plan requirement: Cloud Recording is only available on Zoom’s paid plans. This means that any end users who are on the free tier will not be able to access this functionality.
OAuth + app review: Users have to connect their Zoom accounts via OAuth, which can add onboarding friction. In addition, you’ll need to publish a Zoom app and go through Zoom’s review process before using it in production. The review process typically takes around 4 weeks.

Who should use this

Teams building for Zoom admins/hosts who already use cloud recording and are okay with async delivery.

Alternative 2: Custom live streaming (RTMP)

Why you’d use this

You need to view or record a meeting without bots, using Zoom’s built-in streaming features.

How it works

Configure live streaming: The host enables Custom Live Streaming in Zoom and points it to your RTMP endpoint. This can be done in account settings or on a per-meeting basis.
The user needs to start the stream during the meeting: By default, the host needs to manually start live streaming from the Zoom UI once the meeting is running. Until the stream is started, no media will be sent. Alternatively, live streaming can be pre-configured in Zoom settings or triggered via the API to avoid the manual step.
Push during the meeting: Once the stream starts, Zoom sends a single program feed (mixed audio and video) to your RTMP server (e.g., Nginx-RTMP, Cloudflare, Mux).
Ingest the stream: Your server receives the feed in real time and can either transcode it, pipe it into another service, or rebroadcast it.
Record or process: Store the RTMP feed or process it live.
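For the final step, one common approach is to let ffmpeg copy the feed to disk without re-encoding. The sketch below only builds the argument list. It assumes ffmpeg is installed and that an RTMP server is already receiving Zoom’s push and exposing it at a URL you control. `ffmpegRecordArgs` is a hypothetical helper name.

```typescript
// Builds ffmpeg arguments that copy an RTMP feed to a file without re-encoding.
function ffmpegRecordArgs(rtmpUrl: string, outputPath: string): string[] {
  if (!/^rtmps?:\/\//.test(rtmpUrl)) {
    throw new Error('expected an rtmp:// or rtmps:// URL')
  }
  // -i: input URL, -c copy: keep the original audio/video codecs as-is
  return ['-i', rtmpUrl, '-c', 'copy', outputPath]
}
```

In a Node.js backend you could hand the result to child_process.spawn('ffmpeg', args) and write to an .flv file, since RTMP carries FLV-packaged media.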

Benefits

Continuous feed during the meeting: Zoom pushes audio and video to your RTMP server as the meeting happens, suitable for rebroadcasting or delayed-live monitoring.
Straightforward server-side recording: Once ingested, the stream can be recorded using standard RTMP tools, with no custom Zoom media pipeline required.
Ideal for meetings where bots aren’t appropriate: Since the stream comes directly from Zoom, nothing joins the meeting. This can be useful in cases where an in-meeting bot isn’t desired.

Drawbacks

Host action required: Someone must start the stream each time, unless pre-configured in meeting templates or account settings. This adds manual steps which are prone to human error.
Single mixed feed: RTMP provides one composite audio/video stream only. You can’t capture per-participant audio or video tracks for finer-grained analysis. You don’t get speaker names using RTMP so you’ll need to use machine diarization to separate speakers.
Host-controlled layout and quality: The stream’s layout (speaker view, gallery view), resolution, and bitrate are dictated by the host’s settings and Zoom account level. This limits flexibility if you need standardized outputs.
Paid account + settings required: The host must be on a paid Zoom plan, and the “Allow livestreaming of meetings” setting has to be enabled. If it’s disabled or locked at the group/account level, users will need admin support to turn it on. The good news is that this only needs to be done once per account.
Live streaming badge: Zoom displays a “live streaming” badge to all participants for compliance reasons. This cannot be removed and may cause some participants to feel uncomfortable.
High latency: RTMP typically has 10–30s of latency. This is fine for most use cases, but not suitable if you need near real-time audio.
Limited metadata: RTMP streams don’t include per-word timestamps or detailed meeting artifacts. If your product depends on timestamped transcripts (e.g., click-to-seek playback), you’ll need to build that layer yourself.

Who should use this

Best for rebroadcast use cases (such as observer rooms) or when you want an audio/video feed without running bots and do not need near real-time audio.

Alternative 3: Build your own bot

Why you’d use this

You need direct, per-participant access to Zoom’s underlying audio and video streams.

How it works

Join as a bot client: You build a custom Zoom client with the Meeting SDK (Windows, macOS, Linux). This “raw data bot” joins the meeting as a participant, but without UI.
Enable raw data mode: The SDK exposes special callbacks that provide raw audio (PCM) and video (YUV) frames.
Consume the data: In your client code you can encode the streams for storage, pipe them to your backend for processing, or render the data in your UI.
Manage distribution: Since you’re effectively shipping a Zoom-powered client, your app must handle packaging, deployment, and updates across platforms.
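To see why encoding is part of the job, it helps to estimate how large raw frames are. The sketch below assumes the video frames use a planar YUV 4:2:0 (I420) layout and that audio is uncompressed PCM. The exact pixel format, sample rate, and sample width depend on the SDK, so treat the inputs as placeholders.

```typescript
// Bytes in one raw frame, assuming I420: a full-size Y plane plus quarter-size U and V planes.
function i420FrameBytes(width: number, height: number): number {
  const chromaPlane = Math.ceil(width / 2) * Math.ceil(height / 2)
  return width * height + 2 * chromaPlane
}

// Uncompressed video bytes per second for one participant.
function rawVideoBytesPerSecond(width: number, height: number, fps: number): number {
  return i420FrameBytes(width, height) * fps
}

// Uncompressed PCM audio bytes per second.
function pcmBytesPerSecond(sampleRateHz: number, channels: number, bytesPerSample: number): number {
  return sampleRateHz * channels * bytesPerSample
}
```

Under these assumptions, a 640x360 frame is 345,600 bytes, so a single participant at 30 fps is roughly 10 MB per second before encoding.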

Benefits

Direct raw media: Unlike artifacts or RTMP feeds, you get the actual PCM/YUV frames, which means you can choose your own codecs, quality settings, or processing pipeline.
Per-participant streams: Audio and video are available per user instead of only as a mixed feed. This is critical for accurate transcription, speaker diarization, AI-driven analytics, or training data where you need to separate who said what.
Real-time processing: Since frames arrive live, you can run transcription, sentiment analysis, or computer vision models as the meeting happens rather than waiting for post-call artifacts.
Works on free Zoom plans: Unlike cloud recording or RTMP, bots can record meetings even if the host is on a free Zoom account.
Non-host recording: Bots can capture the meeting even if your user is not the host, as long as the bot is admitted into the meeting.
Low latency: Bots connect directly to the meeting, typically delivering media with 200–500 ms latency—fast enough for real-time analysis.
Consistent cross-platform UX: The bot-based recording experience is consistent across Zoom, Meet, Teams, and Webex (though you need separate implementations per platform).

Drawbacks

Client ownership: You must distribute and maintain a Zoom-based app across OSes. That means packaging, updates, and support which is a much heavier lift than calling an API.
High engineering overhead: Handling raw media requires you to manage encoding, synchronization, and storage, often with GPU/CPU optimizations. This is far more complex than downloading Zoom’s processed recordings.
Significant build effort: Building a stable, scalable recording bot can take a skilled engineering team a year or more. Each platform (Zoom, Meet, Teams, Webex) requires a separate implementation.
Ongoing maintenance burden: Meeting platforms frequently update their SDKs and behaviors. Keeping bots reliable at scale often requires several full-time engineers to monitor, patch, and operate the infrastructure.
Operating costs: Each bot runs as a full Zoom client instance in your infrastructure, which can be expensive to host until you reach economies of scale.
Consent UX burden: Unlike Zoom’s built-in recording, raw data capture isn’t visible to participants by default. You’ll need to design your own notifications and consent flows to stay compliant with privacy laws.

Who should use this

This option is best for teams who need per-participant audio and video and are comfortable managing media encoding and compliance.

If you’re interested in building your own bot, check out the blog we wrote on how to build a Zoom bot.

Alternative 4: Local desktop recording

Why you’d use this

You want a local, discreet form factor that captures Zoom calls directly on a user’s device without relying on bots or Zoom’s APIs.

How it works

Capture video: The app records the Zoom meeting window (or screen region) directly.
Capture audio: System audio is captured via OS APIs (e.g., WASAPI on Windows, ScreenCaptureKit on macOS) or loopback drivers (e.g., BlackHole, VB-Cable).
Capture speaker names and metadata: Scrape speaker names and metadata from the UI, or read them through accessibility APIs if available.
Record or stream: The app either saves recordings locally (MP4, WAV, etc.) or streams the captured media to a backend for processing and storage.
Run locally: Everything happens on the user’s device, with no external Zoom integration or bot participant involved.

Benefits

No Zoom admin setup: You don’t need to enable cloud recording or configure Zoom APIs, so it works even in organizations with restrictive Zoom settings.
User-controlled: Can be distributed as a lightweight companion app that end users install and run. This avoids dependency on Zoom Marketplace approvals or admin permissions.

Drawbacks

Installation + permissions: Requires users to install audio loopback drivers or grant OS-level screen/audio capture permissions. These can be blocked by enterprise IT policies or require elevated privileges, which adds friction to adoption.
Reliability + quality risks: Recordings depend on the user’s environment. If they close the Zoom window, change displays, or misconfigure audio, the capture may fail or be incomplete. Video resolution is tied to the size of the captured window, not the underlying stream.
Compliance burden: Unlike Zoom’s built-in recording, there’s no automatic “recording” banner visible to other participants. You must implement your own consent notifications to stay compliant with local privacy laws.
Cross-platform & environment variability: Desktop apps run in highly varied end-user environments due to the variety of different OSes (macOS, Windows, Linux), versions, hardware (CPUs/GPUs), drivers, and enterprise security tools. Supporting all of this increases the surface area for bugs and raises the cost of testing, packaging, auto-updates, and support.
Battery + performance impact: Continuous video and audio capture can drain laptop batteries quickly, especially on long meetings. High CPU and GPU usage for real-time encoding may cause fans to spin up, apps to slow down, or the device to overheat. This can degrade both recording quality and the user’s overall experience. While it’s possible to optimize by using efficient codecs, leaning on built-in hardware for video encoding and by adjusting recording quality based on system load, those optimizations are tough engineering problems to solve reliably across all platforms.

Who should use this

This option is best for teams that need a non-bot, local form factor. It’s useful when working with organizations that don’t want meeting bots or API dependencies but are comfortable with users installing a desktop companion app. These teams are willing to distribute and maintain a desktop app, and are comfortable handling local capture quirks like audio drivers and OS permissions.

If you like the desktop app form factor but want to skip the complexity of audio drivers, screen capture APIs, and cross-OS packaging, Recall.ai offers a Desktop Recording SDK that provides transcripts with speaker names, audio, and video directly from the user’s device. It records meetings reliably without requiring bots or third-party drivers, making it a faster way to productionize this approach.

Recording formats accessible using the Meeting Bot API

Recall.ai delivers recordings in real time or after the call, depending on your bot configuration. Delivery happens through webhooks, WebSocket, or by polling the API.
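If you go the polling route, you’ll want some backoff so you aren’t hammering the API while a recording is still processing. Here is a minimal sketch. `backoffDelays` and `pollUntilReady` are hypothetical helpers, and `check` stands in for whatever call you use to look up the recording (returning null until it’s ready).

```typescript
// Exponential backoff schedule: base, 2x, 4x, ... capped at maxMs.
function backoffDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = []
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * Math.pow(2, i), maxMs))
  }
  return delays
}

// Polls `check` until it returns a non-null value or the schedule runs out.
// `check` and `sleep` are injected so the loop can be tested without network or timers.
async function pollUntilReady<T>(
  check: () => Promise<T | null>,
  sleep: (ms: number) => Promise<void>,
  delays: number[]
): Promise<T | null> {
  for (const ms of delays) {
    const result = await check()
    if (result !== null) return result
    await sleep(ms)
  }
  // One final attempt after the last wait
  return check()
}
```

In production, `sleep` can be a setTimeout wrapped in a Promise. Webhooks are still the better default, with polling as a fallback for missed events.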

Recordings

Format	Timing	Access	Notes	API Docs URL
Mixed `MP4` / `MP3`	Post-call	`/video_mixed`, `/audio_mixed` APIs	Combined audio/video of all participants	https://docs.recall.ai/reference/video_mixed_retrieve
Speaker-separated (Audio/Video)	Real-time or Post-call	`/video_separate`, `/audio_separate` APIs or WebSocket	Per-participant media; use real-time for analysis tools	https://docs.recall.ai/docs/how-to-get-separate-videos-per-participant-async https://docs.recall.ai/docs/how-to-get-separate-audio-per-participant-async
RTMP (Live)	Real-time	RTMP stream or `video_mixed_flv.data` via WebSocket	Ideal for live rebroadcast or human viewing	https://docs.recall.ai/docs/stream-real-time-video-rtmp
Raw Audio / Video Frames	Real-time	WebSocket	Raw 16 kHz PCM audio and 360p PNG video frames (~2 fps)	https://docs.recall.ai/docs/real-time-audio-protocol https://docs.recall.ai/docs/real-time-video
