There are several ways to programmatically get recordings and transcripts from Zoom, each with different levels of effort, cost, and maintenance. In this guide, I’ll walk through one of the simplest options: using an API to add an external bot to a Zoom meeting to capture meeting data.
What you’ll get:
- Real-time transcripts via webhook events
- Structured transcripts via the API endpoint after the meeting ends
- `MP4` video and `MP3` audio recordings accessible via URL
For a full exploration of the options to get transcripts, see our blog on all of the APIs to get transcripts from Zoom.
Architecture overview
The dashed arrows in the architecture diagram represent optional next steps you might choose to build.
This blog outlines a minimal backend prototype showing how to get Zoom transcripts and recordings.
A sneak peek at the end result
The demo shared above is a more full-featured app that includes a bot joining a Zoom meeting, streaming transcripts in real time, displaying the video recording after the call, and providing a link to download the video. If you want to take a look at the code, you can check out my sample repo. In this blog I will go over the backend necessary to power an app like this, but will not be covering the frontend and database.
Prerequisites
Before getting started, make sure you have the following:
- A clone of the GitHub repo
- A Recall.ai API key (which you can generate for free and store in your `.env` file)
- A publicly accessible URL to receive webhooks (e.g. via ngrok during development)
- A basic backend server capable of:
  - Sending HTTP POST requests (to start the bot)
  - Receiving HTTP POST requests (to handle webhooks)
  - Performing authenticated `fetch()` requests to Recall.ai’s API to retrieve media
Step 1: Setting up your backend to launch bots and receive meeting data
Before we get started, make sure you've cloned the repo. Once that's done, to get meeting data, you first must send a Zoom bot into the meeting. You’ll need to pass a Zoom meeting link to the backend (`startRecall.ts`). In the backend, a unique `externalId` is generated to tie the session together. The backend then passes the Zoom meeting link to the Create Bot endpoint to launch a bot.
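One simple way to generate that `externalId` (a sketch; any unique string works):
import { randomUUID } from 'crypto'

// Any unique string works; a UUID keeps each session distinct.
const externalId = randomUUID()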
Here’s an example of how to call the Create Bot endpoint with a `POST` request from a Node.js backend using `fetch()`:
//@title pages/api/startRecall.ts
const botResp = await fetch(`https://${RECALL_REGION}.recall.ai/api/v1/bot`, {
  method: 'POST',
  headers: {
    authorization: process.env.RECALL_API_KEY ?? '', // your API key from .env
    'content-type': 'application/json'               // required for a JSON body
  },
  body: JSON.stringify({
    meeting_url: zoomLink,
    external_id: externalId,
    metadata: { external_id: externalId },
    webhook_url: 'https://your-app/api/webhook',
    recording_config: {
      video_mixed_mp4: {},
      audio_mixed_mp3: {},
      transcript: { provider: { meeting_captions: {} } },
      realtime_endpoints: [
        {
          type: 'webhook',
          url: 'https://your-app/api/webhook',
          events: ['transcript.data', 'transcript.partial_data']
        }
      ]
    }
  })
})
As you can see, when calling this endpoint, you’ll need to provide:
- Meeting URL: The meeting the bot will join
- Webhook URL: Where real-time transcripts will be sent
- Transcript configurations: The transcription provider you want and the latency you need
- Recording configurations: The recording types you want (video, audio, mixed, individual, etc.) and when you want them (post-meeting or real-time)
For this tutorial, I used `meeting_captions` as the transcription provider since it’s free, but you could swap in a different transcription provider, as sketched below.
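Swapping providers is just a change to the `transcript` block of `recording_config`. The provider key below is a placeholder for illustration; check Recall.ai's transcription docs for the exact key your chosen provider uses:
// Illustrative only: 'assembly_ai_streaming' is a placeholder provider key.
const transcriptConfig = {
  transcript: {
    provider: { assembly_ai_streaming: {} }
  }
}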
Once the bot is created, store the bot ID and external ID from the metadata to make it easier to retrieve transcripts and recordings later. You’ll also want to copy the webhook URL which you’ll use to subscribe to events later.
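A minimal sketch of persisting those IDs, assuming the Prisma schema from the appendix and the same `prisma` client used in `webhook.ts` below (`currentUserId` is a stand-in for however your app identifies users):
// Tie the Recall bot to our session record so webhooks can find it later.
const bot = await botResp.json()
await prisma.meeting.create({
  data: {
    externalId,             // generated above
    botId: bot.id,          // returned by the Create Bot endpoint
    userId: currentUserId   // stand-in: however your app identifies the user
  }
})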
To receive the necessary webhooks, you’ll also need to set up a webhook listener for the events you specified in the bot config above. You'll handle the webhook events in a `webhook.ts` file, using conditionals to check for each event type you're interested in as shown below:
//@title pages/api/webhook.ts
import type { NextApiRequest, NextApiResponse } from 'next'
import http from 'http'
import { prisma } from '../../lib/prisma'
export const config = { api: { bodyParser: false } }
export default async function handler (
req: NextApiRequest,
res: NextApiResponse
) {
const chunks: Uint8Array[] = []
for await (const c of req) chunks.push(c)
const raw = Buffer.concat(chunks)
const rawStr = raw.toString('utf8')
let ev: any
try { ev = JSON.parse(rawStr) }
catch { res.status(400).end('Invalid JSON'); return }
// Handle real-time finalized transcript events
if (ev.event === 'transcript.data') {
const words = ev.data?.data?.words ?? []
const externalId = ev.data?.bot?.metadata?.external_id
const text = words.map((w:any) => w.text).join(' ').trim()
if (!externalId || !text) return res.status(204).end()
const meeting = await prisma.meeting.findUnique({ where: { externalId } })
if (!meeting) return res.status(404).end('Meeting not found')
await prisma.transcript.create({
data: {
meetingId: meeting.id,
speaker: ev.data?.data?.participant?.name ?? 'Unknown speaker',
text,
timestamp: new Date(words[0]?.start_timestamp?.absolute ?? new Date())
}
})
// Optional: emit to WebSocket server for real-time UI updates
http.request(
{
host: 'localhost', port: 4000, path: '/send', method: 'POST',
headers: { 'content-type': 'application/json' }
},
r => r.resume()
).end(JSON.stringify({
externalId,
message: {
type: 'transcript',
payload: { text, speaker: ev.data?.data?.participant?.name }
}
}))
return res.status(200).end('Transcript saved')
}
// Handle post-call recording metadata for transcripts and recordings
if (ev.recording?.id) {
const externalId = ev.external_id
const meeting = await prisma.meeting.findUnique({ where: { externalId } })
if (!meeting) return res.status(404).end('Meeting not found')
// Store post-call transcript if available
if (ev.transcript) {
await prisma.transcript.create({
data: {
meetingId: meeting.id,
text: ev.transcript,
speaker: 'Unknown',
timestamp: new Date()
}
})
}
// Store recording metadata (e.g. to retrieve MP4/MP3 later via download_url)
await prisma.meeting.update({
where: { id: meeting.id },
data : {
recordingId: ev.recording.id,
videoUrl : ev.video_url ?? null
}
})
return res.status(200).end('Post-call stored')
}
res.status(200).end('No action')
}
Once this endpoint is running, you’ll start receiving webhooks for both real-time updates and final media notifications.
To subscribe to the status change webhook events, copy the webhook URL that you saved previously and navigate to the Recall.ai dashboard. From there, go to `Webhooks` in the sidebar and select `+ Add Endpoint`. Paste your webhook URL in the `Endpoint URL` field, scroll down to `Subscribe to Events`, select `bot.status_change`, and click `Create`. By subscribing to `bot.status_change` events, you’ll be notified when the bot is recording, when it’s processing the recording, and when it’s finished and the data is ready to be fetched.
Step 2: Receiving transcripts in real-time and post-call
There are two ways to receive transcripts: during the meeting (real-time) or after the meeting ends (post-call).
Getting the transcript in real-time
To capture transcripts in real-time, your backend listens for webhook events sent as raw JSON via HTTP POST.
While the meeting is in progress, Recall.ai emits `transcript.data` webhook events with finalized segments of speech. Here’s an example payload:
{
"event": "transcript.data",
"data": {
"bot": {
"metadata": {
"external_id": "meeting-abc123"
}
},
"data": {
"words": [
{
"text": "Welcome everyone",
"start_timestamp": {
"absolute": "2025-07-30T17:23:00.123Z"
},
"end_timestamp": {
"absolute": "2025-07-30T17:23:02.456Z"
}
}
],
"participant": {
"name": "Alice",
"platform": "Zoom"
}
}
}
}
If you're using a framework like Next.js or Express, be sure to disable the default body parser and manually parse the raw body to access the payload.
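For example, if you were on Express instead of the Next.js setup shown above, a minimal sketch might look like this (the route path and port are assumptions):
import express from 'express'

const app = express()

// Mount the raw body parser on the webhook route only, then parse manually.
app.post('/api/webhook', express.raw({ type: 'application/json' }), (req, res) => {
  const ev = JSON.parse(req.body.toString('utf8'))
  // ...handle ev.event exactly as in webhook.ts above
  res.status(200).end()
})

app.listen(3000)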
Each event includes:
- `external_id` to identify the meeting
- The transcript segment (`text`) within the `words` array in `data`
- Timestamps for each word (also in the `words` array)
- Speaker name and platform
While this guide doesn’t cover data storage or routing after you gather it, you can choose to log, process, or forward these messages to another system depending on your app’s goals.
You can also configure Recall.ai to send partial transcript results at lower latency. These arrive as `transcript.partial_data` events; they are not final and may be revised later. This guide doesn’t use `transcript.partial_data`, but a sketch of handling it follows below.
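If you did want to surface partials, a minimal sketch that would slot into the `webhook.ts` handler above (the rendering step is left as a comment):
// Low-latency previews: render them as provisional captions, then replace
// them when the matching finalized transcript.data event arrives.
if (ev.event === 'transcript.partial_data') {
  const words = ev.data?.data?.words ?? []
  const preview = words.map((w: any) => w.text).join(' ')
  // e.g. push `preview` to the UI flagged as partial; don't persist it.
  return res.status(200).end('Partial received')
}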
Real-time events are ideal for building real-time coaching features, AI agents that provide feedback during the meeting, or live notetaking products.
Getting the transcript post-call
After the call ends, a Bot Status Change event with the status `bot.done` signals that the full structured transcript is available for download via the API:
{
"event": "bot.done",
"data": {
"data": {
"code": string,
"sub_code": string | null,
"updated_at": string
},
"bot": {
"id": string,
"metadata": object
}
}
}
Note that this event only signals that the full structured transcript is ready to be retrieved; it does not include the transcript itself.
Take the bot ID from the `bot.done` webhook payload and retrieve the bot using the Retrieve Bot endpoint:
//@title Retrieve Bot Endpoint
const options = {
  method: 'GET',
  headers: {
    accept: 'application/json',
    authorization: process.env.RECALL_API_KEY ?? '' // this endpoint requires your API key
  }
};

fetch(`https://us-east-1.recall.ai/api/v1/bot/${botId}/`, options)
  .then(res => res.json())
  .then(res => console.log(res))
  .catch(err => console.error(err));
You will see a list of recordings, and inside each you’ll find a `download_url`.
Note: Transcripts are nested under recordings because they are an artifact from a meeting.
Here is an example of the response you can expect to get from the Retrieve Bot Endpoint:
//@title Example Response
{
"id": "2e8ec5a4-a608-4ea0-b9d0-2a12aec3f0f0",
"recordings": [
{
"id": "c9d15f08-83a4-4d83-ad1b-3e311bb0f253",
"media_shortcuts": {
"transcript": {
"id": "b2c3c59f-45a4-43a8-9b0f-1bb79ba33580",
"data": {
"download_url": "https://us-east-1.recall.ai/api/v1/download/transcript?token=..."
},
"provider": {
"meeting_captions": {}
}
},
...
}
}
],
...
}
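As a quick sketch, assuming `bot` is the parsed JSON from the Retrieve Bot call above, you can pull out the transcript’s `download_url` like this:
// Transcripts hang off recordings; grab the shortcut from the first recording.
const transcriptUrl =
  bot.recordings?.[0]?.media_shortcuts?.transcript?.data?.download_url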
Once you have the `download_url`, use the URL to get the structured transcript JSON:
const downloadUrl = "https://us-east-1.recall.ai/api/v1/download/transcript?token=..."

const response = await fetch(downloadUrl, {
  method: 'GET',
  headers: {
    'Authorization': `Bearer ${process.env.RECALL_API_KEY ?? ''}`,
    'Accept': 'application/json',
  }
})

// Parse the structured transcript JSON from the response body.
const transcript = await response.json()
This returns JSON with the full meeting transcription data, which uses the following schema:
[
{
"participant": {
"id": number, // Id of the participant in the meeting. This id is not unique across meetings.
"name": string | null, // Display name of the participant.
"is_host": boolean | null, // Whether the participant is the host of the meeting.
"platform": string | null, // Meeting platform constant
"extra_data": json | null, // Extra data about the participant from the meeting platform.
},
"words": [
{
"text": string, // The text of the word.
"start_timestamp": {
"absolute": number, // seconds
"relative": number // seconds
},
"end_timestamp": {
"absolute": number, // seconds
"relative": number // seconds
}
}
]
}
]
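As an illustration, here’s one way to flatten that structure into readable “Speaker: text” lines (the types are a minimal sketch, not official SDK types):
// Minimal types mirroring the schema above.
type Word = { text: string }
type Segment = { participant: { name: string | null }; words: Word[] }

// Join each segment's words and prefix the speaker's name.
function toLines(segments: Segment[]): string[] {
  return segments.map(
    s => `${s.participant.name ?? 'Unknown'}: ${s.words.map(w => w.text).join(' ')}`
  )
}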
Post-call events give you structured, finalized transcripts after the meeting is done.
To see all of the configuration options for transcription, see the appendix.
Step 3: Receiving recordings post-call
Receiving video recordings post-call
Video and audio recordings (`MP4`/`MP3`) are available only after the meeting ends. Once you receive the `bot.done` Status Change event, you can retrieve the video recording using the Retrieve Video Mixed endpoint by specifying the `recording_id`.
Here is an example of a call to the Retrieve Video Mixed endpoint:
const videoUrl = `https://us-east-1.recall.ai/api/v1/video_mixed?recording_id=RECORDING_ID_HERE`
const res = await fetch(videoUrl, {
headers: {
'Authorization': `Bearer ${process.env.RECALL_API_KEY ?? ''}`,
'Accept': 'application/json'
}
})
if (!res.ok) {
throw new Error(`Failed to fetch video: ${res.statusText}`)
}
const data = await res.json()
const downloadUrl = data.results?.[0]?.data?.download_url
As always, make sure to specify your region in the request (`us-east-1` is just an example).
//@title Sample response:
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"recording": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"metadata": {
"additionalProp": "string"
}
},
"created_at": "2025-08-01T04:05:14.009Z",
"status": {
"code": "processing",
"sub_code": "string",
"updated_at": "2025-08-01T04:05:14.009Z"
},
"metadata": {
"additionalProp": "string"
},
"data": {
"download_url": "string"
},
"format": "mp4"
}
You can use the `download_url` to access the video after the call.
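For example, a small sketch of saving the MP4 locally (Node 18+, assuming the `downloadUrl` extracted above):
import { writeFile } from 'fs/promises'

// Fetch the file from the download URL and write the bytes to disk.
const videoRes = await fetch(downloadUrl)
if (!videoRes.ok) throw new Error(`Video download failed: ${videoRes.statusText}`)
await writeFile('meeting.mp4', Buffer.from(await videoRes.arrayBuffer()))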
Receiving audio recordings post-call
To retrieve audio, follow the same steps as above, but call the Retrieve Audio Mixed endpoint instead of the Retrieve Video Mixed endpoint.
For a full list of available recording configurations (e.g., `video_separate`, `audio_separate`), see the appendix.
Bonus: Receiving real-time video and audio streams
You can also get real-time video and audio streams. Real-time streams are made available via RTMP for observer room use cases, or via WebSocket for automated processing, such as facial recognition or deepfake detection.
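Real-time streams are configured in `recording_config` just like the webhook endpoints in Step 1. A sketch of a WebSocket media endpoint, using the `video_mixed_flv.data` event named in the appendix table (see the linked docs for the full protocol and event names):
// Illustrative recording_config fragment for streaming mixed video over WebSocket.
const realtimeMediaConfig = {
  realtime_endpoints: [
    {
      type: 'websocket',
      url: 'wss://your-app/api/stream',  // your WebSocket receiver
      events: ['video_mixed_flv.data']   // per the appendix table below
    }
  ]
}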
Why did I choose this implementation?
As mentioned earlier, this is one of many ways to get Zoom transcripts and recordings. So why did I choose this method?
- No media infrastructure required: Recall.ai handles ingestion, recording, and processing, so you don’t need to manage WebRTC or Zoom SDKs.
- Real-time streaming available: Get transcripts, raw audio, and video frames via WebSocket or webhook with minimal setup.
- Flexible output formats: Supports mixed/per-participant recordings, raw audio streams, and RTMP.
- Flexible transcription provider support: Use meeting captions, AssemblyAI, Deepgram, etc., via a single API; swap providers with minimal configuration changes.
- Speaker diarization available: You’re able to get speaker names no matter which transcription provider you use. Depending on your configuration, you can also get perfect diarization.
- Rich participant metadata available: Always includes speaker names, roles, and join/leave events.
- Cross-platform support: Works with Zoom, Google Meet, Microsoft Teams, etc., using the same API. List of all supported platforms.
- Minimal backend logic: Your app only needs to manage bot sessions and store metadata.
Why you might choose another option
We use Recall.ai, a third-party API, to offload the majority of the engineering work in this option. This might not be the ideal method if you:
- Are cost conscious: A third-party API like Recall.ai has additional costs based on usage.
- Don’t want vendor reliance: Third-party APIs can be a sticking point for some companies.
- Don’t want a bot in the meeting: If you prefer not to have a bot visible in the meeting, you might want to consider:
  - The Desktop Recording SDK (API Docs), the most discreet option for capturing in-person or on-screen conversations across any platform.
  - Meeting Direct Connect, which connects directly via Zoom’s Real-Time Media Streams API without showing a bot in the meeting. Setup requires you to create a Zoom Marketplace app, obtain an RTMS-enabled webhook, and have your users OAuth the Zoom app.
Conclusion
With just a minimal backend and no media infrastructure, you now have a working prototype that captures real-time transcripts, post-call transcripts, and post-call recordings from Zoom. Hopefully this has provided a jumping off point for whatever you’re building. If you have any questions we’re always happy to help, and if you want to build anything we’ve gone over in the tutorial, you can sign up and try Recall.ai for free.
Appendix
Transcript & recording formats accessible using the Meeting Bot API
Recall.ai delivers transcripts and recordings in real time or after the call, depending on your bot configuration. Delivery happens through webhooks, WebSocket, or by polling the API.
Transcripts
Type | Delivery Method | Notes | API Docs URL |
---|---|---|---|
Real-time | `transcript.data` via Webhook or WebSocket | Finalized segments with word-level timing | https://docs.recall.ai/docs/bot-real-time-transcription |
Partial (real-time) | `transcript.partial_data` via Webhook | Low-latency previews; not final | https://docs.recall.ai/docs/bot-real-time-transcription#partial-results |
Post-call | `GET /transcript/{id}` | Full structured transcript after receiving `bot.done` webhook | https://docs.recall.ai/reference/bot_retrieve |
- Supports meeting captions (free) as well as third-party transcription providers
- Includes diarization with speaker names and timestamps
Recordings
Format | Timing | Access | Notes | API Docs URL |
---|---|---|---|---|
Mixed `MP4` / `MP3` | Post-call | `/video_mixed`, `/audio_mixed` APIs | Combined audio/video of all participants | https://docs.recall.ai/reference/video_mixed_retrieve |
Speaker-separated (Audio/Video) | Real-time or Post-call | `/video_separate`, `/audio_separate` APIs or WebSocket | Per-participant media; use real-time for analysis tools | https://docs.recall.ai/docs/how-to-get-separate-videos-per-participant-async https://docs.recall.ai/docs/how-to-get-separate-audio-per-participant-async |
RTMP (Live) | Real-time | RTMP stream or `video_mixed_flv.data` via WebSocket | Ideal for live rebroadcast or human viewing | https://docs.recall.ai/docs/stream-real-time-video-rtmp |
Raw Audio / Video Frames | Real-time | WebSocket | Raw 16kHz PCM audio and 360p PNG video frames (~2fps) | https://docs.recall.ai/docs/real-time-audio-protocol https://docs.recall.ai/docs/real-time-video |
Storing transcripts in Postgres
For apps storing meeting metadata and transcripts, here’s a minimal schema to get started:
model Meeting {
  id         String       @id @default(cuid())
  externalId String       @unique
  userId     String
  botId      String
  videoUrl   String?
  createdAt  DateTime     @default(now())
  transcript Transcript[]
}

model Transcript {
  id        String   @id @default(cuid())
  meetingId String
  meeting   Meeting  @relation(fields: [meetingId], references: [id])
  speaker   String
  text      String
  timestamp DateTime
}
- `Meeting` tracks session metadata like user, bot, and external ID
- `Transcript` stores finalized speech segments
- You can extend the schema as needed (e.g. add `durationMs`, `speakerRole`, etc.)
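For example, fetching a meeting and its transcript in order with this schema:
// Load one meeting plus its transcript segments in chronological order.
const meeting = await prisma.meeting.findUnique({
  where: { externalId: 'meeting-abc123' },
  include: { transcript: { orderBy: { timestamp: 'asc' } } }
})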