If you prefer to jump straight to the code, here's the GitHub repo for the app we're about to discuss.
Have you ever been sitting in a meeting and completely zoned out? It’s easy enough to lose focus when you're in person, but something about remote meetings makes it feel almost impossible to remain completely focused. Maybe you meant to quickly scan your email notifications or just found yourself staring at the wall, but once you’ve come back to reality you have no idea what your boss was just talking about. We’ve all been there (I think). There has to be a better way.
Large language models are everywhere right now, and one area where they really excel is extracting insights from large bodies of text. If we can figure out a way to feed our meeting data into a model like OpenAI's GPT-4, it can go through the conversation for us and make sure we didn't miss any crucial context while we were zoning out. The hard part is actually getting the conversation data from the meeting into the model. Luckily, there's an API for that! Recall.ai will take care of sending a bot to our meetings and automatically extracting a transcript for our large language model of choice to work with. This gives us all the tools we need to build our very own meeting notetaker.
The Stack
We’ll be using Recall.ai’s API to handle sending a bot to the Zoom meeting, and OpenAI’s API to analyze the raw meeting transcript and turn it into something useful.
All we really need on the frontend is a place to enter a Zoom meeting URL and a place to display the meeting action items when the meeting is over. For this, we’ll use the classic single-page React app.
We’ll use Node.js and Express for our backend server, which will handle all interaction with OpenAI and Recall.ai.
Lastly, we’re going to want to listen to webhooks from Recall.ai so we can get the transcript of the conversation in real time and know when the meeting is over. Since this app is being developed locally, we’ll use ngrok to give our local server a public URL that those webhooks can reach.
Deploying the Bot
Zoom is pretty careful with who it lets on its platform, so we’ll need to register the bot as a Zoom App first. Don’t run away yet! Recall.ai has a comprehensive guide on how to set this up, and it only takes about five minutes of setup before Zoom supplies you with the credentials you’ll use to authenticate against their API.
After you’re done with that, you’re a single API call away from deploying a bot to your Zoom meeting room.
const response = await axios.post(
`https://${config.recallRegion}.recall.ai/api/v1/bot`,
{
bot_name: "ZoomBot",
meeting_url: meetingUrl,
},
{
headers: {
Authorization: `Token ${config.recallApiKey}`,
Accept: "application/json",
"Content-Type": "application/json",
},
}
);
Great! But all we have right now is a bot awkwardly listening in on the conversation and not contributing anything. The first thing we can do to fix this is to help the bot understand what we’re saying. We can give it access to the transcript of our meeting by hooking it into Zoom’s speech-to-text meeting captions feature. This is as simple as adding a transcription_options parameter to the API call.
const response = await axios.post(
`https://${config.recallRegion}.recall.ai/api/v1/bot`,
{
bot_name: "ZoomBot",
meeting_url: meetingUrl,
// New: have the bot read Zoom's meeting captions
transcription_options: {
provider: "meeting_captions",
},
},
...
);
Now I can view that bot in Recall.ai’s dashboard and see that it’s generating transcripts and saving them. With our data being captured successfully, we can move on to the fun part: using AI to figure out what actually happened in this meeting.
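Before moving on, here is one way the two request snippets above could be folded into a single helper. This is only a sketch: buildCreateBotRequest is a hypothetical name that does not come from the repo, and the check that meeting links live on zoom.us (or a subdomain of it) is an assumption you may need to loosen for your own account. The URL, body, and headers simply mirror the calls shown above.

```javascript
// Hypothetical helper (not from the repo): assembles the arguments for axios.post.
// config.recallRegion and config.recallApiKey mirror the names used in the snippets above.
function buildCreateBotRequest(config, meetingUrl) {
  // Reject anything that is not an https Zoom link before spending an API call.
  // Assumption: meeting links live on zoom.us or one of its subdomains.
  let parsed;
  try {
    parsed = new URL(meetingUrl);
  } catch {
    throw new Error(`Not a valid URL: ${meetingUrl}`);
  }
  const host = parsed.hostname.toLowerCase();
  const isZoomHost = host === "zoom.us" || host.endsWith(".zoom.us");
  if (parsed.protocol !== "https:" || !isZoomHost) {
    throw new Error(`Not a Zoom meeting URL: ${meetingUrl}`);
  }

  return {
    url: `https://${config.recallRegion}.recall.ai/api/v1/bot`,
    body: {
      bot_name: "ZoomBot",
      meeting_url: meetingUrl,
      // Same option as in the second snippet above
      transcription_options: { provider: "meeting_captions" },
    },
    options: {
      headers: {
        Authorization: `Token ${config.recallApiKey}`,
        Accept: "application/json",
        "Content-Type": "application/json",
      },
    },
  };
}
```

With that in place, the call site would shrink to something like `const { url, body, options } = buildCreateBotRequest(config, meetingUrl);` followed by `await axios.post(url, body, options);`, and a mistyped link fails fast instead of producing a bot that never joins.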
Retrieving the Conversation
We can configure Recall.ai to send a “done” webhook to our backend server once the conversation ends, indicating that the conversation is ready for analysis. Then we can retrieve the transcript using Recall.ai’s Get Bot Transcript endpoint.
const transcriptResponse = await axios.get(
`https://${config.recallRegion}.recall.ai/api/v1/bot/${data.bot_id}/transcript`,
{
headers: {
Authorization: `Token ${config.recallApiKey}`,
Accept: "application/json",
"Content-Type": "application/json",
},
}
);
const transcript = transcriptResponse.data;
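Two details around this step are worth sketching. First, the webhook handler should only react to the event that says the bot has finished. Second, if the transcript comes back as an array of objects rather than plain text, dropping it straight into a template string would produce "[object Object]" instead of the conversation, so it needs flattening first. Both payload shapes below are assumptions, noted in the comments, so check them against a real payload before relying on them. isBotDone and transcriptToText are hypothetical helper names.

```javascript
// Assumed webhook payload shape (verify against a real delivery):
// { event: "bot.status_change", data: { bot_id: "...", status: { code: "done" } } }
function isBotDone(payload) {
  return (
    payload?.event === "bot.status_change" &&
    payload?.data?.status?.code === "done" &&
    typeof payload?.data?.bot_id === "string"
  );
}

// Assumed transcript shape: an array of segments such as
// { speaker: "Ana", words: [{ text: "Hello" }, { text: "everyone" }] }
function transcriptToText(transcript) {
  return transcript
    .map((segment) => {
      const speaker = segment.speaker || "Unknown speaker";
      const text = (segment.words || [])
        .map((word) => word.text)
        .join(" ")
        .trim();
      // Drop segments that carry no words at all
      return text ? `${speaker}: ${text}` : "";
    })
    .filter(Boolean)
    .join("\n");
}
```

The result is one "Speaker: text" line per segment, which keeps the speaker names visible so the model can attribute action items to the right person.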
We’ll pass the entire conversation as a prompt to GPT-4o, OpenAI’s (current) flagship model. This model has a context window of 128,000 tokens, which translates to roughly 240 pages of a book (assuming about 0.75 words per token and 400 words per page). So unless your conversation went way, way longer than it should have, GPT has you covered. We’ll ask the model to return a meeting_data object that contains a list of participants on the call as well as their associated action items.
const extractActionItems = async (meetingTranscript) => {
const response = await openai.chat.completions.create({
model: "gpt-4o",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: `
You will be provided with a meeting transcript. For each participant in the meeting, you must extract
at least one action item. Action items are short, concise statements that describe a task that needs to be completed.
Format the output as a JSON object with a "meeting_data" array, where each element represents a participant and their action items.
Transcript:
${meetingTranscript}
Output format:
{
"meeting_data": [
{
"user": "participant name",
"action_items": ["action item 1", "action item 2", ...]
},
{
"user": "participant name",
"action_items": ["action item 1", "action item 2", ...]
},
...
]
}
`,
},
],
});
const data = JSON.parse(response.choices[0].message.content).meeting_data;
return data;
};
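Two small guards can sit around extractActionItems. The first redoes the page arithmetic from earlier and uses it as a rough pre-flight size check. The second tidies the parsed output before it reaches the frontend, since even JSON mode does not guarantee every entry has the expected fields. Treat this as a sketch under stated assumptions: the 0.75 words-per-token ratio is a rule of thumb rather than a real tokenizer, the 2,000-token reserve is an arbitrary placeholder, and all function names are hypothetical.

```javascript
const CONTEXT_WINDOW_TOKENS = 128000; // figure quoted in the article
const WORDS_PER_TOKEN = 0.75; // rule-of-thumb assumption for English text
const WORDS_PER_PAGE = 400; // the article's assumption

// 128,000 tokens * 0.75 words/token / 400 words/page = 240 pages
function tokensToPages(tokens) {
  return (tokens * WORDS_PER_TOKEN) / WORDS_PER_PAGE;
}

// Rough estimate from a word count; a real tokenizer would be more accurate.
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / WORDS_PER_TOKEN);
}

// reservedTokens leaves room for the instructions and the model's reply
// (2000 is an arbitrary placeholder, not a measured value).
function fitsInContext(text, reservedTokens = 2000) {
  return estimateTokens(text) + reservedTokens <= CONTEXT_WINDOW_TOKENS;
}

// Keep only well-formed { user, action_items } entries from the parsed output.
function normalizeMeetingData(meetingData) {
  if (!Array.isArray(meetingData)) return [];
  return meetingData
    .filter((entry) => entry && typeof entry.user === "string")
    .map((entry) => ({
      user: entry.user.trim(),
      action_items: Array.isArray(entry.action_items)
        ? entry.action_items.filter(
            (item) => typeof item === "string" && item.trim() !== ""
          )
        : [],
    }));
}
```

A caller could check fitsInContext(transcriptText) before calling extractActionItems and wrap the result in normalizeMeetingData, so a missing or malformed meeting_data field turns into an empty list rather than a crash in the UI.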
This will give us the important aspects of the meeting that we really want to remember. Now we can use server-sent events (SSE) to pipe this output back to the frontend to be displayed.
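For readers who haven't used SSE before, the server side can be quite small. The sketch below is not the repo's implementation: handleEventsRequest, broadcastActionItems, the "action_items" event name, and the "/events" route mentioned afterwards are all hypothetical. It relies only on the response methods writeHead and write, which Express responses inherit from Node's http module.

```javascript
// One SSE frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function formatSseMessage(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Open connections, one per browser tab listening for results.
const clients = new Set();

// Route handler: keeps the response open and remembers it.
function handleEventsRequest(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  res.write(": connected\n\n"); // SSE comment line, ignored by the browser
  clients.add(res);
  // Forget the connection once the browser goes away
  req.on("close", () => clients.delete(res));
}

// Call this with the output of extractActionItems once the meeting is done.
function broadcastActionItems(meetingData) {
  const frame = formatSseMessage("action_items", meetingData);
  for (const res of clients) {
    res.write(frame);
  }
}
```

On the Express side this could be mounted with `app.get("/events", handleEventsRequest)`. In the React app, `new EventSource("/events")` plus an `addEventListener("action_items", ...)` handler that calls `JSON.parse(event.data)` would receive the list.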
And just like that, we have our own meeting notetaker that will ensure we don't miss any more important takeaways from our meetings. If you want to try this out yourself, the code is freely available on GitHub, so you should be able to get up and running quickly. Maybe even in time for your next meeting…