One common way to build a bot that automatically joins Microsoft Teams meetings and pulls data like transcripts or recordings is to use browser automation. Developers typically reach for tools like Puppeteer, Playwright, or Selenium to drive the Microsoft Teams web client the same way a human participant would.

In this post we’ll focus on Puppeteer, but the general ideas apply to other automation tools as well. If you're exploring different approaches, check out our guide on How to build a Microsoft Teams bot, which walks through several implementation options and results in a functional app with a GitHub repo.

Puppeteer can control the Microsoft Teams web client. Because the Microsoft Graph API can be tricky to build with and has limited functionality for most teams, many Microsoft Teams bots rely on browser automation to participate in meetings and extract data from the page.

This approach can work, but it comes with reliability and maintenance tradeoffs that are important to understand before committing to it.

What a Puppeteer Microsoft Teams bot actually is

A Puppeteer Microsoft Teams bot runs a browser and clicks through the Microsoft Teams web client the same way a human participant would.

The bot typically behaves like a normal participant in the meeting:

Opens a Microsoft Teams meeting link
Joins the meeting through the browser
Enables captions
Watches the page for caption updates
Sends transcript data to another service
Leaves when the meeting ends

Because this relies on automating the Microsoft Teams web client instead of calling an API, the bot depends heavily on the structure of the Microsoft Teams interface. Changes to the DOM, login flows, permission prompts, or join screens can break the automation.

Running these bots in production usually means monitoring them and updating selectors or flows when Microsoft rolls out UI changes.

How developers build a Puppeteer Microsoft Teams bot

Under the hood, the implementation usually looks something like this:

Launch a Chromium browser with Puppeteer
Open the Microsoft Teams meeting URL
Adjust query parameters to reach the browser join flow
Enter a display name and click Join meeting
Enable captions once inside the meeting
Watch the caption container for changes and extract text
Send caption updates to a backend service
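
The query-parameter step above can be sketched as a small pure function. This is a minimal sketch, not a definitive implementation: buildBrowserJoinUrl is a hypothetical helper, and the parameter names in ASSUMED_WEB_JOIN_PARAMS are assumptions that should be checked against the current Microsoft Teams web client before relying on them.

```typescript
// Hypothetical helper: rewrite a Teams meeting link so it lands on the
// browser join flow. It only sets query parameters; it does not validate
// that the link is a real meeting link.
function buildBrowserJoinUrl(
  meetingUrl: string,
  overrides: Record<string, string>,
): string {
  const url = new URL(meetingUrl);
  for (const key of Object.keys(overrides)) {
    // set() replaces an existing value or appends a new parameter.
    url.searchParams.set(key, overrides[key]);
  }
  return url.toString();
}

// ASSUMPTION: illustrative parameter names and values only. Verify them
// against the live join flow before using them.
const ASSUMED_WEB_JOIN_PARAMS: Record<string, string> = {
  msLaunch: "false",
  suppressPrompt: "true",
};

// Placeholder meeting link for illustration.
const joinUrl = buildBrowserJoinUrl(
  "https://teams.microsoft.com/l/meetup-join/EXAMPLE?context=abc",
  ASSUMED_WEB_JOIN_PARAMS,
);
```

With Puppeteer, the resulting string would typically be passed to page.goto(joinUrl). Keeping the rewrite in a pure function makes it easy to unit test when the join flow changes.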

Most of the complexity comes from handling the different join states in Microsoft Teams. A bot might enter a waiting room, be denied entry, or encounter permission prompts before it ever reaches the meeting.

This is why most implementations rely on DOM selectors, timeouts, and retry logic to determine whether the bot successfully joined the meeting.
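
One way to keep that logic manageable is to separate "what state is the page in?" from "what should the bot do next?". The sketch below assumes the bot periodically reads the visible page text. The marker strings are hypothetical placeholders, not the real wording in the Microsoft Teams UI, and the timeout and attempt limits are arbitrary defaults.

```typescript
type JoinState = "joined" | "waiting-room" | "denied" | "unknown";

// ASSUMPTION: hypothetical marker strings. The real text differs by
// locale and rollout and has to be collected from the live client.
const ASSUMED_MARKERS: Array<{ state: JoinState; text: string }> = [
  { state: "denied", text: "you can't join" },
  { state: "waiting-room", text: "someone will let you in" },
  { state: "joined", text: "leave" },
];

// Map the currently visible page text to a join state.
// Markers are checked in order, so earlier entries win.
function classifyJoinState(visibleText: string): JoinState {
  const haystack = visibleText.toLowerCase();
  for (const marker of ASSUMED_MARKERS) {
    if (haystack.indexOf(marker.text) !== -1) return marker.state;
  }
  return "unknown";
}

type NextStep = "done" | "keep-polling" | "retry-join" | "give-up";

// Decide what to do given the state, time spent in this attempt,
// and the 1-based attempt number.
function nextStep(
  state: JoinState,
  elapsedMs: number,
  attempt: number,
  opts = { timeoutMs: 60000, maxAttempts: 3 },
): NextStep {
  if (state === "joined") return "done";
  if (state === "denied") return "give-up";
  if (elapsedMs < opts.timeoutMs) return "keep-polling";
  return attempt < opts.maxAttempts ? "retry-join" : "give-up";
}
```

Because both functions are pure, they can be tested against saved page text from real join attempts without launching a browser.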

Four ways teams try to get transcripts from Microsoft Teams

There are four main ways developers approach transcript extraction from Microsoft Teams meetings. Each has different tradeoffs in complexity, reliability, and transcript quality.

1. Scraping live captions

The most common browser-automation approach is scraping captions directly from the Microsoft Teams interface.

The bot joins a meeting and watches the caption container in the DOM for updates. When new text appears, it sends it to a backend service.

For example, we built a similar bot using Playwright that joins a Microsoft Teams meeting and streams captions in real time. The implementation is available in the open-source Microsoft Teams Meeting Bot repository.

Captions can work well enough for prototypes, but they aren’t designed as a transcription API. Words can be dropped, overlapping speech is often missed, and speaker labels aren’t always reliable. Captions are also rendered deep inside the Microsoft Teams DOM, so small UI changes can break scraping logic.
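
A practical wrinkle with caption scraping is that the text of the newest caption can keep changing while someone is still speaking, so forwarding every DOM update produces duplicates. The sketch below is one way to handle that. It assumes each caption row can be given a stable key, which is an assumption about the DOM rather than something Microsoft Teams guarantees, and it treats a row as final once a newer row appears after it. CaptionRow and CaptionBuffer are hypothetical names.

```typescript
interface CaptionRow {
  key: string; // ASSUMPTION: a stable identifier derived from the DOM row
  speaker: string;
  text: string;
}

class CaptionBuffer {
  private pending = new Map<string, CaptionRow>();
  private emitted = new Set<string>();

  // Feed the rows currently visible, oldest first. Returns rows that are
  // now considered final: every pending row except the newest visible one.
  update(visible: CaptionRow[]): CaptionRow[] {
    for (const row of visible) {
      // Later edits to an already emitted row are ignored in this sketch.
      if (!this.emitted.has(row.key)) this.pending.set(row.key, row);
    }
    const last = visible[visible.length - 1];
    const lastKey = last ? last.key : undefined;
    const finalized: CaptionRow[] = [];
    this.pending.forEach((row, key) => {
      if (key !== lastKey) finalized.push(row);
    });
    for (const row of finalized) {
      this.pending.delete(row.key);
      this.emitted.add(row.key);
    }
    return finalized;
  }

  // Call when the meeting ends to emit whatever is still pending.
  flush(): CaptionRow[] {
    const rest: CaptionRow[] = [];
    this.pending.forEach((row) => rest.push(row));
    for (const row of rest) this.emitted.add(row.key);
    this.pending.clear();
    return rest;
  }
}
```

In a Puppeteer bot, the visible rows could be collected inside the page (for example from a MutationObserver callback) and handed to Node through a function registered with page.exposeFunction. Only the rows returned by update or flush would be sent to the backend.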

2. Capturing audio and transcribing it

Another option is capturing the meeting audio and sending it to a speech-to-text system.

This can produce more accurate transcripts than captions, but it adds complexity. Capturing system audio reliably can be tricky in containerized environments, and storing audio introduces privacy and compliance considerations.

Transcript quality also depends heavily on how clean the captured audio is.
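
Once audio is captured, it is usually sent to the speech-to-text system in small fixed-duration chunks. The sketch below shows the arithmetic. It assumes raw PCM audio, and the format in the test values (16 kHz, mono, 16-bit) is only an illustrative choice, not a requirement of any particular provider. PcmFormat, bytesPerChunk, and splitIntoChunks are hypothetical names.

```typescript
interface PcmFormat {
  sampleRateHz: number;
  channels: number;
  bytesPerSample: number;
}

// Size in bytes of a chunk lasting chunkMs, rounded down to whole frames
// so a chunk never splits a sample.
function bytesPerChunk(format: PcmFormat, chunkMs: number): number {
  const framesPerChunk = Math.floor((format.sampleRateHz * chunkMs) / 1000);
  return framesPerChunk * format.channels * format.bytesPerSample;
}

// Split a PCM buffer into views of at most chunkBytes each.
// subarray() creates views on the same memory; nothing is copied.
function splitIntoChunks(pcm: Uint8Array, chunkBytes: number): Uint8Array[] {
  if (chunkBytes <= 0) throw new Error("chunkBytes must be positive");
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}
```

Smaller chunks reduce latency but increase the number of requests. The right size depends on the speech-to-text provider.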

3. Using the Microsoft Graph API

Microsoft provides an official way to retrieve transcripts through the Microsoft Graph API.

This works by allowing a Microsoft Teams app to fetch transcripts after a meeting has ended. The main limitation is that the app must be installed on the tenant hosting the meeting.

If a user joins a meeting hosted by another organization, that organization must have installed and approved the app beforehand. In practice this makes it difficult to capture transcripts from many real-world meetings.
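
For reference, the post-meeting flow comes down to two requests: list the transcripts for a meeting, then download one transcript's content. The sketch below only builds the URLs. The paths reflect our reading of the Microsoft Graph v1.0 transcript endpoints and should be checked against the current reference, and the function names are hypothetical. Acquiring an access token with the right permissions is out of scope here.

```typescript
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";

// URL for listing the transcripts of one online meeting.
// ASSUMPTION: path shape taken from the Graph v1.0 docs; verify before use.
function transcriptListUrl(userId: string, meetingId: string): string {
  return (
    `${GRAPH_BASE}/users/${encodeURIComponent(userId)}` +
    `/onlineMeetings/${encodeURIComponent(meetingId)}/transcripts`
  );
}

// URL for downloading one transcript's content as WebVTT.
function transcriptContentUrl(
  userId: string,
  meetingId: string,
  transcriptId: string,
): string {
  return (
    `${transcriptListUrl(userId, meetingId)}` +
    `/${encodeURIComponent(transcriptId)}/content?$format=text/vtt`
  );
}
```

Both requests would be sent with an Authorization: Bearer header. For meetings hosted in another tenant, the calls fail unless that tenant has approved the app, which is the limitation described above.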

4. Using a meeting recording API

Some teams use third-party meeting bot APIs that send bots to meetings and handle the join flow, recording, and transcription.

This avoids most of the browser automation work, but usually comes with usage-based pricing.

For teams that don’t want to maintain automation infrastructure, this can significantly reduce operational overhead.

What breaks when building with Puppeteer

Teams building Microsoft Teams bots with Puppeteer or similar tools usually run into the same issues early:

Meeting links redirect through launcher pages before the browser join flow
Join flows vary between meetings, including waiting rooms and approval screens
The Microsoft Teams web client can render different DOM structures depending on browser or rollout
Caption elements are deeply nested and dynamically updated
Each bot requires a full browser instance, which makes scaling expensive

None of these are unusual problems. They’re simply the kinds of things that come with automating a web interface instead of using a supported API.
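
A common mitigation for the DOM-variation problem is to keep an ordered list of candidate selectors per element and use the first one that matches. The sketch below shows that pattern. The selectors are hypothetical placeholders, not real Microsoft Teams selectors, and firstMatchingSelector is a hypothetical helper.

```typescript
// Return the first selector for which exists(selector) is true,
// or undefined when none match.
function firstMatchingSelector(
  candidates: string[],
  exists: (selector: string) => boolean,
): string | undefined {
  for (const selector of candidates) {
    if (exists(selector)) return selector;
  }
  return undefined;
}

// ASSUMPTION: placeholder selectors for illustration, newest layout first.
const ASSUMED_CAPTION_SELECTORS: string[] = [
  '[data-example="captions-new-layout"]',
  '[data-example="captions-old-layout"]',
];

// Stand-in for a page where only the old layout is present.
const fakeDom = new Set<string>(['[data-example="captions-old-layout"]']);
const captionSelector = firstMatchingSelector(
  ASSUMED_CAPTION_SELECTORS,
  (selector) => fakeDom.has(selector),
);
```

Inside the page context, the predicate could be (s) => document.querySelector(s) !== null. When the helper returns undefined, that is a useful signal to log and alert on, since it usually means the UI has changed.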

When a Puppeteer Microsoft Teams bot makes sense

A Puppeteer Microsoft Teams bot can make sense if you’re experimenting, building a prototype, or running a small number of bots.

It may also make sense for teams that intentionally want to run and control the full meeting bot infrastructure themselves.

But Puppeteer is only one way to automate a browser. Tools like Playwright or Selenium are often used for the same purpose, and in some cases may be easier to maintain depending on the project.

Once bots start running at scale, most of the work shifts to infrastructure: launching browsers, monitoring bots, retrying failed joins, and keeping up with UI changes in the Teams client.
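
Retrying failed joins is a good example of that infrastructure work. A minimal sketch of a capped exponential backoff schedule is shown below. backoffDelaysMs is a hypothetical helper, and the base delay and cap are arbitrary values to tune for your own workload.

```typescript
// Delay before each retry: baseMs, 2x, 4x, ... and never more than capMs.
function backoffDelaysMs(
  attempts: number,
  baseMs: number,
  capMs: number,
): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, i)));
  }
  return delays;
}

// Four retries starting at 1 second and capped at 5 seconds.
const retrySchedule = backoffDelaysMs(4, 1000, 5000);
```

When many bots retry at once, adding random jitter to each delay helps avoid synchronized retries.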

If owning the infrastructure is not core to your product, a meeting bot API for Microsoft Teams is usually the better fit.

Conclusion

A Puppeteer Microsoft Teams bot is browser automation on top of the Microsoft Teams web client.

It can work, but reliability depends on handling UI changes, join flows, caption behavior, and the overhead of running many browser instances.

Whether you build your own solution or use a hosted meeting bot service usually comes down to how much engineering time you want to spend maintaining automation versus building product features.

Common questions about Puppeteer Microsoft Teams bots

Can Puppeteer join Microsoft Teams meetings automatically?

Yes. Puppeteer can automate the Microsoft Teams web client to open a meeting link, enter a display name, and click the join button. From there it can enable captions or capture other data from the page.

Is Puppeteer the best way to build a Microsoft Teams bot?

In many cases, no. For the vast majority of teams, the best way to build a Microsoft Teams bot is to use a meeting bot API. Puppeteer is one option for browser automation, but many teams that choose to build it themselves use tools like Playwright or Selenium instead.

Is there an official Microsoft Teams bot API for joining meetings?

Microsoft provides APIs through Microsoft Graph, but they do not allow arbitrary bots to join meetings in the same way a browser participant can. Because of this, many Microsoft Teams bots rely on browser automation.

Written By:

Maggie Veltri
