Transcription API

Accurate transcription for conversations on video conferencing platforms and in person, in real time or after the conversation

Why use Recall.ai’s Transcription API?

Recall.ai is the transcription API for conversations. Recall.ai’s Transcription API captures audio and video and delivers accurate, speaker-labeled transcripts for conversations, whether they happen on Zoom, Google Meet, Microsoft Teams or in person, in real time or after the conversation.

"Recall.ai powers our Notetaker recordings. Perfect diarization across video conferencing platforms allows us to deliver accurate, speaker-labeled transcripts to our customers. That same attention to product quality carries through to their team. They’ve been a true partner, proactive and supportive, always bringing thoughtful ideas and helping our plans come to life."

Galya Dimitrova

Get more than transcripts, delivered in real time or after the conversation

Get transcripts in real time or asynchronously from your preferred transcription engine, with accurate speaker attribution across conversations.

One API, many transcription engines

Pick between Recall.ai’s transcription engine or one of our transcription providers like AssemblyAI and Deepgram using a single API. Recall.ai’s Transcription API abstracts away the complexity of integrating with each provider so you can easily switch engines based on accuracy, latency, cost, or language needs.

Get transcripts with timestamps, metadata, and more

Recall.ai returns transcripts with timestamps, conversation metadata, and participant information so you can build features like speaker timelines, contextual transcript views, automated participant follow-ups, and AI workflows that populate systems like CRMs with stakeholder names.

Perfect speaker diarization

Get 100% accurate speaker identification out of the box. Recall.ai is the only transcription provider that can reliably diarize conversations with accurate speaker names across all major video conferencing platforms.

Transcription across multiple languages

Recall.ai supports transcription in many languages by letting you select the transcription provider based on language support. This lets you handle conversations in different languages through the same API, without managing multiple integrations.

Capture transcripts from video, phone, and in-person meetings

Recall.ai handles the hardest parts of transcription: capturing clean, structured audio from conversations and turning it into accurate, speaker-labeled transcripts. Record conversations from Zoom, Google Meet, Microsoft Teams, Webex, Slack Huddles, and in-person meetings.

Incident management tool

AI meeting copilot

Stenographer

Medical scribe

Live sales coaching

Interview notetaker

Meeting notetaker

Task tracker

Build with conversation data using Recall.ai’s Transcription API

"Integrating with Recall.ai was seamless. Recall.ai Transcription gave us accurate transcripts immediately and reliably. Because they support more transcription providers than any other platform we also had the flexibility to figure out which provider worked for our needs."

Raunak Surana

Frequently asked questions

What is a transcription API?

A transcription API converts speech into text so applications can work programmatically with conversations such as meetings, calls, interviews, or recordings.

Why use a transcription API?

A transcription API lets teams focus on building features that rely on transcripts, rather than building and maintaining speech-to-text systems themselves.

What transcription providers does Recall.ai support?

Recall.ai supports multiple transcription providers, including AWS Transcribe, Recall.ai Transcription, Rev, Deepgram, AssemblyAI, and Google Speech-to-Text. See the full list of third-party providers in our docs.

What can I build with a transcription API?

Transcription APIs are used to turn conversations into structured data. Common use cases include live captions, AI copilots and note taking, sales coaching tools, recruiting products that update applicant tracking systems, and in-person interviews or user research.

What output formats do transcription APIs typically support?

Transcription APIs often return structured JSON with timestamps and speaker labels. Recall.ai’s Transcription API outputs JSON.
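
As a rough illustration of what you can do with that kind of output, the Python sketch below totals talk time per speaker. The JSON shape used here (a list of segments with "speaker", "start", "end", and "text" fields) is an assumption made for the example, not Recall.ai's documented schema, so check the docs for the actual field names.

```python
import json
from collections import defaultdict

# Hypothetical payload: the field names are assumed for illustration only.
SAMPLE = """
[
  {"speaker": "Alice", "start": 0.0, "end": 2.5, "text": "Hi, thanks for joining."},
  {"speaker": "Bob", "start": 2.5, "end": 4.0, "text": "Happy to be here."},
  {"speaker": "Alice", "start": 4.0, "end": 7.0, "text": "Let's get started."}
]
"""


def talk_time_by_speaker(segments):
    """Sum the seconds spoken under each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


segments = json.loads(SAMPLE)
totals = talk_time_by_speaker(segments)
```

The same loop is a reasonable starting point for features like speaker timelines, since each segment already carries a label and a time range.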

Do transcription APIs support real-time transcription?

Yes. Recall.ai Transcription provides real-time transcription for live use cases as well as async transcription.

How can I improve transcription accuracy?

Use high-quality audio sampled at 16 kHz or higher, enable speaker diarization for multi-speaker conversations, provide custom vocabulary or context for domain-specific terms, and select models matched to your use case. With Recall.ai you can test different transcription models in your meetings to see which works best for your use case and language(s).
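
If you supply your own recordings, one quick sanity check is to confirm the sample rate before sending audio for transcription. The sketch below uses Python's standard wave module and only handles WAV input; the function names and the in-memory demo file are made up for the example.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # threshold taken from the guidance above


def wav_sample_rate(source):
    """Return the sample rate in Hz of a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def meets_min_sample_rate(source, minimum=MIN_SAMPLE_RATE_HZ):
    """True when the WAV audio is sampled at or above the minimum rate."""
    return wav_sample_rate(source) >= minimum


# Demo: build a short 8 kHz mono WAV in memory and check it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(8_000)
    out.writeframes(b"\x00\x00" * 800)
buffer.seek(0)
low_rate_ok = meets_min_sample_rate(buffer)  # 8 kHz is below the threshold
```

Resampling low-rate audio upward does not restore lost detail, so it is usually better to fix the capture settings than to convert the file afterwards.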

What are the technical requirements for running a transcription API?

If you use Recall.ai, there’s no infrastructure to manage. With a single API call, Recall.ai handles capturing audio across supported video conferencing platforms and sends the audio to the transcription provider you’ve chosen. If you bring your own transcription provider, you may need to pass your provider’s API keys. If you were to run transcription yourself, many transcription APIs run on standard servers, support Linux and Docker or Kubernetes for scaling, and offer SDKs in languages like Python, Go, C++, or Java.

Can I customize transcription models for specific terminology?

Many transcription APIs allow integration of custom models, adaptation of language and acoustic models for specific terminology, and addition of features like PII redaction for privacy compliance. Check with your transcription provider for specific customizability questions.

How does a transcription API work?

A transcription API processes audio, applies speech recognition models to convert speech to text, and returns structured output such as transcripts with timestamps and speaker labels. Features like speaker diarization and timestamps are handled either with deterministic methods or with dedicated models, and the formatted text is then returned as JSON, TXT, or SRT.
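
To make the relationship between those formats concrete, here is a minimal Python sketch that renders timestamped, speaker-labeled segments as SRT cues. The segment fields ("speaker", "start", "end", "text") are assumed for the example and are not any specific provider's schema; the HH:MM:SS,mmm timestamp layout is the standard SRT convention.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues, prefixing each line with the speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hypothetical segments, shaped like a typical JSON transcript.
example = [
    {"speaker": "Alice", "start": 0.0, "end": 2.5, "text": "Hi"},
    {"speaker": "Bob", "start": 2.5, "end": 4.0, "text": "Hello"},
]
srt_text = to_srt(example)
```

Keeping JSON as the source of truth and deriving TXT or SRT from it tends to be simpler than the reverse, because the plain-text formats drop metadata.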

What are some popular transcription API options?

OpenAI's Whisper API is commonly used by those who just need a transcript. For cases that require more than basic transcription, providers like AssemblyAI, Recall.ai Transcription, Deepgram, Rev, Gladia, AWS Transcribe, and Google Speech-to-Text offer additional features like speaker identification and enhanced accuracy.

How accurate are transcription APIs?

Accuracy varies based on audio quality, language, and model choice. Recall.ai allows you to test and switch between transcription models to find the best fit for your use case. For the most accurate speaker diarization in transcripts, use Recall.ai’s Transcription API.

Do transcription APIs support speaker identification?

Many transcription APIs support speaker identification through machine diarization models like Pyannote to automatically label different speakers. Recall.ai’s Transcription API is the only offering that supports perfect diarization and speaker labeling.

How do I integrate a transcription API into my application?

With Recall.ai’s Transcription API you get speaker-diarized transcripts along with video, audio, metadata, and more just by calling a single API endpoint. You can subscribe to webhook events to receive live transcription, or to fetch the transcript as soon as the meeting ends.
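
On the receiving side, a webhook consumer usually boils down to parsing each request body and dispatching on an event type. The Python sketch below shows that pattern in isolation. The event names ("transcript.segment", "transcript.done") and payload fields are hypothetical placeholders, not Recall.ai's actual webhook schema, and a real handler would sit behind an HTTP endpoint and verify the request before trusting it.

```python
import json


class TranscriptCollector:
    """Accumulates transcript segments delivered as webhook events."""

    def __init__(self):
        self.segments = []
        self.finished = False

    def handle(self, raw_body):
        """Process one webhook request body (a JSON string) and return its event name."""
        payload = json.loads(raw_body)
        event = payload.get("event")
        if event == "transcript.segment":  # hypothetical event name
            self.segments.append(payload["data"])
        elif event == "transcript.done":  # hypothetical event name
            self.finished = True
        # Unknown events are ignored so new event types do not break the handler.
        return event

    def text(self):
        """Join the collected segments into speaker-prefixed lines."""
        return "\n".join(f"{s['speaker']}: {s['text']}" for s in self.segments)


collector = TranscriptCollector()
collector.handle('{"event": "transcript.segment", "data": {"speaker": "Alice", "text": "Hi"}}')
collector.handle('{"event": "transcript.done"}')
```

Separating the parsing logic from the web framework like this makes the handler easy to unit test with recorded payloads.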

What are the key features of advanced transcription APIs?

Advanced transcription APIs, like Recall.ai’s Transcription API, include features like speaker identification, timestamps, real-time transcription, multilingual support, and webhooks for automated notifications.

How does real-time transcription work?

Real-time transcription processes streaming audio and returns text with minimal delay, making it ideal when you need transcripts as the words are spoken, such as in live call coaching and incident management tools.

Can transcription APIs handle multiple languages?

Yes. Many transcription APIs support multiple languages and automatic language detection. Recall.ai supports multiple languages depending on the transcription provider you choose.

What privacy features do transcription APIs offer?

Some transcription providers support features like redacting sensitive personal information from transcripts.

What are common use cases for transcription APIs in customer service?

Transcription APIs convert customer calls into text to create a record of all calls, pull out common themes in customer requests, monitor service quality, and train representatives. Apps can also pass transcripts to AI agents to help diagnose issues.

How do transcription APIs benefit the healthcare industry?

In healthcare, transcription APIs enable real-time transcription of consultations and patient notes into electronic health records, reducing administrative burden and improving record accuracy for better patient care.

How are transcription APIs used in legal practices?

Apps in the legal space use transcription APIs to automate transcription of virtual testimonies and proceedings, speeding up document production, minimizing errors, and enabling quick searches for better case management.

Can transcription APIs support education?

Transcription APIs enable written records of virtual tutoring.

What's the difference between real-time and batch transcription APIs?

Real-time transcription provides text as the conversation happens, which suits live use cases, while batch transcription processes pre-recorded audio asynchronously, which suits post-meeting use cases.

Are transcription APIs useful in recruiting?

In recruiting, transcription APIs create transcripts of screening calls to keep an objective record of each call. These transcripts can be analyzed later for both sentiment and patterns.