Why use Recall.ai’s Transcription API?
Recall.ai is the transcription API for conversations. Recall.ai’s Transcription API captures audio and video and delivers accurate, speaker-labeled transcripts for conversations, whether they happen on Zoom, Google Meet, Microsoft Teams or in person, in real time or after the conversation.
Get more than transcripts, delivered in real time or after the conversation
Get transcripts in real time or asynchronously from your preferred transcription engine, with accurate speaker attribution across conversations.

One API, many transcription engines

Get transcripts with timestamps, metadata, and more
Recall.ai returns transcripts with timestamps, conversation metadata, and participant information so you can build features like speaker timelines, contextual transcript views, automated participant follow-ups, and AI workflows that populate systems like CRMs with stakeholder names.

Perfect speaker diarization

Transcription across multiple languages
Recall.ai supports transcription in many languages by letting you select the transcription provider based on language support. This lets you handle conversations in different languages through the same API, without managing multiple integrations.
Capture transcripts from video, phone, and in-person meetings
Recall.ai handles the hardest parts of transcription: capturing clean, structured audio from conversations and turning it into accurate, speaker-labeled transcripts. Record conversations from Zoom, Google Meet, Microsoft Teams, Webex, Slack Huddles, and in-person meetings.
Build with conversation data using Recall.ai’s Transcription API
Frequently asked questions
A transcription API converts speech into text so applications can work programmatically with conversations such as meetings, calls, interviews, or recordings.
A transcription API lets teams focus on building features that rely on transcripts, rather than building and maintaining speech-to-text systems themselves.
Recall.ai supports multiple transcription providers, including AWS Transcribe, Recall.ai Transcription, Rev, Deepgram, AssemblyAI, and Google Speech-to-Text. See the full list of third-party providers in our docs.
Transcription APIs are used to turn conversations into structured data. Common use cases include live captions, AI copilots and note taking, sales coaching tools, recruiting products that update applicant tracking systems, and in-person interviews or user research.
Transcription APIs often return structured JSON with timestamps and speaker labels. Recall.ai’s Transcription API outputs JSON.
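As a rough illustration of working with that kind of output, the Python sketch below merges consecutive segments from the same speaker into turns and totals each speaker's talk time. The JSON shape used here (a list of segments with `speaker`, `start`, `end`, and `text` fields) is an assumption made for this example and is not Recall.ai's documented schema. Check the docs for the actual field names.

```python
import json

# Hypothetical payload: these field names are assumptions for illustration,
# not the documented Recall.ai transcript schema.
SAMPLE = """
[
  {"speaker": "Ana", "start": 0.0, "end": 2.5, "text": "Hi, thanks for joining."},
  {"speaker": "Ana", "start": 2.5, "end": 4.0, "text": "Shall we start?"},
  {"speaker": "Ben", "start": 4.2, "end": 6.0, "text": "Yes, go ahead."}
]
"""


def speaker_turns(segments):
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Copy the segment so the input list is left unmodified.
            turns.append(dict(seg))
    return turns


def talk_time(segments):
    """Return total speaking time in seconds per speaker."""
    totals = {}
    for seg in segments:
        duration = seg["end"] - seg["start"]
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals


segments = json.loads(SAMPLE)
turns = speaker_turns(segments)
for turn in turns:
    print(f'{turn["speaker"]} [{turn["start"]:.1f}s to {turn["end"]:.1f}s]: {turn["text"]}')
```

The same turn list could feed a speaker timeline view, while the per-speaker totals are the kind of figure a coaching or analytics feature might display.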
Yes. Recall.ai Transcription provides real-time transcription for live use cases as well as async transcription.
Use high-quality audio at 16 kHz or higher, enable speaker diarization for multi-speaker conversations, provide custom vocabulary or context for domain-specific terms, and select models matched to your use case. With Recall.ai you can test different transcription models in your meetings to see which works best for your use case and language(s).
If you use Recall.ai, there’s no infrastructure to manage. With a single API call, Recall.ai captures audio across supported video conferencing platforms and sends it to the transcription provider you’ve chosen. If you bring your own transcription provider, you may need to pass your provider’s API keys. If you run transcription yourself instead, many transcription APIs run on standard servers, support Linux and Docker or Kubernetes for scaling, and offer SDKs in languages like Python, Go, C++, or Java.
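If you handle audio files yourself, the sample-rate guidance can be checked before sending anything for transcription. The sketch below uses Python's standard `wave` module to test whether a WAV file meets a 16 kHz minimum. The helper names are hypothetical, and the in-memory silent file exists only to make the example self-contained.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # threshold taken from the guidance above


def meets_min_sample_rate(wav_file, minimum=MIN_SAMPLE_RATE_HZ):
    """Return True if the WAV file's sample rate is at least `minimum` Hz.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as reader:
        return reader.getframerate() >= minimum


def make_silent_wav(sample_rate, seconds=1):
    """Build a mono, 16-bit silent WAV in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * sample_rate * seconds)
    buf.seek(0)
    return buf


for rate in (8_000, 16_000, 44_100):
    ok = meets_min_sample_rate(make_silent_wav(rate))
    print(f"{rate} Hz: {'ok' if ok else 'below 16 kHz'}")
```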
Many transcription APIs allow integration of custom models, adaptation of language and acoustic models for specific terminology, and addition of features like PII redaction for privacy compliance. Check with your transcription provider for specific customizability questions.
A transcription API processes audio, applies speech recognition models to convert speech to text, and returns structured output such as transcripts with timestamps and speaker labels. Transcription systems use either deterministic methods or dedicated models to handle features like speaker diarization and timestamps, then return formatted text as JSON, TXT, or SRT.
OpenAI's Whisper API is commonly used by those who need only a transcript. For cases that require more than basic transcription, providers like AssemblyAI, Recall.ai Transcription, Deepgram, Rev, Gladia, AWS Transcribe, and Google Speech-to-Text offer additional features like speaker identification and enhanced accuracy.
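To make the output formats concrete, here is a minimal Python sketch that converts timestamped, speaker-labeled segments into SRT subtitle text. The segment shape (dictionaries with `speaker`, `start`, `end`, and `text`, with times in seconds) is an assumption for illustration and is not any provider's schema. Only the SRT timestamp layout (`HH:MM:SS,mmm`) follows the subtitle format itself.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as SRT: a numbered cue, a time range, then the text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segments; the field names are assumptions for this example.
example = [
    {"speaker": "Ana", "start": 0.0, "end": 2.5, "text": "Hi."},
    {"speaker": "Ben", "start": 3661.5, "end": 3663.0, "text": "Hello."},
]
print(to_srt(example))
```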
Accuracy varies based on audio quality, language, and model choice. Recall.ai allows you to test and switch between transcription models to find the best fit for your use case. For the most accurate speaker diarization in transcripts, use Recall.ai’s Transcription API.
Many transcription APIs support speaker identification through machine diarization models like Pyannote to automatically label different speakers. Recall.ai’s Transcription API is the only offering that supports perfect diarization and speaker labeling.
With Recall.ai’s Transcription API you get speaker-diarized transcripts along with video, audio, metadata, and more just by calling a single API endpoint. You can subscribe to webhook events to receive live transcription or to fetch the transcript immediately after the meeting.
Advanced transcription APIs, like Recall.ai’s Transcription API, include features like speaker identification, timestamps, real-time transcription, multilingual support, and webhooks for automated notifications.
Real-time transcription processes streaming audio input and delivers text output instantly, making it ideal when you need transcripts as the words are spoken, such as in live call coaching and incident management tools.
Yes. Many transcription APIs support multiple languages and automatic language detection. Recall.ai supports multiple languages depending on the transcription provider you choose.
Some transcription providers support features like redacting sensitive personal information from transcripts.
Transcription APIs convert customer calls into text to create a record of all calls, pull out common themes in customer requests, monitor service quality, and train representatives. In apps they can also pass transcripts to AI agents to help diagnose issues.
In healthcare, transcription APIs enable real-time transcription of consultations and patient notes into electronic health records, reducing administrative burden and improving record accuracy for better patient care.
Apps in the legal space use transcription APIs to automate transcription of virtual testimonies and proceedings, speeding up document production, minimizing errors, and enabling quick searches for better case management.
Transcription APIs enable written records of virtual tutoring.
Real-time transcription provides instant text for live use cases, while batch transcription processes pre-recorded audio asynchronously, which is ideal for any post-meeting use case.
In recruiting, transcription APIs create transcripts of screening calls to keep an objective record of each call. These transcripts can be analyzed later for both sentiment and patterns.
