Enable in-person meeting transcription. Conversation transcription captures speech in real time so that all meeting participants can fully engage in the discussion, see who said what and when, and quickly follow up on next steps.
Convert spoken audio to text. Call the API to recognize audio coming from the microphone, from other real-time streaming audio sources, or from a recorded audio file. As audio is sent to the server, partial recognition results are returned if requested.
You can use the API to build voice-triggered smart apps. Try the demo to see how it works. Select your target language, then click on the microphone and start speaking. Or simply click on one of the sample speech phrases.
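As a rough illustration of the recorded-audio-file case, the sketch below builds (but does not send) an HTTP request that carries WAV audio to a speech-to-text endpoint. The endpoint path, query parameters, and header names are assumptions based on the short-audio REST interface and are not given on this page, so check them against the current documentation. The function name `build_recognition_request` is hypothetical. Streaming recognition with partial results is usually done through the Speech SDK and is not covered here.

```python
import urllib.parse
import urllib.request


def build_recognition_request(region: str, key: str, audio: bytes,
                              language: str = "en-US") -> urllib.request.Request:
    """Build, without sending, a POST request that carries WAV audio."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    # Assumed endpoint shape for short-audio recognition; verify before use.
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        # Assumed header name for the subscription key.
        "Ocp-Apim-Subscription-Key": key,
        # Assumes 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")
```

To actually send the request, pass the returned object to `urllib.request.urlopen` with a valid key and region; the response would then be parsed as JSON.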
Custom speech service: Speech Transcription with Custom Model
Overcome speech recognition barriers such as speaking style, vocabulary, and background noise. Our speech recognition technologies combine multiple APIs to produce the text output. Customers can customize the APIs to their needs and available data.
Create custom language models tailored to users’ speaking styles
Don’t let varied vocabularies and speaking styles block understanding. Customize the language model behind your app’s speech recognition by tailoring it to your industry’s expressions, to technical, geographic, or market terms, and even to a speaker’s style.
Adapt to user environment with custom acoustic models
Make sure your app’s speech recognition can function in all environments. With custom acoustic models, you can account for background noise and match your users’ expected environments.
Use robust speech models from Microsoft
Enable powerful, personalized speech recognition by building your own customized speech recognition models on top of Microsoft’s existing state-of-the-art models.
Explore a speech scenario
With Speech Services, it's easy to transcribe every call. Index the transcription for full-text search, or apply Text Analytics to detect sentiment, language, and key phrases for insights. If your call center recordings involve specialized terminology, such as product names or IT jargon, create a custom language model to teach Speech Services the vocabulary. A custom acoustic model helps Speech Services understand speakers even with background noise or poor phone connections.
For more information, read how batch transcription works with Speech Services.
1. Adapt a model for your domain and deploy that model
2. Upload your recordings to a blob container
3. Create a POST request to batch transcription
4. Speech Services schedules the transcription job
5. Stereo files are split into two channels
6. Mono files undergo diarization to distinguish between speakers
7. Download the transcription using the transcription ID
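The POST request in the steps above could look roughly like the following sketch, which only builds the request and does not send it. The endpoint path and the JSON field names (`contentUrls`, `locale`, `displayName`, `properties.diarizationEnabled`, `model.self`) are assumptions based on version 3.0 of the batch transcription REST interface; this page does not specify a version, so treat them as placeholders to verify. The function name and the display name value are hypothetical.

```python
import json
import urllib.request


def build_batch_transcription_request(region, key, recording_urls,
                                      locale="en-US", diarize=False,
                                      model_url=None):
    """Build, without sending, a POST request that creates a transcription job."""
    body = {
        "displayName": "call-center-batch",   # hypothetical job name
        "locale": locale,
        # URLs of recordings already uploaded to a blob container.
        "contentUrls": list(recording_urls),
        # Diarization applies to mono recordings with several speakers.
        "properties": {"diarizationEnabled": diarize},
    }
    if model_url:
        # Optional reference to a custom model adapted for your domain.
        body["model"] = {"self": model_url}
    # Assumed endpoint shape; verify against the current documentation.
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.0/transcriptions")
    headers = {
        "Ocp-Apim-Subscription-Key": key,     # assumed header name
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Under these assumptions, the service's response to the request would identify the new transcription job, and that identifier is what the final download step relies on.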
Use the Speech Devices SDK to build an ambient device and create a custom wake word.