
    Text to Speech

    Convert text to speech to create more natural, accessible interfaces

    Speak human, not robot

    Build apps and services that speak to users naturally, improving accessibility and usability. Convert text to audio in near real time, play it back, and save it as a file for later use. Text to Speech is available in both Neural and Standard versions.

    Applying the latest in digital speech innovation, the Neural Text to Speech capability makes the voices of your apps nearly indistinguishable from recordings of people. The natural inflection and clear articulation significantly reduce listening fatigue when interacting with AI systems. Use Neural Text to Speech to make interactions with chatbots and virtual assistants more natural and engaging, to convert digital text such as e-books into audiobooks, and to enhance in-car navigation systems.

    Neural Text to Speech in action

    English (US): Jessa

    Sentence
    Voice Sample
    The third type, a logarithm of the unsigned fold change, is undoubtedly the most tractable.
    As the name suggests, the original submarines came from Yugoslavia.
    This is easy enough if you have an unfinished attic directly above the bathroom.

    English (US): Guy

    Sentence
    Voice Sample
    Susan Candiotti reports they've given up their trip.
    Carol knows my lifestyle.
    The seagrass fiber is tough, durable, and smooth.

    Chinese (CN): Xiaoxiao

    Sentence
    Voice Sample
    您好,欢迎致电客服中心。我是华北地区的客服人员,工号0165。请问有什么可以帮您?
    想和你表白,试了一万种方式,找了一千次时机,但都放弃了,最终只能原地踏步。
    负责人Michael透露,新推出的紧凑型SUV搭载了智能的音响系统,可以语音控制volume大小。不过,车身的整体造型还是个secret。

    German (DE): Katja

    Sentence
    Voice Sample
    Bestimmte Berufsgruppen sind nur noch schwer zu rekrutieren.
    Sein Gedicht steckt voller Übertreibungen, die für den Schriftsteller allerdings typisch sind.
    Er organisiert eine Unterstützung der schwächeren durch die stärksten Bundesländer.

    Italian (IT): Elsa

    Sentence
    Voice Sample
    Tenete conto di un fattore importante.
    Alcuni prodotti in gran parte sono di buona qualità.
    Crisi? Vietato rilassarsi, siamo ancora in emergenza.

    Want to build this?

    The Standard Text to Speech capability speaks to users in multiple languages. Choose from more than 75 voices in over 45 languages or locales, including options for male and female voices. Adjust parameters such as speed, pitch, volume, pronunciation, and additional pauses.
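    Parameters such as speed, pitch, volume, and pauses are typically expressed in Speech Synthesis Markup Language (SSML), a W3C standard. The following is a minimal sketch, not the service's own tooling: it only builds an SSML string with Python's standard library. The helper name `build_ssml` and the voice name `example-voice` are hypothetical, and which attribute values a particular voice accepts is an assumption to check against the service documentation.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"   # W3C SSML namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang


def build_ssml(text, voice, lang="en-US", rate="medium",
               pitch="medium", volume="medium", pause_ms=0):
    """Return an SSML document as a string (hypothetical helper).

    rate, pitch, and volume map to the SSML <prosody> element;
    pause_ms > 0 appends an SSML <break> after the spoken text.
    """
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak",
                       {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    # "voice" is a placeholder name, not a real voice identifier.
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    prosody = ET.SubElement(voice_el, f"{{{SSML_NS}}}prosody",
                            {"rate": rate, "pitch": pitch, "volume": volume})
    prosody.text = text
    if pause_ms > 0:
        ET.SubElement(voice_el, f"{{{SSML_NS}}}break",
                      {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Your table is ready.", voice="example-voice",
                  rate="slow", volume="loud", pause_ms=500)
print(ssml)
```

    The resulting string is what a synthesis request would carry as its body; sending it to a service is deliberately left out of the sketch.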

    Standard Text to Speech in action

    To see how speech synthesis works, click Play.

    Language
    Sample Text
    Voice Sample
    English (US)
    An airport spokesman said more than 110 planes were damaged by hail.
    Chinese (CN)
    广告收入的比例高达90%以上
    Japanese (JP)
    皆様のご協力のたまものと
    German (DE)
    Der Anstieg der Verbraucherpreise in der Eurozone verlangsamt sich weiter.
    Spanish (ES)
    El alcalde de Santiago convoca a los medios para inaugurar dos semáforos.
    Turkish (TR)
    Tren durduğu sırada vagonun ortasında bir patlama meydana geldi.

    Want to build this?

    Text to Speech with custom voice models

    Do you need to give your voice agent a unique, recognizable brand voice? The Text to Speech voice customization feature makes it easy to create one-of-a-kind, voice-enabled apps, with no expertise required.

    Want to start building your own voice model?

    Voice models made easy

    To customize your voice agent, simply record and upload training data, and the service creates a unique voice font tuned to your recording. Start a proof of concept with a small amount of data. The system scales seamlessly as your data increases, enhancing the natural voice quality.

    Consistent and integrated

    Custom voice models are fully integrated with other Cognitive Services speech services. No coding is required, and you can easily deploy your customized voice model to the API.

    Fast and secure

    With a unique API endpoint and secure authentication management, you can plug your voice fonts into any platform quickly. Your models remain under your control.

    Explore a Speech Scenario

    Intelligent kiosk

    Speech services combined with Language Understanding enable apps and users to interact naturally. Use Speech to Text to capture a user’s question, Language Understanding to parse intent and formulate an appropriate reply, and Text to Speech to synthesize the text into a spoken response. Create conversational interfaces for various scenarios like banking, travel, and entertainment.
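    The three-stage kiosk flow described above can be sketched as a small pipeline. This is an illustration only: the class `KioskPipeline` and the `fake_*` functions are hypothetical stand-ins, not calls into any real SDK. In a real app each stage would wrap the corresponding service client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KioskPipeline:
    """Chains the three stages; each stage is injected as a plain callable."""
    speech_to_text: Callable[[bytes], str]    # capture the user's question
    formulate_reply: Callable[[str], str]     # parse intent, pick a reply
    text_to_speech: Callable[[str], bytes]    # synthesize the spoken response

    def handle(self, question_audio: bytes) -> bytes:
        question = self.speech_to_text(question_audio)
        reply = self.formulate_reply(question)
        return self.text_to_speech(reply)


# Hypothetical stand-ins so the sketch runs without any service.
def fake_speech_to_text(audio: bytes) -> str:
    # Pretend the audio bytes already are the transcript.
    return audio.decode("utf-8")


def fake_formulate_reply(question: str) -> str:
    # Toy intent detection by keyword; a real app would call a language service.
    intent = "opening_hours" if "open" in question.lower() else "unknown"
    replies = {
        "opening_hours": "We are open from nine to five.",
        "unknown": "Sorry, could you rephrase that?",
    }
    return replies[intent]


def fake_text_to_speech(text: str) -> bytes:
    # Pretend the encoded text is synthesized audio.
    return text.encode("utf-8")


kiosk = KioskPipeline(fake_speech_to_text, fake_formulate_reply,
                      fake_text_to_speech)
print(kiosk.handle(b"When do you open?"))
```

    Injecting the stages as callables keeps the flow testable offline and lets each service be swapped independently.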

    Commerce chatbot

    Overview

    Together, the Azure Bot Service and Language Understanding service enable developers to create conversational interfaces for various scenarios like banking, travel, and entertainment. For example, a hotel’s concierge can use a bot to enhance traditional e-mail and phone call interactions by validating a customer via Azure Active Directory and using Cognitive Services to better contextually process customer requests using text and voice. The Speech recognition service can be added to support voice commands.

    Flow

    1. The customer uses your mobile app.
    2. The user authenticates using Azure AD B2C.
    3. The user requests information using the custom Application Bot.
    4. Cognitive Services helps process the natural language request.
    5. The customer reviews the response and can refine the question using natural conversation.
    6. Once the user is happy with the results, the Application Bot updates the customer’s reservation.
    7. Application Insights gathers runtime telemetry to help development with bot performance and usage.

    "ROOBO is an AI solution provider. Now with Microsoft’s world leading Text to Speech technology, we are able to provide the best custom voice building service to our customers."

    Yu Lei, CTO, ROOBO

    Explore the Cognitive Services APIs

    Computer Vision

    Distill actionable information from images

    Face

    Detect, identify, analyze, organize, and tag faces in photos

    Ink Recognizer PREVIEW

    An AI service that recognizes digital ink content, such as handwriting, shapes, and ink document layout

    Video Indexer

    Unlock video insights

    Custom Vision

    Easily customize your own state-of-the-art computer vision models for your unique use case

    Form Recognizer PREVIEW

    The AI-powered document extraction service that understands your forms

    Text Analytics

    Easily evaluate sentiment and topics to understand what users want

    Translator Text

    Easily conduct machine translation with a simple REST API call

    QnA Maker

    Distill information into conversational, easy-to-navigate answers

    Language Understanding

    Teach your apps to understand commands from your users

    Immersive Reader PREVIEW

    Empower users of all ages and abilities to read and comprehend text

    Speech Services

    Unified speech services for speech-to-text, text-to-speech and speech translation

    Speaker Recognition PREVIEW

    Use speech to identify and verify individual speakers

    Content Moderator

    Automated image, text, and video moderation

    Anomaly Detector PREVIEW

    Easily add anomaly detection capabilities to your apps

    Personalizer PREVIEW

    An AI service that delivers a personalized user experience

    Use the Speech Devices SDK to build an ambient device and create a custom wake word
