Speech Studio


October 22, 2023

Speech Studio is a set of UI-based tools for building and integrating features from the Azure AI Speech service into your applications. It takes a no-code approach: you create projects in Speech Studio, then reference those assets in your applications by using the Speech SDK, the Speech CLI, or the REST APIs. Speech Studio includes the following features:

  • Real-time speech to text: Quickly test speech to text by dragging and dropping audio files, without writing any code. You can explore the full functionality of this feature in Speech Studio.
  • Pronunciation assessment: Evaluate speech pronunciation and give speakers feedback on the accuracy and fluency of spoken audio. Speech Studio provides a sandbox for testing this feature quickly, without code. To use the feature with the Speech SDK in your applications, see the Pronunciation assessment article.
  • Speech Translation: Quickly test and translate speech into other languages of your choice with low latency. You can explore the full functionality of this feature in Speech Studio.
  • Voice Gallery: Build apps and services that speak naturally. Choose from a broad portfolio of languages, voices, and… 

Speech Studio also lets you try out text to speech without signing up or writing any code. Text to speech includes the following features:

  • Prebuilt neural voice: Use humanlike prebuilt neural voices out of the box, or create a custom neural voice that’s unique to your product or brand. For a full list of supported voices, languages, and locales, see Language and voice support for the Speech service.
  • Custom Voice: Create a unique AI voice generator that reflects your brand’s identity. Fine-grained text-to-speech audio controls let you tune voice output for your scenarios by adjusting rate, pitch, pronunciation, pauses, and intonation. You can define lexicons and set these parameters with Speech Synthesis Markup Language (SSML) or with the Audio Content Creation tool.
  • Language and voice selection: Choose from an extensive selection of voices across many languages and variants. The catalog changes over time, so check Language and voice support for the Speech service for the current list.
  • WaveNet voices: WaveNet voices are built on DeepMind’s research and are offered through Google Cloud Text-to-Speech, not through Speech Studio. The comparable option in Speech Studio is the prebuilt neural voices described above.
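
As a rough illustration of the SSML controls mentioned in the list above, the following Python sketch builds a small SSML document that adjusts rate and pitch and adds a pause. It is only a minimal example under stated assumptions: the helper name build_ssml, the voice name en-US-JennyNeural, and the default prosody values are illustrative choices, not settings taken from this article. The sketch only produces the SSML text; sending it to the Speech service is out of scope here.

```python
from xml.sax.saxutils import escape, quoteattr

# Standard SSML namespace defined by the W3C specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="+10%", pitch="-2%", pause_ms=300):
    """Return an SSML string that speaks `text` with adjusted prosody.

    build_ssml is a hypothetical helper; the voice name and default
    values are assumptions chosen only for illustration.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        # <prosody> carries the rate and pitch adjustments.
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>'
        f'{escape(text)}'  # escape &, <, > so the XML stays well-formed
        f'</prosody>'
        # <break> inserts a pause after the spoken text.
        f'<break time="{int(pause_ms)}ms"/>'
        f'</voice>'
        f'</speak>'
    )


ssml = build_ssml("Thanks for calling Contoso & Co.")
print(ssml)
```

The resulting string can be pasted into a tool that accepts SSML, or passed to an SDK call that takes SSML input, to hear how the rate, pitch, and pause settings change the output.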

Speech Studio lets you create lifelike synthesized speech and customize it to fit your brand’s identity. It is easy to use and requires no coding experience. With Speech Studio, you can create compelling, realistic voiceovers for your projects and improve the customer experience.
