Google DeepMind has released a new speech synthesis model, "Gemini 3.1 Flash TTS," that adjusts intonation, speed, and mood through text commands and supports more than 70 languages and a range of accents. The model emphasizes natural-sounding speech and embeds a watermark in its output to help combat misinformation. It ranked second in blind tests and is pitched at a variety of fields, a sign that competition in speech-generation AI is intensifying.

TechubNews

2026-04-17 13:48:52


DeepMind, Google’s artificial intelligence organization, has unveiled a new speech synthesis model called “Gemini 3.1 Flash TTS.” At its core, the model speaks more naturally than existing mechanical-sounding voices and lets users make detailed adjustments to tone, speed, and mood using text instructions alone.

Controlling tone, intonation, and speed via text commands

Google LLC recently announced the launch of Gemini 3.1 Flash TTS in a blog post. When converting a chatbot’s responses into speech, the model can act on directive words such as “enthusiastic,” “surprised,” and “informative” to change intonation and timbre.

According to publicly available demo videos, users can choose not only the voice but also how the speech is delivered and the mood it conveys. Where previous-generation TTS sounded somewhat “like a robot,” the new model focuses on more human-like expressiveness.
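As a rough illustration of what a text-only style instruction could look like, the sketch below prefixes the text to be spoken with directive words like those quoted above. The function name `build_styled_prompt`, the `Tone:` / `Pace:` / `Text:` layout, and the `pace` parameter are all assumptions made for illustration; the article does not describe the prompt format Google actually expects.

```python
from typing import Optional


def build_styled_prompt(text: str, tone: str = "informative",
                        pace: Optional[str] = None) -> str:
    """Combine delivery directions and the spoken text into one prompt string.

    Hypothetical layout: one direction per line, followed by the text itself.
    """
    lines = [f"Tone: {tone}"]
    if pace is not None:
        lines.append(f"Pace: {pace}")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


# Example: an upbeat, fast delivery for a short notification.
prompt = build_styled_prompt("Your order has shipped.",
                             tone="enthusiastic", pace="brisk")
print(prompt)
```

The resulting string would then be passed to whichever TTS endpoint is in use; keeping the directions separate from the text makes it easy to change the delivery without touching the script.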

From regional English accents to podcast-style formats

Gemini 3.1 Flash TTS also offers regional accents for multiple major languages. Taking English as an example, users can choose not only American “Valley” and “Southern” accents, but also various British variants such as “Brixton” and “RP,” in addition to other special accent options such as “Transatlantic.”

Google has also added a “director-level control” feature to the model. Users can fine-tune speaking styles and speed, and can use template formats such as podcast dialogues, audiobook narration, language tutoring, voice assistants, health guides, news anchors, customer support specialists, and more.

Notably, when users set a scene and environment, or even supply lines of dialogue to guide the delivery, the model is designed to keep each character’s speaking style consistent across multiple conversational turns. Google explains that the finished settings can be exported as Gemini API code, allowing the same voice to be reproduced across multiple projects and platforms.
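One way to picture reusable voice settings is as a small, serializable configuration object. The sketch below is only that: `SpeakerProfile`, `VoiceDirection`, and their fields are hypothetical names invented here, and the JSON layout is not the format of the code Google exports. It shows the general idea of saving a scene once and restoring it unchanged in another project.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpeakerProfile:
    """Hypothetical per-character settings."""
    name: str
    accent: str
    style: str


@dataclass(frozen=True)
class VoiceDirection:
    """Hypothetical scene-level settings shared by all speakers."""
    scene: str
    pace: str
    speakers: tuple = ()

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "VoiceDirection":
        data = json.loads(raw)
        speakers = tuple(SpeakerProfile(**s) for s in data["speakers"])
        return cls(scene=data["scene"], pace=data["pace"], speakers=speakers)


# A two-host podcast scene using accent labels mentioned in the article.
direction = VoiceDirection(
    scene="podcast dialogue",
    pace="relaxed",
    speakers=(
        SpeakerProfile(name="Host", accent="RP", style="informative"),
        SpeakerProfile(name="Guest", accent="Transatlantic", style="enthusiastic"),
    ),
)

saved = direction.to_json()                    # store alongside project A
restored = VoiceDirection.from_json(saved)     # load again in project B
print(restored == direction)
```

Freezing the dataclasses keeps a saved configuration from being modified by accident, which fits the goal of reproducing the same voice across projects.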

Supports 70+ languages… and applies watermarking

According to Google, the goal of Gemini 3.1 Flash TTS is to provide a more natural speech experience. It supports more than 70 languages, including Japanese, Hindi, German, and others.

In addition, all output is embedded with a SynthID watermark. This is seen as an effort to make AI-generated speech easier to identify, in response to concerns about deepfakes and the spread of false information.

Ranked second in blind tests… developers can use it immediately

Its performance has also been validated to some degree. On the “Artificial Analysis TTS Ranking,” which reflects the results of blind tests covering thousands of human preference judgments, Gemini 3.1 Flash TTS ranked second overall with 1211 points. Google says this means it was rated higher than several popular TTS models.

Developers can use the model immediately via the Gemini API and Google AI Studio. Enterprise customers can access it through Vertex AI, while ordinary users can try it in Google Vids.

This release shows that competition in generative AI is quickly expanding from text and images into speech. In particular, as demand for “natural AI voices” continues to grow in markets such as enterprise customer support, media production, education, and digital content creation, Gemini 3.1 Flash TTS is likely to intensify competition in these markets.

TP AI Note: This article was summarized using a language model based on TokenPost.ai. The main content of the original text may have been omitted or may not be entirely consistent with the facts.

