
This new AI can mimic human voices with only 3 seconds of training

Humanity has taken yet another step toward the inevitable war against the machines (which we will lose) with the creation of Vall-E, an AI developed by a team of researchers at Microsoft that can produce high-quality replications of human voices from only a few seconds of sample audio.

Vall-E isn’t the first AI-powered voice tool (xVASynth, for instance, has been kicking around for a couple of years now), but it promises to exceed them all in terms of pure capability. In a paper available at Cornell University (via Windows Central), the Vall-E researchers say that most current text-to-speech systems are limited by their reliance on “high-quality clean data” in order to accurately synthesize high-quality speech.

“Large-scale data crawled from the Internet cannot meet the requirement, and always lead to performance degradation,” the paper states. “Because the training data is relatively small, current TTS systems still suffer from poor generalization. Speaker similarity and speech naturalness decline dramatically for unseen speakers in the zero-shot scenario.”

(“Zero-shot scenario” in this case essentially means the ability of the AI to recreate voices without being specifically trained on them.)

Vall-E, on the other hand, is trained with a much larger and more diverse data set: 60,000 hours of English-language speech drawn from more than 7,000 unique speakers, all of it transcribed by speech recognition software. The data being fed to the AI contains “more noisy speech and inaccurate transcriptions” than that used by other text-to-speech systems, but researchers believe the sheer scale of the input, and its diversity, make it much more flexible, adaptable, and—this is the big one—natural than its predecessors.

“Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity,” states the paper, which is filled with numbers, equations, diagrams, and other such complexities. “In addition, we find VALL-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”

You can actually hear Vall-E in action on GitHub, where the research team has shared a brief breakdown of how it all works, along with dozens of samples of inputs and outputs. The quality varies: Some of the voices are notably robotic, while others sound quite human. But as a sort of first-pass tech demo, it’s impressive. Imagine where this technology will be in a year, or two, or five, as systems improve and the voice training dataset expands even further.

Which is of course why it’s a problem. Dall-E, the AI art generator, is facing pushback over privacy and ownership concerns, and the ChatGPT bot is convincing enough that it was recently banned by the New York City Department of Education. Vall-E has the potential to be even more worrying because of the possible use in scam marketing calls or to reinforce deepfake videos. That may sound a bit hand-wringy but as our executive editor Tyler Wilde said at the start of the year, this stuff isn’t going away, and it’s vital that we recognize the issues and regulate the creation and use of AI systems before potential problems turn into real (and real big) ones.

The Vall-E research team addressed those “broader impacts” in the conclusion of its paper. “Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker,” the team wrote. “To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models.”

