There are around 7,000 languages in the world, but existing speech recognition models cover only about 100 of them comprehensively. This is because these kinds of models tend to require huge amounts of labeled training data, which is available for only a small number of languages, including English, Spanish, and Chinese.
Meta researchers got around this problem by retraining an existing AI model developed by the company in 2020 that is able to learn speech patterns from audio without requiring large amounts of labeled data, such as transcripts.
They trained it on two new data sets: one that contains audio recordings of the New Testament Bible and its corresponding text taken from the internet in 1,107 languages, and another containing unlabeled New Testament audio recordings in 3,809 languages. The team processed the speech audio and the text data to improve their quality before running an algorithm designed to align audio recordings with accompanying text. They then repeated this process with a second algorithm trained on the newly aligned data. With this method, the researchers were able to teach the algorithm to learn a new language more easily, even without the accompanying text. A rough sketch of that two-pass process follows.
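The sketch below is only an illustration of the clean-align-retrain loop described in the paragraph above, not Meta's actual code; every function name in it (clean_audio, clean_text, align, train_alignment_model) is a hypothetical placeholder.

```python
# Hypothetical sketch of the pipeline described above: clean both modalities,
# align audio to text with an initial model, then train a second model on the
# aligned pairs and align again. None of these functions are Meta's real code.

def clean_audio(recordings):
    """Placeholder: filter or segment recordings to improve alignment quality."""
    return recordings

def clean_text(texts):
    """Placeholder: normalize punctuation, casing, and numerals in the text."""
    return texts

def align(model, recordings, texts):
    """Placeholder: pair each audio segment with the text span the model matches it to."""
    return list(zip(recordings, texts))

def train_alignment_model(pairs):
    """Placeholder: fit a new alignment model on the aligned audio-text pairs."""
    return f"model trained on {len(pairs)} pairs"

# Step 1: clean the speech audio and the text data.
recordings = clean_audio(["chapter_1.wav", "chapter_2.wav"])
texts = clean_text(["chapter 1 text", "chapter 2 text"])

# Step 2: align recordings with their accompanying text using an initial model.
first_pass_pairs = align("initial-model", recordings, texts)

# Step 3: repeat the process with a second model trained on the newly aligned data.
second_model = train_alignment_model(first_pass_pairs)
refined_pairs = align(second_model, recordings, texts)
```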
“We can use what that model learned to then quickly build speech systems with very, very little data,” says Michael Auli, a research scientist at Meta who worked on the project.
“For English, we have lots and lots of good data sets, and we have that for a few more languages, but we just don’t have that for languages that are spoken by, say, 1,000 people.”
The researchers say their models can converse in over 1,000 languages but recognize more than 4,000.
They compared the models with those from rival companies, including OpenAI's Whisper, and claim theirs had half the error rate, despite covering 11 times more languages.
However, the team warns the model is still prone to mistranscribing certain words or phrases, which could result in inaccurate or potentially offensive labels. They also acknowledge that their speech recognition models yielded more biased words than other models, albeit only 0.7% more.
While the scope of the research is impressive, the use of religious texts to train AI models can be controversial, says Chris Emezue, a researcher at Masakhane, an organization working on natural-language processing for African languages, who was not involved in the project.
“The Bible has a lot of bias and misrepresentations,” he says.