According to The Asahi Shimbun, a polyglot artificial intelligence (AI) system is available that can accurately recognize, and separately understand, overlapping voices in up to 10 languages, including Japanese, English and French. It is intended for possible use in international airports and other multilingual venues.
The technology, developed by electronics giant Mitsubishi Electric Corp., can recognize speech made simultaneously by more than one person in different languages almost instantaneously and with high precision.
Conventional voice recognition technologies are language-specific.
“Our system can be taught to recognize accurately what each speaker has said, even when a speaker, or speakers, have switched the language halfway through, or when someone else has begun speaking in a different tongue, as long as the languages involved have been learned in advance,” said Takaaki Hori, a research scientist with the Mitsubishi Electric Research Laboratories, in describing the technology.
A distinguishing feature of the multilingual technology is that it does not need language-specific expert knowledge, such as a pronunciation dictionary, because it relies on “deep learning,” an AI technology that uses a “neural network,” which imitates nerve circuits.
Mitsubishi said its technology combines two inference methods: one that infers how speech corresponds in time to a string of characters, and a separate one that emphasizes the connection between the sound and the text of a character string.
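The article does not say how the two inference methods are combined. As a rough illustration only, one common approach in speech recognition is to score each candidate transcript with both methods and take a weighted sum of the two log-probabilities. The sketch below assumes that approach; the function names, the weight of 0.5 and the candidate scores are all hypothetical and are not taken from Mitsubishi's system.

```python
import math


def combine_scores(alignment_logp, context_logp, weight=0.5):
    """Weighted sum of two log-probability scores for one candidate transcript.

    alignment_logp: score from a method that aligns audio to characters in time.
    context_logp:   score from a method that relates the sound to the whole text.
    weight:         hypothetical mixing weight between 0 and 1 (an assumption).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * alignment_logp + (1.0 - weight) * context_logp


def pick_best(hypotheses, weight=0.5):
    """Return the text of the candidate with the highest combined score.

    hypotheses: list of (text, alignment_logp, context_logp) tuples.
    """
    best = max(hypotheses, key=lambda h: combine_scores(h[1], h[2], weight))
    return best[0]


# Made-up candidates: the second looks slightly better to the alignment
# method alone, but much worse to the second method.
candidates = [
    ("hello world", math.log(0.6), math.log(0.3)),
    ("hollow word", math.log(0.7), math.log(0.05)),
]

best = pick_best(candidates)
```

With equal weights, the candidate that both methods find plausible wins. Setting `weight=1.0` would rely on the alignment score alone and select the other candidate, which is the kind of error a combined score is meant to reduce.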
The system, when taught multiple languages, achieved a voice recognition accuracy of more than 90 percent when five languages (Japanese, English, French, German and Italian) were spoken in low-noise environments. The accuracy remained above 80 percent when a total of 10 languages were used, with the addition of Chinese, Spanish, Portuguese, Russian and Dutch, the officials said.
The system can work on a small, stand-alone computer system, such as a personal computer for playing video games, without connection to the Internet or other resources. It could be used, for example, in transcribing conversations from a video containing speech in multiple languages.
Extensive tests are planned to make the technology more practical, including assessments of how well it recognizes speech in bustling areas and other noisy environments, the officials added.