Current projects in HRI-JP
HRI-JP Principal Researcher Kazuhiro Nakadai
As researchers in auditory scene analysis, in 2000 we became the first in the world to propose the research field of “robot audition”, which combines auditory scene analysis with robotics. The core statement of robot audition, “robots should understand all sounds picked up by their ears, not just voices”, was revolutionary at that time. Research in human-robot interaction then assumed that only voices would be input to robots and that users would speak into microphones close to their mouths.
The necessity of our statement has since become widely accepted, and in recent years robot audition has gained global recognition. Projects related to this research field have started outside of Japan, partially because of our outreach efforts on robot audition research. We are confident that we are global leaders in the robot audition field through technology breakthroughs in handling simultaneous speech and barge-in (user interruption of conversation).
What are the actual problems in robot audition and auditory scene analysis? Here are some examples.
Let us consider an everyday scene where someone says “Hello!” and another person replies “Hello!” We can easily make this reply, but it is not trivial for a robot to listen to voices with its own ears and respond in the same way. Even if the robot tries to listen to human voices, it also hears other, unwanted sounds. We have the ability to consciously or unconsciously listen to what we want to hear when there is noise around (the cocktail party effect), but robots and their systems do not. Furthermore, these systems have a severe limitation: typical voice recognition systems treat every input sound as a voice. Therefore, not only human voices but also music and sounds from a television set are recognized as voices.
Let us assume that the voice input was “Hello!” and the robot recognized it correctly. Should the robot reply “Hello!”? Was the “Hello!” spoken by a person to the robot? The sound may have come from a television set. You may have seen a scene on television where a person says “Hello!” to a robot and the robot replies “Hello!” However, the robot is automatically responding to the keyword “Hello!”; it does not actually understand the meaning of “Hello!” and is not having a conversation.
We need to solve these problems to make truly intelligent robots or intelligent systems. Robot audition and auditory scene analysis are the fields of research that tackle them. Of course we cannot solve all of the problems at once, but we are making progress. For example, the robot audition software system “HARK (HRI-JP Audition for Robots with Kyoto University)” that we developed can distinguish the voices of multiple people speaking simultaneously.
By using HARK, we can record and visualize, in real time, who spoke and from where in a room. By evolving this technology, we may be able to pick up the voice of a specific person in a crowded area, or take minutes of a meeting with information on who said what. Furthermore, if scene analysis evolves such that various sounds and scenes can be analyzed at the same time as voices in real environments, many of the problems that appeared in the “Hello!” dialogue can be resolved. We may even achieve understanding of spoken words or situations in the future. In any case, we believe that the work we are doing now will become the foundation of technology used in future intelligent systems.
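To give a feel for what "from where" means in practice, here is a minimal sketch of estimating the direction of a sound from two microphones. It is an illustration only and is not how HARK works internally; this article does not describe HARK's algorithms. The sketch assumes a single sound source, two microphones 0.2 m apart, a 16 kHz sample rate, and a plain cross-correlation delay estimate. All function names and parameter values are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_delay(ref, other, max_lag):
    """Return the lag in samples by which `other` trails `ref`.

    The lag is found by brute-force cross-correlation over
    [-max_lag, max_lag]. A positive result means the sound
    reached the `ref` microphone first.
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_distance):
    """Convert a delay in samples to a direction of arrival in degrees.

    0 degrees is straight ahead (equal distance to both microphones);
    positive angles point toward the `ref` microphone's side.
    """
    x = lag / sample_rate * SPEED_OF_SOUND / mic_distance
    x = max(-1.0, min(1.0, x))  # guard against rounding outside asin's domain
    return math.degrees(math.asin(x))


# Synthetic demo (assumed values): one noise-like source that reaches
# the right microphone 5 samples after the left microphone.
SAMPLE_RATE = 16000   # Hz
MIC_DISTANCE = 0.2    # metres
TRUE_LAG = 5          # samples

random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(2000)]
left = source
right = [0.0] * TRUE_LAG + source[:-TRUE_LAG]

# Largest physically possible delay for this microphone spacing.
max_lag = int(MIC_DISTANCE / SPEED_OF_SOUND * SAMPLE_RATE) + 1

lag = estimate_delay(left, right, max_lag)
angle = delay_to_angle(lag, SAMPLE_RATE, MIC_DISTANCE)
print(f"estimated lag: {lag} samples, direction: {angle:.1f} degrees")
```

A real room adds reverberation, background noise, and several people talking at once, which is exactly where a simple two-microphone estimate like this breaks down and where the robot audition techniques discussed here are needed.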
A Robot Referee for Rock-Paper-Scissors Sound Games
The most challenging aspect of this research area is that systems can operate in the environments they were designed for but fail in unexpected ones. One reason may be the divide-and-conquer approach, in which a difficult problem is divided into pieces that are then solved individually. Problems in real environments, however, may lose their essential nature when divided. Dividing the problem as little as possible and handling it in its original form may be a new approach to building systems that operate in unexpected environments.
Doing research in this style requires breaking away from the conventional style of research. Taking a broad view of the whole problem, instead of pursuing a limited area as is conventional, will become important, as will the ability to actively repeat the cycle of theory, implementation and testing. In other words, we need to do system integration research rather than only thinking about theory or only assembling a system.
We have to start by training researchers who have these skills. I currently lead a research team at a university while working at HRI, so I have the opportunity to teach and train researchers. I am grateful to the university for this opportunity and to HRI for allowing me to do this, and I myself am gaining a lot from this experience.