



Building an information processing mechanism to understand the environment and situations based on sounds.

HRI-JP Principal Researcher Kazuhiro Nakadai


1. Implementing functionality to analyze the environment and situation around sounds

Building on our work in auditory scene analysis, in 2000 we became the first in the world to propose the research field of “robot audition”, which combines auditory scene analysis with robotics research. The core statement of robot audition, “robots should understand all sounds picked up by their ears, not just voices”, was revolutionary at the time. In the field of human-robot interaction, research then assumed that only voices were input to robots, with users speaking into microphones held close to their mouths.

The necessity of our statement has since become widely accepted, and in recent years robot audition has gained global recognition. Projects related to this research field have started outside of Japan, partly because of our outreach efforts on robot audition research. We are confident that we are global leaders in the robot audition field, thanks to technical breakthroughs in handling simultaneous speech and barge-in (a user interrupting a conversation mid-utterance).

What are the actual problems in robot audition and auditory scene analysis? Here are some examples.

2. Toward a human interface for all information systems, including robots

Let us consider a common scene where someone says “Hello!” and another person replies “Hello!” We can easily make this reply, but it is not trivial for a robot to listen to voices with its own ears and respond in the same way. Even when the robot tries to listen to human voices, it hears other, unwanted sounds as well. We have the ability to consciously or unconsciously focus on what we want to hear amid surrounding noise (the cocktail party effect), but robots and their systems do not. Furthermore, these systems have a severe limitation: in typical speech recognition systems, every input sound is treated as a voice. Therefore, not only human voices but also music and sounds from a television set are recognized as speech.
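The limitation above means that audio must be classified before it ever reaches a speech recognizer. As a toy illustration of that idea (our own sketch, not HARK's method, which uses far more robust statistical models), the following hypothetical check combines frame energy with zero-crossing rate to reject silence and noise-like input:

```python
import numpy as np

def simple_vad(frame, energy_thresh=0.01, zcr_thresh=0.3):
    """Crude voice-activity check on one audio frame (floats in [-1, 1]).

    Toy illustration only: real systems model speech statistically.
    Voiced speech tends to have noticeable energy and a relatively
    low zero-crossing rate compared with hiss-like noise.
    """
    energy = np.mean(frame ** 2)            # average signal power
    signs = np.sign(frame)
    zcr = np.mean(signs[:-1] != signs[1:])  # fraction of sign changes
    return bool(energy > energy_thresh and zcr < zcr_thresh)

# Silence vs. a voiced-like low-frequency tone, 1 s at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
silence = np.zeros(16000)
tone = 0.5 * np.sin(2 * np.pi * 200 * t)    # 200 Hz, pitch-like
print(simple_vad(silence), simple_vad(tone))  # → False True
```

Even a gate this crude already distinguishes the two inputs; the hard part, as the article notes, is doing so reliably for arbitrary real-world sounds.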

Let us assume that the voice input was “Hello!” and the robot recognized it correctly. Should the robot reply “Hello!”? Was the “Hello!” spoken by a person to the robot? It may have come from a television set. You may have seen a demonstration on television in which a person says “Hello!” to a robot and the robot replies “Hello!” In such cases, however, the robot is automatically responding to the keyword “Hello!”; it does not actually understand the meaning of “Hello!” or hold a conversation.

3. “HARK”, the robot audition software system that distinguishes simultaneous speech by multiple people.

We need to solve these problems to build truly intelligent robots and systems. Robot audition and auditory scene analysis are the fields of research that tackle these issues. Of course we cannot solve every problem instantly, but we are making progress. For example, the robot audition software system “HARK (HRI-JP Audition for Robots with Kyoto University)” that we developed can distinguish the voices of multiple people speaking simultaneously.

By using HARK, we can record and visualize, in real time, who spoke and from where in a room. As this technology evolves, we may be able to pick out the voice of a specific person in a crowded area, or take meeting minutes that record who said what. Furthermore, if scene analysis advances to the point where various sounds and scenes can be analyzed in real environments at the same time as voices, many of the problems illustrated by the “Hello!” dialogue can be resolved. We may even achieve understanding of spoken words and situations in the future. In any case, we believe that the work we are doing now will become the foundation of the technology used in future intelligent systems.
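Knowing “from where” a voice came rests on a simple physical fact: a sound reaches each microphone of an array at a slightly different time, and that delay encodes direction. The sketch below (our own illustration; HARK itself uses more sophisticated localization methods such as MUSIC) estimates the inter-microphone delay of a simulated source by plain cross-correlation:

```python
import numpy as np

def tdoa_cross_correlation(sig_left, sig_right, fs):
    """Estimate the time difference of arrival (TDOA) between two
    microphone signals by locating the cross-correlation peak.
    Returns the delay of sig_left relative to sig_right in seconds.
    """
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = np.argmax(corr) - (len(sig_right) - 1)  # peak lag in samples
    return lag / fs

# Simulate a noise-like source that reaches the left mic 5 samples
# later than the right mic (i.e. the source is on the right side).
fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(2048)
delay = 5
left = np.concatenate([np.zeros(delay), source])
right = np.concatenate([source, np.zeros(delay)])

print(round(tdoa_cross_correlation(left, right, fs) * fs))  # → 5
```

Given the microphone spacing, such a delay converts directly into an arrival angle; tracking it per separated source is, in essence, how “who spoke from where” can be visualized.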

A Robot Referee for Rock-Paper-Scissors Sound Games

4. A new style of research is difficult because there are many possible approaches

The most challenging aspect of this research area is that systems operate in the environments they were designed for, but fail in unexpected environments. One reason may be the divide-and-conquer approach, in which a difficult problem is divided into pieces that are then solved individually. Problems in real environments, however, may lose their essential nature when divided. Avoiding such division as much as possible and handling the problem in its original form may be a new approach to building systems that operate even in unexpected environments.

Doing research in this style requires breaking away from the conventional style of research. A broad perspective on the whole problem, rather than the conventional pursuit of a narrow area, becomes important, as does the ability to actively repeat the cycle of theory, implementation, and testing. In other words, we must do system-integration research instead of only devising theory or only assembling systems.

We have to start by training researchers who have these skills. I currently lead a research team at a university while working at HRI-JP, so I have the opportunity to train researchers through education. I am grateful to the university for this opportunity and to HRI for allowing me to pursue it, and I myself gain a great deal from the experience.

Kazuhiro Nakadai
Kazuhiro Nakadai received the M.E. degree in information engineering in 1995 and the Ph.D. degree in electrical engineering in 2003, both from The University of Tokyo. After working at Nippon Telegraph and Telephone and NTT Comware Corporation as a system engineer (1995–1999) and at the Kitano Symbiotic Systems Project, ERATO, JST as a researcher (1999–2003), he is currently a principal researcher at HRI-JP. Since 2006, he has also been a Visiting Associate Professor at Tokyo Institute of Technology. His research interests include AI, robotics, signal processing, computational auditory scene analysis, multi-modal integration, and robot audition.