



Intelligence Science

Voice interface builds a new relationship between humans and robots.

HRI-JP Principal Researcher Mikio Nakano


1. Multidomain conversation opens new possibilities for robots

Our research theme is "developing models for controlling the speech and actions of robots." We study robots with built-in spoken conversation functionality that can understand, learn, and act on spoken language by autonomously estimating the current situation and context. In other words, these robots act because they understand human speech, even when the situation changes and multiple tasks are required.

You have probably seen robots talking with humans on TV. One railway company operates an automated phone service that talks with customers and issues express train tickets. In reality, however, these systems execute given tasks with fixed objectives in fixed situations.

For robots to take various actions based on what people say, they must be preprogrammed with many contingent behaviors that depend on the conversation and on various conditions. Multidomain conversation functionality is a spoken conversation capability in which a number of these behaviors, each built as a separate system, are loaded onto a single robot, and the robot flexibly switches between them.
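The idea of loading several domain-specific behaviors onto one robot and switching between them can be sketched as follows. This is a minimal illustration, not HRI-JP's actual system: the domain names, the keyword-based scoring (standing in for a statistical domain classifier), and the replies are all hypothetical.

```python
class Domain:
    """One domain-specific conversation behavior (e.g. weather, navigation)."""

    def __init__(self, name, keywords, reply):
        self.name = name
        self.keywords = keywords
        self.reply = reply

    def score(self, utterance):
        # Naive keyword overlap; a real system would use a trained classifier.
        words = utterance.lower().split()
        return sum(1 for k in self.keywords if k in words)

    def respond(self, utterance):
        return self.reply


class MultidomainManager:
    """Holds several domains and routes each utterance to the best match."""

    def __init__(self, domains):
        self.domains = domains

    def handle(self, utterance):
        best = max(self.domains, key=lambda d: d.score(utterance))
        if best.score(utterance) == 0:
            # No domain claims the utterance: out-of-domain fallback.
            return "Sorry, I did not understand."
        return best.respond(utterance)


manager = MultidomainManager([
    Domain("weather", ["weather", "rain", "sunny"], "Today looks sunny."),
    Domain("navigation", ["go", "room", "meeting"], "Heading to the meeting room."),
])
print(manager.handle("go to the meeting room"))   # routed to the navigation domain
print(manager.handle("what is the weather today"))  # routed to the weather domain
```

The key design point is that each domain is self-contained, so new behaviors can be added without rewriting the manager; only the selection step spans all domains.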

2. Zooming in on how machines learn on their own from vast amounts of data

To advance "controlling the speech and actions of robots," we need to program the various conversation domains that enable multidomain conversation into a single robot. We also need functionality that decides how to react to speech and in which domain to conduct the conversation, as well as a mechanism for the robot to detect and learn unknown words.

For example, if we tell a robot to "go to the meeting room," it cannot act on what it heard if it does not know what "the meeting room" means. The robot needs to know the meaning and to predict where such phrases occur in a conversation. If it predicts incorrectly, it may completely misunderstand the context and take an unexpected action.

Even when the robot does not recognize a phrase, if it realizes that the phrase is unknown, it can ask the human for clarification, learn what the phrase means, and take the correct action. Understanding and building this mechanism of language processing is the most difficult and the most interesting part of our research.
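The detect-ask-learn loop described above can be sketched in a few lines. Everything here is an illustrative assumption, not the actual mechanism: the lexicon contents, the `ask` callback (standing in for a real clarification dialogue), and the place names are hypothetical.

```python
class GroundingLexicon:
    """Maps spoken phrases to locations; asks for clarification when a phrase is unknown."""

    def __init__(self):
        # Initially known phrases (hypothetical example data).
        self.places = {"lab": "room 101"}

    def resolve(self, phrase, ask):
        if phrase not in self.places:
            # The phrase is unknown: ask the human partner for clarification...
            answer = ask(f"I don't know '{phrase}'. Where is it?")
            # ...and store the answer, so the next request needs no clarification.
            self.places[phrase] = answer
        return self.places[phrase]


lexicon = GroundingLexicon()
# Simulated human answer in place of real dialogue input:
location = lexicon.resolve("the meeting room", ask=lambda q: "room 202")
print(location)  # room 202
# The phrase has been learned; the second request resolves without asking:
print(lexicon.resolve("the meeting room", ask=lambda q: "unused"))  # room 202
```

A real system would also have to decide *that* the phrase is unknown in the first place, from noisy speech recognition output; the `phrase not in self.places` check stands in for that much harder detection step.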

3. Belief that voice is a human-friendly interface

We take two approaches in our research: understanding how and why people converse with each other, and how people can communicate with machines. In both, we set up actual conversation situations, analyze large amounts of data, and build various conversation models.

What mechanisms do people use to understand voices? What problems arise when people talk to robots and the robots do not respond? When people judge that a robot does not understand well, they talk slowly, say only the important words, or change their tone or speech pattern. These are very characteristic behaviors, and we can collect very interesting samples.

As explained above, many areas of speech technology remain uncovered when it comes to natural voice interfaces. However, we believe that communication by voice, which comes naturally to everyone from children to the elderly, is one of the most human-friendly interfaces.

4. The ultimate robot is autonomous and handbook-free

As voice interfaces connecting people with robots develop, we believe we can build a new relationship between people and robots. We predict that in the future there will be "handbook-free" robots that listen to human speech and learn by themselves.

I am happy to be able to study the elaborate intellectual activities of people and the mechanisms behind our language. At HRI-JP we are not assigned research topics; we choose and pursue our own.

Being able to choose our own topics is liberating, but it carries responsibility: what are the issues, what can be uncovered, and what will the output be? We approach challenging research topics not as short-term tasks but as steps toward long-term goals.

Mikio Nakano
Dr. Mikio Nakano is a Principal Researcher at Honda Research Institute Japan Co., Ltd. (HRI-JP). He received his M.S. degree in Coordinated Sciences and Sc.D. degree in Information Science from the University of Tokyo, respectively in 1990 and 1998. From 1990 to 2004, he worked on natural language processing and spoken dialogue systems at Nippon Telegraph and Telephone Corporation. In 2004, he joined HRI-JP, where he has been working on intelligent conversational robots.